Discussion about the use of self-organisation for automatically "programming" networks of processing nodes.

Saturday, August 25, 2007

This blog is now closed

Please go to The Spline for future blog postings.

Friday, January 26, 2007


Jorge Luis Borges wrote:

The composition of vast books is a laborious and impoverishing extravagance. To go on for five hundred pages developing an idea whose perfect oral exposition is possible in a few minutes! A better course of procedure is to pretend that these books already exist, and then to offer a résumé, a commentary.

Well, that should save a bit of time!

Wednesday, January 24, 2007

Discrete network dynamics. Part 1: Operator theory (update 2)

I need to update what is going on with my paper "Discrete network dynamics. Part 1: Operator theory", which can be found on arXiv at cs.NE/0511027. Yes, it was that long ago, and the paper is still not published, nor do I know how I can get it published. I previously blogged about this here, here, here, and here.

I have been getting rather evasive feedback from various people about this paper, and I don't like the sound of what I hear at all. In a nutshell, there is a widespread view that my use of quantum field theory is misguided and/or wrong.

One critic said that I didn't understand quantum mechanics, which I thought was rather odd given that my PhD is in quantum chromodynamics! That comment immediately told me that he had not read my paper, or at least had not read it very carefully, because I don't actually use quantum mechanics in the paper, at least not in the sense that physicists use it (i.e. the sense in which the criticism was made). In my paper I explain how I use operators to implement the elementary processes of adding samples to and removing samples from histogram bins, and that these operators have exactly the same algebraic properties as the bosonic creation and annihilation operators that are used in physics. To this extent, what I am doing is algebraically equivalent to a quantum field theory, and I explained this in greater detail in an earlier posting here. What I am certainly not doing is using a representation of these operators in which they act on an underlying wave function, because there isn't any wave function defined in my samples-in-histogram-bins model. My creation and annihilation operators should be thought of as little pieces of algorithm whose effect is to add samples to and remove samples from histogram bins.

All I use is operators for creating and annihilating quanta; the model is defined at the level of these quanta, and there is no deeper level of theory. This is where the QFT that I use is a subset of the type of QFT that physicists are familiar with; I make use of only the combinatoric properties of field quanta, which are elegantly summarised in the algebraic properties of the corresponding creation and annihilation operators. I also use a lattice QFT, rather than a continuum QFT, but I assume that the difference between these two is relatively benign (pace, mathematical purists!). Because physicists are exposed to QFT in (relativistic) particle physics and in (non-relativistic) condensed matter physics, they naturally assume that QFT needs an underlying wave function with the usual quantum mechanical behaviours that they have been taught about. This is a very narrow definition of QFT, because you can have a QFT whose quanta are built in any way that you want.
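To make the operator picture concrete, here is a minimal sketch (my own illustration, not code from the paper) of these operators acting on a single histogram bin. A state is a weighted mixture of occupation numbers; creation appends a sample, and annihilation removes one with a combinatoric weight of n (the number of ways to choose which sample to remove). With this convention the operators satisfy the bosonic commutation relation [a, a†] = 1.

```python
# Illustrative sketch only: bosonic creation/annihilation operators on a
# single histogram bin, using the combinatoric convention
#   a|n> = n|n-1>,  a_dag|n> = |n+1>.
# A "state" is a dict mapping occupation number n to its coefficient.

def create(state):
    """a_dag: add one sample to the bin."""
    return {n + 1: c for n, c in state.items()}

def annihilate(state):
    """a: remove one sample, weighted by the n ways of choosing it."""
    return {n - 1: n * c for n, c in state.items() if n > 0}

def commutator(state):
    """[a, a_dag] applied to a state: a(a_dag(s)) - a_dag(a(s))."""
    plus = annihilate(create(state))
    minus = create(annihilate(state))
    out = dict(plus)
    for n, c in minus.items():
        out[n] = out.get(n, 0) - c
    return {n: c for n, c in out.items() if c != 0}

# [a, a_dag] acts as the identity: a bin with 5 samples is returned unchanged.
print(commutator({5: 1.0}))  # {5: 1.0}
```

The commutator acting as the identity is exactly the algebraic property that makes these sample-shuffling operators bosonic, with no wave function anywhere in sight.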

One can manipulate these creation and annihilation operators by using their algebraic properties to rearrange "operator products" in various ways, and thus break operator expressions apart into a sum of contributions with different combinatoric properties, each weighted by a combinatoric factor that is automatically generated by the algebraic manipulations. These manipulations have the same general structure as the sorts of manipulation that occur in "operator product expansion" calculations in high energy physics; my PhD dissertation is full of this sort of calculation, and some relevant papers that I contributed to can be found by doing a search of the SPIRES database here.
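As a flavour of the kind of rearrangement involved (a sketch of my own, not taken from the dissertation), here is a tiny normal-ordering routine: it repeatedly applies the rewrite aa† → a†a + 1 to a word of operators, and the combinatoric factors emerge automatically as the weights of the resulting normal-ordered terms.

```python
# Toy normal-ordering by term rewriting: 'a' = annihilation, 'A' = creation.
# The single rule "aA -> Aa + 1" is applied until every word is
# normal-ordered (all 'A's to the left of all 'a's).
from collections import Counter

def normal_order(word):
    """Return a Counter mapping normal-ordered words to their weights."""
    result = Counter()
    stack = [(word, 1)]
    while stack:
        w, coeff = stack.pop()
        i = w.find('aA')
        if i == -1:
            result[w] += coeff          # already normal-ordered
        else:
            stack.append((w[:i] + 'Aa' + w[i+2:], coeff))  # swapped term
            stack.append((w[:i] + w[i+2:], coeff))         # commutator term
    return result

print(normal_order('aaAA'))
```

For example, a a a† a† expands to a†a†aa + 4 a†a + 2, reproducing the familiar bosonic combinatoric factors without any hand calculation.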

All the above leaves me no choice but to describe what I write about in my paper "Discrete network dynamics. Part 1: Operator theory" as a "quantum field theory".

Tuesday, December 26, 2006

Four preconditions for civilisation?

If you are wondering why there is no news on the ACEnetica front, it's because none of the Four preconditions for civilisation is currently being met.

Monday, August 28, 2006

Burden of proof

There is an article by Marcus du Sautoy entitled Burden of Proof in this week's New Scientist, which discusses the issue of whether a computer-generated mathematical proof can be considered to actually be a proof, especially since some such proofs are so lengthy that they cannot be checked by a human. An example of this type of proof is the four-colour map theorem.

My own view on this issue is that a computer-generated proof has exactly the same status as a human-generated proof. The difference is only in the degree of assistance provided to the human brain in generating the proof. A totally unassisted human would have to somehow do the whole proof mentally, which severely limits the length of proofs that are accessible. A human with the typical assistance that is allowed in an examination room (i.e. pen and paper) has the luxury of at least being able to write things down, which allows much longer proofs to be reliably generated. The mechanics of generating a proof then reduce to using well-defined rules to manipulate symbolic expressions, where pen and paper are used as a medium for representing these symbols, and the rules are implemented in the human brain.

The degree of assistance in generating a proof can be taken one stage further by using a computer to implement some or all of the rules for manipulating the symbolic expressions, rather than implementing all of the rules in the human brain. This seems to be a fairly radical step to take, because hitherto the only part of the proof that was "outside" the human brain was its "dumb" representation using pen and paper, whereas the "clever" bit, involving the implementation of rules to manipulate this representation, was "inside" the human brain.

Let us consider what these rules of manipulation actually are. Effectively, they define a procedure for taking an initial expression constructed out of symbols, and repeatedly operating on it using the rules to eventually generate the required final expression. The cleverness is in the construction of the set of rules, which is where a human is the best source of the cleverness needed to create the rules. There is no cleverness in the repeated application of these rules; all that is required is that their application is done reliably, which is where a computer is the best approach, especially if the proof has many steps.

Use a human to define the rules of manipulation, and use a computer to implement these rules. This approach seems to me to be entirely uncontroversial, and it is exactly how computer-generated proofs are done. Note that software bugs in the computer part of the proof are dealt with in an analogous way to "software" bugs in the human part of the proof, i.e. try a variety of approaches on a variety of platforms.
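As a toy illustration of this division of labour (my own example, not from the article), the following sketch proves 2 + 2 = 4 in Peano arithmetic: the human supplies two rewrite rules for addition, the computer applies them mechanically until no rule fires, and the sequence of rewrites constitutes the proof.

```python
# The human-supplied rules:
#   add(Z, y)    -> y
#   add(S(x), y) -> S(add(x, y))
# The computer's only job is to apply them reliably, step by step.

def reduce_once(term):
    """Apply the first applicable rule anywhere in the term."""
    if not isinstance(term, tuple):
        return term
    if term[0] == 'add':
        _, x, y = term
        if x == 'Z':
            return y
        if isinstance(x, tuple) and x[0] == 'S':
            return ('S', ('add', x[1], y))
    return (term[0],) + tuple(reduce_once(t) for t in term[1:])

def prove(term):
    """Rewrite until a fixed point; the list of steps is the proof."""
    steps = [term]
    while True:
        nxt = reduce_once(steps[-1])
        if nxt == steps[-1]:
            return steps
        steps.append(nxt)

two = ('S', ('S', 'Z'))
proof = prove(('add', two, two))
print(proof[-1])  # ('S', ('S', ('S', ('S', 'Z')))), i.e. 2 + 2 = 4
```

All of the cleverness lives in the two rules; the loop that applies them is entirely dumb, which is precisely why a computer is the right tool for it.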

Interestingly, I wrote a paper (see here)

Luttrell S P, 1999, Self-organised modular neural networks for encoding data, in Combining Artificial Neural Networks: Ensemble and Modular Multi-Net Systems (Perspectives in Neural Computing, Springer-Verlag, ed. Sharkey A J C), pp. 235-263

in which I used a computer (running the symbolic algebra program Mathematica) to help me derive some results on the optimisation of vector quantisers so that their Euclidean distortion was minimised; this optimisation was done non-parametrically in order to discover the globally optimal solution. The only way that I have found to do this type of optimisation involves intermediate expressions that can have a large number of terms, but which eventually lead to results that can be quite compact, and which are thus interesting. The only sensible way of handling this type of derivation is to automate it, using a computer to implement the rules of manipulation (i.e. using symbolic algebra). As far as I can tell, other people have not taken up my idea of using symbolic algebra to derive this type of result; my paper went down like a lead balloon, which is a great pity.
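To give a flavour of what an exact derivation looks like (a toy stand-in of my own, using exact rational arithmetic rather than Mathematica's full symbolic algebra), consider the globally optimal 2-level quantiser for a uniform source on [0, 1]. Setting the derivatives of the mean squared distortion to zero by hand gives linear stationarity conditions, which the computer can then evaluate exactly rather than numerically.

```python
from fractions import Fraction

# Stationarity conditions for code vectors y1, y2 and decision boundary b,
# derived by hand from dD/dy1 = dD/dy2 = dD/db = 0 for a uniform source:
#   y1 = b/2,  y2 = (b+1)/2,  b = (y1+y2)/2
# Substituting gives b = (b/2 + (b+1)/2)/2, i.e. b = 1/2 exactly.
b = Fraction(1, 2)
y1, y2 = b / 2, (b + 1) / 2

def seg(l, r, y):
    """Exact segment distortion: integral of (x-y)^2 over [l, r]."""
    return ((r - y)**3 - (l - y)**3) / 3

D = seg(Fraction(0), b, y1) + seg(b, Fraction(1), y2)
print(y1, y2, D)  # 1/4 3/4 1/48
```

The exact answer (distortion 1/48) is the kind of compact closed-form result that makes the large intermediate expressions worth automating.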

Update: OK, I can see that the Mathematica notebook in which I derive the results that I published in the above paper (see here) has now been downloaded 22 times (as of 31 December 2006). It would be interesting to hear your comments on it. Maybe you think it is just a load of algebraic twiddling which is not much use in the real world, but that conclusion would ignore the fact that exact algebraic solutions to toy problems are a powerful source of insight for interpreting approximate numerical solutions to real world problems.

Sunday, July 23, 2006


I will be on holiday in the south-west of England for the next 2 weeks, and I won't be able to access the internet, so any comments will remain unanswered until I return.

Friday, July 07, 2006

Modelling the probability density of Markov sources

I have just uploaded a paper to arXiv here.

Title: Modelling the probability density of Markov sources

Abstract: This paper introduces an objective function that seeks to minimise the average total number of bits required to encode the joint state of all of the layers of a Markov source. This type of encoder may be applied to the problem of optimising the bottom-up (recognition model) and top-down (generative model) connections in a multilayer neural network, and it unifies several previous results on the optimisation of multilayer neural networks.

I wrote this paper about a decade ago, and submitted it to Neural Computation on 2 February 1998, but in effect it was not accepted for publication because the referees asked for changes to be made to the paper that I thought (and still think) were unreasonable. Apart from minor reformatting changes, this arXiv version of the paper is identical to the version that I submitted to Neural Computation.

The paper contains material that is based on a preprint that I wrote at the Neural Networks and Machine Learning Scientific Programme at the Isaac Newton Institute in 1997. The Newton Institute preprint number is NI97039, and it is available online on this page of 1997 preprints, or can be accessed directly here.

The main idea in the paper is to optimise a multi-layer density-modelling network so that the joint probability of the state of all of its layers has desirable properties, rather than for the marginal probability of the state of only its input layer to have desirable properties. In effect, all of the layers of the network are put on an equal footing, rather than selecting the input layer to be treated differently from the rest of the layers. The approach to density modelling described in this paper turns out to unify various results that I had published in several of my earlier papers.

The objective function for optimising the joint probability of the state of all the layers of a network is almost the same as the one that is used to optimise a Helmholtz machine, but it omits the so-called "bits-back" term that is used by Helmholtz machines, so it penalises the use of distributed codes whilst encouraging the use of sparse codes in the hidden layers of the network. At the Neural Networks and Machine Learning Scientific Programme in 1997, I was told by Geoffrey Hinton (who is one of the originators of the Helmholtz machine) that they had looked at what happened if they omitted the bits-back term, and they had concluded that it didn't lead anywhere interesting! I found this remark quite amusing because much of my useful research output had derived from omitting the bits-back term. Also, with the decade of additional insight that I have accumulated since 1997, I can now see that joint (rather than marginal) probability optimisation is the key to obtaining useful results in multi-layer density-modelling networks. Watch this space, as they say!
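The effect of omitting the bits-back term can be seen in a small numeric example (my own illustration, with assumed toy probabilities, not taken from the paper): for a single input x with hidden state h in {0, 1}, the joint-coding objective E_q[−log p(x, h)] is minimised by a deterministic (sparse) recognition distribution, whereas the full Helmholtz objective, which subtracts the bits-back entropy H(q), is minimised by the exact posterior, a distributed code.

```python
import math

# Assumed toy generative joint probabilities p(x, h) for h in {0, 1}.
p_joint = [0.3, 0.2]

def J(q0):
    """Joint code length E_q[-log2 p(x,h)], with no bits-back credit."""
    q = [q0, 1 - q0]
    return sum(qi * -math.log2(pi) for qi, pi in zip(q, p_joint) if qi > 0)

def F(q0):
    """Helmholtz objective: J minus the bits-back entropy H(q)."""
    q = [q0, 1 - q0]
    H = sum(-qi * math.log2(qi) for qi in q if qi > 0)
    return J(q0) - H

qs = [i / 100 for i in range(101)]
print(min(qs, key=J))  # 1.0 : omitting bits-back prefers a deterministic code
print(min(qs, key=F))  # 0.6 : bits-back prefers the posterior p(h|x)
```

The deterministic optimum of J is the sense in which dropping bits-back penalises distributed codes and encourages sparse ones.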