Particle Swarm Optimisation was inspired by ‘swarm intelligence’ found in nature. Imagine that the space of allowable values of x is a large field and the function we are trying to minimise, f(x), is the height above sea level of each point in the field. We want to find the lowest point in the field, so we try to replicate how a swarm of animals would do it — i.e. by having many entities moving independently and sharing information. For this analogy to work you have to imagine the animals can’t see the height of the ground around them; let’s pretend it’s very foggy!
To start, we choose the number of particles to use and distribute them randomly in the field. We then let the particles move around the space, with two randomly fluctuating forces pulling on them and affecting the directions they move in. One of the forces acting on the particles is an attraction towards the best overall point found so far, which is referred to as the global best. This means that over time the particles will end up clustered around the global best. But if this happens too quickly then the result tends not to be very good because not much of the landscape has been explored. To avoid this, each particle is also attracted to the best point it has found so far, called the particle best. Finally, each particle has some inertia, meaning that it will tend to carry on in the direction it was travelling before. For example, it might be pulled towards the global best, overshoot it and end up finding a better point just beyond it.
As well as choosing how many particles to use and how long to run the optimisation for, you also have to choose the parameters which control how strong the inertia is (w) and how strongly each particle is pulled towards the global best and the particle best (ϕg and ϕp respectively). There is also an element of randomness: each time you calculate the new velocity for a particle you multiply the attraction to the global best by a random number rg and you multiply the attraction to the particle best by another random number rp, both uniformly distributed between 0 and 1. If x and v are the particle’s location and velocity from the previous time step, ug is the vector from the particle’s current location to the global best and up is the vector from the particle’s current location to the particle best, then the particle’s new position x′ and new velocity v′ are given by:
v′ = wv + rgϕgug + rpϕpup
x′ = x + v′
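To make the update concrete, here is a minimal sketch in Python with NumPy of the whole loop using this first form of the velocity update. The objective function, bounds, swarm size and the values of w, ϕg and ϕp below are placeholders chosen purely for illustration, not recommended settings.

```python
import numpy as np

def pso_scalar_r(f, bounds, n_particles=30, n_steps=200,
                 w=0.7, phi_p=1.5, phi_g=1.5, seed=0):
    """Sketch of PSO using one random scalar per attraction (rg, rp) per particle per step."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds  # arrays of lower/upper limits, one entry per dimension
    dim = len(lo)

    # Scatter the particles randomly over the field and give them random velocities.
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = rng.uniform(-(hi - lo), hi - lo, size=(n_particles, dim))

    # Each particle remembers the best point it has found (particle best);
    # the swarm shares the best point anyone has found (global best).
    p_best = x.copy()
    p_best_val = np.array([f(p) for p in x])
    g_best = p_best[p_best_val.argmin()].copy()
    g_best_val = p_best_val.min()

    for _ in range(n_steps):
        for i in range(n_particles):
            rg = rng.uniform()          # one scalar random number per attraction
            rp = rng.uniform()
            ug = g_best - x[i]          # vector towards the global best
            up = p_best[i] - x[i]       # vector towards this particle's best
            v[i] = w * v[i] + rg * phi_g * ug + rp * phi_p * up
            x[i] = x[i] + v[i]

            val = f(x[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = x[i].copy(), val
                if val < g_best_val:
                    g_best, g_best_val = x[i].copy(), val

    return g_best, g_best_val

# Example: minimise a simple bowl-shaped function over the square [-5, 5] x [-5, 5].
best_x, best_val = pso_scalar_r(lambda p: np.sum(p**2),
                                (np.array([-5.0, -5.0]), np.array([5.0, 5.0])))
```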
Or are they? These were the formulas given in the Wikipedia article, in the original paper on Particle Swarm Optimisation, and in many other places. There is a slightly different equation for v′, however, which is widely used as well, for example by the Numerical Algorithms Group. Instead of multiplying every entry of the vector ϕgug by the same random number rg, you multiply each component of it by a different random number. So rg is a vector of independent random numbers uniformly distributed between 0 and 1, and you combine it with ϕgug using component-wise multiplication, which we will write as ⊙. The same thing is done with rp, so the equation is now:
v′ = wv + ϕgrg ⊙ ug + ϕprp ⊙ up
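In the sketch above, the only change needed for this second form is inside the inner loop: draw a whole vector of random numbers for each attraction and multiply component-wise (which is what NumPy's * already does on arrays of the same shape). Again, this is just an illustration of the notational difference, not a reference implementation.

```python
            # Component-wise variant: one random number per dimension, per attraction.
            rg = rng.uniform(size=dim)   # vector of independent U(0, 1) draws
            rp = rng.uniform(size=dim)
            v[i] = w * v[i] + phi_g * rg * ug + phi_p * rp * up
```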
What I find most confusing is that no one ever mentions there are two different equations for v′ in use. It’s almost as if nobody is aware that there are two different algorithms, and since the difference boils down to a small amount of notation in one equation this actually seems plausible. The master’s thesis of Daniel A Wilke from the University of Pretoria confirms that I am not the only one aware of this. He found that the first algorithm, which he calls PSOAF1, is prone to stagnation: the particles end up confined to a small part of the landscape, so the algorithm gets worse results. The second algorithm (PSOAF2) is much more robust to this. This matched my findings. Perhaps most interestingly, much of the literature on Particle Swarm Optimisation involves adjusting the algorithm in various ways to stop this stagnation, which Daniel found was unnecessary with PSOAF2.