(As mentioned before, current MIT students should not read this material.)
Consider a grid of cells arranged in a lattice, each of which is randomly (iid) “occupied” with probability $p$:
0 0 1 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 1 1
0 0 1 0 0 0 0 1 1 1
0 0 1 0 0 0 1 1 1 0
0 0 0 0 1 0 1 1 1 0
1 0 0 0 1 0 1 1 0 0
1 0 1 0 0 0 0 1 0 0
1 1 1 0 0 0 0 0 1 1
1 1 0 1 1 1 0 1 0 0
0 1 0 1 1 0 0 0 1 0
I have marked occupied sites with a 1, and unoccupied sites with a 0. We would like to know how the very simple behavior of each site controls the behavior of the grid as a whole. Certainly if we just count the number of occupied sites in the grid we will not find any surprises, as this does not involve the spatial geometry of the grid. However the behavior becomes much more interesting if we ask a different question: what happens if we group the occupied sites into contiguous clusters?
. . 1 . . . . . . .
5 . . . 2 . . . 5 5
. . 3 . . . . 5 5 5
. . 3 . . . 5 5 5 .
. . . . 4 . 5 5 5 .
6 . . . 4 . 5 5 . .
6 . 6 . . . . 5 . .
6 6 6 . . . . . 6 6
6 6 . 8 8 8 . 7 . .
. 6 . 8 8 . . . 9 .
Here, we have labeled each cluster of contiguous occupied sites with a unique integer (note that the grid is toroidal). Of course, each cluster has a random size and shape, but it is hard to see how cluster size is controlled by the individual site probability $p$. Is it improbable for 15/100 sites to be in a single cluster?
Note: all grids and diagrams shown here were made with the help of a Python program supplied to students which is capable of efficiently identifying clusters and producing relevant statistics about them.
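For concreteness, here is a minimal sketch of how clusters on a toroidal grid could be identified. This is not the supplied program, just a plain flood fill with periodic wraparound; all function and variable names are my own.

```python
import numpy as np
from collections import deque

def label_clusters(occupied):
    """Label contiguous clusters of occupied sites on a toroidal 2D grid.

    occupied: 2D boolean array. Returns an integer array where each occupied
    site carries its cluster's label (1, 2, ...) and empty sites are 0.
    Neighbors wrap around the edges (periodic boundaries)."""
    n_rows, n_cols = occupied.shape
    labels = np.zeros(occupied.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(occupied)):
        if labels[start]:
            continue  # already assigned to a cluster
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            # four nearest neighbors, wrapping around the torus
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = (r + dr) % n_rows, (c + dc) % n_cols
                if occupied[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels

# Example: a 10x10 grid occupied with probability p = 0.4
rng = np.random.default_rng(0)
grid = rng.random((10, 10)) < 0.4
labels = label_clusters(grid)
sizes = np.bincount(labels.ravel())[1:]  # cluster sizes, ignoring empty sites
```

The supplied program is presumably much faster on large grids (e.g. using a union–find structure), but a sketch like this is enough to reproduce the small labeled grids above.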
Let us bump up the grid size a little bit and focus on what happens to the largest cluster as we vary the probability $p$.
Here we have used the same random seed for each run, but only varied the threshold of what is considered an “occupied” site. At the lowest threshold shown, more than half the sites in the grid are occupied, but they do not form large clusters: the region of black cells in the bottom-center is the largest such cluster.
As we slowly increase the threshold $p$, the number of occupied sites also slowly increases, but there are so many closely packed clusters that the newly occupied sites inevitably start merging some of the existing clusters, so the size of the largest cluster grows quite fast; observe the large blocks of solid color above, representing large regions that join the largest cluster all at once. By the time $p$ reaches the largest value shown, one large cluster dominates the whole grid, containing most of the occupied sites.
If we repeat this experiment, we find that one massive cluster emerges every time at about the same probability. Indeed, if we move to an infinite lattice, it can be proven that there exists a critical threshold $p_c$ such that for $p > p_c$ there exists an infinitely large cluster with probability 1, while conversely for $p < p_c$ the probability of an infinite cluster is 0. (The behavior exactly at $p = p_c$ has only been proven for 2D and for high dimensions; for other dimensions it is conjectured.) This abrupt shift in behavior is an example of a phase transition.
For a fixed but large grid size $L$, let $P_1$ be the probability that a randomly selected occupied site is part of the largest cluster, and similarly $P_2$ for the second largest cluster. If $s_1, s_2$ are the sizes of the largest and second largest clusters, then (approximately)

$$P_1 \approx \frac{s_1}{p L^d}, \qquad P_2 \approx \frac{s_2}{p L^d},$$

where $d$ is the dimension of the grid. Let us estimate $P_1$ and $P_2$ by simulating a number of random grids, and graph how they depend on $p$ for fixed $L$ and $d$. When computing the $P_i$, it is important to perform many simulations and take an average of the results to reduce noise in the graphs; note that the smaller the choice of $L$, the more simulations will be needed.
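As an illustration of the kind of loop involved, here is a hedged sketch that estimates $P_1$ and $P_2$ over a range of $p$ by averaging many random grids. It reuses the `label_clusters` sketch from above, and the grid size and trial count are small placeholders rather than the values used for the actual graphs.

```python
import numpy as np

def largest_two_fractions(L, p, rng):
    """Simulate one L x L toroidal grid and return estimates of (P1, P2):
    the fraction of occupied sites in the largest and second largest clusters."""
    grid = rng.random((L, L)) < p
    n_occupied = grid.sum()
    if n_occupied == 0:
        return 0.0, 0.0
    sizes = np.sort(np.bincount(label_clusters(grid).ravel())[1:])[::-1]
    s1 = sizes[0] if len(sizes) > 0 else 0
    s2 = sizes[1] if len(sizes) > 1 else 0
    return s1 / n_occupied, s2 / n_occupied

rng = np.random.default_rng(1)
L, n_trials = 128, 50                    # placeholder values, far smaller than the real runs
p_values = np.linspace(0.45, 0.75, 61)
P1 = np.empty_like(p_values)
P2 = np.empty_like(p_values)
for i, p in enumerate(p_values):
    results = [largest_two_fractions(L, p, rng) for _ in range(n_trials)]
    P1[i], P2[i] = np.mean(results, axis=0)  # average over trials to reduce noise
```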
Because $P_1$ and $P_2$ are mostly near either 0 or 1, especially for large $L$, the most appropriate way to present the data is with the $P_i$ on a log scale.
(I have performed a greater number of simulations for values of $p$ nearer the critical point, and smoothed the graphs, as will be discussed shortly. I was pleased to discover when redoing these graphs a year later that I had saved the simulation results and didn’t have to re-run them: across all graphs I used in total about 2.5 million simulations, for each of which I recorded the 10 largest cluster sizes.)
Observe that when $p$ is small, $P_1$ and $P_2$ are quite close in value: the largest cluster is not especially distinguished, but simply one of many. In two dimensions, take a value of $p$ well below the critical point on a large grid, for which $P_1$ is tiny: there are millions of occupied sites in the grid, yet the largest cluster is only about 100 sites in size. With the grid containing a few million clusters, it is not a surprise that the second largest is close behind in size.
As $p$ increases, the largest and second largest clusters both tend to grow in size, and so do $P_1$ and $P_2$. (The drop in the $P_i$ that occurs for very small $p$ is because even as the number of clusters is increasing, the largest cluster remains only a few sites in size.) But as we get close to $p_c$ there is a sudden shift in behavior: the largest clusters begin to approach the size of the whole grid, and any two big clusters are likely to meet. All of the significant clusters get absorbed into a single massive cluster, and the larger a cluster is, the more likely it is to be absorbed. Thus as $p$ increases further, $P_2$ drops quickly.
How can we use this information to estimate the critical point $p_c$? There are several different approaches we can use here. When $p > p_c$, $P_1$ tends to a positive limit as $L \to \infty$, and conversely when $p < p_c$, $P_1 \to 0$ as $L \to \infty$, so the critical point can be identified by anything sensitive to this transition; for example, we can look for the value of $p$ where $P_1$ crosses some fixed threshold, or where the derivative $dP_1/dp$ is maximized, either of which will converge to $p_c$ in the limit of large $L$ (though not necessarily at the same rate). (On further thought I think this is an error: $P_1$ might be continuous in $p$ even at infinite $L$, so the former method would actually give a value very slightly larger than $p_c$. I think the latter method still works, though.) I prefer a different method, which is to identify the value of $p$ that maximizes $P_2$, which also converges to $p_c$.
Now certainly we can just eyeball the peak of $P_2$ in the figure, or use the value of $p$ from whichever run had the highest $P_2$; as we used a large enough value of $L$, this will give a decent result. However, picking the run with the largest $P_2$ is not robust against random noise, and we can improve on our results by choosing a numerical method that is more robust to noise in the data. Probably the best way to deal with the noise is to fit the graph of $P_1$ or $P_2$ to a suitable functional form near the point of interest; this can be somewhat tricky, especially since (for example) the peak of $P_2$ is asymmetrical. A much simpler method is to smooth the graph before searching for the peak; then any spikes due to a single data point should be softened sufficiently that they will no longer be the highest point. (Though note that the maximum of $P_2$ will be lowered by the smoothing process, and as the peak is asymmetric, its location will be shifted slightly to a lower value of $p$.)
In my approach, I used Gaussian smoothing (note that this is one of the many applications of the diffusion equation we worked with last time) with the function

$$K(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2 / 2\sigma^2},$$

where $\sigma$ is the smoothing length-scale. The larger the choice of $\sigma$, the more smoothing is applied; the goal is to use a $\sigma$ large enough to smooth out any peaks due to noise, but not so large as to actually smooth out the peak we are interested in. A function used to perform smoothing is called a kernel.
If we have a continuous function $f(x)$, smoothing it with a kernel $K$ gives us

$$\tilde{f}(x) = \int K(x - x')\, f(x')\, dx'$$

(with $K$ normalized so that $\int K = 1$), which is to say we use $K$ to perform a weighted average of the $f(x')$ for $x'$ near $x$. If we have a discrete series of samples $f_i = f(x_i)$, as is more likely in practice, we use instead

$$\tilde{f}(x) = \frac{\sum_i K(x - x_i)\, f_i}{\sum_i K(x - x_i)}.$$

Note that it is not necessary that the $x_i$ be uniformly spaced or that $x$ be one of the $x_i$.
Different choices of kernel have different merits, and choosing an appropriate kernel is hard (and frequently arbitrary). A Gaussian kernel is the most common choice. As another example, using $K(x) = 1$ for $|x| < w$ and $K(x) = 0$ for $|x| \ge w$ gives us a smoothing where $\tilde{f}(x)$ is just the average of the $f_i$ with $|x_i - x| < w$; this is also referred to as a moving-window average.
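Here is a minimal sketch of this discrete weighted-average smoothing, written to allow non-uniformly spaced sample points. The function name is my own; for uniformly spaced data an off-the-shelf routine such as `scipy.ndimage.gaussian_filter1d` serves much the same purpose.

```python
import numpy as np

def gaussian_smooth(x, y, x_eval, sigma):
    """Smooth the samples (x, y) with a Gaussian kernel of width sigma,
    evaluated at the points x_eval. Neither x nor x_eval needs to be
    uniformly spaced; each output value is a kernel-weighted average of y."""
    x, y, x_eval = map(np.asarray, (x, y, x_eval))
    # weights[i, j] = K(x_eval[i] - x[j]); overall normalization cancels in the ratio
    weights = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / sigma) ** 2)
    return (weights @ y) / weights.sum(axis=1)

# Example: smooth noisy values on a grid of p values
# (synthetic stand-in data, not actual simulation results)
p = np.linspace(0.45, 0.75, 61)
P2_noisy = np.exp(-((p - 0.59) / 0.03) ** 2) \
    + 0.02 * np.random.default_rng(2).normal(size=p.size)
P2_smooth = gaussian_smooth(p, P2_noisy, p, sigma=0.01)
```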
Having smoothed the data, let us zoom in near the peak of $P_2$:
We used a different smoothing length-scale $\sigma$ in the inset (where I have more data points) than in the full graphs shown previously. The slope was estimated by computing

$$\frac{dP_1}{dp} \approx \frac{P_1(p + \Delta p) - P_1(p)}{\Delta p},$$

where $\Delta p$ is the distance between consecutive data points. (Estimating a derivative based on just two data points should generally only be done with smoothed data. Note that since smoothing and taking the derivative are both linear operations, it doesn’t matter which order we do them in; taking the derivative of smoothed data gives the same result as smoothing the derivative of unsmoothed data.)
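Assuming arrays of smoothed values on a uniform grid of $p$, the two estimators described above might be computed along these lines (again a sketch with my own names, not the exact code I used):

```python
import numpy as np

def estimate_pc(p, P1_smooth, P2_smooth):
    """Return two estimates of the critical point from smoothed data:
    the p that maximizes P2, and the p at which the two-point slope of P1
    is largest. Assumes p is sorted and uniformly spaced."""
    pc_from_P2 = p[np.argmax(P2_smooth)]
    dp = p[1] - p[0]                         # spacing between consecutive data points
    slope = np.diff(P1_smooth) / dp          # two-point derivative estimate
    i = np.argmax(slope)
    pc_from_slope = 0.5 * (p[i] + p[i + 1])  # midpoint of the steepest interval
    return pc_from_P2, pc_from_slope
```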
We get a very good estimate of $p_c$:
|  | 2D | 3D |
| --- | --- | --- |
| From the peak of $P_2$ | 0.59188 | 0.31201 |
| From the maximum of $dP_1/dp$ | 0.59158 | 0.31068 |
| True | 0.592746 | 0.3116 |
One alternative method would be to consider the value of $p$ at which the largest cluster first spans the width of the whole domain.
Can we estimate the error in our method without reference to an authoritative value? As our calculation involves random simulations, we may repeat it and get a different answer; by repeating it many times we can develop a probability distribution, and thus estimate the standard deviation of sampling from this distribution. (Say we repeat the calculation 100 times and estimate that the standard deviation of each calculation is 0.002, so we might guess that the error in our answer is about 0.002. However, having done 100 times as many simulations, we can now improve on our answer by taking its mean; with 100 times as many simulations it will have a much lower error. But now how will we know what this new method’s error is? Will we have to repeat this method 100 times, doing 10,000 times as much work as the original calculation? No, because when averaging samples from a random distribution the standard deviation of the mean scales with the inverse square root of the number of samples, so we know the end result will have an error that scales like $1/\sqrt{N}$ – here about $0.002/\sqrt{100} = 0.0002$.)
However, that is not the only source of error, nor even the main one: the dominant error comes from using a finite $L$, whereas this method only converges to the true value of $p_c$ in the limit $L \to \infty$. We can reduce the error by using larger values of $L$, but eventually we must stop at some largest size we are able to simulate. So how can we improve without using a larger $L$? By repeating this calculation with multiple smaller values of $L$, we can see how the value we compute varies with $L$. Extrapolating these values numerically lets us estimate the limit at much larger values of $L$ than we are capable of actually simulating. This general technique is called series acceleration, with many approaches such as Aitken’s method.
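As a sketch of what such an extrapolation looks like, here is Aitken’s $\Delta^2$ method applied to a hypothetical sequence of finite-$L$ estimates (the input numbers are placeholders, not my actual results):

```python
import numpy as np

def aitken(seq):
    """Apply one pass of Aitken's delta-squared acceleration to a convergent
    sequence, returning a (shorter) sequence that typically approaches the
    same limit faster. Assumes no three consecutive terms are in exact
    arithmetic progression (which would make the denominator zero)."""
    x = np.asarray(seq, dtype=float)
    x0, x1, x2 = x[:-2], x[1:-1], x[2:]
    return x2 - (x2 - x1) ** 2 / (x2 - 2.0 * x1 + x0)

# Example: hypothetical p_c estimates from a sequence of increasing grid sizes L
estimates = [0.585, 0.589, 0.5910, 0.5920]  # placeholder numbers, not real results
print(aitken(estimates))
```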
Indeed, this was the whole point of the previous problem on random walks; even in 1D it was infeasible to directly estimate the probability of a random walker never returning to the origin (the average recurrence time is infinite!), so instead we estimated the probability of not returning within various finite times and extrapolated to infinity.