2024 February 15

(As mentioned before, current MIT students should not read this material.)

- Exercises on random walks
- Percolation theory and finding the critical point
- Percolation theory and fractal dimension
- The coastline paradox
- Ant in a labyrinth: random walk with invasion percolation

Consider a grid of cells arranged in a lattice, each of which is randomly (iid) “occupied” with probability $p$:

```
0 0 1 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 1 1
0 0 1 0 0 0 0 1 1 1
0 0 1 0 0 0 1 1 1 0
0 0 0 0 1 0 1 1 1 0
1 0 0 0 1 0 1 1 0 0
1 0 1 0 0 0 0 1 0 0
1 1 1 0 0 0 0 0 1 1
1 1 0 1 1 1 0 1 0 0
0 1 0 1 1 0 0 0 1 0
```

I have marked occupied sites with a 1, and unoccupied sites with a 0.
We would like to know how the very simple behavior of each site controls
the behavior of the grid as a whole. Certainly if we just *count*
the number of occupied sites in the grid we will not find any surprises,
as this does not involve the spatial geometry of the grid. However the
behavior becomes much more interesting if we ask a different question:
what happens if we group the occupied sites into contiguous
*clusters*?

```
. . 1 . . . . . . .
5 . . . 2 . . . 5 5
. . 3 . . . . 5 5 5
. . 3 . . . 5 5 5 .
. . . . 4 . 5 5 5 .
6 . . . 4 . 5 5 . .
6 . 6 . . . . 5 . .
6 6 6 . . . . . 6 6
6 6 . 8 8 8 . 7 . .
. 6 . 8 8 . . . 9 .
```

Here, we have labeled each cluster of contiguous occupied sites with a unique integer (note that the grid is toroidal). Of course, each cluster has a random size and shape, but it is hard to see how cluster size is controlled by the individual site probability $p$. Is it improbable for 15 / 100 sites to be in a single cluster?

*Note: all grids and diagrams shown here were made with the help
of a python
program supplied to students which is capable of efficiently
identifying clusters and producing relevant statistics about
them.*
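That program is not reproduced here, but a minimal sketch of the core step, labeling clusters by flood fill with the toroidal wraparound used above, might look like:

```python
from collections import deque

def cluster_sizes(grid):
    """Return sizes of all clusters of contiguous occupied sites, largest first.

    Sites are 4-connected and the grid wraps around at the edges
    (toroidal boundary conditions), matching the example above.
    """
    n, m = len(grid), len(grid[0])
    seen = [[False] * m for _ in range(n)]
    sizes = []
    for i in range(n):
        for j in range(m):
            if grid[i][j] and not seen[i][j]:
                # Flood-fill one cluster with breadth-first search.
                size = 0
                queue = deque([(i, j)])
                seen[i][j] = True
                while queue:
                    r, c = queue.popleft()
                    size += 1
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = (r + dr) % n, (c + dc) % m  # wraparound
                        if grid[rr][cc] and not seen[rr][cc]:
                            seen[rr][cc] = True
                            queue.append((rr, cc))
                sizes.append(size)
    return sorted(sizes, reverse=True)

# The 10x10 example grid from above.
grid = [
    [0,0,1,0,0,0,0,0,0,0],
    [1,0,0,0,1,0,0,0,1,1],
    [0,0,1,0,0,0,0,1,1,1],
    [0,0,1,0,0,0,1,1,1,0],
    [0,0,0,0,1,0,1,1,1,0],
    [1,0,0,0,1,0,1,1,0,0],
    [1,0,1,0,0,0,0,1,0,0],
    [1,1,1,0,0,0,0,0,1,1],
    [1,1,0,1,1,1,0,1,0,0],
    [0,1,0,1,1,0,0,0,1,0],
]
print(cluster_sizes(grid))  # → [15, 11, 5, 2, 2, 1, 1, 1, 1]
```

Running it on the example grid recovers the clusters labeled above, with the 15-site cluster the largest.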

Let us bump up the grid size a little bit and focus on what happens to the largest cluster as we vary the probability $p$.

Here we have used the same random seed for each run, but only varied the threshold $p$ of what is considered an “occupied” site. When we use $p$ just above $0.5$, more than half the sites in the grid are occupied, but they do not form large clusters: the region of black cells in the bottom-center is the largest such cluster.

As we slowly increase the threshold $p$, the number of occupied sites also slowly increases, but there are so many closely packed clusters that the newly occupied sites inevitably start merging some of the existing clusters, so the size of the largest cluster grows quite fast; observe the large blocks of solid color above, representing large regions that join the largest cluster all at once. By the time $p$ approaches $0.6$, one large cluster dominates the whole grid, containing most occupied sites.

If we repeat this experiment, we find that one massive cluster
emerges every time at about the same probability. Indeed if we move to
an *infinite* lattice, it can be proven that there exists a
critical threshold $p_c$ such that when $p > p_c$ there exists an infinitely large cluster
with probability 1, while conversely for $p < p_c$ the probability of an infinite cluster
is 0.^{1}The behavior at $p = p_c$ has only been proven for 2D and $d \ge 11$; for other dimensions it is
conjectured. This abrupt shift in behavior is an example
of a phase transition.

For a fixed but large grid size $N$, let $P_1$ be the probability that a randomly selected
occupied site is part of the largest cluster, and similarly $P_2$ for the *second* largest cluster. If
$S_1, S_2$ are the sizes of the largest and second
largest clusters, then (approximately)

$$P_i \approx \frac{S_i}{p N^d}$$

where $d$ is the dimension of the grid. Let us estimate the $P_i$ by simulating a number of random grids, and graph how they depend on $p$ for a fixed $N$ and $d$. When computing the $P_i$, it is important to perform many simulations and take an average of the results to reduce noise in the graphs; note that the smaller the choice of $N$, the more simulations will be needed.

Because $P_1$ and $P_2$ are mostly near either $0$ or $1$, especially for large $N$, the most appropriate way to present the data is with the $P_i$ on a log scale.
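As a sketch of how the $P_i$ can be estimated (with toy values of $N$, $p$, and trial counts chosen purely for illustration, and a plain-Python cluster finder standing in for the supplied program):

```python
import random
from collections import deque

def largest_two(grid):
    """Sizes of the largest and second largest clusters (4-connected, toroidal)."""
    n, m = len(grid), len(grid[0])
    seen = [[False] * m for _ in range(n)]
    sizes = [0, 0]  # padding so there are always at least two entries
    for i in range(n):
        for j in range(m):
            if grid[i][j] and not seen[i][j]:
                size, queue = 0, deque([(i, j)])
                seen[i][j] = True
                while queue:
                    r, c = queue.popleft()
                    size += 1
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = (r + dr) % n, (c + dc) % m
                        if grid[rr][cc] and not seen[rr][cc]:
                            seen[rr][cc] = True
                            queue.append((rr, cc))
                sizes.append(size)
    sizes.sort(reverse=True)
    return sizes[0], sizes[1]

def estimate_P(p, N, trials, rng):
    """Monte Carlo estimate of P_1 and P_2 via P_i ~ S_i / (p N^2)."""
    tot1 = tot2 = 0.0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(N)] for _ in range(N)]
        s1, s2 = largest_two(grid)
        tot1 += s1 / (p * N * N)
        tot2 += s2 / (p * N * N)
    return tot1 / trials, tot2 / trials

rng = random.Random(0)
P1_low, P2_low = estimate_P(0.3, 32, 20, rng)    # well below the critical point
P1_high, P2_high = estimate_P(0.8, 32, 20, rng)  # well above it
print(P1_low, P1_high)  # small below the transition, near 1 above it
```

Even at this small $N$ the qualitative behavior is visible: far below the transition the largest cluster holds only a few percent of occupied sites, while far above it one cluster holds nearly all of them.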

(I have performed a greater number of simulations for values of $p$ nearer the critical point^{2}I was
pleased to discover when redoing these graphs a year later that I had
saved the simulation results and didn’t have to re-run them. Across all
graphs I used in total about 2.5 million simulations, for each of which
I recorded the 10 largest cluster sizes., and smoothed the
graphs, as will be discussed shortly.)

Observe that when $p \ll p_c$, $P_1$ and $P_2$ are quite close in value: the largest cluster is not especially distinguished, but simply one of many. In two dimensions, for instance, at small $p$ there are $p N^2$ occupied sites in the grid, but the largest cluster is only about 100 in size. With the grid having a few million clusters, it is not a surprise the second largest is close behind in size.

As $p$ increases, the largest and second largest clusters both tend to grow in size, and so do $P_1$ and $P_2$. (The drop in the $P_i$ that occurs for very small $p$ is because even as the number of clusters is increasing, the largest cluster remains only a few sites in size.) But as we get close to $p_c$ there is a sudden shift in behavior; the largest clusters begin to approach the size of the whole grid, and any two big clusters are likely to meet. All of the significant clusters get absorbed into a single massive cluster, and the larger a cluster is the more likely it is to be absorbed. Thus as $p$ increases further, $P_2$ drops quickly.

How can we use this information to estimate the critical point $p_c$? There are several different approaches we can
use here. When $p < p_c$, then $P_1 \to 0$ in the limit $N \to \infty$, and conversely when $p > p_c$, $P_1$ remains bounded away from $0$ as $N \to \infty$, so the critical point can be
identified by anything sensitive to this transition; for example, we can
look for the value of $p$ where $P_1 = \frac{1}{2}$ or where the derivative $dP_1/dp$ is maximized, either of which will
converge to $p_c$ in the limit of large $N$ (though not necessarily at the same rate).^{3}On
further thought I think this is an error, and $P_1$ might be continuous in $p$ even at infinite $N$, so the former method would actually give a value
very slightly larger than $p_c$. I think the latter method still works,
though. I prefer a different method, which is to identify
the value of $p$ that maximizes $P_2$, which also converges to $p_c$.
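A small-scale sketch of this preferred method (with a toy grid size, trial count, and scan range chosen for illustration, far smaller than the actual simulations): scan $p$ and take the value that maximizes the averaged $P_2$.

```python
import random
from collections import deque

def largest_two(grid):
    """Sizes of the two largest clusters (4-connected, toroidal)."""
    n, m = len(grid), len(grid[0])
    seen = [[False] * m for _ in range(n)]
    sizes = [0, 0]
    for i in range(n):
        for j in range(m):
            if grid[i][j] and not seen[i][j]:
                size, queue = 0, deque([(i, j)])
                seen[i][j] = True
                while queue:
                    r, c = queue.popleft()
                    size += 1
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = (r + dr) % n, (c + dc) % m
                        if grid[rr][cc] and not seen[rr][cc]:
                            seen[rr][cc] = True
                            queue.append((rr, cc))
                sizes.append(size)
    sizes.sort(reverse=True)
    return sizes[0], sizes[1]

def avg_P2(p, N, trials, rng):
    """Average fraction of occupied sites in the second largest cluster."""
    total = 0.0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(N)] for _ in range(N)]
        total += largest_two(grid)[1] / (p * N * N)
    return total / trials

rng = random.Random(42)
ps = [0.40 + 0.025 * k for k in range(15)]  # scan p = 0.40 .. 0.75
curve = [avg_P2(p, N=24, trials=40, rng=rng) for p in ps]
p_hat = ps[curve.index(max(curve))]
print(p_hat)  # a rough, finite-size estimate of the critical point
```

At such a small $N$ the peak is broad and noisy, which is exactly why the real calculation needs many more trials plus the smoothing discussed below.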

Now certainly we can just eyeball the peak of $P_2$ in the figure, or use the value of $p$ from whichever run had the highest $P_2$, as we used a large enough value of $N$ that this will give a decent result. However
picking the run with the largest $P_2$ is not robust against random noise, and we can
improve on our results by choosing a numerical method that is more
robust to noise in the data. Probably the *best* way to deal with
the noise is to fit the graph of $P_1$ or $P_2$ to a suitable functional form near the point of
interest; this can be somewhat tricky, especially since (for example)
the peak of $P_2$ is asymmetrical. A much simpler method is to
smooth the graph before searching for the peak; then any spikes due to a
single data point should be softened sufficiently that they will not be
the highest point anymore.^{4}Though
note that the maximum of $P_2$ will be lowered by the smoothing process, and
as the peak is asymmetric the location of the peak will be shifted
slightly to a lower value of $p$.

In my approach, I used *Gaussian
smoothing*^{5}Note
that this is one of the many applications of the diffusion equation we
worked with last time with the function

$$K(x) = \frac{1}{\sqrt{2\pi}\,\ell}\, e^{-x^2 / 2\ell^2}$$

where $\ell$ is the smoothing length-scale. The larger the
choice of $\ell$, the more smoothing is applied; the goal is
to use $\ell$ large enough to smooth out any peaks due to
noise, but not so large as to actually smooth out the peak we are
interested in. A function used to perform smoothing is called a *kernel*.

If we have a continuous function $f(x)$, smoothing it with a kernel $K$ gives us

$$\tilde{f}(x) = \int K(x - x')\, f(x')\, dx'$$

which is to say we use $K$ to perform a weighted average of the $f(x')$ for $x'$ near $x$. If we have a discrete series $(x_i, y_i)$, as is more likely in practice, we use instead

$$\tilde{f}(x) = \frac{\sum_i K(x - x_i)\, y_i}{\sum_i K(x - x_i)}$$

Note that it is not necessary that the $x_i$ are uniformly spaced or that $x$ is one of the $x_i$.

Different choices of kernel have different merits, and choosing an appropriate kernel is hard (and frequently arbitrary). A Gaussian kernel is the most common choice. As another example, using $K(x) = 1$ for $|x| \le \ell$ and $K(x) = 0$ for $|x| > \ell$ gives us a smoothing where $\tilde{f}(x)$ is just the average of the $y_i$ for $|x_i - x| \le \ell$; this is also referred to as a moving-window average.
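The discrete formula translates almost directly into code; here is a sketch with a Gaussian kernel and made-up noisy test data (a sine curve plus noise, purely for illustration):

```python
import math
import random

def gaussian_smooth(xs, ys, x, ell):
    """Kernel-smoothed value at x: a K-weighted average of the y_i.

    Implements f~(x) = sum_i K(x - x_i) y_i / sum_i K(x - x_i) with a
    Gaussian kernel of length-scale ell.  The x_i need not be uniformly
    spaced, and x need not be one of them.
    """
    weights = [math.exp(-(x - xi) ** 2 / (2 * ell ** 2)) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Noisy samples of a smooth underlying curve.
rng = random.Random(1)
xs = [i / 100 for i in range(101)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]

# Smoothing pulls the values back toward the underlying curve.
smoothed = [gaussian_smooth(xs, ys, x, ell=0.03) for x in xs]
```

Note the normalization by the sum of the weights, which is what makes non-uniform spacing and edge points work without special cases.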

Having smoothed the data, let us zoom in near the peak of $P_2$:

We used a smaller smoothing length scale $\ell$ in the inset (where I have more data points) than in the full graphs shown previously. The slope was estimated by computing

$$\frac{dP_1}{dp} \approx \frac{P_1(p + \Delta p) - P_1(p)}{\Delta p}$$

where $\Delta p$ is the distance between consecutive data
points. (Estimating a derivative based on just two data points should
generally only be done with smoothed data.^{6}Note
that since smoothing and taking the derivative are both linear, it
doesn’t matter which order we do them in; taking the derivative of
smoothed data gives the same result as smoothing the derivative of
unsmoothed data.)
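For uniformly spaced data, the smoothing sum reduces to a convolution, and the footnote's commutativity claim is easy to verify numerically (with made-up data):

```python
import math

# A normalized Gaussian kernel on a uniform grid (length-scale = 2 samples).
kernel = [math.exp(-(k ** 2) / (2 * 2.0 ** 2)) for k in range(-6, 7)]
total = sum(kernel)
kernel = [w / total for w in kernel]

def convolve_valid(ys, k):
    """'Valid'-mode convolution: only points where the kernel fully overlaps.

    (For a symmetric kernel, correlation and convolution coincide.)
    """
    m = len(k)
    return [sum(k[j] * ys[i + j] for j in range(m))
            for i in range(len(ys) - m + 1)]

def diff(ys):
    """Forward differences between consecutive points."""
    return [b - a for a, b in zip(ys, ys[1:])]

ys = [math.sin(0.1 * i) + 0.05 * (-1) ** i for i in range(50)]  # wiggly data

a = diff(convolve_valid(ys, kernel))  # smooth, then differentiate
b = convolve_valid(diff(ys), kernel)  # differentiate, then smooth
# a and b agree to floating-point precision.
```

Both orders produce the same weighted sums of differences, so they agree exactly apart from rounding.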

We get a very good estimate of $p_c$:

| | 2D | 3D |
|---|---|---|
| Peak of $P_2$ | 0.59188 | 0.31201 |
| Max of $dP_1/dp$ | 0.59158 | 0.31068 |
| True | 0.592746 | 0.3116 |

One alternative method would be to consider the value of $p$ at which the largest cluster first spans the width of the whole domain.
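This spanning test is straightforward to implement; here is a sketch using open (non-wrapping) boundaries, since on a torus "spanning the width" is not otherwise well defined:

```python
from collections import deque

def spans_width(grid):
    """True if some cluster of occupied sites touches both the left and
    right edges of the grid (non-wrapping boundaries, 4-connected)."""
    n, m = len(grid), len(grid[0])
    seen = [[False] * m for _ in range(n)]
    for i in range(n):
        if grid[i][0] and not seen[i][0]:
            # Flood-fill the cluster containing this left-edge site.
            queue = deque([(i, 0)])
            seen[i][0] = True
            touches_right = False
            while queue:
                r, c = queue.popleft()
                if c == m - 1:
                    touches_right = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < m and grid[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if touches_right:
                return True
    return False

print(spans_width([[1, 1, 1], [0, 0, 0], [0, 1, 0]]))  # → True
print(spans_width([[1, 0, 1], [1, 0, 1], [1, 0, 1]]))  # → False
```

Only clusters touching the left edge need to be explored, since any spanning cluster must contain a left-edge site.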

Can we estimate the error in our method without reference to an
authoritative value? As our calculation involves random simulations, we
may repeat it and get a different answer; by repeating it many times we
can develop a probability distribution, and thus estimate the standard
deviation of sampling from this distribution.^{7}Say we
repeat the calculation 100 times and estimate the standard deviation of
each calculation is 0.002, so we might guess that the error in our
answer is about 0.002. However, having done 100 times as many
simulations we can now improve on our answer by taking its mean; with
100 times as many simulations it will have a much lower error. But, now
how will we know what this new method’s error is? Will we have to repeat
*this* method 100 times, doing 10000 times as much as the
original calculation? No, because when averaging samples from a random
distribution the standard deviation scales with the square root of the
number of samples, so we know the end result will have an error that
scales like $0.002/\sqrt{100} = 0.0002$.

However that is not the only error, or even the main source of error,
which is that we used a finite $N$, whereas this method only converges to the true
value of $p_c$ in the limit $N \to \infty$. We can reduce the error by using
larger values of $N$, but eventually we must stop at some largest
size we are able to simulate. So how can we improve
without using larger $N$? By repeating this calculation with multiple
*smaller* values of $N$, we can see how the value we compute varies with
$N$. Extrapolating these values numerically lets us
estimate the limit at much larger values of $N$ than we are capable of actually simulating. This
general technique is called series
acceleration, with many approaches such as Aitken's $\Delta^2$
method.
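As a toy illustration (with a fabricated sequence of finite-size estimates, not my actual data), Aitken's $\Delta^2$ method can strip off a leading $1/N$-type error:

```python
def aitken(seq):
    """One pass of Aitken's delta-squared acceleration."""
    out = []
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        denom = c - 2 * b + a
        out.append(c - (c - b) ** 2 / denom if denom != 0 else c)
    return out

# A made-up sequence of finite-size estimates converging to the
# 2D critical point like 1/N, at doubling grid sizes.
limit = 0.592746
estimates = [limit + 0.8 / N for N in (50, 100, 200, 400, 800)]

accelerated = aitken(estimates)
print(estimates[-1] - limit)    # raw error at the largest N
print(accelerated[-1] - limit)  # far smaller after acceleration
```

Because the error here shrinks geometrically as $N$ doubles, a single Aitken pass recovers the limit almost exactly; real data is messier, but the principle is the same.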

Indeed this was the whole point of the previous problem on random
walks; even in 1D it was infeasible to directly estimate the probability
of a random walker never returning to the origin,^{8}The
average recurrence time is infinite! so instead we
estimated the probability of not returning for various finite times and
extrapolated to infinity.

*Follow RSS/Atom feed for updates.*