(As mentioned before, current MIT students should not read this material.)
Consider a grid of cells arranged in a lattice, each of which is
randomly (iid) “occupied” with probability $p$:
0 0 1 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 1 1
0 0 1 0 0 0 0 1 1 1
0 0 1 0 0 0 1 1 1 0
0 0 0 0 1 0 1 1 1 0
1 0 0 0 1 0 1 1 0 0
1 0 1 0 0 0 0 1 0 0
1 1 1 0 0 0 0 0 1 1
1 1 0 1 1 1 0 1 0 0
0 1 0 1 1 0 0 0 1 0
I have marked occupied sites with a 1, and unoccupied sites with a 0. We would like to know how the very simple behavior of each site controls the behavior of the grid as a whole. Certainly if we just count the number of occupied sites in the grid we will not find any surprises, as this does not involve the spatial geometry of the grid. However the behavior becomes much more interesting if we ask a different question: what happens if we group the occupied sites into contiguous clusters?
. . 1 . . . . . . .
5 . . . 2 . . . 5 5
. . 3 . . . . 5 5 5
. . 3 . . . 5 5 5 .
. . . . 4 . 5 5 5 .
6 . . . 4 . 5 5 . .
6 . 6 . . . . 5 . .
6 6 6 . . . . . 6 6
6 6 . 8 8 8 . 7 . .
. 6 . 8 8 . . . 9 .
Here, we have labeled each cluster of contiguous occupied sites with
a unique integer (note that the grid is toroidal). Of course, each
cluster has a random size and shape, but it is hard to see how cluster
size is controlled by the individual site probability $p$. Is it improbable for 15 / 100 sites to be
in a single cluster?
Note: all grids and diagrams shown here were made with the help of a Python program supplied to students, which is capable of efficiently identifying clusters and producing relevant statistics about them.
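As a rough illustration of what such a program needs to do (this is my own minimal sketch, not the supplied program, and the function and variable names are made up), clusters on a toroidal grid can be identified with a union-find structure:

```python
import numpy as np

def find(parent, i):
    # Union-find "find" with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def label_clusters(grid):
    """Label contiguous clusters of occupied sites on a toroidal 2D grid.

    grid: boolean array; returns an integer array where each occupied site
    carries its cluster label (unoccupied sites are labeled 0).
    """
    n, m = grid.shape
    parent = list(range(n * m))
    for i in range(n):
        for j in range(m):
            if not grid[i, j]:
                continue
            # Merge with the right and down neighbours, wrapping around the torus.
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = (i + di) % n, (j + dj) % m
                if grid[ni, nj]:
                    a, b = find(parent, i * m + j), find(parent, ni * m + nj)
                    parent[a] = b
    labels = np.zeros((n, m), dtype=int)
    label_of_root = {}
    for i in range(n):
        for j in range(m):
            if grid[i, j]:
                root = find(parent, i * m + j)
                labels[i, j] = label_of_root.setdefault(root, len(label_of_root) + 1)
    return labels

# Example: one 10 x 10 grid occupied with probability p = 0.4.
rng = np.random.default_rng(0)
grid = rng.random((10, 10)) < 0.4
labels = label_clusters(grid)
cluster_sizes = np.bincount(labels.ravel())[1:]   # sizes of clusters 1, 2, ...
```

A production version would be vectorized (for non-toroidal grids, `scipy.ndimage.label` does the labeling step much faster), but this captures the idea.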
Let us bump up the grid size a little bit and focus on what happens
to the largest cluster as we vary the probability $p$.
Here we have used the same random seed for each run, but only varied
the threshold of what is considered an “occupied” site. At the smallest value of $p$ shown, more than half the sites in the grid are occupied, but they do not form large clusters: the region of black cells in the bottom-center is the largest such cluster.
As we slowly increase the threshold $p$, the number of occupied sites also slowly
increases, but there are so many closely packed clusters that the newly
occupied sites inevitably start merging some of the existing clusters,
so the size of the largest cluster grows quite fast; observe the large
blocks of solid color above, representing large regions that join the
largest cluster all at once. By the time $p$ reaches the largest value shown, one large cluster dominates the whole grid, containing most occupied sites.
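For concreteness, here is a sketch of that experiment (reusing the `label_clusters` sketch above; the grid size, seed, and thresholds are placeholders rather than the values used for the figures). One uniform random number is drawn per site, and a site counts as occupied whenever its number falls below the threshold $p$, so raising $p$ only ever adds sites:

```python
import numpy as np

rng = np.random.default_rng(42)       # fixed seed: the random field is drawn once
L = 200
u = rng.random((L, L))                # one uniform number per site

for p in (0.50, 0.55, 0.58, 0.59, 0.60, 0.65):   # illustrative thresholds only
    grid = u < p                      # occupied sites at this threshold
    sizes = np.bincount(label_clusters(grid).ravel())[1:]
    largest = sizes.max() if sizes.size else 0
    print(f"p = {p:.2f}: {grid.mean():.3f} occupied, "
          f"largest cluster holds {largest / max(grid.sum(), 1):.3f} of occupied sites")
```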
If we repeat this experiment, we find that one massive cluster
emerges every time at about the same probability. Indeed if we move to
an infinite lattice, it can be proven that there exists a critical threshold $p_c$ such that for $p > p_c$ there exists an infinitely large cluster with probability 1, while conversely for $p < p_c$ the probability of an infinite cluster is 0.¹ The behavior at $p = p_c$ itself has only been proven for 2D and for sufficiently high dimensions; in other dimensions it is conjectured. This abrupt shift in behavior is an example of a phase transition.
For a fixed but large grid size $L$, let $P_1(p)$ be the probability that a randomly selected occupied site is part of the largest cluster, and similarly $P_2(p)$ for the second largest cluster. If $S_1, S_2$ are the sizes of the largest and second largest clusters, then (approximately) $$P_i \approx \frac{S_i}{p\,L^d},$$ where $d$ is the dimension of the grid. Let us estimate $P_1$ and $P_2$ by simulating a number of random grids, and graph how they depend on $p$ for a fixed $L$ and $d$. When computing the $P_i$, it is important to perform many simulations and take an average of the results to reduce noise in the graphs; note that the smaller the choice of $L$, the more simulations will be needed.
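A sketch of how $P_1$ and $P_2$ might be estimated this way (again building on the `label_clusters` sketch; the grid size, number of runs, and range of $p$ below are illustrative assumptions):

```python
import numpy as np

def estimate_P1_P2(L, p, n_runs, rng):
    """Average P1 and P2 over n_runs random L x L grids at occupation probability p."""
    p1, p2 = 0.0, 0.0
    for _ in range(n_runs):
        grid = rng.random((L, L)) < p
        sizes = np.sort(np.bincount(label_clusters(grid).ravel())[1:])[::-1]
        occupied = grid.sum()
        if occupied:
            p1 += sizes[0] / occupied
            p2 += (sizes[1] / occupied) if sizes.size > 1 else 0.0
    return p1 / n_runs, p2 / n_runs

rng = np.random.default_rng(1)
ps = np.linspace(0.45, 0.75, 61)        # sweep of occupation probabilities (2D example)
P1, P2 = np.transpose([estimate_P1_P2(100, p, 20, rng) for p in ps])
```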
Because $P_1$ and $P_2$ are mostly near either 0 or 1, especially for large $L$, the most appropriate way to present the data is with the $P_i$ on a log scale.
(I have performed a greater number of simulations for values of $p$ nearer the critical point, and smoothed the graphs, as will be discussed shortly.² I was pleased to discover when redoing these graphs a year later that I had saved the simulation results and didn’t have to re-run them; across all graphs I used in total about 2.5 million simulations, for each of which I recorded the 10 largest cluster sizes.)
Observe that when $p$ is small, $P_1$ and $P_2$ are quite close in value: the largest cluster is not especially distinguished, but simply one of many. In two dimensions take, say, a large grid at a small value of $p$, for which $P_1$ is tiny: there are millions of occupied sites in the grid, yet the largest cluster works out to be only about 100 sites in size. With the grid having a few million clusters, it is not a surprise the second largest is close behind in size.
As $p$ increases, the largest and second largest clusters both tend to grow in size, and so do $P_1$ and $P_2$. (The drop in the $P_i$ that occurs for very small $p$ is because even as the number of clusters is increasing, the largest cluster remains only a few sites in size.) But as we get close to $p_c$ there is a sudden shift in behavior; the largest clusters begin to approach the size of the whole grid, and any two big clusters are likely to meet. All of the significant clusters get absorbed into a single massive cluster, and the larger a cluster is the more likely it is to be absorbed. Thus as $p$ increases further, $P_2$ drops quickly.
How can we use this information to estimate the critical point $p_c$? There are several different approaches we can use here. When $p > p_c$, then $P_1$ tends to a positive value in the limit $L \to \infty$, and conversely when $p < p_c$ then $P_1 \to 0$ as $L \to \infty$, so the critical point can be identified by anything sensitive to this transition; for example, we can look for the value of $p$ where $P_1$ crosses some fixed threshold such as $\tfrac{1}{2}$, or where the derivative $dP_1/dp$ is maximized, either of which will converge to $p_c$ in the limit of large $L$ (though not necessarily at the same rate).³ On further thought I think this is an error, and $P_1$ might be continuous in $p$ even at infinite $L$, so the former method would actually give a value very slightly larger than $p_c$. I think the latter method still works, though. I prefer a different method, which is to identify the value of $p$ that maximizes $P_2$, which also converges to $p_c$.
Now certainly we can just eyeball the peak of $P_2$ in the figure, or use the value of $p$ from whichever run had the highest $P_2$, as we used a large enough value of $L$ that this will give a decent result. However, picking the run with the largest $P_2$ is not robust against random noise, and we can improve on our results by choosing a numerical method that is more robust to noise in the data. Probably the best way to deal with the noise is to fit the graph of $P_1$ or $P_2$ to a suitable functional form near the point of interest; this can be somewhat tricky, especially since (for example) the peak of $P_2$ is asymmetrical. A much simpler method is to smooth the graph before searching for the peak; then any spikes due to a single data point should be softened sufficiently that they will not be the highest point anymore.⁴ Though note that the maximum of $P_2$ will be lowered by the smoothing process, and as the peak is asymmetric the location of the peak will be shifted slightly to a lower value of $p$.
In my approach, I used Gaussian smoothing⁵ (note that this is one of the many applications of the diffusion equation we worked with last time) with the function $$K(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2},$$ where $\sigma$ is the smoothing length-scale. The larger the choice of $\sigma$, the more smoothing is applied; the goal is to use a $\sigma$ large enough to smooth out any peaks due to noise, but not so large as to actually smooth out the peak we are interested in. A function $K$ used to perform smoothing is called a kernel.
If we have a continuous function $f(x)$, smoothing it with a kernel $K$ gives us $$\tilde{f}(x) = \int K(x - x')\, f(x')\, dx',$$ which is to say we use $K$ to perform a weighted average of the $f(x')$ for $x'$ near $x$. If we have a discrete series $(x_i, y_i)$, as is more likely in practice, we use instead $$\tilde{y}(x) = \frac{\sum_i K(x - x_i)\, y_i}{\sum_i K(x - x_i)}.$$ Note that it is not necessary that the $x_i$ are uniformly spaced or that $x$ is one of the $x_i$.
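Here is a minimal sketch of that discrete kernel smoothing with the Gaussian kernel written above (the name `gaussian_smooth` and the handling of evaluation points are my own choices):

```python
import numpy as np

def gaussian_smooth(x, y, sigma, x_eval=None):
    """Smooth the series (x_i, y_i) with a Gaussian kernel of length-scale sigma.

    The x_i need not be uniformly spaced, and the evaluation points x_eval
    need not coincide with the data points (they default to the x_i).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_eval = x if x_eval is None else np.asarray(x_eval, float)
    # Kernel weights K(x_eval - x_i); dividing by their sum makes each output
    # value a weighted average of nearby y_i, as in the formula above.
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / sigma) ** 2)
    return (w @ y) / w.sum(axis=1)
```

Swapping the Gaussian weights for 0/1 weights inside a fixed window would give the moving-window average discussed next.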
Different choices of kernel have different merits, and choosing an appropriate kernel is hard (and frequently arbitrary). A Gaussian kernel is the most common choice. As another example, using $K(x) = 1$ for $|x| \le w$ and $K(x) = 0$ for $|x| > w$ gives us a smoothing where $\tilde{y}(x)$ is just the average of the $y_i$ with $|x - x_i| \le w$; this is also referred to as a moving-window average.
Having smoothed the data, let us zoom in near the peak of $P_2$: We used a smaller smoothing length-scale $\sigma$ in the inset (where I have more data points) than in the full graphs shown previously.
The slope $dP_1/dp$ was estimated by computing $$\frac{dP_1}{dp} \approx \frac{P_1(p + \Delta p) - P_1(p)}{\Delta p},$$ where $\Delta p$ is the distance between consecutive data points. (Estimating a derivative based on just two data points should generally only be done with smoothed data.⁶ Note that since smoothing and taking the derivative are both linear, it doesn’t matter which order we do them in; taking the derivative of smoothed data gives the same result as smoothing the derivative of unsmoothed data.)
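Putting the pieces together, here is a sketch of both estimators applied to the smoothed curves (using the illustrative `ps`, `P1`, `P2` arrays and the `gaussian_smooth` function from the sketches above; the value of `sigma` is an assumed placeholder):

```python
import numpy as np

sigma = 0.01                                   # assumed smoothing length-scale
P1_s = gaussian_smooth(ps, P1, sigma)
P2_s = gaussian_smooth(ps, P2, sigma)

# Estimator 1: the p at which the smoothed P2 peaks.
pc_from_P2 = ps[np.argmax(P2_s)]

# Estimator 2: the p at which dP1/dp is largest, with the derivative taken as
# a two-point difference of the smoothed P1, evaluated at interval midpoints.
dP1_dp = np.diff(P1_s) / np.diff(ps)
midpoints = 0.5 * (ps[:-1] + ps[1:])
pc_from_slope = midpoints[np.argmax(dP1_dp)]
```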
We get a very good estimate of $p_c$:

| | 2D | 3D |
|---|---|---|
| $p$ at peak of $P_2$ | 0.59188 | 0.31201 |
| $p$ at maximum of $dP_1/dp$ | 0.59158 | 0.31068 |
| True $p_c$ | 0.592746 | 0.3116 |
One alternative method would be to consider the value of $p$ at which the largest cluster first spans the
width of the whole domain.
Can we estimate the error in our method without reference to an authoritative value? As our calculation involves random simulations, we may repeat it and get a different answer; by repeating it many times we can develop a probability distribution, and thus estimate the standard deviation of sampling from this distribution.⁷ Say we repeat the calculation 100 times and estimate the standard deviation of each calculation is 0.002, so we might guess that the error in our answer is about 0.002. However, having done 100 times as many simulations we can now improve on our answer by taking its mean; with 100 times as many simulations it will have a much lower error. But now how will we know what this new method’s error is? Will we have to repeat this method 100 times, doing 10,000 times as much work as the original calculation? No, because when averaging samples from a random distribution the standard deviation of the mean shrinks with the square root of the number of samples, so we know the end result will have an error that scales like $1/\sqrt{N}$ (here about $0.002/\sqrt{100} = 0.0002$).
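In code, the footnote’s argument is nothing more than the standard error of the mean (a sketch; `estimates` stands for a hypothetical array of independent repetitions of the whole calculation):

```python
import numpy as np

def combine_estimates(estimates):
    """Combine repeated independent p_c estimates into a mean and its standard error.

    The scatter of a single estimate (e.g. ~0.002) shrinks by 1/sqrt(N) when
    N estimates are averaged, so no second round of repetitions is needed.
    """
    estimates = np.asarray(estimates, dtype=float)
    scatter = estimates.std(ddof=1)               # spread of a single estimate
    return estimates.mean(), scatter / np.sqrt(len(estimates))
```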
However, that is not the only error, or even the main source of error: the main source is that we used a finite $L$, whereas this method only converges to the true value of $p_c$ in the limit $L \to \infty$. We can reduce the error by using larger values of $L$, but eventually we must stop at the largest size we can afford to simulate, as I did in two dimensions. So how can we improve without using a larger $L$? By repeating this calculation with multiple smaller values of $L$, we can see how the value we compute varies with $L$. Extrapolating these values numerically lets us estimate the behavior at much higher values of $L$ than we are capable of actually simulating. This general technique is called series acceleration, with many approaches such as Aitken’s $\Delta^2$ method.
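As an illustration of the idea (a generic sketch of Aitken’s $\Delta^2$ formula, not the specific extrapolation used here), given a sequence of estimates $s_n$ at increasing $L$ one can form the accelerated sequence $s_n - (s_{n+1}-s_n)^2/(s_{n+2}-2s_{n+1}+s_n)$:

```python
import numpy as np

def aitken(seq):
    """Apply one round of Aitken's Delta-squared acceleration to a sequence.

    seq: estimates s_0, s_1, ... (e.g. p_c measured at increasing grid sizes L);
    returns a shorter sequence that typically converges faster to the same limit.
    """
    s = np.asarray(seq, dtype=float)
    num = (s[1:-1] - s[:-2]) ** 2
    den = s[2:] - 2.0 * s[1:-1] + s[:-2]   # assumes no exactly-zero denominators
    return s[:-2] - num / den
```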
Indeed this was the whole point of the previous problem on random walks; even in 1D it was infeasible to directly estimate the probability of a random walker never returning to the origin⁸ (the average recurrence time is infinite!), so instead we estimated the probability of not returning for various finite times and extrapolated to infinity.