Percolation theory and fractal dimension

2024 February 29
math

(As mentioned before, current MIT students should not read this material.)

  1. Exercises on random walks
  2. Percolation theory and finding the critical point
  3. Percolation theory and fractal dimension
  4. The coastline paradox
  5. Ant in a labyrinth: random walk with invasion percolation

Having looked at how the largest cluster size C_1 varies with p for a fixed L, we now consider the converse: how C_1 varies with L for a fixed p.

We learned in class that C_1 follows certain functional forms in the limit of large L:


\begin{aligned}
    C_1(L) &\sim \log L &&p < p_c \\
    C_1(L) &\sim L^D &&p = p_c \\
    C_1(L) &\sim L^d &&p > p_c
\end{aligned}

Here, d is the dimension of the grid being used, and D < d is the fractal dimension of the critical percolating cluster. (The fractal scaling L^D is just one of many power laws that describe the behavior of a percolating cluster at or near the critical point. The exponents of these power laws are referred to as “universal” critical exponents because their value does not depend on the local shape of the grid being used as a substrate, but only on the grid’s dimension; thus all two-dimensional grids follow the same power laws. For d \geq 6, the exponents stop depending on the dimension, and take the values given by mean-field theory. Generally speaking we only have exact results for d = 2, where conformal field theory can be used, or d \geq 6, where mean-field theory is correct.)

As an aside, it is worth noting that this behavior for C_1 is not obvious, and it is plausible for there to exist a range of values of p which exhibit large clusters (i.e., cluster sizes distributed like a power law, so that C_1 would be a power law in L) without having infinite clusters. It wasn’t until the 1980s that it was proved (e.g., “Sharpness of the Phase Transition in Percolation Models” by Michael Aizenman and David J. Barsky) that there is a single phase transition at a single critical point p_c, at which clusters go immediately from being small to infinite.

Can we observe these functional forms empirically? Let us start in 2D, and graph how C_1 and C_2 (the second-largest cluster size) vary with L for p = 0.4.
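Measurements like these are straightforward to reproduce. Below is a minimal sketch (not the code used for the actual figures; the function name and the BFS flood-fill approach are my own choices) that builds a random L × L grid and returns the two largest cluster sizes:

```python
import random
from collections import deque

def largest_clusters(L, p, seed=None):
    """Generate an L x L site-percolation grid with occupation probability p
    and return (C1, C2), the sizes of the two largest clusters."""
    rng = random.Random(seed)
    occupied = [[rng.random() < p for _ in range(L)] for _ in range(L)]
    seen = [[False] * L for _ in range(L)]
    sizes = []
    for i in range(L):
        for j in range(L):
            if occupied[i][j] and not seen[i][j]:
                # BFS flood fill over 4-connected neighbors
                size, queue = 0, deque([(i, j)])
                seen[i][j] = True
                while queue:
                    x, y = queue.popleft()
                    size += 1
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < L and 0 <= ny < L
                                and occupied[nx][ny] and not seen[nx][ny]):
                            seen[nx][ny] = True
                            queue.append((nx, ny))
                sizes.append(size)
    sizes.sort(reverse=True)
    sizes += [0, 0]  # pad in case there are fewer than two clusters
    return sizes[0], sizes[1]

# Subcritical example: at p = 0.4 even a large grid has only small clusters
c1, c2 = largest_clusters(200, 0.4, seed=1)
```

Sweeping L (say from 10 to 1000) and plotting C_1 and C_2 against L on the appropriate axes reproduces graphs like the ones shown.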

We have shown the least-squares fit of C_1 as a linear function of \log L for large L; it is quite a good fit. C_2 also appears to grow like \log L, although I did not perform a fit.

Moving up to p = 0.56:

We are now fitting C_1 against two different functional forms. Where C_1 is almost as large as one would expect at the critical point, we are fitting C_1 against a power law; where C_1 is significantly smaller than one expects at the critical point, we fit against \log L. (Specifically, we use a power law where q_1 is at least 80% of q_1 at p_c for that L, and \log L where q_1 is at most 20% of q_1 at p_c for that L.) Again, for large L we see that C_1 is a good fit against \log L.

At p = 0.58:

It is hard to say at this point that C_1 is a good fit for \log L; at this point it is looking more and more like a power law.

Finally we get to p = 0.59:

An excellent fit to a power law! Or at least, it looks like an excellent fit until it is compared to the fit with p = 0.5927, very close to p_c = 0.592746:

I find it remarkable how much of a difference there is between 0.59 and 0.5927; the former has a very slight but visible bend to the graph as C_1 starts to fall behind slightly for large L. One can also see that C_2 and C_1 are not quite parallel when p = 0.59; C_2 is slowly but surely catching up to C_1. At p = 0.5927, however, C_1 and C_2 both perfectly match a power law for as high a value of L as my computer could handle, and visually appear exactly parallel (though from the exponents we see C_2 is in fact still slightly catching up to C_1). This is because at p = p_c the system exhibits scale invariance.

What of the fractal dimension? We estimate D = 1.89406, just shy of the known exact value D = 91 / 48 = 1.89583\ldots
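The estimate of D comes from an ordinary least-squares fit of \log C_1 against \log L. A sketch of that fitting step, using synthetic noiseless data in place of real measurements (the helper name fit_exponent is hypothetical):

```python
import math

def fit_exponent(Ls, C1s):
    """Least-squares slope of log(C1) vs log(L); the slope estimates D."""
    xs = [math.log(L) for L in Ls]
    ys = [math.log(c) for c in C1s]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

# Synthetic data following C1 = 0.9 * L**(91/48), standing in for measurements
Ls = [50, 100, 200, 400, 800]
C1s = [0.9 * L ** (91 / 48) for L in Ls]
D = fit_exponent(Ls, C1s)  # recovers 91/48 on this noiseless data
```

On real measurements one would restrict the fit to large L, where the asymptotic form applies.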

Continuing, as soon as p = 0.60 it is already obvious we have overshot p_c:

And it just gets more obvious as p increases; at p = 0.61 the second-largest cluster can no longer keep up:

Here we see that C_1 \sim L^{1.98}, very close to the predicted relation of C_1 \sim L^2 for p > p_c.

Of course, the same observations hold in 3D. Here is the behavior at the critical point:

Now we can use these graphs to resolve a bit of a mystery. We saw previously that for any fixed L, q_1 varies continuously as a function of p. However, the theoretical functional forms for C_1 do not vary continuously: they abruptly change from \log L when p < p_c to a power law at p = p_c. How can we resolve this disparity?

It appears from these graphs that we were wrong about the functional forms, and that the functional form actually does change continuously: first C_1 fits \log L well for small p, then gradually becomes more like a power law as p approaches p_c. However this isn’t true – the functional forms as stated above are correct. What actually happens is that although the functional form for large L does abruptly change at p = p_c, as p gets closer to p_c, how large L needs to be for the functional form to be applicable grows considerably.

This is related to the notion of uniform convergence; while q_1 is a continuous function of p for any fixed L, as we take the limit L \to \infty, q_1 converges to a discontinuous function. This is because the convergence is not uniform (necessarily so, as uniform convergence of continuous functions always gives a continuous function); rather, the convergence is slower for p near p_c.

How large L needs to be to be considered “large” is called the correlation length \xi; we find it by asking: what is the probability that two occupied sites belong to the same cluster? (More rigorously: how much more likely are they to be in the same cluster than two far-away points are? These questions coincide when p < p_c, but for p \geq p_c two distant points have a nonzero chance of lying in the same cluster because there is an infinite cluster, so what we are really asking is the chance that two sites lie in the same finite cluster.) If they are further than \xi apart, the probability falls exponentially with the distance r between them, like e^{-r / \xi}. Another way to see this is that the radius of a random cluster tends to be at most around \xi (or a suitable power thereof).

Therefore when \xi is as big as L, the largest finite clusters tend to span the whole grid, and the system behaves roughly as if it were at p = p_c. Only for L \gg \xi can we observe the limiting behavior.

\xi itself obeys a power-law relationship (see the above footnote on universal critical percolation exponents):

\xi \sim \left| p - p_c \right|^{-\nu}

where \nu = 4 / 3 for all 2D grids. Thus when p = 0.59 we find \xi \approx 2600 and would need L to be significantly larger than that to observe the \log L behavior, whereas our graph only extended to L = 1000.
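The figure \xi \approx 2600 is a direct evaluation of this power law, with the unknown proportionality constant taken to be 1, so it should be read as an order-of-magnitude estimate:

```python
# Correlation length estimate: xi ~ |p - p_c|^(-nu), with the prefactor
# assumed to be 1, so this is only an order-of-magnitude figure.
p_c = 0.592746
nu = 4 / 3          # exact value for all 2D lattices
xi = abs(0.59 - p_c) ** -nu   # ~ 2600, far beyond the L = 1000 we graphed
```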

We can see this with a 3D plot of C_1:

The ramp region with p > p_c corresponds to a power law growth rate, and the constant slope of 2 shows that that whole region has the same exponent of d = 2. The flatter region with p < p_c has an asymptotic growth rate of \log L. As L gets larger, the transition between the flat region and the ramp becomes sharper and sharper, but for any fixed value of L it remains continuous and unbroken.

Why log L?

The L^d behavior for p > p_c is clear: each occupied site has some fixed, high probability q_1 of being in the big cluster, so C_1 = q_1 p L^d scales with the volume L^d of the domain. To better understand the transition from \log L to L^d, let us see if we can explain the \log L behavior.

The key observation is that when p < p_c, far away sites are unlikely to belong to the same cluster; the probability decays exponentially with distance between the sites, so as L \to \infty most sites become uncorrelated. Thus we can imagine each of the clusters as being (mostly) independent of each other. There is some finite average size for a cluster, so as L is increased the number of clusters n grows proportionally to the volume of the domain:

n \sim L^d.

So to find the largest cluster, imagine taking n iid samples X_1, \ldots, X_n from some probability distribution f(x) of cluster sizes. f(x) will be somewhat “nice” in that it has finite mean, finite variance, and goes to 0 rapidly as x \to \infty.

Let M_n be the maximum of the X_i; we wish to find the probability distribution g(x) of M_n. Now certainly by definition of M_n,

P(M_n < x) = P(\text{each } X_i < x) = P(X_1 < x) \cdots P(X_n < x) = P(X_1 < x)^n

where we have used that the X_i are independent and identically distributed. Note that P(X_1 < x) is simply the cumulative distribution function of f, i.e., its integral.

Let x_0 be the median of M_n. Then


\begin{aligned}
    0.5 = P(M_n < x_0) &= P(X_1 < x_0)^n \\
    &= (1 - P(X_1 > x_0))^n \\
    - \log 2 &= n \log (1 - P(X_1 > x_0)) \\
    &= -n \left( P(X_1 > x_0) + P(X_1 > x_0)^2 / 2 + \cdots \right) \\
    &\approx -n P(X_1 > x_0) \\
    P(X_1 > x_0) &\approx \frac {\log 2}{n}
\end{aligned}

We have used the Taylor series for \log; the approximation is accurate because n is extremely large and so P(X_1 > x_0) is extremely small.

So now we need to know the asymptotic behavior of f(x). A logical guess is that f is a normal distribution. In that case, its cdf is given by the error function (after scaling and translating appropriately); the corresponding tail probability is, in this context, more specifically called the Q-function. An excellent approximation for the tail behavior of Q is Q(x) \approx \phi(x) / x, where \phi(x) is the standard normal density, so we get
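This approximation is easy to check numerically, since Q can be written via the complementary error function as Q(x) = \tfrac12 \operatorname{erfc}(x / \sqrt 2):

```python
import math

def Q(x):
    """Standard normal tail probability P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Ratio of the approximation phi(x)/x to the true tail Q(x):
# it approaches 1 from above as x grows (relative error shrinks like 1/x^2).
ratios = [phi(x) / x / Q(x) for x in (2.0, 4.0, 8.0)]
```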


\begin{aligned}
    \frac {\log 2}{n} \approx P(X_1 > x_0) &\stackrel{?}{=} Q\left(\frac {x_0 - \mu}{\sigma}\right) \\
    &\approx \phi\left(\frac{x_0 - \mu}{\sigma}\right) \frac {\sigma}{x_0 - \mu}
\end{aligned}

Recall x_0 is the median of the maximum cluster size, and that n \sim L^d. Solving for x_0 gives x_0 \approx \mu + \sigma \sqrt{2 \log n}, so in the end we find that C_1 \sim \sqrt{\log L}… which is not correct!

So our guess that cluster sizes are normally distributed is not good; the normal distribution has too little weight in the tails.
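A quick simulation illustrates why normal tails fail here: the median of the maximum of n standard normal samples grows only like \sqrt{2 \log n}, so even a thousandfold increase in n barely moves it. A sketch (the sample sizes and trial counts below are arbitrary choices):

```python
import random
import statistics

def median_max_normal(n, trials=15, seed=0):
    """Median over several trials of the maximum of n standard normal samples."""
    rng = random.Random(seed)
    return statistics.median(
        max(rng.gauss(0, 1) for _ in range(n)) for _ in range(trials)
    )

# Theory: the median maximum grows like sqrt(2 * log(n)) -- painfully slowly.
m_small = median_max_normal(100)      # roughly 2.5
m_big = median_max_normal(100_000)    # roughly 4.4, despite 1000x more samples
```

If cluster sizes were normal, multiplying L by 10 (so n by 100 in 2D) would barely increase C_1, a much slower growth than the \log L we observed.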

Let us see if we can derive the tail behavior of X_1 from first principles. To start, suppose we are in 1D, and we have p < p_c = 1.

Consider an occupied site, and for simplicity ignore its left neighbor. For it to be part of a cluster of size k, the k - 1 sites to its right must all be occupied, which has probability p^{k - 1}. Each step further imposes another factor of p, so f is a geometric distribution, decaying exponentially: f(x) \sim e^{-cx} with c = \log(1 / p), and likewise its tail P(X > x) \sim e^{-cx}. Absorbing the constant c into the scaling, this gives:


\begin{aligned}
    \frac {\log 2}{n} &\sim e^{-x_0} \\
    x_0 &\sim \log n \sim \log L
\end{aligned}

which is now correct.
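This 1D scaling is easy to verify directly: at p = 1/2 the largest run of occupied sites should grow by roughly \log_2 10 \approx 3.3 each time L is multiplied by 10. A minimal sketch (parameters are arbitrary):

```python
import random
import statistics

def longest_run(L, p, rng):
    """Largest cluster (longest run of occupied sites) in a length-L 1D grid."""
    best = run = 0
    for _ in range(L):
        run = run + 1 if rng.random() < p else 0
        if run > best:
            best = run
    return best

def mean_c1(L, p=0.5, trials=30, seed=0):
    """Average largest cluster size over several independent grids."""
    rng = random.Random(seed)
    return statistics.mean(longest_run(L, p, rng) for _ in range(trials))

# Theory: C1 grows like log2(L), so the gap should be about log2(100) ~ 6.6
c1_small = mean_c1(10**3)
c1_big = mean_c1(10**5)
```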

What about higher than 1D? The whole difficulty with d > 1 is the presence of loops in the grid, i.e., there is more than one route between any two particular sites. If there were only one route connecting two sites it would be very easy to predict whether they are in the same cluster.

Thus we turn to the Bethe lattice, which is the infinite tree where each vertex has z neighbors. If we take z = 2d, then the Bethe lattice strongly resembles the d-dimensional grid, in that both are regular graphs with vertex degree 2d. In fact, the Bethe lattice is the universal covering graph of the integer lattice; that is, it is what you get by removing all the loops from the integer lattice. E.g., if you go up, right, down, left on the Bethe lattice, you do not return to where you started – the only way to return to the origin is to exactly retrace your steps.

The Bethe lattice being a tree, there is a unique path connecting any two sites, which again makes it easy to calculate the cluster size distribution. An occupied site has z neighbors, each with a probability p of being occupied. Each “newly” occupied neighbor opens up an additional z - 1 new neighbors which have the potential to be occupied themselves. We can model this process of growing a cluster as a weighted random walk that starts z units to the right of the origin and at each time step has a p probability of going z - 2 units to the right and a 1 - p probability of going 1 unit to the left; the size of the cluster is one plus the number of times the random walk went to the right before reaching the origin. (The distance from the origin represents the number of sites adjacent to the incomplete cluster for which it has not yet been determined whether they are occupied. Thus the size of the cluster is 1 + (t - z) / (z - 1), where t is the time it takes for the random walk to reach the origin.)

For p < 1 / (z - 1), which is the critical threshold for the Bethe lattice, this random walk is biased towards the left. The path of a biased random walk can be thought of as diffusing from a mean position which is moving with time. For large times t, this mean position is O(t) to the left of the origin (specifically, the mean position is at z + t(p(z - 1) - 1), and p(z - 1) - 1 < 0), but the standard deviation only grows like O(\sqrt t), like all random walks / diffusion processes. Therefore to remain to the right of the origin at time t requires being O(\sqrt t) standard deviations to the right of the mean. Since diffusion creates a normal distribution, the probability of remaining \sqrt t standard deviations from the mean scales like

e^{-(\sqrt t)^2 / 2} = e^{-t / 2}

which is an exponential decay, exactly as we saw in 1D (recall that the time it takes this random walk to cross the origin is proportional to the size of the cluster).
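The random-walk description can be turned directly into a sampler for Bethe-lattice cluster sizes. As a sanity check, the sample mean should match the exact subcritical mean cluster size S = (1 + p) / (1 - (z - 1)p) (a standard branching-process result; e.g. S = 3 for z = 4, p = 0.2), and large clusters should be exponentially rare. A sketch:

```python
import random
import statistics

def bethe_cluster_size(z, p, rng):
    """Sample the size of the cluster containing a fixed occupied site of the
    Bethe lattice with coordination number z, via the biased random walk.
    The walk's position is the number of still-undetermined neighbors."""
    pos = z        # the root's z neighbors are all undetermined
    size = 1
    while pos > 0:
        if rng.random() < p:   # neighbor occupied: opens z - 1 new neighbors
            pos += z - 2
            size += 1
        else:                  # neighbor empty
            pos -= 1
    return size

rng = random.Random(0)
z, p = 4, 0.2                  # subcritical: p < 1/(z - 1) = 1/3
samples = [bethe_cluster_size(z, p, rng) for _ in range(20000)]
mean_size = statistics.mean(samples)    # theory: (1 + p)/(1 - (z-1)p) = 3
frac_large = sum(s >= 30 for s in samples) / len(samples)  # exponentially rare
```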

Thus we have recovered the desired \log L scaling for clusters in the Bethe lattice, but how does this help us with the standard integer lattice when d > 1?

Let \alpha be the universal cover that maps the Bethe lattice to the integer lattice (i.e., \alpha “curls up” the Bethe lattice by restoring all the loops), and consider a random cluster C_B on the Bethe lattice. Let \alpha(C_B) be the cluster on the integer lattice consisting of the points \alpha(v) for v \in C_B. (Note that if some collection C_B of points in the Bethe lattice is connected, then \alpha(C_B) is also connected, because \alpha is a covering map and therefore sends paths to paths.) Certainly \alpha(C_B) is either the same size as or smaller than C_B, depending on whether different points in C_B have the same image. (E.g., recall that the paths “right, right, up, up” and “up, up, right, right” go to different points on the Bethe lattice, and the same point on the integer lattice.)

Now \alpha(C_B) is some sort of random cluster, but is it drawn from the same probability distribution as randomly generated clusters on the integer lattice? No; in fact it tends to be larger. Specifically, fix an origin point v_0 (as a slight abuse of notation we will use v_0 to mean the origin of both the Bethe lattice and the integer lattice, i.e., we identify v_0 with \alpha(v_0)), let C_B be a random cluster containing v_0 on the Bethe lattice, and let C_Z be a random cluster containing v_0 on the integer lattice. (We are, of course, using the same value p < 1 / (2d - 1) for both lattices, and considering only finite clusters.) Then we claim that for any finite collection of points C, we have:

P(C \subset C_Z) \leq P(C \subset \alpha(C_B)).

For simplicity suppose C consists of a single point v \neq v_0. If v \in C_Z, then there exists a non-self-intersecting path from v_0 to v that passes through points in C_Z. Let us make a list of all finite non-self-intersecting paths A_1, A_2, \ldots that go from v_0 to v. Then v \in C_Z if and only if at least one of the A_i \subset C_Z.

To compute this, we add up P(A_i \subset C_Z) over all of the A_i, then subtract the probability that at least two of the paths are in C_Z, and so on (inclusion–exclusion). Each P(A_i \subset C_Z) is easy to compute, because it is just p^{|A_i|}, where |A_i| is the length of A_i. Note that the events A_i \subset C_Z are not independent of each other, because some of the paths pass through the same points as other paths.

To compute P(v \in \alpha(C_B)), we do exactly the same process involving adding up the P(A_i \subset \alpha(C_B)). Because \alpha is a covering map, each path A_i has a unique preimage \alpha^{-1}(A_i) on the Bethe lattice that is also a path. Thus we can do this adding up process on the Bethe lattice, and add up the P(\alpha^{-1}(A_i) \subset C_B) and subtract duplicates, etc.

Certainly

P(A_i \subset C_Z) = p^{|A_i|} = P(\alpha^{-1}(A_i) \subset C_B)

because the paths A_i and \alpha^{-1}(A_i) have the same length, and we are using the same probability p on both lattices.

However the joint probabilities of the paths are not the same, because there are paths that pass through the same points on the integer lattice but whose preimages do not intersect on the Bethe lattice. (E.g., “right, up, up” and “up, right, right”; indeed, all of the A_i end at the same point on the integer lattice, while their preimages end at different points on the Bethe lattice.) The events A_i \subset C_Z are more correlated with each other than the events \alpha^{-1}(A_i) \subset C_B, so the probability that at least one event happens is greater for the latter than the former. That is, P(v \in \alpha(C_B)) > P(v \in C_Z).

Similarly this holds if C is any finite collection of points, not just one point.

It follows that for any integer m,

P(|C_Z| \geq m) \leq P(|\alpha(C_B)| \geq m) \leq P(|C_B| \geq m)

where we have used that |C_B| \geq |\alpha(C_B)|.

Since we previously saw that when p < 1 / (z - 1) (the critical point on the Bethe lattice, where z = 2d here), the tail probability distribution of the |C_B| is exponential (i.e. \log P(|C_B| \geq m) \sim -m), it follows that |C_Z| is bounded from above by an exponential.

What about bounded from below? That is easy: the d-dimensional integer lattice contains the 1D lattice as a subset, so the size distribution is bounded from below by the distribution on the 1D lattice, which is also exponential. Therefore \log P(|C_Z| \geq m) \sim -m for large m.

Now let us jump back to our earlier claim: if we have n iid randomly generated clusters, X_1 = |C_Z| is the size of one of them, and x_0 is the median of the size of the largest one, then

P(X_1 > x_0) \approx \frac {\log 2}{n}

so therefore

-x_0 \sim \log P(X_1 > x_0) \sim -\log n \sim -\log L

and we have demonstrated the desired scaling law x_0 \sim \log L (recall n \sim L^d).

While this “proof” is certainly quite hand-wavy, it is largely sound but for one critical weakness: the critical points of the Bethe lattice and the integer lattice are different. The argument is only valid for p < 1 / (2d - 1), the former critical point. For 1 / (2d - 1) < p < p_c a more sophisticated argument is needed. This kind of approximation process, where we replace the integer lattice with an easier lattice, can never bring us all the way to p_c, because the easier lattice has a lower critical threshold. Writing a proof valid for all p < p_c requires somehow exploiting the critical behavior itself, as the value of p_c is, in general, not expressible analytically.
