Using Metropolis–Hastings on the Ising Model
Ising Model
In the previous MCMC examples, we were only interested in sampling from the uniform distribution. In the cases considered, it is not too hard to directly prove that for large $n$, the distribution of $X_n$ is roughly uniform on our desired set (proper configurations $\mathcal X$, or subgraph $\Gamma$). When $\pi$ is not uniform, however, it is not immediately clear how to modify the previous algorithm to make it work.
One example of this is the Ising model, which arises as a model of ferromagnetism. It considers a lattice of particles, each with a spin of $\pm 1$ representing the alignment of the particle's magnetic field. Let $\Lambda \subseteq \Z^d$ be a finite set (representing the lattice of particles), and $\sigma \colon \Lambda \to \set{\pm 1}$ be the spins. The energy of the configuration $\sigma$ is given by \begin{equation} H(\sigma) \defeq - J \sum_{\substack{i, j \in \Lambda\\i \sim j}} \sigma_i \sigma_j - B \sum_{i \in \Lambda} \sigma_i \,, \end{equation} where $i \sim j$ means $i$ and $j$ are nearest neighbors in the lattice $\Lambda$. Here $J \neq 0$ is the interaction strength. For ferromagnetic materials, particles favor alignment and we have $J > 0$; for anti-ferromagnetic materials, particles favor anti-alignment and we have $J < 0$. The quantity $B$ represents the strength of the external magnetic field. Having spins align with the external magnetic field (i.e. $\sign(\sigma_i) = \sign(B)$) reduces the total energy.
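Concretely, on a two dimensional $N \times N$ grid the energy is a couple of array operations. Here is a minimal sketch (the function name `ising_energy` and the choice of free boundary conditions are ours, not part of the course code):

```python
import numpy as np

def ising_energy(sigma, J=1.0, B=0.0):
    """Energy H(sigma) of a spin configuration on an N x N grid.

    sigma is an N x N array of +/-1 spins; each nearest-neighbor
    pair is counted once (free boundary conditions).
    """
    # Sum sigma_i * sigma_j over horizontal and vertical neighbor pairs.
    interaction = (sigma[:, :-1] * sigma[:, 1:]).sum() \
                + (sigma[:-1, :] * sigma[1:, :]).sum()
    return -J * interaction - B * sigma.sum()

# Example: a 4 x 4 configuration of random spins.
rng = np.random.default_rng(0)
sigma = rng.choice([-1, 1], size=(4, 4))
print(ising_energy(sigma))
```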
The Ising model dictates that we expect to see the configuration $\sigma$ with probability proportional to \begin{equation} \pi_u(\sigma) \defeq e^{-\beta H(\sigma)}\,, \quad\text{where } \beta = \frac{1}{k_B T}\,. \end{equation} Here, $T$ is the temperature, and $k_B$ is the Boltzmann constant.
To obtain a probability distribution from $\pi_u$, you have to normalize it by computing the partition function \begin{equation} Z_\beta \defeq \sum_\sigma \pi_u(\sigma) \,, \end{equation} and then setting \begin{equation} \pi(\sigma) = \frac{\pi_u(\sigma)}{Z_\beta} \,. \end{equation} The distribution $\pi$ is called the Gibbs distribution. If there are $100$ points in the lattice, then the above sum has $2^{100}$ terms, which is computationally intractable.
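To get a feel for this blow-up, here is a brute-force computation of $Z_\beta$ and $\pi$ on a tiny $3 \times 3$ lattice ($2^9 = 512$ configurations). The function name is illustrative; already at $N = 6$ the sum has $2^{36}$ terms and this approach is hopeless.

```python
import itertools
import numpy as np

def gibbs_distribution(N, beta, J=1.0, B=0.0):
    """Brute-force Gibbs distribution on an N x N lattice.

    Enumerates all 2^(N*N) configurations, so this is only
    feasible for very small N (say N <= 4).
    """
    states, weights = [], []
    for spins in itertools.product([-1, 1], repeat=N * N):
        sigma = np.array(spins).reshape(N, N)
        # H(sigma), free boundary conditions, each pair counted once.
        H = -J * ((sigma[:, :-1] * sigma[:, 1:]).sum()
                  + (sigma[:-1, :] * sigma[1:, :]).sum()) - B * sigma.sum()
        states.append(sigma)
        weights.append(np.exp(-beta * H))
    Z = sum(weights)                       # the partition function Z_beta
    return states, np.array(weights) / Z  # pi(sigma) = pi_u(sigma) / Z

states, pi = gibbs_distribution(N=3, beta=1.0)
print(len(states), pi.sum())  # 512 states, probabilities sum to 1
```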
If $\beta \approx 0$ then $\pi$ is roughly uniform, and there's no alignment of spins. When $\beta$ is large, however, non-alignment of spins becomes energetically expensive. For instance, with $B = 0$ and $J = 1$, consider two states $\sigma \equiv 1$ (everything aligned) and $\tau$ where the spins oscillate a lot. We compute \begin{equation} \frac{\pi_u(\sigma) }{\pi_u(\tau)} \approx \frac{e^{\beta \abs{\Lambda}}}{e^{\beta O(1)}} \,, \end{equation} and so an aligned state is about $e^{\beta \abs{\Lambda}}$ times more likely to occur than a typical state. In fact, it is known that when $\beta = O(1)$, there are $O(2^{\abs{\Lambda}/2})$ low energy configurations that the system is typically in, and all the other configurations occur with negligible probability. If you blindly did uniform random sampling, then you would find the typical low energy configurations with probability roughly $2^{-\abs{\Lambda}/2}$, which is minuscule! However, a simple MCMC algorithm finds this set very quickly, as shown below.
To illustrate the evolution, here is a video of an MCMC method used to sample from $\pi$ when $\beta = 1$. It finds a typical low energy configuration quickly (within a few thousand iterations).
Phase transitions
One question that is hard to analyze theoretically, but can be studied quickly via MCMC, is that of phase transitions. Define the magnetization of a spin configuration to be \begin{equation} M(\sigma) = \frac{1}{\abs{\Lambda}} \sum_{i \in \Lambda} \sigma(i) \,. \end{equation} When $B = 0$, the symmetry $\sigma \mapsto -\sigma$ leaves $\pi$ unchanged, so we clearly have \begin{equation} \E M = \sum_{\sigma} M(\sigma) \pi(\sigma) = \frac{1}{Z_\beta} \sum_{\sigma} M(\sigma) \pi_u(\sigma) = 0 \,. \end{equation}
When $\Lambda$ is an $N \times N$ lattice, it is possible to show that there exist two inverse temperatures $0 < \beta_1 < \beta_2$ such that:
-
For $\beta > \beta_2$, there exists $\delta(\beta) > 0$ such that \begin{equation} \E \abs{M} > \delta(\beta)\,. \end{equation}
-
For $\beta < \beta_1$ and every $\delta > 0$,
\begin{equation} \lim_{N \to \infty} \P( \abs{M} \leq \delta ) = 1 \,. \end{equation}
This was originally proved by Peierls. While this is hard to prove theoretically, one can see it numerically quite easily. We simulate $m$ random spin configurations using an MCMC algorithm, then numerically compute \begin{equation} \E \abs{M} \approx \frac{1}{m} \sum_{i = 1}^m \abs{M(\sigma_i)} \,, \end{equation} and check whether it stays away from $0$ when $\beta$ is large, and whether it is close to $0$ when $\beta$ is small. Some numerical simulations are shown here.
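In code, the estimator is only a few lines. The sketch below assumes a function `sample_sigma(beta)` that returns one (approximate) sample from the Gibbs distribution; implementing such a sampler is exactly the homework problem mentioned at the end of this page, so it is not given here.

```python
import numpy as np

def estimate_abs_magnetization(sample_sigma, beta, m=1000):
    """Monte Carlo estimate of E|M| at inverse temperature beta.

    sample_sigma(beta) is assumed to return one (approximate) sample
    from the Gibbs distribution as an N x N array of +/-1 spins;
    it is *not* implemented here (it is the homework problem).
    """
    total = 0.0
    for _ in range(m):
        sigma = sample_sigma(beta)
        total += abs(sigma.mean())  # |M(sigma)|
    return total / m

# Usage sketch: scan beta and look for the phase transition.
# for beta in np.linspace(0.1, 1.0, 10):
#     print(beta, estimate_abs_magnetization(sample_sigma, beta))
```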
Metropolis–Hastings Algorithm
This algorithm tells you how to sample from an un-normalized probability distribution $\pi_u$ on a (possibly very large) state space $\mathcal X$. It can be used to efficiently sample from the Gibbs distribution in the Ising model, but it also applies to a wide variety of situations.
Requirements:
-
A proposal mechanism $Q$: given $x \in \mathcal X$, we are able to choose $y \in \mathcal X$ randomly with probability $Q(x, y)$.
-
An un-normalized probability distribution $\pi_u \colon \mathcal X \to [0, \infty)$.
Output: Returns a random variable $X_n$ after $n$ steps. In typical situations the distribution of $X_n$ converges to the normalized probability distribution $\pi$ as $n \to \infty$.
Algorithm:
-
Choose $X_0 \in \mathcal X$ arbitrarily.
-
Given $X_n$, choose $y \in \mathcal X$ with probability $Q(X_n, y)$ using your proposal mechanism.
-
Define the acceptance probability $A(x, y)$ by
\begin{equation}\label{e:acceptanceRatio}\tag{A} A(x, y) \defeq \min\set[\Big]{ 1, \frac{\pi_u(y) Q(y, x)}{\pi_u(x) Q(x, y)} } \end{equation}
-
Choose $U_{n+1} \sim \Unif([0, 1])$ independently and define
\begin{equation} X_{n+1} = \begin{cases} y & \text{ if } U_{n+1} \leq A(X_n, y) \,, \\ X_n & \text{ otherwise} \,. \end{cases} \end{equation}
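These steps translate directly into code. Here is a minimal generic sketch (all names are ours; as cautioned below, in practice you should simplify the acceptance ratio analytically rather than evaluating $\pi_u$ directly):

```python
import numpy as np

def metropolis_hastings(x0, pi_u, propose, q, n, rng=None):
    """Generic Metropolis-Hastings chain (an illustrative sketch).

    x0      -- initial state X_0
    pi_u    -- un-normalized density; assumed positive on visited states
    propose -- propose(x) samples y from the distribution Q(x, .)
    q       -- q(x, y) evaluates the proposal probability Q(x, y)
    n       -- number of steps to simulate
    """
    rng = rng or np.random.default_rng()
    x = x0
    for _ in range(n):
        y = propose(x)
        # Acceptance probability from equation (A).
        A = min(1.0, (pi_u(y) * q(y, x)) / (pi_u(x) * q(x, y)))
        if rng.uniform() <= A:  # accept the proposal ...
            x = y
        # ... otherwise X_{n+1} = X_n.
    return x
```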
Examples of proposal mechanisms are:
-
Given a spin configuration $\sigma \colon \Lambda \to \set{\pm 1}$ in the Ising model, pick a site $i \in \Lambda$ uniformly randomly, and flip the spin. (I.e. choose $i \in \Lambda$ uniformly, and set $\tau(j) = \sigma(j)$ if $j \neq i$ and $\tau(i) = - \sigma(i)$.) This is implemented in the sketch after this list.
-
Given a proper coloring in the hard sphere model, pick a random sphere and flip its color. If it’s a valid coloring, propose it.
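For instance, the single spin-flip proposal for the Ising model could look like the following sketch (note this $Q$ is symmetric, so the factors $Q(x, y)$ and $Q(y, x)$ cancel in \eqref{e:acceptanceRatio}):

```python
import numpy as np

def propose_spin_flip(sigma, rng):
    """Single-site spin-flip proposal for the Ising model.

    Picks a site uniformly at random and flips its spin.  This Q is
    symmetric: Q(sigma, tau) = Q(tau, sigma) = 1 / |Lambda| whenever
    tau differs from sigma at exactly one site.
    """
    N = sigma.shape[0]
    i, j = rng.integers(N), rng.integers(N)
    tau = sigma.copy()
    tau[i, j] = -tau[i, j]
    return tau

# Example: propose a flip from the all-aligned state on a 4 x 4 grid.
rng = np.random.default_rng(0)
tau = propose_spin_flip(np.ones((4, 4), dtype=int), rng)
```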
While the Metropolis–Hastings algorithm is simple to explain, the reason it works is not so obvious. We will soon see that under certain assumptions (irreducibility and aperiodicity) the distribution of $X_n$ converges to the normalized probability distribution $\pi = \pi_u / Z$ (where $Z = \sum_x \pi_u(x)$) as $n \to \infty$. So if you want to sample from $\pi$, you only have to simulate $X_n$ for some large $n$.
A few words of caution:
-
When implementing the Metropolis–Hastings algorithm, computing $A$ using \eqref{e:acceptanceRatio} directly usually leads to catastrophic floating point errors. It is almost always better to simplify the expression by hand first. If $y$ is obtained by making a small change to $x$, then there will typically be a lot of cancellation in $\pi_u(y) / \pi_u(x)$. You may also want to work with the logarithm of $A$ instead of $A$ (see the sketch after this list).
-
Irreducibility and aperiodicity aren’t always easy to check; many times the Metropolis–Hastings algorithm is used without checking the required conditions.
-
In the continuous space setting, one needs an additional condition to ensure convergence. Many treatments don’t even state these conditions. Checking them is usually hard.
-
The most important practical consideration is the rate of convergence. Even though $\dist(X_n) \to \pi$ as $n \to \infty$, the convergence rate may be extremely slow. You see this in practice if your chain gets stuck: new proposals keep getting rejected.
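To illustrate the first caution for the Ising model: when flipping a single spin, all terms of $H$ not involving the flipped site cancel in $\pi_u(\tau)/\pi_u(\sigma)$, so $\log A$ can be computed locally and cheaply. A sketch, assuming free boundary conditions:

```python
def log_acceptance_flip(sigma, i, j, beta, J=1.0, B=0.0):
    """log A for flipping spin (i, j), computed without ever forming
    pi_u(y) / pi_u(x) explicitly.

    Only the terms of H involving site (i, j) change under the flip, so
    Delta H = 2 * sigma[i, j] * (J * (sum of neighbor spins) + B),
    and log A = min(0, -beta * Delta H).
    """
    N = sigma.shape[0]
    s = 0
    if i > 0:     s += sigma[i - 1, j]
    if i < N - 1: s += sigma[i + 1, j]
    if j > 0:     s += sigma[i, j - 1]
    if j < N - 1: s += sigma[i, j + 1]
    dH = 2 * sigma[i, j] * (J * s + B)
    return min(0.0, -beta * dH)
```

One then accepts the proposal when $\log U_{n+1} \leq \log A$, avoiding overflow in $e^{-\beta H}$ entirely.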
Code for figures
The code generating all figures on this page is here. The (short) function performing the MCMC simulation has been removed from this code as that is a homework problem.