Monte Carlo Methods, and why they are useful

What are Monte Carlo Methods?

A Monte Carlo method is an algorithm that obtains a numerical approximation using repeated random trials. The approach was originally proposed by Stanislaw Ulam, inspired by his uncle's gambling habits at the Monte Carlo casino in Monaco.

Example 1. The mean of a random variable can be estimated by averaging independent trials: \begin{equation} \E X = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N X_i \,, \end{equation} where $X_1$, …, $X_N$ are $N$ independent copies of the random variable $X$. (This follows from the law of large numbers.)

This is useful in practice when the random variable is easy to simulate but its mean is hard to compute analytically.
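As a quick illustration, here is a minimal sketch in Python; the lognormal example and all parameters are our choice, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[X] for X = exp(Z), Z ~ N(0, 1); X is easy to simulate, and
# the exact answer E[X] = e^{1/2} lets us check the estimate.
N = 10**6
samples = np.exp(rng.standard_normal(N))
print(samples.mean())      # Monte Carlo estimate
print(np.exp(0.5))         # exact mean, ≈ 1.6487
```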

Example 2 (Numerical integration). Let $\Omega \subseteq \R^d$ be bounded, $f \colon \Omega \to \R$ an integrable function, and $X_1$, …, $X_N$ i.i.d. random variables with common distribution $\Unif(\Omega)$. Then \begin{equation}\label{e:mcint1}\tag{1} \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(X_n) = \frac{1}{\abs{\Omega}} \int_\Omega f(x) \, dx \,, \end{equation} where $\abs{\Omega}$ denotes the volume of $\Omega$. This is again a corollary of the law of large numbers.

The advantage of \eqref{e:mcint1} is that the error is of order \begin{equation} \sqrt{\frac{\var(f)}{N}} \,, \end{equation} independent of the dimension $d$. By contrast, a standard quadrature algorithm has error of order \begin{equation} \frac{ \sqrt{d} \max \abs{\grad f} }{2 N^{1/d}}\,. \end{equation} To achieve a fixed accuracy, the computational cost of quadrature therefore grows exponentially in the dimension $d$; this is known as the curse of dimensionality. The cost of Monte Carlo integration, on the other hand, is independent of the dimension.
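To make this concrete, here is a minimal sketch of Monte Carlo integration over a cube; the integrand, domain, and parameters are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Integrate f(x) = exp(-|x|^2) over the cube Omega = [-1, 1]^d by Monte Carlo.
d, N = 10, 10**6
vol = 2.0**d                                # |Omega| for the cube [-1, 1]^d
X = rng.uniform(-1.0, 1.0, size=(N, d))     # i.i.d. Unif(Omega) samples
fX = np.exp(-np.sum(X**2, axis=1))
estimate = vol * fX.mean()                  # |Omega| times the average of f
stderr = vol * fX.std() / np.sqrt(N)        # error ~ sqrt(var(f)/N), any d
print(estimate, "+/-", stderr)
```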

Example 3 (Travelling Salesman). Given $N$ points in the plane (cities), the travelling salesman problem is to find the shortest route that visits each city exactly once and returns to the starting point. This is a “classic” problem which is known to be NP-hard, and you can read more about it on Wikipedia.

This has been extensively studied, and there are several well-known combinatorial algorithms that yield results close to the optimal path in practical amounts of time. We will approximate the solution numerically using an algorithm known as simulated annealing, sketched below.
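The details of simulated annealing come later in the course; the following is only an illustrative sketch, with the cooling schedule, move rule, and all parameters chosen arbitrarily.

```python
import math
import random

random.seed(0)

# A minimal simulated-annealing sketch for the travelling salesman problem.
# Cities are random points in the unit square; a tour is a permutation of them.
cities = [(random.random(), random.random()) for _ in range(30)]

def tour_length(tour):
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

tour = list(range(len(cities)))
T = 1.0                                   # initial "temperature"
for step in range(100_000):
    i, j = sorted(random.sample(range(len(cities)), 2))
    new = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # reverse a segment
    delta = tour_length(new) - tour_length(tour)
    # Always accept improvements; accept worse tours with probability
    # e^{-delta/T}, which shrinks as the temperature is lowered.
    if delta < 0 or random.random() < math.exp(-delta / T):
        tour = new
    T *= 0.9999                           # slowly cool the temperature

print(tour_length(tour))
```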

Example 4 (Substitution Ciphers). A substitution cipher is one where the key is a permutation of the alphabet (e.g. $A \mapsto K$, $B \mapsto Z$, etc.). Using this key, you can encode and decode a message. At first sight this might seem uncrackable by brute force: the key is one of $28!$ possible permutations (26 letters, plus a period and a space as punctuation, makes 28 symbols).
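For concreteness, here is a minimal sketch of encoding and decoding with such a key; the 28-symbol alphabet and the message are illustrative choices.

```python
import random
import string

random.seed(0)

# The 28 symbols: the uppercase alphabet, a period, and a space.
alphabet = string.ascii_uppercase + ". "

# A key is a random permutation of the alphabet; decoding uses its inverse.
key = dict(zip(alphabet, random.sample(alphabet, len(alphabet))))
inverse = {v: k for k, v in key.items()}

def encode(message):
    return "".join(key[c] for c in message)

def decode(ciphertext):
    return "".join(inverse[c] for c in ciphertext)

ciphertext = encode("ATTACK AT DAWN.")
print(ciphertext)
print(decode(ciphertext))    # recovers the original message
```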

This is a needle in an enormous haystack: even if you could examine $10^{12}$ keys per second (a generous overestimate), it would still take on the order of ten billion years to try them all. Nevertheless, if the message is sufficiently long (a few paragraphs of readable text), the cipher can be cracked in seconds using simulated annealing.
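A quick sanity check of that arithmetic:

```python
import math

keys = math.factorial(28)               # ≈ 3.05e29 possible keys
seconds = keys / 1e12                   # at 10^12 keys per second
years = seconds / (3600 * 24 * 365.25)
print(f"{years:.2e} years")             # ≈ 9.7e9 years
```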

Example 5 (Generative Modelling). Generative modelling algorithms are used to (for instance) produce realistic images from user input. They work by training a neural net on a large sample of images and learning a probability distribution associated with these images. Images are generated by using a Monte Carlo method to sample from this distribution.

Plan of the first half of this course

  1. In order to use Monte Carlo methods, you need to be able to sample from a given distribution. We will start with a quick introduction to basic sampling algorithms (a first taste appears at the end of this section).

  2. We will then study the Metropolis-Hastings algorithm; to understand why it works, we will need the basics of convergence of Markov chains to their stationary distributions.

  3. We will analyze a few applications of the Metropolis-Hastings algorithm and study commonly used numerical diagnostics.

  4. We will study simulated annealing and use it to solve a few optimization problems.

  5. Time permitting, I might sketch the algorithms used in generative modelling. To fully understand these we would need the basics of SDEs, Langevin Monte Carlo, and related sampling algorithms, which we will not have time for. We will black-box the required background and develop some intuition about how generative modelling works.

We will implement many of these algorithms in Python.
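As a first taste of the sampling algorithms in item 1, here is a minimal inverse-transform sampling sketch; the exponential distribution and its parameters are our illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Inverse-transform sampling: if U ~ Unif(0, 1) and F is a CDF with inverse
# F^{-1}, then F^{-1}(U) has distribution F.  For the Exp(rate) distribution,
# F(x) = 1 - e^{-rate x}, so F^{-1}(u) = -ln(1 - u) / rate.
rate = 2.0
u = rng.uniform(size=10**6)
samples = -np.log1p(-u) / rate
print(samples.mean())      # ≈ 1/rate = 0.5, the exponential mean
```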