Monte Carlo Integration

How expensive is quadrature?

Proposition 1. Let $f \colon [0, 1]^d \to \R$ be a $C^1$ function. Divide $[0,1]^d$ into $N$ identical cubes $Q_1$, …, $Q_N$, and let $\xi_i$ denote the center of the $i$-th cube. Then, \begin{equation} \abs[\Big]{ \int_{[0,1]^d} f(x) \, dx - \sum_{i=1}^N f(\xi_i) \vol(Q_i) } \leq \frac{\sqrt{d} \max \abs{\grad f}}{2 N^{1/d}} \end{equation}
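For concreteness, here is a minimal NumPy sketch of the midpoint rule described in Proposition 1; the integrand and grid sizes are illustrative choices, not from the notes.

```python
import numpy as np

def midpoint_quadrature(f, d, n):
    """Midpoint rule on [0,1]^d: tile with N = n^d identical cubes and
    evaluate f at the center of each cube, as in Proposition 1."""
    centers_1d = (np.arange(n) + 0.5) / n                    # cube centers per axis
    grids = np.meshgrid(*([centers_1d] * d), indexing="ij")
    points = np.stack([g.ravel() for g in grids], axis=-1)   # shape (n^d, d)
    # Each cube has volume n^(-d), so the weighted sum is a plain average.
    return f(points).mean()

# Illustrative test: f(x) = exp(-|x|^2) in d = 3.
f = lambda x: np.exp(-np.sum(x**2, axis=-1))
for n in (4, 8, 16):
    # Proposition 1 bounds the error by O(1/n) = O(N^(-1/d)).
    print(n**3, midpoint_quadrature(f, d=3, n=n))
```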

Remark 2. In order to approximate the integral of $f$ to within $\epsilon$, you need roughly $N = O(1/\epsilon^d)$ cubes. For $\epsilon = 0.01$ and $d = 10$, this is $O(10^{20})$ cubes. If you can examine about $10^{12}$ cubes a second (a generous overestimate for my computer), it will take a few years to compute this integral to two decimal places by quadrature.
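Spelled out in code, the arithmetic of Remark 2 (assuming the remark's rate of $10^{12}$ evaluations per second):

```python
eps, d = 0.01, 10
cubes = (1 / eps) ** d                 # 1e20 cubes
seconds = cubes / 1e12                 # at 1e12 evaluations per second
print(seconds / (3600 * 24 * 365))     # roughly 3.2 years
```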

Monte Carlo Integration

Theorem 3 (Monte Carlo Integration).

Let $X_n$ be i.i.d. $\R^d$-valued random variables with common probability density function $p$. Let $f \colon \R^d \to \R$ be a function such that $\int_{\R^d} \abs{f(x)} p(x) \, dx < \infty$. Then \begin{equation}\label{e:mcIntF} \lim_{N \to \infty} \frac{1}{N} \sum_{n = 1}^N f(X_n) = \int_{\R^d} f(x) \, p(x) \, dx\,, \quad\text{almost surely}\,. \end{equation} If further $\int_{\R^d} \abs{f(x)}^2 p(x) \, dx < \infty$, then \begin{equation} \var\paren[\Big]{ \frac{1}{N} \sum_{n = 1}^N f(X_n) } = \frac{1}{N} \int_{\R^d} \abs{f(x) - \mu}^2 p(x) \, dx \,, \quad\text{where } \mu = \int_{\R^d} f(x) p(x) \, dx\,. \end{equation}

We will prove this using the law of large numbers.
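Before turning to the proof, here is a minimal sketch of Theorem 3 in action; the density $p$ (standard Gaussian) and the integrand $f$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of the integral of f(x) p(x) dx, where p is the
# standard Gaussian density on R^d; f is an arbitrary illustrative choice.
d, N = 5, 100_000
f = lambda x: np.cos(np.sum(x, axis=-1))

X = rng.standard_normal((N, d))         # i.i.d. samples with density p
values = f(X)
estimate = values.mean()                # (1/N) sum_n f(X_n)
std_error = values.std() / np.sqrt(N)   # square root of the variance in Theorem 3

print(estimate, "+/-", std_error)
# Exact value for comparison: E cos(X_1 + ... + X_d) = exp(-d/2) ~ 0.082 for d = 5.
```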

Remark 4. If $X_n$ are i.i.d. and uniformly distributed on $[0,1]^d$, then the above implies \begin{equation} \lim_{N \to \infty} \frac{1}{N} \sum_{n = 1}^N f(X_n) = \int_{[0,1]^d} f(x) \, dx\,, \quad\text{almost surely}\,. \end{equation} By the central limit theorem and the three sigma rule, \begin{equation} \P \paren[\Big]{ \abs[\Big]{ \frac{1}{N} \sum_{n = 1}^N f(X_n) - \int_{[0,1]^d} f(x) \, dx } < \frac{3}{\sqrt{N} } \paren[\Big]{ \int_{[0,1]^d} \abs{f(x)}^2 \, dx }^{1/2} } \geq 0.997 \end{equation} (here we bounded the variance $\int \abs{f - \mu}^2 \, dx \leq \int \abs{f}^2 \, dx$ to simplify the radius). Thus if we want to attain an error of $\epsilon > 0$ with 99.7% certainty, we need to choose \begin{equation} N = \frac{9}{\epsilon^2} \int_{[0,1]^d} \abs{f(x)}^2 \, dx = O\paren[\Big]{ \frac{1}{\epsilon^2} }\,. \end{equation}
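A quick numerical check of the three-sigma error bar; the integrand is an illustrative choice with a known integral.

```python
import numpy as np

rng = np.random.default_rng(1)

# Uniform Monte Carlo on [0,1]^d with the three-sigma radius of Remark 4.
d, N = 10, 10_000
f = lambda x: np.prod(np.sin(np.pi * x), axis=-1)   # exact integral: (2/pi)^d

X = rng.random((N, d))                   # i.i.d. uniform on [0,1]^d
estimate = f(X).mean()
# The radius uses the integral of |f|^2, itself estimated by Monte Carlo.
radius = 3 * np.sqrt((f(X) ** 2).mean() / N)

exact = (2 / np.pi) ** d
print(estimate, "+/-", radius)
print("|error| < radius:", abs(estimate - exact) < radius)   # true ~99.7% of runs
```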

Remark 5. To approximate the integral of $f$ with accuracy $\epsilon$ you need to:

  1. Choose $N = O(1/\epsilon^d)$ using quadrature.
  2. Choose $N = O(1/\epsilon^2)$ using Monte Carlo.

Use quadrature in dimension 1 (and maybe dimension 2). Use Monte Carlo in higher dimensions.
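To make this comparison concrete, here is a sketch that spends the same evaluation budget both ways; the integrand, budget, and dimensions are illustrative choices (the reference value $\int_0^1 e^{-x^2} \, dx \approx 0.746824$ is standard).

```python
import numpy as np

def midpoint(f, d, n):
    # n^d-cube midpoint rule on [0,1]^d, as in Proposition 1.
    c = (np.arange(n) + 0.5) / n
    grids = np.meshgrid(*([c] * d), indexing="ij")
    return f(np.stack([g.ravel() for g in grids], axis=-1)).mean()

rng = np.random.default_rng(2)
f = lambda x: np.exp(-np.sum(x**2, axis=-1))

# Spend the same budget of 4096 evaluations both ways.
for d, n in ((1, 4096), (6, 4)):         # n^d = 4096 in both cases
    X = rng.random((n**d, d))
    print(f"d={d}: quadrature={midpoint(f, d, n):.5f}, "
          f"monte carlo={f(X).mean():.5f}, exact={0.746824**d:.5f}")
# In d=1 quadrature is essentially exact while Monte Carlo carries
# O(1/sqrt(N)) noise; in d=6 the grid has only 4 points per axis and
# the two errors become comparable.
```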

Law of Large Numbers

Theorem 3 follows immediately from the Law of Large Numbers.

Theorem 6 (Law of large numbers). Let $X_n$ be a sequence of i.i.d. random variables with $\E \abs{X_n} < \infty$. Then \begin{equation}\label{e:lln}\tag{LLN} \lim_{N \to \infty} \frac{1}{N} \sum_{n = 1}^N X_n = \E X_1\,. \end{equation}

This is easy to prove if we assume $\E X_1^2 < \infty$, and this is done in every introductory probability course. Below, we will instead prove it using characteristic functions, without assuming $\E X_1^2 < \infty$. We were, however, somewhat imprecise when stating \eqref{e:lln}: it involves convergence of random variables, which requires measure theory to treat rigorously and goes beyond the scope of this course. Here are two more precise versions of \eqref{e:lln}; a small simulation illustrating the weak law follows the list.

  1. The weak law of large numbers says \eqref{e:lln} holds in probability. That is, for any $\epsilon > 0$ we have \begin{equation} \lim_{N \to \infty} \P \paren[\Big]{ \abs[\Big]{\frac{1}{N} \sum_{n = 1}^N X_n - \E X_1} > \epsilon } = 0 \end{equation}

  2. The strong law of large numbers says \eqref{e:lln} holds almost surely. That is, \begin{equation} \P \paren[\Big]{ \lim_{N \to \infty} \frac{1}{N} \sum_{n = 1}^N X_n = \E X_1 } = 1 \,. \end{equation}
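The simulation promised above; the Exp(1) distribution, threshold, and trial count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Weak law in action: P(|average - mean| > eps) shrinks as N grows.
# X_n ~ Exp(1), so E X_1 = 1.
eps, trials = 0.05, 500
for N in (100, 1000, 10_000):
    averages = rng.exponential(1.0, size=(trials, N)).mean(axis=1)
    print(N, np.mean(np.abs(averages - 1.0) > eps))
```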

Proof of Theorem 3. The almost sure convergence \eqref{e:mcIntF} follows by applying Theorem 6 to the i.i.d. random variables $f(X_n)$. The variance formula follows from the fact that the variance of a sum of independent random variables is the sum of the variances: $\var\paren[\big]{\frac{1}{N} \sum_{n=1}^N f(X_n)} = \frac{1}{N^2} \sum_{n=1}^N \var\paren[\big]{f(X_n)} = \frac{1}{N} \var\paren[\big]{f(X_1)}$.

Convergence in Distribution

Definition 7. We say a sequence of random variables $X_n$ converges to a random variable $X$ in distribution if the CDF of $X_n$ converges to the CDF of $X$ at all points where the CDF of $X$ is continuous.

For $\R^d$-valued random variables, it is better to define convergence in distribution using bounded continuous test functions instead. However, we’re not going to split hairs about this, as we will test convergence in distribution using Lévy’s continuity theorem.

Theorem 8 (Lévy’s continuity theorem). A sequence of random variables $X_n$ converges to $X$ in distribution if and only if the characteristic functions of $X_n$ (defined below) converge pointwise to the characteristic function of $X$. That is, $X_n$ converges to $X$ in distribution if and only if \begin{equation} \lim_{n \to \infty} \varphi_{X_n}(\lambda) = \varphi_X(\lambda) \quad \text{for every } \lambda \in \R^d\,. \end{equation}

Proof. The proof goes beyond the scope of this course, but is in every standard measure theory based probability book.

Definition 9. Let $X$ be an $\R^d$-valued random variable. The characteristic function of $X$ is the function $\varphi_X \colon \R^d \to \C$ defined by \begin{equation} \varphi_X(\lambda) = \E e^{i \lambda \cdot X} \,, \quad\text{where } i = \sqrt{-1}\,. \end{equation}
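Since $\varphi_X(\lambda)$ is itself an expectation, it can be estimated by the Monte Carlo method of Theorem 3. A quick sketch for a standard Gaussian, whose characteristic function $e^{-\lambda^2/2}$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(4)

# phi_X(lam) = E exp(i * lam * X), estimated by averaging over samples.
# For X ~ N(0,1) the exact answer is exp(-lam^2 / 2).
X = rng.standard_normal(100_000)
for lam in (0.0, 0.5, 1.0, 2.0):
    empirical = np.exp(1j * lam * X).mean()
    print(lam, empirical, np.exp(-lam**2 / 2))
```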

Proposition 10. Let $X$ be a random variable.

  1. $\varphi_X$ is continuous, and $\varphi_X(0) = 1$.
  2. If $\E \abs{X} < \infty$, then $\varphi_X$ is differentiable and $\grad \varphi_X(0) = i \E X$.

Proof. Proving continuity and differentiability requires the dominated convergence theorem, which is beyond the scope of this course. Computing $\varphi_X(0)$ and $\grad \varphi_X(0)$ is direct, and will be done on the board.

Lemma 11. If $c \in \R$ and $X$ is a random variable then $\varphi_{cX}(\lambda) = \varphi_X(c \lambda)$.

Proposition 12. Two random variables $X$ and $Y$ are independent if and only if $\varphi_{(X, Y)}(\lambda, \mu) = \varphi_X(\lambda) \varphi_Y(\mu)$ for all $\lambda$ and $\mu$.

Proof. The reverse direction requires Fourier inversion, and is beyond the scope of this course. The forward direction can be done easily.

Proof of Theorem 6. We will show that the convergence \eqref{e:lln} holds in distribution using Lévy’s continuity theorem. By Proposition 12 and Lemma 11, \begin{equation} \varphi_{\frac{1}{N} \sum_{n=1}^N X_n}(\lambda) = \prod_{n = 1}^N \varphi_{X_n}\paren[\Big]{\frac{\lambda}{N}} = \varphi_{X_1}\paren[\Big]{\frac{\lambda}{N}}^N \,. \end{equation} By Proposition 10, $\varphi_{X_1}(\lambda / N) = 1 + \frac{i \lambda \cdot \E X_1}{N} + o\paren[\big]{\frac{1}{N}}$, and hence $\varphi_{X_1}(\lambda / N)^N \to e^{i \lambda \cdot \E X_1}$ as $N \to \infty$. The limit is the characteristic function of the constant random variable $\E X_1$, so Theorem 8 shows that the averages converge to $\E X_1$ in distribution. Since the limit is deterministic, convergence in distribution implies convergence in probability, which is the weak law.
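A numerical illustration of the key limit in this proof, using $X_1 \sim \mathrm{Exp}(1)$, whose characteristic function $1/(1 - i\lambda)$ and mean $1$ are known in closed form:

```python
import numpy as np

# phi_{X_1}(lam) = 1 / (1 - 1j*lam) and E X_1 = 1 for Exp(1), so
# phi_{X_1}(lam / N) ** N should approach exp(1j * lam) as N grows.
lam = 2.0
for N in (10, 100, 1000, 10_000):
    print(N, (1 / (1 - 1j * lam / N)) ** N)
print("limit:", np.exp(1j * lam))
```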