High-dimensional integration: Monte-Carlo method

Section 18.3 High-dimensional integration: Monte-Carlo method

Integration over a \(d\)-dimensional region with large \(d\) presents new challenges. One is setting up all the boundaries for multiple variables. Another is the number of evaluation points. If we use \(n\) evaluation points for each of \(d\) variables, the total number of evaluations is \(n^d\) which grows exponentially with dimension. It becomes crucial to not have large \(n\text{,}\) which is why Gaussian integration is often used.

There is an alternative: the Monte-Carlo integration method consists of picking evaluation points at random, from the uniform distribution over the region \(D\text{.}\) Averaging the computed values of \(f\text{,}\) we obtain an approximation to the average of \(f\) over \(D\text{.}\) To find the integral, multiply the average by the volume of \(D\text{.}\)

In terms of probability theory, the random choice of \(x\) makes \(f(x)\) a random variable. This random variable has mean value \(\mu\) equal to the average of \(f\) over \(D\text{.}\) Its variance \(\sigma^2\) is the average of \((f-\mu)^2\) over \(D\text{.}\) The Central Limit Theorem tells us that the mean of \(N\) independent samples is approximately normally distributed with mean \(\mu\) and variance \(\sigma^2 / N\text{.}\) In particular, with probability about \(99.7\%\) the sample mean is within three standard deviations of the mean, which means the error of Monte-Carlo method is very likely to be bounded by \(3\sigma /\sqrt{N}\text{.}\) On one hand, this tends to zero quite slowly. On another hand, the rate at which it tends to zero is independent of dimension \(d\text{.}\)

Example 18.3.1. Implement the Monte-Carlo method over a high-dimensional cube.

Integrate the function \(f(\mathbf x) = \sqrt{1+|\mathbf x|^2}\) over the 10-dimensional unit cube \([0, 1]^{10}\) using the Monte-Carlo method.

Answer

f = @(x) sqrt(1 + x*x');
d = 10;
N = 1e6;

s = 0;
for k = 1:N
    x = rand(1, d);
    s = s + f(x);
end
disp(s/N)

The output is, of course, random, but usually it is about 2.069. If your computer handles a million samples (N = 1e6) easily, you may want to try 1e7. It would be tedious to compute this integral with WolframAlpha; just entering the limits would take long.

The above code is not vectorized: it uses a loop to evaluate \(f\) at consecutive points. One could generate all these points at once with rand(N, d) and try to get all values of \(f\) at once. But this requires coding f in such a way that it can consume a matrix and produce a vector of values, and this may be difficult.

Remark 18.3.2. Averaging over a sphere.

Monte-Carlo method can also be used to average a function over a sphere with the help of normal distribution. Here is a brief summary of random number generators in Matlab:

rand(m, n) generates an \(m\times n\) matrix of random real numbers taken from the uniform distribution on the interval \([0, 1]\text{.}\)
randi([A B], m, n) generates an \(m\times n\) matrix of random integer numbers taken from the discrete uniform distribution on the finite set \(A, A+1, \dots, B\text{.}\)
randn(m, n) generates an \(m\times n\) matrix of random real numbers taken from the standard normal distribution on the real line.

Try them out with rand(1, 5), randi([2 7], 1, 5), and randn(1, 5) to observe the differences in output.

One of the distinguishing features of the normal distribution is the following: if one generates a random \(d\)-dimensional vector \(\mathbf x\) by taking each component \(x_1, \dots, x_d\) independently from the normal distribution, the vector \(\mathbf x\) is equally likely to point in any given direction. More precisely, the unit vectors \(\mathbf x / |\mathbf x|\) are uniformly distributed over the unit sphere. With Matlab, this process amounts to the computations

x = randn(1, d);
x = x/norm(x);

Note that x = randn(1, d)/norm(randn(1, d)) would not do the job, because each use of randn leads to a new random vector.

If one replaces x = rand(1, d) in Example 18.3.1 with the above code for generating random unit vectors, the result is an approximate value of the average of function \(f\) over the unit sphere \(\{\mathbf x\in \mathbb R^d\colon |\mathbf x|=1\}\text{.}\)

A remark for those familiar with probability theory: the process of generating vectors uniformly distributed over the unit sphere is based on the normal distribution because of the following property of its probability density function \(p\text{:}\) the product \(p(x_1)p(x_2)\cdots p(x_d)\) depends only on the norm of vector \(\mathbf x\text{.}\)