Overview

Monte Carlo integration is the statistical estimation of an integral using evaluations of a function with a set of points drawn randomly from a distribution over the range of integration.

Suppose we are interested in evaluating \(\int g(x)\; dx\). If \(g(x)\) can be represented as \(f(x)h(x)\) where \(f(x)\) is a simple probability distribution function, then
\(\int_a^b g(x)\; dx = \int_a^b f(x)h(x)\; dx = E_f[h(x)]\)

We can approximate \(\int g(x) dx\) by drawing an i.i.d. sample from the underlying distribution function \(f\), evaluating \(h(x)\), and calcuating the average.

Demonstration

Suppose we are estimating \(\int_1^3 g(x)\; dx\) where \(g(x) = \frac{x}{(1+x^2)^2}\)

Finding \(f\) and \(h\)

Let \(f(x) = \frac{1}{2}\) and \(h(x) = 2 \frac{x}{(1+x^2)^2}\).
Then \(\int_1^3 g(x)\; dx = \int_1^3 2 *\frac{1}{2} \frac{x}{(1+x^2)^2}\; dx = E_{f}[2\frac{x}{(1+x^2)^2}\ ], f\)~\(U(1,3)\)

h <- function(x) {
  2*x/(1+x^2)^2;
}
n <- 100000
x <- runif(n, min=1, max=3)
mean(h(x))
## [1] 0.1997645

The true value of the integral is

g <- function(x) {x/(1+x^2)^2}
integrate(g, lower = 1, upper = 3)
## 0.2 with absolute error < 1.1e-14


Suppose we are estimating \(\int_{-\infty}^{\infty} g(x) \; dx\) where \(g(x) = x^2e^{-x^2}\)

Finding \(f\) and \(h\)

Let \(f(x) = \frac{1}{\sqrt{\pi}}e^{-x^2}\) and \(h(x) = \sqrt{\pi}x^2\).
Then \(\int_{-\infty}^{\infty} g(x) \; dx = \int_{-\infty}^{\infty} \sqrt{\pi} \frac{1}{\sqrt{\pi}}e^{-x^2}x^2 \; dx = E_{f}[\sqrt{\pi}x^2], f\)~\(N(0,\sqrt{\frac{1}{2}})\)

h <- function(x) {
  sqrt(pi)*x^2;
}
n <- 100000
x <- rnorm(n, 0, sqrt(1/2))
mean(h(x))
## [1] 0.890181

The true value of the integral is

g <- function(x) {x^2*exp(-x^2)}
true_val <- integrate(g, lower = -Inf, upper = Inf)
true_val
## 0.8862269 with absolute error < 1.1e-06


Effect of Sample Size

For both of the above examples, a sample size of 100,000 was selected. We shall now examine how the estimation changes as sample size increases for the second example.

set.seed(20)
h <- function(x) {
  sqrt(pi)*x^2;
}
log_n <- seq(1,8,1)
est <- numeric(8)

for (i in log_n){
  x <- rnorm(10^i,0,sqrt(1/2))
  est[i] = mean(h(x))
}

plot(log_n, est, type="o")
abline(h = true_val$value, col="blue")


As sample size increases, the estimate converges to the true integral value.