5.1 Simple Monte Carlo Integration

In this example, we illustrate one of the fundamental uses of Monte Carlo methods: estimating the area of a function. Suppose we have some function, \(g(x)\), defined over the range \(a \leq x \leq b\) and we want to evaluate the integral:

\[ \theta = \int\limits_{a}^{b} g(x) \mathrm{d}x\]

Monte Carlo methods allow us to evaluate this integral by couching the problem as an estimation problem. It turns out that the problem can be translated into estimating the expected value of a well-chosen random variable. While a number of different choices for the random variable exist, we will pick one of the simplest for illustrative purposes. Define \(Y\) as follows with \(X \sim U(a,b)\):

\[\begin{equation} Y = \left(b-a\right)g(X) \tag{5.1} \end{equation}\]

Notice that \(Y\) is defined in terms of \(g(X)\), which is also a random variable. Because a function of a random variable is also a random variable, this makes \(Y\) a random variable . Thus, the expectation of \(Y\) can be computed as follows:

\[\begin{equation} E\lbrack Y \rbrack = \left(b-a\right)E\lbrack g(X)\rbrack \tag{5.2} \end{equation}\]

Now, let us derive,\(E\lbrack g(X) \rbrack\). By definition,

\[ E_{X}\lbrack g(x) \rbrack = \int\limits_{a}^{b} g(x)f_{X}(x)\mathrm{d}x\]

And, the probability density function for a \(X \sim U(a,b)\) random variable is:

\[f_{X}(x) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b\\ 0 & \text{otherwise} \end{cases}\]
Therefore,

\[\begin{equation} E_{X}\lbrack g(x) \rbrack = \int\limits_{a}^{b} g(x)f_{X}(x)\mathrm{d}x = \int\limits_{a}^{b} g(x)\frac{1}{b-a}\mathrm{d}x \end{equation}\]

Substituting into Equation (5.2), yields,

\[\begin{aligned} E\lbrack Y \rbrack & = E\lbrack \left(b-a\right)g(X) \rbrack = \left(b-a\right)E\lbrack g(X) \rbrack\\ & = \left(b-a\right)\int\limits_{a}^{b} g(x)\frac{1}{b-a}\mathrm{d}x \\ & = \int\limits_{a}^{b} g(x)\mathrm{d}x = \theta\end{aligned}\]

Therefore, by estimating the expected value of \(Y\), we can estimate the desired integral. From basic statistics, we know that a good estimator for \(E\lbrack Y \rbrack\) is the sample average of observations of \(Y\). Let \(Y_{1}, Y_{2},...Y_{n}\) be a random sample of observations of \(Y\). Let \(X_{i}\) be the \(i^{th}\) observation of \(X\). Substituting each \(X_{i}\) into Equation (5.1) yields the \(i^{th}\) observation of \(Y\),

\[Y_{i} = \left(b-a\right)g(X_{i})\]

Then, the sample average of is:

\[\begin{aligned} \bar{Y}(n) & = \frac{1}{n}\sum\limits_{i=1}^{n} Y_{i} = \left(b-a\right)\frac{1}{n}\sum\limits_{i=1}^{n}\left(b-a\right)g(X_{i})\\ & = \left(b-a\right)\frac{1}{n}\sum\limits_{i=1}^{n}g(X_{i})\\\end{aligned}\]

where \(X_{i} \sim U(a,b)\). Thus, by simply generating \(X_{i} \sim U(a,b)\), plugging the \(X_{i}\) into the function of interest, \(g(x)\), taking the average over the values and multiplying by \(\left(b-a\right)\), we can estimate the integral. This works for any integral and it works for multi-dimensional integrals. While this discussion is based on a single valued function, the theory scales to multi-dimensional integration through the use of multi-variate distributions.

Suppose that we want to estimate the area under \(f(x) = x^{\frac{1}{2}}\) over the range from \(1\) to \(4\). That is, we want to evaluate the integral:

\[\theta = \int\limits_{1}^{4} x^{\frac{1}{2}}\mathrm{d}x = \dfrac{14}{3}=4.6\bar{6}\]

According to the previously presented theory, we need to generate \(X_i \sim U(1,4)\) and then compute \(\bar{Y}\), where \(Y_i = (4-1)\sqrt{X{_i}}= 3\sqrt{X{_i}}\). In addition, for this simple example, we can easily check if our Monte Carlo approach is working because we know the true area.

double a = 1.0;
double b = 4.0;
UniformRV ucdf = new UniformRV(a, b);
Statistic stat = new Statistic("Area Estimator");
int n = 100; // sample size
for(int i=1;i<=n;i++){
    double x = ucdf.getValue();
    double gx = Math.sqrt(x);
    double y = (b-a)*gx;
    stat.collect(y);
}
System.out.printf("True Area = %10.3f\n", 14.0/3.0);
System.out.printf("Area estimate = %10.3f\n", stat.getAverage());
System.out.println("Confidence Interval");
System.out.println(stat.getConfidenceInterval());
True Area =      4.667
Area estimate =      4.781
Confidence Interval
[4.608646560421988, 4.952515649272401]

Because confidence intervals may form the basis for decision making, you can use the confidence interval half-width in determining the sample size. A review of these and other statistical concepts will be the focus of the next section.