3.2 Simple Monte Carlo Integration

In this example, we illustrate one of the fundamental uses of Monte Carlo methods: estimating the area of a function. Suppose we have some function, \(g(x)\), defined over the range \(a \leq x \leq b\) and we want to evaluate the integral:

\[ \theta = \int\limits_{a}^{b} g(x) \mathrm{d}x\]

Monte Carlo methods allow us to evaluate this integral by considering the problem as an estimation problem. It turns out that the problem can be translated into estimating the expected value of a well-chosen random variable. While a number of different choices for the random variable exist, we will pick one of the simplest for illustrative purposes. Define \(Y\) as follows with \(X \sim U(a,b)\):

\[\begin{equation} Y = \left(b-a\right)g(X) \tag{3.1} \end{equation}\]

Notice that \(Y\) is defined in terms of \(g(X)\), which is also a random variable. Because a function of a random variable is also a random variable, this makes \(Y\) a random variable . Thus, the expectation of \(Y\) can be computed as follows:

\[\begin{equation} E\lbrack Y \rbrack = \left(b-a\right)E\lbrack g(X)\rbrack \tag{3.2} \end{equation}\]

Now, let us derive,\(E\lbrack g(X) \rbrack\). By definition,

\[ E_{X}\lbrack g(x) \rbrack = \int\limits_{a}^{b} g(x)f_{X}(x)\mathrm{d}x\]

And, the probability density function for a \(X \sim U(a,b)\) random variable is:

\[f_{X}(x) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b\\ 0 & \text{otherwise} \end{cases}\]
Therefore,

\[\begin{equation} E_{X}\lbrack g(x) \rbrack = \int\limits_{a}^{b} g(x)f_{X}(x)\mathrm{d}x = \int\limits_{a}^{b} g(x)\frac{1}{b-a}\mathrm{d}x \end{equation}\]

Substituting into Equation (3.2), yields,

\[\begin{aligned} E\lbrack Y \rbrack & = E\lbrack \left(b-a\right)g(X) \rbrack = \left(b-a\right)E\lbrack g(X) \rbrack\\ & = \left(b-a\right)\int\limits_{a}^{b} g(x)\frac{1}{b-a}\mathrm{d}x \\ & = \int\limits_{a}^{b} g(x)\mathrm{d}x = \theta\end{aligned}\]

Therefore, by estimating the expected value of \(Y\), we can estimate the desired integral. From basic statistics, we know that a good estimator for \(E\lbrack Y \rbrack\) is the sample average of observations of \(Y\). Let \(Y_{1}, Y_{2},...Y_{n}\) be a random sample of observations of \(Y\). Let \(X_{i}\) be the \(i^{th}\) observation of \(X\). Substituting each \(X_{i}\) into Equation (3.1) yields the \(i^{th}\) observation of \(Y\),

\[Y_{i} = \left(b-a\right)g(X_{i})\]

Then, the sample average of is:

\[\begin{aligned} \bar{Y}(n) & = \frac{1}{n}\sum\limits_{i=1}^{n} Y_{i} = \left(b-a\right)\frac{1}{n}\sum\limits_{i=1}^{n}\left(b-a\right)g(X_{i})\\ & = \left(b-a\right)\frac{1}{n}\sum\limits_{i=1}^{n}g(X_{i})\\\end{aligned}\]

where \(X_{i} \sim U(a,b)\). Thus, by simply generating \(X_{i} \sim U(a,b)\), plugging the \(X_{i}\) into the function of interest, \(g(x)\), taking the average over the values and multiplying by \(\left(b-a\right)\), we can estimate the integral. This works for any integral and it works for multi-dimensional integrals. While this discussion is based on a single valued function, the theory scales to multi-dimensional integration through the use of multi-variate distributions.

Example 3.6 (Area Estimation) Suppose that we want to estimate the area under \(f(x) = x^{\frac{1}{2}}\) over the range from \(1\) to \(4\). That is, we want to evaluate the integral:

\[\theta = \int\limits_{1}^{4} x^{\frac{1}{2}}\mathrm{d}x = \dfrac{14}{3}=4.6\bar{6}\]

According to the previously presented theory, we need to generate \(X_i \sim U(1,4)\) and then compute \(\bar{Y}\), where \(Y_i = (4-1)\sqrt{X{_i}}= 3\sqrt{X{_i}}\). In addition, for this simple example, we can easily check if our Monte Carlo approach is working because we know the true area.

fun main() {
    val a = 1.0
    val b = 4.0
    val ucdf = UniformRV(a, b, streamNum = 3)
    val stat = Statistic("Area Estimator")
    val n = 100 // sample size
    for (i in 1..n) {
        val x = ucdf.value
        val gx = sqrt(x)
        val y = (b - a) * gx
        stat.collect(y)
    }
    System.out.printf("True Area = %10.3f %n", 14.0 / 3.0)
    System.out.printf("Area estimate = %10.3f %n", stat.average)
    println("Confidence Interval")
    println(stat.confidenceInterval)
}

True Area =      4.667
Area estimate =      4.781
Confidence Interval
[4.608646560421988, 4.952515649272401]

NOTE! Notice that the previous examples created the random variable outside of the loop that is used to generate values. It may not be an error to create random variables within loops, but it is generally bad practice and inefficient in terms of object creation. If you see yourself creating many instances of an object and then using it only one time before it is discarded, you should reconsider what you are doing.

Because confidence intervals may form the basis for decision making, you can use the confidence interval half-width in determining the sample size. A review of these and other statistical concepts will be the focus of the next section.