3.3 Review of Statistical Concepts

The simulation models that have been illustrated have estimated the quantity of interest with a point estimate (i.e. \(\bar{X}\)). For example, in Example 2.1, we knew that the true area under the curve is \(4.6\bar{6}\); however, the point estimate returned by the simulation was a value of 4.3872, based on 20 samples. Thus, there is sampling error in our estimate. The key question examined in this section is how to control the sampling error in a simulation experiment. The approach that we will take is to determine the number of samples so that we can have high confidence in our point estimate. In order to address these issues, we need to review some basic statistical concepts in order to understand the meaning of sampling error.

3.3.1 Point Estimates and Confidence Intervals

Let \(x_{i}\) represent the \(i^{th}\) observations in a sample, \(x_{1}, x_{2},...x_{n}\) of size \(n\). Represent the random variables in the sample as \(X_{1}, X_{2},...X_{n}\). The random variables form a random sample, if 1) the \(X_{i}\) are independent random variables and 2) every \(X_{i}\) has the same probability distribution. Let’s assume that these two assumptions have been established. Denote the unknown cumulative distribution function of \(X\) as \(F(x)\) and define the unknown expected value and variance of \(X\) with \(E[X] = \mu\) and \(Var[X] = \sigma^2\), respectively.

A statistic is any function of the random variables in a sample. Any statistic that is used to estimate an unknown quantity based on the sample is called an estimator. What would be a good estimator for the quantity \(E[X] = \mu\)? Without going into the details of the meaning of statistical goodness, one should remember that the sample average is generally a good estimator for \(E[X] = \mu\). Define the sample average as follows:

\[\begin{equation} \bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \tag{3.1} \end{equation}\]

Notice that \(\bar{X}\) is a function of the random variables in the sample and therefore it is also a random variable. This makes \(\bar{X}\) a statistic, since it is a function of the random variables in the sample.

Any random variable has a corresponding probability distribution. The probability distribution associated with a statistic is called its sampling distribution. The sampling distribution of a statistic can be used to form a confidence interval on the point estimate associated with the statistic. The point estimate is simply the value obtained from the statistic once the data has been realized. The point estimate for the sample average is computed from the sample:

\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i\]

A confidence interval expresses a degree of certainty associated with a point estimate. A specific confidence interval does not imply that the parameter \(\mu\) is inside the interval. In fact, the true parameter is either in the interval or it is not within the interval. Instead you should think about the confidence level \(1-\alpha\) as an assurance about the procedure used to compute the interval. That is, a confidence interval procedure ensures that if a large number of confidence intervals are computed each based on \(n\) samples, then the proportion of the confidence intervals that actually contain the true value of \(\mu\) should be close to \(1-\alpha\). The value \(\alpha\) represents risk that the confidence interval procedure will produce a specific interval that does not contain the true parameter value. Any one particular confidence interval will either contain the true parameter of interest or it will not. Since you do not know the true value, you can use the confidence interval to assess the risk of making a bad decision based on the point estimate. You can be confident in your decision making or conversely know that you are taking a risk of \(\alpha\) of making a bad decision. Think of \(\alpha\) as the risk that using the confidence interval procedure will get you fired. If we know the sampling distribution of the statistic, then we can form a confidence interval procedure.

Under the assumption that the sample size is large enough such that the distribution of \(\bar{X}\) is normally distributed then you can form an approximate confidence interval on the point estimate, \(\bar{x}\). Assuming that the sample variance:

\[\begin{equation} S^{2}(n) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2 \tag{3.2} \end{equation}\]

is a good estimator for \(Var[X] = \sigma^2\), then a \((1-\alpha)100\%\) two-sided confidence interval estimate for \(\mu\) is:

\[\begin{equation} \bar{x} \pm t_{1-(\alpha/2), n-1} \dfrac{s}{\sqrt{n}} \tag{3.3} \end{equation}\]

where \(t_{p, \nu}\) is the \(100p\) percentage point of the Student t-distribution with \(\nu\) degrees of freedom. The spreadsheet function T.INV(p, degrees freedom) computes the left-tailed Student-t distribution value for a given probability and degrees of freedom. That is, \(t_{p, \nu} = T.INV(p,\nu)\). The spreadsheet tab labeled Student-t in the spreadsheet that accompanies this chapter illustrates functions related to the Student-t distribution. Thus, in order to compute \(t_{1-(\alpha/2), n-1}\), use the following formula:

\[t_{1-(\alpha/2), n-1} = T.INV(1-(\alpha/2),n-1)\] The T.INV.2T(\(\alpha\), \(\nu\)) spreadsheet function directly computes the upper two-tailed value. That is,

\[ t_{1-(\alpha/2), n-1} =T.INV.2T(\alpha, n-1) \] Within the statistical package R, this computation is straight-forward by using the \(t_{p, n} = qt(p,n)\) function.

\[ t_{p, n} = qt(p,n) \]

#'qt(0.975, 5)
qt(0.975,5)

## [1] 2.570582

#' set alpha
alpha = 0.05
#'qt(1-(alpha/2, 5)
qt(1-(alpha/2),5)

## [1] 2.570582

3.3.2 Sample Size Determination

The confidence interval for a point estimator can serve as the basis for determining how many observations to have in the sample. From Equation ((3.3)), the quantity:

\[\begin{equation} h = t_{1-(\alpha/2), n-1} \dfrac{s}{\sqrt{n}} \tag{3.4} \end{equation}\]

is called the half-width of the confidence interval. You can place a bound, \(\epsilon\), on the half-width by picking a sample size that satisfies:

\[\begin{equation} h = t_{1-(\alpha/2), n-1} \dfrac{s}{\sqrt{n}} \leq \epsilon \tag{3.5} \end{equation}\]

We call \(\epsilon\) the margin of error for the bound. Unfortunately, \(t_{1-(\alpha/2), n-1}\) depends on \(n\), and thus Equation ((3.5)) is an iterative equation. That is, you must try different values of \(n\) until the condition is satisfied. Within this text, we call this method for determining the sample size the Student-T iterative method.

Alternatively, the required sample size can be approximated using the normal distribution. Solving Equation ((3.5)) for \(n\) yields:

\[n \geq \left(\dfrac{t_{1-(\alpha/2), n-1} \; s}{\epsilon}\right)^2\]

As \(n\) gets large, \(t_{1-(\alpha/2), n-1}\) converges to the \(100(1-(\alpha/2))\) percentage point of the standard normal distribution \(z_{1-(\alpha/2)}\). This yields the following approximation:

\[\begin{equation} n \geq \left(\dfrac{z_{1-(\alpha/2)} s}{\epsilon}\right)^2 \tag{3.6} \end{equation}\]

Within this text, we refer to this method for determining the sample size as the normal approximation method. This approximation generally works well for large \(n\), say \(n > 50\). Both of these methods require an initial value for the standard deviation.

In order to use either the normal approximation method or the Student-T iterative method, you must have an initial value for, \(s\), the sample standard deviation. The simplest way to get an initial estimate of \(s\) is to make a small initial pilot sample (e.g. \(n_0=5\)). Given a value for \(s\) you can then set a desired bound and use the formulas. The bound is problem and performance measure dependent and is under your subjective control. You must determine what bound is reasonable for your given situation. One thing to remember is that the bound is squared in the denominator for evaluating \(n\). Thus, very small values of \(\epsilon\) can result in very large sample sizes.

Example 3.2 Suppose we are required to estimate the output from a simulation so that we are 99% confidence that we are within \(\pm 0.1\) of the true population mean. After taking a pilot sample of size \(n=10\), we have estimated \(s=6\). What is the required sample size?

We will use Equation ((3.6)) to determine the sample size requirement. For a 99% confidence interval, we have \(\alpha = 0.01\) and \(\alpha/2 = 0.005\). Thus, \(z_{1-0.005} = z_{0.995} = 2.576\). Because the margin of error, \(\epsilon\) is \(0.1\), we have that,

\[n \geq \left(\dfrac{z_{1-(\alpha/2)}\; s}{\epsilon}\right)^2 = \left(\dfrac{2.576 \times 6}{0.1}\right)^2 = 23888.8 \approx 23889\]

If the quantity of interest is a proportion, then a different method can be used. In particular, a \(100\times(1 - \alpha)\%\) large sample two sided confidence interval for a proportion, \(p\), has the following form:

\[\begin{equation} \hat{p} \pm z_{1-(\alpha/2)} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} \tag{3.7} \end{equation}\]

where \(\hat{p}\) is the estimate for \(p\). From this, you can determine the sample size via the following equation:

\[\begin{equation} n = \left(\dfrac{z_{1-(\alpha/2)}}{\epsilon}\right)^{2} \hat{p}(1 - \hat{p}) \tag{3.8} \end{equation}\]

Again, a pilot run is necessary for obtaining an initial estimate of \(\hat{p}\), for use in determining the sample size. If no pilot run is available, then \(\hat{p} =0.5\) is often assumed as a worse case approximation.

If you have more than one performance measure of interest, you can use these sample size techniques for each of your performance measures and then use the maximum sample size required across the performance measures. Now, let’s illustrate these methods based on a small Arena simulation.

3.3.3 Determining the Sample Size for an Arena Simulation

To facilitate some of the calculations related to determining the sample size for a simulation experiment, I have constructed a spreadsheet called SampleSizeDetermination.xlsx, which is found in the book support files for this chapter. You may want to utilize that spreadsheet as you go through this an subsequent sections.

Using a simple example, we will illustrate how to determine the sample size necessary to estimate a quantity of interest with a high level of confidence.

Example 3.3 Suppose that we want to simulate a normally distributed random variable, \(X\), with \(E[X] = 10\) and \(Var[X] = 16\). From the simulation, we want to estimate the true population mean with 99 percent confidence for a half-width \(\pm 0.50\) margin of error. In addition to estimating the population mean, we want to estimate the probability that the random variable exceeds 8. That is, we want to estimate, \(p= P[X>8]\). We want to be 95% confident that our estimate of \(P[X>8]\) has a margin of error of \(\pm 0.05\). What are the sample size requirements needed to meet the desired margins of error?

Using simulation for this example is for illustrative purposes. Naturally, we actually know the true population mean and we can easily compute the \(p= P[X>8]\). for this situation. However, the example will illustrate sample size determination, which is an important planning requirement for simulation experiments, and the example reinforces how to use the RECORD module.

Let \(X\) represent the unknown random variable of interest. Then, we are interested in estimating \(E[X]=\mu\) and \(p = P[X > 8]\). To estimate these quantities, we generate a random sample, \((X_1,X_2,...,X_n)\). \(E[X]\) can be estimated using \(\bar{X}\). To estimate a probability it is useful to define an indicator variable. An indicator variable has the value 1 if the condition associated with the probability is true and has the value 0 if it is false. To estimate \(p\), define the following indicator variable:

\[ Y_i = \begin{cases} 1 & X_i > 8\\ 0 & X_i \leq 8 \\ \end{cases} \] This definition of the indicator variable \(Y_i\) allows \(\bar{Y}\) to be used to estimate \(p\). Recall the definition of the sample average \(\bar{Y}\). Since \(Y_{i}\) is a \(1\) only if \(X_i > 8\), then the \(\sum \nolimits_{i=1}^{n} Y_{i}\) simply adds up the number of \(1\)’s in the sample. Thus, \(\sum\nolimits_{i=1}^{n} Y_{i}\) represents the count of the number of times the event \(X_i > 8\) occurred in the sample. Call this \(\#\lbrace X_i > 8\rbrace\). The ratio of the number of times an event occurred to the total number of possible occurrences represents the proportion. Thus, an estimator for \(p\) is:

\[ \hat{p} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n}Y_i = \frac{\#\lbrace X_i > 8\rbrace}{n} \] Therefore, computing the average of an indicator variable will estimate the desired probability. The pseudo-code for this situation is quite simple:

CREATE 1 entity
ASSIGN myX = NORM(10, 4)
Begin RECORD
    myX as EstimatedX
    (myX > 8) as EstimatedProb
End RECORD
DISPOSE

In the pseudo-code, the RECORD module is recording observations of the quantity \((myX > 8)\). This quantity is actually a logical condition which will evaluate to true (1) or false (0) given the value of \(myX\). Thus, the RECORD module will be recording observations of 1’s and 0’s. Because the RECORD module computes that average of the values that it “records,” we will get an estimate of the probability. Thus, the quantity, \((myX > 8)\) is an indicator variable for the desired event. Using a RECORD module to estimate a probability in this manner is quite effective.

Now, in more realistic simulation situations, we would not know the true population mean and variance. Thus, in solving this problem, we will ignore that information. To apply the previously discussed sample size determination methods, we need estimates of \(\sigma = \sqrt{Var[X]}\) and \(p = P[X>8]\). In order to get estimates of these quantities, we need to run our simulation experiment; however, in order to run our simulation experiment, we need to know how many observations to make. This is quite a catch-22 situation!

To proceed, we need to run our simulation model for a pilot experiment. From a small number of samples (called a pilot experiment, pilot run or pilot study), we will estimate \(\sigma\) and \(p = P[X>8]\) and then use those estimates to determine how many samples we need to meet the desired margins of error. We will then re-run the simulation using the recommended sample size.

The natural question to ask is how big should your pilot sample be? In general, this depends on how much it costs for you to collect the sample. Within a simulation experiment context, generally, cost translates into how long your simulation model takes to execute. For this small problem, the execution time is trivial, but for large and complex simulation models the execution time may be in hours or even days. A general rule of thumb is between 10 and 30 observations. For this example, we will use an initial pilot sample size, \(n_0 = 20\).

The Arena model can be easily built by following the pseudo-code. You can find the Arena model as part of the files associated with the chapter. Running the model for the 20 replications yields the following results.

Table 3.1: Arena results for \(n_0 = 20\) replications
Performance Measures	Average	95% Half-Width
EstimatedProb	0.70	0.22
EstimatedX	10.3224	2.24

Notice that the confidence interval for the true mean is \((8.0824,12.5624)\) with a half-width of \(2.24\), which exceeds the desired margin of error, \(\epsilon = 0.5\). Notice also that this particular confidence interval happens to contain the true mean \(E[X] = 10\). To determine the sample size so that we get a half-width that is less than \(\epsilon = 0.1\) with 99% confidence, we can use Equation (3.6). In order to do this, we must first have an estimate of the sample standard deviation, \(s\). Since Arena reports a half-width for a 95% confidence interval, we need to use Equation (3.4) and solve for \(s\) in terms of \(h\). Rearranging Equation (3.4) yields,

\[\begin{equation} s = \dfrac{h\sqrt{n}}{t_{1-(\alpha/2), n-1}} \tag{3.9} \end{equation}\]

Using the results from Table 3.1 and \(\alpha = 0.05\) in Equation (3.9) yields,

\[ s_0 = \dfrac{h\sqrt{n_0}}{t_{1-(\alpha/2), n_0-1}} = \dfrac{2.24\sqrt{20}}{t_{0.975, 19}}= \dfrac{2.24\sqrt{20}}{2.093}=4.7862 \]

Now, we can use Equation (3.6) to determine the sample size requirement. For a 99% confidence interval, we have \(\alpha = 0.01\) and \(\alpha/2 = 0.005\). Thus, \(z_{0.995} = 2.576\). Because the margin of error, \(\epsilon\) is \(0.5\), we have that,

\[n \geq \left(\dfrac{z_{1-(\alpha/2)}\; s}{\epsilon}\right)^2 = \left(\dfrac{2.576 \times 4.7862}{0.5}\right)^2 = 608.042 \approx 609\] To determine the sample size for estimating \(p=P[X>8]\) with 95% confidence to \(\pm 0.1\), we can use Equation (3.8)

\[ n = \left(\dfrac{z_{1-(\alpha/2)}}{\epsilon}\right)^{2} \hat{p}(1 - \hat{p})=\left(\dfrac{z_{0.975}}{0.1}\right)^{2} (0.22)(1 - 0.22)=\left(\dfrac{1.96}{0.1}\right)^{2} (0.22)(0.78) =65.92 \approx 66 \] By using the maximum of \(\max{(66, 609)=609}\), we can re-run the simulation this number of replications. Doing so, yields,

Table 3.2: Arena Results for \(n=609\) Replications
Performance Measures	Average	95% Half-Width
EstimatedProb	0.69	0.04
EstimatedX	9.9032	0.32

As can be seen in Table 3.2, the half-width values meet the desire margins of error. It may be possible that the margins of error might not be met. This suggests that more than \(n = 609\) observations is needed to meet the margin of error criteria. Equation (3.6) and Equation (3.8) are only approximations and based on a pilot sample. Thus, if there was considerable sampling error associated with the pilot sample, the approximations may be inadequate.

As can be noted from this example in order to apply the normal approximation method for determining the sample size based on the pilot run of an Arena model, we need to compute the initial sample standard deviation, \(s_0\), from the initial reported half-width, \(h\), that appears on the Arena report. This requires the use of Equation (3.9) to first compute the value of \(s\) from \(h\). We can avoid this calculation by using the half-width ratio method for determining the sample size.

Let \(h_0\) be the initial value for the half-width from the pilot run of \(n_0\) replications. Then, rewriting Equation (3.4) in terms of the pilot data yields:

\[ h_0 = t_{1-(\alpha/2), n_0 - 1} \dfrac{s_0}{\sqrt{n_0}} \] Solving for \(n_0\) yields:

\[\begin{equation} n_0 = t_{1-(\alpha/2), n_0 -1}^{2} \dfrac{s_{0}^{2}}{h_{0}^{2}} \tag{3.10} \end{equation}\]

Similarly for any \(n\), we have:

\[\begin{equation} n = t_{1-(\alpha/2), n-1}^{2} \dfrac{s^{2}}{h^{2}} \tag{3.11} \end{equation}\]

Taking the ratio of \(n_0\) to \(n\) (Equations (3.10) and (3.11)) and assuming that \(t_{1-(\alpha/2), n-1}\) is approximately equal to \(t_{1-(\alpha/2), n_0 - 1}\) and \(s^2\) is approximately equal to \(s_0^2\), yields,

\[n \cong n_0 \dfrac{h_0^2}{h^2} = n_0 \left(\frac{h_0}{h}\right)^2\]

This is the half-width ratio equation.

\[\begin{equation} n \geq n_0 \left(\frac{h_0}{h}\right)^2 \tag{3.12} \end{equation}\]

The issue that we face when applying Equation (3.12) is that Arena only reports the 95% confidence interval half-width on its reports. Thus, to apply Equation (3.12) the value of \(h_0\) will be based on an \(\alpha = 0.05\). If the desired half-width, \(h\), is required for a different confidence level, e.g. a 99% confidence interval, then you must first translate \(h_0\) to be based on the desired confidence level. This requires computing \(s\) from Equation (3.9) and then recomputing the actual half-width, \(h\), for the desired confidence level.

Using the results from Table 3.1 and \(\alpha = 0.05\) in Equation (3.9) we already know that \(s_0 = 4.7862\) for a 95% confidence interval. Thus, for a 99% confidence interval, where \(\alpha = 0.01\) and \(\alpha/2 = 0.005\), we have:

\[ h_0 = t_{1-(\alpha/2), n_0-1} \dfrac{s_0}{\sqrt{n_0}}= t_{0.995, 19}\dfrac{4.7862}{\sqrt{20}}=2.861\dfrac{4.7862}{\sqrt{20}}=3.0618 \] Now, we can apply Equation (3.12) to this problem with the desire half-width bound \(\epsilon = 0.5 = h\):

\[ n \geq n_0 \left(\frac{h_0}{h}\right)^2=20\left(\frac{3.0618}{h}\right)^2=20\left(\frac{3.0618}{0.5}\right)^2=749.98 \cong 750 \] We see that for this pilot sample, the half-width ratio method recommends a substantially large sample size, \(750\) versus \(609\). In practice, the half-width ratio method tends to be conservative by recommending a larger sample size. We will see another example of these methods within the next section.

Based on these examples, you should have a basic understanding of how simulation can be performed to meet a desired margin of error. In general, simulation models are much more interesting than this simple example. To deepen your knowledge, we will model a more interesting situation in the next section.