B.4 Modeling with Continuous Distributions

Continuous distributions can be used to model situations where the possible values form an interval or a set of intervals. Within discrete event simulation, the most common use of continuous distributions is for modeling the time to perform a task. Appendix F.2 summarizes the properties of common continuous distributions.

The continuous uniform distribution can be used to model situations in which you have a lack of data and it is reasonable to assume that everything is equally likely within an interval. Alternatively, if you have no a priori knowledge that some events are more likely than others, then a uniform distribution seems like a reasonable starting point. The uniform distribution is also commonly used to model machine processing times that have very precise time intervals for completion. The expected value and variance of a random variable with a continuous uniform distribution over the interval (a, b) are:

\[\begin{aligned} E[X] & = \frac{a+b}{2} \\ Var[X] & = \frac{(b-a)^2}{12}\end{aligned}\]

Often the continuous uniform distribution is specified by indicating the \(\pm\) around the expected value. For example, we can say that a continuous uniform over the range (5, 10) is the same as a uniform with 7.5 \(\pm\) 2.5. The uniform distribution is symmetric over its defined interval.
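
As a rough illustration of the formulas above, the following Python sketch converts a "center \(\pm\) half-width" specification into the interval (a, b) and compares the theoretical mean and variance with estimates from generated values. The function names, the seed, and the sample size are illustrative assumptions, not part of any particular simulation package.

```python
import random
import statistics


def uniform_from_center(center, half_width):
    """Convert a 'center +/- half_width' specification into the interval (a, b)."""
    return center - half_width, center + half_width


def uniform_mean(a, b):
    """E[X] = (a + b) / 2 for a continuous uniform over (a, b)."""
    return (a + b) / 2.0


def uniform_variance(a, b):
    """Var[X] = (b - a)^2 / 12 for a continuous uniform over (a, b)."""
    return (b - a) ** 2 / 12.0


# The 7.5 +/- 2.5 example from the text, which is the interval (5, 10).
a, b = uniform_from_center(7.5, 2.5)

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable
sample = [rng.uniform(a, b) for _ in range(100_000)]

print("interval:", (a, b))
print("theoretical mean, variance:", uniform_mean(a, b), uniform_variance(a, b))
print("sample mean, variance:", statistics.fmean(sample), statistics.variance(sample))
```

The sample statistics should land close to the theoretical values, with the gap shrinking as the sample size grows.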

The triangular distribution is also useful in situations with a lack of data if you can characterize a most likely value for the random variable in addition to its range (minimum and maximum). This makes the triangular distribution very useful when the only data that you might have on task times comes from interviewing people. It is relatively easy for someone to specify the most likely task time, a minimum task time, and a maximum task time. You can create a survey instrument that asks multiple people familiar with the task to provide these three estimates. From the survey, you can average the responses to develop an approximate model. This is only one possibility for how to combine the survey values.
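
The averaging approach described above might be sketched in Python as follows. The survey responses are made-up numbers used only for illustration, and simple averaging is just one of several possible ways to combine them.

```python
import random
import statistics

# Hypothetical survey responses: (minimum, most likely, maximum) task time in minutes.
responses = [
    (4.0, 6.0, 10.0),
    (5.0, 7.0, 12.0),
    (3.0, 5.0, 11.0),
]


def average_estimates(survey):
    """Average each of the three estimates across all respondents."""
    mins, modes, maxs = zip(*survey)
    return statistics.fmean(mins), statistics.fmean(modes), statistics.fmean(maxs)


low, mode, high = average_estimates(responses)

rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable
# Note the argument order of random.triangular: (low, high, mode).
task_times = [rng.triangular(low, high, mode) for _ in range(5)]

print("triangular parameters (min, mode, max):", low, mode, high)
print("generated task times:", task_times)
```

Other combination rules are possible, for example taking the smallest reported minimum and the largest reported maximum to obtain a more conservative range.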

If the most likely value is equal to the midpoint of the range, then the triangular distribution is symmetric. In other words, fifty percent of the probability lies above the most likely value and fifty percent lies below it. If the most likely value is closer to the minimum value, then the triangular distribution is right-skewed (a longer tail to the right). If the most likely value is closer to the maximum value, then the triangular distribution is left-skewed. The ability to control the skewness of the distribution in this manner also makes this distribution attractive.
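
The relationship between the most likely value and the shape of the distribution can be checked with a small Python sketch. It compares the most likely value with the midpoint of the range and also computes the mean, which for a triangular distribution is the average of the minimum, most likely, and maximum values. The function names are hypothetical.

```python
def triangular_mean(minimum, mode, maximum):
    """Mean of a triangular distribution: (min + mode + max) / 3."""
    return (minimum + mode + maximum) / 3.0


def triangular_shape(minimum, mode, maximum):
    """Classify the shape by comparing the mode with the midpoint of the range."""
    midpoint = (minimum + maximum) / 2.0
    if mode == midpoint:
        return "symmetric"
    # A mode near the minimum leaves a long tail toward the maximum.
    return "right-skewed" if mode < midpoint else "left-skewed"


for params in [(0.0, 5.0, 10.0), (0.0, 2.0, 10.0), (0.0, 8.0, 10.0)]:
    print(params, triangular_shape(*params), "mean =", triangular_mean(*params))
```

In the right-skewed case the mean exceeds the most likely value, and in the left-skewed case it falls below it.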

The beta distribution can also be used to model situations where there is a lack of data. It is a bounded continuous distribution over the range (0, 1), but it can take on a wide variety of shapes and skewness characteristics. The beta distribution has been used to model task times in activity networks and to model uncertainty concerning the probability parameter of a discrete distribution, such as the binomial. The beta distribution is commonly shifted and scaled to cover a range of values (a, b).
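
A minimal Python sketch of shifting a beta random variable to the range (a, b) is shown below. If \(Y\) has a beta distribution on (0, 1), then \(X = a + (b-a)Y\) lies in (a, b). The shape parameters, the range, and the function names are assumptions chosen only for illustration.

```python
import random
import statistics


def shifted_beta(rng, alpha, beta, a, b):
    """Generate a beta(alpha, beta) value on (0, 1), then rescale it to (a, b)."""
    return a + (b - a) * rng.betavariate(alpha, beta)


def shifted_beta_mean(alpha, beta, a, b):
    """Mean of the rescaled variable: a + (b - a) * alpha / (alpha + beta)."""
    return a + (b - a) * alpha / (alpha + beta)


rng = random.Random(3)  # fixed seed (arbitrary) so the run is repeatable
alpha, beta, a, b = 2.0, 5.0, 10.0, 20.0  # illustrative values
beta_sample = [shifted_beta(rng, alpha, beta, a, b) for _ in range(50_000)]

print("theoretical mean:", shifted_beta_mean(alpha, beta, a, b))
print("sample mean:", statistics.fmean(beta_sample))
print("sample min, max:", min(beta_sample), max(beta_sample))
```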

The exponential distribution is commonly used to model the time between events. Often, when only a mean value is available (from the data or from a guess), the exponential distribution can be used. A random variable, \(X\), that has an exponential distribution with rate parameter \(\lambda\) has:

\[\begin{aligned} E[X] & = \frac{1}{\lambda} \\ Var[X] & = \frac{1}{\lambda^2}\end{aligned}\]

Notice that the variance is the square of the expected value, so the standard deviation equals the mean. This is considered to be highly variable. The coefficient of variation for the exponential distribution is \(c_v = 1\). Thus, if the coefficient of variation estimated from the data has a value near 1.0, then an exponential distribution may be a possible choice for modeling the situation.
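
The coefficient of variation check can be sketched in Python as follows. Here the "data" are generated from an exponential distribution with an assumed rate, so the estimate should come out near 1.0. With real data, the same function would be applied to the observed times.

```python
import random
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(data) / statistics.fmean(data)


rng = random.Random(7)  # fixed seed (arbitrary) so the run is repeatable
rate = 0.5              # assumed rate, so the mean is 1 / rate = 2.0
times = [rng.expovariate(rate) for _ in range(50_000)]

print("sample mean:", statistics.fmean(times))
print("estimated coefficient of variation:", coefficient_of_variation(times))
```

A value near 1.0 is consistent with an exponential model but does not prove it. Other distributions can also have a coefficient of variation of 1, so the check is best treated as a screening step before formal fitting.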

An important property of the exponential distribution is the lack of memory property. The lack of memory property states that, given that \(\Delta t\) is the time that has elapsed since the occurrence of the last event, the time \(t\) remaining until the occurrence of the next event is independent of \(\Delta t\). This implies that \(P \lbrace X > \Delta t + t|X > \Delta t \rbrace = P \lbrace X > t \rbrace\). This property indicates that the probability that the next event occurs within some future interval depends only on the length of that interval, and not on how much time has already elapsed since the last event. In a sense, the process’s clock resets at any point in time, and the past does not matter when predicting the future. Thus, it “forgets” the past.

This property has some very important implications, especially when modeling the time to failure. In most situations, the history of the process does matter (such as wear and tear on the machine), in which case the exponential distribution may not be appropriate. Other distributions that generalize the exponential, such as the gamma and Weibull distributions, may be more useful in these situations. Why, then, is the exponential distribution so often used? Two reasons: 1) it is often a good model for many situations found in nature and 2) it has very convenient mathematical properties.
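
The lack of memory property can be verified numerically from the exponential survival function \(P \lbrace X > t \rbrace = e^{-\lambda t}\). The Python sketch below computes the conditional probability of surviving an additional \(t\) time units given that \(\Delta t\) time units have already elapsed, and compares it with the unconditional probability. The rate and the time values are illustrative assumptions.

```python
import math


def exp_survival(t, rate):
    """P{X > t} for an exponential random variable with the given rate."""
    return math.exp(-rate * t)


def conditional_survival(elapsed, t, rate):
    """P{X > elapsed + t | X > elapsed}, computed from the definition."""
    return exp_survival(elapsed + t, rate) / exp_survival(elapsed, rate)


rate = 0.5  # assumed rate parameter
t = 2.0     # additional time of interest

print("unconditional P{X > t}:", exp_survival(t, rate))
for elapsed in (0.0, 1.0, 5.0, 10.0):
    # The conditional probability is the same for every elapsed time.
    print("elapsed =", elapsed, "->", conditional_survival(elapsed, t, rate))
```

The printed conditional probabilities agree (up to floating point rounding) regardless of the elapsed time, which is exactly what the property asserts.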

While the normal distribution is a mainstay of probability and statistics, you need to be careful when using it as a distribution for input models because it is defined over the entire range of real numbers. For example, within simulation the time to perform a task is often required; however, time must be a positive real number. Clearly, since a normal distribution can have negative values, using a normal distribution to model task times can be problematic. If you attempt to delay for a negative amount of time, you will receive an error. Instead of using a normal distribution, you might use a truncated normal distribution; see Section A.2.4. Alternatively, you can choose from any of the distributions that are defined on the range of positive real numbers, such as the lognormal, gamma, Weibull, and exponential distributions. The lognormal distribution is a convenient choice because, like the normal distribution, it can be specified by two parameters: its mean and variance.
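
When a lognormal distribution is specified by its own mean and variance, many libraries still expect the parameters \(\mu\) and \(\sigma\) of the underlying normal distribution. The Python sketch below performs the standard conversion, \(\sigma^2 = \ln(1 + Var[X]/E[X]^2)\) and \(\mu = \ln(E[X]) - \sigma^2/2\), and then generates values with the standard library function `random.lognormvariate(mu, sigma)`. The target mean and variance and the helper function name are illustrative assumptions.

```python
import math
import random
import statistics


def lognormal_params(mean, variance):
    """Convert the lognormal's mean and variance to (mu, sigma) of the underlying normal."""
    sigma_sq = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


target_mean, target_var = 10.0, 4.0  # illustrative task-time mean and variance
mu, sigma = lognormal_params(target_mean, target_var)

rng = random.Random(11)  # fixed seed (arbitrary) so the run is repeatable
ln_sample = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

print("mu, sigma of underlying normal:", mu, sigma)
print("sample mean, variance:", statistics.fmean(ln_sample), statistics.variance(ln_sample))
print("smallest generated value:", min(ln_sample))
```

Every generated value is positive, so the values can be used directly as task times without the truncation that a normal distribution would require.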

Table B.5 lists common modeling situations for various continuous distributions.

Table B.5: Common Modeling Situations for Continuous Distributions
| Distribution | Modeling Situations |
|---|---|
| Uniform | when you have no data, everything is equally likely to occur within an interval, machine task times |
| Normal | modeling errors, modeling measurements, length, etc., modeling the sum of a large number of other random variables |
| Exponential | time to perform a task, time between failures, distance between defects |
| Erlang | service times, multiple phases of service with each phase exponential |
| Weibull | time to failure, time to complete a task |
| Gamma | repair times, time to complete a task, replenishment lead time |
| Lognormal | time to perform a task, quantities that are the product of a large number of other quantities |
| Triangular | rough model in the absence of data; assume a minimum, a maximum, and a most likely value |
| Beta | useful for modeling task times on bounded range with little data, modeling probability as a random variable |
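
The Erlang entry in the table, multiple phases of service with each phase exponential, can be illustrated by summing independent exponential phase times. The Python sketch below assumes \(k\) phases that share a common rate, which gives a mean of \(k/\lambda\) and a coefficient of variation of \(1/\sqrt{k}\). The number of phases, the rate, and the function name are illustrative assumptions.

```python
import math
import random
import statistics


def erlang_service_time(rng, k, phase_rate):
    """Total service time as the sum of k independent exponential phases."""
    return sum(rng.expovariate(phase_rate) for _ in range(k))


rng = random.Random(5)  # fixed seed (arbitrary) so the run is repeatable
k, phase_rate = 3, 1.5  # illustrative: 3 phases, each with mean 1 / 1.5
service_times = [erlang_service_time(rng, k, phase_rate) for _ in range(50_000)]

sample_mean = statistics.fmean(service_times)
sample_cv = statistics.stdev(service_times) / sample_mean

print("theoretical mean:", k / phase_rate, "sample mean:", sample_mean)
print("theoretical c_v:", 1.0 / math.sqrt(k), "sample c_v:", sample_cv)
```

Because the coefficient of variation falls below 1 as the number of phases grows, the Erlang is a natural candidate when service times are less variable than an exponential model would suggest.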

Once we have a good idea about the type of random variable (discrete or continuous) and some ideas about the distribution of the random variable, the next step is to fit a distributional model to the data. In the following sections, we will illustrate how to fit continuous distributions to data.