3.4 Modeling Probability Distributions

The jsl.utilities.random.rvariable package is the key package for generating random variables; however, it does not facilitate performing calculations involving the underlying probability distributions. To perform calculations involving probability distributions, you should use the jsl.utilities.distribution package. This package has almost all the same distributions represented within the jsl.utilities.random.rvariable package.

Figure 3.2: Distribution Interfaces

Figure 3.2 illustrates the interfaces used to define probability distributions. The CDFIfc interface serves as the basis for discrete distributions via the DiscreteDistributionIfc interface, for continuous distributions via the ContinuousDistributionIfc interface, and for the general DistributionIfc interface. Discrete distributions, such as the geometric and binomial, implement the DiscreteDistributionIfc and PMFIfc interfaces. Similarly, continuous distributions, such as the normal and uniform, implement the ContinuousDistributionIfc and PDFIfc interfaces. All concrete implementations of distributions extend the abstract base class Distribution, which implements the DistributionIfc interface. Thus, all distributions have the following capabilities:

  • cdf(double b) - computes the cumulative probability, \(F(b) = P(X \leq b)\)
  • cdf(double a, double b) - computes the probability of the interval, \(P( a \leq X \leq b)\)
  • complementaryCDF(double b) - computes the complementary cumulative probability, \(1-F(b) = P(X > b)\)
  • getMean() - returns the expected value (mean) of the distribution
  • getVariance() - returns the variance of the distribution
  • getStandardDeviation() - returns the standard deviation of the distribution
  • invCDF(double p) - returns the inverse of the cumulative distribution function, \(F^{-1}(p)\), computed by numerical search if necessary
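
To make the "numerical search" idea concrete, the following standalone sketch inverts a CDF by bisection and computes interval and complementary probabilities from the CDF alone. It does not use JSL and is not how JSL implements invCDF(); the class InvCdfSketch, its method names, and the choice of an exponential distribution are assumptions made only for illustration.

```java
import java.util.function.DoubleUnaryOperator;

// Standalone illustration; all names here are hypothetical and not part of JSL.
class InvCdfSketch {

    /** CDF of an exponential distribution with the given mean (illustrative choice). */
    static double expCdf(double x, double mean) {
        return (x <= 0.0) ? 0.0 : 1.0 - Math.exp(-x / mean);
    }

    /** Interval probability for a continuous distribution: F(b) - F(a). */
    static double cdf(DoubleUnaryOperator F, double a, double b) {
        if (a > b) {
            throw new IllegalArgumentException("a must be <= b");
        }
        return F.applyAsDouble(b) - F.applyAsDouble(a);
    }

    /** Complementary CDF: 1 - F(b). */
    static double complementaryCdf(DoubleUnaryOperator F, double b) {
        return 1.0 - F.applyAsDouble(b);
    }

    /**
     * Inverts a continuous, increasing CDF by bisection on [lo, hi].
     * Assumes the caller supplies a bracket with F(lo) <= p <= F(hi).
     */
    static double invCdf(DoubleUnaryOperator F, double p, double lo, double hi) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        while (hi - lo > 1e-12) {
            double mid = 0.5 * (lo + hi);
            if (F.applyAsDouble(mid) < p) {
                lo = mid;   // the quantile lies to the right of mid
            } else {
                hi = mid;   // the quantile lies at or to the left of mid
            }
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator F = t -> expCdf(t, 2.0);
        double median = invCdf(F, 0.5, 0.0, 100.0);
        System.out.printf("invCDF(0.5)    = %.6f%n", median);
        System.out.printf("F(invCDF(0.5)) = %.6f%n", F.applyAsDouble(median));
        System.out.printf("P(1 <= X <= 3) = %.6f%n", cdf(F, 1.0, 3.0));
        System.out.printf("P(X > 3)       = %.6f%n", complementaryCdf(F, 3.0));
    }
}
```

For the exponential distribution a closed-form inverse exists, so the search is unnecessary in practice; it is used here only because the result is easy to check against \(-2\ln(0.5)\).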

Discrete distributions have a method, pmf(double k), that returns the probability associated with the value \(k\). Continuous distributions have a probability density function, \(f(x)\), implemented in the method pdf(double x). Finally, all distributions know how to create random variables through the GetRVariableIfc interface, which provides the following methods:

  • RVariableIfc getRandomVariable(RNStreamIfc stream) - returns a new instance of a random variable based on the current values of the distribution’s parameters that uses the supplied stream
  • RVariableIfc getRandomVariable(int streamNum) - returns a new instance of a random variable based on the current values of the distribution’s parameters that uses the supplied stream number
  • RVariableIfc getRandomVariable() - returns a new instance of a random variable based on the current values of the distribution’s parameters that uses a newly created stream
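
The phrase "based on the current values of the distribution's parameters" can be illustrated with a small standalone sketch. It does not use JSL: the class ExponentialDist is hypothetical, java.util.Random stands in for a random number stream, and DoubleSupplier stands in for a random variable. The point is only that the generator copies the parameter values at creation time, so later changes to the distribution do not affect it.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

// Standalone illustration; none of these types belong to JSL.
class SnapshotSketch {

    /** Hypothetical mutable distribution with a single parameter. */
    static final class ExponentialDist {
        private double mean;

        ExponentialDist(double mean) {
            setMean(mean);
        }

        void setMean(double mean) {
            if (mean <= 0.0) {
                throw new IllegalArgumentException("mean must be > 0");
            }
            this.mean = mean;
        }

        double getMean() {
            return mean;
        }

        /** Returns a generator fixed to the current mean and driven by the supplied stream. */
        DoubleSupplier getRandomVariable(Random stream) {
            final double m = mean;  // copy the parameter value now
            // inverse transform: F^{-1}(u) = -m * ln(1 - u), with u in [0, 1)
            return () -> -m * Math.log(1.0 - stream.nextDouble());
        }
    }

    /** Averages n draws from the supplied generator. */
    static double average(DoubleSupplier rv, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += rv.getAsDouble();
        }
        return sum / n;
    }

    public static void main(String[] args) {
        ExponentialDist dist = new ExponentialDist(1.0);
        DoubleSupplier rv = dist.getRandomVariable(new Random(12345L));
        dist.setMean(100.0);  // changes the distribution, not the existing generator
        System.out.println("distribution mean = " + dist.getMean());
        System.out.println("sample average    = " + average(rv, 200_000));
    }
}
```

The sample average stays near the original mean of 1.0 even though the distribution's mean was changed afterwards.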

As an example, the following code illustrates some calculations for the binomial distribution.

// make and use a Binomial(p, n) distribution
int n = 10;
double p = 0.8;
System.out.println("n = " + n);
System.out.println("p = " + p);
Binomial bnDF = new Binomial(p, n);
System.out.println("mean = " + bnDF.getMean());
System.out.println("variance = " + bnDF.getVariance());
// compute some values
System.out.printf("%3s %15s %15s %n", "k", "p(k)", "cdf(k)");
for (int i = 0; i <= 10; i++) {
    System.out.printf("%3d %15.10f %15.10f %n", i, bnDF.pmf(i), bnDF.cdf(i));
}

The output shows the mean, variance, and basic probability calculations.

n = 10
p = 0.8
mean = 8.0
variance = 1.5999999999999996
  k            p(k)          cdf(k) 
  0    0.0000001024    0.0000001024 
  1    0.0000040960    0.0000041984 
  2    0.0000737280    0.0000779264 
  3    0.0007864320    0.0008643584 
  4    0.0055050240    0.0063693824 
  5    0.0264241152    0.0327934976 
  6    0.0880803840    0.1208738816 
  7    0.2013265920    0.3222004736 
  8    0.3019898880    0.6241903616 
  9    0.2684354560    0.8926258176 
 10    0.1073741824    1.0000000000 
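
As a cross-check on the table above, the same probabilities can be reproduced in plain Java without JSL. The sketch below is an independent calculation, not the library's algorithm; the class name BinomialTableSketch and its methods are hypothetical. It builds the probability mass function from the recurrence \(p(k+1) = p(k)\,\frac{n-k}{k+1}\,\frac{p}{1-p}\), starting from \(p(0) = (1-p)^n\).

```java
// Standalone illustration; independent of JSL.
class BinomialTableSketch {

    /** Returns p(0), ..., p(n) for X ~ Binomial(n, p) with 0 < p < 1. */
    static double[] pmf(int n, double p) {
        if (n < 1 || !(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("require n >= 1 and 0 < p < 1");
        }
        double[] f = new double[n + 1];
        f[0] = Math.pow(1.0 - p, n);
        double odds = p / (1.0 - p);
        for (int k = 0; k < n; k++) {
            f[k + 1] = f[k] * (n - k) / (k + 1.0) * odds;
        }
        return f;
    }

    /** Running sums of the pmf, so that F[k] = P(X <= k). */
    static double[] cdf(double[] f) {
        double[] F = new double[f.length];
        double sum = 0.0;
        for (int k = 0; k < f.length; k++) {
            sum += f[k];
            F[k] = sum;
        }
        return F;
    }

    /** Mean computed directly from the pmf. */
    static double mean(double[] f) {
        double m = 0.0;
        for (int k = 0; k < f.length; k++) {
            m += k * f[k];
        }
        return m;
    }

    public static void main(String[] args) {
        double[] f = pmf(10, 0.8);
        double[] F = cdf(f);
        System.out.printf("%3s %15s %15s %n", "k", "p(k)", "cdf(k)");
        for (int k = 0; k < f.length; k++) {
            System.out.printf("%3d %15.10f %15.10f %n", k, f[k], F[k]);
        }
        System.out.println("mean = " + mean(f));
    }
}
```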

The jsl.utilities.random.rvariable package creates instances of random variables that are immutable. That is, once you create a random variable, its parameters cannot be changed. However, distributions permit their parameters to be changed and they also facilitate the creation of random variables. The following code uses the setParameters() method to change the parameters of the previously created binomial distribution and then creates a random variable based on the mutated distribution.

// change the probability and number of trials
bnDF.setParameters(0.5, 20);
System.out.println("mean = " + bnDF.getMean());
System.out.println("variance = " + bnDF.getVariance());
// make random variables based on the distributions
RVariableIfc brv = bnDF.getRandomVariable();
System.out.printf("%3s %15s %n", "n", "Values");
// generate some values
for (int i = 0; i < 5; i++) {
    // getValue() method returns generated values
    int x = (int)brv.getValue();
    System.out.printf("%3d %15d %n", i+1, x);
}

The results are as we would expect. Similar calculations can be made for continuous distributions. In most cases, the concrete implementations of the various distributions have specialized methods beyond the generic methods described here. Please refer to the Javadoc for further details.

mean = 10.0
variance = 5.0
  n          Values 
  1              11 
  2              14 
  3              16 
  4               7 
  5              14 

There are a number of useful static methods defined for the binomial, normal, gamma, and Student-T distributions. Specifically, the binomial distribution has the following static methods:

  • binomialPMF(int j, int n, double p) - directly computes the probability for the value \(j\)
  • binomialCDF(int j, int n, double p) - directly computes the cumulative distribution function for the value \(j\)
  • binomialCCDF(int j, int n, double p) - directly computes the complementary cumulative distribution function for the value of \(j\)
  • binomialInvCDF(double x, int n, double p) - directly computes the inverse cumulative distribution function

These methods are designed to perform their calculations in a numerically stable manner to ensure numerical accuracy. The normal distribution has the following static methods for computations involving the standard normal distribution.

  • stdNormalCDF(double z) - the cumulative probability for a \(Z \sim N(0,1)\) random variable, i.e. \(F(z) = P(Z \leq z)\)
  • stdNormalComplementaryCDF(double z) - returns \(1-P(Z \leq z)\)
  • stdNormalInvCDF(double p) - returns \(z = F^{-1}(p)\) the inverse of the cumulative distribution function
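
For readers who want to see what such standard normal computations involve, here is a standalone sketch. It is not the JSL implementation: it uses the classical polynomial approximation of Abramowitz and Stegun (formula 26.2.17, absolute error below about \(7.5 \times 10^{-8}\)) for the CDF and a simple bisection for the inverse. The class StdNormalSketch and its method names are hypothetical.

```java
// Standalone illustration; a library implementation is typically more accurate.
class StdNormalSketch {

    /** Approximates P(Z <= z) for Z ~ N(0,1) using Abramowitz and Stegun 26.2.17. */
    static double cdf(double z) {
        double x = Math.abs(z);
        double t = 1.0 / (1.0 + 0.2316419 * x);
        double poly = t * (0.319381530
                + t * (-0.356563782
                + t * (1.781477937
                + t * (-1.821255978
                + t * 1.330274429))));
        double pdf = Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
        double upperTail = pdf * poly;  // approximates P(Z > |z|)
        return (z >= 0.0) ? 1.0 - upperTail : upperTail;
    }

    /** P(Z > z), obtained from the symmetry of the standard normal. */
    static double complementaryCdf(double z) {
        return cdf(-z);
    }

    /** Inverts the approximate CDF by bisection on [-10, 10]. */
    static double invCdf(double p) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        double lo = -10.0;
        double hi = 10.0;
        while (hi - lo > 1e-10) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        System.out.printf("P(Z <= 1.96) = %.6f%n", cdf(1.96));
        System.out.printf("P(Z > 1.96)  = %.6f%n", complementaryCdf(1.96));
        System.out.printf("z at p=0.975 = %.4f%n", invCdf(0.975));
    }
}
```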

The Student-T distribution also has two static convenience methods to facilitate computations.

  • getCDF(double dof, double x) - computes the cumulative distribution function for \(x\) given the degrees of freedom
  • getInvCDF(double dof, double p) - computes the inverse cumulative distribution function or t-value for the supplied probability given the degrees of freedom.

Within the gamma distribution there are some convenience methods for computing the gamma function, the natural logarithm of the gamma function, the incomplete gamma function, and the digamma function (derivative of the natural logarithm of the gamma function).
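
To show why the logarithm of the gamma function is useful, the following standalone sketch computes it with the widely published Lanczos approximation and then uses it to evaluate a binomial probability in log space, which avoids overflow in the factorials for large \(n\). This is an assumption-laden illustration rather than the JSL code: the class GammaSketch and its methods are hypothetical, and the library's own routines may use different algorithms.

```java
// Standalone illustration; not the JSL implementation.
class GammaSketch {

    // Lanczos coefficients as commonly published for a six-term approximation
    private static final double[] COF = {
        76.18009172947146, -86.50532032941677, 24.01409824083091,
        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5
    };

    /** Natural logarithm of the gamma function for x > 0. */
    static double logGamma(double x) {
        if (x <= 0.0) {
            throw new IllegalArgumentException("x must be > 0");
        }
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : COF) {
            y += 1.0;
            ser += c / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** Gamma function via exp(logGamma(x)); overflows for large x. */
    static double gamma(double x) {
        return Math.exp(logGamma(x));
    }

    /** P(X = j) for X ~ Binomial(n, p), evaluated in log space, with 0 < p < 1. */
    static double binomialPmf(int j, int n, double p) {
        if (n < 0 || j < 0 || j > n || !(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("require 0 <= j <= n and 0 < p < 1");
        }
        double logChoose = logGamma(n + 1.0) - logGamma(j + 1.0) - logGamma(n - j + 1.0);
        return Math.exp(logChoose + j * Math.log(p) + (n - j) * Math.log(1.0 - p));
    }

    public static void main(String[] args) {
        System.out.printf("logGamma(5)  = %.10f%n", logGamma(5.0));
        System.out.printf("ln(4!)       = %.10f%n", Math.log(24.0));
        System.out.printf("p(8; 10,0.8) = %.10f%n", binomialPmf(8, 10, 0.8));
    }
}
```

The last printed value can be compared with the entry for \(k = 8\) in the binomial table shown earlier in this section.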