3.4 Modeling Probability Distributions
The jsl.utilities.random.rvariable package is the key package for generating random variables; however, it does not facilitate performing calculations involving the underlying probability distributions. To perform calculations involving probability distributions, you should use the jsl.utilities.distribution package. This package has almost all the same distributions represented within the jsl.utilities.random.rvariable package.
Figure 3.2 illustrates the interfaces used to define probability distributions. The CDFIfc interface serves as the basis for discrete distributions via the DiscreteDistributionIfc interface, for continuous distributions via the ContinuousDistributionIfc interface, and for the general DistributionIfc interface. Discrete distributions, such as the geometric and binomial, implement the DiscreteDistributionIfc and PMFIfc interfaces. Similarly, continuous distributions, such as the normal and uniform, implement the ContinuousDistributionIfc and PDFIfc interfaces. All concrete implementations of distributions extend the abstract base class Distribution, which implements the DistributionIfc interface. Thus, all distributions have the following capabilities:
cdf(double b) - computes the cumulative probability, \(F(b) = P(X \leq b)\)
cdf(double a, double b) - computes the probability, \(P( a \leq X \leq b)\)
complementaryCDF(double b) - computes the complementary cumulative probability, \(1-F(b) = P(X > b)\)
getMean() - returns the expected value (mean) of the distribution
getVariance() - returns the variance of the distribution
getStandardDeviation() - returns the standard deviation of the distribution
invCDF(double p) - returns the inverse of the cumulative distribution function, \(F^{-1}(p)\). This is performed by numerical search if necessary
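The remark that invCDF(double p) may fall back on numerical search can be illustrated with a small, self-contained sketch. This is not the JSL implementation: the class InverseCdfSketch, the method invCdfByBisection, the bracketing interval, and the choice of bisection as the search method are all assumptions made for illustration. The sketch inverts the CDF of an exponential distribution, for which a closed-form inverse exists, so the numerical answer can be checked.

```java
import java.util.function.DoubleUnaryOperator;

public class InverseCdfSketch {

    /**
     * Hypothetical helper: finds x in [lower, upper] with cdf(x) close to p by bisection.
     * Assumes cdf is continuous and non-decreasing and that cdf(lower) <= p <= cdf(upper).
     */
    public static double invCdfByBisection(DoubleUnaryOperator cdf, double p,
                                           double lower, double upper, double tol) {
        if (p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        if (cdf.applyAsDouble(lower) > p || cdf.applyAsDouble(upper) < p) {
            throw new IllegalArgumentException("[lower, upper] does not bracket the quantile");
        }
        double lo = lower;
        double hi = upper;
        while (hi - lo > tol) {
            double mid = 0.5 * (lo + hi);
            if (cdf.applyAsDouble(mid) < p) {
                lo = mid; // the quantile lies to the right of mid
            } else {
                hi = mid; // the quantile lies at or to the left of mid
            }
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        double rate = 2.0;
        // CDF of an exponential distribution with the given rate
        DoubleUnaryOperator expCdf = x -> x <= 0.0 ? 0.0 : 1.0 - Math.exp(-rate * x);
        double p = 0.9;
        double numeric = invCdfByBisection(expCdf, p, 0.0, 100.0, 1.0e-10);
        double exact = -Math.log(1.0 - p) / rate; // closed-form inverse for comparison
        System.out.printf("numeric = %.8f, exact = %.8f%n", numeric, exact);
    }
}
```

Bisection is slow compared with Newton-type methods, but it needs only the CDF itself and cannot leave the bracketing interval, which is why it is a common default when no closed-form inverse is available.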
Discrete distributions have a method called pmf(double k) that returns the probability associated with the value \(k\). Continuous distributions have a probability density function, \(f(x)\), implemented in the method pdf(double x). Finally, all distributions know how to create random variables through the GetRVariableIfc interface, which provides the following methods:
RVariableIfc getRandomVariable(RNStreamIfc stream) - returns a new instance of a random variable based on the current values of the distribution’s parameters that uses the supplied stream
RVariableIfc getRandomVariable(int streamNum) - returns a new instance of a random variable based on the current values of the distribution’s parameters that uses the supplied stream number
RVariableIfc getRandomVariable() - returns a new instance of a random variable based on the current values of the distribution’s parameters that uses a newly created stream
As an example, the following code illustrates some calculations for the binomial distribution.
// make and use a Binomial(p, n) distribution
int n = 10;
double p = 0.8;
System.out.println("n = " + n);
System.out.println("p = " + p);
Binomial bnDF = new Binomial(p, n);
System.out.println("mean = " + bnDF.getMean());
System.out.println("variance = " + bnDF.getVariance());
// compute some values
System.out.printf("%3s %15s %15s %n", "k", "p(k)", "cdf(k)");
for (int i = 0; i <= 10; i++) {
    System.out.printf("%3d %15.10f %15.10f %n", i, bnDF.pmf(i), bnDF.cdf(i));
}
The output shows the mean, variance, and basic probability calculations.
n = 10
p = 0.8
mean = 8.0
variance = 1.5999999999999996
k p(k) cdf(k)
0 0.0000001024 0.0000001024
1 0.0000040960 0.0000041984
2 0.0000737280 0.0000779264
3 0.0007864320 0.0008643584
4 0.0055050240 0.0063693824
5 0.0264241152 0.0327934976
6 0.0880803840 0.1208738816
7 0.2013265920 0.3222004736
8 0.3019898880 0.6241903616
9 0.2684354560 0.8926258176
10 0.1073741824 1.0000000000
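The table above can be cross-checked without the library. The following is a minimal sketch, assuming nothing about how JSL computes these values: the class BinomialTableSketch and the method binomialPmf are hypothetical names, and the ratio recursion \(p(k+1) = p(k)\,\frac{n-k}{k+1}\,\frac{p}{1-p}\) starting from \(p(0) = (1-p)^n\) is just one simple way to obtain the probabilities.

```java
public class BinomialTableSketch {

    /**
     * Hypothetical helper: returns p(0), ..., p(n) for a binomial distribution
     * with n trials and success probability p, assuming 0 < p < 1.
     */
    public static double[] binomialPmf(int n, double p) {
        double[] pmf = new double[n + 1];
        pmf[0] = Math.pow(1.0 - p, n); // p(0) = (1 - p)^n
        double odds = p / (1.0 - p);
        for (int k = 0; k < n; k++) {
            // ratio recursion: p(k+1) = p(k) * (n - k) / (k + 1) * p / (1 - p)
            pmf[k + 1] = pmf[k] * (n - k) / (k + 1.0) * odds;
        }
        return pmf;
    }

    public static void main(String[] args) {
        int n = 10;
        double p = 0.8;
        double[] pmf = binomialPmf(n, p);
        double cdf = 0.0; // running sum of the probabilities
        System.out.printf("%3s %15s %15s %n", "k", "p(k)", "cdf(k)");
        for (int k = 0; k <= n; k++) {
            cdf += pmf[k];
            System.out.printf("%3d %15.10f %15.10f %n", k, pmf[k], cdf);
        }
    }
}
```

For small \(n\) this recursion should agree with the table to the printed precision. For large \(n\) the starting value \((1-p)^n\) can underflow to zero, which is one reason a library would use a more careful formulation.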
The jsl.utilities.random.rvariable package creates instances of random variables that are immutable. That is, once you create a random variable, its parameters cannot be changed. However, distributions permit their parameters to be changed and they also facilitate the creation of random variables. The following code uses the setParameters() method to change the parameters of the previously created binomial distribution and then creates a random variable based on the mutated distribution.
// change the probability and number of trials
bnDF.setParameters(0.5, 20);
System.out.println("mean = " + bnDF.getMean());
System.out.println("variance = " + bnDF.getVariance());
// make random variables based on the distributions
RVariableIfc brv = bnDF.getRandomVariable();
System.out.printf("%3s %15s %n", "n", "Values");
// generate some values
for (int i = 0; i < 5; i++) {
    // getValue() method returns generated values
    int x = (int) brv.getValue();
    System.out.printf("%3d %15d %n", i + 1, x);
}
The results are as we would expect. Similar calculations can be made for continuous distributions. In most cases, the concrete implementations of the various distributions have specialized methods beyond the generic methods described here. Please refer to the Javadoc for further details.
mean = 10.0
variance = 5.0
n Values
1 11
2 14
3 16
4 7
5 14
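The reported mean and variance follow directly from the standard binomial formulas with \(n = 20\) and \(p = 0.5\):

\[
E[X] = np = 20(0.5) = 10, \qquad \mathrm{Var}[X] = np(1-p) = 20(0.5)(0.5) = 5
\]

The same formulas give \(10(0.8) = 8\) and \(10(0.8)(0.2) = 1.6\) for the original parameters; the value 1.5999999999999996 printed earlier differs from 1.6 only by floating-point rounding.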
There are a number of useful static methods defined for the binomial, normal, gamma, and Student-T distributions. Specifically, the binomial distribution has the following static methods:
binomialPMF(int j, int n, double p) - directly computes the probability for the value \(j\)
binomialCDF(int j, int n, double p) - directly computes the cumulative distribution function for the value \(j\)
binomialCCDF(int j, int n, double p) - directly computes the complementary cumulative distribution function for the value \(j\)
binomialInvCDF(double x, int n, double p) - directly computes the inverse cumulative distribution function
These methods are designed to perform their calculations in a numerically stable manner. The normal distribution has the following static methods for computations involving the standard normal distribution:
stdNormalCDF(double z) - the cumulative probability for a \(Z \sim N(0,1)\) random variable, i.e. \(F(z) = P(Z \leq z)\)
stdNormalComplementaryCDF(double z) - returns \(1-P(Z \leq z)\)
stdNormalInvCDF(double p) - returns \(z = F^{-1}(p)\), the inverse of the cumulative distribution function
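To make the quantities behind these three methods concrete, the sketch below approximates the standard normal CDF in plain Java. It is not the JSL implementation: the class StdNormalSketch and its method names are hypothetical, and the polynomial approximation used (Abramowitz and Stegun, formula 26.2.17, with absolute error on the order of \(10^{-7}\)) is an assumption chosen because it is short and well known.

```java
public class StdNormalSketch {

    // Constants of the Abramowitz and Stegun approximation 26.2.17
    private static final double P = 0.2316419;
    private static final double[] B = {
        0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429
    };

    /** Approximates F(z) = P(Z <= z) for Z ~ N(0,1). */
    public static double approxStdNormalCDF(double z) {
        if (z < 0.0) {
            // symmetry of the standard normal: F(z) = 1 - F(-z)
            return 1.0 - approxStdNormalCDF(-z);
        }
        double t = 1.0 / (1.0 + P * z);
        // Horner evaluation of b1*t + b2*t^2 + b3*t^3 + b4*t^4 + b5*t^5
        double poly = t * (B[0] + t * (B[1] + t * (B[2] + t * (B[3] + t * B[4]))));
        double density = Math.exp(-0.5 * z * z) / Math.sqrt(2.0 * Math.PI);
        return 1.0 - density * poly;
    }

    /** Approximates 1 - F(z) = P(Z > z), using 1 - F(z) = F(-z). */
    public static double approxStdNormalComplementaryCDF(double z) {
        return approxStdNormalCDF(-z);
    }

    public static void main(String[] args) {
        double z = 1.96;
        System.out.printf("F(%.2f) = %.6f%n", z, approxStdNormalCDF(z));
        System.out.printf("1 - F(%.2f) = %.6f%n", z, approxStdNormalComplementaryCDF(z));
    }
}
```

Computing the complementary probability as \(F(-z)\) rather than as \(1 - F(z)\) avoids subtracting two nearly equal numbers when \(z\) is large, which is the kind of consideration a dedicated complementary CDF method makes possible.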
The Student-T distribution also has two static convenience methods to facilitate computations.
getCDF(double dof, double x) - computes the cumulative distribution function for \(x\) given the degrees of freedom
getInvCDF(double dof, double p) - computes the inverse cumulative distribution function or t-value for the supplied probability given the degrees of freedom
Within the gamma distribution there are some convenience methods for computing the gamma function, the natural logarithm of the gamma function, the incomplete gamma function, and the digamma function (derivative of the natural logarithm of the gamma function).