KSLCore/ksl.utilities.statistic

Package-level declarations

Types

AbstractStatistic

abstract class AbstractStatistic(name: String? = null) : Collector, StatisticIfc, Comparable<AbstractStatistic>

Serves as an abstract base class for statistical collection.

AntitheticStatistic

class AntitheticStatistic(theName: String = "AntitheticStatistic_") : AbstractStatistic

In progress...

BasicStatistics

class BasicStatistics(aName: String? = null) : MVBSEstimatorIfc, IdentityIfc

BatchStatistic

class BatchStatistic(theMinNumBatches: Int = MIN_NUM_BATCHES, theMinBatchSize: Int = MIN_NUM_OBS_PER_BATCH, theMinNumBatchesMultiple: Int = MAX_BATCH_MULTIPLE, theName: String? = null, values: DoubleArray? = null) : AbstractStatistic, BatchStatisticIfc

This class automates the batching of observations that may be dependent. It computes the batch means of the batches and reports statistics across the batches. Suppose we have observations Y(1), Y(2), Y(3), ..., Y(n). This class specifies the minimum number of batches, the minimum number of observations per batch, and a maximum batch multiple. The defaults are 20, 16, and 2, respectively. This implies that the maximum number of batches is 40 (the minimum number of batches times the maximum batch multiple). The class computes the average of each batch; these averages are called the batch means.
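
The batch-means idea can be illustrated without the library. The following is a minimal standalone sketch, not KSL's implementation: `batchMeans` is a hypothetical helper that uses a fixed batch size and ignores any trailing partial batch. It does not attempt the rebatching implied by the minimum number of batches and the maximum batch multiple.

```kotlin
// Hypothetical helper, not part of KSL: splits observations into
// consecutive batches of a fixed size and returns the mean of each batch.
// Observations left over after the last full batch are ignored.
fun batchMeans(observations: DoubleArray, batchSize: Int): DoubleArray {
    require(batchSize > 0) { "batchSize must be positive" }
    val numBatches = observations.size / batchSize
    return DoubleArray(numBatches) { b ->
        var sum = 0.0
        for (i in b * batchSize until (b + 1) * batchSize) {
            sum += observations[i]
        }
        sum / batchSize
    }
}
```

For example, five observations with a batch size of 2 yield two batch means, and the fifth observation is dropped.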

BatchStatisticIfc

interface BatchStatisticIfc : StatisticIfc

Bootstrap

open class Bootstrap(originalData: DoubleArray, val estimator: BSEstimatorIfc = BSEstimatorIfc.Average(), streamNumber: Int = 0, val streamProvider: RNStreamProviderIfc = KSLRandom.DefaultRNStreamProvider, name: String? = null) : IdentityIfc, RNStreamControlIfc, BootstrapEstimateIfc, StreamNumberIfc

A class to do statistical bootstrapping. The calculations occur via the method generateSamples(). Until generateSamples() is called, the results are meaningless.

BootstrapEstimate

open class BootstrapEstimate(val name: String, val originalDataSampleSize: Int, val originalDataEstimate: Double, val bootstrapEstimates: DoubleArray) : BootstrapEstimateIfc

BootstrapEstimateIfc

interface BootstrapEstimateIfc

BootstrapSampler

open class BootstrapSampler(originalData: DoubleArray, val estimator: MVBSEstimatorIfc, streamNumber: Int = 0, val streamProvider: RNStreamProviderIfc = KSLRandom.DefaultRNStreamProvider) : RNStreamControlIfc, StreamNumberIfc

This class facilitates bootstrap sampling. The originalData is sampled repeatedly, with replacement, to form bootstrap samples from which bootstrap statistics are computed. The estimator provides the mechanism for estimating statistical quantities from the original data. From the data, it can produce one or more estimated quantities. Bootstrap estimates are computed on the observed estimates from each bootstrap sample. The specified stream controls the bootstrap sampling process.
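
Sampling with replacement can be sketched in a few lines. This is a standalone illustration under stated assumptions, not the library's code: `bootstrapEstimates` is a hypothetical helper, it uses `kotlin.random.Random` in place of a KSL random number stream, and it handles a single estimated quantity per sample.

```kotlin
import kotlin.random.Random

// Hypothetical helper, not part of KSL: draws numSamples bootstrap samples
// (with replacement) from the data and applies the estimator to each one.
// kotlin.random.Random stands in for a KSL random number stream.
fun bootstrapEstimates(
    data: DoubleArray,
    numSamples: Int,
    rng: Random = Random(12345),
    estimator: (DoubleArray) -> Double = { it.average() }
): DoubleArray {
    require(data.isNotEmpty()) { "data must not be empty" }
    return DoubleArray(numSamples) {
        // Each bootstrap sample has the same size as the original data.
        val resample = DoubleArray(data.size) { data[rng.nextInt(data.size)] }
        estimator(resample)
    }
}
```

The returned array holds one estimate per bootstrap sample; summary statistics over that array give the bootstrap estimates.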

BoxPlotDataIfc

interface BoxPlotDataIfc

BoxPlotSummary

class BoxPlotSummary(data: DoubleArray, name: String? = null) : IdentityIfc, BoxPlotDataIfc

Prepares the statistical quantities typically found on a box plot. This implementation uses a full sort of the data. The original data is not changed. Users may want to look for more efficient methods for use with very large data sets.
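
As a rough illustration of the sort-based approach, the sketch below computes a five-number summary from a sorted copy of the data. The names are hypothetical and not part of KSL. The quantile rule (linear interpolation between order statistics) is an assumed convention; the library may use a different one.

```kotlin
// Hypothetical types, not part of KSL: five-number summary computed from a
// sorted copy of the data, so the caller's array is left unchanged.
data class FiveNumberSummary(
    val min: Double, val q1: Double, val median: Double, val q3: Double, val max: Double
)

fun fiveNumberSummary(data: DoubleArray): FiveNumberSummary {
    require(data.isNotEmpty()) { "data must not be empty" }
    val sorted = data.sortedArray() // full sort of a copy
    // Quantile by linear interpolation between order statistics (assumed convention).
    fun quantile(p: Double): Double {
        val pos = p * (sorted.size - 1)
        val lo = pos.toInt()
        val hi = minOf(lo + 1, sorted.size - 1)
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo])
    }
    return FiveNumberSummary(
        sorted.first(), quantile(0.25), quantile(0.5), quantile(0.75), sorted.last()
    )
}
```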

BSEstimatorIfc

fun interface BSEstimatorIfc

CachedHistogram

class CachedHistogram @JvmOverloads constructor(val cacheSize: Int = 512, name: String? = null) : AbstractStatistic, HistogramIfc

Creates a dynamically configured histogram based on an observed cache. If the amount of data observed is less than cache size and greater than or equal to 2, the returned histogram will be configured on whatever data was available in the cache. Thus, bin settings may change as more data is collected until the cache is full. Once the cache is full the returned histogram is permanently configured based on all data in the cache. The default cache size cacheSize is 512 observations.

CaseBootEstimatorIfc

interface CaseBootEstimatorIfc

Given some data, produce multiple estimated statistics from the data and store the estimated quantities in the returned array. It is up to the user to interpret the array values appropriately.

CaseBootstrapSampler

open class CaseBootstrapSampler(val estimator: CaseBootEstimatorIfc, stream: RNStreamIfc = KSLRandom.nextRNStream()) : RNStreamControlIfc, RNStreamChangeIfc

This class facilitates bootstrap sampling. The estimator provides the mechanism for estimating statistical quantities from the original data. From the data, it can produce 1 or more estimated quantities. Bootstrap estimates are computed on the observed estimates from each bootstrap sample. The specified stream controls the bootstrap sampling process.

Classification

data class Classification(val actual: Double, val predicted: Double)

Collector

abstract class Collector(name: String? = null) : Observable<Double> , CollectorIfc, IdentityIfc, DoubleEmitterIfc

CollectorIfc

interface CollectorIfc : LastValueIfc, ValueIfc

This interface represents a general set of methods for data collection. The collect() method takes in the supplied data and collects it in some manner as specified by the collector.

DefaultTimeGetter

class DefaultTimeGetter : GetTimeIfc

The default time value is 1.0

EmpDistType

enum EmpDistType : Enum<EmpDistType>

ErrorMatrix

class ErrorMatrix(data: Collection<ErrorResult>? = null, name: String? = null) : IdentityIfc

Computes the confusion matrix

ErrorMatrixData

data class ErrorMatrixData(var id: Int = 1, var name: String = "", var numTP: Int = 0, var numFP: Int = 0, var numTN: Int = 0, var numFN: Int = 0, var numP: Int = 0, var numN: Int = 0, var total: Int = 0, var numPP: Int = 0, var numPN: Int = 0, var prevalence: Double? = null, var accuracy: Double? = null, var truePositiveRate: Double? = null, var falseNegativeRate: Double? = null, var falsePositiveRate: Double? = null, var trueNegativeRate: Double? = null, var falseOmissionRate: Double? = null, var positivePredictiveValue: Double? = null, var falseDiscoveryRate: Double? = null, var negativePredictiveValue: Double? = null, var positiveLikelihoodRatio: Double? = null, var negativeLikelihoodRatio: Double? = null, var markedness: Double? = null, var diagnosticOddsRatio: Double? = null, var balancedAccuracy: Double? = null, var f1Score: Double? = null, var fowlkesMallowsIndex: Double? = null, var mathhewsCorrelationCoefficient: Double? = null, var threatScore: Double? = null, var informedness: Double? = null, var prevalenceThreshold: Double? = null)

ErrorMatrixRecord

data class ErrorMatrixRecord(var id: Int = 1, var name: String = "", var numTP: Int = 0, var numFP: Int = 0, var numTN: Int = 0, var numFN: Int = 0, var numP: Int = 0, var numN: Int = 0, var total: Int = 0, var numPP: Int = 0, var numPN: Int = 0, var prevalence: Double? = null, var accuracy: Double? = null, var truePositiveRate: Double? = null, var falseNegativeRate: Double? = null, var falsePositiveRate: Double? = null, var trueNegativeRate: Double? = null, var falseOmissionRate: Double? = null, var positivePredictiveValue: Double? = null, var falseDiscoveryRate: Double? = null, var negativePredictiveValue: Double? = null, var positiveLikelihoodRatio: Double? = null, var negativeLikelihoodRatio: Double? = null, var markedness: Double? = null, var diagnosticOddsRatio: Double? = null, var balancedAccuracy: Double? = null, var f1Score: Double? = null, var fowlkesMallowsIndex: Double? = null, var mathhewsCorrelationCoefficient: Double? = null, var threatScore: Double? = null, var informedness: Double? = null, var prevalenceThreshold: Double? = null) : DbTableData

ErrorResult

enum ErrorResult : Enum<ErrorResult>

There are two classes: class 1 (positive) and class 0 (negative). An instance (exemplar) must be in one of the classes.
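
With two classes, every (actual, predicted) pair falls into one of four cells: true positive, false positive, true negative, or false negative. The sketch below tallies those cells and derives two of the rates named in ErrorMatrixData. `ConfusionCounts` and `tally` are hypothetical names for illustration only; they are not KSL types.

```kotlin
// Hypothetical types, not part of KSL: tallies the four confusion-matrix
// cells from (actual, predicted) labels, where 1 is positive and 0 is negative.
data class ConfusionCounts(val tp: Int, val fp: Int, val tn: Int, val fn: Int) {
    val total: Int get() = tp + fp + tn + fn
    // Fraction of all instances that were classified correctly.
    val accuracy: Double get() = (tp + tn).toDouble() / total
    // Fraction of actual positives that were predicted positive.
    val truePositiveRate: Double get() = tp.toDouble() / (tp + fn)
}

fun tally(pairs: List<Pair<Int, Int>>): ConfusionCounts {
    var tp = 0; var fp = 0; var tn = 0; var fn = 0
    for ((actual, predicted) in pairs) {
        when {
            actual == 1 && predicted == 1 -> tp++
            actual == 0 && predicted == 1 -> fp++
            actual == 0 && predicted == 0 -> tn++
            actual == 1 && predicted == 0 -> fn++
            else -> throw IllegalArgumentException("labels must be 0 or 1")
        }
    }
    return ConfusionCounts(tp, fp, tn, fn)
}
```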

EstimateIfc

interface EstimateIfc

A functional interface that produces some estimate of some quantity of interest.

ExceedanceEstimator

class ExceedanceEstimator(thresholds: DoubleArray = doubleArrayOf(1.0), name: String? = null) : Collector

Tabulates the frequency and proportion of observations of a random variable X that exceed each threshold, that is, X > a(i), where a(i) are the thresholds.
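
A minimal standalone sketch of the tabulation follows. `exceedanceProportions` is a hypothetical helper, not the library's API, and it works on a complete array rather than collecting observations one at a time.

```kotlin
// Hypothetical helper, not part of KSL: for each threshold a(i), returns the
// proportion of observations x with x > a(i).
fun exceedanceProportions(observations: DoubleArray, thresholds: DoubleArray): DoubleArray {
    require(observations.isNotEmpty()) { "observations must not be empty" }
    return DoubleArray(thresholds.size) { i ->
        observations.count { it > thresholds[i] }.toDouble() / observations.size
    }
}
```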

FrequencyData

data class FrequencyData(var id: Int = 1, var name: String = "", var cellLabel: String = "", var value: Int = 0, var count: Double = 0.0, var cum_count: Double = 0.0, var proportion: Double = 0.0, var cumProportion: Double = 0.0)

A data class holding the summary frequency data

FrequencyRecord

data class FrequencyRecord(var id: Int = 1, var name: String = "", var cellLabel: String = "", var value: Int = 0, var count: Double = 0.0, var cum_count: Double = 0.0, var proportion: Double = 0.0, var cumProportion: Double = 0.0) : DbTableData

A data table class suitable for insertion into a database

GetCSVStatisticIfc

interface GetCSVStatisticIfc

Each statistic value separated by a comma with a corresponding header

HalfWidthSequentialSampler

class HalfWidthSequentialSampler(aName: String? = null) : Observable<Double> , IdentityIfc, DoubleEmitterIfc

Repeatedly gets the value of the supplied GetValueIfc in the run() method until the supplied sampling half-width requirement is met or the default maximum number of iterations is reached, whichever comes first.

Histogram

class Histogram @JvmOverloads constructor(breakPoints: DoubleArray, name: String? = null) : AbstractStatistic, HistogramIfc

A Histogram tabulates data into bins. The user must specify the break points of the bins, b0, b1, b2, ..., bk, where there are k+1 break points, and k bins. b0 may be Double.NEGATIVE_INFINITY and bk may be Double.POSITIVE_INFINITY.
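
The relationship between k+1 break points and k bins can be sketched as a lookup. `findBin` is a hypothetical helper, not part of KSL. Treating each bin as closed on the left and open on the right is an assumption made for this illustration.

```kotlin
// Hypothetical helper, not part of KSL: returns the index of the bin
// [b(i), b(i+1)) that contains x, or -1 if x falls outside all bins.
// The left-closed, right-open convention is an assumption.
fun findBin(breakPoints: DoubleArray, x: Double): Int {
    require(breakPoints.size >= 2) { "need at least two break points" }
    for (i in 0 until breakPoints.size - 1) {
        if (x >= breakPoints[i] && x < breakPoints[i + 1]) return i
    }
    return -1
}
```

With break points (-infinity, 0, 1, 2, +infinity) there are five break points and four bins, so every finite value lands in some bin.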

HistogramBin

class HistogramBin(theBinNumber: Int, theLowerLimit: Double, theUpperLimit: Double)

HistogramBinData

data class HistogramBinData(val id: Int, val name: String, val binNum: Int, val binLabel: String, val binLowerLimit: Double, val binUpperLimit: Double, val binCount: Double, val cumCount: Double, val proportion: Double, val cumProportion: Double)

Holds the data associated with a histogram bin

HistogramBinRecord

data class HistogramBinRecord(val id: Int, val name: String, val binNum: Int, val binLabel: String, val binLowerLimit: Double, val binUpperLimit: Double, val binCount: Double, val cumCount: Double, val proportion: Double, val cumProportion: Double) : DbTableData

Histogram Bin data suitable for a database table

HistogramIfc

interface HistogramIfc : CollectorIfc, IdentityIfc, StatisticIfc, GetCSVStatisticIfc, Comparable<AbstractStatistic>

IntegerFrequency

class IntegerFrequency(data: IntArray? = null, name: String? = null, val lowerLimit: Int = Int.MIN_VALUE, val upperLimit: Int = Int.MAX_VALUE) : IdentityIfc, IntegerFrequencyIfc

This class tabulates the frequency associated with the integers presented to it via the collect() method. Every value presented is interpreted as an integer. For every value presented, a count is maintained. There could be space/time performance issues if the number of different values presented is large. Use lowerLimit and upperLimit to limit the values that can be observed. Values lower than the lower limit are counted as underflow and values greater than the upper limit are counted as overflow.
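
The underflow and overflow rule can be sketched as follows. `SimpleIntFrequency` is a hypothetical class written for this illustration; it is not the library's implementation and offers only a small part of what IntegerFrequency provides.

```kotlin
// Hypothetical class, not part of KSL: counts integer values within
// [lowerLimit, upperLimit] and tallies the rest as underflow or overflow.
class SimpleIntFrequency(
    val lowerLimit: Int = Int.MIN_VALUE,
    val upperLimit: Int = Int.MAX_VALUE
) {
    private val counts = mutableMapOf<Int, Int>()
    var underflow = 0
        private set
    var overflow = 0
        private set

    fun collect(value: Int) {
        when {
            value < lowerLimit -> underflow++
            value > upperLimit -> overflow++
            else -> counts[value] = (counts[value] ?: 0) + 1
        }
    }

    // Number of times the value was observed within the limits.
    fun count(value: Int): Int = counts[value] ?: 0
}
```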

IntegerFrequencyIfc

interface IntegerFrequencyIfc

JackKnifeEstimator

class JackKnifeEstimator(originalData: DoubleArray, estimator: BSEstimatorIfc = BSEstimatorIfc.Average())

MatrixBootEstimator

class MatrixBootEstimator(matrix: Array<DoubleArray>, matrixEstimator: MatrixEstimatorIfc, estimatorNames: List<String> = emptyList()) : CaseBootEstimatorIfc

MatrixEstimatorIfc

fun interface MatrixEstimatorIfc

MCBIntervalData

data class MCBIntervalData(var id: Int = mcbIntervalDataCounter++, var context: String? = null, var subject: String? = null, var direction: String? = null, var indifferenceDelta: Double? = null, var alternative: String? = null, var lowerLimit: Double? = null, var upperLimit: Double? = null, var possibleBest: Boolean? = null, val probCorrectSelection: Double? = null, var keep: Boolean? = null) : DbTableData

MCBResultData

data class MCBResultData(var id: Int = mcbResultDataCounter++, var context: String? = null, var subject: String? = null, var maxDifferenceVariance: Double? = null, var minPerformerName: String? = null, var minPerformance: Double? = null, var maxPerformerName: String? = null, var maxPerformance: Double? = null, var minDifferenceName: String? = null, var minDifference: Double? = null, var maxDifferenceName: String? = null, var maxDifference: Double? = null) : DbTableData

MCBScreeningIntervalData

data class MCBScreeningIntervalData(var id: Int = mcbScreeningIntervalCounter++, var context: String? = null, var subject: String? = null, var direction: String? = null, var alternative: String? = null, var average: Double? = null, var screeningCase: String? = null, var confidenceLevel: Double? = null, var lowerLimit: Double? = null, var upperLimit: Double? = null, var contained: Boolean? = null) : DbTableData

MeanEstimateIfc

interface MeanEstimateIfc : EstimateIfc

A minimal interface to define an estimator that will produce an estimate of a population mean. We assume that the estimator has statistics available that represent the count, average, and variance of a sample. By default, the sample average is used as the estimate of the population mean; however, implementors may override this behavior by overriding the estimate() method.

MultiBootstrap

class MultiBootstrap(val estimator: BSEstimatorIfc = BSEstimatorIfc.Average(), dataMap: Map<String, DoubleArray>, name: String? = null) : RNStreamControlIfc

A collection of Bootstrap instances to permit multidimensional bootstrapping. Construction depends on a named mapping of DoubleArray instances that represent the original samples. A static create method also allows creation based on a mapping to implementations of the SampleIfc.

MultipleComparisonAnalyzer

class MultipleComparisonAnalyzer(dataMap: Map<String, DoubleArray>, responseName: String? = null) : IdentityIfc

Holds data to perform multiple comparisons. Performs pairwise comparisons and computes pairwise differences and variances.

MVBSEstimatorIfc

interface MVBSEstimatorIfc : IdentityIfc

Given some data, produce multiple estimated statistics from the data and store the estimated quantities in the returned array. It is up to the user to interpret the array values appropriately.

MVStatistic

class MVStatistic(val names: List<String>)

Collects statistics for each dimension of the presented array.

OLSBootEstimator

object OLSBootEstimator : MatrixEstimatorIfc

OLSRegression

class OLSRegression(regressionData: RegressionData) : RegressionResultsIfc

Performs Ordinary Least Squares fit of the data with the response. The default is to assume that an intercept term will be estimated.

RegressionData

data class RegressionData(val response: DoubleArray, val data: Array<DoubleArray>, val hasIntercept: Boolean = true, var responseName: String = "Y", val predictorNames: List<String> = makePredictorNames(data))

The response is an n by 1 array of the data, where n is the number of observations for a response variable. The data is an n by k matrix of the data for the regression, where k is the number of regression coefficients and n is the number of observations. This data should not include a column of 1's for estimating an intercept term. The rows of the array represent the predictor values associated with each observation. The array must be rectangular. That is, each row has the same number of columns.

RegressionResultsIfc

interface RegressionResultsIfc

A useful resource for regression can be found at (https://online.stat.psu.edu/stat501/lesson/5/5.3)

Rinott

class Rinott

Functions used to calculate Rinott constants

State

open class State(theStateNumber: Int = stateCounter + 1, name: String = "State:", useStatistic: Boolean = false) : IdentityIfc, StateAccessorIfc

Creates a state with the given name and indicates whether a Statistic object is used to collect additional statistics.

StateAccessorIfc

interface StateAccessorIfc : IdentityIfc

StateFrequency

class StateFrequency(numStates: Int, name: String? = null) : IdentityIfc

Statistic

class Statistic @JvmOverloads constructor(name: String? = "Statistic_", values: DoubleArray? = null) : AbstractStatistic

The Statistic class allows the collection of summary statistics on data via the collect() methods. The primary statistical summary is for the statistical moments. The constructor creates a Statistic with the given name.

StatisticalRun

class StatisticalRun<T>(theStartingIndex: Int, theEndingIndex: Int, theStartingObj: T, theEndingObj: T)

A statistical run is a sequence of consecutive objects that are determined to be equal based on a comparator. A single item is a run of length 1. Consecutive items that are all the same are considered a single run. The sequence (0, 1, 1, 1, 0) has 3 runs.
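
The run definition can be checked with a short standalone sketch. `countRuns` is a hypothetical helper, not part of KSL, and it uses plain equality rather than a supplied comparator.

```kotlin
// Hypothetical helper, not part of KSL: counts maximal runs of consecutive
// equal items. An empty list has no runs; a single item is one run.
fun <T> countRuns(items: List<T>): Int {
    if (items.isEmpty()) return 0
    var runs = 1
    for (i in 1 until items.size) {
        // A new run starts whenever an item differs from its predecessor.
        if (items[i] != items[i - 1]) runs++
    }
    return runs
}
```

Applied to (0, 1, 1, 1, 0), the helper counts the runs (0), (1, 1, 1), and (0).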

StatisticData

data class StatisticData(val name: String, val count: Double, val average: Double, val standardDeviation: Double, val standardError: Double, val halfWidth: Double, val confidenceLevel: Double, val lowerLimit: Double, val upperLimit: Double, val min: Double, val max: Double, val sum: Double, val variance: Double, val deviationSumOfSquares: Double, val kurtosis: Double, val skewness: Double, val lag1Covariance: Double, val lag1Correlation: Double, val vonNeumannLag1TestStatistic: Double, val numberMissing: Double) : Comparable<StatisticData>

StatisticDataDb

data class StatisticDataDb(var id: Int = statDataCounter++, var context: String? = null, var subject: String? = null, var stat_name: String = "", var stat_count: Double? = null, var average: Double? = null, var std_dev: Double? = null, var std_err: Double? = null, var half_width: Double? = null, var conf_level: Double? = null, var minimum: Double? = null, var maximum: Double? = null, var sum_of_obs: Double? = null, var dev_ssq: Double? = null, var last_value: Double? = null, var kurtosis: Double? = null, var skewness: Double? = null, var lag1_cov: Double? = null, var lag1_corr: Double? = null, var von_neumann_lag1_stat: Double? = null, var num_missing_obs: Double? = null) : DbTableData

StatisticIfc

interface StatisticIfc : SummaryStatisticsIfc, GetCSVStatisticIfc, LastValueIfc, ValueIfc

The StatisticIfc interface presents a read-only view of a Statistic

StatisticXY

class StatisticXY(name: String? = "Statistic_") : IdentityIfc

StringFrequency

class StringFrequency(data: Collection<String>? = null, name: String? = null, val limitSet: Set<String>? = null) : IdentityIfc

This class tabulates the frequency associated with the strings presented to it via the collect() method. For every unique string presented a count is maintained. There could be space/time performance issues if the number of different strings presented is large. Use the limit set to limit the values that can be observed. If the presented strings are not in the limiting set, then they are counted as "Other".
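
The effect of the limit set can be sketched with a plain map. `stringFrequencies` is a hypothetical helper for illustration only, not the library's implementation.

```kotlin
// Hypothetical helper, not part of KSL: counts each distinct string; when a
// limit set is given, strings outside it are counted under "Other".
fun stringFrequencies(data: Collection<String>, limitSet: Set<String>? = null): Map<String, Int> {
    val counts = mutableMapOf<String, Int>()
    for (s in data) {
        val key = if (limitSet == null || s in limitSet) s else "Other"
        counts[key] = (counts[key] ?: 0) + 1
    }
    return counts
}
```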

StringFrequencyData

data class StringFrequencyData(var id: Int = 1, var name: String = "", var string: String = "", var count: Double = 0.0, var cum_count: Double = 0.0, var proportion: Double = 0.0, var cum_proportion: Double = 0.0)

A data class holding the summary frequency data

StringFrequencyRecord

data class StringFrequencyRecord(var id: Int = 1, var name: String = "", var string: String = "", var count: Double = 0.0, var cum_count: Double = 0.0, var proportion: Double = 0.0, var cum_proportion: Double = 0.0) : DbTableData

A data table class suitable for insertion into a database

SummaryStatisticsIfc

interface SummaryStatisticsIfc : MeanEstimateIfc

TimeArray

class TimeArray(var timeValues: DoubleArray) : GetTimeIfc

A helper class that turns an array of time values into a supplier of times

TimeWeightedStatistic

class TimeWeightedStatistic(var timeGetter: GetTimeIfc = DefaultTimeGetter(), initialValue: Double = 0.0, initialTime: Double = 0.0) : Collector, WeightedStatisticIfc

Collects time weighted statistics that are presented to the collect() method. The property, timeGetter, must provide values for each observed value that appears in the collect method.
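
A time-weighted average weights each observed value by how long it persisted. The sketch below is a standalone illustration, assuming a piecewise-constant signal and ascending change times. `timeWeightedAverage` is a hypothetical helper and does not use GetTimeIfc or any other KSL type.

```kotlin
// Hypothetical helper, not part of KSL: time-weighted average of a piecewise
// constant signal. values[i] holds from times[i] until times[i + 1]; the last
// value holds until endTime. Times are assumed to be in ascending order.
fun timeWeightedAverage(times: DoubleArray, values: DoubleArray, endTime: Double): Double {
    require(times.isNotEmpty() && times.size == values.size) {
        "times and values must be non-empty and of equal size"
    }
    require(endTime > times[0]) { "endTime must be after the first time" }
    var area = 0.0
    for (i in values.indices) {
        val next = if (i + 1 < times.size) times[i + 1] else endTime
        area += values[i] * (next - times[i])
    }
    return area / (endTime - times[0])
}
```

For a signal that is 1 on [0, 2), 3 on [2, 5), and 0 on [5, 10], the area is 2 + 9 + 0 = 11 and the time-weighted average over 10 time units is 1.1.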

U01Array

class U01Array(doubleArray: DoubleArray) : RandU01Ifc

Makes an array look like a RandU01Ifc

U01Test

object U01Test

WeightedCollector

abstract class WeightedCollector(name: String? = null) : Observable<Pair<Double, Double>> , WeightedCollectorIfc, IdentityIfc, DoublePairEmitterIfc

WeightedCollectorIfc

interface WeightedCollectorIfc : CollectorIfc

WeightedStatistic

class WeightedStatistic(name: String? = null) : WeightedCollector, WeightedStatisticIfc, Comparable<WeightedStatistic>

Collects a basic weighted statistical summary. If the observation or the weight is infinite or NaN, then the observation is not recorded and the number of missing observations is incremented. If the observed weight is negative or 0.0, then the observation is not recorded and the number of missing observations is incremented.
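
The recording rules above can be sketched as a small accumulator. `SimpleWeightedAverage` is a hypothetical class written for this illustration; it tracks only the weighted average and the missing count, not the full summary the library class reports.

```kotlin
// Hypothetical class, not part of KSL: weighted average that skips
// observations the description above treats as missing.
class SimpleWeightedAverage {
    private var weightedSum = 0.0
    private var sumOfWeights = 0.0
    var numberMissing = 0
        private set

    fun collect(x: Double, weight: Double) {
        // Missing: x or weight is NaN or infinite, or weight is negative or 0.0.
        if (!x.isFinite() || !weight.isFinite() || weight <= 0.0) {
            numberMissing++
            return
        }
        weightedSum += x * weight
        sumOfWeights += weight
    }

    // NaN until at least one observation has been recorded.
    val average: Double
        get() = if (sumOfWeights > 0.0) weightedSum / sumOfWeights else Double.NaN
}
```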

WeightedStatisticIfc

interface WeightedStatisticIfc : IdentityIfc, GetCSVStatisticIfc

If the observation or the weight is

Properties

DEFAULT_CONFIDENCE_LEVEL

const val DEFAULT_CONFIDENCE_LEVEL: Double = 0.95

Functions

asMCBIntervalDataFrame

fun List<MCBIntervalData>.asMCBIntervalDataFrame(): DataFrame<MCBIntervalData>

Converts the MCB interval data to a data frame

asMCBResultDataFrame

fun List<MCBResultData>.asMCBResultDataFrame(): DataFrame<MCBResultData>

Converts the MCB result data to a data frame

asMCBScreeningIntervalDataFrame

fun List<MCBScreeningIntervalData>.asMCBScreeningIntervalDataFrame(): DataFrame<MCBScreeningIntervalData>

Converts the MCB screening interval data to a data frame

asStatisticDataFrame

fun List<StatisticDataDb>.asStatisticDataFrame(): DataFrame<StatisticDataDb>

Converts the statistic data to a data frame

averages

fun List<StatisticIfc>.averages(): DoubleArray

confidenceIntervals

fun List<StatisticIfc>.confidenceIntervals(level: Double = 0.95): List<Interval>

counts

fun List<StatisticIfc>.counts(): DoubleArray

isEnum

fun Any?.isEnum(): Boolean

isMissing

fun Double.isMissing(): Boolean

Statistical collection defines a datum as missing if the value is Double.NaN or if it is infinite (i.e. Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY).
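
The stated rule amounts to a one-line check. The sketch below restates it as a standalone extension under a different, hypothetical name so that it is not mistaken for the library's own function.

```kotlin
// Stand-alone restatement of the rule above, not the library's code:
// a value is missing if it is NaN or infinite.
fun Double.isMissingValue(): Boolean = this.isNaN() || this.isInfinite()
```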

main

fun main()

test1

fun test1()

test2

fun test2()

test3

fun test3()

variances

fun List<StatisticIfc>.variances(): DoubleArray