Histogram
A Histogram tabulates data into bins. The user must specify the break points of the bins, b0, b1, b2, ..., bk, where there are k+1 break points, and k bins. b0 may be Double.NEGATIVE_INFINITY and bk may be Double.POSITIVE_INFINITY.
If only one break point is supplied, then the bins are automatically defined as: (Double.NEGATIVE_INFINITY, b0] and (b0, Double.POSITIVE_INFINITY).
If two break points are provided, then there is one bin, [b0, b1). Any values less than b0 will be counted as underflow and any values in [b1, +infinity) will be counted as overflow.
If k+1 break points are provided, then the bins are defined as: [b0,b1), [b1,b2), [b2,b3), ..., [bk-1,bk). Any values in (-infinity, b0) will be counted as underflow and any values in [bk, +infinity) will be counted as overflow. If b0 equals Double.NEGATIVE_INFINITY then there can be no underflow. Similarly, if bk equals Double.POSITIVE_INFINITY there can be no overflow.
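The bin rules for two or more break points can be illustrated with a small standalone sketch. It is not the library's implementation; the names Placement, BinResult, and classify are hypothetical and exist only for this example. Only Double.NaN is treated as missing here.

```kotlin
// Hypothetical names for illustration only; not part of the documented API.
enum class Placement { MISSING, UNDERFLOW, OVERFLOW, BINNED }

data class BinResult(val placement: Placement, val binIndex: Int = -1)

// Classifies x against strictly increasing break points b0 < b1 < ... < bk.
// Bin i (zero-based) is the half-open interval [b(i), b(i+1)).
fun classify(breakPoints: DoubleArray, x: Double): BinResult {
    require(breakPoints.size >= 2) { "this sketch covers two or more break points" }
    if (x.isNaN()) return BinResult(Placement.MISSING)
    if (x < breakPoints.first()) return BinResult(Placement.UNDERFLOW)
    if (x >= breakPoints.last()) return BinResult(Placement.OVERFLOW)
    // Binary search for the last break point that is <= x.
    var lo = 0
    var hi = breakPoints.size - 1
    while (hi - lo > 1) {
        val mid = (lo + hi) / 2
        if (x >= breakPoints[mid]) lo = mid else hi = mid
    }
    return BinResult(Placement.BINNED, lo)
}
```

With break points 0, 1, 2, 3 the value 1.0 lands in the second bin, [1, 2), because each bin includes its lower limit and excludes its upper limit.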
The break points do not have to define equally sized bins. Static methods within the companion object are provided to create equal width bins and to create histograms with common characteristics.
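As a rough sketch of what creating equal width bins involves, the function below builds k+1 break points from a lower limit, a bin count, and a width. The function name and parameters are assumptions made for this example and are not the signatures of the companion object methods.

```kotlin
// Hypothetical helper for illustration; not a documented method.
// Returns numBins + 1 strictly increasing break points starting at lowerLimit.
fun equalWidthBreakPoints(lowerLimit: Double, numBins: Int, width: Double): DoubleArray {
    require(lowerLimit.isFinite()) { "the lower limit must be finite" }
    require(numBins >= 1) { "there must be at least one bin" }
    require(width > 0.0) { "the bin width must be positive" }
    // Multiplying by the index avoids accumulating round-off from repeated addition.
    return DoubleArray(numBins + 1) { i -> lowerLimit + i * width }
}
```

Calling equalWidthBreakPoints(0.0, 4, 0.5) yields the five break points 0.0, 0.5, 1.0, 1.5, 2.0, which define four bins.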
If any presented value is Double.NaN, then the value is counted as missing and the observation is not tallied towards the total number of observations. Underflow and overflow counts also do not count towards the total number of observations.
Statistics are also automatically collected on the collected observations. The statistics do not include missing, underflow, and overflow observations. Statistics are only computed on those observations that were placed (counted) within some bin.
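The counting rules above can be summarized in a minimal standalone tally. This is a sketch under the stated rules, not the class documented here; TallySketch and its members are hypothetical names, and only Double.NaN is treated as missing.

```kotlin
// Hypothetical class for illustration; it mirrors the counting rules described above.
class TallySketch(private val breakPoints: DoubleArray) {
    init {
        require(breakPoints.size >= 2) { "this sketch needs at least two break points" }
        require(breakPoints.asList().zipWithNext().all { (a, b) -> a < b }) {
            "break points must be strictly increasing"
        }
    }

    val binCounts = IntArray(breakPoints.size - 1)
    var missing = 0
        private set
    var underflow = 0
        private set
    var overflow = 0
        private set
    var count = 0      // observations placed in some bin
        private set
    private var sum = 0.0

    fun collect(x: Double) {
        when {
            x.isNaN() -> missing++
            x < breakPoints.first() -> underflow++
            x >= breakPoints.last() -> overflow++
            else -> {
                // Last break point that is <= x gives the bin [b(i), b(i+1)).
                val i = breakPoints.indexOfLast { it <= x }
                binCounts[i]++
                count++
                sum += x   // statistics use binned observations only
            }
        }
    }

    val average: Double
        get() = if (count > 0) sum / count else Double.NaN
}
```

Presenting -1.0, 0.5, 1.5, 1.5, 2.0, and Double.NaN to a tally with break points 0, 1, 2 gives a count of 3, with one underflow, one overflow, and one missing value left out of the average.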
Parameters
the break points for the histogram, must be strictly increasing
an optional name for the histogram
Properties
Fills up an array with the statistics defined by this interface:
statistics[0] = getCount()
statistics[1] = getAverage()
statistics[2] = getStandardDeviation()
statistics[3] = getStandardError()
statistics[4] = getHalfWidth()
statistics[5] = getConfidenceLevel()
statistics[6] = getMin()
statistics[7] = getMax()
statistics[8] = getSum()
statistics[9] = getVariance()
statistics[10] = getDeviationSumOfSquares()
statistics[11] = getLastValue()
statistics[12] = getKurtosis()
statistics[13] = getSkewness()
statistics[14] = getLag1Covariance()
statistics[15] = getLag1Correlation()
statistics[16] = getVonNeumannLag1TestStatistic()
statistics[17] = getNumberMissing()
Returns an array of Bins based on the current state of the histogram
Returns a List of Bins based on the current state of the histogram
A confidence interval for the mean based on the confidence level
Holds the confidence coefficient for the statistic
The header string for the CSV representation
A simple estimate of the "density" function for each bin, computed as the bin fraction divided by the bin width. The bin width must be constant across the bins and not equal to 0.0.
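The density estimate described above can be sketched as follows. The function is a standalone illustration with hypothetical names, and it assumes the bin fraction is taken relative to the observations tabulated in the bins.

```kotlin
// Hypothetical helper for illustration; not a documented member.
// density[i] = (binCounts[i] / total binned count) / binWidth
fun densityEstimates(binCounts: IntArray, binWidth: Double): DoubleArray {
    require(binWidth > 0.0) { "the common bin width must be positive" }
    val total = binCounts.sum()
    require(total > 0) { "at least one observation must be binned" }
    return DoubleArray(binCounts.size) { i ->
        (binCounts[i].toDouble() / total) / binWidth
    }
}
```

For counts 1 and 3 with a bin width of 0.5, the fractions are 0.25 and 0.75 and the density estimates are 0.5 and 1.5. The estimates multiplied by the width sum to 1.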
Gets the sum of squares of the deviations from the average. This is the numerator in the classic sample variance formula.
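The relationship between the deviation sum of squares and the sample variance can be shown with a short two-pass sketch. The function names are hypothetical, and the library may well compute these quantities incrementally rather than from a stored array.

```kotlin
// Hypothetical helpers for illustration; not documented members.
// Sum over i of (x_i - average)^2.
fun deviationSumOfSquares(data: DoubleArray): Double {
    require(data.isNotEmpty()) { "need at least one observation" }
    val avg = data.average()
    return data.sumOf { (it - avg) * (it - avg) }
}

// Classic sample variance: deviation sum of squares divided by (n - 1).
fun sampleVariance(data: DoubleArray): Double {
    require(data.size >= 2) { "need at least two observations" }
    return deviationSumOfSquares(data) / (data.size - 1)
}
```

For the data 1, 2, 3, 4 the average is 2.5, the deviation sum of squares is 5.0, and the sample variance is 5.0 / 3.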
Lower limit of first histogram bin.
Gets the lag-1 sample correlation of the unweighted observations. Note: See Box, Jenkins, Reinsel, Time Series Analysis, 3rd edition, Prentice-Hall, pg 31
Gets the lag-1 sample covariance of the unweighted observations. Note: See Box, Jenkins, Reinsel, Time Series Analysis, 3rd edition, Prentice-Hall, pg 31
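For orientation, the textbook lag-1 estimators can be sketched as below. This assumes the common time series convention of dividing both the lag-1 covariance and the lag-0 covariance by n; the normalization actually used by these properties is not stated here and may differ. The function names are hypothetical.

```kotlin
// Hypothetical helpers for illustration; the divisor n is an assumption.
// c1 = (1/n) * sum over i = 0..n-2 of (x_i - avg) * (x_{i+1} - avg)
fun lag1Covariance(data: DoubleArray): Double {
    require(data.size >= 2) { "need at least two observations" }
    val avg = data.average()
    var s = 0.0
    for (i in 0 until data.size - 1) {
        s += (data[i] - avg) * (data[i + 1] - avg)
    }
    return s / data.size
}

// r1 = c1 / c0, where c0 = (1/n) * sum of (x_i - avg)^2
fun lag1Correlation(data: DoubleArray): Double {
    val avg = data.average()
    val c0 = data.sumOf { (it - avg) * (it - avg) } / data.size
    require(c0 > 0.0) { "the observations must not all be equal" }
    return lag1Covariance(data) / c0
}
```

For the data 1, 2, 3, 4 this convention gives a lag-1 covariance of 0.3125 and a lag-1 correlation of 0.25.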
Upper limit of last histogram bin.
Counts the number of observations that were negative, strictly less than zero.
Used to count the number of missing data points presented. When a data point having the value Double.NaN, Double.POSITIVE_INFINITY, or Double.NEGATIVE_INFINITY is presented, it is excluded from the summary statistics and the number of missing points is noted. Implementers of subclasses are responsible for properly collecting this value and resetting this value.
Counts of values located above last bin.
Counts the number of observations that were positive, strictly greater than zero.
Returns the relative error: getStandardError() / getAverage()
Returns the relative width of the default confidence interval: 2.0 * getHalfWidth() / getAverage()
Gets the sample standard deviation of the observations. Simply the square root of variance
Gets the standard error of the observations. Simply the sample standard deviation divided by the square root of the number of observations.
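A short sketch of the standard deviation and standard error definitions follows. It computes both from a stored array for clarity; the function names are hypothetical and this is not how the library necessarily accumulates them.

```kotlin
import kotlin.math.sqrt

// Hypothetical helpers for illustration; not documented members.
// Square root of the sample variance (divisor n - 1).
fun sampleStandardDeviation(data: DoubleArray): Double {
    require(data.size >= 2) { "need at least two observations" }
    val avg = data.average()
    val devSS = data.sumOf { (it - avg) * (it - avg) }
    return sqrt(devSS / (data.size - 1))
}

// Standard deviation divided by the square root of the number of observations.
fun standardError(data: DoubleArray): Double =
    sampleStandardDeviation(data) / sqrt(data.size.toDouble())
```

For the data 1, 2, 3, 4, 5 the sample variance is 2.5, so the standard deviation is the square root of 2.5 and the standard error is the square root of 0.5.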
Fills the map with the values of the statistics. Key is statistic label and value is the value of the statistic. The keys are: "Count", "Average", "Standard Deviation", "Standard Error", "Half-width", "Confidence Level", "Lower Limit", "Upper Limit", "Minimum", "Maximum", "Sum", "Variance", "Deviation Sum of Squares", "Kurtosis", "Skewness", "Lag 1 Covariance", "Lag 1 Correlation", "Von Neumann Lag 1 Test Statistic", "Number of missing observations"
Total number of observations collected including overflow and underflow
Counts of values located below first bin.
Gets the Von Neumann Lag 1 test statistic for checking the hypothesis that the data are uncorrelated. Note: See Handbook of Simulation, Jerry Banks editor, McGraw-Hill, pg 253.
Returns the asymptotic p-value for the Von Neumann Lag-1 Test Statistic.
Functions
Allows the adding (attaching) of an observer to the observable
The bin that x falls in. The bin is a copy. It will not reflect observations collected after this call.
Returns an instance of a Bin for the supplied bin number. The bin does not reflect changes to the histogram after this call. May throw IndexOutOfBoundsException.
Returns the fraction of the data relative to those tabulated in the bins for the bin number associated with the value x
Returns the fraction of the data relative to those tabulated in the bins for the supplied bin number
Returns the probability for each bin of the histogram based on a continuous interval interpretation of the bin. The distribution cdf must implement the ContinuousDistributionIfc interface.
Returns the probability for each bin of the histogram based on an open integer range interpretation of the bin. The discrete distribution discreteCDF must implement the ProbInRangeIfc interface.
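For the continuous interpretation, the probability of a bin is the CDF at its upper limit minus the CDF at its lower limit. The sketch below shows that idea with a plain Kotlin function type standing in for the distribution; it does not use ContinuousDistributionIfc, and the function names are hypothetical. The integer range interpretation for discrete distributions is not shown.

```kotlin
import kotlin.math.exp

// Hypothetical helper for illustration; a function type stands in for the distribution.
// P(bin i) = F(b(i+1)) - F(b(i)) for the bin [b(i), b(i+1)).
fun binProbabilities(breakPoints: DoubleArray, cdf: (Double) -> Double): DoubleArray {
    require(breakPoints.size >= 2) { "need at least two break points" }
    return DoubleArray(breakPoints.size - 1) { i ->
        cdf(breakPoints[i + 1]) - cdf(breakPoints[i])
    }
}

// Example CDF: exponential distribution with the given mean.
fun exponentialCdf(mean: Double): (Double) -> Double {
    require(mean > 0.0) { "the mean must be positive" }
    return { x -> if (x <= 0.0) 0.0 else 1.0 - exp(-x / mean) }
}
```

When the first break point is at or below the lower end of the distribution's support and the last break point is Double.POSITIVE_INFINITY, the bin probabilities sum to 1.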
Collects on the boolean value true = 1.0, false = 0.0
Collects on the values in the supplied array.
Collects on the double value returned by the function
Collects on the Int value
Collects on the Long value
Collects on all the values in the supplied collection.
Collects on the values returned by the supplied GetValueIfc
Collects on the supplied value. Double.NaN, Double.NEGATIVE_INFINITY, and Double.POSITIVE_INFINITY values are counted as missing. Null values are not permitted.
Returns a negative integer, zero, or a positive integer if this object is less than, equal to, or greater than the specified object.
A confidence interval for the mean based on the confidence level
Return a copy of the information as an instance of a statistic
Returns how many observers are currently attached to the observable
Returns the cumulative count of all bins up to and including the bin containing the value x
Returns the cumulative count of all the bins up to and including the indicated bin number
Returns the cumulative fraction of the data up to and including the bin containing the value of x
Returns the cumulative fraction of the data up to and including the indicated bin number
Returns the cumulative count of all the data (including underflow and overflow) for all bins up to and including the bin containing x
Returns the cumulative count of all the data (including underflow and overflow) up to and including the indicated bin
Returns the cumulative fraction of all the data up to and including the bin containing the value x (includes overflow and underflow)
Returns the cumulative fraction of all the data up to and including the supplied bin (includes over and under flow)
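The two families of cumulative functions differ in whether underflow and overflow take part. The sketch below is one reading of that difference, offered as an assumption: underflow lies below every bin, so it is added to every cumulative count, while overflow lies above every bin and only enlarges the denominator of the fractions. The function names are hypothetical.

```kotlin
// Hypothetical helpers for illustration; the handling of underflow and
// overflow reflects an assumed interpretation, not confirmed behavior.
fun cumulativeCounts(binCounts: IntArray, underflow: Int = 0): IntArray {
    val result = IntArray(binCounts.size)
    var running = underflow   // underflow precedes the first bin
    for (i in binCounts.indices) {
        running += binCounts[i]
        result[i] = running
    }
    return result
}

fun cumulativeFractions(binCounts: IntArray, underflow: Int = 0, overflow: Int = 0): DoubleArray {
    val total = underflow + binCounts.sum() + overflow
    require(total > 0) { "at least one observation is required" }
    return cumulativeCounts(binCounts, underflow)
        .map { it.toDouble() / total }
        .toDoubleArray()
}
```

With bin counts 1, 2, 3 the bin-only cumulative counts are 1, 3, 6. Adding 2 underflow and 2 overflow observations gives cumulative counts 3, 5, 8 and cumulative fractions 0.3, 0.5, 0.8 out of 10 observations.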
Detaches all the observers from the observable
Allows the deletion (removing) of an observer from the observable
Returns the expected count for each bin of the histogram based on a continuous interval interpretation of the bin. The distribution cdf must implement the ContinuousDistributionIfc interface.
Returns the expected count for each bin of the histogram based on an open integer range interpretation of the bin. The discrete distribution discreteCDF must implement the ProbInRangeIfc interface.
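An expected count is conventionally the number of observations multiplied by the bin probability, as in a chi-squared goodness of fit comparison. The sketch below assumes that convention; which observation count the library uses (binned only, or including underflow and overflow) is not stated here. The function name is hypothetical.

```kotlin
// Hypothetical helper for illustration; assumes expected = n * P(bin).
fun expectedCounts(binProbabilities: DoubleArray, numObservations: Int): DoubleArray {
    require(numObservations >= 0) { "the number of observations cannot be negative" }
    require(binProbabilities.all { it in 0.0..1.0 }) { "probabilities must lie in [0, 1]" }
    return DoubleArray(binProbabilities.size) { i -> numObservations * binProbabilities[i] }
}
```

For bin probabilities 0.25, 0.25, 0.5 and 40 observations, the expected counts are 10, 10, and 20.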
The data of the histogram bins
Creates a plot for the histogram. The parameter proportions indicates whether proportions (true) or frequencies (false) will be shown on the plot. The default is true.
Returns true if the observer is already attached
Computes the right most meaningful digit according to (int)Math.floor(Math.log10(a*getStandardError())). See doi 10.1287/opre.1080.0529 by Song and Schmeiser.
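The formula above translates directly into Kotlin. In this sketch the standard error is passed in as a plain number and the multiplier a defaults to 1.0; both the function name and that default are assumptions made for the example.

```kotlin
import kotlin.math.floor
import kotlin.math.log10

// Hypothetical helper for illustration; the default multiplier is an assumption.
// Returns floor(log10(multiplier * standardError)) as an Int.
fun rightMostMeaningfulDigit(standardError: Double, multiplier: Double = 1.0): Int {
    require(standardError > 0.0) { "the standard error must be positive" }
    require(multiplier > 0.0) { "the multiplier must be positive" }
    return floor(log10(multiplier * standardError)).toInt()
}
```

The result is a power of ten. A standard error of 0.0123 gives -2, pointing at the hundredths place, while a standard error of 45.0 gives 1, pointing at the tens place.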
Returns the relative width of the level of the confidence interval: 2.0 * getHalfWidth(level) / getAverage()
Returns a data class holding the statistical data with the confidence interval specified by the given level.
Returns a data class holding the statistical data with the confidence interval specified by the given level. The class is suitable for inserting into a database table.
Converts the histogram bin data into a dataframe representation
Converts a statistic to a data frame with two columns. The first column holds the names of the statistics and the second column holds the values. The valueLabel can be used to provide a column name for the value column. By default, it is "Value".