Jive Forums API (5.5.20.2-oracle) Developer Javadocs

com.jivesoftware.base.stats
Class Histogram

java.lang.Object
  extended by com.jivesoftware.base.stats.Histogram
All Implemented Interfaces:
java.lang.Cloneable

public class Histogram
extends java.lang.Object
implements java.lang.Cloneable

A Histogram is used to obtain a set of counts for a set of elements. Elements are fed into the Histogram using the add() method. The add() method uses the element to maintain numerical statistics such as the element mean and variance (when the element can be meaningfully converted to a number) and to keep track of the minimum and maximum element added. Each Histogram has a BinSequence object that is used to put elements into Bins. The default BinSequence assigns each unique element to its own Bin. The add() method calls BinSequence.getBin() for each element added, obtains the Bin to which the element is to be assigned, and updates the count for that Bin.
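
The add() flow described above can be sketched in plain Java. This is a minimal illustration, not the Jive implementation: the class UniqueBinCounter and all of its members are hypothetical, elements are assumed to be plain long values, and a sorted map keyed by element stands in for the default one-bin-per-unique-element behavior.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the add() flow; not part of the Jive API.
class UniqueBinCounter {
    // One "bin" per distinct element, kept in sorted order.
    private final Map<Long, Long> counts = new TreeMap<>();
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long n = 0;
    private double sum = 0;

    void add(long element) {
        // Look up the element's bin and increment its count by 1.
        counts.merge(element, 1L, Long::sum);
        // Maintain the running statistics alongside the counts.
        min = Math.min(min, element);
        max = Math.max(max, element);
        n++;
        sum += element;
    }

    long count(long element) { return counts.getOrDefault(element, 0L); }
    long min() { return min; }
    long max() { return max; }
    double mean() { return sum / n; }
    int binCount() { return counts.size(); }
}
```

Adding the values 3, 7, 3, 5 to this sketch leaves three bins, with the bin for 3 holding a count of 2.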

The Histogram provides a variety of summary statistics, including means, standard deviations, medians, and modes, though some statistics are available only for elements that can be converted into a numerical value through the Element.toLong() method (non-numerical elements, such as StringElements, have toLong() methods that always return null).

The Histogram makes the bin counts available in several different forms. The raw counts are available through the toArray() function, which returns the counts in an array of longs. The corresponding Bins can be obtained via the getBins() method. In addition, the Bins can be regrouped. For example, a set of bins covering the intervals [0,2), [2,4), [4,6), [6,8) can be recombined into the intervals [0,4), [4,8) by passing in a new set of bins.
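
The regrouping example above can be sketched as follows. This is an assumption-laden illustration rather than the behavior of toArray(Bin[]): the class Regroup, the edge-array representation of half-open intervals, and the sample counts are all hypothetical.

```java
// Hypothetical helper; bins are represented by sorted edge arrays, not Jive Bin objects.
class Regroup {
    // Fine bin i covers [fineEdges[i], fineEdges[i + 1]); likewise for coarseEdges.
    static long[] regroup(long[] fineEdges, long[] fineCounts, long[] coarseEdges) {
        long[] out = new long[coarseEdges.length - 1];
        for (int i = 0; i < fineCounts.length; i++) {
            long lo = fineEdges[i];
            long hi = fineEdges[i + 1];
            boolean placed = false;
            for (int j = 0; j < out.length; j++) {
                // A fine bin is moved only if it lies entirely inside a coarse bin.
                if (coarseEdges[j] <= lo && hi <= coarseEdges[j + 1]) {
                    out[j] += fineCounts[i];
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                throw new IllegalArgumentException(
                    "bin [" + lo + "," + hi + ") does not fit inside any new bin");
            }
        }
        return out;
    }
}
```

With edges {0, 2, 4, 6, 8}, illustrative counts {1, 2, 3, 4}, and new edges {0, 4, 8}, the sketch returns {3, 7}.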

Elements can be grouped into bins at two points during the data gathering process. First, they can be grouped as they are added to the histogram via a BinSequence object passed in at construction. Alternatively, they can be grouped after all elements have been added via the toArray(Bin[]) method. The decision on which method to use depends on performance vs. accuracy tradeoffs. Grouping at addition time ensures that the number of bins for which counts are maintained will be small. This reduces the memory required to store the counts, and it also reduces the amount of time required to look up bin counts as new elements are added. The tradeoff is that the accuracy of certain functions is reduced. For example, the median function returns the Bin containing the median element rather than returning the median element itself. Similarly, the functions that obtain the top counts give access to the Bins with the top counts, not the elements. If coarse bins are used rather than bins containing single elements, you will not have access to the median or top 10 or bottom 10 elements.

One way to manage this tradeoff is to use filters to reduce the precision of the elements added to the histogram to the minimum needed for the task at hand. For example, if you are interested in the length of time taken to respond to a posting, you might want to reduce the precision of the elements coming in from milliseconds to seconds. Or you might want to limit the precision of the time to one or two significant figures. That way, the most important information is kept in the Histogram, but the reduction in precision limits the proliferation of Bins created.
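
A precision-reducing filter of the kind described above might look like the following sketch. The class PrecisionFilter and its methods are hypothetical and are not part of the Jive API. It assumes non-negative long values (such as response times in milliseconds) and does not guard against overflow for values near Long.MAX_VALUE.

```java
// Hypothetical filter helpers applied to a value before it is added to a histogram.
class PrecisionFilter {
    // Reduce millisecond precision to whole seconds (fractions are truncated).
    static long millisToSeconds(long millis) {
        return millis / 1000;
    }

    // Round a non-negative value to the given number of significant figures.
    static long toSignificantFigures(long value, int digits) {
        if (value < 0 || digits < 1) {
            throw new IllegalArgumentException("value must be >= 0 and digits >= 1");
        }
        long limit = 1;
        for (int i = 0; i < digits; i++) {
            limit *= 10; // smallest number with more than `digits` digits
        }
        long scale = 1;
        while (value / scale >= limit) {
            scale *= 10; // strip low-order digits until only `digits` remain
        }
        // Round half up at the chosen scale, then restore the magnitude.
        return (value + scale / 2) / scale * scale;
    }
}
```

Values such as 12301, 12345, and 12499 all collapse to 12000 at two significant figures, so they would share a single bin instead of creating three.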

See Also:
Element, Bin, BinSequence

Constructor Summary
Histogram()
          Create a histogram that uses the DefaultBinSequence, which creates a bin for each unique element.
Histogram(BinSequence binSequence)
          Create a histogram that uses the specified binSequence.getBin() method to put incoming elements into bins as they are added.
Histogram(BinSequence binSequence, Element minElement, Element maxElement)
          Create a histogram that uses the specified binSequence.getBin() method to put incoming elements into bins as they are added.
 
Method Summary
 void add(Element element)
          Add a new element to the histogram, incrementing its bin's count by 1.
 void add(Element element, long increment)
          Add a new element to the histogram.
 java.lang.Object clone()
           
 Bin[] getBins()
           
 BinSequence getBinSequence()
          Get the histogram's BinSequence
 long getCount(Bin bin)
          Get the count of elements contained in the specified Bin.
 Histogram getCountHistogram(BinSequence binSequence, LongElement minElement, LongElement maxElement)
          Get a Histogram of the counts for the current histogram.
 Histogram getHistogram(BinSequence binSequence, Element minElement, Element maxElement)
          Get a new Histogram with the given set of bins.
 Bin[] getLargestNBins(int n)
           
 long getMaxCount()
          Get the maximum count
 Bin getMaxCountBin()
          Get the bin with the largest count.
 Element getMaxElement()
          Get the maximum element added to the histogram.
 double getMeanCount()
          The average bin count (includes non-zero counts only)
 double getMeanElement()
          Get the average of the elements that have been added to the histogram.
 Bin getMedianBin()
          Get the bin containing the median element.
 long getMedianCount()
          Get the median count.
 Element getMinElement()
          Get the minimum element added to the histogram.
 long getNBin()
          The number of non-empty bins in the histogram.
 long getNElement()
          The number of elements that have been added to the histogram.
 Bin[] getSmallestNBins(int n)
           
 double getStdDevCount()
          The standard deviation of the bin counts (includes non-zero counts only)
 double getStdDevElement()
          Get the standard deviation of the elements that have been added to the histogram.
 double getSumCount()
          The sum of the increments added to bins via the add() method.
 double getSumElement()
          Get the sum of all elements that have been added to the histogram.
 double getSumSquaredCount()
          The sum of all bin counts squared.
 double getSumSquaredElement()
          Get the sum of the squares of all elements that have been added to the histogram.
 double getVarianceCount()
          The variance of the bin counts (includes non-zero counts only)
 double getVarianceElement()
          Get the variance of the elements that have been added to the histogram.
 void setCount(Bin bin, long count)
           
 long[] toArray()
           
 long[] toArray(Bin[] bins)
           
 java.lang.String toString()
          Convert a Histogram to a String containing diagnostic information.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Histogram

public Histogram(BinSequence binSequence,
                 Element minElement,
                 Element maxElement)
Create a histogram that uses the specified binSequence.getBin() method to put incoming elements into bins as they are added. If no binSequence is specified, use the DefaultBinSequence, which creates a bin for each unique element.

Parameters:
binSequence - The BinSequence used to place incoming elements into bins.

Histogram

public Histogram(BinSequence binSequence)
Create a histogram that uses the specified binSequence.getBin() method to put incoming elements into bins as they are added. If no binSequence is specified, use the DefaultBinSequence, which creates a bin for each unique element.

Parameters:
binSequence - The BinSequence used to place incoming elements into bins.

Histogram

public Histogram()
Create a histogram that uses the DefaultBinSequence, which creates a bin for each unique element, to put incoming elements into bins as they are added.

Method Detail

getBinSequence

public BinSequence getBinSequence()
Get the histogram's BinSequence


add

public void add(Element element,
                long increment)
Add a new element to the histogram. Increment the count for the bin given by binSequence.getBin(element).

Parameters:
element - The element to be added.
increment - The amount by which to increment the element's bin's count.

add

public void add(Element element)
Add a new element to the histogram in the specified bin. Increment the bin's count by 1.

Parameters:
element - The element whose count should be incremented.

getNElement

public long getNElement()
The number of elements that have been added to the histogram. Note that this is the total number of elements, not the number of distinct elements. The element count is incremented on every call to add(). If add() is always called with an increment of 1, getNElement() returns the same value as getSumCount().

Returns:
The number of elements added to the histogram.
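
The distinction between the element count and the sum of increments can be illustrated with a small sketch. The class ElementVsCountTotals is hypothetical. It only mirrors the bookkeeping described above and ignores bins entirely.

```java
// Hypothetical bookkeeping sketch; not the Jive Histogram.
class ElementVsCountTotals {
    long nElement = 0; // incremented once per add() call
    long sumCount = 0; // total of all increments passed to add()

    void add(long increment) {
        nElement++;
        sumCount += increment;
    }
}
```

After calls with increments 1, 1, and 5, nElement is 3 while sumCount is 7. The two totals agree only when every increment is 1.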

getSumElement

public double getSumElement()
Get the sum of all elements that have been added to the histogram. Each element is converted to a long via its Element.toLong() method.

Returns:
The sum of the element values.

getSumSquaredElement

public double getSumSquaredElement()
Get the sum of the squares of all elements that have been added to the histogram. Each element is converted to a long via its Element.toLong() method.

Returns:
The sum of the squares of the element values.

getMeanElement

public double getMeanElement()
Get the average of the elements that have been added to the histogram. Each element is converted to a long via its Element.toLong() method.

Returns:
The mean of the element values.

getVarianceElement

public double getVarianceElement()
Get the variance of the elements that have been added to the histogram. Each element is converted to a long via its Element.toLong() method.

Returns:
The variance of the element values.

getStdDevElement

public double getStdDevElement()
Get the standard deviation of the elements that have been added to the histogram. Each element is converted to a long via its Element.toLong() method.

Returns:
The standard deviation of the element values.
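
One common way to derive the mean, variance, and standard deviation from a running sum and sum of squares is sketched below. The class RunningMoments is hypothetical, and the population form of the variance (dividing by n) is an assumption, since this page does not say whether the library uses the population or the sample formula.

```java
// Hypothetical sketch: statistics from running totals of converted element values.
class RunningMoments {
    private long n;
    private double sum;
    private double sumSq;

    void add(long value) {
        n++;
        sum += value;
        sumSq += (double) value * value;
    }

    double mean() {
        return sum / n;
    }

    // Population variance: E[x^2] - (E[x])^2. Assumed form, see the note above.
    double variance() {
        double m = mean();
        return sumSq / n - m * m;
    }

    double stdDev() {
        return Math.sqrt(variance());
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the sum is 40 and the sum of squares is 232, giving a mean of 5, a variance of 4, and a standard deviation of 2.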

getMaxElement

public Element getMaxElement()
Get the maximum element added to the histogram.

Returns:
The maximum element.

getMinElement

public Element getMinElement()
Get the minimum element added to the histogram.

Returns:
The minimum element.

getNBin

public long getNBin()
The number of non-empty bins in the histogram.

Returns:
The number of non-empty bins added to the histogram.

getSumCount

public double getSumCount()
The sum of the increments added to bins via the add() method. If all increments are 1, this value will be the same as getNElement().

Returns:
The total of all increments added via the add() method.

getSumSquaredCount

public double getSumSquaredCount()
The sum of all bin counts squared.

Returns:
The sum of the squares of the bin counts.

getMeanCount

public double getMeanCount()
The average bin count (includes non-zero counts only)

Returns:
The average bin count.

getVarianceCount

public double getVarianceCount()
The variance of the bin counts (includes non-zero counts only)

Returns:
The variance of the bin counts.

getStdDevCount

public double getStdDevCount()
The standard deviation of the bin counts (includes non-zero counts only)

Returns:
The standard deviation of the bin counts.

getMaxCount

public long getMaxCount()
Get the maximum count

Returns:
The largest count among the histogram's bins.

getMaxCountBin

public Bin getMaxCountBin()
Get the bin with the largest count.

Returns:
The bin containing the largest count.

getMedianCount

public long getMedianCount()
Get the median count.


getMedianBin

public Bin getMedianBin()

Get the bin containing the median element. If there is an odd number of elements, numbered 1...2N+1, returns the bin containing element N. If there is an even number of elements, numbered 1...2N, returns the bin containing element N. (Because we do not know a priori that the elements support an addition operation, we cannot necessarily average elements N and N+1 in the even case.)

Returns:
The bin containing the median element.
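
A median bin can be located by walking the bins in order and accumulating counts, as in the sketch below. The class MedianBin is hypothetical, and bins are represented by a sorted map from a bin's lower bound to its count. The sketch targets the 1-based position (total + 1) / 2, which matches the even-count rule above and the usual convention for odd counts. The library's exact indexing for odd counts is not confirmed here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; keys are bin lower bounds in sorted order, values are counts.
class MedianBin {
    static long medianBinKey(TreeMap<Long, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalStateException("empty histogram");
        }
        long target = (total + 1) / 2; // 1-based position of the (lower) median
        long seen = 0;
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= target) {
                return e.getKey(); // first bin whose cumulative count reaches the target
            }
        }
        throw new AssertionError("unreachable: cumulative count never reached target");
    }
}
```

Only the bin is recovered. If the bin is coarse, the exact median element inside it is not available, which is the accuracy tradeoff noted in the class description.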

getBins

public Bin[] getBins()

getCount

public long getCount(Bin bin)
Get the count of elements contained in the specified Bin. Note that the bin passed in does not have to be a bin in the histogram. Each bin in the histogram is tested to see if it is contained in the bin passed in. For example, if the histogram contains bins [0,2), [2,4), [4,6), each with count 3, getCount( [2,6) ) would return 6. Note, however, that getCount( [1,5) ) will throw an exception. Because the histogram contains bins that partially overlap with [1,5), we don't know what portion of the counts for the partially overlapping bins to assign to [1,5).

Parameters:
bin - The bin for which the count should be determined.
Returns:
The count for the specified bin.
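
The containment rule described above can be sketched with half-open intervals. The class IntervalCount and its edge-array representation are hypothetical, and the exception type is an assumption, since this page does not name the exception that getCount() throws.

```java
// Hypothetical sketch of counting by containment; not the Jive getCount() implementation.
class IntervalCount {
    // Histogram bin i covers [edges[i], edges[i + 1]); the query bin is [qLo, qHi).
    static long count(long[] edges, long[] counts, long qLo, long qHi) {
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            long lo = edges[i];
            long hi = edges[i + 1];
            boolean contained = qLo <= lo && hi <= qHi;
            boolean disjoint = hi <= qLo || qHi <= lo;
            if (contained) {
                total += counts[i];
            } else if (!disjoint) {
                // Partial overlap: the share of this bin's count is unknowable.
                throw new IllegalArgumentException(
                    "bin [" + lo + "," + hi + ") partially overlaps the query");
            }
        }
        return total;
    }
}
```

With edges {0, 2, 4, 6} and counts {3, 3, 3}, the query [2,6) yields 6, while the query [1,5) throws because it cuts through [0,2) and [4,6).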

setCount

public void setCount(Bin bin,
                     long count)

getLargestNBins

public Bin[] getLargestNBins(int n)

getSmallestNBins

public Bin[] getSmallestNBins(int n)

toArray

public long[] toArray()

toArray

public long[] toArray(Bin[] bins)

getHistogram

public Histogram getHistogram(BinSequence binSequence,
                              Element minElement,
                              Element maxElement)
Get a new Histogram with the given set of bins. All counts are regrouped into the new bins. Summary statistics are adjusted accordingly.


getCountHistogram

public Histogram getCountHistogram(BinSequence binSequence,
                                   LongElement minElement,
                                   LongElement maxElement)
Get a Histogram of the counts for the current histogram. Say we have a histogram with element A, count 5; element B, count 10; element C, count 5. If we pass in bins [1-5], [6-10], we should get out counts of 2 and 1, respectively (2 for A and C and 1 for B). The Elements passed in must be LongElements.
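
The A, B, C example above can be reproduced with a short sketch. The class CountOfCounts is hypothetical. It uses a plain map of element names to counts and closed integer ranges in place of the library's Histogram, BinSequence, and LongElement types.

```java
import java.util.Map;

// Hypothetical sketch of a "histogram of counts"; not the Jive getCountHistogram().
class CountOfCounts {
    // Each row of closedRanges is {low, high}, both ends inclusive.
    static long[] countHistogram(Map<String, Long> counts, long[][] closedRanges) {
        long[] out = new long[closedRanges.length];
        for (long c : counts.values()) {
            for (int j = 0; j < closedRanges.length; j++) {
                if (closedRanges[j][0] <= c && c <= closedRanges[j][1]) {
                    out[j]++; // one more element whose count falls in this range
                    break;
                }
            }
        }
        return out;
    }
}
```

Given A with count 5, B with count 10, and C with count 5, and the ranges [1-5] and [6-10], the sketch returns {2, 1}.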


toString

public java.lang.String toString()
Convert a Histogram to a String containing diagnostic information.

Overrides:
toString in class java.lang.Object

clone

public java.lang.Object clone()
                       throws java.lang.CloneNotSupportedException
Overrides:
clone in class java.lang.Object
Throws:
java.lang.CloneNotSupportedException


Copyright © 1999-2006 Jive Software.