A summarizer is an object that ingests (double-precision floating-point) values and computes some statistics (summaries) of the ingested values. The following code example creates a `Tally` and feeds it 1000 uniformly distributed pseudo-random values.

```
import java.util.Random;

// Tally comes from the library's summarizers package (its import is not shown here)
Tally tally = new Tally("Example tally");
Random random = new Random(1234); // fixed seed: the same pseudo-random sequence on every run
for (int i = 0; i < 1000; i++) {
    tally.ingest(random.nextDouble()); // uniformly distributed in [0, 1)
}
System.out.println("minimum: " + tally.getMin());
System.out.println("maximum: " + tally.getMax());
System.out.println("count: " + tally.getN());
System.out.println("sum: " + tally.getSum());
System.out.println("sample mean: " + tally.getSampleMean());
System.out.println("sample variance: " + tally.getSampleVariance());
System.out.println("sample standard deviation: " + tally.getSampleStDev());
System.out.println("sample skewness: " + tally.getSampleSkewness());
System.out.println("sample kurtosis: " + tally.getSampleKurtosis());
System.out.println("sample excess kurtosis: " + tally.getSampleExcessKurtosis());
System.out.println("population mean: " + tally.getPopulationMean());
System.out.println("population variance: " + tally.getPopulationVariance());
System.out.println("population standard deviation: " + tally.getPopulationStDev());
System.out.println("population skewness: " + tally.getPopulationSkewness());
System.out.println("population kurtosis: " + tally.getPopulationKurtosis());
System.out.println("population excess kurtosis: " + tally.getPopulationExcessKurtosis());
```

When run, this program outputs something like:

    minimum: 4.463828850445051E-4
    maximum: 0.9993228356687273
    count: 1000
    sum: 487.6875254457159
    sample mean: 0.4876875254457153
    sample variance: 0.0839370407429099
    sample standard deviation: 0.28971889952660995
    sample skewness: 0.03986803012965087
    sample kurtosis: 1.743485723621789
    sample excess kurtosis: -1.2550414677445052
    population mean: 0.4876875254457153
    population variance: 0.08385310370216699
    population standard deviation: 0.28957400384386545
    population skewness: 0.03980820314948127
    population kurtosis: 1.7452309545763656
    population excess kurtosis: -1.2547690454236344

With the same Java runtime environment as we used to run this example, you should get exactly the same output, because the output of a pseudo-random generator is predictable and reproducible. For values drawn uniformly from [0, 1), the mean and the median are expected to be 0.5, the variance 0.083333 (= 1/12), the standard deviation 0.288675 (= √(1/12)), the skewness 0.0, the kurtosis 1.8 and the excess kurtosis -1.2. The differences between these expected values and the observed values are reasonable for a sample size of 1000.
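
To make the idea of a summarizer more concrete, here is a minimal sketch of how such statistics can be maintained without storing the ingested values. The class `RunningMoments` is hypothetical (it is not the library's `Tally`): it keeps the count, minimum, maximum, sum, and the running sums of the second, third and fourth powers of the deviations from the mean, using the well-known one-pass updates of Welford and Terriberry. Whether `Tally` uses the same update formulas is an assumption we do not rely on.

```java
import java.util.Random;

/**
 * Minimal sketch of a streaming summarizer (hypothetical class, not the library's Tally).
 * It never stores the ingested values, only a few running aggregates.
 */
class RunningMoments {
    long n = 0;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum = 0.0;
    double mean = 0.0;
    double m2 = 0.0; // sum of squared deviations from the mean
    double m3 = 0.0; // sum of cubed deviations from the mean
    double m4 = 0.0; // sum of fourth-power deviations from the mean

    /** Updates all aggregates with one value (one-pass update after Welford and Terriberry). */
    void ingest(double x) {
        long n1 = n;
        n++;
        double delta = x - mean;
        double deltaN = delta / n;
        double deltaN2 = deltaN * deltaN;
        double term1 = delta * deltaN * n1;
        mean += deltaN;
        // m4 and m3 are updated before m2 because their updates need the old m2 (and m3)
        m4 += term1 * deltaN2 * ((double) n * n - 3.0 * n + 3.0) + 6.0 * deltaN2 * m2 - 4.0 * deltaN * m3;
        m3 += term1 * deltaN * (n - 2) - 3.0 * deltaN * m2;
        m2 += term1;
        min = Math.min(min, x);
        max = Math.max(max, x);
        sum += x;
    }

    double populationVariance() { return m2 / n; }
    double sampleVariance() { return m2 / (n - 1); }
    double populationSkewness() { return Math.sqrt((double) n) * m3 / Math.pow(m2, 1.5); }
    double populationKurtosis() { return n * m4 / (m2 * m2); }

    public static void main(String[] args) {
        RunningMoments rm = new RunningMoments();
        Random random = new Random(1234); // same seed as the Tally example
        for (int i = 0; i < 1000; i++) {
            rm.ingest(random.nextDouble());
        }
        System.out.println("count: " + rm.n + ", min: " + rm.min + ", max: " + rm.max);
        System.out.println("mean: " + rm.mean + ", sample variance: " + rm.sampleVariance());
        System.out.println("population skewness: " + rm.populationSkewness()
                + ", population kurtosis: " + rm.populationKurtosis());
    }
}
```

Because `java.util.Random` specifies its generator algorithm, seeding it with 1234 yields the same 1000 values as in the `Tally` example, so the printed statistics should be close to those shown above.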

The `getPopulationXXX` methods return results computed as if the ingested values form the entire population. The `getSampleXXX` methods should be used when the ingested values form *just a sample* of the entire population. In the example above, the `Tally` has ingested 1000 values from a population that is practically unlimited in size (strictly speaking, the number of double-precision floating-point values between 0.0 and 1.0 is finite, but a sample of 1000 is nowhere near the entire population). Thus, the `getSampleXXX` methods should be used to summarize the ingested values.
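
For the variance, the only difference between the sample and population variants is the divisor: the population variance divides the sum of squared deviations by n, the sample variance by n - 1 (Bessel's correction). The sketch below (hypothetical helper methods, not the library's code) shows this on a small data set and checks it against the two variances printed by the `Tally` example for n = 1000. The sample corrections for skewness and kurtosis are more involved and are not reproduced here.

```java
/** Sketch: sample (divide by n - 1) versus population (divide by n) variance. */
class VarianceCheck {
    /** Population variance: the mean squared deviation from the mean. */
    static double populationVariance(double[] x) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double ss = 0.0;
        for (double v : x) {
            ss += (v - mean) * (v - mean);
        }
        return ss / x.length;
    }

    /** Sample variance: rescales the population variance by n / (n - 1). */
    static double sampleVariance(double[] x) {
        return populationVariance(x) * x.length / (x.length - 1);
    }

    public static void main(String[] args) {
        double[] x = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        System.out.println(populationVariance(x)); // 4.0
        System.out.println(sampleVariance(x));     // 32 / 7, about 4.5714

        // Values printed by the Tally example above (n = 1000)
        double sample = 0.0839370407429099;
        double population = 0.08385310370216699;
        System.out.println(sample * 999 / 1000 - population); // approximately 0
    }
}
```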

Skewness is a measure of the asymmetry of a distribution. The skewness of a symmetric distribution is `0.0`. A negative skewness indicates that the distribution has a longer tail on the left; a positive skewness indicates a longer tail on the right.
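
The sign convention can be checked with a small sketch that computes the population skewness m3 / m2^(3/2) from the central moments (a hypothetical helper, not the library's code): a symmetric data set gives 0, a long right tail gives a positive value and a long left tail a negative value.

```java
/** Sketch: population skewness m3 / m2^(3/2), computed from the central moments. */
class SkewnessSign {
    static double skewness(double[] x) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double m2 = 0.0;
        double m3 = 0.0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= x.length;
        m3 /= x.length;
        return m3 / Math.pow(m2, 1.5);
    }

    public static void main(String[] args) {
        System.out.println(skewness(new double[] {1, 2, 3, 4, 5}));   // 0.0: symmetric
        System.out.println(skewness(new double[] {1, 1, 1, 10}));     // positive: long right tail
        System.out.println(skewness(new double[] {-10, -1, -1, -1})); // negative: long left tail
    }
}
```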

Kurtosis is a measure of the tailedness of a distribution. The kurtosis of a normally distributed population is `3.0`; the kurtosis of a uniformly distributed population is `1.8`. Larger values of the kurtosis indicate that a distribution has heavier (longer) tails. A large kurtosis of observed sample values is often caused by the presence of outliers in the sample.
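
These kurtosis values can be checked empirically. The sketch below (a hypothetical helper, not the library's code) computes the population kurtosis m4 / m2^2 of one million uniformly and one million normally distributed pseudo-random values, and then shows how a single outlier inflates the kurtosis. The seed 42 is arbitrary.

```java
import java.util.Random;

/** Sketch: population kurtosis m4 / m2^2 of large uniform and normal samples. */
class KurtosisDemo {
    static double kurtosis(double[] x) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double m2 = 0.0;
        double m4 = 0.0;
        for (double v : x) {
            double d2 = (v - mean) * (v - mean);
            m2 += d2;
            m4 += d2 * d2;
        }
        m2 /= x.length;
        m4 /= x.length;
        return m4 / (m2 * m2);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int n = 1_000_000;
        double[] uniform = new double[n];
        double[] normal = new double[n];
        for (int i = 0; i < n; i++) {
            uniform[i] = random.nextDouble();
            normal[i] = random.nextGaussian();
        }
        System.out.println("uniform: " + kurtosis(uniform)); // close to 1.8
        System.out.println("normal:  " + kurtosis(normal));  // close to 3.0

        // One outlier among a million uniform values drives the kurtosis far above 1.8
        uniform[0] = 100.0;
        System.out.println("uniform with one outlier: " + kurtosis(uniform));
    }
}
```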

Excess kurtosis is defined as `kurtosis minus 3`. This makes the excess kurtosis of normally distributed values `0.0` and the excess kurtosis of uniformly distributed values `-1.2`.
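
For reference, these are the usual moment-based (population) definitions, where m_k is the k-th central moment of the n ingested values. The sample versions returned by the `getSampleXXX` methods apply bias corrections that are not spelled out here. For the uniform distribution on [0, 1), taking m_k as the population central moments, m_2 = 1/12 and m_4 = 1/80, which reproduces the kurtosis of 1.8 and excess kurtosis of -1.2 quoted above.

$$
m_k = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^k,\qquad
\text{skewness} = \frac{m_3}{m_2^{3/2}},\qquad
\text{kurtosis} = \frac{m_4}{m_2^{2}},\qquad
\text{excess kurtosis} = \frac{m_4}{m_2^{2}} - 3
$$

$$
X \sim U(0,1):\quad m_2 = \tfrac{1}{12},\quad m_4 = \tfrac{1}{80},\quad
\text{kurtosis} = \frac{1/80}{(1/12)^2} = \frac{144}{80} = 1.8,\quad
\text{excess kurtosis} = 1.8 - 3 = -1.2
$$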

A box plot is a nice way to graph the minimum, first quartile, median, third quartile and maximum of a distribution. As shown above, the `Tally` collects the minimum and maximum values. The quartiles and the median can only be approximated by the `Tally`. This is done by calling the `getQuantile` method:

```
System.out.println("first quartile: " + tally.getQuantile(0.25));
System.out.println("median: " + tally.getQuantile(0.5));
System.out.println("third quartile: " + tally.getQuantile(0.75));
```

The output of these extra statements is something like:

    first quartile: 0.29227267857053363
    median: 0.4876875254457153
    third quartile: 0.683102372320897

The expected values for these quantiles are `0.25`, `0.5` and `0.75`; so what is going on? By default, the `Tally` assumes that the values are normally distributed and estimates these quantiles from the observed mean and standard deviation; this is why the reported median equals the sample mean. (This approximation could be improved by also taking the skewness and kurtosis of the data into account.) In our case, the values that were fed to the `Tally` were uniformly distributed, and the difference is rather striking. To fix this, we should construct the `Tally` with a suitable quantile accumulator. The `quantileaccumulator` package offers a few to choose from:

- `NoStorageAccumulator`: this requires no memory for the ingested values, and it is the one used when no specific quantile accumulator is specified at `Tally` construction time, as in the example above.
- `FullStorageAccumulator`: as the name suggests, this stores all ingested values and may require more memory than is available when very many values are ingested.
- `TDigestAccumulator`: this one is the most complex, but it strikes a good balance between memory use and accuracy.
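
The difference between the first two strategies can be illustrated without the library. In the sketch below, the helpers are hypothetical: `normalApproximation` derives a quantile from the mean and standard deviation only, in the spirit of the default behavior described above, while `exactQuantile` sorts all stored values, in the spirit of a full-storage accumulator. The normal inverse CDF is computed numerically (Simpson integration plus bisection) because the JDK has no built-in for it, and linear interpolation between order statistics is just one of several common quantile definitions; the library may use a different one.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch contrasting a storage-free (normal) quantile estimate with an exact one. */
class QuantileStrategies {

    static double density(double t) {
        return Math.exp(-0.5 * t * t) / Math.sqrt(2.0 * Math.PI);
    }

    /** Standard normal CDF via Simpson integration of the density from 0 to z. */
    static double normalCdf(double z) {
        int steps = 2000; // must be even for Simpson's rule
        double h = z / steps;
        double sum = density(0.0) + density(z);
        for (int i = 1; i < steps; i++) {
            sum += (i % 2 == 1 ? 4.0 : 2.0) * density(i * h);
        }
        return 0.5 + sum * h / 3.0;
    }

    /** Inverse of normalCdf by bisection on [-10, 10]. */
    static double normalInverseCdf(double p) {
        double lo = -10.0;
        double hi = 10.0;
        for (int i = 0; i < 60; i++) {
            double mid = 0.5 * (lo + hi);
            if (normalCdf(mid) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

    /** No storage: quantile of a normal distribution with the observed mean and standard deviation. */
    static double normalApproximation(double mean, double stDev, double p) {
        return mean + stDev * normalInverseCdf(p);
    }

    /** Full storage: sort all values and interpolate linearly between neighboring order statistics. */
    static double exactQuantile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = p * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = Math.min(lower + 1, sorted.length - 1);
        return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        Random random = new Random(1234); // same seed as the Tally example
        double[] values = new double[1000];
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            values[i] = random.nextDouble();
            sum += values[i];
        }
        double mean = sum / values.length;
        double ss = 0.0;
        for (double v : values) {
            ss += (v - mean) * (v - mean);
        }
        double stDev = Math.sqrt(ss / (values.length - 1)); // sample standard deviation
        for (double p : new double[] {0.25, 0.5, 0.75}) {
            System.out.printf("p=%.2f  normal approximation: %.4f  exact: %.4f%n",
                    p, normalApproximation(mean, stDev, p), exactQuantile(values, p));
        }
    }
}
```

With the same seed as the `Tally` example, the normal approximation should land near the quartiles printed earlier (about 0.29 and 0.68), whereas the exact quartiles should come out near 0.23 and 0.75.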

To create a `Tally` with a `FullStorageAccumulator`, replace the first line of the program by:

`Tally tally = new Tally("Example tally with full storage accumulator", new FullStorageAccumulator());`

The output for the quantiles is now something like:

    first quartile: 0.22964815841745587
    median: 0.4753812701997516
    third quartile: 0.7515290688493804

The results differ from the expected values because we have ingested only a small sample of the entire population. They are, however, the exact quantiles of the ingested values (a.k.a. the *ground truth*). The `FullStorageAccumulator` is perfect when not too many values will be ingested. When the number of ingested values runs into the millions or billions, the `TDigestAccumulator` is the best choice for non-uniformly distributed values. The `TDigestAccumulator` is based on the t-digest algorithm by Ted Dunning.
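
How close the exact sample quantiles come to the population values depends on the sample size. The sketch below (a hypothetical helper using the nearest-rank quantile definition, not the library's code) computes the exact first quartile of growing uniform samples; the distance to 0.25 shrinks roughly with the square root of the sample size.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: the exact sample quartile of uniform values approaches 0.25 as the sample grows. */
class SampleSizeEffect {
    /** Nearest-rank quantile: the smallest stored value with at least a fraction p of values at or below it. */
    static double nearestRank(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        Random random = new Random(1234);
        for (int n : new int[] {1_000, 10_000, 100_000, 1_000_000}) {
            double[] values = new double[n];
            for (int i = 0; i < n; i++) {
                values[i] = random.nextDouble();
            }
            double q1 = nearestRank(values, 0.25);
            System.out.printf("n = %,9d  first quartile = %.5f  error = %+.5f%n", n, q1, q1 - 0.25);
        }
    }
}
```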

To use a `TDigestAccumulator`, construct the `Tally` like this:

`Tally tally = new Tally("Example tally with TDigest accumulator", new TDigestAccumulator());`

The output for the quantiles is now:

    first quartile: 0.229876868954619
    median: 0.4751306678330976
    third quartile: 0.7507889885484719

These values match the results of the `FullStorageAccumulator` within `0.001`. For most applications such differences won't matter. For higher precision (at the cost of more memory and CPU time), the `TDigestAccumulator` can be constructed with an integer argument (the `compression` setting), like:

`Tally tally = new Tally("Example tally with TDigest accumulator with higher precision", new TDigestAccumulator(1000));`

The default value for the compression is `100`. The output using the `TDigestAccumulator` with compression set to 1000 is:

    first quartile: 0.22962779231833413
    median: 0.4753812701997516
    third quartile: 0.7517244282398716

This matches the output of the `FullStorageAccumulator` within `0.0002`.
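
Why does a larger compression buy precision? In the t-digest paper by Dunning and Ertl, ingested values are grouped into centroids whose size is bounded through a scale function. With the k1 scale function k(q) = δ/(2π)·asin(2q - 1), each centroid may span at most one unit of k, so the quantile range a single centroid may cover near q is roughly 2π·√(q(1 - q))/δ. The sketch below evaluates that bound; it assumes the k1 scale function and is not taken from the `TDigestAccumulator` source, which may use a different scale function. It shows that centroids are narrowest near the tails and that multiplying the compression δ by 10 makes every centroid about 10 times narrower.

```java
/**
 * Sketch of the t-digest size bound, ASSUMING the k1 scale function
 * k(q) = compression / (2 pi) * asin(2q - 1); not the library's implementation.
 */
class TDigestScale {
    /** Scale function k1: each centroid may span at most one unit of k. */
    static double k(double q, double compression) {
        return compression / (2.0 * Math.PI) * Math.asin(2.0 * q - 1.0);
    }

    /** Approximate quantile width one centroid may cover near q: 1 / k'(q). */
    static double maxCentroidWidth(double q, double compression) {
        return 2.0 * Math.PI * Math.sqrt(q * (1.0 - q)) / compression;
    }

    public static void main(String[] args) {
        for (double compression : new double[] {100, 1000}) {
            for (double q : new double[] {0.001, 0.01, 0.25, 0.5}) {
                System.out.printf("compression=%5.0f  q=%.3f  max centroid width ~ %.5f%n",
                        compression, q, maxCentroidWidth(q, compression));
            }
        }
        // The total k range is compression / 2 (about 50 for compression 100),
        // so the number of centroids grows roughly linearly with the compression
        System.out.println(k(1.0, 100) - k(0.0, 100));
    }
}
```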