java hdrhistogram

HDR Histogram: Min not same as Max on 1 sample count


I am using Java implementation of HDR Histogram:

    <dependency>
        <groupId>org.hdrhistogram</groupId>
        <version>2.1.4</version>
        <artifactId>HdrHistogram</artifactId>
    </dependency>

I've noticed that the minimum and maximum differ even when the sample count is 1:

    @Test
    public void testHistogram() throws Exception {
        Histogram stats = new Histogram(2);

        stats.recordValue(35071);
        assertEquals(1, stats.getTotalCount());
        assertEquals(35071, stats.getMaxValue());

        assertEquals(35071, stats.getMinNonZeroValue()); // Fails:
                   // java.lang.AssertionError:
                   // Expected :35071
                   // Actual   :34816
    }

I see the following fragment in the Histogram code:

    public long getMinNonZeroValue() {
        return (minNonZeroValue == Long.MAX_VALUE) ?
                Long.MAX_VALUE : lowestEquivalentValue(minNonZeroValue);
    }

(That is in GitHub)

My question is: why can't we simply return the recorded minNonZeroValue?


Solution

  • An HdrHistogram is set up with a configurable minimum precision, expressed as a number of significant decimal digits (e.g. 2, or 3, or...). As a data structure, it uses logically exponential buckets with linear sub-buckets in each, to maintain the required precision across the entire dynamic range, all within a data structure whose size is fixed for a given dynamic range and precision level. As such, any recorded integer value in a histogram is indistinguishable from any other value in the range lowestEquivalentValue(value)..highestEquivalentValue(value). In your test, the reported min (34816) and max (35071) are the two ends of that range for the single recorded value.

    HdrHistogram carefully avoids providing any results "within" a range. When asked for the min, it will always respond with a value that is equivalent to the lowest recorded value. When asked for the max, it will always respond with a value that is equivalent to the highest recorded value. These answers are clearly within the precision contract, and doing otherwise would result in "subtly surprising" behaviors such as iterating past the min or the max, or getting query answers (for mean, percentiles, etc.) that fall outside the reported min..max range.

    HTH.
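
To make the "equivalent range" idea concrete, here is a rough sketch of how such a range can be computed for a given value and precision. It is a simplified model, not the library's source: the class and method names (EquivalentRangeSketch, subBucketMagnitude, rangeSize, lowestEquivalent, highestEquivalent) are made up for illustration, and it assumes positive values and a lowest discernible value of 1.

```java
public class EquivalentRangeSketch {

    // Hypothetical helper: number of bits needed so that every value below
    // 2 * 10^digits is still tracked with a resolution of 1.
    public static int subBucketMagnitude(int significantDigits) {
        long largestWithUnitResolution = 2 * (long) Math.pow(10, significantDigits);
        return 64 - Long.numberOfLeadingZeros(largestWithUnitResolution - 1);
    }

    // Width of the range of values that are indistinguishable from 'value'.
    // It is 1 for small values and doubles each time the value needs one more bit.
    public static long rangeSize(long value, int significantDigits) {
        int magnitude = subBucketMagnitude(significantDigits);
        int bitsUsed = 64 - Long.numberOfLeadingZeros(value);
        int shift = Math.max(0, bitsUsed - magnitude);
        return 1L << shift;
    }

    // Smallest value that lands in the same range as 'value'.
    public static long lowestEquivalent(long value, int significantDigits) {
        long size = rangeSize(value, significantDigits);
        return value & ~(size - 1);
    }

    // Largest value that lands in the same range as 'value'.
    public static long highestEquivalent(long value, int significantDigits) {
        return lowestEquivalent(value, significantDigits)
                + rangeSize(value, significantDigits) - 1;
    }

    public static void main(String[] args) {
        long value = 35071;
        int digits = 2; // same precision as new Histogram(2) in the question
        System.out.println(lowestEquivalent(value, digits) + ".."
                + highestEquivalent(value, digits)); // prints 34816..35071
    }
}
```

Under these assumptions, 35071 shares a 256-wide range with 34816, which matches the values seen in the failing test. In the test itself, an assertion that respects the precision contract could compare equivalence rather than equality, for example via the histogram's valuesAreEquivalent(35071, stats.getMinNonZeroValue()).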