dc.jscrossfilter

Histogram based on "reduceSummed" groups


I have CSV data with the following pattern:

Quarter,productCategory,unitsSold
2018-01-01,A,21766
2018-01-01,B,10076
2018-01-01,C,4060
2018-04-01,A,27014
2018-04-01,B,12219
2018-04-01,C,4740
2018-07-01,A,29503
2018-07-01,B,13020
2018-07-01,C,5549
2018-10-01,A,3796
2018-10-01,B,15110
2018-10-01,C,6137
2019-01-01,A,25008
2019-01-01,B,11655
2019-01-01,C,4630
2019-04-01,A,31633
2019-04-01,B,14837
2019-04-01,C,5863
2019-07-01,A,33813
2019-07-01,B,15442
2019-07-01,C,6293
2019-10-01,A,35732
2019-10-01,B,19482
2019-10-01,C,6841

As you can see, there are 3 product categories sold every day. I can make a histogram and count how many Quarters are involved per bin of unitsSold. The problem here is that every Quarter is counted separately. What I would like is a histogram where the bins of unitsSold are already grouped with a reduceSum on the Quarter.

This would result in something like this:

Quarter, unitsSold
2018-01-01,35902
2018-04-01,43973
2018-07-01,48072
2018-10-01,25043
2019-01-01,41293
2019-04-01,52333
2019-07-01,55548
2019-10-01,62055

Where, based on the bins of unitsSold, a number of Quarters would fall into. For example a bin of 50.000 - 70.000 would count 3 Quarters (2019-04-01, 2019-07-01 and 2019-10-01)

Normally I would do something like this:

const histogramChart = new dc.BarChart('#histogram');
const histogramDim = ndx.dimension(d => Math.round(d.unitsSold / binSize) * binSize);
const histogramGroup = histogramDim.group().reduceCount();

But in the desired situation the histogram is kind of created on something that has already been "reducedSummed". Ending up in a barchart histogram like this (data does not match with this example):

Histogram with a number of "Units sold" bins and the number of "Quarters" that fall in to them

How can this be done with dc.js/crossfilter.js?


Solution

  • Regrouping the data by value

    I think the major difference between your question and this previous question is that you want to bin the data when you "regroup" it. (Sometimes this is called a "double reduce"... no clear names for this stuff.)

    Here's one way to do that, using an offset and width:

    function regroup(group, width, offset = 0) {
      return {
        all: function() {
          const bins = {};
          group.all().forEach(({key, value}) => {
            const bin = Math.floor((value - offset) / width);
            bins[bin] = (bins[bin] || 0) + 1;
          });
          return Object.entries(bins).map(
            ([bin, count]) => ({key: bin*width + offset, value: count}));
        }
      }
    }
    

    What we do here is loop through the original group and

    1. map each value to its bin number
    2. increment the count for that bin number, or start at 1
    3. map the bins back to original numbers, with counts

    Testing it out

    I displayed your original data with the following chart (too lazy to figure out quarters, although I think it's not hard with recent D3):

    const quarterDim = cf.dimension(({Quarter}) => Quarter),
        unitsGroup = quarterDim.group().reduceSum(({unitsSold}) => unitsSold);
    
    quarterChart.width(300)
        .height(200)
      .margins({left: 50, top: 0, right: 0, bottom: 20})
        .dimension(quarterDim)
      .group(unitsGroup)
      .x(d3.scaleTime().domain([d3.min(data, d => d.Quarter), d3.timeMonth.offset(d3.max(data, d => d.Quarter), 3)]))
      .elasticY(true)
      .xUnits(d3.timeMonths);
    

    and the new chart with

    const rg = regroup(unitsGroup, 10000);
    countQuartersChart.width(500)
      .height(200)
      .dimension({})
      .group(rg)
      .x(d3.scaleLinear())
      .xUnits(dc.units.fp.precision(10000))
      .elasticX(true)
      .elasticY(true);
    

    (Note the empty dimension, which disables filtering. Filtering may be possible but you have to map back to the original dimension keys so I’m skipping that for now.)

    Here are the charts I get, which look correct at a glance:

    double-reduced charts

    Demo fiddle.

    Adding filtering to the chart

    To implement filtering on this "number of quarters by values" histogram, first let's enable filtering between the by-values chart and the quarters chart by putting the by-values chart on its own dimension:

    const quarterDim2 = cf.dimension(({Quarter}) => Quarter),
        unitsGroup2 = quarterDim2.group().reduceSum(({unitsSold}) => unitsSold);
    const byvaluesGroup = regroup(unitsGroup2, 10000);
    countQuartersChart.width(500)
        .height(200)
      .dimension(quarterDim2)
      .group(byvaluesGroup)
      .x(d3.scaleLinear())
      .xUnits(dc.units.fp.precision(10000))
      .elasticX(true)
      .elasticY(true);
    

    Then, we implement filtering with

    countQuartersChart.filterHandler((dimension, filters) => {
      if(filters.length === 0)
        dimension.filter(null);
      else {
        console.assert(filters.length === 1 && filters[0].filterType === 'RangedFilter');
        const range = filters[0];
        const included_quarters = unitsGroup2.all()
            .filter(({value}) => range[0] <= value && value < range[1])
            .map(({key}) => key.getTime());
        dimension.filterFunction(k => included_quarters.includes(k.getTime()));
      }
      return filters;
    });
    

    This finds all quarters in unitsGroup2 that have a value which falls in the range. Then it sets the dimension's filter to accept only the dates of those quarters.

    Odds and ends

    Quarters

    D3 supports quarters with interval.every:

    const quarterInterval = d3.timeMonth.every(3);
    
    chart.xUnits(quarterInterval.range);
    

    Eliminating the zeroth bin

    As discussed in the comments, when other charts have filters active, there may end up being many quarters with less than 10000 units sold, resulting in a very tall zero bar which distorts the chart.

    The zeroth bin can be removed with

      delete bins[0];
    

    before the return in regroup()

    Rounding the by-values brush

    If snapping to the bars is desired, you can enable it with

    .round(x => Math.round(x/10000)*10000)
    

    Otherwise, the filtered range can start or end inside of a bar, and the way the bars are colored when brushed is somewhat inaccurate as seen below.

    with filtering and quarters

    Here's the new fiddle.