dc.jscrossfilter

Using the first row in bin (instead of average) to calculate percentage gain


In the dc.js Nasdaq example, percentageGain is calculated as:

(p.absGain / p.avgIndex) * 100

Here avgIndex is the average of all the day-averages.

I'm more familiar with the equation:

A. (Price - Prev period's Close) / Prev period's Close * 100

I'm not sure whether this is possible (with filters set and so on), the way crossfilter/dc works. Therefor, an alternative and different equation ,that might fit crossfilter/dc better and would still be meaningful, could be:

B. absGain of group / open of first day of group * 100

B would also mean that: If only a filter is set on for example Q1, then only the absGain of Q1 is taken into account. The first day in this group is the the oldest Q1 date in the oldest year. Also, charts other than "yearly" with groups like quarter, month or day of the week should be able to display the value of this equation. For example in a month chart, the value of the month "June" is calculated by taking the open of the first day in the first June. The absGain is taken from all June months. (of course working with all current filters in place)

Question: Can A and/or B be solved the crossfilter/dc way and how (example)?

Even if only B could be solved (naturally with crossfilter/dc), that would already be great. I want to use the dc.js example for other stocks that have the same underlying data structure (open, close, high, low, volume)

thanks!


Solution

  • I agree that Equation B is easier to define using crossfilter, so I figured out one way to do it.

    Equation A could probably work but it's unclear which day's close should be used under filtering - the last day which is not in the current bin? The day before the first day in the current bin?

    Equation B needs the earliest row for the current bin, and that requires maintaining the array of all rows for each bin. This is not built into crossfilter but it's a feature which we have talked about adding.

    The complex reduce example does this, and we can reuse some of its code. It calculates the median/mode/min/max value from the arrays of rows which fall in each bin, using these functions to generate those arrays:

    function groupArrayAdd(keyfn) {
        var bisect = d3.bisector(keyfn);
        return function(elements, item) {
            var pos = bisect.right(elements, keyfn(item));
            elements.splice(pos, 0, item);
            return elements;
        };
    }
    
    function groupArrayRemove(keyfn) {
        var bisect = d3.bisector(keyfn);
        return function(elements, item) {
            var pos = bisect.left(elements, keyfn(item));
            if(keyfn(elements[pos])===keyfn(item))
                elements.splice(pos, 1);
            return elements;
        };
    }
    

    It's somewhat inefficient to maintain all these arrays, so you might test if it has an impact on your application. JS is pretty fast so it probably doesn't matter unless you have a lot of data.

    Unfortunately there is no other way to compute the minimum for a bin other than to keep an array of all the items in it. (If you tried to keep track of just the lowest item, or lowest N items, what would you do when they are removed?)

    Using these arrays inside the group reduce-add function:

        (p, v) => {
            ++p.count;
            p.rowsByDate = rbdAdd(p.rowsByDate, v);
            p.absGain += v.close - v.open;
            // ...
            p.percentageGain = p.rowsByDate.length ? (p.absGain / p.rowsByDate[0].open) * 100 : 0;
            return p;
        },
    

    In the reduce-remove function it's

            p.rowsByDate = rbdRemove(p.rowsByDate, v);
    

    and the same percentageGain change.

    Here is a demo in a notebook: https://jsfiddle.net/gordonwoodhull/08bzcd4y/17/

    first day open for percentage gain

    I only see slight changes in the Y positions of the bubbles; the changes are more apparent in the values printed in the tooltip.