Correlation Matrix in dc.js


I'm trying to use dc.js for a correlation matrix. My data looks as follows:

name,v1,v2,v3,v4
john,1,1,2,0
mary,2,1,1,1
albert,2,1,0,1
lynn,2,2,1,1
...

and I would like to render a chart that looks like this:

v1 v2 v3 v4
v1 x .1 .1 .1
v2 .1 x .1 .1
v3 .1 .1 x .1
v4 .1 .1 .1 x

where each variable (v1-v4) creates a row and a column, and at their intersection a value is computed. Computation of the correlation coefficient itself is not part of this question.

Now my question: how would I do this "variables as rows AND as columns"-thing in dc.js? What are the dimensions/groups in this example?


Solution

  • There are at least two ways to deal with multidimensional data in dc.js:

    1. You can create "multi-keys" for the dimension and group, which causes the dimension to filter and aggregate based on two keys instead of one. This is how the heatmap and series chart examples work.
    2. You can create a group where the values are objects and the aggregation function aggregates multiple values under the keys of the objects.

    The second is probably easier to use with the data table, since you then specify an accessor for each column which extracts the sub-value from the group value object.
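
    As a rough illustration of the second approach, the reduce functions below keep an object of per-variable sums as the group value. The names `variables`, `reduceInit`, `reduceAdd` and `reduceRemove` are made up for this sketch; only the `group.reduce(add, remove, initial)` call in the closing comment is crossfilter's own API.

    ```javascript
    // Column names taken from the sample data in the question.
    const variables = ['v1', 'v2', 'v3', 'v4'];

    // Initial group value: a row count plus one running sum per variable.
    function reduceInit() {
      const p = { count: 0 };
      variables.forEach(v => { p[v] = 0; });
      return p;
    }

    // Called when a row enters the group (added, or no longer filtered out).
    function reduceAdd(p, row) {
      p.count += 1;
      variables.forEach(v => { p[v] += +row[v]; });
      return p;
    }

    // Called when a row leaves the group (filtered out, or removed).
    function reduceRemove(p, row) {
      p.count -= 1;
      variables.forEach(v => { p[v] -= +row[v]; });
      return p;
    }

    // With crossfilter loaded, these would be wired up roughly as:
    //   const group = dimension.group().reduce(reduceAdd, reduceRemove, reduceInit);
    // and a data table column would then read one sub-value, e.g. d => d.value.v1
    ```

    Each column accessor then only has to pick its own key out of `d.value`.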

    I guess you are probably not using crossfilter to calculate the correlation matrix, since this doesn't sound like an ordinary aggregation operation amenable to crossfilter's dimension/group data model.

    So you probably will want to create a "fake group", i.e. an object that looks like a group but returns the correlation matrix values that you compute.

    A template for this function might look like:

    const variables = ['v1', 'v2', 'v3', 'v4'];
    function correlation_group(group) {
      // group = source data, some ordinary crossfilter group with
      // the data you need to calculate correlations
      // do calculations here
      return {
        all: function() {
          // one entry per row variable; its value maps each column
          // variable to the matrix entry for that (vx, vy) pair
          return variables.map(vy => ({
            key: vy,
            value: variables.reduce((row, vx) => {
              row[vx] = undefined; // insert matrix value for (vx, vy) here
              return row;
            }, {})
          }));
        }
      };
    }
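
    To make the template concrete, here is a minimal sketch that fills in the matrix values with a plain Pearson correlation. It assumes the source group's `all()` returns `{key, value}` entries whose `value` holds one number per variable. That shape, the `pearson` helper, the name `correlationGroup` and the `stubGroup` standing in for a real crossfilter group are all assumptions made for illustration, not something the question specifies.

    ```javascript
    const variables = ['v1', 'v2', 'v3', 'v4'];

    // Pearson correlation of two equal-length numeric arrays.
    // Returns NaN when either array has zero variance.
    function pearson(xs, ys) {
      const n = xs.length;
      const mean = a => a.reduce((s, x) => s + x, 0) / n;
      const mx = mean(xs), my = mean(ys);
      let sxy = 0, sxx = 0, syy = 0;
      for (let i = 0; i < n; i++) {
        const dx = xs[i] - mx, dy = ys[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
      }
      return sxy / Math.sqrt(sxx * syy);
    }

    // Assumption: sourceGroup.all() yields { key, value } entries whose value
    // holds one number per variable, e.g. { v1: 1, v2: 1, v3: 2, v4: 0 }.
    function correlationGroup(sourceGroup) {
      return {
        all: function () {
          // Recompute on every call so the matrix follows the current filters.
          const rows = sourceGroup.all().map(d => d.value);
          const column = v => rows.map(r => r[v]);
          return variables.map(vy => ({
            key: vy,
            value: Object.fromEntries(
              variables.map(vx => [vx, pearson(column(vx), column(vy))])
            )
          }));
        }
      };
    }

    // Stand-in for a real crossfilter group, using the question's sample rows.
    const stubGroup = {
      all: () => [
        { key: 'john',   value: { v1: 1, v2: 1, v3: 2, v4: 0 } },
        { key: 'mary',   value: { v1: 2, v2: 1, v3: 1, v4: 1 } },
        { key: 'albert', value: { v1: 2, v2: 1, v3: 0, v4: 1 } },
        { key: 'lynn',   value: { v1: 2, v2: 2, v3: 1, v4: 1 } }
      ]
    };
    const matrix = correlationGroup(stubGroup).all();
    ```

    Doing the calculation inside `all()` rather than once up front means the matrix is recomputed whenever a chart asks for data, at the cost of repeating the work on every redraw.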
    

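    Now you can pass the "fake group" to a data table in place of a dimension. As far as I know the data table reads its rows through the dimension's top() or bottom() method rather than all(), so the fake group will probably need one of those as well.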
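
    A sketch of that hookup, assuming the data table fetches rows from its dimension with `top(n)` or `bottom(n)`: the hypothetical `asTableDimension` adapter supplies both on top of `all()`, and `columns` builds one `{label, format}` column per variable. The stub `fakeGroup` holds placeholder numbers, not real correlations, and the selector in the closing comment is made up. Depending on the dc.js version, the table may need further settings (such as a section/group function) that are not shown here.

    ```javascript
    const variables = ['v1', 'v2', 'v3', 'v4'];

    // Hypothetical adapter: exposes top()/bottom() so the object can be
    // passed where a data table expects a dimension. Both return the rows
    // in the fake group's own order, so the matrix rows stay v1..v4.
    function asTableDimension(fakeGroup) {
      return {
        top: n => fakeGroup.all().slice(0, n),
        bottom: n => fakeGroup.all().slice(0, n)
      };
    }

    // First column shows the row variable; the rest pull one sub-value each
    // out of the row's value object.
    const columns = [
      { label: '', format: d => d.key },
      ...variables.map(vx => ({
        label: vx,
        format: d => d.value[vx].toFixed(2)
      }))
    ];

    // Stub fake group with placeholder numbers (not real correlations).
    const fakeGroup = {
      all: () => [
        { key: 'v1', value: { v1: 1, v2: 0.5, v3: -0.25, v4: 0 } },
        { key: 'v2', value: { v1: 0.5, v2: 1, v3: 0.1, v4: 0.2 } }
      ]
    };

    // With dc.js loaded, the table could then be set up roughly like this
    // (untested sketch):
    //   dc.dataTable('#matrix')
    //     .dimension(asTableDimension(fakeGroup))
    //     .columns(columns)
    //     .size(Infinity);
    ```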

    I am aware this is a partial answer... it only covers how to put your data in a shape which the dc.js data table can use, and I'm omitting a lot of details about how to use group data to calculate correlations, and how to specify the data table.

    I guess it's kind of a big topic, and an open-ended question invites an open-ended answer. :)