vega-lite density-plot weighted-graph

Vega-Lite - Scaling to Large Datasets


I have used the density transform in Vega-Lite for smaller datasets. However, I have a larger dataset with millions of observations, stored more compactly as a count per value, to which I'd like to apply a weighted density transform. My attempt is as follows:

[Image: Correct plot]

```
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  // My dataset is represented more compactly as follows:
  // "data": {
  //   "values": [
  //     {"size": 1, "observations": 1},
  //     {"size": 2, "observations": 2},
  //     {"size": 3, "observations": 4},
  //     {"size": 4, "observations": 6},
  //     {"size": 5, "observations": 3}
  //   ]
  // },

  // Expanding the dataset produces the right plot but is impractical
  // given data volumes (in the millions of observations)
  "data": {
    "values": [
      {"size": 1, "observation": "observation 1 of 1"},
      {"size": 2, "observation": "observation 1 of 2"},
      {"size": 2, "observation": "observation 2 of 2"},
      {"size": 3, "observation": "observation 1 of 4"},
      {"size": 3, "observation": "observation 2 of 4"},
      {"size": 3, "observation": "observation 3 of 4"},
      {"size": 3, "observation": "observation 4 of 4"},
      {"size": 4, "observation": "observation 1 of 6"},
      {"size": 4, "observation": "observation 2 of 6"},
      {"size": 4, "observation": "observation 3 of 6"},
      {"size": 4, "observation": "observation 4 of 6"},
      {"size": 4, "observation": "observation 5 of 6"},
      {"size": 4, "observation": "observation 6 of 6"},
      {"size": 5, "observation": "observation 1 of 3"},
      {"size": 5, "observation": "observation 2 of 3"},
      {"size": 5, "observation": "observation 3 of 3"}
    ]
  },
  "mark": "area",
  "transform": [
    {
      // I believe Vega has a weight parameter in the density transform.
      // Is there an equivalent in Vega-Lite?
      // "weight": "observations",
      "density": "size"
    }
  ],
  "encoding": {
    "x": {"field": "value", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  }
}
```

The dataset I actually have is the one commented out above. Expanding it to one row per observation produces the correct plot. However, given the number of observations, I suspect this is impractical unless there's a performant way to do it inside Vega-Lite.

I believe Vega has a weight parameter in the density transform, but in the environment I'm working in, I only have access to Vega-Lite. Is there another way to produce a weighted density transform in Vega-Lite?


Solution

  • That weight parameter in Vega isn't what you're looking for: it sets the relative weights of the component distributions when you combine several of them in a mixture, not a weight per observation. Out of the box, neither Vega nor Vega-Lite is suited to very large datasets, but several projects build on Vega to scale to them:

    https://github.com/vega/scalable-vega

    https://vega.github.io/scalable-vega/

    https://vegafusion.io/

    If you can't use one of these projects, your only option is to precompute the distributions and get Vega to display the result.
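
As a rough illustration of the precompute route, here is a minimal Python sketch that evaluates a weighted Gaussian kernel density estimate directly on the compact data, using `observations` as the weight, and emits a Vega-Lite spec that only draws the result. The helper name `weighted_kde`, the fixed bandwidth of 0.5, the 50-step grid, and the padding of three bandwidths on each side are all assumptions made for the example; Vega-Lite's own density transform picks its bandwidth and extent differently, so the curve will not match it exactly.

```python
import json
import math


def weighted_kde(values, weights, bandwidth, steps=50):
    """Gaussian KDE on an even grid, counting values[i] weights[i] times.

    Hypothetical helper for illustration; not part of Vega or Vega-Lite.
    """
    total = sum(weights)
    lo = min(values) - 3 * bandwidth  # pad the grid so the tails are included
    hi = max(values) + 3 * bandwidth
    norm = total * bandwidth * math.sqrt(2 * math.pi)
    rows = []
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        density = sum(
            w * math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
            for v, w in zip(values, weights)
        ) / norm
        rows.append({"value": x, "density": density})
    return rows


# Compact data from the question: one row per size with its observation count.
data = [
    {"size": 1, "observations": 1},
    {"size": 2, "observations": 2},
    {"size": 3, "observations": 4},
    {"size": 4, "observations": 6},
    {"size": 5, "observations": 3},
]

rows = weighted_kde(
    [d["size"] for d in data],
    [d["observations"] for d in data],
    bandwidth=0.5,  # assumed value; tune for your data
)

# The spec has no density transform: it just plots the precomputed curve.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "area",
    "encoding": {
        "x": {"field": "value", "type": "quantitative"},
        "y": {"field": "density", "type": "quantitative"},
    },
}

print(json.dumps(spec, indent=2))
```

The cost of the estimate depends on the number of distinct values times the number of grid points, not on the number of underlying observations, so millions of observations collapsed into counts stay cheap to process. Only the grid rows are shipped to Vega-Lite.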