I have used the density transform in Vega-Lite for smaller datasets. However, I now have a larger dataset with millions of observations, stored compactly as one row per value with a count, and I'd like to run a weighted density transform on it. My attempt is as follows:
`
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
// My data set is represented more compactly as follows
// "data": {
// "values": [
// {"size": 1, "observations": 1},
// {"size": 2, "observations": 2},
// {"size": 3, "observations": 4},
// {"size": 4, "observations": 6},
// {"size": 5, "observations": 3}
// ]
// },
// Expanding the dataset produces the right plot but is impractical
// given data volumes (in the millions of observations)
"data": {
"values": [
{"size": 1, "observation": "observation 1 of 1"},
{"size": 2, "observation": "observation 1 of 2"},
{"size": 2, "observation": "observation 2 of 2"},
{"size": 3, "observation": "observation 1 of 4"},
{"size": 3, "observation": "observation 2 of 4"},
{"size": 3, "observation": "observation 3 of 4"},
{"size": 3, "observation": "observation 4 of 4"},
{"size": 4, "observation": "observation 1 of 6"},
{"size": 4, "observation": "observation 2 of 6"},
{"size": 4, "observation": "observation 3 of 6"},
{"size": 4, "observation": "observation 4 of 6"},
{"size": 4, "observation": "observation 5 of 6"},
{"size": 4, "observation": "observation 6 of 6"},
{"size": 5, "observation": "observation 1 of 3"},
{"size": 5, "observation": "observation 2 of 3"},
{"size": 5, "observation": "observation 3 of 3"}
]
},
"mark": "area",
"transform": [
{
// I believe Vega has a weight parameter in the density transform
// Is there an equivalent in Vega Lite?
//"weight": "observations",
"density": "size"
}
],
"encoding": {
"x": {"field": "value", "type": "quantitative"},
"y": {"field": "density", "type": "quantitative"}
}
}
`
The dataset I actually have is the one commented out above. Expanding it to one row per observation produces the correct plot, but given the number of observations I suspect this is impractical unless there's a performant way to do the expansion inside Vega-Lite.
I believe Vega has a weight parameter in the density transform, but in the environment I'm working in I only have access to Vega-Lite. Is there another way to produce a weighted density transform in Vega-Lite?
That weight parameter in Vega isn't what you're looking for: it weights the component distributions when you combine several of them into a mixture, not the individual observations. Out of the box, neither Vega nor Vega-Lite is suited to huge datasets, but there are several projects that use Vega to scale to large datasets.
https://github.com/vega/scalable-vega
https://vega.github.io/scalable-vega/
If you can't use one of the other projects, your only option is to precompute the distributions and get Vega to display the result.
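As a rough illustration of the precompute route, here is a minimal sketch in plain Python that evaluates a weighted Gaussian kernel density estimate directly from the compact `size` / `observations` rows in the question and emits a Vega-Lite spec that only draws the result. The function name `weighted_kde`, the fixed bandwidth of 0.5, and the evaluation grid are assumptions made for this example. Vega-Lite estimates its own bandwidth when none is given, so the curve will not match the built-in transform exactly unless the same bandwidth is set on both sides.

```python
import json
import math


def weighted_kde(values, weights, bandwidth, grid):
    """Gaussian KDE in which each value counts `weight` times.

    Equivalent to expanding every (value, weight) pair into `weight`
    identical rows, but the cost depends only on the number of distinct values.
    """
    total = sum(weights)
    norm = 1.0 / (total * bandwidth * math.sqrt(2.0 * math.pi))
    rows = []
    for x in grid:
        kernel_sum = sum(
            w * math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
            for v, w in zip(values, weights)
        )
        rows.append({"value": x, "density": norm * kernel_sum})
    return rows


# Compact dataset from the question: one row per distinct size.
sizes = [1, 2, 3, 4, 5]
observations = [1, 2, 4, 6, 3]

bandwidth = 0.5                                # assumed, not Vega-Lite's default
step = 0.1
grid = [i * step for i in range(-20, 81)]      # evaluate on -2.0 .. 8.0

density = weighted_kde(sizes, observations, bandwidth, grid)

# The spec has no density transform: it just plots the precomputed points.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": density},
    "mark": "area",
    "encoding": {
        "x": {"field": "value", "type": "quantitative"},
        "y": {"field": "density", "type": "quantitative"},
    },
}

print(json.dumps(spec))
```

The spec then carries one row per grid point (101 here) no matter how many millions of observations the counts represent, which keeps the work outside the browser.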