Crossfilter temporary groups efficiency


I'm integrating Crossfilter with Vue and wonder about efficiency.

Whenever the state of the UI updates, I run calculations like this one to get various metrics from the dataset:

const originKeys = this.dimensions.origin.group().all().map(value => value.key)

At this point I realised that the group being created by the call is stored in the "registry" of the dimension every time the UI updates, but since I'm not storing the reference to the group, it's effectively "lost".
Then, whenever the data set updates, all of these "lost" groups do their calculations, even though the results are never used.

This is what I assumed, correct me if I'm wrong, please.

The calculations change according to the UI, so it's pointless to store references to the groups.

To overcome this issue, I created a simple helper function that creates a temporary group and disposes of it right after the calculation is done.

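  function temporaryGroup(dimension, calculator) {
    // Create a throwaway group on the dimension
    const group = dimension.group()
    // Let the caller compute whatever it needs from the group
    const result = calculator(group, dimension)
    // Unregister the group so the dimension stops updating it
    group.dispose()
    return result
  }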

Using it like this:

const originKeys = temporaryGroup(this.dimensions.origin, (group) => {
  return group.all().map(value => value.key)
})

The question is: is there a better (more efficient) way to do temporary calculations like the one above?


Solution

  • The answer is no. Your stated assumptions are correct. Creating groups on every UI update is not efficient, and there is no more efficient way of using this library for temporary groups.

    Crossfilter is designed for quick interaction between a fixed set of dimensions and groups.

    It's stateful by design, adding and removing just the specific rows from each group that have changed based on the changes to filters.

    This matters especially for range-based filters: if you drag a brush interactively, a small segment of the domain is added and a small segment is removed at each mousemove.

    There are also array indices created to track the mapping from data -> key -> bin: one array of keys and one integer array of indices for the dimension, and one integer array of bin indices for the group. This makes updates fast, but it may be inefficient for temporary groups, which pay the cost of building the indices without benefiting from the fast updates.
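
    As a rough illustration of why that bookkeeping only pays off when a group is reused, here is a simplified sketch in plain JavaScript. It is not Crossfilter's actual code: `buildGroupIndex` and `applyFilterChange` are hypothetical names, the sample data is made up, and the real library sorts keys and stores its indices differently.

    ```javascript
    // Simplified sketch, NOT Crossfilter's internals; all names are hypothetical.
    function buildGroupIndex(records, keyOf) {
      const binOfKey = new Map();                       // key -> bin number
      const binIndex = new Uint32Array(records.length); // record -> bin number
      const keys = [];                                  // bin -> key
      const counts = [];                                // bin -> current count
      records.forEach((record, i) => {
        const key = keyOf(record);
        if (!binOfKey.has(key)) {
          binOfKey.set(key, keys.length);
          keys.push(key);
          counts.push(0);
        }
        binIndex[i] = binOfKey.get(key);
        counts[binIndex[i]] += 1;
      });
      return { keys, counts, binIndex };
    }

    // When a filter changes, only the rows that entered or left are touched.
    function applyFilterChange(index, addedRows, removedRows) {
      for (const i of addedRows) index.counts[index.binIndex[i]] += 1;
      for (const i of removedRows) index.counts[index.binIndex[i]] -= 1;
    }

    const flights = [{ origin: 'SFO' }, { origin: 'JFK' }, { origin: 'SFO' }];
    const index = buildGroupIndex(flights, (d) => d.origin);
    applyFilterChange(index, [], [2]); // row 2 is filtered out
    ```

    Building the index walks every record once, while each later update touches only the changed rows. A group that is disposed right after one read pays for the first step and never benefits from the second.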

    If you don't have a consistent set of charts, it would be more efficient in principle to do the calculation yourself, using typed arrays and e.g. d3-array.
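
    For a one-off metric such as the distinct origins in the question, a single pass over the raw records may be enough. Below is a minimal sketch in plain JavaScript, assuming the records are available as an array with an `origin` field. The sample data and the helper name `countBy` are made up for illustration; d3-array's `rollup` or `group` could play the same role.

    ```javascript
    // Hypothetical helper: count records per key in one pass, keeping no state.
    function countBy(records, keyOf) {
      const counts = new Map();
      for (const record of records) {
        const key = keyOf(record);
        counts.set(key, (counts.get(key) || 0) + 1);
      }
      return counts;
    }

    // Made-up sample data standing in for the real dataset.
    const records = [{ origin: 'SFO' }, { origin: 'JFK' }, { origin: 'SFO' }];
    const originCounts = countBy(records, (d) => d.origin);

    // Distinct keys, sorted so the order is stable between runs.
    const originKeys = [...originCounts.keys()].sort();
    ```

    Unlike a group, this does not observe filters on its own. To respect the current filters, the filtered records would have to be passed in, for example from `dimension.top(Infinity)`.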

    On the other hand, if you are doing this to comply with Vue's data model, you might see if Vue has a concept similar to React's "context", where shared state is associated with a parent component. This is how e.g. react-dc-js holds onto crossfilter objects.