javamathstatisticscolt

Updating quantiles rather than recomputing


Is there a java library that allows me to update rather than recompute quantiles of a large sample set of data with addition/removal of data points ? My guess is that an efficient algorithms should take a constant time for the update (not a function of the number of points already existing).

Known algorithms are listed but dont' have a way of removing points from the sample set :

Here is a sample problem: Say I want to compute say an arbitrary but constant percentile fan speed of a set of windmills (as an estimate of the wind speed). The fan speeds are updated asynchronously every few milliseconds. This library should allow me to update the wind speeds of one windmill at a time without having to recalculate the median.


Solution

  • If you maintain an updatable sorted representation of the data, getting the quantiles is easy and efficient just by using the length of your array. For example, if you have N elements then the median will be at position N/2, and so on. When you insert a new element into your data structure, this will still hold. The efficiency is then just dependent on inserting a new element.