algorithmstatisticsgenetic-algorithm

Select k elements from a weighted stream of elements


I need an algorithm to select K elements from an array of length N. Since N can be a very large number, the algorithm should work with a steamed collection. To be more concrete, this is how the collection looks: [(el1, 1000), (el2, 500), ...., (elN, 230)]. Elements with higher weights should have more chances to get into the resulting array.

I did some research and found a paper WEIGHTED RESERVOIR SAMPLING. It seems to be a good solution and it does what I need.

However, I also read about Fitness proportionate selection, which seems like it solves the problem too.

I struggle to understand the difference between them and which one should I choose?


Solution

  • Neither one of those algorithms handles without-replacement sampling without modification.

    If you want to simulate the manual process of drawing tickets until there are k distinct winners, then you can assign each element e a key −log1p(−random())/weight(e) where random() returns a uniform random double-precision floating point number between 0 inclusive and 1 exclusive, i.e., an exponential random variate with rate weight(e), and choose the k elements with the smallest keys in a streaming fashion.