Items in a set have a weight property that can be set by an admin user. Items with more weight are more likely to achieve the top position in a randomized list served to public visitors. Every item has a chance (however minuscule) to appear in any position.
What algorithm would ensure that changing an item’s weight would achieve a consistent result regardless of set size?
I know I’m out of my depth, but I don’t want to assume that the actual probability structure implied by the basic algorithm (Attempt 1) is valid. Am I tilting at windmills here? Is my method flawed? Are my conclusions solid?
In the set [a:2, b:2, c:2, d:2, e:3, f:3], the difference in chance between b and e will be the same as in the set [a:2, b:2, c:3, d:3, e:3, f:3] (same length, different sum) and the same as in the set [a:1, b:2, e:3, f:6] (different length, same sum). Likewise, in the set [3,3,17,17], bumping the last item up to 19 should have as much effect as bumping the first item up to 5; and the set [1,3] should behave similarly to [3,5]. This is so that users don’t have to understand the whole set to make an efficient change.

function linear_ratio($weight_set, $weight) {
    $sum = 0;
    foreach ($weight_set as $item) $sum += $item;
    return $weight / $sum;
}
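As a sanity check (a quick sketch in Python rather than PHP, mirroring the linear_ratio logic above; not part of the original post), the linear approach fails the bump-consistency condition: in [3,3,17,17], a +2 bump multiplies the light item's chance far more than the heavy item's.

```python
# Python sketch mirroring the PHP linear_ratio above (not from the original post).
def linear_ratio(weight_set, weight):
    return weight / sum(weight_set)

# Condition from the question: in [3, 3, 17, 17], bumping 3 -> 5 should have
# as much effect as bumping 17 -> 19. Under linear weighting it does not:
before_light = linear_ratio([3, 3, 17, 17], 3)    # 3/40
after_light  = linear_ratio([5, 3, 17, 17], 5)    # 5/42
before_heavy = linear_ratio([3, 3, 17, 17], 17)   # 17/40
after_heavy  = linear_ratio([3, 3, 17, 19], 19)   # 19/42

print(after_light / before_light)   # ~1.59x growth for the light item
print(after_heavy / before_heavy)   # ~1.06x growth for the heavy item
```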
function exponential_ratio($weight_set, $weight) {
    $sum = 0;
    foreach ($weight_set as $item) $sum += (2 ** $item);
    $weight = 2 ** $weight;
    return $weight / $sum;
}
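A quick numeric check (a Python sketch of the same logic, not part of the original) shows why this satisfies the earlier condition: the ratio between two items' chances depends only on their weight difference, so e is exactly twice as likely as b in all three example sets.

```python
# Python sketch of the PHP exponential_ratio above (not from the original post).
def exponential_ratio(weight_set, weight):
    return 2 ** weight / sum(2 ** w for w in weight_set)

# The three example sets from the question; compare a weight-2 item (b)
# against a weight-3 item (e) in each:
for ws in ([2, 2, 2, 2, 3, 3], [2, 2, 3, 3, 3, 3], [1, 2, 3, 6]):
    ratio = exponential_ratio(ws, 3) / exponential_ratio(ws, 2)
    print(ws, ratio)   # 2.0 for every set: e is always twice as likely as b
```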
This satisfies the fourth condition, but the base (2) is arbitrary, and its effect is exponential.
function based_exponential_ratio($weight_set, $weight) {
    $set_size = count($weight_set);
    $sum = 0;
    foreach ($weight_set as $item) $sum += $set_size ** $item;
    $weight = $set_size ** $weight;
    return $weight / $sum;
}
This just switches out one problem (the arbitrary base) for another. Given a set of 100 items, adding 1 more shouldn’t make much difference, but set size + 1 increases weight’s effect exponentially (thus reducing randomization).
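To put a number on that objection (a Python sketch of the same function, not from the original post): with base = set size, merely appending items inflates how dominant a heavy item is, even though no weight changed.

```python
# Python sketch mirroring the PHP based_exponential_ratio (not from the original).
def based_exponential_ratio(weight_set, weight):
    n = len(weight_set)
    return n ** weight / sum(n ** w for w in weight_set)

# How strongly a weight-5 item beats a weight-1 item, as the set merely grows:
for extra in range(3):
    ws = [1, 5] + [1] * extra
    advantage = based_exponential_ratio(ws, 5) / based_exponential_ratio(ws, 1)
    print(len(ws), advantage)   # 16x at n=2, 81x at n=3, 256x at n=4
```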
I do care about adding 500 more items to a set of 100, though. So should set size only matter when it changes by an order of magnitude? “Take the natural log of the set size and raise it to the weight power”? My mental model can’t progress any further than this point, but I would have thought that taking an exponent of a logarithm would be counter-productive.
The exponential approach you have is the most reasonable given your conditions.
The way you are stating your final condition is not realistic. Generally, examining the percentage change in probability is not a useful way of looking at it.
[FAIL] If the set is [2,2,13,13], incrementing the lightest value by 1 moves its probability from 4/16392 to 8/16396—roughly doubling it (a ~100% increase). Incrementing the heaviest value by 1 moves its probability from 8192/16392 to 16384/24584—only about a 33% increase.
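These figures can be checked directly; a Python sketch (not part of the original answer) using base-2 exponential weights as in the question:

```python
# Base-2 exponential probabilities (Python sketch, not from the original answer).
def prob(weight_set, index):
    powers = [2 ** w for w in weight_set]
    return powers[index] / sum(powers)

light_before = prob([2, 2, 13, 13], 0)   # 4/16392
light_after  = prob([3, 2, 13, 13], 0)   # 8/16396 after the +1 bump
heavy_before = prob([2, 2, 13, 13], 2)   # 8192/16392
heavy_after  = prob([2, 2, 13, 14], 3)   # 16384/24584 after the +1 bump

print(light_after / light_before)   # ~2.00: the light item's chance doubles
print(heavy_after / heavy_before)   # ~1.33: the heavy item's grows far less
```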
What if you have two products, and your initial weights are set to produce starting probabilities of [99/100, 1/100]? Then suppose there is a real number dw such that incrementing the weight of the second product by dw takes the probabilities to [95/100, 5/100]. That's a "400% increase" in the probability of the second product. There's no way to increase the probability of the first product in a remotely similar way.
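To make the asymmetry concrete (a Python sketch assuming base-2 exponential weights, where each +1 of weight doubles an item's odds; not part of the original answer): the required dw is just the log of the odds ratio, and no dw can do anything comparable for the already-dominant product.

```python
from math import log2

# Under base-2 exponential weighting, probabilities are governed by odds,
# and each +1 of weight doubles an item's odds (sketch, not from the answer).
old_odds = (1 / 100) / (99 / 100)   # second product starts at odds 1:99
new_odds = (5 / 100) / (95 / 100)   # target odds of 5:95
dw = log2(new_odds / old_odds)      # weight bump needed
print(dw)   # ~2.38: a bit more than two doublings of the odds

# The first product is already at 99/100; since probabilities cannot exceed 1,
# no weight bump can give it a comparable "400% increase".
```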