I have a set of numbers for a given set of attributes:
red = 4
blue = 0
orange = 2
purple = 1
I need to calculate the distribution percentage. Meaning, how diverse is the selection? Is it 20% diverse? Is it 100% diverse (meaning an even distribution of say 4,4,4,4)?
I'm trying to create a sexy percentage that approaches 100% the closer the individual values are to one another, and drops the more lopsided they get.
Has anyone done this?
Here is the PHP conversion of the below example. For some reason it's not producing 1.0 with a 4,4,4,4 example.
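$arrayChoices = array(4, 4, 4, 4);
// Total number of observations across all categories.
$sum = 0;
foreach ($arrayChoices as $p) {
    $sum += $p;
}
print "sum: " . $sum . "<br>";
// Convert each count into a probability (its share of the total).
$pArray = array();
foreach ($arrayChoices as $rec) {
    print "p vector value: " . $rec . " " . $rec / $sum . "\n<br>";
    array_push($pArray, $rec / $sum);
}
// Shannon entropy in bits; zero probabilities contribute nothing.
$total = 0;
foreach ($pArray as $p) {
    if ($p > 0) {
        $total = $total - $p * log($p, 2);
    }
}
print "total = $total <br>";
// Normalize by the maximum entropy, log2(number of categories), as a percentage.
print round($total / log(count($pArray), 2) * 100);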
Thanks in advance!
A simple, if rather naive, scheme is to sum the absolute differences between your observations and a perfectly uniform distribution. With 7 items in 4 categories, a uniform spread puts 7/4 in each category, so:
red = abs(4 - 7/4) = 9/4
blue = abs(0 - 7/4) = 7/4
orange = abs(2 - 7/4) = 1/4
purple = abs(1 - 7/4) = 3/4
for a total of 5.
A perfectly even spread will have a score of zero, which you must map to 100%.
Assuming you have n items in c categories, a perfectly uneven spread (all n items in a single category) will have a score of
(c-1)*n/c + 1*(n-n/c) = 2*(n-n/c)
which you should map to 0%. For a score d, you might use the linear transformation
100% * (1 - d / (2*(n-n/c)))
For your example this would result in
100% * (1 - 5 / (2*(7-7/4))) = 100% * (1 - 10/21) ~ 52%
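The scheme above can be sketched in PHP roughly as follows. This is a minimal illustration rather than a definitive implementation: the function name evennessPercent is hypothetical, and returning 100% for a single category or an empty selection is an assumption made here, since the formula's denominator is zero in those cases.

```php
<?php
// Hypothetical helper: maps the summed absolute deviation from a uniform
// spread onto a 0-100% scale, following the linear transformation above.
function evennessPercent(array $counts): float
{
    $c = count($counts);      // number of categories
    $n = array_sum($counts);  // total number of items

    // Assumption: with fewer than two categories or no items the worst-case
    // score is zero, so treat the spread as perfectly even.
    if ($c < 2 || $n == 0) {
        return 100.0;
    }

    $uniform = $n / $c;       // count per category in a perfectly even spread

    // d: sum of absolute differences from the uniform count
    $d = 0.0;
    foreach ($counts as $count) {
        $d += abs($count - $uniform);
    }

    // Worst case: all n items in a single category
    $worst = 2 * ($n - $n / $c);

    return 100.0 * (1 - $d / $worst);
}

echo round(evennessPercent([4, 0, 2, 1])), "%\n"; // red, blue, orange, purple
echo round(evennessPercent([4, 4, 4, 4])), "%\n"; // perfectly even
echo round(evennessPercent([7, 0, 0, 0])), "%\n"; // perfectly uneven
```

For the counts in the question this prints 52%, matching the hand calculation, while 4,4,4,4 gives 100% and 7,0,0,0 gives 0%.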
Better yet (although more complicated) is the Kolmogorov–Smirnov statistic, with which you can make mathematically rigorous statements about the probability that a set of observations has some given underlying probability distribution.