php math average distribution

Need to calculate the percentage of distribution


I have a set of numbers for a given set of attributes:

red    = 4
blue   = 0
orange = 2
purple = 1

I need to calculate the distribution percentage. Meaning, how diverse is the selection? Is it 20% diverse? Is it 100% diverse (meaning an even distribution of say 4,4,4,4)?

I'm trying to create a sexy percentage that approaches 100% as the individual values get closer to one another, and drops lower the more lopsided they get.

Has anyone done this?

Here is my PHP conversion of the example below. For some reason it's not producing 1.0 for the input 4,4,4,4.

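$arrayChoices = array(4,4,4,4);

// Total number of observations.
$sum = 0;
foreach($arrayChoices as $p)
    $sum += $p;

print "sum: ".$sum."<br>";

// Convert the counts to proportions.
$pArray = array();

foreach($arrayChoices as $rec)
{
    print "p vector value: ".$rec." ".$rec / $sum."\n<br>";
    array_push($pArray,$rec / $sum);
}

// Shannon entropy in bits; zero proportions contribute nothing.
$total = 0;

foreach($pArray as $p)
    if($p > 0)
        $total = $total - $p*log($p,2);

print "total = $total <br>";

// Normalize by the maximum entropy, log2(number of categories), and scale to a percentage.
print round($total / log(count($pArray),2) *100);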

Thanks in advance!


Solution

  • A simple, if rather naive, scheme is to sum the absolute differences between your observations and a perfectly uniform distribution. With 7 items in 4 categories, a uniform spread puts 7/4 in each category:

    red    = abs(4 - 7/4) = 9/4
    blue   = abs(0 - 7/4) = 7/4
    orange = abs(2 - 7/4) = 1/4
    purple = abs(1 - 7/4) = 3/4
    

    for a total of 5.
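    A perfectly even spread will have a score of zero, which you must map to 100%.
    Assuming you have n items in c categories, a perfectly uneven spread (all n items in one category) leaves c-1 empty categories that each contribute n/c and one full category that contributes n-n/c, for a score of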

    (c-1)*n/c + 1*(n-n/c) = 2*(n-n/c)
    

    which you should map to 0%. For a score d, you might use the linear transformation

    100% * (1 - d / (2*(n-n/c)))
    

    For your example this would result in

    100% * (1 - 5 / (2*(7-7/4))) = 100% * (1 - 10/21) ~ 52%
    

    Better yet (although more complicated) is the Kolmogorov–Smirnov statistic, with which you can make mathematically rigorous statements about the probability that a set of observations has some given underlying probability distribution.
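
The simpler absolute-difference scheme above could be sketched in PHP roughly as follows. This is only an illustration: the function name evennessPercent is made up for this example, and returning 100% for empty input or a single category is my own assumption, since the scheme does not say what to do there.

```php
<?php
// Hypothetical helper (the name is not from the answer): maps a list of
// category counts to an evenness percentage using the
// absolute-difference scheme.
function evennessPercent(array $counts): float
{
    $c = count($counts);      // number of categories
    $n = array_sum($counts);  // total number of items

    // Assumption: with fewer than two categories, or no items at all,
    // nothing can be lopsided, so report 100%.
    if ($c < 2 || $n <= 0) {
        return 100.0;
    }

    $expected = $n / $c;      // per-category count under a uniform spread

    // Score d: sum of absolute deviations from the uniform count.
    $d = 0.0;
    foreach ($counts as $count) {
        $d += abs($count - $expected);
    }

    // Worst case: every item falls into a single category.
    $worst = 2 * ($n - $expected);

    // Linear map: d = 0 gives 100%, d = worst gives 0%.
    return 100.0 * (1 - $d / $worst);
}

echo round(evennessPercent([4, 0, 2, 1])), "%\n"; // the question's data
echo round(evennessPercent([4, 4, 4, 4])), "%\n"; // perfectly even
echo round(evennessPercent([7, 0, 0, 0])), "%\n"; // everything in one category
```

For the question's data (4, 0, 2, 1) this works out to about 52%, matching the hand calculation, while 4,4,4,4 maps to 100% and 7,0,0,0 maps to 0%.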