As seen here: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
Here's the Ruby code itself, implemented in the Statistics2 library:
# inverse of normal distribution ([2])
# Pr( (-\infty, x] ) = qn -> x
def pnormaldist(qn)
b = [1.570796288, 0.03706987906, -0.8364353589e-3,
-0.2250947176e-3, 0.6841218299e-5, 0.5824238515e-5,
-0.104527497e-5, 0.8360937017e-7, -0.3231081277e-8,
0.3657763036e-10, 0.6936233982e-12]
if(qn < 0.0 || 1.0 < qn)
$stderr.printf("Error : qn <= 0 or qn >= 1 in pnorm()!\n")
return 0.0;
end
qn == 0.5 and return 0.0
w1 = qn
qn > 0.5 and w1 = 1.0 - w1
w3 = -Math.log(4.0 * w1 * (1.0 - w1))
w1 = b[0]
1.upto 10 do |i|
w1 += b[i] * w3**i;
end
qn > 0.5 and return Math.sqrt(w1 * w3)
-Math.sqrt(w1 * w3)
end
This is pretty straightforward to translate:
module PNormalDist where
pnormaldist :: (Ord a, Floating a) => a -> Either String a
pnormaldist qn
| qn < 0 || 1 < qn = Left "Error: qn must be in [0,1]"
| qn == 0.5 = Right 0.0
| otherwise = Right $
let w3 = negate . log $ 4 * qn * (1 - qn)
b = [ 1.570796288, 0.03706987906, -0.8364353589e-3,
-0.2250947176e-3, 0.6841218299e-5, 0.5824238515e-5,
-0.104527497e-5, 0.8360937017e-7, -0.3231081277e-8,
0.3657763036e-10, 0.6936233982e-12]
w1 = sum . zipWith (*) b $ iterate (*w3) 1
in (signum $ qn - 0.5) * sqrt (w1 * w3)
First off, let's look at the ruby - it returns a value, but sometimes it prints an error message (when given an improper argument). This isn't very haskellish, so
let's have our return value be Either String a
- where we'll return a Left String
with an error message if given an improper argument, and a Right a
otherwise.
Now we check the two cases at the top:
qn < 0 || 1 < qn = Left "Error: qn must be in [0,1]"
- this is the error condition, when qn
is out of range.qn == 0.5 = Right 0.0
- this is the ruby check qn == 0.5 and return * 0.0
Next up, we define w1
in the ruby code. But we redefine it a few lines later, which isn't very rubyish. The value that we store in w1
the first time
is used immediately in the definition of w3
, so why don't we skip storing it in w1
? We don't even need to do the qn > 0.5 and w1 = 1.0 - w1
step, because
we use the product w1 * (1.0 - w1)
in the definition of w3.
So we skip all that, and move straight to the definition w3 = negate . log $ 4 * qn * (1 - qn)
.
Next up is the definition of b
, which is a straight lift from the ruby code (ruby's syntax for an array literal is haskell's syntax for a list).
Here's the most tricky bit - defining the ultimate value of w3
. What the ruby code does in
w1 = b[0]
1.upto 10 do |i|
w1 += b[i] * w3**i;
end
Is what's called a fold - reducing a set of values (stored in a ruby array) into a single value. We can restate this more functionally (but still in ruby) using Array#reduce
:
w1 = b.zip(0..10).reduce(0) do |accum, (bval,i)|
accum + bval * w3^i
end
Note how I pushed b[0]
into the loop, using the identity b[0] == b[0] * w3^0
.
Now we could port this directly to haskell, but it's a bit ugly
w1 = foldl 0 (\accum (bval,i) -> accum + bval * w3**i) $ zip b [0..10]
Instead, I broke it up into several steps - first off, we don't really need i
, we just need the powers of w3
(starting at w3^0 == 1
), so
let's calculate those with iterate (*w3) 1
.
Then, rather than zipping those into pairs with the elements of b, we ultimately just need their products, so we can zip them into
the products of each pair using zipWith (*) b
.
Now our folding function is really easy - we just need to sum up the products, which we can do using sum
.
Lastly, we decide whether to return plus or minus sqrt (w1 * w3)
, according to whether qn
is greater or less than 0.5 (we
already know it's not equal). So rather than calculating the square root in two separate locations as in the ruby code,
I calculated it once, and multiplied it by +1
or -1
according to the sign of qn - 0.5
(signum
just returns the sign of a value).