I answered a question regarding generating samples with a positive support and known mean and variance using the gamma distribution in NumPy. I thought I'd try the same in Incanter. But unlike the results I got with NumPy, I could not get a sample mean and variance close to the distribution's mean and variance.
(defproject incanter-repl "0.1.0-SNAPSHOT"
:description "FIXME: write description"
:url "http://example.com/FIXME"
:license {:name "Eclipse Public License"
:url "http://www.eclipse.org/legal/epl-v10.html"}
:dependencies [[org.clojure/clojure "1.6.0"]
[incanter "1.5.4"]])
(require '[incanter
[core]
[distributions :refer [gamma-distribution mean variance draw]]
[stats :as stats]])
(def dist
(let [mean 0.71
variance 2.89
theta (/ variance mean)
k (/ mean theta) ]
(gamma-distribution k theta)))
(mean dist) ;=> 0.71
(variance dist) ;=> 2.89
(def samples (repeatedly 10000 #(draw dist)))
(stats/mean samples) ;=> 0.04595208774029654
(stats/variance samples) ;=> 0.01223348345651905
I expected these stats calculated on the sample to be much closer to the mean and variance of the distribution. What am I missing?
Incanter had a bug inherited from Parallel Colt. The treatment of parameters is not consistent across methods in Parallel Colt. See issue report https://github.com/incanter/incanter/issues/245.
Contrary to numpy.random.gamma
which takes shape (k) and scale (theta) as parameters,
clojures gamma-distribution
takes shape (k) and rate (1/theta) as parameters.
See (doc gamma-distribution)
and http://en.wikipedia.org/wiki/Gamma_distribution
Thus, to get the desired results, you may define dist
as
(def dist
(let [mean 0.71
variance 2.89
r (/ mean variance)
k (* mean r) ]
(gamma-distribution k r)))
A sample result is then
(def samples (repeatedly 10000 #(draw dist)))
#'incanter-test.core/samples
incanter-test.core=> (stats/mean samples)
0.7163908381930312
incanter-test.core=> (stats/variance samples)
2.940867216122528