clojureincanter

Incanter sample mean and variance not close to distribution mean and variance


I answered a question regarding generating samples with a positive support and known mean and variance using the gamma distribution in NumPy. I thought I'd try the same in Incanter. But unlike the results I got with NumPy, I could not get a sample mean and variance close to the distribution's mean and variance.

(defproject incanter-repl "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.4"]])

(require '[incanter 
           [core] 
           [distributions :refer [gamma-distribution mean variance draw]] 
           [stats :as stats]])

(def dist 
  (let [mean 0.71 
        variance 2.89 
        theta (/ variance mean) 
        k (/ mean theta) ] 
    (gamma-distribution k theta)))

Incanter calculates mean and variance of the distribution

(mean dist) ;=> 0.71
(variance dist) ;=> 2.89

I calculate a sample mean and variance based on draws from that distribution

(def samples (repeatedly 10000 #(draw dist)))

(stats/mean samples) ;=> 0.04595208774029654
(stats/variance samples) ;=> 0.01223348345651905

I expected these stats calculated on the sample to be much closer to the mean and variance of the distribution. What am I missing?

Answer

Incanter had a bug inherited from Parallel Colt. The treatment of parameters is not consistent across methods in Parallel Colt. See issue report https://github.com/incanter/incanter/issues/245.


Solution

  • Contrary to numpy.random.gamma which takes shape (k) and scale (theta) as parameters, clojures gamma-distribution takes shape (k) and rate (1/theta) as parameters. See (doc gamma-distribution)and http://en.wikipedia.org/wiki/Gamma_distribution

    Thus, to get the desired results, you may define dist as

    (def dist 
      (let [mean 0.71 
            variance 2.89 
            r (/ mean variance) 
            k (* mean r) ] 
        (gamma-distribution k r)))
    

    A sample result is then

    (def samples (repeatedly 10000 #(draw dist)))
    #'incanter-test.core/samples
    incanter-test.core=> (stats/mean samples)
    0.7163908381930312
    incanter-test.core=> (stats/variance samples)
    2.940867216122528