postgresqlrandomstatisticsgaussian

Gaussian random distribution in Postgresql


I have a table of let's say 250 URLs :

create table url (
  id serial,
  url varchar(64)
)

These URLs correspond each to a website. Each of the websites have a different popularity. Let's say that the id=125 (the one centered on the gaussian) is the most popular, the ones at id=1 or id=250 are the least popular.

I want to populate a "log" table like the following one with a value of url among the ones provided in the "url" table, but taking into account that different URLS might appear more frequently (for exemple url whose id is 125 will be the most popular).

create table log (
  id serial,
  url_id integer
)

I want to avoid using random() since it is uniform and not very "real".

How can this be achieved with Postgresql ?


Solution

  • The sum of 12 uniform distributions on the range [0, 1) is a good approximation to a Gaussian distribution bounded in the range [0, 12). This can then easily be re-scaled by multiplying by a constant and then adding/subtracting a constant.

    select
        random() + 
        random() + 
        random() +
        random() + 
        random() + 
        random() +
        random() + 
        random() + 
        random() +
        random() + 
        random() + 
        random();
    

    http://books.google.com/books?id=EKA-yeX2GVgC&pg=PA185&lpg=PA185&dq=%22sum+of+12+uniform+random+variables%22&source=bl&ots=YfwwE0fBB3&sig=HX9J9Oe6x316kVL8uamDU_GOsn4&hl=en&sa=X&ei=bJLZUur1GozaqwGHm4DQDQ&ved=0CEUQ6AEwAw#v=onepage&q=%22sum%20of%2012%20uniform%20random%20variables%22&f=false