Tags: r, distribution, simulate, probabilistic-programming

Simulate fat tail data in R


I need to simulate data in R with a fat tail distribution, and having never simulated data before I'm not sure where to start. I have looked into the FatTailsR package but the documentation is pretty cryptic and I can't seem to find any obvious tutorials.

Basically, I want to create an artificial dataframe with two columns (X and Y), of 10,000 observations, that uses the following logic/iterations:

Any guidance would be appreciated, including suggestions of packages and functions to check out (maybe something like rlnorm?).


Solution

  • This might work (not super-efficient, but ...)

    First, compute the probability of each outcome, P(k) = 0.75*0.25^(k-1), for the first 10 outcomes (P(1)=0.75, P(2)=0.75*0.25, P(3)=0.75*0.25^2 ...)

    cc <- cumprod(c(0.75,rep(0.25,9)))
    

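    Draw 1000 multinomial deviates with these probabilities (size=1, so each draw picks exactly one outcome; rmultinom rescales prob internally so that it sums to 1). Replace 1000 with 10000 to get the sample size asked for in the question.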

    rr <- t(rmultinom(1000,size=1,prob=cc))
    

    Figure out which value in each row is equal to 1:

    storage.mode(rr) <- "logical"
    out <- apply(rr,1,which)
    

    Check the results (no seed is set, so the counts will vary from run to run):

    (tt <- table(factor(out,levels=1:10)))
      1   2   3   4   5   6   7   8   9  10 
    756 183  43  14   3   1   0   0   0   0 
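
    The multinomial-then-which steps above can probably be collapsed into a single call to sample(), which accepts unnormalized weights through its prob argument. This is only a sketch: the seed is arbitrary, n is taken from the question, and out2 is a name made up for this example.

    ```r
    set.seed(101)   # arbitrary seed, only so the run is reproducible
    n <- 10000      # sample size asked for in the question

    # Same weights as above: 0.75, 0.75*0.25, 0.75*0.25^2, ...
    cc <- cumprod(c(0.75, rep(0.25, 9)))

    # sample() rescales `prob` to sum to 1, just as rmultinom() does
    out2 <- sample(1:10, size = n, replace = TRUE, prob = cc)

    # Tabulate, keeping empty categories visible
    table(factor(out2, levels = 1:10))
    ```

    This avoids building the 10-column indicator matrix, so it should be lighter on memory for large n.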
    

    There might be a cleverer way to set this up in terms of a modified geometric distribution ...
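
    One way the geometric idea might look, as a sketch rather than a tested recipe: R's rgeom() counts failures before the first success (support 0, 1, 2, ...), so adding 1 gives P(k) = 0.75*0.25^(k-1) for k = 1, 2, .... To keep the cap at 10 outcomes used above, the second version draws from the truncated distribution by inverse-CDF sampling. The seed and the names p, x_geom, u and x_trunc are assumptions for this example.

    ```r
    set.seed(101)   # arbitrary seed, only so the run is reproducible
    n <- 10000      # sample size asked for in the question
    p <- 0.75       # success probability, so P(1) = 0.75

    # Untruncated version: shift rgeom() from support 0,1,2,... to 1,2,3,...
    x_geom <- rgeom(n, prob = p) + 1

    # Truncated at 10: scale uniforms into (0, P(X <= 9)) on the 0-based
    # scale, invert the geometric CDF, then shift to 1..10
    u <- runif(n)
    x_trunc <- qgeom(u * pgeom(9, prob = p), prob = p) + 1

    table(factor(x_trunc, levels = 1:10))
    ```

    Since values above 10 have total probability 0.25^10 (roughly 1e-6), the two versions should be practically indistinguishable at this sample size.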