How to use the poweRlaw package in R for very big datasets?


I am fitting a power law to a vector of 45 million observations, using the poweRlaw package in R: https://arxiv.org/pdf/1407.3492.pdf

The most computationally intensive part of the process is estimating the lower bound, which is done with the estimate_xmin() function. It's taking a lot of time.

The code goes like this (w is the vector and c_pl comes from "continuous power-law"):

library(poweRlaw)

c_pl <- conpl$new(w)        # continuous power-law distribution object
est <- estimate_xmin(c_pl)  # estimate the lower bound xmin (the slow step)
c_pl$setXmin(est)           # update the object with the estimated xmin

I am wondering how to use the estimate_xmin() function in a way that minimises processing time (maybe with parallel computation?). I am working on an AWS instance with 16 cores and 64 GB of RAM. Thanks.


Solution

  • The reason estimate_xmin() takes so long is that it tries every possible value of xmin. The function has an xmins argument that you can use to restrict this search, e.g.

    estimate_xmin(c_pl, xmins = c(10, 100, 1000, 10000))
    

    will find the optimal xmin out of 10, 100, 1000 and 10000.
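
  • On the parallel-computation part of the question: estimate_xmin() does not expose a parallel option itself (the package's bootstrap functions do, via a threads argument), but you can parallelise the grid search over candidate xmin values yourself. Below is a minimal sketch using the base parallel package; the candidate grid and the chunking scheme are illustrative assumptions, not part of the package's API. Each worker runs estimate_xmin() on one chunk of the grid, and the fit with the smallest Kolmogorov-Smirnov distance (reported in $gof) wins.

    library(poweRlaw)
    library(parallel)

    n_cores <- 16                               # cores on the AWS instance
    xmin_grid <- 10^seq(1, 4, length.out = 64)  # hypothetical candidate grid
    chunks <- split(xmin_grid, cut(seq_along(xmin_grid), n_cores))

    # mclapply() forks, so this is Unix-only; each worker gets a
    # copy-on-write copy of c_pl and searches its own chunk of the grid.
    results <- mclapply(chunks,
                        function(xs) estimate_xmin(c_pl, xmins = xs),
                        mc.cores = n_cores)

    # estimate_xmin() reports the KS distance in $gof; keep the best fit.
    best <- results[[which.min(vapply(results, function(r) r$gof,
                                      numeric(1)))]]
    c_pl$setXmin(best)

    With 16 cores this should cut the wall-clock time of the search roughly in proportion to the number of chunks, although the forked workers will use extra memory as they each work on their own copy of the 45-million-observation vector.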