r igraph power-law kolmogorov-smirnov

Small KS p-value when fitting a power law to degree data from a graph with an expected power-law degree distribution


I am using the fit_power_law function from the igraph R package. The function returns "KS.p", the p-value of a Kolmogorov-Smirnov test, as a measure of goodness of fit. As I understand it, the smaller the KS p-value, the stronger the evidence against the hypothesis that the data was drawn from a power-law distribution. In other words, one would want a large KS p-value if one expects the data to fit a power-law distribution.

However, when I try this on a random graph with an expected power-law degree distribution, generated with the sample_fitness_pl function:

library(igraph)
set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out=2.2)

and then run the fit_power_law function:

fit_power_law(degree(g))

I get the following results:

$continuous
[1] FALSE

$alpha
[1] 2.363965

$xmin
[1] 8

$logLik
[1] -16114.15

$KS.stat
[1] 0.02660938

$KS.p
[1] 0.002639624

The KS.p is quite small. At a significance level of 0.05, I would reject the hypothesis that the degree distribution was drawn from a power law. However, the data was originally drawn from a graph with power-law distributed degrees. I also tried a bootstrap approach to estimate the p-value using poweRlaw, based on the guidance here.

library(poweRlaw)
data_pl <- displ$new(degree(g)[degree(g)>0])
est <- estimate_xmin(data_pl)
data_pl$xmin <- est$xmin
data_pl$pars <- est$pars
bs <- bootstrap_p(data_pl)

The returned p-value is 0:

bs$p
[1] 0

Does anybody have an idea how to explain this discrepancy? Any comments are appreciated!


Solution

  • Citing the documentation of sample_fitness_pl:

    Note that significant finite size effects may be observed for exponents smaller than 3 in the original formulation of the game. This function provides an argument that lets you remove the finite size effects by assuming that the fitness of vertex i is (i+i_0-1)^{-alpha}, where i_0 is a constant chosen appropriately to ensure that the maximum degree is less than the square root of the number of edges times the average degree; see the paper of Chung and Lu, and Cho et al for more details.

    Even a very small difference in the exponent around 3 can change the result substantially:

    library(igraph)
    set.seed(137)
    g <- sample_fitness_pl(10000, 60000, exponent.out=2.99999)
    fit_power_law(degree(g))["KS.p"]
    #> $KS.p
    #> [1] 0.1234042
    
    set.seed(137)
    g <- sample_fitness_pl(10000, 60000, exponent.out=3)
    fit_power_law(degree(g))["KS.p"]
    #> $KS.p
    #> [1] 0.9999999
    
    

    Finally, you can turn off the finite size correction (finite.size.correction defaults to TRUE) to change this behaviour:

    set.seed(137)
    g <- sample_fitness_pl(10000, 60000, exponent.out=2.2, finite.size.correction = FALSE)
    fit_power_law(degree(g))["KS.p"]
    #> $KS.p
    #> [1] 0.9951813
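
To get a feel for what the correction in the quoted documentation does, here is a minimal base-R sketch of the two fitness sequences. It assumes, following my reading of the igraph documentation, that alpha = 1/(exponent - 1). The offset i0 = 50 is a made-up illustrative value, not the constant igraph actually computes, and the expected-degree figure is only a rough approximation that ignores rejected duplicate edges.

```r
# Sketch only: compares fitness sequences with and without an offset i0.
n <- 10000                 # number of vertices, as in the question
m <- 60000                 # number of edges, as in the question
gamma <- 2.2               # degree exponent (exponent.out in the question)
alpha <- 1 / (gamma - 1)   # assumed relation between exponent and alpha

i  <- seq_len(n)
i0 <- 50                   # hypothetical offset, NOT igraph's actual constant

fit_raw  <- i^(-alpha)              # original formulation
fit_corr <- (i + i0 - 1)^(-alpha)   # shifted formulation from the quoted docs

# Share of the total fitness held by the fittest vertex
share_raw  <- fit_raw[1] / sum(fit_raw)
share_corr <- fit_corr[1] / sum(fit_corr)

# Rough expected degree of the fittest vertex: each of the m edges has two
# endpoints, each assumed to be drawn with probability proportional to fitness.
deg_raw  <- 2 * m * share_raw
deg_corr <- 2 * m * share_corr

print(c(raw = deg_raw, corrected = deg_corr))
```

The shifted sequence gives the fittest vertex a much smaller share of the total fitness, so the largest degrees are capped and the fitness is no longer a pure power law in i. Presumably that is the deviation the KS test picks up. You can check the effect on the actual graphs by comparing max(degree(g)) under both settings of finite.size.correction.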