I am using the fit_power_law function from the igraph R package. The function returns "KS.p" (the p-value of the Kolmogorov-Smirnov test) as a measure of goodness of fit. As I understand it, the smaller the KS p-value, the stronger the evidence against the hypothesis that the data were drawn from a power-law distribution. In other words, one would want a large KS p-value if one expects the data to follow a power law.
However, when I try this on a random graph with an expected power-law degree distribution, generated with the sample_fitness_pl function:
library(igraph)
set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out=2.2)
and then run the fit_power_law function:
fit_power_law(degree(g))
I get the following results:
$continuous
[1] FALSE
$alpha
[1] 2.363965
$xmin
[1] 8
$logLik
[1] -16114.15
$KS.stat
[1] 0.02660938
$KS.p
[1] 0.002639624
The KS.p value is quite small. At a significance level of 0.05, I would reject the hypothesis that the degree distribution was drawn from a power law. However, the graph was generated to have power-law distributed degrees. I also tried a bootstrap approach to estimate the p-value using the poweRlaw package, based on guidance here.
library(poweRlaw)
data_pl <- displ$new(degree(g)[degree(g) > 0])
est <- estimate_xmin(data_pl)
data_pl$xmin <- est$xmin
data_pl$pars <- est$pars
bs <- bootstrap_p(data_pl)
and the returned p-value is 0
bs$p
[1] 0
Does anybody have an idea how to explain this discrepancy? Any comments are appreciated!
Citing the documentation of sample_fitness_pl:
Note that significant finite size effects may be observed for exponents smaller than 3 in the original formulation of the game. This function provides an argument that lets you remove the finite size effects by assuming that the fitness of vertex i is (i+i_0-1)^{-alpha}, where i_0 is a constant chosen appropriately to ensure that the maximum degree is less than the square root of the number of edges times the average degree; see the paper of Chung and Lu, and Cho et al for more details.
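To get a feel for what the offset in that formula does, here is a minimal base-R sketch. It assumes alpha = 1 / (gamma - 1), as stated in the same documentation page, and uses an arbitrary illustrative offset of i0 = 50; this is not the value igraph computes internally, and pl_fitness and top_share are hypothetical helper names, not igraph functions.

```r
# Fitness of vertex i: (i + i0 - 1)^(-alpha), with alpha = 1 / (gamma - 1).
# i0 = 1 corresponds to the formulation without an offset.
pl_fitness <- function(n, gamma, i0 = 1) {
  alpha <- 1 / (gamma - 1)
  (seq_len(n) + i0 - 1)^(-alpha)
}

n <- 10000
gamma <- 2.2

raw     <- pl_fitness(n, gamma)           # no offset
shifted <- pl_fitness(n, gamma, i0 = 50)  # i0 = 50 is an assumed, illustrative value

# Share of the total fitness held by the 10 fittest vertices
top_share <- function(f) sum(f[1:10]) / sum(f)

c(raw = top_share(raw), shifted = top_share(shifted))
```

With an offset, the fittest vertices hold a smaller share of the total fitness, which is how the largest degrees get capped. It also means the head of the resulting degree distribution no longer follows the pure power-law form that the fit assumes.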
Even very small differences in the exponent can have a large effect:
library(igraph)
set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out=2.99999)
fit_power_law(degree(g))["KS.p"]
#> $KS.p
#> [1] 0.1234042
set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out=3)
fit_power_law(degree(g))["KS.p"]
#> $KS.p
#> [1] 0.9999999
Finally, you can disable the finite size correction (it is enabled by default) to change this behaviour:
set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out=2.2, finite.size.correction = FALSE)
fit_power_law(degree(g))["KS.p"]
#> $KS.p
#> [1] 0.9951813