I am trying to implement the head/tail breaks classification algorithm in R (see here). This relatively new algorithm is a less computationally expensive alternative to other classification methods used in cartography for highly skewed data.
So far, I have been using a Python implementation (see here) as a template, with relative success. Here is my implementation in R:
# fake data to classify
pareto_data <- c()
for (i in 1:100){
  pareto_data[i] <- (1.0/i)^1.16
}
# head/tail breaks algorithm
ht <- function(data){
  ln <- length(data)
  mn <- mean(data)
  res <- append(c(),mn) # this is where I was hoping to store my output
  head <- subset(data,data>=mn)
  while (length(head)>=1 & length(head)/ln <= 0.40){
    print(res)
    return(ht(head))
  }
  #return(res)
}
ht(pareto_data)
Running the code above prints the following:
[1] 0.03849691
[1] 0.1779904
[1] 0.4818454
This output is very likely the same as what the original Python code I used as a template produces. However, I have not been able to store it in either a vector or a list.
I would be really thankful for any hints on how to overcome this problem and on how to improve my code (which is not exactly the same as the original Python code, particularly in the conditions of the while statement).
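A possible recursive version of the algorithm is the following. The inner function carries the means computed so far in mu, so they come back as a single vector instead of only being printed.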
ht_breaks <- function(x){
  ht_inner <- function(x, mu){
    n <- length(x)
    mu <- c(mu, mean(x))
    h <- x[x > mean(x)]
    if(length(h) > 1 && length(h)/n <= 0.4){
      ht_inner(h, mu)
    } else mu
  }
  ht_inner(x, NULL)
}
pareto_data <- (1.0/(1:100))^1.16
ht_breaks(pareto_data)
#[1] 0.03849691 0.17799039 0.48184535
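If you would rather avoid recursion, the same idea can be written as a loop. This is only a sketch under the same stopping rule as above (head longer than one element and at most 40% of the current data); the names ht_breaks_iter and max_head are made up for illustration and do not come from any package. The last two lines show one possible way to turn the breaks into class labels with base R's findInterval.

```r
# Iterative head/tail breaks: returns the vector of means used as breaks.
# `ht_breaks_iter` and `max_head` are illustrative names (assumptions).
ht_breaks_iter <- function(x, max_head = 0.4) {
  breaks <- numeric(0)
  repeat {
    m <- mean(x)
    breaks <- c(breaks, m)          # store the current mean as a break
    head_part <- x[x > m]           # values above the mean form the head
    # stop when the head cannot be split further or is no longer a minority
    if (length(head_part) <= 1 || length(head_part) / length(x) > max_head) break
    x <- head_part                  # continue with the head only
  }
  breaks
}

pareto_data <- (1.0 / (1:100))^1.16
brks <- ht_breaks_iter(pareto_data)

# Assign each value to a class: 1 = below the first break, 2 = between the
# first and second break, and so on.
classes <- findInterval(pareto_data, brks) + 1
table(classes)
```

The loop keeps only one vector of breaks alive and reassigns x to the head on each pass, so it should give the same breaks as the recursive version for this data.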