The excellent Metrics package provides a function to calculate average precision: apk.
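For reference, average precision at a cutoff k can be written as follows. The notation is introduced here, and this is stated as an assumption about the definition the loop implements, not quoted from the Metrics documentation:

```latex
\mathrm{AP@}k = \frac{1}{\min(|A|,\,k)} \sum_{i=1}^{\min(k,\,n)} P(i)\,\mathrm{rel}(i),
\qquad
P(i) = \frac{1}{i} \sum_{j=1}^{i} \mathrm{rel}(j)
```

Here A is the set of actual items, p_1, ..., p_n are the predictions in rank order, and rel(i) is 1 when p_i is in A and has not already appeared among p_1, ..., p_{i-1}, and 0 otherwise.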
The problem is, it's based on a for loop, and it's slow:
require('Metrics')
require('rbenchmark')
actual <- 1:20000
predicted <- c(1:20, 200:600, 900:1522, 14000:32955)
benchmark(replications = 10,
          apk(5000, actual, predicted),
          columns = c("test", "replications", "elapsed", "relative"))
#                           test replications elapsed relative
# 1 apk(5000, actual, predicted)           10   53.68        1
I'm at a loss as to how to vectorize this function. Is there a better way to implement it in R?
I'd have to agree the implementation looks pretty bad... Try this instead:
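apk2 <- function (k, actual, predicted) {
  # only the top-k predictions count
  predicted <- head(predicted, k)
  # TRUE at the first occurrence of each predicted value
  is.new <- rep(FALSE, length(predicted))
  is.new[match(unique(predicted), predicted)] <- TRUE
  # a hit must be in 'actual' and not a repeat of an earlier prediction
  is.relevant <- predicted %in% actual & is.new
  # cumsum(is.relevant) / i is the precision at rank i, kept only at hits
  score <- sum(cumsum(is.relevant) * is.relevant / seq_along(predicted)) /
    min(length(actual), k)
  score
}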
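To see what each vectorized step does, here is the same sequence of operations traced on a small hand-made input (the values of k, actual and predicted are made up for illustration). The repeated 3 at rank 3 is in actual but is not counted, because is.new is FALSE there:

```r
# Toy input, hypothetical values chosen for illustration
k <- 4
actual <- c(1, 3, 5)
predicted <- head(c(3, 2, 3, 5, 1), k)   # keep only the top-k predictions: 3 2 3 5

# TRUE at the first occurrence of each predicted value
is.new <- rep(FALSE, length(predicted))
is.new[match(unique(predicted), predicted)] <- TRUE
print(is.new)                            # TRUE TRUE FALSE TRUE

# A hit must be in 'actual' and not a repeat of an earlier prediction
is.relevant <- predicted %in% actual & is.new
print(is.relevant)                       # TRUE FALSE FALSE TRUE

# cumsum(is.relevant) / i is the precision at rank i; multiplying by
# is.relevant zeroes it out at ranks that are not hits
precision.at.hits <- cumsum(is.relevant) * is.relevant / seq_along(predicted)
print(precision.at.hits)                 # 1.0 0.0 0.0 0.5

# Average over the number of items that could have been hit
score <- sum(precision.at.hits) / min(length(actual), k)
print(score)                             # 0.5
```

Every step operates on whole vectors at once, so there is no per-element R-level loop.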
benchmark(replications = 10,
          apk(5000, actual, predicted),
          apk2(5000, actual, predicted),
          columns = c("test", "replications", "elapsed", "relative"))
#                            test replications elapsed relative
# 1  apk(5000, actual, predicted)           10  62.194 2961.619
# 2 apk2(5000, actual, predicted)           10   0.021    1.000
identical(apk(5000, actual, predicted),
          apk2(5000, actual, predicted))
# [1] TRUE
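One identical() call on a single input is reassuring; a randomized comparison over many small inputs is a stronger check. The sketch below assumes the usual AP@k definition: apk_loop is a hypothetical loop reference written from that definition (it is not the Metrics source), and apk_vec follows apk2 but swaps in !duplicated(predicted), which marks first occurrences just as the match(unique(predicted), predicted) assignment does.

```r
# Hypothetical loop-based reference, written from the AP@k definition
# (an assumption; this is not the Metrics package source)
apk_loop <- function(k, actual, predicted) {
  score <- 0
  hits <- 0
  seen <- c()
  for (i in seq_len(min(k, length(predicted)))) {
    p <- predicted[i]
    # count p only if it is relevant and has not been predicted before
    if (p %in% actual && !(p %in% seen)) {
      hits <- hits + 1
      score <- score + hits / i
    }
    seen <- c(seen, p)
  }
  score / min(length(actual), k)
}

# Vectorized version in the style of apk2, using !duplicated() to
# build the first-occurrence mask
apk_vec <- function(k, actual, predicted) {
  predicted <- head(predicted, k)
  is.relevant <- predicted %in% actual & !duplicated(predicted)
  sum(cumsum(is.relevant) * is.relevant / seq_along(predicted)) /
    min(length(actual), k)
}

# Compare both on random inputs, including repeated predictions
# and cutoffs larger than the number of predictions
set.seed(1)
for (trial in 1:200) {
  actual <- sample(1:50, sample(1:20, 1))
  predicted <- sample(1:50, sample(1:30, 1), replace = TRUE)
  k <- sample(1:40, 1)
  stopifnot(isTRUE(all.equal(apk_loop(k, actual, predicted),
                             apk_vec(k, actual, predicted))))
}
```

all.equal() is used rather than identical() because the loop and sum() may accumulate floating-point rounding differently.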