I am looking for an efficient way to implement the GTIN-13 check digit algorithm. I have looked at some relevant SO posts such as this and this but it seems like efficiency was not the subject of attention in either case.
Briefly, the algorithm takes a numeric string (such as 123765) and multiples every other digit (from right to left) by 1 or 3 to calculate a sum (so 5 * 1 + 6 * 3 + 7 * 1 + 3 * 3 + 2 * 1 + 1 * 3 = 44
) and then subtracts this sum from the closest multiple of 10 that is equal to or greater to this sum (in this case 50 - 44 = 6
) to derive the final check digit (here, 6). The input is expected to be 12 digits long, but if shorter, it can be simply padded with zeros from the left (so 123765
is really expected as 000000123765
) but the result will be still the same.
A naive implementation of this would be as follows:
gtin13 <- function(n) {
s <- as.character(n)
check.sum <- 0
for (i in 1:nchar(s)) {
digit <- substr(s, nchar(s) - i + 1, nchar(s) - i + 1)
check.sum <- check.sum + as.numeric(digit) * ifelse(i %% 2, 1, 3)
}
10 - check.sum %% 10
}
However, this is inefficient because of the for loop as well as the conversion to a string and back to a number. For instance:
df <- data.frame(
num <- sample(1:1000000, 100000, T)
)
system.time(cd <- vapply(df$num, gtin13, 0))
Take about 6 seconds on an average desktop.
What is a more efficient to calculate this check.sum?
This version doesn't need the vapply so it's faster because we don't loop over the number of possible digits in R. For example
gtim13_vec <- function(x) {
d <- x %% 10
for(i in 1:12) { # Input can be up to 12 digits
d <- d +(x%/% 10^i %% 10) * c(1,3)[1+i%%2]
}
d
10-(d%%10)
}
I used set.seed(7)
for this experiment. I see
system.time(r1 <- vapply(df$num, gtim13, 0))
# user system elapsed
# 3.21 0.00 3.36
system.time(r2 <- gtim13_vec(df$num))
# user system elapsed
# 0.03 0.00 0.03
all(r1==r2)
# [1] TRUE
So there's a big speed improvement.