I have a project in which I need to be able to calculate different voting power indices in R. As a first attempt at this I wrote a small function to calculate the Banzhaf index. It takes two arguments: a data frame with two columns, which must be labelled member and vote, and the number of votes needed for a majority (quota):
library(combinat)

banzhaf <- function(data, quota){
  f <- vector()
  m <- vector()
  score <- vector()
  name <- vector()
  pivot <- vector()
  # loop over every coalition size and every coalition of that size
  for (n in 1:nrow(data)){
    y <- as.matrix(combn(data$member, n))
    for (i in 1:ncol(y)){
      # build a data frame with the members and votes of this coalition
      for (j in 1:n){
        f[j] <- data[data$member == y[j,i],]$vote
        m[j] <- as.character(data[data$member == y[j,i],]$member)
        o <- data.frame(member = m, vote = f)
      }
      # if the coalition reaches the quota, record every member whose
      # removal would drop it back below the quota (a "swing" member)
      if (sum(o$vote) >= quota){
        for (k in 1:length(o$member)){
          t <- o[-k,]
          if (sum(t$vote) < quota){
            pivot[length(pivot) + 1] <- as.character(o$member[k])
          }
        }
      }
    }
  }
  # normalised score: each member's share of all recorded swings
  for (l in unique(pivot)){
    score[length(score) + 1] <- sum(pivot == l)
    name[length(name) + 1] <- l
  }
  out <- data.frame(name = name, score = score/length(pivot))
  return(out)
}
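For a quick sanity check, here is a tiny made-up example (three members, quota of 4) that runs instantly; it should give A a normalised score of 0.6 and B and C 0.2 each:

toy <- data.frame(member = c("A", "B", "C"), vote = c(3, 2, 1))
banzhaf(toy, 4)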
The problem with this function is that it becomes incredibly slow when I have more than 8 members in the data frame. This is due to the sheer number of combinations that combn() generates in the outermost loop (I think). Does anyone know how this can be made to run faster?
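Just to put a number on the size of the problem (only counting coalitions, not running anything):

sum(choose(27, 1:27))   # 134217727 non-empty coalitions, i.e. 2^27 - 1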
Best, Thomas
P.S.: If you want to test it, use the following data, but beware that it might run forever!
x <- c("Germany","France","UK","Italy","Spain","Poland","Romania","Netherlands","Greece","Portugal","Belgium","Czech Rep.","Hungary","Sweden","Austria","Bulgaria","Denmark","Slovakia","Finland","Ireland","Lithuania","Latvia","Slovenia","Estonia","Cyprus","Luxembourg","Malta")
z <- c(29,29,29,29,27,27,14,13,12,12,12,12,12,10,10,10,7,7,7,7,7,4,4,4,4,4,3)
dat <- data.frame(member = as.character(x),vote = z)
oi <- banzhaf(dat, 255)
oi
My approach was similar to David's, using batched matrix operations to handle the size:
banzhaf = function(votes, pass=sum(votes) %/% 2 + 1, batch.size=500000, quiet=batches == 1) {
  n = length(votes)
  batches = ceiling(2^n / batch.size)
  if (!quiet)
    cat('calculating...\n')
  # sum the per-batch swing counts over all batches
  Reduce(`+`, lapply(1:batches, function(b) {
    if (!quiet)
      cat('-', b, '/', batches, '\n')
    # coalition indices handled in this batch
    i = ((b - 1) * batch.size + 1):min(2^n, b * batch.size)
    # 0/1 membership matrix: bit j of index i says whether member j is in coalition i
    m = do.call(cbind, lapply(as.integer(2^((1:n) - 1L)), function(j, k) (k %/% j) %% 2L, i))
    # total votes cast by each coalition
    x = drop(m %*% votes)
    passed = x >= pass
    # a member swings a passing coalition if the total minus their vote falls below the quota
    colSums(outer(x[passed] - pass, votes, `<`) * m[passed, , drop=FALSE])
  }))
}
It uses R's name propagation instead of a data.frame, avoids loops where possible, and uses integers instead of numerics where possible. It still took over 6 minutes to run on my box:
# wikipedia examples
banzhaf(c(A=4, B=3, C=2, D=1), 6)
banzhaf(c('Hempstead #1'=9, 'Hempstead #2'=9, 'North Hempstead'=7, 'Oyster Bay'=3, 'Glen Cove'=1, 'Long Beach'=1), 16)
# stackoverflow data
system.time(banzhaf(setNames(as.integer(z), x), 255))
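To make the name-propagation point concrete: votes is a named vector, outer() picks those names up as column names, and colSums() then returns a named vector, so no data.frame bookkeeping is needed. A toy illustration with made-up numbers:

v = c(A = 4L, B = 3L, C = 2L)
outer(c(1L, 5L), v, `<`)             # 2 x 3 logical matrix, columns named A, B, C
colSums(outer(c(1L, 5L), v, `<`))    # named counts: A 1, B 1, C 1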
The thinking went something like:
- every coalition can be encoded as an integer from 1 to 2^n, with bit j of the integer saying whether member j is in it;
- stacking those bit patterns into a 0/1 matrix means a single matrix multiplication with the vote vector gives every coalition's total at once.
After that I think it plays out in the same manner as David's. The only complication was ensuring the use of integers for efficiency, and adding the batching, as it's not really feasible to create a 2^27-by-27 matrix in one go!
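To see the coalition encoding at a toy scale (n = 3; member names made up just for illustration):

n = 3L
i = 1:(2^n)                         # one integer per coalition
m = sapply(as.integer(2^((1:n) - 1L)), function(j) (i %/% j) %% 2L)
colnames(m) = c("A", "B", "C")
cbind(i, m)                         # each row's bits say who is in that coalition
                                    # (index 2^n decodes to the empty coalition, which never passes, so it is harmless)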