I want to create a contingency table that displays the frequency distribution of pairs of variables. Here is an example dataset:
mm <- matrix(0, 5, 6)
df <- data.frame(apply(mm, c(1,2), function(x) sample(c(0,1),1)))
colnames(df) <- c("Horror", "Thriller", "Comedy", "Romantic", "Sci.fi", "gender")
All variables are binary with 1 indicating either the presence of specfic movie type or the male gender. In the end, I would like to have the table that counts the presence of different movie types under specific gender. Something like this:
male female
Horror 1 1
Thriller 1 3
Comedy 2 2
Romantic 0 0
Sci.fi 2 0
I know I can create two tables of different movie types for male and female individually (see TarJae's answer here Create count table under specific condition) and cbind
them later but I would like to do it in one chunk of code. How to achieve this in an efficient way?
You could do
sapply(split(df, df$gender), function(x) colSums(x[names(x)!="gender"]))
#> 0 1
#> Horror 1 1
#> Thriller 1 3
#> Comedy 0 0
#> Romantic 0 0
#> Sci.fi 1 3