r data-manipulation data-cleaning crosstab contingency

Create a contingency table that displays the frequency distribution of pairs of variables


I want to create a contingency table that displays the frequency distribution of pairs of variables. Here is an example dataset:

mm <- matrix(0, 5, 6)
df <- data.frame(apply(mm, c(1,2), function(x) sample(c(0,1),1)))
colnames(df) <- c("Horror", "Thriller", "Comedy", "Romantic", "Sci.fi", "gender")

All variables are binary, with 1 indicating either the presence of a specific movie type or the male gender. In the end, I would like a table that counts the occurrences of each movie type for each gender. Something like this:

           male female
Horror      1      1
Thriller    1      3
Comedy      2      2
Romantic    0      0
Sci.fi      2      0

I know I can create two separate tables of movie types, one for males and one for females (see TarJae's answer to "Create count table under specific condition"), and cbind them afterwards, but I would like to do it in one chunk of code. Is there an efficient way to achieve this?


Solution

  • You could split the data frame by gender and take the column sums of the movie columns within each group:

    sapply(split(df, df$gender), function(x) colSums(x[names(x)!="gender"]))    
    
    #>          0 1
    #> Horror   1 1
    #> Thriller 1 3
    #> Comedy   0 0
    #> Romantic 0 0
    #> Sci.fi   1 3
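
    Note that the columns above are labelled with the raw codes (0 = female, 1 = male) and that the numbers depend on the random draw. If you want the male/female labels and column order from the question, one option is to recode gender as a factor before splitting. The following is only a sketch: the data frame is a hypothetical fixed example (not the random one from the question), chosen so that the result matches the desired table, and the names g, movies and res are made up for illustration.

```r
# Hypothetical fixed data (not the random draw from the question),
# chosen so that the result matches the desired table.
df <- data.frame(
  Horror   = c(1, 0, 1, 0, 0),
  Thriller = c(0, 1, 1, 1, 1),
  Comedy   = c(1, 1, 1, 1, 0),
  Romantic = c(0, 0, 0, 0, 0),
  Sci.fi   = c(1, 1, 0, 0, 0),
  gender   = c(1, 1, 0, 0, 0)   # 1 = male, 0 = female
)

# Recode gender as a labelled factor; the level order sets the column order.
g <- factor(df$gender, levels = c(1, 0), labels = c("male", "female"))

# Keep only the movie columns.
movies <- df[names(df) != "gender"]

# Column sums within each gender group, one result column per gender.
res <- sapply(split(movies, g), colSums)
res

#>          male female
#> Horror      1      1
#> Thriller    1      3
#> Comedy      2      2
#> Romantic    0      0
#> Sci.fi      2      0
```

    Because all movie columns are 0/1, the column sums within each group are exactly the counts of each movie type per gender.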