
Cross table with a column containing multiple values in R


I want to count how many rows in my data frame are classed Low, Medium and High for Drama, and likewise how many are Low, Medium and High for Crime.

Here's a sample of my data frame:

                             genres class_rentabilite
                       Crime, Drama            Medium
     Action, Crime, Drama, Thriller              High
Action, Adventure, Sci-Fi, Thriller            Medium
                              Drama               Low
                       Crime, Drama              High
                      Comedy, Drama              High

I used table() for another column in my data, and it worked:

table(df$language, df$class_rentabilite)

The code above gives this:

              Low Medium High NA
                1      1    0  3
  Aboriginal    0      0    2  0
  Arabic        0      0    1  3
  Aramaic       1      0    0  0
  Bosnian       1      0    0  0
  Cantonese     5      2    1  3

I want to do the same for genres, but table() doesn't give me what I need there: a row of genres can hold several comma-separated values, so each distinct combination is counted as its own category. How can I count each genre separately?


Solution

  • Here is one approach. Split genres into one row per genre with separate_rows() and store the result in a temporary data frame. Then use table() as you did before.

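    library(dplyr)
    library(tidyr)
    
    # Split each comma-separated genres string into one row per genre
    foo <- mydf %>%
      separate_rows(genres, sep = ", ")
    
    # Cross-tabulate genre against class_rentabilite
    table(foo$genres, foo$class_rentabilite)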
    
    #            High Low Medium
    #  Action       1   0      1
    #  Adventure    0   0      1
    #  Comedy       1   0      0
    #  Crime        2   0      1
    #  Drama        3   1      1
    #  Sci-Fi       0   0      1
    #  Thriller     1   0      1
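
The columns above come out in alphabetical order (High, Low, Medium). If you would rather have them as Low, Medium, High, and only care about Crime and Drama, one option is to turn class_rentabilite into a factor with explicit levels before calling table(). This is just a sketch built on the sample data; the name `tab` is my own, and the level order is an assumption about what you want.

```r
library(tidyr)

# Sample data, same values as in the DATA section
mydf <- data.frame(
  genres = c("Crime, Drama", "Action, Crime, Drama, Thriller",
             "Action, Adventure, Sci-Fi, Thriller", "Drama",
             "Crime, Drama", "Comedy, Drama"),
  class_rentabilite = c("Medium", "High", "Medium", "Low", "High", "High"),
  stringsAsFactors = FALSE
)

# One row per genre
foo <- separate_rows(mydf, genres, sep = ", ")

# Fix the column order by setting the factor levels explicitly
foo$class_rentabilite <- factor(foo$class_rentabilite,
                                levels = c("Low", "Medium", "High"))

tab <- table(foo$genres, foo$class_rentabilite)

# Keep only the two genres asked about
tab[c("Crime", "Drama"), ]
# Crime: Low 0, Medium 1, High 2
# Drama: Low 1, Medium 1, High 3
```

Indexing the table by row name works because table() uses the genre values as row names.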
    

    DATA

    mydf <- structure(list(genres = c("Crime, Drama", "Action, Crime, Drama, Thriller", 
    "Action, Adventure, Sci-Fi, Thriller", "Drama", "Crime, Drama", 
    "Comedy, Drama"), class_rentabilite = c("Medium", "High", "Medium", 
    "Low", "High", "High")), .Names = c("genres", "class_rentabilite"
    ), row.names = c(NA, -6L), class = "data.frame")
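
If you would rather not load tidyr, the same split can probably be done in base R with strsplit(). This is a rough equivalent and not part of the approach above; `parts`, `long` and `tab_base` are names I made up, and it assumes the separator is always a comma followed by a single space.

```r
# Sample data, same values as in the DATA section
mydf <- data.frame(
  genres = c("Crime, Drama", "Action, Crime, Drama, Thriller",
             "Action, Adventure, Sci-Fi, Thriller", "Drama",
             "Crime, Drama", "Comedy, Drama"),
  class_rentabilite = c("Medium", "High", "Medium", "Low", "High", "High"),
  stringsAsFactors = FALSE
)

# Split each genres string into a character vector (a list with one element per row)
parts <- strsplit(mydf$genres, ", ", fixed = TRUE)

# Repeat each row's class once per genre it contains, so both columns line up
long <- data.frame(
  genres = unlist(parts),
  class_rentabilite = rep(mydf$class_rentabilite, lengths(parts)),
  stringsAsFactors = FALSE
)

tab_base <- table(long$genres, long$class_rentabilite)
tab_base
```

On the sample data this should give the same counts as the separate_rows() version.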