I want to count how many Low, Medium and High values I have for Drama, and likewise for Crime, in my data frame.
Here's a sample of my data frame:
genres                               class_rentabilite
Crime, Drama                         Medium
Action, Crime, Drama, Thriller       High
Action, Adventure, Sci-Fi, Thriller  Medium
Drama                                Low
Crime, Drama                         High
Comedy, Drama                        High
I used table() for another column in my data, and it worked:
table(df$language, df$class_rentabilite)
The code above gives this:
           Low Medium High NA
             1      1    0  3
Aboriginal   0      0    2  0
Arabic       0      0    1  3
Aramaic      1      0    0  0
Bosnian      1      0    0  0
Cantonese    5      2    1  3
I want to use this approach for the sample data, but table() doesn't work because I have multiple values in each row in genres. How can I solve this situation?
Here is one approach. Split genres into one row per genre with separate_rows() from tidyr and store the result in a temporary data frame. Then use table() as you did before.
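library(dplyr)
library(tidyr)

# Split the comma-separated genres so that each genre gets its own row
foo <- mydf %>%
  separate_rows(genres, sep = ", ")

# Cross-tabulate genre against class_rentabilite
table(foo$genres, foo$class_rentabilite)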
#           High Low Medium
# Action       1   0      1
# Adventure    0   0      1
# Comedy       1   0      0
# Crime        2   0      1
# Drama        3   1      1
# Sci-Fi       0   0      1
# Thriller     1   0      1
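The columns in that result are sorted alphabetically (High, Low, Medium) because class_rentabilite is a character vector. If you would rather have them in Low, Medium, High order, as in your language table, one option is to turn the column into a factor with explicit levels before calling table(). This is a minimal sketch, assuming the three labels are spelled exactly "Low", "Medium" and "High"; the object name res is only for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(tidyr)

# Same six rows as the sample data used in this answer
mydf <- data.frame(
  genres = c("Crime, Drama", "Action, Crime, Drama, Thriller",
             "Action, Adventure, Sci-Fi, Thriller", "Drama",
             "Crime, Drama", "Comedy, Drama"),
  class_rentabilite = c("Medium", "High", "Medium", "Low", "High", "High"),
  stringsAsFactors = FALSE
)

# One row per genre, as in the approach above
foo <- separate_rows(mydf, genres, sep = ", ")

# Explicit levels fix the column order; any label not listed here
# (for example a lowercase "high") would become NA
foo$class_rentabilite <- factor(foo$class_rentabilite,
                                levels = c("Low", "Medium", "High"))

res <- table(foo$genres, foo$class_rentabilite)

# Keep only the two genres asked about in the question
res[c("Crime", "Drama"), ]
```

Indexing the table by row name keeps only Crime and Drama, which are the two genres you asked about.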
DATA
mydf <- structure(list(genres = c("Crime, Drama", "Action, Crime, Drama, Thriller",
"Action, Adventure, Sci-Fi, Thriller", "Drama", "Crime, Drama",
"Comedy, Drama"), class_rentabilite = c("Medium", "High", "Medium",
"Low", "High", "High")), .Names = c("genres", "class_rentabilite"
), row.names = c(NA, -6L), class = "data.frame")