I have a data frame with a Genre column that has rows like Action,Romance. I want to split those values and create a binary vector. If Action, Romance and Drama are all the possible genres, then the above-mentioned row would be 1,1,0 in the output data frame.
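For reference, the target encoding can be sketched in base R without splitstackshape. This is a minimal sketch that assumes the full set of genres is known in advance. The genres and rows vectors are illustrative values taken from the example above, and this is not how cSplit_e works internally.

```r
# Assumption: the set of possible genres is known up front
genres <- c("Action", "Romance", "Drama")
rows   <- c("Action,Romance", "Action,Romance,Drama")

# Split each comma-separated string into a character vector of genres
parts <- strsplit(rows, ",", fixed = TRUE)

# For each row, mark every known genre with 1 if present, else 0.
# vapply returns one column per row, so transpose to get one row per input.
binary <- t(vapply(parts, function(p) as.integer(genres %in% p),
                   integer(length(genres))))
colnames(binary) <- genres

# Keep the indicators in their own data frame
binary_df <- as.data.frame(binary)
binary_df
```

Here the first row comes out as 1,1,0, matching the encoding described above.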
I found this and this SO post, and this CRAN doc covering cSplit_e. But when I use it, I don't get a binary data frame as output. Instead, I get the original data frame with a few values scrambled.
a = cSplit_e(df4, "Genre", sep = ",", mode = "binary", type = "character", drop=TRUE, fixed=TRUE,fill = 0)
Edit: The issue appears to be that it's adding the new columns to the old data frame instead of creating a new one. How can I get the genre columns into their own data frame?
> names(a)
[1] "Title" "Year" "Rated" "Released" "Runtime" "Genre" "Director" "Writer" "Actors"
[10] "Plot" "Language" "Country" "Awards" "Poster" "Metascore" "imdbRating" "imdbVotes" "imdbID"
[19] "Type" "tomatoMeter" "tomatoImage" "tomatoRating" "tomatoReviews" "tomatoFresh" "tomatoRotten" "tomatoConsensus" "tomatoUserMeter"
[28] "tomatoUserRating" "tomatoUserReviews" "tomatoURL" "DVD" "BoxOffice" "Production" "Website" "Response" "Budget"
[37] "Domestic_Gross" "Gross" "Date" "Genre_Action" "Genre_Adult" "Genre_Adventure" "Genre_Animation" "Genre_Biography" "Genre_Comedy"
[46] "Genre_Crime" "Genre_Documentary" "Genre_Drama" "Genre_Family" "Genre_Fantasy" "Genre_Film-Noir" "Genre_Game-Show" "Genre_History" "Genre_Horror"
[55] "Genre_Music" "Genre_Musical" "Genre_Mystery" "Genre_N/A" "Genre_News" "Genre_Reality-TV" "Genre_Romance" "Genre_Sci-Fi" "Genre_Short"
[64] "Genre_Sport" "Genre_Talk-Show" "Genre_Thriller" "Genre_War" "Genre_Western"
The drop argument only applies to the column being split, not to the other columns in the data.frame. So to extract just the split columns afterwards, select the columns whose names start with the original column name.
Example:
> a <- cSplit_e(df4, "Genre", ",", mode = "binary", type = "character", fill = 0, drop = TRUE)
> a
  id Genre_Action Genre_Drama Genre_Romance
1  1            1           0             1
2  2            1           1             1
> a[startsWith(names(a), "Genre")]
  Genre_Action Genre_Drama Genre_Romance
1            1           0             1
2            1           1             1
Where:
df4 <- structure(list(Genre = c("Action,Romance", "Action,Romance,Drama"), id = 1:2),
.Names = c("Genre", "id"), row.names = 1:2, class = "data.frame")
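One caveat with the prefix match: if drop = FALSE were used, the original Genre column would still be present, and startsWith(names(a), "Genre") would select it too. Matching on "Genre_" avoids that. The sketch below uses a hand-built data frame as a stand-in for cSplit_e output (an assumption, so it runs without splitstackshape installed; the real column order may differ). The prefix-stripping step is optional.

```r
# Hand-built stand-in for what cSplit_e(..., drop = FALSE) might return.
# Built manually so this runs without splitstackshape installed.
a <- data.frame(
  Genre         = c("Action,Romance", "Action,Romance,Drama"),
  id            = 1:2,
  Genre_Action  = c(1L, 1L),
  Genre_Drama   = c(0L, 1L),
  Genre_Romance = c(1L, 1L),
  stringsAsFactors = FALSE
)

# "Genre" alone would also match the original column; "Genre_" does not
genre_only <- a[startsWith(names(a), "Genre_")]

# Optional: strip the prefix so the columns are plain genre names
names(genre_only) <- sub("^Genre_", "", names(genre_only))
genre_only
```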