r splitstackshape csplit

cSplit_e not returning a binary data frame


I have a data frame with a Genre column whose rows look like Action,Romance. I want to split those values and create a binary vector for each row. If Action, Romance and Drama are all the possible genres, then the row above would be 1,1,0 in the output data frame.

I found two related SO posts and the CRAN documentation covering cSplit_e, but when I use it I don't get a binary data frame. Instead I get the original data frame with a few values scrambled.

a <- cSplit_e(df4, "Genre", sep = ",", mode = "binary", type = "character", drop = TRUE, fixed = TRUE, fill = 0)

Edit: The issue appears to be that it's adding the new columns to the old data frame, instead of creating a new frame. How can I get the Genres into their own frame?

> names(a)
 [1] "Title"             "Year"              "Rated"             "Released"          "Runtime"           "Genre"             "Director"          "Writer"            "Actors"           
[10] "Plot"              "Language"          "Country"           "Awards"            "Poster"            "Metascore"         "imdbRating"        "imdbVotes"         "imdbID"           
[19] "Type"              "tomatoMeter"       "tomatoImage"       "tomatoRating"      "tomatoReviews"     "tomatoFresh"       "tomatoRotten"      "tomatoConsensus"   "tomatoUserMeter"  
[28] "tomatoUserRating"  "tomatoUserReviews" "tomatoURL"         "DVD"               "BoxOffice"         "Production"        "Website"           "Response"          "Budget"           
[37] "Domestic_Gross"    "Gross"             "Date"              "Genre_Action"      "Genre_Adult"       "Genre_Adventure"   "Genre_Animation"   "Genre_Biography"   "Genre_Comedy"     
[46] "Genre_Crime"       "Genre_Documentary" "Genre_Drama"       "Genre_Family"      "Genre_Fantasy"     "Genre_Film-Noir"   "Genre_Game-Show"   "Genre_History"     "Genre_Horror"     
[55] "Genre_Music"       "Genre_Musical"     "Genre_Mystery"     "Genre_N/A"         "Genre_News"        "Genre_Reality-TV"  "Genre_Romance"     "Genre_Sci-Fi"      "Genre_Short"      
[64] "Genre_Sport"       "Genre_Talk-Show"   "Genre_Thriller"    "Genre_War"         "Genre_Western"    

Solution

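  • The drop argument applies only to the column being split ("Genre" here), not to the other columns in the data.frame, so the result still contains every other original column alongside the new binary ones. The new columns are all prefixed with the name of the column that was split, so to get just those columns, subset the result by that prefix.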

    Example:

    > a <- cSplit_e(df4, "Genre", ",", mode = "binary", type = "character", fill = 0, drop = TRUE)
    > a
      id Genre_Action Genre_Drama Genre_Romance
    1  1            1           0             1
    2  2            1           1             1
    > a[startsWith(names(a), "Genre")]
      Genre_Action Genre_Drama Genre_Romance
    1            1           0             1
    2            1           1             1
    

    Where:

    df4 <- structure(list(Genre = c("Action,Romance", "Action,Romance,Drama"), id = 1:2), 
      .Names = c("Genre", "id"), row.names = 1:2, class = "data.frame")
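
    A small variation on the subsetting step, as a sketch rather than anything cSplit_e itself provides: matching on "Genre_" (with the underscore) instead of "Genre" avoids picking up the original Genre column if it was not dropped, and the prefix can be stripped afterwards. The data frame below is a hand-built stand-in for the cSplit_e output shown above (with Genre kept, as if drop = FALSE), so that the snippet runs with base R only; the name genres is just illustrative.

```r
# Hand-built stand-in for the cSplit_e() result shown above, so this runs
# without splitstackshape. Genre is kept here to mimic drop = FALSE.
a <- data.frame(
  id = 1:2,
  Genre = c("Action,Romance", "Action,Romance,Drama"),
  Genre_Action = c(1, 1),
  Genre_Drama = c(0, 1),
  Genre_Romance = c(1, 1),
  stringsAsFactors = FALSE
)

# Matching on "Genre_" (with the underscore) skips the original Genre column.
genres <- a[grepl("^Genre_", names(a))]

# Optionally strip the prefix so the columns are just the genre names.
names(genres) <- sub("^Genre_", "", names(genres))
genres
```

    With the real output, the same two lines can be applied to the data frame returned by cSplit_e instead of the stand-in.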