Tags: r, reshape, reshape2, xlconnect

How to Reshape based on "checked" vs "unchecked"


I have a dataset that includes data structured like this:

ID       | Treatment=Induction Chemo | Treatment=Hypomethylating Chemo | Treatment=Consolidation Chemo
Patient1 | Checked                   | Unchecked                       | Unchecked
Patient2 | Unchecked                 | Checked                         | Unchecked
Patient3 | Unchecked                 | Unchecked                       | Checked

How would I go about formatting this data to make it look more like this?

ID       | Treatment
Patient1 | Induction Chemo
Patient2 | Hypomethylating Chemo
Patient3 | Consolidation Chemo

I'd like to automate this in R. Is that possible? I'm not sure whether the reshape package can do this. If all else fails, I'm willing to manually edit the headers to remove "Treatment=" from each one, but I'd rather do it all automatically. Thank you!


Solution

  • You can try this. As a caveat, I am assuming that each row has at most one "Checked" value; if two columns were checked for the same patient, their names would be pasted together with no separator. If that assumption holds, this should work.

    Assuming df is your input data.frame.

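    df1 <- df
    # Per column: the treatment name where the cell is "Checked", "" elsewhere; then paste row by row.
    labels <- sapply(names(df), function(x) ifelse(df[[x]] == "Checked", gsub("Treatment=", "", x), ""))
    df1$Final_col <- do.call("paste0", data.frame(labels, stringsAsFactors = FALSE))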
    

    Logic:

    sapply loops over every column of df, and ifelse tests each cell against "Checked". Where the test is TRUE, the cell is replaced by the column name with "Treatment=" stripped by gsub; where it is FALSE, it is replaced by an empty string. The ID column never equals "Checked", so it contributes only empty strings. Finally, do.call with paste0 pastes the resulting columns together row by row, leaving a single column that holds the treatment name.
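
    The steps above can be sketched with the intermediate result kept visible. This is only an illustration: it rebuilds the same sample data used in this answer so that it runs on its own, and the names labels and final_col are introduced here for the sketch.

    ```r
    # Sample data from this answer, rebuilt so the sketch is self-contained.
    df <- data.frame(
      ID = c("Patient1", "Patient2", "Patient3"),
      `Treatment=Induction Chemo` = c("Checked", "Unchecked", "Unchecked"),
      `Treatment=Hypomethylating Chemo` = c("Unchecked", "Checked", "Unchecked"),
      `Treatment=Consolidation Chemo` = c("Unchecked", "Unchecked", "Checked"),
      check.names = FALSE, stringsAsFactors = FALSE
    )

    # Step 1: for every column, keep the cleaned column name where the cell
    # is "Checked" and an empty string everywhere else.
    labels <- sapply(names(df), function(x) {
      ifelse(df[[x]] == "Checked", gsub("Treatment=", "", x), "")
    })
    # labels is a 3 x 4 character matrix (one column per input column).
    # Its first row holds "", "Induction Chemo", "", "".

    # Step 2: paste the four columns together row by row, with no separator.
    final_col <- do.call(paste0, data.frame(labels, stringsAsFactors = FALSE))
    print(final_col)
    ```

    Because every cell except the checked one is an empty string, pasting without a separator leaves just the treatment name for each patient.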

    Data:

    df <- structure(list(ID = c("Patient1", "Patient2", "Patient3"), `Treatment=Induction Chemo` = c("Checked", 
    "Unchecked", "Unchecked"), `Treatment=Hypomethylating Chemo` = c("Unchecked", 
    "Checked", "Unchecked"), `Treatment=Consolidation Chemo` = c("Unchecked", 
    "Unchecked", "Checked")), .Names = c("ID", "Treatment=Induction Chemo", 
    "Treatment=Hypomethylating Chemo", "Treatment=Consolidation Chemo"
    ), class = "data.frame", row.names = c(NA, -3L))
    

    Output:

    You can check Final_col in the output below. You may drop the other columns; I have kept them so that you can compare input and output.

    > df1
            ID Treatment=Induction Chemo Treatment=Hypomethylating Chemo
    1 Patient1                   Checked                       Unchecked
    2 Patient2                 Unchecked                         Checked
    3 Patient3                 Unchecked                       Unchecked
      Treatment=Consolidation Chemo             Final_col
    1                     Unchecked       Induction Chemo
    2                     Unchecked Hypomethylating Chemo
    3                       Checked   Consolidation Chemo
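
    If a patient might have more than one treatment checked, pasting with no separator would run the names together. The following is a minimal base R sketch of one way around that, and it also returns the two-column ID/Treatment layout asked for in the question. The function name collapse_checked and its prefix and sep arguments are hypothetical, chosen only for this example, and it assumes the treatment cells contain no missing values.

    ```r
    # Sample data from this answer, rebuilt so the sketch is self-contained.
    df <- data.frame(
      ID = c("Patient1", "Patient2", "Patient3"),
      `Treatment=Induction Chemo` = c("Checked", "Unchecked", "Unchecked"),
      `Treatment=Hypomethylating Chemo` = c("Unchecked", "Checked", "Unchecked"),
      `Treatment=Consolidation Chemo` = c("Unchecked", "Unchecked", "Checked"),
      check.names = FALSE, stringsAsFactors = FALSE
    )

    # Hypothetical helper: collapse all "Checked" columns of a row into one string.
    collapse_checked <- function(df, prefix = "Treatment=", sep = "; ") {
      # Columns whose header starts with the prefix, and the names without it.
      treat_cols <- names(df)[startsWith(names(df), prefix)]
      treat_names <- substring(treat_cols, nchar(prefix) + 1)
      # Logical matrix: TRUE where a cell is "Checked".
      checked <- as.matrix(df[treat_cols]) == "Checked"
      # For each row, join the names of its checked columns with the separator.
      treatment <- vapply(
        seq_len(nrow(df)),
        function(i) paste(treat_names[checked[i, ]], collapse = sep),
        character(1)
      )
      data.frame(ID = df$ID, Treatment = treatment, stringsAsFactors = FALSE)
    }

    result <- collapse_checked(df)
    print(result)
    ```

    A row with no checked column ends up with an empty string in Treatment, and a row with several checked columns gets all of their names joined by sep.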