I have a dataset which includes data structured similar to this:
ID | Treatment=Induction Chemo | Treatment=Hypomethylating Chemo | Treatment=Consolidation Chemo
Patient1 | Checked | Unchecked | Unchecked
Patient2 | Unchecked | Checked | Unchecked
Patient3 | Unchecked | Unchecked | Checked
How would I go about formatting this data to make it look more like this?
ID | Treatment
Patient1 | Induction Chemo
Patient2 | Hypomethylating Chemo
Patient3 | Consolidation Chemo
I'd like to automate this using R. Is that possible? I'm not sure whether the reshape package can do this. If all else fails, I'm willing to manually edit the headers to remove "Treatment=" from each one, but I'd rather do it all automatically. Thank you!
You can try the following. As a caveat, I am assuming that each patient (row) has exactly one "Checked" value across the treatment columns; if a row had several, their names would be pasted together into one string. If that assumption holds, this should work.
Assuming df is your input data.frame.
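df1 <- df
# Keep the cleaned column name where a cell is "Checked", "" elsewhere, then paste row-wise
df1$Final_col <- do.call("paste0", data.frame(sapply(names(df), function(x) ifelse(df[, x] == "Checked", gsub("Treatment=", "", x, fixed = TRUE), "")), stringsAsFactors = FALSE))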
Logic:
sapply loops over all the columns of df, and ifelse tests each cell against "Checked". Where the test is TRUE, the cell is replaced by the column name with "Treatment=" stripped out by gsub, so only the text after "Treatment=" remains; where it is FALSE, the cell becomes an empty string. Finally, do.call with paste0 pastes the resulting columns together row by row, which leaves a single column holding the checked treatment.
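To make the steps above concrete, here is the same one-liner unrolled into intermediate objects. This is only a sketch of the logic just described: the names picked and final_col are made up for illustration, and the small df is rebuilt inline so the snippet runs on its own.

```r
# Example input with the same shape as the data in the question
df <- data.frame(
  ID = c("Patient1", "Patient2", "Patient3"),
  `Treatment=Induction Chemo` = c("Checked", "Unchecked", "Unchecked"),
  `Treatment=Hypomethylating Chemo` = c("Unchecked", "Checked", "Unchecked"),
  `Treatment=Consolidation Chemo` = c("Unchecked", "Unchecked", "Checked"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Step 1: for every column, keep the cleaned column name where the cell is
# "Checked" and an empty string elsewhere (the ID column yields only "")
picked <- sapply(names(df), function(x) {
  ifelse(df[, x] == "Checked", gsub("Treatment=", "", x, fixed = TRUE), "")
})

# Step 2: picked is a character matrix, one column per original column
print(picked)

# Step 3: paste the columns together row by row; the empty strings vanish
final_col <- do.call("paste0", data.frame(picked, stringsAsFactors = FALSE))
print(final_col)
# [1] "Induction Chemo"       "Hypomethylating Chemo" "Consolidation Chemo"
```

Printing picked before pasting is a quick way to spot rows where more than one cell is non-empty, which is exactly the case the caveat warns about.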
Data:
df <- structure(list(ID = c("Patient1", "Patient2", "Patient3"), `Treatment=Induction Chemo` = c("Checked",
"Unchecked", "Unchecked"), `Treatment=Hypomethylating Chemo` = c("Unchecked",
"Checked", "Unchecked"), `Treatment=Consolidation Chemo` = c("Unchecked",
"Unchecked", "Checked")), .Names = c("ID", "Treatment=Induction Chemo",
"Treatment=Hypomethylating Chemo", "Treatment=Consolidation Chemo"
), class = "data.frame", row.names = c(NA, -3L))
Output:
The result is in the Final_col column of the output below. You may drop the other columns; I have kept them so that you can compare input and output.
> df1
ID Treatment=Induction Chemo Treatment=Hypomethylating Chemo
1 Patient1 Checked Unchecked
2 Patient2 Unchecked Checked
3 Patient3 Unchecked Unchecked
Treatment=Consolidation Chemo Final_col
1 Unchecked Induction Chemo
2 Unchecked Hypomethylating Chemo
3 Checked Consolidation Chemo
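If you only want the two columns shown in the question, or if some rows might have more than one box checked, a variant along these lines may help. It is a sketch rather than part of the approach above: the "; " separator and the names trt_cols, labels, checked and result are my own choices, and Patient4 is a hypothetical row added only to show the multi-checked case.

```r
# Example input; Patient4 is hypothetical and has two boxes checked
df <- data.frame(
  ID = c("Patient1", "Patient2", "Patient3", "Patient4"),
  `Treatment=Induction Chemo` = c("Checked", "Unchecked", "Unchecked", "Checked"),
  `Treatment=Hypomethylating Chemo` = c("Unchecked", "Checked", "Unchecked", "Unchecked"),
  `Treatment=Consolidation Chemo` = c("Unchecked", "Unchecked", "Checked", "Checked"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Pick out the treatment columns and strip the "Treatment=" prefix
trt_cols <- grep("^Treatment=", names(df), value = TRUE)
labels <- sub("^Treatment=", "", trt_cols)

# Logical matrix: TRUE where a treatment box is checked
checked <- df[, trt_cols, drop = FALSE] == "Checked"

# For each row, join the labels of all checked boxes; rows with none give ""
treatment <- unname(apply(checked, 1, function(row) {
  paste(labels[row], collapse = "; ")
}))

# Keep only the ID and the collapsed treatment column
result <- data.frame(ID = df$ID, Treatment = treatment, stringsAsFactors = FALSE)
print(result)
```

With this input, Patient4 comes out as "Induction Chemo; Consolidation Chemo" instead of the two names being run together.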