I have a data frame (sample below), as follows:
df = structure(list(Stage1yBefore = c("3.1", "1", "4", "2", "NA"),
Stage2yBefore = c("NA", "2", "3.2", "2", "NA"), ClinicalActivity1yBefore =
c(TRUE,
TRUE, TRUE, TRUE, FALSE), ClinicalActivity2yBefore = c(FALSE,
TRUE, TRUE, TRUE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
I would like to convert it to a long format using dplyr, but for some reason get an error.
The output should look like (converting the first row of df):
Output = data_frame(TimeFrame = c("1y", "2y"), Stage = c(3, NA), Clinical =
c(T, F))
So that each row of df becomes 2 rows in the output.
What I tried doesnt work (and I'm actually not sure exactly how to do this):
Output = gather(df, TimeFrame, Stage, Clinical, Stage1yBefore:ClinicalActivity2yBefore)
I get:
Error in .f(.x[[i]],...): Object 'Clinical' not found.
Any ideas?
library(dplyr)
library(stringr)
library(tidyr)
df %>% rownames_to_column() %>%
gather(TimeFrame, Stage, Stage1yBefore:ClinicalActivity2yBefore) %>%
#From TimeFrame extract a digit followed by y, also Stage or Clinical
mutate(Time=str_extract(TimeFrame,'\\dy'), Key=str_extract(TimeFrame,'Stage|Clinical')) %>%
dplyr::select(-TimeFrame) %>%
spread(Key,Stage)
# A tibble: 10 x 4
rowname Time Clinical Stage
<chr> <chr> <chr> <chr>
1 1 1y TRUE 3.1
2 1 2y FALSE NA
3 2 1y TRUE 1
4 2 2y TRUE 2
5 3 1y TRUE 4
6 3 2y TRUE 3.2
7 4 1y TRUE 2
8 4 2y TRUE 2
9 5 1y FALSE NA
10 5 2y FALSE NA