I am learning R and trying to figure out how to split a column. I want to spread my data from a single column into wide format. I was told to use dcast, but I haven't figured out the best way, so I was going to try piping it through the tidyverse.
# sample data
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP", "third", "second", "PP", "first"))
# dataframe
  trimesterPeriod
1           first
2          second
3           third
4              PP
5           third
6          second
7              PP
8           first
and I would like it to look like this:
#dataframe
ID first second third PP
 1     1      0     0  0
 2     0      1     0  0
 3     0      0     1  0
 4     0      0     0  1
 5     0      0     1  0
 6     0      1     0  0
 7     0      0     0  1
 8     1      0     0  0
I know that I will have to change the trimesterPeriod data from a character, but from there I'm not sure where to go. I was thinking of doing:
data.frame %>%
  mutate(rn = row_number(first, second, third, PP)) %>%
  spread(trimesterPeriod) %>%
  select(-rn)
but I'm not sure. Any suggestions are greatly appreciated!
We could use table from base R
table(seq_len(nrow(data)), data$trimesterPeriod)
-output
    first PP second third
  1     1  0      0     0
  2     0  0      1     0
  3     0  0      0     1
  4     0  1      0     0
  5     0  0      0     1
  6     0  0      1     0
  7     0  1      0     0
  8     1  0      0     0
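Note that table sorts the columns of a character vector, so the order here differs from the one requested in the question. If the order matters, one option is to convert the column to a factor with explicit levels first. A minimal sketch, assuming the desired order is first, second, third, PP (the names `lvls`, `tab` and `wide` are only illustrative):

```r
# Same sample data as in the question
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP",
                                       "third", "second", "PP", "first"))

# Explicit levels control the column order of the resulting table
lvls <- c("first", "second", "third", "PP")
tab <- table(seq_len(nrow(data)),
             factor(data$trimesterPeriod, levels = lvls))

# Turn the contingency table into a plain wide data frame with an ID column
wide <- cbind(ID = seq_len(nrow(data)), as.data.frame.matrix(tab))
wide
```

Here as.data.frame.matrix keeps the wide layout, whereas a plain as.data.frame on a table would return it in long format.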
Or using tidyverse
library(dplyr)
library(tidyr)
data %>%
  mutate(ID = row_number()) %>%
  pivot_wider(names_from = trimesterPeriod,
              values_from = trimesterPeriod, values_fn = length,
              values_fill = 0)
-output
# A tibble: 8 × 5
     ID first second third    PP
  <int> <int>  <int> <int> <int>
1     1     1      0     0     0
2     2     0      1     0     0
3     3     0      0     1     0
4     4     0      0     0     1
5     5     0      0     1     0
6     6     0      1     0     0
7     7     0      0     0     1
8     8     1      0     0     0
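Since dcast was mentioned in the question, the same reshape could probably also be done with dcast from data.table. This is only a sketch under assumptions: it uses the data.table package (not reshape2), adds a row-number ID column, and the names `dt`, `lvls` and `wide` are illustrative.

```r
library(data.table)

# Same sample data as in the question, as a data.table
dt <- data.table(trimesterPeriod = c("first", "second", "third", "PP",
                                     "third", "second", "PP", "first"))

# Row number as ID, so that every original row becomes one output row
dt[, ID := .I]

# Count occurrences per ID and period; empty cells get length(character(0)), i.e. 0
wide <- dcast(dt, ID ~ trimesterPeriod,
              value.var = "trimesterPeriod", fun.aggregate = length)

# Put the columns in the order requested in the question
lvls <- c("first", "second", "third", "PP")
setcolorder(wide, c("ID", lvls))
wide
```

The explicit setcolorder call avoids relying on how the period names happen to be sorted.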
data <- structure(list(trimesterPeriod = c("first", "second", "third",
"PP", "third", "second", "PP", "first")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))