I have a dataframe that looks like this:
d = data.frame(obs = 1:4, comp = c("7_10","5","6_9","1_2_6_9"))
obs comp
1 1 7_10
2 2 5
3 3 6_9
4 4 1_2_6_9
The comp column contains information about the elements that can be found in each observation (row). So, the first observation contains the elements 7 and 10, the second, element 5 and so on. Each observation can contain one, two or four out of the possible 10 elements.
I can separate these into distinct columns:
library(tidyverse)
d %>%
separate_wider_delim(comp,
delim = "_",
names = c("e1", "e2", "e3", "e4"),
too_few = "align_start")
obs e1 e2 e3 e4
1 1 7 10 NA NA
2 2 5 NA NA NA
3 3 6 9 NA NA
4 4 1 2 6 9
However, instead of NAs, I need the information to repeat, like this:
obs e1 e2 e3 e4
1 1 7 10 7 10 # elements 7 and 10 repeat
2 2 5 5 5 5 # element 5 repeats
3 3 6 9 6 9
4 4 1 2 6 9
I know how to use pivot_longer
and fill
to fill in NAs with the preceding value:
d %>%
separate_wider_delim(comp,
delim = "_",
names = c("e1", "e2", "e3", "e4"),
too_few = "align_start") %>%
pivot_longer(2:5,
names_to = "e",
values_to = "code") %>%
fill(code, .direction = "down") %>%
pivot_wider(names_from = e,
values_from = code)
obs e1 e2 e3 e4
1 1 7 10 10 10
2 2 5 5 5 5
3 3 6 9 9 9
4 4 1 2 6 9
Is there a way to instead fill NAs with the set of preceding values, instead of just the last value?
If you know that comp sequences are either 1, 2, or 4 elements (if 1, then repeat 4 times, if 2 repeat twice, if 4 don't repeat), then you can use rep
and indicate a target length for each obs
of 4. The row_number
is added to show the data in wide format as in the example output above.
library(tidyverse)
d %>%
separate_longer_delim(comp, delim = "_") %>%
reframe(value = rep_len(comp, length.out = 4), .by = obs) %>%
mutate(name = row_number(), .by = obs) %>%
pivot_wider(id_cols = obs, names_from = name, values_from = value, names_prefix = "e")
Output
obs e1 e2 e3 e4
<int> <chr> <chr> <chr> <chr>
1 1 7 10 7 10
2 2 5 5 5 5
3 3 6 9 6 9
4 4 1 2 6 9