I have a dataframe that looks like this:
example <- data.frame(
date = as.Date(c('2023-02-12', '2023-02-13', '2023-02-14',
'2023-02-12', '2023-02-13', '2023-02-14',
'2023-03-06', '2023-03-07', '2023-03-08',
'2023-03-07', '2023-03-08', '2023-03-09')),
PID = c(1091, 1091, 1091,
2091, 2091, 2091,
1137, 1137, 1137,
2137, 2137, 2137),
give = c('a', 'c', 'e',
'aa', 'cc', 'ee',
'aaa', 'ccc', 'eee',
'aaaa', 'cccc', 'eeee'),
receive = c('b', 'd', 'f',
'bb', 'dd', 'ff',
'bbb', 'ddd', 'fff',
'bbbb', 'dddd', 'ffff')
)
I want to regroup and relabel the columns in the dataframe such that this is the output:
example_solution <- data.frame(
date = as.Date(c('2023-02-12', '2023-02-13', '2023-02-14',
'2023-03-06', '2023-03-07', '2023-03-08', '2023-03-09')),
CID = c(91, 91, 91,
137, 137, 137, 137),
PID_A = c(1091, 1091, 1091,
1137, 1137, 1137, NA),
PID_B = c(2091, 2091, 2091,
NA, 2137, 2137, 2137),
give_A = c('a', 'c', 'e',
'aaa', 'ccc', 'eee', NA),
receive_B = c('bb', 'dd', 'ff',
NA, 'bbbb', 'dddd', 'ffff'),
give_B = c('aa', 'cc', 'ee',
NA, 'aaaa', 'cccc', 'eeee'),
receive_A = c('b', 'd', 'f',
'bbb', 'ddd', 'fff', NA)
)
I have some functional code so far that does most of the job:
example <- example %>%
group_by(CID = sub('.', '', PID), date) %>%
mutate(name = LETTERS[1:n()]) %>%
pivot_wider(values_from = PID:receive) %>%
relocate(receive_A, .after = give_B) %>%
relocate(receive_B, .after = give_A)
The only problem is that when there is a date where there is only an entry for the CID
with the PID
starting with 2 and not 1 (i.e. 2137 on date
2023-03-09 in this case), 2137 is grouped as PID_A
instead of PID_B
. ffff is also grouped under give_A
instead of receive_B
and eeee is grouped under receive_A
instead of give_B
.
Essentially, data for all PIDs that start with 2 should always be associated the column names whatevername_B
, even if there is no corresponding PID that start with 1 on the same date.
Does anyone know how I can get to the desired output?
The value of the column name
is defined using n()
, which evaluates to 1
in the case of the date 2023-03-09
since there is only one row inside of example
corresponding to this date. Hence, the suffix _A
gets assigned.
If the requirement is "data for all PIDs that start with 2
should always be associated the column names whatevername_B
" then we have to take this into account when defining name
. This can be achieved e.g. by replacing mutate(name = LETTERS[1:n()])
:
example %>%
group_by(CID = sub('.', '', PID), date) %>%
mutate(name = case_when(substr(PID, 1, 1) == "2" ~ "B",
TRUE ~ LETTERS[1:n()])) %>%
pivot_wider(values_from = PID:receive) %>%
relocate(receive_A, .after = give_B) %>%
relocate(receive_B, .after = give_A)
Then the output will look like:
# A tibble: 7 × 8
# Groups: CID, date [7]
date CID PID_A PID_B give_A receive_B give_B receive_A
<date> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 2023-02-12 091 1091 2091 a bb aa b
2 2023-02-13 091 1091 2091 c dd cc d
3 2023-02-14 091 1091 2091 e ff ee f
4 2023-03-06 137 1137 NA aaa NA NA bbb
5 2023-03-07 137 1137 2137 ccc bbbb aaaa ddd
6 2023-03-08 137 1137 2137 eee dddd cccc fff
7 2023-03-09 137 NA 2137 NA ffff eeee NA