I have data frame called sample_df
as below:
sample_df <- data.frame(
date = c("2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27", "2023-02-27"),
weight = c(121.56, 459.25, 114.02, 459.25, 298.0, 56.78, 446.85, 215.05, 459.25, 114.02, 579.84, 189.96, 84.4, 446.85, 215.05, 298.0, 56.78, 459.25),
marker = c("weight_log", "weight_log", "weight_log", "weight_log", "weight_log", "weight_log", "weight_log", "weight_log", "weight_bc", "weight_bc", "weight_bc", "weight_bc", "weight_bc", "weight_bc", "weight_bc", "weight_bc", "weight_bc", "weight_bc")
)
date weight marker
1 2023-02-27 121.56 weight_log
2 2023-02-27 459.25 weight_log
3 2023-02-27 114.02 weight_log
4 2023-02-27 459.25 weight_log
5 2023-02-27 298.00 weight_log
6 2023-02-27 56.78 weight_log
7 2023-02-27 446.85 weight_log
8 2023-02-27 215.05 weight_log
9 2023-02-27 459.25 weight_bc
10 2023-02-27 114.02 weight_bc
11 2023-02-27 579.84 weight_bc
12 2023-02-27 189.96 weight_bc
13 2023-02-27 84.40 weight_bc
14 2023-02-27 446.85 weight_bc
15 2023-02-27 215.05 weight_bc
16 2023-02-27 298.00 weight_bc
17 2023-02-27 56.78 weight_bc
18 2023-02-27 459.25 weight_bc
I want to convert sample_df to become result_df
as below:
date weight_log weight_bc
1 2023-02-27 459.25 459.25
2 2023-02-27 459.25 459.25
3 2023-02-27 121.56 NA
4 2023-02-27 114.02 114.02
5 2023-02-27 298.00 298.00
6 2023-02-27 56.78 56.78
7 2023-02-27 446.85 446.85
8 2023-02-27 215.05 215.05
9 2023-02-27 NA 579.84
10 2023-02-27 NA 189.96
11 2023-02-27 NA 84.40
I've tried to use reshape2::dcast
, but the result is not like what I expected.
sample_df %>% reshape2::dcast(date + weight ~ marker,value.var = "weight") %>% select(-weight)
Aggregation function missing: defaulting to length
date weight_bc weight_log
1 2023-02-27 1 1
2 2023-02-27 1 0
3 2023-02-27 1 1
4 2023-02-27 0 1
5 2023-02-27 1 0
6 2023-02-27 1 1
7 2023-02-27 1 1
8 2023-02-27 1 1
9 2023-02-27 2 2
10 2023-02-27 1 0
I guess it's because there's duplicate weight value.
Is there another way to convert sample_df
to become result_df
?
One way is to first differentiate the duplicated weight values per marker and then add a group id that groups the similar weights
library(dplyr)
library(tidyr)
sample_df %>%
mutate(rn = row_number(), .by = c(marker, weight)) %>%
mutate(grp = cur_group_id(), .by = weight) %>%
pivot_wider(names_from = marker, values_from = weight) %>%
select(-c(rn, grp))
# A tibble: 11 × 3
date weight_log weight_bc
<chr> <dbl> <dbl>
1 2023-02-27 122. NA
2 2023-02-27 459. 459.
3 2023-02-27 114. 114.
4 2023-02-27 459. 459.
5 2023-02-27 298 298
6 2023-02-27 56.8 56.8
7 2023-02-27 447. 447.
8 2023-02-27 215. 215.
9 2023-02-27 NA 580.
10 2023-02-27 NA 190.
11 2023-02-27 NA 84.4