I am the beginner of R programming. I have to group the dataframe in R by the condition such as: it's all odd number of column names need to grouped based on it's corresponding values present in the even number of columns. I have attached the example images.
I have used the following R script which I got from chatGPT, however it is not giving accurate result. Though it groups the values correctly, it returns more number of rows and the order of columns also changing in the output result.
library(dplyr)
library(purrr)
library(tidyr)
group_odd_by_even <- function(df) {
# Get the column names
col_names <- colnames(df)
# Identify odd and even indexed columns
odd_cols <- col_names[seq(1, length(col_names), by = 2)]
even_cols <- col_names[seq(2, length(col_names), by = 2)]
# Initialize an empty list to store grouped data
grouped_list <- map2(odd_cols, even_cols, ~ {
# Group odd columns by their respective even columns
df %>%
select(!!sym(.x), !!sym(.y)) %>%
group_by(!!sym(.y)) %>%
summarise(!!sym(.x) := paste(!!sym(.x), collapse = ", ")) %>%
rename(!!sym(.y) := !!sym(.y))
})
# Reduce all grouped dataframes into a single dataframe by joining on even columns
grouped_df <- reduce(grouped_list, ~full_join(.x, .y, by = intersect(colnames(.x), colnames(.y))))
return(grouped_df)
}
The example input data,
structure(list(col1 = c("A", "B", "C", "D"), col2 = c(1, 2, 3,
4), col3 = c("A", "B", "C", "D"), col4 = c(2, 3, 2, 3), col5 = c("A",
"B", "C", "D"), col6 = c(1, 2, 3, 1), col7 = c("A", "B", "C",
"D"), col8 = c(1, 1, 1, 1), col9 = c("A", "B", "C", "D"), col10 = c(1,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
Please help me to write the generalized function for that. Thank you in advance.
I'd be curious to see any approaches that are simpler and shorter. Here's an approach using reshaping.
First I add row numbers for tracking, and reshape longer. From here, we can add variables to track which pair of columns we're in, and which type of column (value or group) we're in.
Then we can reshape wide again so each row has a grouping column and a value column. We can use summarize to concatenate the values within each group in each column pair.
And finally reshape wide again.
library(tidyverse)
df |>
mutate(row = row_number()) |>
pivot_longer(-row, values_transform = as.character) |>
mutate(pair_num = (row_number() + 1) %/% 2,
type = if_else(row_number() %% 2 == 1, "val", "grp"), .by = row) |>
select(-name) |>
pivot_wider(names_from = type, values_from = value) |>
summarize(vals = paste0(val, collapse = ", "),
.by = c(pair_num, grp)) |>
mutate(row = row_number(), .by = pair_num) |>
pivot_wider(names_from = pair_num, values_from = c(vals, grp), names_vary = "slowest") |>
select(-row) |>
`colnames<-`(colnames(df)) # apply original column names
Result (on provided input data, which varies from pictures in OP)
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A 1 A, C 2 A, D 1 A, B, C, D 1 A, B, C, D 1
2 B 2 B, D 3 B 2 NA NA NA NA
3 C 3 NA NA C 3 NA NA NA NA
4 D 4 NA NA NA NA NA NA NA NA