[SOLVED] Finding maximum difference between columns of same name in R

Finding maximum difference between columns of same name in R

I have the following table in R. I have 2 A columns, 3 B columns and 1 C column. I need to calculate the maximum difference possible between any columns of the same name and return the column name as output.

For row 1

The max difference between A is 2
The max difference between B is 4
I need the output as B

For row 2

The max difference between A is 3
The max difference between B is 2
I need the output as A

| A  | A | B | B | B | C |
| 2  | 4 |5  |2  |1  |0  |
| -3 |0  |2  |3  |4  |2  |

Solution

First of all, it's a bit dangerous (and not allowed in some cases) to have non-unique column names, so the first thing I did was to uniqueify the names using base::make.unique(). From there, I used tidyr::pivot_longer() so that the grouping information contained in the column names could be accessed more easily. Here I use a regex inside names_pattern to discard the differentiating parts of the column names so they will be the same again. Then we use dplyr::group_by() followed by dplyr::summarize() to get the largest difference in each id and grp which corresponds to your rows and similar columns in the original data. Finally we use dplyr::slice_max() to return only the largest difference per group.

library(tidyverse)

d <- structure(list(A = c(2L, -3L), A = c(4L, 0L), B = c(5L, 2L), B = 2:3, B = c(1L, 4L), C = c(0L, 2L)), row.names = c(NA, -2L), class = "data.frame")

# give unique names
names(d) <- make.unique(names(d), sep = "_")

d %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id, names_to = "grp", names_pattern = "([A-Z])*") %>% 
  group_by(id, grp) %>% 
  summarise(max_diff = max(value) - min(value)) %>% 
  slice_max(order_by = max_diff, n = 1, with_ties = F)

#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 2 x 3
#> # Groups:   id [2]
#>      id grp   max_diff
#>   <int> <chr>    <int>
#> 1     1 B            4
#> 2     2 A            3

^{Created on 2022-02-14 by the reprex package (v2.0.1)}