rdata.tabletidyverse

Rearrage string elements by the number attached with it in rows


I have this data frame,

set.seed(124)
id <- rnorm(5, mean = 100, sd = 59)
charVar <- c("Eeny (2), meeny (10), miny (21), moe (1)",
  "Catch (112), a (2), tiger (33), by (44), the (2), toe (24)",
  NA,
  "If (2), he (33), hollers (15), let (66), him (1), go (55)",
  "Eeny (224), meeny (44), miny (50), moe (76)")
df <- data.frame(id, charVar)
df
> df
         id                                                    charVar
1  18.28083                   Eeny (2), meeny (10), miny (21), moe (1)
2 102.26107 Catch (112), a (2), tiger (33), by (44), the (2), toe (24)
3  54.98122                                                       <NA>
4 112.52606  If (2), he (33), hollers (15), let (66), him (1), go (55)
5 184.10674                Eeny (224), meeny (44), miny (50), moe (76)

I want to sort every element in the rows by the numbers with it. The expected output should look like this:

> df
         id                                                     charVar
1  18.28083                    miny (21), meeny (10), Eeny (2), moe (1)
2 102.26107 Catch (112), by (44), tiger (33), toe (24), a (2),  the (2)
3  54.98122                                                        <NA>
4 112.52606   let (66), go (55), he (33), hollers (15), If (2), him (1)
5 184.10674                 Eeny (224), moe (76), miny (50), meeny (44)

Any idea how to achive the expected result? Any help would be greatly appreciated.


Solution

  • As you've tagged and here are approaches using both.

    tidyverse approach

    Essentially we strsplit() charVar into a list-column where each element is a character vector, tidyr::unnest() into long form, extract the numbers, then dplyr::summarise() back into one row per id, where we paste() back together the values in decreasing order():

    library(dplyr)
    df |>
        mutate(charVar = strsplit(charVar, ", ")) |>
        tidyr::unnest(charVar) |>
        mutate(n = as.integer(gsub("\\D+", "", charVar))) |>
        summarise(
            charVar = paste(charVar[order(-n)], collapse = ", "),
            .by = id
        )
    
    
    #          id                                                    charVar
    # 1  18.28083                   miny (21), meeny (10), Eeny (2), moe (1)
    # 2 102.26107 Catch (112), by (44), tiger (33), toe (24), a (2), the (2)
    # 3  54.98122                                                         NA
    # 4 112.52606  let (66), go (55), he (33), hollers (15), If (2), him (1)
    # 5 184.10674                Eeny (224), moe (76), miny (50), meeny (44)
    

    data.table approach

    There is no equivalent of tidyr::unnest(). While the same results can be achieved with unlisting, here's an approach which feels more idiomatic, which modifies charVar in place:

    library(data.table)
    setDT(df)
    
    df[, charVar := lapply(charVar, \(x) {
        parts <- unlist(strsplit(x, ", "))
        n <- as.integer(gsub("\\D+", "", parts))
        paste(parts[order(-n)], collapse = ", ")
    })]
    
    #           id                                                    charVar
    #        <num>                                                     <list>
    # 1:  18.28083                   miny (21), meeny (10), Eeny (2), moe (1)
    # 2: 102.26107 Catch (112), by (44), tiger (33), toe (24), a (2), the (2)
    # 3:  54.98122                                                         NA
    # 4: 112.52606  let (66), go (55), he (33), hollers (15), If (2), him (1)
    # 5: 184.10674                Eeny (224), moe (76), miny (50), meeny (44)