I have this data frame,
set.seed(124)
id <- rnorm(5, mean = 100, sd = 59)
charVar <- c("Eeny (2), meeny (10), miny (21), moe (1)",
"Catch (112), a (2), tiger (33), by (44), the (2), toe (24)",
NA,
"If (2), he (33), hollers (15), let (66), him (1), go (55)",
"Eeny (224), meeny (44), miny (50), moe (76)")
df <- data.frame(id, charVar)
df
> df
id charVar
1 18.28083 Eeny (2), meeny (10), miny (21), moe (1)
2 102.26107 Catch (112), a (2), tiger (33), by (44), the (2), toe (24)
3 54.98122 <NA>
4 112.52606 If (2), he (33), hollers (15), let (66), him (1), go (55)
5 184.10674 Eeny (224), meeny (44), miny (50), moe (76)
I want to sort every element in the rows by the numbers with it. The expected output should look like this:
> df
id charVar
1 18.28083 miny (21), meeny (10), Eeny (2), moe (1)
2 102.26107 Catch (112), by (44), tiger (33), toe (24), a (2), the (2)
3 54.98122 <NA>
4 112.52606 let (66), go (55), he (33), hollers (15), If (2), him (1)
5 184.10674 Eeny (224), moe (76), miny (50), meeny (44)
Any idea how to achive the expected result? Any help would be greatly appreciated.
As you've tagged tidyverse and data.table here are approaches using both.
tidyverse
approachEssentially we strsplit()
charVar
into a list-column where each element is a character vector, tidyr::unnest()
into long form, extract the numbers, then dplyr::summarise()
back into one row per id, where we paste()
back together the values in decreasing order()
:
library(dplyr)
df |>
mutate(charVar = strsplit(charVar, ", ")) |>
tidyr::unnest(charVar) |>
mutate(n = as.integer(gsub("\\D+", "", charVar))) |>
summarise(
charVar = paste(charVar[order(-n)], collapse = ", "),
.by = id
)
# id charVar
# 1 18.28083 miny (21), meeny (10), Eeny (2), moe (1)
# 2 102.26107 Catch (112), by (44), tiger (33), toe (24), a (2), the (2)
# 3 54.98122 NA
# 4 112.52606 let (66), go (55), he (33), hollers (15), If (2), him (1)
# 5 184.10674 Eeny (224), moe (76), miny (50), meeny (44)
data.table
approachThere is no equivalent of tidyr::unnest()
. While the same results can be achieved with unlisting, here's an approach which feels more idiomatic, which modifies charVar
in place:
library(data.table)
setDT(df)
df[, charVar := lapply(charVar, \(x) {
parts <- unlist(strsplit(x, ", "))
n <- as.integer(gsub("\\D+", "", parts))
paste(parts[order(-n)], collapse = ", ")
})]
# id charVar
# <num> <list>
# 1: 18.28083 miny (21), meeny (10), Eeny (2), moe (1)
# 2: 102.26107 Catch (112), by (44), tiger (33), toe (24), a (2), the (2)
# 3: 54.98122 NA
# 4: 112.52606 let (66), go (55), he (33), hollers (15), If (2), him (1)
# 5: 184.10674 Eeny (224), moe (76), miny (50), meeny (44)