I'm a beginner with R and the tidyverse. I'm looking for a fast solution of the form
mutate(across(c(col1, col2, ...),
~ create factors with levels sorted in alphabetical order))
The base function factor() doesn't seem to work, for a reason I haven't found yet (maybe because it works row-wise?). I know there's the fct() function too, but I still haven't found the right syntax. I then want to move the level "Nc" to the end using fct_relevel(.x, "Nc", after = Inf) (amazingly, that part worked).
Let's say this is what my df looks like:
| Row | Col1 | Col2 |
| --- | ------------------------ | -------------------- |
| 1 | PROFITEROLE | STRAWBERRY SHORTCAKE |
| 2 | APPLE TART | CHURRO |
| 3 | BLACK FOREST CAKE | CREAM PUFF |
| 4 | TATIN PIE 2 | ECLAIR |
| 5 | CHOCOLATE + VANILLA CAKE | MILLEFEUILLE |
| 6 | CROISSANT | Nc |
| 7 | Nc | LEMON MERINGUE PIE |
| 8 | OPERA CAKE | RUM BABA |
| 9 | KARDINAL SLICE | CANNOLI |
| 10 | MADELEINE | PARIS-BREST |
The code to create it is:
df <- structure(
list(
Col1 = c(
"PROFITEROLE",
"APPLE TART",
"BLACK FOREST CAKE",
"TATIN PIE 2",
"CHOCOLATE + VANILLA CAKE",
"CROISSANT",
"Nc",
"KARDINAL SLICE",
"MADELEINE",
"OPERA CAKE"
),
Col2 = c(
"STRAWBERRY SHORTCAKE",
"CHURRO",
"CREAM PUFF",
"ÉCLAIR",
"MILLEFEUILLE",
"Nc",
"LEMON MERINGUE PIE",
"RUM BABA",
"CANNOLI",
"PARIS-BREST"
)
),
class = "data.frame",
row.names = c(NA, -10L)
)
Initially:
levels(df$Col1) returns NULL
levels(df$Col2) returns NULL
What I'd like to get is:
# For Col1
levels(df$Col1)
returns
c(
"APPLE TART",
"BLACK FOREST CAKE",
"CHOCOLATE + VANILLA CAKE",
"CROISSANT",
"KARDINAL SLICE",
"MADELEINE",
"OPERA CAKE",
"PROFITEROLE",
"TATIN PIE 2",
"Nc"
)
# For Col2
levels(df$Col2)
returns
c(
"CANNOLI",
"CHURRO",
"CREAM PUFF",
"ECLAIR",
"LEMON MERINGUE PIE",
"MILLEFEUILLE",
"PARIS-BREST",
"RUM BABA",
"STRAWBERRY SHORTCAKE",
"Nc"
)
...
I tried:
df <- df %>%
mutate(
across(
c(
Column_A,
Column_B
),
~ fct(.x, levels = stri_sort(unique(as.character(.x))))
)
) %>%
mutate(
across(
c(
Column_A,
Column_B
),
~ fct_relevel(.x, "Nc", after = Inf)
)
)
I also tried it without the second pipe. Neither version raised an error, but in both cases the levels stayed in order of appearance rather than alphabetical order, except for the "Nc" level, which moved to the last position as intended. What is your daily practice? What would you suggest?
P.S. When I do the following for each column:
lvls_col <- df$col %>%
as.character() %>%
unique() %>%
stri_sort(locale = "fr_FR")
df <- df %>%
mutate(col = fct(col, levels = lvls_col))
df <- df %>%
mutate(col = fct_relevel(col, "Nc", after = Inf))
That works just fine, but it's long. (Yes, I'm French, but there are no accents in my df, and each column uses consistent capitalization.)
A first experiment showed that considering only one letter isn't enough.
You could use fct_reorder() here to specify that the level order should match the values themselves, sorted. The mutate step here turns each variable into a factor, sorts its levels by the original values from before it was a factor (i.e. alphabetically), and then puts Nc last.
library(dplyr); library(forcats)
df2 <- df |>
mutate(across(Col1:Col2, ~fct(.x) |>
fct_reorder(.x) |>
fct_relevel("Nc", after = Inf)))
levels(df2$Col1)
# [1] "APPLE TART" "BLACK FOREST CAKE" "CHOCOLATE + VANILLA CAKE"
# [4] "CROISSANT" "KARDINAL SLICE" "MADELEINE"
# [7] "OPERA CAKE" "PROFITEROLE" "TATIN PIE 2"
#[10] "Nc"
levels(df2$Col2)
# [1] "CANNOLI" "CHURRO" "CREAM PUFF"
# [4] "ÉCLAIR" "LEMON MERINGUE PIE" "MILLEFEUILLE"
# [7] "PARIS-BREST" "RUM BABA" "STRAWBERRY SHORTCAKE"
#[10] "Nc"
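As a possible alternative, here is a minimal base-R sketch of the same idea: sort the unique values, move one level to the end, and build the factor from that. The helper name sorted_factor_last and the shortened example data are made up for illustration; they are not part of forcats or of the original question. Note that sort() follows the current locale, so accented values may be ordered differently from one machine to another.

```r
# Hypothetical helper (not part of forcats): build a factor whose levels are
# sorted alphabetically, with one chosen level moved to the end.
sorted_factor_last <- function(x, last = "Nc") {
  x <- as.character(x)
  lv <- sort(unique(x))  # alphabetical, locale-dependent; NA is dropped
  # Keep every other level in sorted order, then append `last` if it is present
  lv <- c(setdiff(lv, last), intersect(lv, last))
  factor(x, levels = lv)
}

# Shortened example data, assumed for illustration only
df <- data.frame(
  Col1 = c("PROFITEROLE", "APPLE TART", "Nc", "CROISSANT"),
  Col2 = c("CHURRO", "Nc", "RUM BABA", "CANNOLI")
)

# Apply the helper to the selected columns
cols <- c("Col1", "Col2")
df[cols] <- lapply(df[cols], sorted_factor_last)

levels(df$Col1)
# [1] "APPLE TART"  "CROISSANT"   "PROFITEROLE" "Nc"
levels(df$Col2)
# [1] "CANNOLI"  "CHURRO"   "RUM BABA" "Nc"
```

If you prefer to stay in a dplyr pipeline, the same helper should drop into mutate(across(c(Col1, Col2), sorted_factor_last)). Unlike the fct_reorder() approach, it does not rely on a summary function, so it should behave the same when values are repeated.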