In my DATA below, how can I table() the variable lang_comb while ignoring the order of the languages? For example, "english spanish french" and "french english spanish" should be counted as the same combination. Is there a way to tabulate lang_comb this way in R?
library(tidyverse)
DATA <- read.table(header=T, text="
ID lang1 lang2 lang3
1 spanish english NA
2 english spanish french
3 russian english NA
4 french english spanish
5 english russian NA
6 english french NA")
combs <- DATA %>%
  mutate(lang_comb = paste(lang1, lang2, lang3, sep=","))
with(combs, table(lang_comb))
english,french,NA      english,russian,NA     english,spanish,french french,english,spanish russian,english,NA
1                      1                      1                      1                      1
spanish,english,NA
1
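If you sort the languages within each row before collapsing them into one string, the original order no longer matters, because every combination ends up in the same alphabetical order. sort() also drops the NA values by default, so they do not appear in the labels: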
library(dplyr)
library(stringr)
DATA |>
  rowwise() |>
  mutate(lang = str_flatten_comma(sort(c_across(starts_with("lang"))))) |>
  ungroup() |>
  count(lang)
#   lang                         n
#   <chr>                    <int>
# 1 english, french              1
# 2 english, french, spanish     2
# 3 english, russian             2
# 4 english, spanish             1
Or, if you want to use table(), you can pipe the output of the mutate() step to pull(lang) |> table().
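Spelled out, the table() variant might look like the sketch below. It assumes R 4.1 or later (for the native pipe) and stringr 1.5.0 or later (for str_flatten_comma()). The names tab and tab_base are only illustrative, and the base R line at the end is an extra alternative, not part of the approach above.

```r
library(dplyr)
library(stringr)

# Same data as in the question
DATA <- read.table(header = TRUE, text = "
ID lang1 lang2 lang3
1 spanish english NA
2 english spanish french
3 russian english NA
4 french english spanish
5 english russian NA
6 english french NA")

# dplyr version: sort within each row, collapse, then tabulate with table()
tab <- DATA |>
  rowwise() |>
  # sort() drops NA by default, so missing languages vanish from the label
  mutate(lang = str_flatten_comma(sort(c_across(starts_with("lang"))))) |>
  ungroup() |>
  pull(lang) |>
  table()

tab

# Base R alternative (an assumption-free sketch of the same idea):
# apply() walks over the rows of the language columns, toString() joins with ", "
tab_base <- table(apply(DATA[-1], 1, function(x) toString(sort(x))))

tab_base
```

Both objects are ordinary one-dimensional tables, so a single count can be looked up by label, for example tab["english, russian"].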