
Tabulate a column in a data.frame, ignoring the order of the elements within it, in R


With my DATA below, how can I table() the variable lang_comb while ignoring the order of the languages in it?

For example, "english spanish french" and "french english spanish" are to be considered the same.

Is there a way to tabulate variable lang_comb this way in R?

library(tidyverse)

DATA <- read.table(header=TRUE, text="
ID  lang1    lang2    lang3
1   spanish  english  NA
2   english  spanish  french
3   russian  english  NA
4   french   english  spanish
5   english  russian  NA
6   english  french   NA")


combs <- DATA %>% 
  mutate(lang_comb = paste(lang1, lang2, lang3, sep=","))

with(combs, table(lang_comb))


     english,french,NA     english,russian,NA english,spanish,french french,english,spanish     russian,english,NA 
                     1                      1                      1                      1                      1 
    spanish,english,NA 
                     1 

Solution

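  • If you sort the languages in each row and then collapse them into a single string, the original order no longer matters, because every combination ends up in the same (alphabetical) order. Note that sort() also drops the NA values by default: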

    library(dplyr)
    library(stringr)
    
    DATA |>
      rowwise() |>
      mutate(lang = str_flatten_comma(sort(c_across(starts_with("lang"))))) |>
      ungroup() |>
      count(lang)
    #   lang                         n
    #   <chr>                    <int>
    # 1 english, french              1
    # 2 english, french, spanish     2
    # 3 english, russian             2
    # 4 english, spanish             1
    

    Or, if you want to use table(), you can pipe the output of the mutate() step into pull(lang) |> table().
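For completeness, here is a sketch of the pull(lang) |> table() variant described above, alongside a base-R version of the same sort-then-collapse idea that needs neither dplyr nor stringr. It assumes R >= 4.1 (for the native pipe |>) and stringr >= 1.5.0 (for str_flatten_comma()); the object names tab_dplyr, langs, key and tab_base are only illustrative.

```r
library(dplyr)
library(stringr)

# Same DATA as in the question
DATA <- read.table(header = TRUE, text = "
ID  lang1    lang2    lang3
1   spanish  english  NA
2   english  spanish  french
3   russian  english  NA
4   french   english  spanish
5   english  russian  NA
6   english  french   NA")

# dplyr route: build the order-independent key per row, pull it out, tabulate
tab_dplyr <- DATA |>
  rowwise() |>
  mutate(lang = str_flatten_comma(sort(c_across(starts_with("lang"))))) |>
  ungroup() |>
  pull(lang) |>
  table()

# Base-R route: sort each row (sort() drops NA by default), collapse, tabulate
langs <- DATA[, c("lang1", "lang2", "lang3")]
key <- apply(langs, 1, function(x) paste(sort(x), collapse = ", "))
tab_base <- table(key)

tab_dplyr
tab_base
```

Both tables should hold the same counts as the count(lang) output above; the base-R version is handy when you do not want tidyverse dependencies.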