R: Frequency table for a multiple-choice question that allows more than one response


I am analyzing survey data in which respondents could choose more than one county when asked where their organization is located. I am trying to create a frequency table that counts every time a county is chosen, regardless of whether the respondent chose one county or several.

Example of data:

df <- data.frame(org = c("org_1", "org_2", "org 3", "org 4"),
             county = c("A, B", "A, D", "B, C", "B"))

Here is the output I would like:

output <- data.frame(county = c("A", "B", "C", "D"),
                 frequency = c(2, 3, 1, 1))

I've tried standard frequency-table functions such as table(df$county), but this counts "A, B", "A, D", and "B, C" as distinct values rather than counting "A", "B", "C", and "D" individually.


Solution

  • Use separate_rows() from tidyr to split the column into one row per county, then get the frequencies with count() from dplyr

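    library(tidyr)   # separate_rows()
    library(dplyr)   # count() and %>%
    df %>%
      # one row per county; by default this splits on any run of non-alphanumeric characters
      separate_rows(county) %>%
      # count how often each county occurs
      count(county, name = 'frequency')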
    

    Output:

    # A tibble: 4 × 2
      county frequency
      <chr>      <int>
    1 A              2
    2 B              3
    3 C              1
    4 D              1
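
If adding tidyr and dplyr is not an option, the same counts can probably be obtained in base R. The following is only a sketch of that alternative, not part of the answer above. It assumes that responses are always separated by a comma followed by optional whitespace, which also keeps multi-word county names (for example "Los Angeles") in one piece. The names `counties` and `freq` are made up for illustration.

```r
# Same example data as in the question
df <- data.frame(org = c("org_1", "org_2", "org 3", "org 4"),
                 county = c("A, B", "A, D", "B, C", "B"))

# Split each response on a comma followed by optional whitespace,
# then flatten the resulting list into a single character vector.
# as.character() guards against the column being a factor (the default in R < 4.0).
counties <- unlist(strsplit(as.character(df$county), ",\\s*"))

# Tabulate the individual counties and convert the table to a data frame
# with the column names used in the desired output
freq <- as.data.frame(table(county = counties),
                      responseName = "frequency",
                      stringsAsFactors = FALSE)
freq
```

For the example data this should give the same counts as the desired output in the question. With separate_rows(), an explicit separator can be passed in the same way through its `sep` argument, e.g. separate_rows(county, sep = ",\\s*").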