
Relative frequencies of multiple variables in R


Apologies for the very basic question, but I am trying to calculate relative frequencies (%) of the values within one column, grouped by another column. Example data:

df <- data.frame(Case = c("A","B","B","B","A","B","A","A","B","B","B"),
                 Status = c("Y","Y","Y","N","Y","ND","ND","N","Y","N","ND"))

I know how to calculate the absolute frequencies, so that the data will look like this

  Case N ND Y
1    A 1  1 2
2    B 2  2 3

This I can achieve, for instance with

library(reshape2)  # assumption: dcast() here comes from reshape2

newdf <- dcast(df, Case ~ Status,
               value.var = "Status", fun.aggregate = length)

But how can I calculate the percentages of cases A and B for each of the columns separately? Desired output would be something like:

  Case    N   ND  Y
1    A 33.3 33.3 40
2    B 66.6 66.6 60

Note that each column has a different number of observations. Afterwards, I will use the output to plot the frequencies of cases A and B for each status separately, using facets.

All the answers I could find deal only with absolute counts, so any help is very much appreciated!


Solution

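  • 1) Base R: table computes the counts and proportions converts them to proportions. The second argument, 2, normalises within each column (each Status), so every column sums to 100. proportions is available in recent versions of R; prop.table is the older equivalent.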

    round(100 * proportions(table(df), 2) , 1)
    ##     Status
    ## Case    N   ND    Y
    ##    A 33.3 33.3 40.0
    ##    B 66.7 66.7 60.0
    
    
    

    2) crosstable: Try the crosstable package for a different layout, with one row per Status and the count and row percentage for each Case.

    library(crosstable)
    
    crosstable(df, by = "Case")
    ## # A tibble: 3 × 5
    ##   .id    label  variable A          B         
    ##   <chr>  <chr>  <chr>    <chr>      <chr>     
    ## 1 Status Status N        1 (33.33%) 2 (66.67%)
    ## 2 Status Status ND       1 (33.33%) 2 (66.67%)
    ## 3 Status Status Y        2 (40.00%) 3 (60.00%)
    

    3) gmodels: This package has the CrossTable function. The output is somewhat large, so I have omitted it.

    library(gmodels)
    
    with(df, CrossTable(Case, Status, prop.r = FALSE, prop.t = FALSE, prop.chisq = FALSE))
    

    4) descr: This package features a CrossTable function based on the one in gmodels. It also has a plot method that produces a mosaic plot from an object of class "CrossTable".

    library(descr)
    
    tab <- with(df, CrossTable(Case, Status, prop.r = FALSE, prop.t = FALSE, prop.chisq = FALSE))
    tab
    ##    Cell Contents 
    ## |-------------------------|
    ## |                       N | 
    ## |           N / Col Total | 
    ## |-------------------------|
    ##
    ## ======================================
    ##          Status
    ## Case         N      ND       Y   Total
    ## --------------------------------------
    ## A            1       1       2       4
    ##          0.333   0.333   0.400        
    ## --------------------------------------
    ## B            2       2       3       7
    ##          0.667   0.667   0.600        
    ## --------------------------------------
    ## Total        3       3       5      11
    ##          0.273   0.273   0.455        
    ## ======================================
    
    plot(tab, color = 2:3)
    

    (Figure: mosaic plot of Case by Status produced by plot(tab).)
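
The question mentions plotting the percentages for each status with facets afterwards. The following is a minimal sketch of that step, not part of the original answer: it reshapes the column-percentage table from option 1 into a long data frame with base R and then, if ggplot2 happens to be installed, draws one panel per Status. The use of ggplot2, the object names pct, long and p, and the column name Percent are assumptions made for illustration.

```r
# Example data from the question
df <- data.frame(
  Case   = c("A","B","B","B","A","B","A","A","B","B","B"),
  Status = c("Y","Y","Y","N","Y","ND","ND","N","Y","N","ND")
)

# Column percentages: margin = 2 normalises within each Status.
# prop.table() is used here as the older equivalent of proportions().
pct <- round(100 * prop.table(table(df), margin = 2), 1)

# as.data.frame() on a table returns one row per Case/Status combination;
# responseName sets the name of the value column (hypothetical name "Percent").
long <- as.data.frame(pct, responseName = "Percent")
print(long)

# Optional plotting step. ggplot2 is an assumption; it is skipped if not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = Case, y = Percent)) +
    ggplot2::geom_col() +
    ggplot2::facet_wrap(~ Status)  # one panel per Status
  print(p)
}
```

The long layout has one row per Case and Status pair, which is the shape that faceting functions generally expect; the wide table from option 1 would need this reshaping first.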