Apologies for the very basic question, but I am trying to calculate relative frequencies (%) of the values within one column, grouped by another column. Example data:
df <- data.frame(Case = c("A","B","B","B","A","B","A","A","B","B","B"),
                 Status = c("Y","Y","Y","N","Y","ND","ND","N","Y","N","ND"))
I know how to calculate the absolute frequencies, so that the data will look like this
  Case N ND Y
1    A 1  1 2
2    B 2  2 3
This I can achieve, for instance with
library(reshape2)  # dcast is assumed to come from reshape2
newdf <- dcast(df, Case ~ Status,
               value.var = "Status", fun.aggregate = length)
But how can I calculate the percentages of cases A and B for each of the columns separately? Desired output would be something like:
  Case    N   ND  Y
1    A 33.3 33.3 40
2    B 66.6 66.6 60
Note that each column has a different number of observations. Subsequently, I will use the output to plot frequencies of cases A and B for each status separately with facet.
All the answers I was able to find were dealing with counting absolute values only. So all help is very much appreciated!
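1) Base R table computes the frequencies and proportions converts them to proportions. The second argument, 2, makes each column sum to 1. (proportions was added in R 4.0.1; prop.table does the same in older versions.)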
round(100 * proportions(table(df), 2) , 1)
##     Status
## Case    N   ND    Y
##    A 33.3 33.3 40.0
##    B 66.7 66.7 60.0
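If a data frame shaped like the desired output is needed rather than a table, the result of 1) can be converted. This is only a minimal sketch: the names pct and wide are made up for illustration, and prop.table is used so that it also runs on R versions older than 4.0.1.

```r
df <- data.frame(Case = c("A","B","B","B","A","B","A","A","B","B","B"),
                 Status = c("Y","Y","Y","N","Y","ND","ND","N","Y","N","ND"))

# Column-wise percentages: margin = 2 divides each cell by its column total
pct <- round(100 * prop.table(table(df), margin = 2), 1)

# Turn the 2-d table into a wide data frame and move the row names
# (the Case levels) into an ordinary column
wide <- data.frame(Case = rownames(pct), as.data.frame.matrix(pct))
rownames(wide) <- NULL

wide

# Sanity check: the percentages in each Status column add up to 100
colSums(pct)
```

The columns of wide are Case, N, ND and Y, so it can be used wherever newdf from the question was used.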
2) crosstable Try crosstable for a different layout.
library(crosstable)
crosstable(df, by = "Case")
## # A tibble: 3 × 5
##   .id    label  variable A          B
##   <chr>  <chr>  <chr>    <chr>      <chr>
## 1 Status Status N        1 (33.33%) 2 (66.67%)
## 2 Status Status ND       1 (33.33%) 2 (66.67%)
## 3 Status Status Y        2 (40.00%) 3 (60.00%)
3) gmodels This package has the CrossTable function. The output is fairly long, so I have omitted it.
library(gmodels)
with(df, CrossTable(Case, Status, prop.r = FALSE, prop.t = FALSE, prop.chisq = FALSE))
4) descr This package features a CrossTable function based on the one in gmodels. It also has a plot method which produces a mosaic plot from an object of class "CrossTable".
library(descr)
tab <- with(df, CrossTable(Case, Status, prop.r = FALSE, prop.t = FALSE, prop.chisq = FALSE))
tab
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
##
## ======================================
##          Status
## Case         N      ND       Y   Total
## --------------------------------------
## A            1       1       2       4
##          0.333   0.333   0.400
## --------------------------------------
## B            2       2       3       7
##          0.667   0.667   0.600
## --------------------------------------
## Total        3       3       5      11
##          0.273   0.273   0.455
## ======================================
plot(tab, color = 2:3)
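Since the question mentions plotting each status in its own facet, the proportions from 1) can also be put into long format, which is what faceting usually expects. This is only a sketch: it assumes ggplot2 is the plotting package meant by "facet", and the names long, Prop and Percent are made up for illustration.

```r
library(ggplot2)

df <- data.frame(Case = c("A","B","B","B","A","B","A","A","B","B","B"),
                 Status = c("Y","Y","Y","N","Y","ND","ND","N","Y","N","ND"))

# Long format: one row per Case/Status combination, with the
# column-wise proportion stored in Prop
long <- as.data.frame(prop.table(table(df), margin = 2), responseName = "Prop")
long$Percent <- round(100 * long$Prop, 1)

# One panel per Status, one bar per Case
p <- ggplot(long, aes(x = Case, y = Percent, fill = Case)) +
  geom_col() +
  facet_wrap(~ Status)
print(p)
```

Within each panel the two bars add up to 100, because the percentages were computed per Status column.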