I am looping over a few frequency tables with the freq() function in summarytools and printing the results. In doing so, I noticed that when I save a freq() object created without missing values and convert it to a data frame, the total number of observations still includes the missing values.
# Create a vector with 10 observations of "smoker"
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")
# Create a DataFrame using the vector
df <- data.frame(smoker)
library(summarytools)
library(dplyr)
# Create a frequency table without missing values
freq(df$smoker, report.nas = FALSE)
# Try to save this table into a data frame
table <- as.data.frame(freq(df$smoker, report.nas = FALSE)) # OR
table <- df %>% freq(smoker, report.nas = FALSE) %>% as.data.frame()
table
The results should look like this (missing values excluded, n=7):
      Freq        %   % Cum.
no       3    42.86    42.86
yes      4    57.14   100.00
Total    7   100.00   100.00
But after saving it to a data.frame, it looks like this (missing values added back on, with total n=10):
      Freq   % Valid % Valid Cum. % Total % Total Cum.
no       3  42.85714     42.85714      30           30
yes      4  57.14286    100.00000      40           70
<NA>     3        NA           NA      30          100
Total   10 100.00000    100.00000     100          100
This seems like a bug, but I'm not sure whether it is the expected outcome. Any thoughts on how to save this output as a data.frame? I'm hoping to loop over the data frames and add kable styling.
Using report.nas only affects the printing of the NA values, not the storage of them. If we store the freq object as see:
see <- summarytools::freq(df$smoker, report.nas = FALSE)
You can see it prints the values as desired:
# Frequencies
# df$smoker
# Type: Character
#
#               Freq        %   % Cum.
# ----------- ------ -------- --------
#          no      3    42.86    42.86
#         yes      4    57.14   100.00
#       Total      7   100.00   100.00
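But it stores them with the NA values: the stored object is the same five-column table shown in the question, including the <NA> row.
So you will still need to subset to get what you want. This approach simply uses !is.na() on the percent valid column (column 2) to drop the <NA> row: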
want <- as.data.frame(see[!is.na(see[,2]),])
#       Freq   % Valid % Valid Cum. % Total % Total Cum.
# no       3  42.85714     42.85714      30           30
# yes      4  57.14286    100.00000      40           70
# Total   10 100.00000    100.00000     100          100
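Note that the Total row in the subset above still counts all 10 observations. If you want the total to reflect only the 7 valid values, one option is to rebuild the table in base R rather than subsetting the stored freq object. This is only a minimal sketch and not something summarytools provides: the object name valid and the column names Pct and CumPct are made up for illustration.

```r
# Same example vector as in the question
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")

# Count only the non-missing values (useNA = "no" drops the NAs)
counts <- table(smoker, useNA = "no")

# Frequencies and percentages based on valid observations only
valid <- data.frame(
  Freq = as.vector(counts),
  Pct = as.vector(100 * counts / sum(counts)),
  row.names = names(counts)
)
valid$CumPct <- cumsum(valid$Pct)

# Append a Total row computed from the valid observations
total_row <- data.frame(
  Freq = sum(counts),
  Pct = 100,
  CumPct = 100,
  row.names = "Total"
)
valid <- rbind(valid, total_row)

valid
```

Because the result is a plain data.frame, it can be built inside a loop over several variables and passed on to whatever table styling you use afterwards.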