r data.table summarytools

summarytools::freq gives unintended results when variables are factors without NA


Since my data contains summarized counts, I am using the freq function from summarytools with weights.

With weights, the freq function summarizes a column correctly when

  1. The column is numeric or integer
  2. The column is a factor with NA or NaN values

But when

  1. The column is a factor without NA or NaN values, the summary drops one level from the column and reports its weight under <NA>.

I came across this issue in real data and have reproduced it with a small sample.

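     library(data.table)
     library(summarytools)

     # A: integer, B: numeric, C and D: factors built from vectors containing
     # NA / NaN, E: factor with no missing values, Frequency: the weights
     dt <- data.table(
       A = as.integer(c(5, 3, 4, 5, 6, 1, 2, NA, 3, NaN)),
       B = c(5, 3, 4, 5, 6, 1, 2, NA, 3, NaN),
       C = as.factor(c(5, 3, 4, 5, 6, 1, 2, NA, 3, NA)),
       D = as.factor(c(5, 3, 4, 5, 6, 1, 2, NaN, 3, NaN)),
       E = as.factor(c(5, 3, 4, 5, 6, 1, 2, 5, 3, 3)),
       Frequency = c(10, 20, 30, 40, 5, 60, 7, 80, 99, 10)
     )
     str(dt)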

Whether Frequency is integer or numeric does not matter.

What makes the difference is whether a factor column contains NA or NaN values.

     writeLines("\n\n\n Without weights: No errors")
     #summarytools::freq(dt[,1:5]) #Commented to minimize clutter
     writeLines("\n\n\n With weights, Column E shows incorrect values but not C and D")
     summarytools::freq(dt[,1:5],weights=dt$Frequency)



 Without weights: No errors



 With weights, Column E shows incorrect values but not C and D
 1 NaN value(s) converted to NA

 0 NaN value(s) converted to NA

 Weighted Frequencies  
 dt$A  
 Weights: weights  

            Freq   % Valid   % Valid Cum.   % Total   % Total Cum.

      1    60.00     22.14          22.14     16.62          16.62
      2     7.00      2.58          24.72      1.94          18.56
      3   119.00     43.91          68.63     32.96          51.52
      4    30.00     11.07          79.70      8.31          59.83
      5    50.00     18.45          98.15     13.85          73.68
      6     5.00      1.85         100.00      1.39          75.07
   <NA>    90.00                              24.93         100.00
  Total   361.00    100.00         100.00    100.00         100.00

dt$B
Type: Numeric

            Freq   % Valid   % Valid Cum.   % Total   % Total Cum.

      1    60.00     22.14          22.14     16.62          16.62
      2     7.00      2.58          24.72      1.94          18.56
      3   119.00     43.91          68.63     32.96          51.52
      4    30.00     11.07          79.70      8.31          59.83
      5    50.00     18.45          98.15     13.85          73.68
      6     5.00      1.85         100.00      1.39          75.07
   <NA>    90.00                              24.93         100.00
  Total   361.00    100.00         100.00    100.00         100.00

dt$C
Type: Factor

            Freq   % Valid   % Valid Cum.   % Total   % Total Cum.

      1    60.00     22.14          22.14     16.62          16.62
      2     7.00      2.58          24.72      1.94          18.56
      3   119.00     43.91          68.63     32.96          51.52
      4    30.00     11.07          79.70      8.31          59.83
      5    50.00     18.45          98.15     13.85          73.68
      6     5.00      1.85         100.00      1.39          75.07
   <NA>    90.00                              24.93         100.00
  Total   361.00    100.00         100.00    100.00         100.00

dt$D
Type: Factor

            Freq   % Valid   % Valid Cum.   % Total   % Total Cum.

      1    60.00     22.14          22.14     16.62          16.62
      2     7.00      2.58          24.72      1.94          18.56
      3   119.00     43.91          68.63     32.96          51.52
      4    30.00     11.07          79.70      8.31          59.83
      5    50.00     18.45          98.15     13.85          73.68
      6     5.00      1.85         100.00      1.39          75.07
   <NA>    90.00                              24.93         100.00
  Total   361.00    100.00         100.00    100.00         100.00

dt$E
Type: Factor

            Freq   % Valid   % Valid Cum.   % Total   % Total Cum.

      1    60.00     16.85          16.85     16.62          16.62
      2     7.00      1.97          18.82      1.94          18.56
      3   129.00     36.24          55.06     35.73          54.29
      4    30.00      8.43          63.48      8.31          62.60
      5   130.00     36.52         100.00     36.01          98.61
   <NA>     5.00                               1.39         100.00
  Total   361.00    100.00         100.00    100.00         100.00
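As a cross-check that does not depend on summarytools, the expected weighted counts for column E can be computed in base R by summing the weights within each factor level. This is only a minimal sketch: the vectors `e` and `w` are illustrative names that repeat the values of `dt$E` and `dt$Frequency` from the sample above.

```r
# Same values as dt$E and dt$Frequency in the sample (names e and w are illustrative)
e <- factor(c(5, 3, 4, 5, 6, 1, 2, 5, 3, 3))
w <- c(10, 20, 30, 40, 5, 60, 7, 80, 99, 10)

# E has six levels and no missing values
print(levels(e))
print(anyNA(e))

# Sum of the weights within each level of E
expected <- tapply(w, e, sum)
print(expected)
```

This gives 60, 7, 129, 30, 130 and 5 for levels 1 to 6, with no missing values. The table for dt$E above agrees for levels 1 to 5 but shows the weight of 5 that belongs to level 6 under <NA>.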

Solution

  • A fix has been issued for this. You can install the latest version from GitHub with:
    devtools::install_github("dcomtois/summarytools")

    or, to get the latest development version:
    devtools::install_github("dcomtois/summarytools", ref = "dev-current")