
Why are values not correctly displayed in summary()?


I have a vector of 10,000 numbers with a skewed distribution and extreme values. When I call base R's summary() function, some values are displayed as 0 even though they are not actually 0.

From my actual data:

> summary(vec)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.000e+00 0.000e+00 0.000e+00 1.244e+16 0.000e+00 1.225e+20

> str(summary(vec))
'summaryDefault' Named num [1:6] 3.69e-207 1.73e-01 2.84e-01 1.24e+16 4.52e-01 ...
 - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...

I expect summary() to display the values it actually stores, like this:

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
3.69e-207  1.73e-01  2.84e-01 1.244e+16  4.52e-01 1.225e+20

Here is a reproducible example (here only the minimum is incorrectly shown as 0):

> x = seq(from=3.69e-207, to=1.23e+20, length.out=10000)

> summary(x)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.000e+00 3.075e+19 6.150e+19 6.150e+19 9.225e+19 1.230e+20 

> str(summary(x))
 'summaryDefault' Named num [1:6] 3.69e-207 3.08e+19 6.15e+19 6.15e+19 9.22e+19 ...
 - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...

> summary(x)["Min."]
     Min. 
3.69e-207 

Solution

  • This is caused by the print method, not by summary() itself: the statistics are stored at full precision and are only rounded when printed. A crude workaround is to use unclass() to convert the summary object back to a plain numeric vector, which prints as you would like:

    vec <- c(1e-200, rep(1, 100), 1e20)
    sv <- summary(vec)
    unclass(sv)
    

    This prints the summary as a vector rather than as a table:

            Min.       1st Qu.        Median          Mean       3rd Qu. 
    1.000000e-200  1.000000e+00  1.000000e+00  9.803922e+17  1.000000e+00 
             Max. 
     1.000000e+20 
    

    In the development version of R there is a zdigits argument that adjusts the behaviour of the internal call to zapsmall(), a function that is explicitly intended to collapse small values to zero. Assuming this feature is included in the next release of R, it will be available on 11 April 2025:

    print(sv, zdigits = Inf)
         Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
    1.000e-200  1.000e+00  1.000e+00  9.804e+17  1.000e+00  1.000e+20 
    

    print(summary(<numbers>)) gets new optional argument zdigits to allow more flexible and consistent (double) rounding. The current default zdigits = 4L is somewhat experimental. Specifying both digits = *, zdigits = * allows behaviour independent of the global digits option.
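The sketch below tries to make the mechanism visible and to give a printing helper that works whether or not your R version has zdigits. It assumes the print method zaps with digits = 4 (that is, max(3, getOption("digits") - 3) under default options), so calling zapsmall() directly only approximates what print does internally. The helper names has_zdigits() and print_unzapped() are hypothetical, not part of R; they detect zdigits by inspecting the formal arguments of the summaryDefault print/format methods instead of assuming a particular R version.

```r
# Example data: one tiny value, many ones, one huge value
vec <- c(1e-200, rep(1, 100), 1e20)
sv <- summary(vec)

# The stored statistic is exact; only the printed form loses it
stopifnot(unclass(sv)[["Min."]] == 1e-200)

# Approximation of the print-time zapping (assumption: digits = 4).
# zapsmall() rounds to fewer decimal places as the largest absolute value
# grows; with a maximum of 1e20 it rounds to 0 decimals, so 1e-200 becomes 0
# (and values such as 0.17 would become 0 as well).
zapped <- zapsmall(unclass(sv), digits = 4)
print(zapped)

# Hypothetical helper: names of the formal arguments of a summaryDefault
# method, or character(0) if the method does not exist in this R version
method_args <- function(generic) {
  m <- utils::getS3method(generic, "summaryDefault", optional = TRUE)
  if (is.null(m)) character(0) else names(formals(m))
}

# Hypothetical helper: does this R accept zdigits for summary printing?
has_zdigits <- function() {
  "zdigits" %in% c(method_args("print"), method_args("format"))
}

# Hypothetical helper: print without zapping, using zdigits = Inf when
# available and falling back to the unclassed numeric vector otherwise
print_unzapped <- function(s) {
  if (has_zdigits()) print(s, zdigits = Inf) else print(unclass(s))
  invisible(s)
}

print_unzapped(sv)
```

The feature check is deliberately based on the methods' arguments rather than on getRversion(), so the helper keeps working regardless of which release zdigits actually ships in.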