I have a vector 10,000 numbers long with a skewed distribution and extreme values. When I call base R's summary() function, certain values display as 0 when they are not actually 0.
From my actual data:
> summary(vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000e+00 0.000e+00 0.000e+00 1.244e+16 0.000e+00 1.225e+20
> str(summary(vec))
'summaryDefault' Named num [1:6] 3.69e-207 1.73e-01 2.84e-01 1.24e+16 4.52e-01 ...
- attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
I expect summary() to display the values it actually stores, like this:
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.69e-207 1.73e-01 2.84e-01 1.244e+16 4.52e-01 1.225e+20
Here is a reproducible example (only the min shows incorrectly as 0):
> x = seq(from=3.69e-207, to=1.23e+20, length.out=10000)
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000e+00 3.075e+19 6.150e+19 6.150e+19 9.225e+19 1.230e+20
> str(summary(x))
'summaryDefault' Named num [1:6] 3.69e-207 3.08e+19 6.15e+19 6.15e+19 9.22e+19 ...
- attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
> summary(x)["Min."]
Min.
3.69e-207
This has to do with the print method for summary objects. Crudely, you can use unclass() to convert the output from a summary table back to a regular numeric vector, which prints as you would like:
vec <- c(1e-200, rep(1, 100), 1e20)
sv <- summary(vec)
unclass(sv)
This prints the summary as a vector rather than as a table:
Min. 1st Qu. Median Mean 3rd Qu.
1.000000e-200 1.000000e+00 1.000000e+00 9.803922e+17 1.000000e+00
Max.
1.000000e+20
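If you want each statistic shown with its own precision, rather than in a single common format, one option is to format the unclassed values one at a time. This is a minimal sketch using only base R's format() and vapply(); the choice of digits = 4 is an assumption and can be adjusted.

```r
vec <- c(1e-200, rep(1, 100), 1e20)
sv <- summary(vec)

# unclass() drops the "summaryDefault"/"table" classes, leaving a plain
# named numeric vector with the values summary() actually computed.
stats <- unclass(sv)

# Format each statistic separately so that every value gets its own
# significant digits instead of a shared format driven by the largest value.
formatted <- vapply(stats, format, character(1), digits = 4)
print(formatted, quote = FALSE)
```

Because each element is formatted on its own, the tiny minimum is not forced into the same scale as the 1e20 maximum.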
In the development version of R there is a zdigits argument for adjusting the behaviour of the internal call to zapsmall(), a function explicitly intended to collapse small values to zero. Assuming this feature makes it into the next release of R, it will be available on 11 April 2025:
print(sv, zdigits = Inf)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000e-200 1.000e+00 1.000e+00 9.804e+17 1.000e+00 1.000e+20
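To see why the minimum collapses to 0, you can call zapsmall() directly on the stored values. zapsmall() picks how many decimal places to round to relative to the largest absolute value, so a maximum around 1e20 leaves no decimal places at all. This sketch assumes digits = 4 to approximate the default display precision; the exact digits the print method passes internally depends on your R version and options("digits").

```r
vec <- c(1e-200, rep(1, 100), 1e20)
sv <- unclass(summary(vec))

# Round the way the print method's internal zapsmall() call would
# (digits = 4 is an assumption mimicking the default display).
z <- zapsmall(sv, digits = 4)
print(z)

# The number of decimal places zapsmall() ends up using:
# digits minus the order of magnitude of the largest value, floored at 0.
mx <- max(abs(sv))
round_digits <- max(0L, 4 - ceiling(log10(mx)))
print(round_digits)
```

With a maximum of 1e20 the rounding happens at 0 decimal places, which turns 1e-200 (or, in the original data, quartiles like 0.173) into 0 while the stored values stay untouched.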
print(summary(<numbers>)) gets new optional argument zdigits to allow more flexible and consistent (double) rounding. The current default zdigits = 4L is somewhat experimental. Specifying both digits = *, zdigits = * allows behaviour independent of the global digits option.
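To follow that advice while still running on older versions of R, one approach is to check whether the installed summary print/format methods accept zdigits before using it. This is a sketch under the assumption that the argument appears in the formals of print.summaryDefault or format.summaryDefault when it is supported; on older R it falls back to the unclass() approach.

```r
vec <- c(1e-200, rep(1, 100), 1e20)
sv <- summary(vec)

# Assumption: when supported, zdigits shows up in the formals of one of
# the summaryDefault print/format methods.
method_args <- c(names(formals(base::print.summaryDefault)),
                 names(formals(base::format.summaryDefault)))
has_zdigits <- "zdigits" %in% method_args

if (has_zdigits) {
  # Set both digits and zdigits so the result does not depend on
  # options("digits"); zdigits = Inf effectively disables zapping.
  out <- capture.output(print(sv, digits = 4, zdigits = Inf))
} else {
  # Older R: print the plain numeric vector, which is never zapped.
  out <- capture.output(print(unclass(sv), digits = 4))
}
cat(out, sep = "\n")
```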