The data set used here contains risk values (probabilities), and the probabilities are very small. Calling the summary
function in R gives the following:
> summary(prob_ann)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.000e+00 1.000e-16 1.034e-13 3.959e-12 7.880e-13 8.222e-10
However, a query for the actual minimum yields the correct value:
> min(prob_ann)
## [1] 1.199446e-35
My question is this: why does summary use scientific notation but still report a TRUE ZERO as the minimum instead of the correct value of 1.199e-35?
Update #1
Despite there being more than enough information to "debug" this question (as was demonstrated by the user who actually answered it), someone "closed" this question on the grounds that there wasn't enough information to reproduce the problem. It is curious that this was the justification when the accepted answer clearly proved otherwise, which raises the question: why was this question closed?
But, here is the "requested" code:
set.seed(123)
prob_ann <- c(1.199446e-35, runif(100, 3.33e-15, 9.99e-10))
summary(prob_ann)
min(prob_ann)
quantile(prob_ann,probs=c(0,1))
It's not a TRUE zero value. The minimum shown by summary differs from the actual minimum because of the class of the object that summary returns, which controls how that object is printed.
set.seed(123)
prob_ann <- c(1.199446e-35, runif(100, 0, 8.222e-10))
res <- summary(prob_ann); res
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.000e+00 2.003e-10 3.831e-10 4.059e-10 6.203e-10 8.175e-10
min(prob_ann)
[1] 1.199446e-35
class(res)
#[1] "summaryDefault" "table"
The second-to-last line of the summary.default function is:
class(value) <- c("summaryDefault", "table")
The first element of that class vector, "summaryDefault", changes the formatting of the output, because printing the result dispatches to the print.summaryDefault function:
function (x, digits = max(3L, getOption("digits") - 3L), ...)
{
    xx <- x
    if (is.numeric(x) || is.complex(x)) {
        finite <- is.finite(x)
        xx[finite] <- zapsmall(x[finite])
    }
    ...
    print.table(xx, digits = digits, ...)
    invisible(x)
}
Thus, the output is being rounded when it is printed: the minimum is tiny compared with the maximum, so zapsmall replaces it with 0 (see zapsmall for proof).
?zapsmall
zapsmall determines a digits argument dr for calling round(x, digits = dr) such that values close to zero (compared with the maximal absolute value in the vector) are ‘zapped’, i.e., replaced by 0.
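To make the effect concrete, here is a minimal sketch that applies zapsmall directly to a two-element vector built from the minimum and maximum reported in the question. It assumes the default getOption("digits") of 7, and the names x and zapped are only illustrative.

```r
# Two values taken from the question: the true minimum and the maximum.
x <- c(1.199446e-35, 8.222e-10)

# zapsmall() rounds relative to the largest absolute value in the vector,
# so a value roughly 25 orders of magnitude smaller is replaced by 0.
zapped <- zapsmall(x)

x[1] > 0        # the stored minimum is positive
zapped[1] == 0  # after zapping it is exactly zero
zapped[2] > 0   # the maximum is still non-zero
```

The same rounding is what happens inside print.summaryDefault, which is why only the printed minimum looks like zero.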
If you want to see the unformatted output, you can use unclass:
unclass(res)
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
1.199446e-35 2.066278e-10 4.407913e-10 4.176800e-10 6.351261e-10 8.195258e-10
or use print.table instead:
print.table(res)
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
1.199446e-35 2.066278e-10 4.407913e-10 4.176800e-10 6.351261e-10 8.195258e-10
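If you only need individual statistics rather than the whole printed table, another option is to pull the values out of the summary object, which never goes through the print method. The following is a sketch under the assumption that you are on a reasonably recent R version, where (as far as I know) summary.default stores the values without rounding them; tiny_min is just an illustrative name.

```r
set.seed(123)
prob_ann <- c(1.199446e-35, runif(100, 0, 8.222e-10))
res <- summary(prob_ann)

# Indexing by name returns a plain number, so no print method rounds it.
tiny_min <- res[["Min."]]
tiny_min > 0

# as.numeric() drops the class and the names in one step.
as.numeric(res)[1] > 0
```

Both comparisons should come out TRUE, confirming that the zero only appears at print time.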