I loaded a data set called gob into R and tried the handy summary function. Note that the 3rd quartile is less than the mean. How can this be? Is it the size of my data or something else?
I already tried passing in a large value for the digits parameter (e.g. 10), and that does not resolve the issue.
> summary(gob, digits=10)
customer_id 100101.D 100199.D 100201.D
Min. : 1083 Min. :0.0000000 Min. :0.0000000 Min. :0.0000000
1st Qu.: 965928 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.0000000
Median :2448738 Median :0.0000000 Median :0.0000000 Median :0.0000000
Mean :2660101 Mean :0.0010027 Mean :0.0013348 Mean :0.0000878
3rd Qu.:4133368 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000
Max. :6538193 Max. :1.0000000 Max. :1.0000000 Max. :0.7520278
Note that for gob$100201.D the mean is 0.0000878 but the 3rd Qu. = 0.
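This is not a bug; your data simply contains a lot of 0 values. When more than 75% of the values are 0, the 3rd quartile is 0, while the few non-zero values still pull the mean above 0. For example, if I make x with twelve 0s and one 1, the 3rd quartile is smaller than the mean: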
x<-c(0,0,0,0,0,0,0,0,0,0,0,0,1)
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.00000 0.00000 0.07692 0.00000 1.00000
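To see where that 0 comes from, here is a minimal sketch that recomputes the 3rd quartile by hand. It assumes summary() relies on quantile() with its default method (type 7), which is the case in the R versions I am aware of; the names n, h and q3_by_hand are only illustrative.

```r
# Twelve zeros and a single one, as in the example above
x <- c(rep(0, 12), 1)
n <- length(x)

# Type 7 (the default for quantile) looks at position (n - 1) * p + 1
# in the sorted data; for p = 0.75 and n = 13 this is exactly position 10
h <- (n - 1) * 0.75 + 1
q3_by_hand <- sort(x)[h]

q3_by_hand          # the 10th sorted value, which is still a zero
quantile(x, 0.75)   # the same quartile as computed by R
mean(x)             # 1/13: the single 1 lifts the mean above zero
```

The quartile only looks at the value sitting at that one position, whereas the mean adds up every value, so a single large value moves the mean but not the quartile.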
Use table() on your column to see the distribution of values:
table(x)
x
0 1
12 1
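For a data frame with many columns, checking the share of zeros per column may be quicker than calling table() on each one. The sketch below uses a made-up data frame called demo (not your gob data) with hypothetical columns a and b, and assumes all values are non-negative.

```r
# Hypothetical data: column a is 99% zeros, column b is 50% zeros
demo <- data.frame(
  a = c(rep(0, 99), 1),
  b = c(rep(0, 50), rep(1, 50))
)

# Share of zeros in each column
zero_share <- colMeans(demo == 0)

# 3rd quartile and mean of each column
q3 <- sapply(demo, quantile, probs = 0.75, names = FALSE)
m  <- colMeans(demo)

# One row per column: zero share, 3rd quartile, mean
cbind(zero_share, q3, mean = m)
```

Any column where zero_share exceeds 0.75 will show a 3rd quartile of 0 even though its mean is positive, which matches what summary(gob) reports.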