I have a vector of numeric data (sample below). Let's store the vector as x. When I run summary(x) and descr(x), where descr() is from the summarytools package, I have agreement on the Min, Median, Mean, and Max values. However, my 1st & 3rd quartile values differ. This is the first time I've seen this discrepancy between the two function results. Any thoughts as to why and how this happens?
I started exploring the descr() source code but haven't gotten far, and I haven't been able to access the summary() source to see whether the difference lies there. However, judging from some of the cumulative percentages, I suspect the two functions calculate the quantiles differently.
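For reference, R functions can usually be inspected by printing them without parentheses. summary() itself is only an S3 generic, so the code that handles a numeric vector lives in its default method. This is a minimal sketch using base R only. Printing summarytools::descr the same way should show that package's source once it is installed, but that call is left out here so the snippet runs without the package. The quantile.type formal assumes R 4.0.0 or later.

```r
# summary() is an S3 generic: summary(x) dispatches on class(x), and for a
# plain numeric vector the work is done by summary.default.
print(utils::methods(summary))   # list the registered summary methods
print(summary.default)           # source of the default (numeric) method
print(stats:::quantile.default)  # quantile's method; ::: also reaches unexported objects

# The formals of summary.default reveal which quantile type it uses by default
print(formals(summary.default)$quantile.type)
```

Reading summary.default shows that it delegates the quartiles to quantile(), so the difference comes down to which quantile definition each function asks for.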
x = c(1132.1, 731.1, 851.2, 704.0, 226.3, 1703.6, 853.6, 821.4, 1192.9, 814.2, 880.2, 1270.8, 784.2, 606.5, 702.8, 863.6, 419.2, 1486.9, 1325.8, 493.2, 847.7, 552.5, 709.3, 508.3, 400.0, 711.4, 1161.5, 778.4, 626.2, 365.0, 329.1, 457.7, 446.2, 564.1, 376.9, 463.3, 239.7, 250.9, 266.5, 298.2, 186.2, 79.0, 149.9, 178.7, 79.4, 91.8, 12.6)
install.packages("summarytools")
library(summarytools)
descr(x)
summary(x)
With descr(), Q1 = 298.20 and Q3 = 847.70. With summary(), Q1 = 313.6 and Q3 = 834.5.
When I run freq(x) and look at the cumulative percentages, 298.2 is at 25.53%, 821.4 is at 74.47%, and 847.7 is at 76.6%. So descr() appears to report the first value of x whose cumulative percentage reaches 25% (for Q1) or 75% (for Q3), rather than a value between two observations.
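The freq() observation can be reproduced in base R without summarytools: sort x, compute the cumulative percentage at each rank, and take the first value whose cumulative percentage is at least 25% or 75%. This is a sketch assuming no missing values and no ties, which holds for this sample. This "first value to reach p" rule is the inverse of the empirical CDF.

```r
x <- c(1132.1, 731.1, 851.2, 704.0, 226.3, 1703.6, 853.6, 821.4, 1192.9, 814.2,
       880.2, 1270.8, 784.2, 606.5, 702.8, 863.6, 419.2, 1486.9, 1325.8, 493.2,
       847.7, 552.5, 709.3, 508.3, 400.0, 711.4, 1161.5, 778.4, 626.2, 365.0,
       329.1, 457.7, 446.2, 564.1, 376.9, 463.3, 239.7, 250.9, 266.5, 298.2,
       186.2, 79.0, 149.9, 178.7, 79.4, 91.8, 12.6)

xs <- sort(x)
n <- length(xs)                       # 47 observations
cum_pct <- 100 * seq_len(n) / n       # cumulative percentage at each rank

# First sorted value whose cumulative percentage reaches 25% / 75%
q1_step <- xs[which(cum_pct >= 25)[1]]
q3_step <- xs[which(cum_pct >= 75)[1]]
print(c(q1_step, q3_step))            # 298.2 847.7

# Cumulative percentages at ranks 12, 35 and 36, matching freq(x)
print(round(cum_pct[c(12, 35, 36)], 2))   # 25.53 74.47 76.60
```

With n = 47, the 25% and 75% points fall strictly inside a rank (11.75 and 35.25 observations), so this rule picks the 12th and 36th sorted values, which are the numbers descr() prints.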
(821.4 + 847.7)/2 = 834.55
This matches the summary() 3rd quartile (displayed as 834.5), which is not a value in x but lies midway between the two values whose cumulative percentages straddle 75%. Still not sure how summary() obtains 313.6 for the 1st quartile.
Look at the help page for ?quantile. It documents nine different ways of calculating quantiles (type = 1 to 9); descr() is using type = 2, and summary() is using the default, type = 7:
> quantile(x, type = 2)
0% 25% 50% 75% 100%
12.6 298.2 564.1 847.7 1703.6
> quantile(x, type = 7)
0% 25% 50% 75% 100%
12.60 313.65 564.10 834.55 1703.60
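To see where the type 2 numbers come from, here is a sketch of that rule as ?quantile describes it ("similar to type 1 but with averaging at discontinuities"): with np = n * p and j = floor(np), take the (j + 1)-th sorted value when np is not a whole number, and average the j-th and (j + 1)-th values when it is. It ignores the small floating-point tolerance R's own code applies. The last line assumes R 4.0.0 or later, where summary() for numeric vectors accepts a quantile.type argument; if that holds, it should reproduce descr()'s quartiles without summarytools.

```r
x <- c(1132.1, 731.1, 851.2, 704.0, 226.3, 1703.6, 853.6, 821.4, 1192.9, 814.2,
       880.2, 1270.8, 784.2, 606.5, 702.8, 863.6, 419.2, 1486.9, 1325.8, 493.2,
       847.7, 552.5, 709.3, 508.3, 400.0, 711.4, 1161.5, 778.4, 626.2, 365.0,
       329.1, 457.7, 446.2, 564.1, 376.9, 463.3, 239.7, 250.9, 266.5, 298.2,
       186.2, 79.0, 149.9, 178.7, 79.4, 91.8, 12.6)

# Quantile type 2: inverse empirical CDF, averaged where it jumps
type2 <- function(x, p) {
  xs <- sort(x)
  n <- length(xs)
  np <- n * p
  j <- floor(np)
  lo <- xs[pmax(j, 1)]        # j-th sorted value (clamped at p = 0)
  hi <- xs[pmin(j + 1, n)]    # (j + 1)-th sorted value (clamped at p = 1)
  ifelse(np > j, hi, (lo + hi) / 2)
}

p <- c(0, 0.25, 0.5, 0.75, 1)
print(type2(x, p))                                            # 12.6 298.2 564.1 847.7 1703.6
print(all.equal(type2(x, p), unname(quantile(x, p, type = 2))))  # TRUE

# Ask summary() for the same definition (quantile.type assumes R >= 4.0.0)
print(summary(x, quantile.type = 2))
```

For this sample, n * p is 11.75 and 35.25 at the quartiles, so no averaging happens and type 2 returns actual observations (the 12th and 36th sorted values), while type 7 always interpolates.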