I have a datafile with raw scores and with sample weights. Now I want to use the describe function of the psych package, taking into account the sample weights.
Does anyone know how to do that, or is there a function somewhere that does exactly te same as psych::describe() but can handle sample weights?
The next example will give some insight in what I intend to do.
library(psych)
describe(c(2,3,4,1,4,5,3,3))
#gives:
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 8 3.12 1.25 3 3.12 1.48 1 5 4 -0.2 -1.16 0.44
The sample weights are:
c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1)
The weighted mean would be (correct me if I am wrong):
sum(c(2,3,4,1,4,5,3,3)* c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1))/sum(c(0.2,0.5,1.2,1.5,0.2,0.6,0.6,1.1))
[1] 2.898305
So that's, ofcourse different from the unweighted mean. How can I make sure that the reported SD, kurtosis, skewness etc. are based on the sample weighted mean as well?
As the psych package does not handle weights, and there is no alternative package that serves an equivalent collection of weighted descriptives, one has to cherry pick from different packages and combine the output like psych::describe()
does.
Also, the calculation of weighted descriptives typically need to be supplied with each case in the data along with the individual weights assigned those cases, therefore "shortcuts" won't work. (For example, the Weighted Standard Error will not be equal to the Weighted Standard Deviation divided by the square root of the sample size.)
Here's a simple wrapper function that mimics the behavior of psych::describe()
for weighted data:
wtd.describe <- function(x, weights=NULL, trim=.1){
require(TAM)
require(diagis)
require(robsurvey)
out <- NULL
# Handling simple vectors
x <- as.data.frame(x)
# If no weights given, all weights = 1
if(is.null(weights)) {weights <- seq(1, nrow(x))}
i <- 1
for(colname in colnames(x)){
# Removing rows with missing data or weight
d <- x[complete.cases(x[[colname]], weights), , drop=FALSE][[colname]]
w <- weights[complete.cases(x[[colname]], weights)]
wd <- data.frame(
"vars" = i,
"n" = length(d),
"mean" = TAM::weighted_mean(d, w = w),
"sd" = TAM::weighted_sd(d, w = w),
"median" = robsurvey::weighted_median(d, w = w, na.rm = TRUE),
"trimmed" = robsurvey::weighted_mean_trimmed(d, w = w, LB = trim, UB = (1 - trim), na.rm = TRUE),
"mad" = robsurvey::weighted_mad(d, w = w, na.rm = TRUE, constant = 1.4826),
"min" = min(d),
"max" = max(d),
"range" = max(d) - min(d),
"skew" = TAM::weighted_skewness(d, w = w),
"kurtosis" = TAM::weighted_kurtosis(d, w = w),
"se" = diagis::weighted_se(d, w = w, na.rm = TRUE),
row.names = colname
)
i <- i+1
out <- rbind(out, wd)
}
return(out)
}
Please note that:
psych:describe()
are not emulated by the above function.na.rm = TRUE
is implied, as the TAM package does implicit na.rm = TRUE
.