
Summary of numeric variable with missing values


Is there an R package that gives me summary statistics for a numeric variable, including the percentage of missing values?

I have tried the built-in summary, Hmisc::describe and psych::describe, but none of them report it:

> x <- rnorm(1000, 5, 0.6)
> x[sample(seq(1,length(x)),100)]<-NA
> summary(x)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.       NA's 
3.12726377 4.62901915 5.02423569 5.02075611 5.39955795 7.52357325        100 
> Hmisc::describe(x)
x 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
     900      100      900        1    5.021     5.02   0.6623    4.053 
     .10      .25      .50      .75      .90      .95 
   4.267    4.629    5.024    5.400    5.809    6.000 

lowest : 3.12726 3.27157 3.37501 3.41268 3.41959
highest: 6.41032 6.44724 6.50692 6.54191 7.52357
> psych::describe(x)
   vars   n mean   sd median trimmed  mad  min  max range skew kurtosis   se
X1    1 900 5.02 0.59   5.02    5.02 0.57 3.13 7.52   4.4 0.01     0.17 0.02

Solution

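  • Take a look at the skimr R package. Its skim() function reports n_missing and complete_rate (the fraction of non-missing values) for each variable, so the percentage missing is 100 * (1 - complete_rate).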

    x <- rnorm(1000, 5, 0.6)
    x[sample(seq(1,length(x)),100)] <- NA
    skimr::skim(x)
    
    Data summary

    Name                     x
    Number of rows           1000
    Number of columns        1
    _______________________
    Column type frequency:
      numeric                1
    ________________________
    Group variables          None

    Variable type: numeric

    skim_variable n_missing complete_rate mean   sd   p0 p25  p50  p75 p100 hist
    data                100           0.9    5 0.61 3.03 4.6 4.98 5.41 6.78 ▁▃▇▅▁

    Created on 2024-12-21 with reprex v2.0.2
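
If you would rather not add a dependency, the percentage can also be computed directly in base R. The following is only a minimal sketch: summary_with_missing is a hypothetical helper name, not a function from base R, Hmisc, psych, or skimr, and it assumes a plain numeric vector as input.

```r
# Hypothetical helper (not part of any package): basic summary statistics
# plus the count and percentage of missing values for a numeric vector.
summary_with_missing <- function(x) {
  stopifnot(is.numeric(x))
  n_missing <- sum(is.na(x))
  c(
    n           = length(x),                    # total length, NAs included
    n_missing   = n_missing,
    pct_missing = 100 * n_missing / length(x),  # percentage of NAs
    mean        = mean(x, na.rm = TRUE),
    sd          = sd(x, na.rm = TRUE),
    # quantile() keeps its own names: 0%, 25%, 50%, 75%, 100%
    quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
  )
}

set.seed(1)                      # seed added here only for reproducibility
x <- rnorm(1000, 5, 0.6)
x[sample(length(x), 100)] <- NA  # blank out 100 of the 1000 values
res <- summary_with_missing(x)
round(res, 2)
```

Since 100 of the 1000 values are set to NA, pct_missing comes out as 10; the remaining statistics depend on the random draw.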