r, dataframe, missing-data

Elegant way to report missing values in a data.frame


Here's a little piece of code I wrote to report variables with missing values from a data frame. I'm trying to think of a more elegant way to do this, one that perhaps returns a data.frame, but I'm stuck:

for (Var in names(airquality)) {
    missing <- sum(is.na(airquality[,Var]))
    if (missing > 0) {
        print(c(Var,missing))
    }
}

Edit: I'm dealing with data.frames with dozens to hundreds of variables, so it's key that we only report variables with missing values.


Solution

  • Just use sapply

    > sapply(airquality, function(x) sum(is.na(x)))
      Ozone Solar.R    Wind    Temp   Month     Day 
         37       7       0       0       0       0
    

    You could also use apply or colSums on the logical matrix returned by is.na():

    > apply(is.na(airquality),2,sum)
      Ozone Solar.R    Wind    Temp   Month     Day 
         37       7       0       0       0       0
    > colSums(is.na(airquality))
      Ozone Solar.R    Wind    Temp   Month     Day 
         37       7       0       0       0       0
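
The calls above report every column, including those with no missing values. Since the question asks for only the affected variables, ideally as a data.frame, here is a minimal sketch that filters the colSums result. The names na_counts and missing_report and the output column names are illustrative choices, not part of the original answer.

```r
# Count NAs per column using the colSums approach shown above
na_counts <- colSums(is.na(airquality))

# Keep only the columns that have at least one missing value
na_counts <- na_counts[na_counts > 0]

# Assemble a small report; the column names are hypothetical
missing_report <- data.frame(
  variable    = names(na_counts),
  n_missing   = as.vector(na_counts),
  pct_missing = round(100 * as.vector(na_counts) / nrow(airquality), 1),
  row.names   = NULL,
  stringsAsFactors = FALSE
)

missing_report
```

For airquality this should leave just the Ozone and Solar.R rows. With hundreds of variables the same filter keeps the report short, and the result can be sorted with missing_report[order(-missing_report$n_missing), ] if the worst columns should come first.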