
R: NA returned despite !is.na


I have a simple data frame:

> df <- data.frame(i=c(1:20), x=c(1:10, rep(NA, 10)))
> df
    i  x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
11 11 NA
12 12 NA
13 13 NA
14 14 NA
15 15 NA
16 16 NA
17 17 NA
18 18 NA
19 19 NA
20 20 NA

I want to extract the row names of the rows where x is not NA, which I can do as follows:

> rownames(df[c(1:20),][!is.na(df$x),])
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

So far so good. Now I want to skip the first row, but for some reason the command returns output of the same length, and it now even includes a row where x is NA.

> rownames(df[c(2:20),][!is.na(df$x),])
 [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

It does not make sense to me that I get a vector of the same size, and one that even contains a row that should have been excluded. As you can see in the data frame above, df$x[11] is definitely NA, so why does the result include something that !is.na() should get rid of? To be more specific: I am trying to look at an extract of a data frame while excluding the rows that contain NAs. I would be grateful for any advice!


Solution

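  • The problem is that !is.na(df$x) is computed on the full df (20 elements), not on df[c(2:20),] (19 rows). It is TRUE for the first 10 elements, and a logical index is applied by position, so rownames(df[c(2:20),][!is.na(df$x),]) returns the row names of the first 10 rows of the subset, which are rows 2 through 11 of df. Build the mask from the subset instead: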

    df2 <- df[c(2:20),]
    rownames(df2[!is.na(df2$x),])    
    
    # [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
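
If you would rather not create an intermediate data frame, here is a minimal sketch of two alternatives. It assumes the same df as in the question; `keep`, `res_a` and `res_b` are just illustrative names, not anything from the original post. The idea in both cases is that the NA test must be built from the same rows it is applied to.

```r
# Rebuild the example data from the question.
df <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))

# Rows to look at (illustrative name).
keep <- 2:20

# The mismatch behind the surprising result: a 20-element mask
# applied to a 19-row subset.
length(!is.na(df$x))   # 20
nrow(df[keep, ])       # 19

# Option A: index the row names and the NA test with the same 'keep',
# so mask and rows line up one to one.
res_a <- rownames(df)[keep][!is.na(df$x[keep])]

# Option B: subset first, then let na.omit() drop every row that
# contains an NA in any column.
res_b <- rownames(na.omit(df[keep, ]))

res_a
res_b
```

Option B drops a row if any column is NA, not only x, which seems to match the stated goal of excluding rows containing NAs. If only x should be checked, Option A is the closer fit.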