I have a simple data frame:
> df <- data.frame(i=c(1:20), x=c(1:10, rep(NA, 10)))
> df
    i  x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
11 11 NA
12 12 NA
13 13 NA
14 14 NA
15 15 NA
16 16 NA
17 17 NA
18 18 NA
19 19 NA
20 20 NA
I want to extract the row names of the rows where x is not NA, which I can do as follows:
> rownames(df[c(1:20),][!is.na(df$x),])
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
So far so good. Now I want to skip the first row, but for some reason the command returns output of the same length, and it now even includes row 11, where x is NA:
> rownames(df[c(2:20),][!is.na(df$x),])
[1] "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
It makes no sense to me that the result has the same length and even contains a row that should have been excluded. As you can see in the data frame above, df$x[11] is definitely NA, so why does the result include a row that !is.na() should remove? To be more specific: I am trying to look at an extract of a data frame while excluding the rows that contain NAs. I would be grateful for any advice!
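The problem is that !is.na(df$x) is indexed to df, not to df[c(2:20),]. The mask !is.na(df$x) has 20 elements and is TRUE for the first 10, while df[c(2:20),] has only 19 rows, starting at row 2 of df. The mask is applied by position, so rownames(df[c(2:20),][!is.na(df$x),]) returns the row names of the first 10 rows of the subset, which are rows 2 through 11 of df.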
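To make the mismatch visible, here is a small sketch that compares the length of the mask with the number of rows in the subset. The names sub, mask and picked are only illustrative and do not appear in the question.

```r
df   <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))
sub  <- df[2:20, ]        # 19 rows, carrying row names "2" to "20"
mask <- !is.na(df$x)      # 20 logical values, built from the full df

length(mask)              # length of the mask
nrow(sub)                 # number of rows it is applied to

# The mask is matched to sub by position, not by row name: TRUE at
# positions 1 to 10 selects the first 10 rows of sub.
picked <- rownames(sub[mask, ])
picked
```

Because the mask and the subset have different lengths and different starting rows, position 1 of the mask lines up with row 2 of df. Building the mask from the subset itself, as in the lines that follow, avoids this mismatch.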
df2 <- df[c(2:20),]
rownames(df2[!is.na(df2$x),])
# [1] "2" "3" "4" "5" "6" "7" "8" "9" "10"
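If an intermediate data frame is not wanted, a few alternatives should give the same row names. This is only a sketch: keep, rn1, rn2 and rn3 are illustrative names, and note that complete.cases() and na.omit() look at every column, not just x, which matches the stated goal of excluding rows that contain any NA.

```r
df <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))

# Option 1: combine both conditions in one logical index over df itself,
# so the mask and the data frame always have the same length.
keep <- seq_len(nrow(df)) %in% 2:20 & !is.na(df$x)
rn1  <- rownames(df)[keep]

# Option 2: subset first, then keep rows with no NA in any column.
df2 <- df[2:20, ]
rn2 <- rownames(df2[complete.cases(df2), ])

# Option 3: na.omit() drops rows containing NA and keeps the row names.
rn3 <- rownames(na.omit(df[2:20, ]))

rn1
```

Option 1 keeps everything aligned to df, which is the safest pattern when several conditions have to be combined. Options 2 and 3 are shorter when any NA in any column should exclude the row.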