I have this hypothetical dataframe
df <- data.frame(K=c(NA, NA, 3, 4,0,2,NA, NA), A=c(NA, NA, NA, 4,0,3,NA, NA), B=c(NA, 2, NA, NA,0,NA,NA,0), C=c(0, 3, 5, NA,0,5,NA,5), D=c(NA, 3, 1, 2,0,10,NA,3))
df
K A B C D
1 NA NA NA 0 NA
2 NA NA 2 3 3
3 3 NA NA 5 1
4 4 4 NA NA 2
5 0 0 0 0 0
6 2 3 NA 5 10
7 NA NA NA NA NA
8 NA NA 0 5 3
I want to find rowwise how many NAs we have before the first numerical value (in that row) and I want to add this information as a last column in the above data frame
So I want to have
df2 <- data.frame(K=c(NA, NA, 3, 4,0,2,NA, NA), A=c(NA, NA, NA, 4,0,3,NA, NA), B=c(NA, 2, NA, NA,0,NA,NA,0), C=c(0, 3, 5, NA,0,5,NA,5), D=c(NA, 3, 1, 2,0,10,NA,3),
nn=c(3,2,0,0,0,0,5,2))
df2
K A B C D nn
1 NA NA NA 0 NA 3
2 NA NA 2 3 3 2
3 3 NA NA 5 1 0
4 4 4 NA NA 2 0
5 0 0 0 0 0 0
6 2 3 NA 5 10 0
7 NA NA NA NA NA 1000
8 NA NA 0 5 3 2
For example, the first row contains 3 NAs before the first value of 0. The second row contains 2 NAs before the first value of 2. The third row contains zero NAs before the first value which is 3 and the same holds for the rows 4-6. For row seven we have only NAs,so in this case I just use the arbitrary value of 1000. For row eight we have two NAs before the first value of 0.
I tried this
DD<-apply(df,1,function(x) which(!is.na(x)))
unlist(map(DD, 1) , use.names=FALSE)-1
but it does not work as I get
3 2 0 0 0 0 2
I miss the row that contains only NAs. Any ideas? Would it be possible to do that using dplyr?
Note that my real dataset contains 500 columns and 200000 rows. So a more general code would help a lot.
You can use max.col
> df$nn <- (max.col(!is.na(df), "first") - 1) - (rowMeans(is.na(df)) == 1)
> df
K A B C D nn
1 NA NA NA 0 NA 3
2 NA NA 2 3 3 2
3 3 NA NA 5 1 0
4 4 4 NA NA 2 0
5 0 0 0 0 0 0
6 2 3 NA 5 10 0
7 NA NA NA NA NA -1
8 NA NA 0 5 3 2
where 7
-th row is set to -1
since the entire row consists of NA
s only.