
How to skip rows that are not completely filled


So, I'm trying to read an Excel file. Some of the rows are empty in some of the columns, but not in all of them. I want to skip every row that is not complete, i.e., that doesn't have a value in every column. For example:

In this case I would like to skip rows 1, 5, 6, 7, 8 and so on.


Solution

  • There is probably a more elegant way of doing it, but one possible solution is to count the number of non-NA elements in each row and keep only the rows where that count equals the number of columns.

    Using this dummy example:

    df <-  data.frame(A = LETTERS[1:6],
                     B = c(sample(1:10,5),NA),
                     C = letters[1:6])
    
      A  B C
    1 A  5 a
    2 B  9 b
    3 C  1 c
    4 D  3 d
    5 E  4 e
    6 F NA f
    

    Using apply, you can count the number of non-NA elements in each row:

    v <- apply(df,1, function(x) length(na.omit(x)))
    
    [1] 3 3 3 3 3 2
    

    Then keep only the rows where the number of elements equals the number of columns (these are the complete rows):

    df1 <- df[v == ncol(df),]
    
      A B C
    1 A 5 a
    2 B 9 b
    3 C 1 c
    4 D 3 d
    5 E 4 e
    

    Does this answer your question?
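
As a possibly more compact alternative to the apply approach, base R's complete.cases() and na.omit() can do the same filtering, and rowSums(!is.na(df)) gives the per-row count without apply. The sketch below is only an illustration: the data frame is a hypothetical copy of the dummy example with fixed values in B (instead of sample()) so that it is reproducible, and it assumes the empty Excel cells were imported as NA (blank strings would need converting to NA first).

```r
# Hypothetical data mirroring the dummy example; B is fixed rather than
# sampled so the result is reproducible.
df <- data.frame(A = LETTERS[1:6],
                 B = c(5, 9, 1, 3, 4, NA),
                 C = letters[1:6])

# complete.cases() returns TRUE for rows that contain no NA.
keep <- complete.cases(df)
df1 <- df[keep, ]

# na.omit() drops the incomplete rows in a single call.
df2 <- na.omit(df)

# Vectorised version of the apply() count: non-NA values per row.
v <- rowSums(!is.na(df))
df3 <- df[v == ncol(df), ]
```

All three variants keep rows 1 to 5 here and drop row 6, the only row with an NA.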