rtidyverse

Drop rows with empty string in any of the specified columns


I have a very very large data frame. It has 10 columns and millions of rows.

I have two columns that have a lot of empty fields. both columns are in chr format. I need to remove the rows with empty fields in either column.

I will try to provide an example:

library(tidyverse)

tibble(
  start_loc = c("A St", "B St", ""),
  end_loc = c("D St", "", "F St"),
  id = c("m", "c", "m")
) %>% 
  {. ->> my_data}

my_data

# A tibble: 3 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 "A St"    "D St"  m    
2 "B St"    ""      c    
3 ""        "F St"  m 

So, I need to remove all rows that have blank fields in start_loc or end_loc.


Solution

  • We may use nzchar with if_all (assuming empty is "" and not NA)

    library(dplyr)
    my_data %>% 
       filter(if_all(c(start_loc, end_loc), nzchar))
    

    -output

    # A tibble: 1 x 3
      start_loc end_loc id   
      <chr>     <chr>   <chr>
    1 A St      D St    m    
    

    Or use if_any and negate (!)

    my_data %>%
        filter(!if_any(c(start_loc, end_loc), `==`, ""))
    # A tibble: 1 x 3
      start_loc end_loc id   
      <chr>     <chr>   <chr>
    1 A St      D St    m    
    

    If there are both "" and NA, we can use

    my_data %>%
        filter(!if_any(c(start_loc, end_loc), ~ is.na(.)|. == ""))
    # A tibble: 1 x 3
      start_loc end_loc id   
      <chr>     <chr>   <chr>
    1 A St      D St    m