[SOLVED] Drop rows with empty string in any of the specified columns

Drop rows with empty string in any of the specified columns

I have a very very large data frame. It has 10 columns and millions of rows.

I have two columns that have a lot of empty fields. both columns are in chr format. I need to remove the rows with empty fields in either column.

I will try to provide an example:

library(tidyverse)

tibble(
  start_loc = c("A St", "B St", ""),
  end_loc = c("D St", "", "F St"),
  id = c("m", "c", "m")
) %>% 
  {. ->> my_data}

my_data

# A tibble: 3 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 "A St"    "D St"  m    
2 "B St"    ""      c    
3 ""        "F St"  m

So, I need to remove all rows that have blank fields in start_loc or end_loc.

Solution

We may use nzchar with if_all (assuming empty is "" and not NA)

library(dplyr)
my_data %>% 
   filter(if_all(c(start_loc, end_loc), nzchar))

-output

# A tibble: 1 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 A St      D St    m

Or use if_any and negate (!)

my_data %>%
    filter(!if_any(c(start_loc, end_loc), `==`, ""))
# A tibble: 1 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 A St      D St    m

If there are both "" and NA, we can use

my_data %>%
    filter(!if_any(c(start_loc, end_loc), ~ is.na(.)|. == ""))
# A tibble: 1 x 3
  start_loc end_loc id   
  <chr>     <chr>   <chr>
1 A St      D St    m