Tags: r, ggplot2, maps, r-sf, r-maptools

Removing the Great Lakes from US county-level maps in R


I am using R to draw a county-level map of the US. I downloaded the US shapefile from GADM; the county-level file is "gadm36_USA_2.shp". I then used the code below to draw the map:

library(sf)
library(tidyverse)

us2 <- st_read("<Path>\\gadm36_USA_2.shp")

mainland2 <- ggplot(data = us2) +
  geom_sf(aes(fill = NAME_2), size = 0.4, color = "black") +
  coord_sf(crs = st_crs(2163),
           xlim = c(-2500000, 2500000),
           ylim = c(-2300000, 730000)) +
  guides(fill = "none")  # hide the legend; fill = F is deprecated in recent ggplot2

The Great Lakes region (marked by red arrows in the resulting figure) is plotted rather than left blank.

What I want is a figure in which the Great Lakes region is left blank.

How could I identify from the "gadm36_USA_2.shp" which rows correspond to the Great Lakes region so that I may delete them?

I understand there may be other sources of shapefiles than GADM. I believe GADM is an excellent source that provides boundaries worldwide, and I wish to take this opportunity to better acquaint myself with its data.

Of course, other methods of obtaining US county-level boundary data are welcome. I noticed that the USAboundaries package also provides country-, state-, and county-level boundaries, but I am having difficulty installing the associated USAboundariesData package. Any way of drawing US counties other than from the GADM shapefile is welcome. Thanks.


Solution

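  • One way is to look up the features whose name contains "Lake" in the attribute table (currently 13 distinct names) and exclude the ones that are water bodies. Note that the match also catches real counties such as "Salt Lake" and "Bear Lake", so review the list rather than dropping every match. First, find the lake names as below:

    # list every NAME_2 value that contains "lake" (case-insensitive)
    
    all.names <- us2$NAME_2
    
    lakes.name <- unique(grep("lake", all.names, value = TRUE, ignore.case = TRUE))
    #[1] "Lake and Peninsula" "Lake"               "Bear Lake"          "Lake Michigan"      "Lake Hurron"        "Lake St. Clair"    
    #[7] "Lake Superior"      "Lake of the Woods"  "Red Lake"           "Lake Ontario"       "Lake Erie"          "Salt Lake"         
    #[13] "Green Lake" 
    
    # keep only the matches that are lakes rather than counties
    # ("Lake Hurron" is spelled this way in the data; check ENGTYPE_2 if unsure about a name)
    water.names <- c("Lake Michigan", "Lake Hurron", "Lake St. Clair",
                     "Lake Superior", "Lake Ontario", "Lake Erie")
    
    `%notin%` <- Negate(`%in%`)
    us <- us2[us2$NAME_2 %notin% water.names, ]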
    

    Then you can map the remaining features:

    mainland2 <- ggplot(data = us) +
      geom_sf(aes(fill = NAME_2), size = 0.4, color = "black") +
      coord_sf(crs = st_crs(2163), 
               xlim = c(-2500000, 2500000), 
               ylim = c(-2300000, 730000)) + guides(fill = "none")
    mainland2
    


    Another way (much easier but less flexible) is to drop every feature whose ENGTYPE_2 value is "Water body" and map the remaining county features as below:

    # keep only the features that are not classified as water bodies
    us <- us2[us2$ENGTYPE_2 != "Water body", ]
    mainland2 <- ggplot(data = us) +
      geom_sf(aes(fill = NAME_2), size = 0.4, color = "black") +
      coord_sf(crs = st_crs(2163), 
               xlim = c(-2500000, 2500000), 
               ylim = c(-2300000, 730000)) + guides(fill = "none")
    mainland2
    

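Before dropping rows with either filter, it can help to compare which names each one would remove. The following is only a minimal sketch on a toy attribute table: the column names NAME_2 and ENGTYPE_2 come from the GADM file, but the rows, the `toy` data frame, and the helper variable names are made up for illustration and carry no geometry.

```r
# Toy stand-in for the attribute table of us2 (hypothetical rows, no geometry).
toy <- data.frame(
  NAME_2    = c("Lake Michigan", "Salt Lake", "Bear Lake", "Lake Erie", "Dane"),
  ENGTYPE_2 = c("Water body", "County", "County", "Water body", "County"),
  stringsAsFactors = FALSE
)

# Names a pattern-based filter on "lake" would drop
by.name <- toy$NAME_2[grepl("lake", toy$NAME_2, ignore.case = TRUE)]

# Names the type-based filter would drop
by.type <- toy$NAME_2[toy$ENGTYPE_2 == "Water body"]

# Names caught by the pattern that are not classified as water bodies
false.hits <- setdiff(by.name, by.type)
print(false.hits)
```

On the real data the same comparison would presumably use us2$NAME_2 and us2$ENGTYPE_2 in place of the toy columns; any name left in false.hits is a county that a plain "lake" pattern would remove by mistake.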