I have downloaded the Census shapefile at the zip code level, cb_2017_us_zcta510_500k.shp
(https://www.census.gov/geo/maps-data/data/cbf/cbf_zcta.html)
I also have downloaded the mapping file that allows me to add the corresponding STATE
variable (https://www.census.gov/geo/maps-data/data/zcta_rel_download.html)
I merged the two and I get:
library(sf)
library(dplyr)
big_df
Simple feature collection with 44434 features and 2 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -176.6847 ymin: -14.37374 xmax: 145.8304 ymax: 71.34122
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
First 10 features:
ZCTA5CE10 STATE geometry
1 35442 1 MULTIPOLYGON (((-88.25262 3...
2 35442 1 MULTIPOLYGON (((-88.25262 3...
3 35442 1 MULTIPOLYGON (((-88.25262 3...
Now, I tried to filter all the small islands and Alaska:
remove_list <- c("02", "15", "72", "66", "78", "60", "69",
"64", "68", "70", "74", "81", "84", "86", "87", "89", "71", "76",
"95", "79")
big_df %>% filter(!STATE %in% map(remove_list, as.integer)) %>%
tm_shape(.) + tm_polygons('pt_count',palette = "Reds",
style = "quantile", n = 10,
title = "counts")
but I still get some tiny islands.
What am I missing here? Thanks!
Here is a way of obtaining the geometry (country outline) for mainland US:
library(raster)
library(sf)
library(dplyr)
us = getData('GADM', country='USA', level=0) %>%
st_as_sf() %>%
st_cast("POLYGON") %>%
mutate(area = st_area(.)) %>%
arrange(desc(area)) %>%
slice(1) # mainland US should be the largest
You can then use this to run st_intersection(big_df, us)
to extract only the parts of big_df
within us
. Note that it may pay off to first create a st_buffer
or st_convex_hull
around us
to ensure your big_df
doesn't get clipped somewhere along its borders.