rgisr-maptoolstidycensus

dotsInPolys length mismatch using data downloaded via tidycensus


Can you help figure out the best way to resolve the length mismatch error thrown by dotsInPolys? I think it is because there are NA's or NULLs or some funk in the polygon data that makes it too long. Here's code that reproduces the error. Ultimately, I want to plot multiple races using Leaflet, but I can't produce the lat/lon needed for the random dots at this point.

require(maptools)
require(tidycensus)

person.number.divider <- 1000

census_api_key("ENTER KEY HERE", install = TRUE)

racevars <- c(White = "B02001_002", #"P005003" 
              Black = "B02001_003", #Black or African American alone
              Latinx = "B03001_003"
)

nj.county <- get_acs(geography = "county", #tract
              year = 2015,
              variables = racevars,
              state = "NJ", #county = "Harris County",
              geometry = TRUE,
              summary_var = "B02001_001")

library(sf)
st_write(nj.county, "nj.county.shp", delete_layer = TRUE)

nj <- rgdal::readOGR(dsn = "nj.county.shp") %>%
  spTransform(CRS("+proj=longlat +datum=WGS84"))

nj@data <- nj@data %>% 
  tidyr::separate(NAME,
                  sep =",",
                  into = c("county", "state"))  %>%
  dplyr::select(estimat,variabl, GEOID, county) %>%
  spread(key = variabl, value = estimat) %>%
  mutate(county = trimws(county))


black.dots <- dplyr::select(nj@data, Black) / person.number.divider #%>%
black.dots <-   dotsInPolys(nj, as.integer(black.dots$Black), f="random")

# Error in dotsInPolys(nj, as.integer(black.dots$Black), f = "random") : 
# different lengths

length(nj) # 63 This seems too many, because I believe NJ has 21 counties.
length(black.dots$Black) # 21

This post (Advice on troubleshooting dotsInPolys error (maptools)) came close to helping me, but I couldn't see how to apply it to my case.

I can change the length of the nj spatialpolygonsdataframe by removing NA's and counties with a black pop greater than 0, but then the map doesn't plot multiple counties (maybe there is something wrong with the census download?).


Solution

  • It looks like you might have gotten this figured out, but I wanted to share another approach that uses sf::st_sample() instead of maptools::dotsInPolys(). One advantage of this is that you don't need to convert the sf object you get from tidycensus to a sp object.

    In the following example I split the census data by race into a list three sf objects then perform st_sample() on each element of the list (each race). Next, I recombine the sampled points into one sf object with a new race variable for each point. Finally, I use tmap to make a map, though you could use ggplot2 or leaflet to map as well.

    library(tidyverse)
    library(tidycensus)
    library(sf)
    library(tmap)
    
    person.number.divider <- 1000
    
    racevars <- c(White = "B02001_002", #"P005003" 
                  Black = "B02001_003", #Black or African American alone
                  Latinx = "B03001_003"
                  )
    
    # get acs data with geography in "tidy" form
    nj.county <- get_acs(geography = "county", #tract
                         year = 2015,
                         variables = racevars,
                         state = "NJ", #county = "Harris County",
                         geometry = TRUE,
                         summary_var = "B02001_001"
                         )
    
    # split by race
    county.split <- nj.county %>% 
      split(.$variable)
    
    # randomly sample points in polygons based on population
    points.list <- map(county.split, ~ st_sample(., .$estimate / person.number.divider))
    
    # combine points into sf collections and add race variable 
    points <- imap(points.list, ~ st_sf(tibble(race = rep(.y, length(.x))), geometry = .x)) %>% 
      reduce(rbind)
    
    # map!
    tm_shape(nj.county) +
      tm_borders(col = "darkgray", lwd = 0.5) +
      tm_shape(points) +
      tm_dots(col = "race", size = 0.01, pal = "Set2")
    

    I don't have enough rep to post the map image directly, but here it is.