rspatialtidycensus

Long/Lat points keep ending up in Kansas - sf, tidycensus


I am trying to make a point file using longitude and latitudes, and then use st_join with st_within to match them to census tracts. But the points keep ending up in Kansas. If you are using the tidycensus library w/ the API then there is reproducible code:

Code for dummy data points mostly in Colorado, and tract boundaries in Colorado and Kansas:

library(tidycensus)
library(sf)
library(dplyr)
library(tidyverse)

# Set seed for reproducibility
set.seed(42)

# Generate dummy data for points in New York
points <- data.frame(
  longitude = runif(300, min = -109, max = -102),  # Approximate longitude boundaries of Colorado
  latitude = runif(300, min = 36.993076, max = 41)  # Approximate latitude boundaries of Colorado
)

# Print the first few rows of the dummy data
points <- st_as_sf(points, coords = c("longitude", "latitude"), crs = "ESRI:102003")

tract2010 <- get_decennial(geography = "tract", variables = "P001001", year = 2010,
                          state = as.list(c("Colorado", "Kansas")), geometry = TRUE)

tract2010$state_code <- substr(tract2010$GEOID, 1, 2)
table(tract2010$state_code)

# make same CRS
tract2010 <- st_transform(tract2010, st_crs(points))`

Mapped it in leaflet to make sure the points were in the right place:

# test where it is
library(leaflet)
leaflet() %>%
  addTiles() %>%
  addMarkers(data = points)

enter image description here

Ran the join and checked the matches. From the table, all points are in state code 20 (Kansas)

#spatial join
points <- st_join(points, tract2010, join = st_within)
table(points$state_code, useNA = "always")

Solution

  • You are defining the coordinate system of your points data as ESRI:102003 but the longitude and latitude of your points data are in either WGS84 or NAD83. I have reproduced your entire code for the sake of clarity and annotated the extra step you needed. The following assumes your original points data are NAD83 (EPSG:4269), add the correct EPSG code if this is not correct:

    library(tidycensus)
    library(sf)
    library(dplyr)
    library(tidyverse)
    library(ggplot2)
    
    # Set seed for reproducibility
    set.seed(42)
    
    # Generate dummy data for points 
    points <- data.frame(
      longitude = runif(300, min = -109, max = -102),  # Approximate longitude boundaries of Colorado
      latitude = runif(300, min = 36.993076, max = 41)  # Approximate latitude boundaries of Colorado
    )
    
    # NAD83 points to ESRI:102003
    points <- st_as_sf(points, coords = c("longitude", "latitude")) %>%
      st_set_crs(4269) %>% # This is the bit you missed
      st_transform("ESRI:102003")
    
    # Get census tracts
    tract2010 <- get_decennial(geography = "tract", variables = "P001001", year = 2010,
                               state = as.list(c("Colorado", "Kansas")), geometry = TRUE)
    
    # Create new state_code variable
    tract2010$state_code <- substr(tract2010$GEOID, 1, 2)
    
    # Transform
    tract2010 <- st_transform(tract2010, st_crs(points))
    
    # Spatial join
    points <- st_join(points, tract2010_1, join = st_within)
    
    ggplot() +
      geom_sf(data = tract2010) +
      geom_sf(data = points,
              aes(colour = state_code))
    

    result