I am trying to make a point file using longitude and latitudes, and then use st_join with st_within to match them to census tracts. But the points keep ending up in Kansas. If you are using the tidycensus library w/ the API then there is reproducible code:
Code for dummy data points mostly in Colorado, and tract boundaries in Colorado and Kansas:
library(tidycensus)
library(sf)
library(dplyr)
library(tidyverse)
# Set seed for reproducibility
set.seed(42)
# Generate dummy data for points in New York
points <- data.frame(
longitude = runif(300, min = -109, max = -102), # Approximate longitude boundaries of Colorado
latitude = runif(300, min = 36.993076, max = 41) # Approximate latitude boundaries of Colorado
)
# Print the first few rows of the dummy data
points <- st_as_sf(points, coords = c("longitude", "latitude"), crs = "ESRI:102003")
tract2010 <- get_decennial(geography = "tract", variables = "P001001", year = 2010,
state = as.list(c("Colorado", "Kansas")), geometry = TRUE)
tract2010$state_code <- substr(tract2010$GEOID, 1, 2)
table(tract2010$state_code)
# make same CRS
tract2010 <- st_transform(tract2010, st_crs(points))`
Mapped it in leaflet to make sure the points were in the right place:
# test where it is
library(leaflet)
leaflet() %>%
addTiles() %>%
addMarkers(data = points)
Ran the join and checked the matches. From the table, all points are in state code 20 (Kansas)
#spatial join
points <- st_join(points, tract2010, join = st_within)
table(points$state_code, useNA = "always")
You are defining the coordinate system of your points data as ESRI:102003 but the longitude and latitude of your points data are in either WGS84 or NAD83. I have reproduced your entire code for the sake of clarity and annotated the extra step you needed. The following assumes your original points data are NAD83 (EPSG:4269), add the correct EPSG code if this is not correct:
library(tidycensus)
library(sf)
library(dplyr)
library(tidyverse)
library(ggplot2)
# Set seed for reproducibility
set.seed(42)
# Generate dummy data for points
points <- data.frame(
longitude = runif(300, min = -109, max = -102), # Approximate longitude boundaries of Colorado
latitude = runif(300, min = 36.993076, max = 41) # Approximate latitude boundaries of Colorado
)
# NAD83 points to ESRI:102003
points <- st_as_sf(points, coords = c("longitude", "latitude")) %>%
st_set_crs(4269) %>% # This is the bit you missed
st_transform("ESRI:102003")
# Get census tracts
tract2010 <- get_decennial(geography = "tract", variables = "P001001", year = 2010,
state = as.list(c("Colorado", "Kansas")), geometry = TRUE)
# Create new state_code variable
tract2010$state_code <- substr(tract2010$GEOID, 1, 2)
# Transform
tract2010 <- st_transform(tract2010, st_crs(points))
# Spatial join
points <- st_join(points, tract2010_1, join = st_within)
ggplot() +
geom_sf(data = tract2010) +
geom_sf(data = points,
aes(colour = state_code))