I have a movement dataset and a map of Canada as a shapefile (.shp). An example of the movement dataset looks like this (the coordinates are projected in ESPG:5321):
longitude latitude DateTimeRounded
459663.3 7181890 2007-09-10
459734.2 7181938 2007-09-11
459680.5 7181933 2007-09-12
459640.1 7181893 2007-09-13
459605.2 7181897 2007-09-14
459928.7 7182175 2007-09-15
459855.1 7182104 2007-09-16
I'm not sure how to attach or show the shapefile, but if any one has suggestions, please let me know!
My goal is to create a new column in my dataset called "region", which indicates in which province my each data point is. If a datapoint is outside the borders of any province (i.e., in the sea), I want the associated entry in the region column to be NA.
The final dataset would look something like this:
longitude latitude DateTimeRounded region
459663.3 7181890 2007-09-10 Manitoba
459734.2 7181938 2007-09-11 Manitoba
459680.5 7181933 2007-09-12 Manitoba
459640.1 7181893 2007-09-13 NA
459605.2 7181897 2007-09-14 NA
459928.7 7182175 2007-09-15 NA
459855.1 7182104 2007-09-16 Ontario
Originally, I used the package rgdal to do this, but since it has been retired I'm not sure how to go about it anymore. Would any one know how I can do this using other spatial packages like sp or terra?
This is what the original code looked like for reference:
setwd("CanadaMap/") # Directory for Canada Map
CANmap = readOGR(dsn=".", layer= "canada 5321")#File path to Canada Map
coordinates(df) = ~longitude + latitude #change longitude/latitude to what the lat/long columns are in your data frame
proj4string(df) <- CRS("+init=epsg:5321") #Use the correct reference system for your data
df <- spTransform(df, CRS("+proj=lcc +lat_1=44.5 +lat_2=54.5 +lat_0=0 +lon_0=-84 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")) #denotes what CRS to use for df (same as mapCRS above)
df$region = over(df,CANmap)$NAME #adds column for region (which was province)
df<-as.data.frame(df)
The sf
package may work for your case.
library(sf)
library(dplyr)
# Load example North Carolina data set from sf package
example_shapefile <- st_read(system.file("shape/nc.shp", package="sf"))
# Generate a bounding box
example_bbox <- st_bbox(example)
# This is just to create random fake coordinates within the bounding box
example_coords <- data.frame(
id = 1:40,
long = runif(40,example_bbox[1],example_bbox[3]),
lat = runif(40,example_bbox[2],example_bbox[4])
) |>
st_as_sf(
coords = c("long", "lat")
)
# Set CRS of our fake data to match our example shapefile
# Check example shapefile CRS with st_crs(example_shapefile)
st_crs(example_coords) <- 4267
Once we have our data prepared, then it's only a few more lines of code:
# Get our results
result <- st_join(
x = example_coords,
y = example_shapefile,
join = st_within,
left = TRUE
) |>
select(
id, NAME, geometry
)
head(result)
Simple feature collection with 6 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -83.27518 ymin: 33.9193 xmax: -75.91465 ymax: 36.58093
Geodetic CRS: NAD27
id NAME geometry
1 1 Sampson POINT (-78.17492 35.05357)
2 2 Person POINT (-79.03105 36.47649)
3 3 <NA> POINT (-76.11417 34.39561)
4 4 <NA> POINT (-83.27518 36.58093)
5 5 <NA> POINT (-75.91465 34.83266)
6 6 <NA> POINT (-77.34693 33.9193)
In this case, NAME corresponds to county in the North Carolina shapefile, but in your dataset would correspond to Region. Our spatial join specifies left = TRUE
to signify a left join to keep all of our lat/lon data and fill in NAs if there's no matching region that the coordinates fall within from our shapefile. Note that the sf
package uses data.frames with a geometry list column to store spatial data. Depending on what else you need to do afterwards, this may be fine or you may need to transform the geometry
list column back into separate lat/lon columns.