I have 2 dataframes in R that both have this type of structure:
Lon | Lat | Measurement |
---|---|---|
5 | 7 | 15 |
5 | 8 | 20 |
The numbers in the actual dataframes are different from this example but I don't think that should impact the question or response.
I want to find the average of the measurement column in a 10 meter by 10 meter area, and then take the ratio of the average in each area between the 2 dataframes. I have already been able to take the average in a 10m x 10m area for a single dataframe, but I've had trouble adding in the information from the second dataframe. Here is the code I have so far:
#some of the packages are here for plotting later
library(sf)
library(dplyr)
library(leaflet)
library(RColorBrewer)
library(osmdata)
# Example dataframes `df` and `df2`
# df <- data.frame(
# lat = c(7, 8, 9),
# lon = c(3, 3, 3),
# measurement = c(10, 20, 30) # Example measurement values
# )
# df2 <- data.frame(
# lat = c(7.00001, 8.00001, 9.00001),
# lon = c(3, 3, 3),
# measurement = c(5, 15, 25) # Example measurement values
# )
# Convert dataframes to sf objects with coordinates and CRS
df_sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
df2_sf <- st_as_sf(df2, coords = c("lon", "lat"), crs = 4326)
# Transform to a metric CRS
df_sf <- st_transform(df_sf, crs = 32630)
df2_sf <- st_transform(df2_sf, crs = 32630)
# Define bounding box
bbox <- st_bbox(df_sf)
# Create a common grid of 10m x 10m cells
grid <- st_make_grid(df_sf, cellsize = c(10, 10), square = TRUE)
grid_sf <- st_sf(geometry = grid)
# Spatial join: assign points to grid cells
df_joined <- st_join(df_sf, grid_sf, join = st_intersects)
df2_joined <- st_join(df2_sf, grid_sf, join = st_intersects)
# Aggregate data within each grid cell for both dataframes
aggregated_data_df <- df_joined %>%
group_by(geometry) %>%
summarize(measurement_avg_df = mean(measurement, na.rm = TRUE), .groups = 'drop') %>%
st_as_sf()
aggregated_data_df2 <- df2_joined %>%
group_by(geometry) %>%
summarize(measurement_avg_df2 = mean(measurement, na.rm = TRUE), .groups = 'drop') %>%
st_as_sf()
# Perform a join to merge the two aggregated datasets based on the grid cell geometry
merged_data <- st_join(aggregated_data_df, aggregated_data_df2, by = "geometry")
# Compute the ratio
merged_data <- merged_data %>%
mutate(ratio = measurement_avg_df / measurement_avg_df2)
# Compute center points of the merged data
merged_data <- st_centroid(merged_data)
# Convert the merged data back to the original CRS
merged_data <- st_transform(merged_data, crs = 4326)
# Extract longitude and latitude from the geometry column
merged_data <- merged_data %>%
mutate(lon = st_coordinates(geometry)[, 1],
lat = st_coordinates(geometry)[, 2]) %>%
select(-geometry)
When I run this code, here is what the merged_data dataframe looks like after this code is run:
measurement_avg_df | measurement_avg_df2 | ratio | lon | lat | geometry |
---|---|---|---|---|---|
5.5 | NA | NA | 3 | 7 | Point (3 7) |
15.5 | NA | NA | 3 | 8 | Point (3 8) |
What should I do to prevent the NA values from appearing in the merged table?
You could use a different geometry predicate function when calling sf_join()
to create merged_data
? For example if you do:
# Perform a join to merge the two aggregated datasets based on the grid cell geometry
merged_data <- st_join(aggregated_data_df, aggregated_data_df2, join=st_nearest_feature)
instead of the default st_intersects
, you get the following:
Simple feature collection with 3 features and 5 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3 ymin: 7 xmax: 3 ymax: 9
Geodetic CRS: WGS 84
# A tibble: 3 × 6
measurement_avg_df measurement_avg_df2 ratio lon lat geometry
<dbl> <dbl> <dbl> <dbl> <dbl> <POINT [°]>
1 10 5 2 3 7 (3 7)
2 20 15 1.33 3 8 (3 8)
3 30 25 1.2 3.00 9 (3 9)
I don't know if moving to st_nearest_feature
works with your real-world question or not.