I am working in R
with a regular dataframe (df
) and a shapefile (map2
), the share a common column called CD116FP
. df
has 103552 lines while map2
has 444 .I am loading the shapefile in the following way:
map2 <- read_sf("D:/Data/tl_2019_us_cd116.shp")
My end-goal is to use the function mapview()
to view the map included in map2
with the "intensity" that is described in df
under the column np_scores
. I hence do not want observations of df
that do not appear on map2
.
Here are my thoughts and failures:
If these two objects were regular dataframes, a reasonable candidate would be to use merge()
to combine both objects, however if you apply that function in this case, the resulting object looses the spatial properties and mapview
does not know how to read it.
Another approach that I used was trying this line of code:
map2m<-data.frame(map2, df[match(map2$CD116FP, df$CD116FP),])
But the result has dimensions that are too big (much bigger that 444 lines) and hence mapview
crashes when trying to plot the desired map.
np
to map2
:map2$np=10
for (i in c(1:nrow(map2)))
{
for (j in c(1:nrow(df)))
{
if (identical(map2$CD116FP[i],df$CD116FP[j]))
{map2$np[i]=df$np_score[j]}
else {map2$np[i]=0}
}
}
However, this approach just takes way too much time given the dimensions of my dataframe.
Do you have any suggestions?
I'm a bit puzzled by the structure of your data. Your df
has over 100,000 rows, so I'm guessing that the same CD116FP
occurs multiple times in df
, and the npscore
will presumably vary across these instances. If you want to merge these to map2
you will need to aggregate them first.
Let's try to recreate a similar setup:
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
map2 <- read_sf("C:/users/administrator/documents/shape/tl_2019_us_cd116.shp")
set.seed(69)
df <- data.frame(CD116FP = sprintf("%02d", sample(0:99, 103552, TRUE)),
npscores = runif(103552))
head(df)
#> CD116FP npscores
#> 1 95 0.6927742
#> 2 80 0.8543845
#> 3 90 0.5220353
#> 4 01 0.1449647
#> 5 76 0.9876543
#> 6 38 0.5629950
I have made df
have the same number of rows that your data has to show this solution will scale to your problem.
Let's aggregate the npscores
with dplyr
:
library(dplyr)
df_sum <- df %>%
filter(CD116FP %in% map2$CD116FP) %>%
group_by(CD116FP) %>%
summarise(npscores = mean(npscores))
map2$npscores <- df_sum$npscores[match(map2$CD116FP, df_sum$CD116FP)]
Now map2
has the aggregated npscores
we can plot - for example, in ggplot:
library(ggplot2)
ggplot(map2) +
geom_sf(aes(fill = npscores)) +
coord_sf(xlim = c(-180, -60),
ylim = c(15, 70)) +
scale_fill_gradient(low = "red", high = "gold")
Or in mapview:
library(mapview)
mapView(map2, zcol = "npscores")
Created on 2020-09-19 by the reprex package (v0.3.0)