Tags: r, loops, foreach, parallel-processing, gis

How to resolve "external pointer is not valid" with foreach() in R


I am working on code that calculates the park area within a given distance of each population pixel. Both datasets are 30m rasters, from which I'm extracting the class and xy coordinates (in meters) to create dataframes. Because the datasets are massive, I'm trying to do the calculations county-by-county.

The TIFFs for my population data each cover roughly one county, so I use them to clip the park raster in the same call that reads them in. The code works, but running it for the entire country takes around a full work day, so I have adapted it to use foreach(), which makes more efficient use of my machine than the original code.

This code works when I run each step manually, but throws the following error when I try to run it in parallel:

Error in { : task 1 failed - "i In argument: 'Count = dist_Euclidean(...)'. Caused by error: ! error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': external pointer is not valid"

To emphasize, this code gives me exactly the output I want if I manually set "code" and "i" and step through it without running the foreach(). The issue is in the parallelization.

Note: I know it's best practice to give the data you use, but these datasets are very large. "counties" and "states" are USA Census Bureau shapefiles, and "parks" is my raster of park areas across the country (30m pixels). "rasterlist" pulls the file name of every population TIFF I have. I don't think I can provide this data for testing, so hopefully this can be a theoretical exercise?

pkgList <- c("parallel", "parallelly","doParallel","foreach","terra","sf","dplyr","tidyr","stringi","stringr")
for(package.i in pkgList) {
  suppressPackageStartupMessages(library(package.i, character.only = TRUE))} ; rm(package.i)

rasterlist <- as.list(gsub("._","",list.files(path = "tifs", pattern = '.tif$', all.files = F, full.names = T)))
states <- st_read("StateLines/cb_2018_us_state_500k.shp")
counties <- st_read("CountyLines/cb_2018_us_county_500k.shp") %>% 
  mutate(State = stri_replace_all_fixed(STATEFP, states$STATEFP, states$NAME, vectorize_all = FALSE),
         code_st = paste(STATEFP,"-", sep = ""),
         State = gsub(" ", "", State), StateCode = paste(State, "-", sep = ""),
         NAME = case_when(NAME == "Miami-Dade" ~ "MiamiDade", TRUE ~ NAME)) %>% #stupid Miami-Dade
  filter(!State %in% c("Alaska","Hawaii","CommonwealthoftheNorthernMarianaIslands","UnitedStatesVirginIslands","AmericanSamoa","Guam","PuertoRico")) #I only have pop data for the continental US at the moment
parks <- rast("ParksProject.tif")

dist_Euclidean <- function(x, y, dfrm, accDist) {
  count <- 0
  dfrm1 <- data.frame(x=x,y=y)
  for (i in 1:nrow(dfrm1)){
    hold <- dfrm1[i,]
    count <- append(count, length(which(sqrt((dfrm$x - hold$x)^2 + (dfrm$y - hold$y)^2)<accDist)))}
  return(count[-1])}

accessDist <- 500 ###Distance used to define "access", in meters

cl <- makeCluster(availableCores(omit = 1, constraints = "connections", logical = FALSE), type = "PSOCK")
registerDoParallel(cl)

  for (code in unique(counties$code_st)) {
    df.state <- as.character(rasterlist[str_detect(rasterlist, code)])
    counties_temp <- st_transform(counties %>% filter(code_st == code), st_crs(parks))
    parks_state <- crop(parks, extend(ext(counties_temp), accessDist/30+5))
write.csv(  
    foreach(i = 1:length(df.state), .packages = pkgList, .combine = "rbind") %dopar% {
      temp <- as.data.frame(rast(df.state[i]), xy = TRUE) %>% 
        mutate(File = paste(df.state[i]), 
               State = stri_replace_all_fixed(File, counties_temp$code_st, counties_temp$StateCode, vectorize_all = FALSE),
               State = stri_replace_all_fixed(State, counties_temp$COUNTYFP, counties_temp$NAME, vectorize_all = FALSE)) %>%
        separate_wider_delim(State, names = c("Trash1","Trash2","State","County"), delim = c("-")) %>% 
        mutate(County = str_remove(County, ".tif")) %>% select(-c("Trash1","Trash2")) %>% 
        rename("Pop" = starts_with("neon-"))%>% mutate(Count = dist_Euclidean(x, y, as.data.frame(crop(parks_state, extend(ext(rast(df.state[i])),accessDist/30+5)), xy = TRUE), accDist = accessDist))
    }, (paste(unique(counties_temp$State), "Parks.csv", sep = "")))
rm(df.state, counties_temp, parks_state)
gc()
  }
stopCluster(cl)

An important point: the parallel foreach() loop worked before I added the dist_Euclidean function to it. If I drop %>% mutate(Count = dist_Euclidean... accessDist)), it works and simply creates a dataframe of the population data. The issue seems to be in creating the park dataframe, but only when I run this as a parallel process. I've tried passing the parks_state raster to the workers explicitly with the .export argument of foreach(), but that doesn't fix the issue.

What's going on here?

EDIT: Adding traceback output

> traceback()
9: stop(simpleError(msg, call = expr))
8: e$fun(obj, substitute(ex), parent.frame(), e$data)
7: foreach(i = 1:length(df.state), .packages = pkgList, .combine = "rbind") %dopar% 
       {
           temp <- as.data.frame(rast(df.state[i]), xy = TRUE) %>% 
               mutate(File = paste(df.state[i]), State = stri_replace_all_fixed(File, 
                   counties_temp$code_st, counties_temp$StateCode, 
                   vectorize_all = FALSE), State = stri_replace_all_fixed(State, 
                   counties_temp$COUNTYFP, counties_temp$NAME, vectorize_all = FALSE)) %>% 
               separate_wider_delim(State, names = c("Trash1", "Trash2", 
                   "State", "County"), delim = c("-")) %>% mutate(County = str_remove(County, 
               ".tif")) %>% select(-c("Trash1", "Trash2")) %>% rename(Pop = starts_with("neon-")) %>% 
               mutate(Count = dist_Euclidean(x, y, as.data.frame(crop(parks_state, 
                   extend(ext(rast(df.state[i])), accessDist/30 + 
                     5)), xy = TRUE), accDist = accessDist))
       }
6: is.data.frame(x)
5: utils::write.table(foreach(i = 1:length(df.state), .packages = pkgList, 
       .combine = "rbind") %dopar% {
       temp <- as.data.frame(rast(df.state[i]), xy = TRUE) %>% mutate(File = paste(df.state[i]), 
           State = stri_replace_all_fixed(File, counties_temp$code_st, 
               counties_temp$StateCode, vectorize_all = FALSE), 
           State = stri_replace_all_fixed(State, counties_temp$COUNTYFP, 
               counties_temp$NAME, vectorize_all = FALSE)) %>% separate_wider_delim(State, 
           names = c("Trash1", "Trash2", "State", "County"), delim = c("-")) %>% 
           mutate(County = str_remove(County, ".tif")) %>% select(-c("Trash1", 
           "Trash2")) %>% rename(Pop = starts_with("neon-")) %>% 
           mutate(Count = dist_Euclidean(x, y, as.data.frame(crop(parks_state, 
               extend(ext(rast(df.state[i])), accessDist/30 + 5)), 
               xy = TRUE), accDist = accessDist))
   }, (paste(unique(counties_temp$State), "Parks.csv", sep = "")), 
       col.names = NA, sep = ",", dec = ".", qmethod = "double")
4: eval(expr, p)
3: eval(expr, p)
2: eval.parent(Call)
1: write.csv(foreach(i = 1:length(df.state), .packages = pkgList, 
       .combine = "rbind") %dopar% {
       temp <- as.data.frame(rast(df.state[i]), xy = TRUE) %>% mutate(File = paste(df.state[i]), 
           State = stri_replace_all_fixed(File, counties_temp$code_st, 
               counties_temp$StateCode, vectorize_all = FALSE), 
           State = stri_replace_all_fixed(State, counties_temp$COUNTYFP, 
               counties_temp$NAME, vectorize_all = FALSE)) %>% separate_wider_delim(State, 
           names = c("Trash1", "Trash2", "State", "County"), delim = c("-")) %>% 
           mutate(County = str_remove(County, ".tif")) %>% select(-c("Trash1", 
           "Trash2")) %>% rename(Pop = starts_with("neon-")) %>% 
           mutate(Count = dist_Euclidean(x, y, as.data.frame(crop(parks_state, 
               extend(ext(rast(df.state[i])), accessDist/30 + 5)), 
               xy = TRUE), accDist = accessDist))
   }, (paste(unique(counties_temp$State), "Parks.csv", sep = "")))

Solution

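  • The issue is that terra SpatRasters are non-exportable (https://future.futureverse.org/articles/future-4-non-exportable-objects.html). A SpatRaster holds an external pointer to a C++ object, and that pointer is no longer valid once the object has been serialized and sent to a PSOCK worker, which is why parks_state fails inside %dopar% but works when run manually. Thanks to HenrikB for linking the help article above, which explains that and other non-exportable classes/types. I'll just have to find a workaround!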
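
One possible workaround, offered as a minimal sketch rather than a tested fix for the code in the question: terra provides wrap() to pack a SpatRaster into an ordinary R object that can be serialized, and unwrap() to rebuild it on the worker. The tiny raster, the two-worker cluster, and the names parks_packed, parks_local and cell_counts below are made up for illustration, and a reasonably recent terra version (one that has unwrap()) is assumed.

```r
library(terra)
library(foreach)
library(doParallel)

# Hypothetical stand-in for the cropped park raster (10 x 10 cells of 30 m).
parks_state <- rast(nrows = 10, ncols = 10,
                    xmin = 0, xmax = 300, ymin = 0, ymax = 300)
values(parks_state) <- 1:100

# Pack the raster into a PackedSpatRaster, which can be sent to workers.
parks_packed <- wrap(parks_state)

cl <- makeCluster(2, type = "PSOCK")
registerDoParallel(cl)

cell_counts <- foreach(i = 1:2, .packages = "terra", .combine = "c") %dopar% {
  # Rebuild a SpatRaster with a valid pointer inside the worker process.
  parks_local <- unwrap(parks_packed)
  # Any terra call now works on the worker, e.g. converting to a data frame.
  nrow(as.data.frame(parks_local, xy = TRUE))
}

stopCluster(cl)
```

In the question's loop this would presumably mean calling wrap() on parks_state before the foreach() and unwrap() as the first step inside it. Another option in the same spirit is to pass only a file path to the workers and call rast() on it inside the loop, which is what the code already does for the population TIFFs in df.state.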