rr-sfr-mapedit

Read in a list of shapefiles and row bind them in R (preferably using tidy syntax and sf)


I have a directory of shapefiles for 50 cities (and will accumulate more). They fall into three groups: cities' political boundaries (CityA_CD.shp, CityB_CD.shp, etc.), neighborhoods (CityA_Neighborhoods.shp, CityB_Neighborhoods.shp, etc.), and Census blocks (CityA_blocks.shp, CityB_blocks.shp, etc.). Within each group the files follow a common naming convention, share the same set of attribute variables, and use the same CRS (I reprojected all of them in QGIS). For each group (political boundaries, neighborhoods, blocks), I need to build a list of the files, read them as sf objects, and then bind the rows to create one large sf object per group. However, I am running into consistent problems developing this workflow in R.

library(tidyverse)
library(sf)
library(mapedit)

# This first line succeeds in creating a character vector of the file paths that match the regex pattern.
filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)

# This second line reads each file with st_read and stores the resulting sf objects in a list.
shapefile_list <- lapply(filenames, st_read)

# This third line (adopted from https://github.com/r-spatial/sf/issues/798) fails as follows.
districts <- mapedit:::combine_list_of_sf(shapefile_list)
Error: Column `District_I` can't be converted from character to numeric

# This fourth line fails in an apparently different way (also adopted from https://github.com/r-spatial/sf/issues/798).
districts <- do.call(what = sf:::rbind.sf, args = shapefile_list)
Error in CPL_get_z_range(obj, 2) : z error - expecting three columns;

The first error appears to indicate that one of my shapefiles stores the common variable District_I with a different class than the others, but R gives no indication of which file is causing the error.

The second error seems to be looking for a z coordinate but is only finding x and y in the geometry attribute.
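One plausible reading of the second error is that some layers carry XYZ geometries while others are plain XY. The sketch below shows how that could be checked and flattened with sf's st_zm() before binding. The two toy point layers are hypothetical stand-ins for the real shapefiles, and whether mixed dimensions are really the cause here is an assumption.

```r
library(sf)

# Two toy point layers standing in for the real shapefiles (hypothetical data):
# one has XYZ coordinates, the other only XY.
with_z    <- st_sf(id = "a", geometry = st_sfc(st_point(c(0, 0, 5)), crs = 4326))
without_z <- st_sf(id = "b", geometry = st_sfc(st_point(c(1, 1)), crs = 4326))
shapefile_list <- list(with_z, without_z)

# The first class of an sfg geometry is its coordinate dimension ("XY", "XYZ", ...).
# This inspects only the first feature of each layer.
coord_dims <- sapply(shapefile_list, function(x) class(st_geometry(x)[[1]])[1])
print(coord_dims)

# Drop Z (and M) from every layer so all geometries are plain XY before binding.
flattened <- lapply(shapefile_list, st_zm, drop = TRUE)
districts <- do.call(rbind, flattened)
```

If every layer reports the same dimension, the cause probably lies elsewhere and this step would not help.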

I have four questions on this front:

  1. How can I have R identify which list item is causing the error that halts the read-and-bind process?
  2. How can I force R to ignore the incompatibility and coerce the variable to class character, so that I can deal with the inconsistency (if that's what it is) in R?
  3. How can I drop a variable that is causing an error from all of the sf objects as they are read (i.e. omit District_I for all read_sf calls in the process)?
  4. More generally, what is going on with the second error, and how can I solve it?
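For question 3, one possible approach is to drop the column from every list element before binding. This is a minimal sketch, assuming plain tibbles as hypothetical stand-ins for the objects returned by st_read; on real sf objects, select() should keep the geometry column attached.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for the list returned by lapply(filenames, st_read) (hypothetical data).
# District_I is numeric in the first element and character in the second.
shapefile_list <- list(
  tibble(District_I = c(1, 2),      Name = c("North", "South")),
  tibble(District_I = c("3A", "4"), Name = c("East", "West"))
)

# Drop the conflicting column from every element, then bind the rows.
dropped  <- map(shapefile_list, ~ select(.x, -District_I))
combined <- bind_rows(dropped)
print(combined)
```

With the conflicting column gone, bind_rows() no longer has two classes to reconcile.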

Thanks all as always for your help.

P.S.: I know this post isn't "reproducible" in the desired way, but I'm not sure how to make it so besides copying the contents of all my shapefiles. If I'm mistaken on this point, I'd gladly accept any wisdom on this front.

UPDATE: I've run

filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
shapefile_list <- lapply(filenames, st_read)
districts <- mapedit:::combine_list_of_sf(shapefile_list)

successfully on a subset of three of the shapefiles. So I've confirmed that the column District_I has a conflicting class in at least one of the files, which is what holds up the code on the full batch. But again, I need either the error to identify the file causing the issue so I can fix it in that file, or the code to coerce District_I to character in all files (which is the class I want that variable to be in anyway).
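One way to find the offending file is to name each list element after its source file and then inspect the class of District_I in every element. The sketch below uses hypothetical file paths and toy data frames in place of the real reads; with the real data, shapefile_list would come from lapply(filenames, st_read).

```r
# Hypothetical file paths and toy data frames standing in for the real reads.
filenames <- c("Directory/CityA_CDs.shp",
               "Directory/CityB_CDs.shp",
               "Directory/CityC_CDs.shp")
shapefile_list <- list(
  data.frame(District_I = c("1", "2"), stringsAsFactors = FALSE),
  data.frame(District_I = c(3, 4)),
  data.frame(District_I = c("5", "6"), stringsAsFactors = FALSE)
)

# Name each list element after its source file so results can be traced back.
names(shapefile_list) <- basename(filenames)

# Record the class of District_I in every element.
col_classes <- sapply(shapefile_list, function(x) class(x$District_I)[1])
print(col_classes)

# Files whose class differs from the most common one are the likely culprits.
majority_class <- names(which.max(table(col_classes)))
odd_files <- names(col_classes)[col_classes != majority_class]
print(odd_files)
```

Comparing against the majority class is only a heuristic; if the classes are split evenly, printing col_classes and reading it directly is more reliable.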

A note, particularly regarding Pablo's recommendation:

districts <- do.call(what = dplyr::rbind_all, shapefile_list)

results in the error: Error in (function (x, id = NULL) : unused argument

followed by a long string of digits and coordinates. So,

mapedit:::combine_list_of_sf(shapefile_list)

is definitely the mechanism to read from the list and merge the files, but I still need a way to diagnose the source of the column incompatibility error across shapefiles.


Solution

  • So after much fretting and some great guidance from Pablo (and his link to https://community.rstudio.com/t/simplest-way-to-modify-the-same-column-in-multiple-dataframes-in-a-list/13076), the following works:

    library(tidyverse)
    library(sf)
    
    # Lists the paths of all shapefiles in Directory whose names include the string "_CDs".
    filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
    
    # Applies st_read from the sf package to each file path, producing a list of sf objects.
    shapefile_list <- lapply(filenames, st_read)
    
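    # Defines a function that converts a problem variable to class character in a single sf object.
    my_func <- function(data, my_col){
      # Capture the unquoted column name so it can be spliced into mutate().
      my_col <- enexpr(my_col)
    
      # Overwrite the column with its character version and return the modified data.
      data %>% 
        mutate(!!my_col := as.character(!!my_col))
    }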
    
    # Applies the new function to our list of shapefiles and specifies "District_I" as our problem variable.
    districts <- map_dfr(shapefile_list, ~my_func(.x, District_I))