sqlrr-sfspatial-query

Can subqueries be used in st_read()?


The st_read() command from the sf package in R can take SQL queries (see example here) when reading in shapefiles, but I'd like to be able to include a subquery.

For example, there's a shapefile here with US states:

library(sf)

p <- './cb_2018_us_state_20m.shp' #Path
allStates <- st_read(p) #Reads full dataset

#Query for two states
q1 <- 'select * from \"cb_2018_us_state_20m\" where NAME in (\'Oregon\',\'California\')'

twoStates <- st_read(p,query = q1) #Works

How would this work if there was a separate data frame or vector of States that I wanted to include this as a subquery? I've tried (unsuccessfully) various versions of SQL's in operator but it keeps returning syntax errors:

moreStateNames <- c('Oregon','California','Texas') #Vector of state names
q2 <- 'select * from \"cb_2018_us_state_20m\" where NAME in (\"moreStateNames\")'
moreStates <- st_read(p,query = q2) #Fails

Error: Query execution failed, cannot open layer. In addition: Warning message: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL Error 1: "moreStateNames" not recognised as an available field.

dfStates <- data.frame(id=1:3,moreStateNames=moreStateNames) #Dataframe with state names
q3 <- 'select * from \"cb_2018_us_state_20m\" where NAME in (select moreStateNames from \"dfStates\")'
moreStates <- st_read(p,query = q3) #Fails

Error: Query execution failed, cannot open layer. In addition: Warning message: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL Error 1: SQL Expression Parsing Error: syntax error, unexpected SELECT. Occurred around : m "cb_2018_us_state_20m" where NAME in (select moreStateNames from "dfStates")

Where am I messing up here? What are my options other than reading the entire shapefile and then filtering it post-hoc?


Solution

  • You need to expand the vector moreStateNames into a comma-separated string in R before submitting the query. Using sprintf saves you some of the hazzle with quotation marks:

    
        q2 <- sprintf('select * from \"%s\" where NAME in (\'%s\')',
                      "cb_2018_us_state_20m",
                      paste(moreStateNames, collapse = "','")
                      )
    
    

    results in:

    
        ## > q2
        ## [1] "select * from \"cb_2018_us_state_20m\" where NAME in  ('Oregon','California','Texas')"
    
    

    so that:

    
        ## > moreStates <- st_read(p,query = q2) # Works
        ## Reading query `select * from "cb_2018_us_state_20m" where NAME in ('Oregon','California','Texas')'
        ## from data source
        ## `full\path\to\cb_2018_us_state_20m\cb_2018_us_state_20m.shp' 
        ##   using driver `ESRI Shapefile'
        ## Simple feature collection with 3 features and 9 fields
        ## Geometry type: MULTIPOLYGON
        ## Dimension:     XY
        ## Bounding box:  xmin: -124.5524 ymin: 25.84012 xmax: -93.53094 ymax: 46.26913
        ## Geodetic CRS:  NAD83