I have a spatial data set and I would like to read to memory only those rows that match a given value in a certain column. Here is a reproducible example with what I've tried.
Using a shape file, I can read the entire data set, but the code throws and error when I try to filter by column value. It looks like it does not recognize the column name
# read entire data set
shp_file <- system.file("shape/nc.shp", package="sf")
data <- read_sf(shp_file, query = "SELECT * FROM nc")
# select rows with specific column value
data <- read_sf(file, query = "SELECT * FROM nc WERE FIPSNO = 37009")
> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,
> : Query execution failed, cannot open layer. In addition: Warning
> message: In CPL_read_ogr(dsn, layer, query, as.character(options),
> quiet, : GDAL Error 1: In ExecuteSQL(): sqlite3_prepare_v2(SELECT *
> FROM nc WERE FIPSNO = 37009): near "FIPSNO": syntax error
Now, using a data set in saved in geopackage format, I cannot even read the entire data set to memory. In this case, it does not seem to recognize the table.
gpkg_file <- system.file("gpkg/nc.gpkg", package="sf")
data <- read_sf(gpkg_file, query = "SELECT * FROM nc")
> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,
> : Query execution failed, cannot open layer. In addition: Warning
> message: In CPL_read_ogr(dsn, layer, query, as.character(options),
> quiet, : GDAL Error 1: In ExecuteSQL(): sqlite3_prepare_v2(SELECT *
> FROM nc): no such table: nc
Your approach feels reasonable; it is somewhat difficult to troubleshoot it without having access to your actual dataset, but we can do with the world geopackage that ships with {spData}
package.
The my first step would be to check the names of layers stored in your geopackage; this will tell you what are legitimate tables to which you can point your WHERE clause.
The second step is running your actual read operation; while sf::st_read()
and sf::read_sf()
are in principle the same thing - they differ only in defaults for some of the optional parameters - I suggest to use sf::st_read()
when in doubt.
In instances of troubleshooting it is also a good idea to assign your parameters explicitly, and not rely on position (dsn should be the first argument by position, but having it assigned by name is bulletproof).
library(sf)
# a GPKG dataset - it lives in spData package
world_dataset <- system.file("shapes/world.gpkg", package = "spData")
# check layers - there may be more than one!
st_layers(world_dataset)
# run the query; note the use of dsn and addressing a layer known to exist
europe <- st_read(dsn = world_dataset,
query = "select * from world where continent = 'Europe'")
# a visual check
plot(st_geometry(europe))
While I believe that reasoneable people can differ in their opinion whether Russian Federation or French Guiana actually belong to Europe the SQL query on technical level clearly works.