google-bigqueryopenstreetmapmap-matching

Using Big Query and OSM to find road type closest to a coordinate


Suppose I have a table called gps_points in Big query:

ID | lon | lat

and using OSM, for each point in gps_points table, I want to see what is the road type (motorway...)

The main challenge is that the points in gps_points may not be exactly the same as those in OSM.

For example, running this query returns only 1 point:

with points as 
(
  select  st_geogpoint(lon, lat) gpoint FROM `prj-test.gps_points` LIMIT 1000
)

select gpoint from points inner join `bigquery-public-data.geo_openstreetmap.planet_features` osm 

ON ST_CONTAINS(geometry, gpoint) 

WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags)) limit 1000

I am struggling in:

ON ST_CONTAINS(geometry, gpoint). I can add WHERE ST_Distance(geometry, gpoint) < 5 for example, but that only works if I do cross join. My tables are huge and I am not sure if cross join is the best answer.

Is there an efficient way to do this kind of lookup?


Solution

  • For BigQuery there is not much difference between

    INNER JOIN ... ON ST_CONTAINS(geometry, gpoint),

    CROSS JOIN ... WHERE ST_Distance(geometry, gpoint) < 5,

    or (also legal)

    INNER JOIN ... ON ST_Distance(geometry, gpoint) < 5.

    They are optimized similarly by the query planner, and you get optimized spatial join as described in the doc: https://cloud.google.com/bigquery/docs/geospatial-data#using_joins_with_spatial_data

    Also, even though that doc does not mention ST_Distance(...) < ..., it works with this function too! Such condition is rewritten automatically as ST_DWithin(geometry, gpoint, 5) by the query planner. You'll see CROSS JOIN ON in the Execution Details, but the join is still optimized. See https://mentin.medium.com/execution-details-for-spatial-join-7274356e0115 to check if it works in specific case.