Geohashes form a grid, much like a matrix.
If I have a table with a geohash7 and a value, how could I calculate say the average of all the adjacentcells on BigQuery?
Sample source table
geohash7 | value |
---|---|
r1q6fh3 | 7 |
r1q6fh6 | 2 |
r1q6fh7 | 5 |
r1q6fh1 | 7 |
r1q6fh4 | 2 |
r1q6fh5 | 6 |
r1q6f5c | 4 |
r1q6f5f | 2 |
r1q6f5g | 7 |
Sample output table
geohash7 | value |
---|---|
r1q6fh4 | 4.6 |
... | ... |
Ideally in a scalable way as I want to run this over a large landmass.
First you'll need a function that returns neighbors. It is probably easier to grab one of javascript geohash libraries and use it as UDF, rather than do bit fiddling in SQL.
Once you have a function that returns neighbors, say
GeoHashNeibors(center STRING) RETURNS ARRAY<STRING>
You can then join original table with the flattened list of neighbors and compute average. Something like
WITH with_neighbors AS (
select geohash, value, neighbor
from data d, unnest(GeoHashNeibors(d.geohash)) neighbor
),
joined AS (
select t1.geohash, t1.value, t1.neighbor, t2.value as n_value
from with_neighbors t1 left join data t2
on t1.neighbor = t2.geohash
)
SELECT geohash, ANY_VALUE(value) value, AVG(n_value) AS avg_neighbor
FROM joined
GROUP BY 1