snowflake-cloud-data-platformgeopandas

How to properly convert a Snowpark DataFrame with GEOGRAPHY to GeoPandas


I've a Snowflake table with a GEOGRAPHY datatype (polygons), which I try to convert to e GeoPandas DataFrame. The Snowpark Dataframe holds the SHAPE column as (GeographyType)

Calling

gpd.GeoDataFrame(session.table(table_name).to_pandas(),geometry="SHAPE",crs=4326)

results in an exception

  TypeError: Input must be valid geometry objects: {
  "coordinates": [
    [
      [
        [
          6.706410552876440e+00,
          4.658134676959030e+01,
          8.919489999999932e+02
        ],

I came up with an ugly workaround based on Convert pandas dataframe with a geojson-like column to Geopandas dataframe

from ast import literal_eval
from shapely.geometry import shape

df_pd = df_sp.to_pandas()

geom = df_pd.pop("SHAPE").apply(lambda x: shape(literal_eval(x)))
gpd.GeoDataFrame(df_pd, geometry=geom,crs=4326)

Is there a better option?


Solution

  • The idea is to setup GEOGRAPHY_OUTPUT_FORMAT and use shapely.wkt.loads:

    For sample data:

    CREATE OR REPLACE TABLE geospatial_table (id INTEGER, SHAPE GEOGRAPHY);
    INSERT INTO geospatial_table VALUES
      (1, 'POINT(-122.35 37.55)'),
      (2, 'LINESTRING(-124.20 42.00, -120.01 41.99)');
    SELECT * FROM geospatial_table;
    

    Code:

    import geopandas
    import shapely
    
    from snowflake.snowpark.context import get_active_session
    session = get_active_session()
    
    session.sql("ALTER SESSION SET GEOGRAPHY_OUTPUT_FORMAT='WKT';").collect()
    
    pd = session.table('geospatial_table').to_pandas()
    pd['SHAPE'] = pd['SHAPE'].apply(shapely.wkt.loads)
    
    geopandas.GeoDataFrame(pd,geometry="SHAPE",crs=4326)
    

    Output:

    enter image description here

    EDIT:

    This works unless you have Z-Coordinates, in this case we get MULTIPOLYGONZ from Snowflake which cannot be parsed by shapely.wkt.loads

    In that case a WKB may be a better choice:

    CREATE OR REPLACE TABLE geospatial_table (id INTEGER, SHAPE GEOGRAPHY);
    INSERT INTO geospatial_table VALUES
      (1, 'POINTZ(-122.35 37.55 0)'),
      (2, 'LINESTRINGZ(-124.20 42.00 0.5, -120.01 41.99 0.5)');
    SELECT * FROM geospatial_table;
    
    
    import geopandas
    import shapely
    
    from snowflake.snowpark.context import get_active_session
    session = get_active_session()
    
    session.sql("ALTER SESSION SET GEOGRAPHY_OUTPUT_FORMAT='WKB';").collect()
    
    pd = session.table('geospatial_table').to_pandas()
    pd['SHAPE'] = pd['SHAPE'].apply(shapely.wkb.loads)
    
    geopandas.GeoDataFrame(pd,geometry="SHAPE",crs=4326)
    

    Output:

    enter image description here