After uploading a Parquet file containing a locally created GeoDataFrame to S3, I tried to read it in AWS Glue as follows:
import geopandas as gpd
test_gdf = gpd.read_parquet("s3://bucket_name/key/file.parquet")
However, the following OSError was raised:
OSError: When getting information for key 'key/file.parquet' in bucket 'bucket_name': AWS Error ACCESS_DENIED during HeadObject operation: No response body.
What I found strange is that pandas.read_parquet reads the same file successfully:
import pandas as pd
test_gdf = pd.read_parquet("s3://bucket_name/key/file.parquet")
However, reading the file with pandas and then converting the result back to a GeoDataFrame takes a long time.
Therefore, I want to read the Parquet file directly with geopandas.
Other questions pointed to problems with the IAM role or the S3 bucket policy, so I checked both.
Policy attached to the AWS Glue role
{
    ...
    "Action": [
        "s3:*"
    ],
    "Effect": "Allow",
    "Resource": [
        "*"
    ]
    ...
}
S3 Bucket Policy
{
    "Version": "2012-10-17",
    "Id": "PolicyForDatalakeBucket",
    "Statement": [
        {
            "Sid": "denyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::bucket_name/*",
                "arn:aws:s3:::bucket_name"
            ],
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                },
                "ArnNotEquals": {
                    "aws:SourceArn": "arn:aws:iam::IAM_USER:role/GLUE_ROLE"
                }
            }
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::IAM_USER:role/GLUE_ROLE",
                    "arn:aws:iam::IAM_USER:root"
                ]
            },
            "Action": [
                "s3:GetBucketAcl",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME/*",
                "arn:aws:s3:::BUCKET_NAME"
            ]
        }
    ]
}
What do I need to change so that geopandas can read Parquet files from S3?
The solution is as follows:
import fsspec
import geopandas as gpd

# feather_file is the path or URL of the Feather file (for example an s3:// URL)
with fsspec.open(feather_file) as f:
    gdf = gpd.read_feather(f)
To read a Feather file from an S3 bucket, open the file with fsspec and pass the open file object to geopandas.read_feather.
See the geopandas I/O user guide for more details: https://geopandas.org/en/stable/docs/user_guide/io.html