python-3.x, aws-glue, aws-lake-formation, aws-data-wrangler

AWS Wrangler Error HIVE_METASTORE_ERROR: Table is missing storage descriptor


I hope you can help me with an error I'm getting from awswrangler.

This is the situation: I have two AWS accounts, AccountA and AccountB, both with Lake Formation enabled. I have one set of databases in AccountA and another set in AccountB, and we share AccountB's databases with AccountA through Lake Formation so we can query their databases/tables with Athena from AccountA.

I am trying to automate a SQL query with Python, using awswrangler, but I'm getting a not-very-specific error when I run the query.

When I run "select * from DatabaseAccB.Table" I get the error "HIVE_METASTORE_ERROR: Table is missing storage descriptor". What could be the cause? I tried with a boto3 Athena session and got the same result.
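For reference, the call looks roughly like this (a minimal sketch with placeholder database/table names, not my exact code):

    import awswrangler as wr

    # Query goes through the resource link that AccountA sees for AccountB's database
    df = wr.athena.read_sql_query(
        sql="SELECT * FROM databaseaccb.table",  # placeholder names
        database="databaseaccb",                 # resource-link database in AccountA's catalog
    )
    print(df.head())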

This may help: when I query select * from DatabaseAccB.Table with my own user, it runs fine, but when I try to do it from a Lambda function or a Glue job, it fails with the error mentioned above.

PS: AccountA has only SELECT/DESCRIBE permissions on the tables in AccountB. I can show some code if you need it.

PS2: if I run "select * from DatabaseAccA.Table", the query runs fine.

Tried with boto3, same result (a sketch of that attempt is below).

Tried using Lambda, same result.

Tried giving admin access to the Glue role in AccountA, same result.
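The boto3 attempt was along these lines (again a sketch; the results bucket and names are placeholders):

    import boto3

    athena = boto3.client("athena")

    # Start the query against the resource-link database and check its status
    execution = athena.start_query_execution(
        QueryString="SELECT * FROM databaseaccb.table",
        QueryExecutionContext={"Database": "databaseaccb"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},  # placeholder bucket
    )
    status = athena.get_query_execution(QueryExecutionId=execution["QueryExecutionId"])
    print(status["QueryExecution"]["Status"]["State"])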

I think there is something happening with Lake Formation.

Thanks!


Solution

  • Make sure your Lambda/Glue job execution roles have the following Lake Formation permissions, all granted from AccountA's console/CLI:

    Resource link permissions must be granted in pairs: the role needs DESCRIBE on the resource link itself, and even though your queries point at the resource link, the principal executing the query in Athena/Redshift Spectrum still needs the "normal" permissions (SELECT, DESCRIBE, etc.) on the underlying shared database/table, granted by AccountA's Lake Formation administrator. See the sketch after this list.

    For the AWS Wrangler part, if the problem still persists, you may need to be explicit about which Glue catalog ID / Athena data catalog the query executes against (at the moment I'm not sure whether this parameter exists in AWS Wrangler, though); see the second sketch below.
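    As a sketch of those paired grants with boto3, run from AccountA (the role ARN, database names, and AccountB account ID are placeholders):

        import boto3

        lf = boto3.client("lakeformation")
        role_arn = "arn:aws:iam::111111111111:role/glue-job-role"  # placeholder execution role in AccountA

        # DESCRIBE on the resource link that lives in AccountA's catalog
        lf.grant_permissions(
            Principal={"DataLakePrincipalIdentifier": role_arn},
            Resource={"Database": {"Name": "databaseaccb"}},  # placeholder resource-link name
            Permissions=["DESCRIBE"],
        )

        # SELECT/DESCRIBE on the underlying shared tables in AccountB's catalog
        lf.grant_permissions(
            Principal={"DataLakePrincipalIdentifier": role_arn},
            Resource={
                "Table": {
                    "CatalogId": "222222222222",     # placeholder: AccountB's account id
                    "DatabaseName": "databaseaccb",  # placeholder: shared database in AccountB
                    "TableWildcard": {},
                }
            },
            Permissions=["SELECT", "DESCRIBE"],
        )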
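    If your awswrangler version does expose it, the data_source argument of athena.read_sql_query is the place to point at a specific Athena data catalog; treat this as an assumption to verify against the version you have installed:

        import awswrangler as wr

        # data_source selects the Athena data catalog (defaults to "AwsDataCatalog");
        # "my_cross_account_catalog" is a placeholder for a catalog registered in Athena
        df = wr.athena.read_sql_query(
            sql="SELECT * FROM databaseaccb.table",
            database="databaseaccb",
            data_source="my_cross_account_catalog",
        )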