pythonaws-lambdaaws-glueamazon-athenaaws-data-wrangler

How to query an multi-AWS Athena databases in a single query from Lambda function using python?


I am attempting to build a Python Lambda function that pulls data from multiple Athena databases using the AWS Wrangler Python library.

wr.athena.read_sql_query('across databases sql query', 'one of databases name')

The query is similar to the following:

select * from data_catalog1.database1.table1 as a 
left join data_catalog2.database2.table2 as b on a.id = b.id;

This call is not throwing all kinds of permission errors:

  1. It throws an exception complaining that the table exists on a different AWS account under the same organization (Is that possible?)
  2. Or it complains that it doesn't have permissions to the output s3 bucket (which can be found in the settings tab of Athena). Not sure how that is possible.

I attempted to change the query, but that didn't help either. The only thing that worked was when I queried the data_catalog itself to list the existing databases or tables.

My questions

  1. Is it possible for an AWS Athena database to exist in one AWS account and its data files to live on another AWS Account?
  2. How do you know where an AWS Athena table data file lives? And does an S3 output bucket exist?
  3. Is there a better approach to finding this problem's root cause other than the trial and error approach?

Solution

    1. Is it possible for an AWS Athena database to exist in one AWS account and its data files to live on another AWS Account?

    A). Yes it is possible

    1. How do you know where an AWS Athena table data file lives? And does an S3 output bucket exist?

    A. Go to the AWS Glue data catalog, go to a specific Table, and check "Location", which provides information on a file location.

    1. Is there a better approach to finding this problem's root cause other than the trial and error approach?

    A. Run the query directly in the Athena console to check for errors. This step can often reveal if permissions or configurations are missing.

    The other way is to check CloudTrail Logs and enable Verbose logging in AWS Wrangler to check the issues.

    Please let me know any other information.