pythonamazon-s3amazon-athenapyathena

AWS Athena PyAthena AccessDeniedException


I am new to AWS. I have a user account and two roles, one for prod one for test.

Usually I log into my account and switch to prod role to run some simple select queries.

Now I want to use Athena locally in Python with PyAthena. I have tried the following resources from PyAthena documentation:

from pyathena import connect
import pandas as pd

conn = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
               aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
               s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2')
df = pd.read_sql("SELECT * FROM many_rows", conn)
print(df.head())

But always having the error

An error occurred (AccessDeniedException) when calling the StartQueryExecution operation: User: arn:aws:iam::xxxxxx:user/xxxx@xxxxx is not authorized to perform: athena:StartQueryExecution on resource: arn:aws:athena:ap-southeast-2:xxxxx:workgroup/primary

This is the exact error I would get if I run the same query using my user account without switching the role.

I have also tried to add a profile name parameter in connect but still not working even though the env is correctly recognised.

Could someone help me how to do the 'switch' role step in local python code?


Solution

  • It seems like the issue is due to a missing role and not the profile_name parameter. If you look at the Connection class in pyathena, there's a role_arn variable that you can specify while initializing the connection. Here's the line I'm talking about.

    You might want to try it this way -

    conn = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
                   aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
                   s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2',
                   role_arn='<your arn here'>)
    

    I haven't tested it myself though since I do not have an Athena setup.