pythonsqlamazon-web-servicesamazon-athenapyathena

SQL Datetime WHERE clause returning wrong month


I am extracting data from AWS Athena using pyathena library and following function:

def import_ben_datalake(ACCESS_KEY, SECRET_KEY, S3_DIR, REGION, start, end):
    conn = pyathena.connect(aws_access_key_id = ACCESS_KEY, 
                            aws_secret_access_key = SECRET_KEY,
                            s3_staging_dir = S3_DIR,
                            region_name = REGION)
    sql = f"""SELECT columns
          FROM table
          WHERE column_datetime BETWEEN PARSE_DATETIME('{start.strftime("%Y-%m-%d")}', 'YYYY-MM-DD')
                                    AND PARSE_DATETIME('{end.strftime("%Y-%m-%d")}', 'YYYY-MM-DD')"""

    df = pd.read_sql(sql, conn)
    conn.close()

    return df

start and end params are datetime.date variable as:

start_test = datetime.date(2020, 11, 22)
end_test = datetime.date(2020, 11, 28)

Both are dates from November this year, but when I call the function it is returning all the values between 2020-Jan-22 and 2020-Jan-28.

Any help would be great to solve this issue!


Solution

  • Follow the example of parameterized query that solved my issue:

    def import_ben_datalake(ACCESS_KEY, SECRET_KEY, S3_DIR, REGION, start, end):
        conn = pyathena.connect(aws_access_key_id = ACCESS_KEY, 
                                aws_secret_access_key = SECRET_KEY,
                                s3_staging_dir = S3_DIR,
                                region_name = REGION)
        sql = """SELECT columns
                 FROM table
                 WHERE column.datetime BETWEEN %(start)s AND %(end)s"""
    
       df = pd.read_sql(sql, conn, params = {"start": start, "end": end})
       conn.close()
    
       return df