etl, aws-glue, amazon-cloudwatch, aws-console

Where can I see AWS Glue ETL job print statements?


Problem

Where do you find print statements from your Glue ETL jobs? You guys, this is killing me. Why is this not the easiest thing to find?


Situation

I am trying to inspect the properties of my tables and do some general debugging in the console for an AWS Glue ETL job. Throughout the job I log some things and print some things. The built-in methods that print a dynamic frame's schema return None, though, so I can't easily embed their output in a log string. Here is the gist of my job:

import some_stuff
...
# Create and join tables
customer_churn = glueContext.create_dynamic_frame.from_catalog(database=db_name, table_name=tbl_customer_churn)
customer_churn = cust_joined.join(paths1=["customer id"], paths2=['id'], frame2=other_table)

logger.info("Customer_churn_joined:\n")
customer_churn.printSchema()

# ---- Write out the combined file ----
s_customer_churn = customer_churn.toDF().select("customer id")

logger.info("Customer_churn_just_cust_id:\n")
s_customer_churn.printSchema()

s_customer_churn.write.option("header", "true").format("csv").mode("overwrite").save(output_dir)

logger.info("output_dir:" + output_dir)

What I've Tried

I looked in the continuous logging tab and I get the logging statements, but no print statements come through. I saw the Output logs link going to CloudWatch (screenshot, bottom-right), so I clicked it, but none of the logs had my print statements. Why is this not the easiest thing to see?


Solution

  • All logs

    Look in the log stream without the suffix. A lot of items are logged here, so you will have to search.

  • Output logs

    Look in the log stream without the suffix.

    The streams with a suffix come from the individual workers; how many there are depends on the number of workers defined for the job.
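
The same distinction can be applied programmatically when there are many streams to sift through. This is only a sketch: it assumes, based on the description above, that the driver's stream is named exactly like the job run ID and that worker streams carry the run ID plus a suffix. The helper name `split_streams`, the run ID, and the stream names are all hypothetical.

```python
def split_streams(stream_names, job_run_id):
    """Separate the driver stream (no suffix) from worker streams (suffixed).

    Assumption: the driver stream is named exactly like the job run ID and
    worker streams start with the run ID followed by a suffix.
    """
    driver = None
    workers = []
    for name in stream_names:
        if name == job_run_id:
            driver = name
        elif name.startswith(job_run_id):
            workers.append(name)
    return driver, sorted(workers)


# Hypothetical stream names, shaped like what the console lists for one run.
run_id = "jr_0123abcd"
streams = [
    "jr_0123abcd_g-aaaa1111",
    "jr_0123abcd",
    "jr_0123abcd_g-bbbb2222",
    "jr_ffff9999",
]
driver, workers = split_streams(streams, run_id)
print("driver stream:", driver)
print("worker streams:", workers)
```

With boto3, the stream names could come from the CloudWatch Logs `describe_log_streams` call with `logStreamNamePrefix` set to the run ID, and the driver stream could then be read with `get_log_events`. Which log group to query depends on how the job's logging is configured.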

    Example

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.sql import Row
    
    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
    
    logger = glueContext.get_logger()
    logger.info('Hello from logger.info will be in All Logs')
    
    print('print will show up in output log')
    
    testDf = spark.createDataFrame([Row(test_data='dataframe printSchema() and show() will be in the output log')])
    testDf.printSchema()
    testDf.show()
    
    job.commit()
    

    Open All logs and you can search using the search box.

    Or do a find in the browser.

    Just understand that you might have to scroll to the top and click "There are older events to load. Load more." before you find it.

    Open Output logs and you can do the same type of search.
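
The question also mentions that printSchema() returns None, which makes it awkward to embed in a log message. One possible workaround, sketched here, is to capture whatever a call prints and hand the text to the logger, so it lands in the same place as the other logger output. This assumes the method writes through Python's `sys.stdout`, as a plain `print` does. `capture_stdout` and `fake_print_schema` are hypothetical names, and the stand-in function only imitates the shape of a schema printout so the snippet runs without Spark.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Run func and return everything it printed to Python's stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def fake_print_schema():
    """Stand-in for a printSchema()-style method: prints, returns None."""
    print("root")
    print(" |-- customer id: string (nullable = true)")


schema_text = capture_stdout(fake_print_schema)
message = "Customer_churn_joined:\n" + schema_text
print(message)
```

In the job itself the call might look like `logger.info("Customer_churn_joined:\n" + capture_stdout(customer_churn.printSchema))`, provided the stdout assumption holds. For a Spark DataFrame, `df.schema.simpleString()` is another way to get the schema as a string without printing anything.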