amazon-web-services, aws-glue, aws-glue-data-catalog

AWS Glue enableUpdateCatalog not creating new partitions after successful job run


I am having a problem: I have set enableUpdateCatalog=True and updateBehavior=LOG so that my Glue job updates my Glue table, which has one partition key. After the job runs successfully, no new partitions are added to my Glue catalog table, even though the data in S3 is separated by the partition key I used. How do I get the job to partition my Glue catalog table automatically? Currently I have to run boto3 create_partition manually to create partitions on the catalog table. I want the job to create the partitions it discovers in the S3 path, separated by the partition keys, on its own. Code:

additionalOptions = {
    "enableUpdateCatalog": True,
    "updateBehavior": "LOG",
}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]

my_df = glueContext.write_dynamic_frame_from_catalog(
    frame=last_transform,
    database=<dst_db_name>,
    table_name=<dst_tbl_name>,
    transformation_ctx="DataSink1",
    additional_options=additionalOptions)
job.commit()

PS: I am currently using the Parquet format.

Am I missing any permissions that need to be added to my job so that it can create partitions itself?
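For reference, the manual workaround mentioned above looks roughly like this. It is only a sketch: the database, table, partition key, and value names are placeholders.

import boto3

# Register one partition by hand for the table written above.
glue = boto3.client("glue")

database = "<dst_db_name>"  # placeholder
table = "<dst_tbl_name>"    # placeholder
value = "some_value"        # placeholder value of partition_key0

# Reuse the table's storage descriptor, but point Location at the partition's S3 prefix.
descriptor = glue.get_table(DatabaseName=database, Name=table)["Table"]["StorageDescriptor"]
descriptor = dict(descriptor, Location=f"{descriptor['Location'].rstrip('/')}/partition_key0={value}/")

# If the table has more than one partition key, Values must list one value per key, in order.
glue.create_partition(
    DatabaseName=database,
    TableName=table,
    PartitionInput={"Values": [value], "StorageDescriptor": descriptor},
)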


Solution

  • I got it to work by adding useGlueParquetWriter: 'true' to the catalog table properties, and by adding

    format_options = {
        "useGlueParquetWriter": True
    }
    

    to the write_dynamic_frame.from_catalog calls. These two steps got it working :) A sketch of the combined setup is shown below.
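    A minimal sketch of setting the table property with boto3, using the database and table names from the question as placeholders (the same property can also be set from the Glue console on the table's properties):

    import boto3

    glue = boto3.client("glue")

    database = "<dst_db_name>"  # placeholder
    table = "<dst_tbl_name>"    # placeholder

    # Fetch the current definition so the update keeps every other setting.
    current = glue.get_table(DatabaseName=database, Name=table)["Table"]

    # update_table expects a TableInput, which only accepts a subset of the
    # fields returned by get_table, so copy over just the allowed ones.
    allowed = {
        "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
        "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
        "ViewExpandedText", "TableType", "Parameters", "TargetTable",
    }
    table_input = {k: v for k, v in current.items() if k in allowed}

    # The table property this answer relies on (values must be strings).
    table_input.setdefault("Parameters", {})["useGlueParquetWriter"] = "true"

    glue.update_table(DatabaseName=database, TableInput=table_input)

    And a minimal sketch of the sink call with both pieces together, following this answer's use of format_options (database/table names and partition keys are the placeholders from the question; assumes a recent Glue version such as 3.0+):

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext.getOrCreate())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # ... build last_transform (a DynamicFrame) from your source here ...

    additionalOptions = {
        "enableUpdateCatalog": True,
        "updateBehavior": "LOG",
        "partitionKeys": ["partition_key0", "partition_key1"],
    }

    glueContext.write_dynamic_frame.from_catalog(
        frame=last_transform,
        database="<dst_db_name>",     # placeholder
        table_name="<dst_tbl_name>",  # placeholder
        transformation_ctx="DataSink1",
        additional_options=additionalOptions,
        format_options={"useGlueParquetWriter": True},
    )
    job.commit()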