google-bigquery airflow google-cloud-composer

What are the correct fields in `time_partitioning` for GCSToBigQueryOperator in Airflow?


I'm working with the Airflow operator GCSToBigQueryOperator, and I know the following is valid:

time_partitioning={'field': 'date', 'type': 'DAY'}

This works as it should. However, when I try to add an expiration and require_partition_filter, like this:

time_partitioning={'field': 'date', 'type': 'DAY', 'expiration': 365, 'require_partition_filter': True}

The result is a table partitioned by DAY on the date field, but the expiration and partition filter settings are not applied. I'm wondering if my syntax is incorrect. I've looked for examples of the correct syntax, but the Airflow documentation just refers back to the API.

Any help would be greatly appreciated, as I'm unsure what the syntax should be.


Solution

  • I think you didn't use the correct key for the expiration parameter. According to the documentation, the fields are:

    type_ (written as 'type' in the dict below)
    field
    expiration_ms
    require_partition_filter
    

    In your Airflow operator, use the following dict:

    time_partitioning={
         'field': 'date', 
         'type': 'DAY', 
         'expiration_ms': 31536000000, # 365 days x 24h x 60m x 60s x 1000 ms
         'require_partition_filter': True
    }
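
To avoid hardcoding the millisecond value, the expiration can be derived from a number of days. The following is only a minimal sketch: `build_time_partitioning` is a hypothetical helper, not part of Airflow, and the dict keys are the ones suggested in the answer above, so they should be checked against the Airflow and BigQuery versions in use.

```python
from datetime import timedelta


def build_time_partitioning(field, expiration_days, require_filter=True):
    """Hypothetical helper: build a time_partitioning dict from an expiration in days."""
    # Floor-dividing one timedelta by another yields an exact integer,
    # here the number of milliseconds in the requested number of days.
    expiration_ms = timedelta(days=expiration_days) // timedelta(milliseconds=1)
    return {
        "field": field,
        "type": "DAY",
        "expiration_ms": expiration_ms,
        "require_partition_filter": require_filter,
    }


time_partitioning = build_time_partitioning("date", expiration_days=365)
print(time_partitioning["expiration_ms"])  # 31536000000
```

The resulting dict can then be passed as the operator's `time_partitioning` argument in place of the literal one.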