I have a table with no partitioning policy, so its data is ordered by ingestion time. After reading this blog post https://yonileibowitz.github.io/blog-posts/data-partitioning.html#backfill-or-unordered-ingestion I realized that overriding this policy would solve a lot of the problems I am facing now. In my case I have a table, let's call it RAW_INOMING_DATA.
Its fields are T -> timestamp of the packet, and DATA -> raw data generated at time T. I need to override the partitioning of this table so that T is treated as the ingestion time. How can I achieve that?
The ideal solution for your scenario is to pre-partition the data at the source by the relevant datetime property (if it's not already pre-partitioned), then specify the creationTime ingestion property when you ingest it.
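As a rough sketch of what that could look like for a single pre-partitioned file: the storage URL, file name, and format below are placeholders assumed for illustration (they are not from the question), and the creationTime value would be the T that the file's records belong to.

```kusto
// Hypothetical blob path and format; replace with your own source.
// creationTime stamps the resulting extents with this time instead of the ingestion time.
.ingest into table RAW_INOMING_DATA (
    h'https://<storage-account>.blob.core.windows.net/<container>/<file>.csv;<secret>'
  )
  with (format='csv', creationTime='2021-01-01T00:00:00Z')
```

Because one creationTime applies to the whole ingest operation, this only works well when each file covers a narrow time range, which is why the pre-partitioning at the source matters.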
If the data is already ingested, you can set a uniform range datetime partition key, with the column name being the property of your data you want to partition by (T in your table). This will improve the efficiency of filtering on that datetime column at query time.
If you also want the retention policy and caching policy to be applied according to the values in that column, set the OverrideCreationTime property of the partition key to true.
An example of the command for setting such a partition key, as it appears in the documentation:
.alter table TableName policy partitioning ```{
  "PartitionKeys": [
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "2021-01-01T00:00:00",
        "RangeSize": "7.00:00:00",
        "OverrideCreationTime": false
      }
    }
  ]
}```
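
To check the result afterwards, something along these lines may help. TableName is the same placeholder as in the command above, and the projected columns are the ones I would expect in the extents output. Partitioning runs as a background process after the policy is set, so the extents may take a while to reflect it.

```kusto
// Show the partitioning policy currently set on the table.
.show table TableName policy partitioning

// List the table's extents with their creation-time range.
// With OverrideCreationTime set to true, these should follow the partition column's values.
.show table TableName extents
| project ExtentId, MinCreatedOn, MaxCreatedOn
```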