azure-data-explorerexternal-tables

Partition external table by unixtime


Typically when partitioning an external table, I will use 1-day boundaries with the syntax

.create external table extTbl (
    Timestamp: datetime,
    Data: dynamic
)
kind = blob
partition by (Timestamp:datetime = startofday(Timestamp))
...

However, I've come across a table with no datetime column. Instead, it contains UnixTime: long column for its timestamp.

First I tried round() to get to the start of the day.

.create external table extTbl (
    UnixTime: long,
    Data: dynamic
)
kind = blob
partition by (UnixTime:long = round(UnixTime/ 86400,0) * 86400)
...

...which returns BadRequest_SyntaxError. The same occurs when I try to cast to datetime...

.create external table extTbl (
    UnixTime: long,
    Data: dynamic
)
kind = blob
partition by (UnixTime:datetime = startofday(datetime_add('second',UnixTime,datetime(1970-01-01)))
...

The examples given in docs show use of some functions is permissible but I'm unsure which I have access to and which I don't.

How do I partition on this column?


Solution

  • TL;DR: it is not possible to partition on a numeric source column. You can create a new derived column in the export command though and partition on that.

    Given...

    .create table Tbl (
        UnixTime: long,
        Data: dynamic
    )
    

    ...your external table will be...

    .create external table extTbl (
        UnixTime_DT: datetime:
        UnixTime: long,
        Data: dynamic
    )
    kind = blob
    partition by (UnixTime_DT:datetime = startofday(UnixTime_DT))
    ...
    

    ...and your export command will be of the form...

    .export async to table extTbl <|
    Foo
    | where UnixTime >= tolong((datetime({Start})/timespan(1s))-62135596800)
    | where UnixTime <  tolong((datetime({End})/timespan(1s))-62135596800)
    | extend UnixTime_DT = unixtime_seconds_todatetime(UnixTime) 
    | project-reorder UnixTime_DT
    

    ...leveraging the unixtime_seconds_todatetime() (or milli/micro variants) in the export query because it is not allowed in the DDL command.

    n.b. the offset "62135596800" is the number of seconds between 1970-01-01 and 0001-01-01.


    ...use of some functions is permissible but I'm unsure which...

    The "Partitions formatting" section of the docs appears to list a very limited shortlist of functions.

    First I tried round()...

    The "Virtual columns" notes "Virtual columns can be of either type string or datetime"... which would seem to prevent use of round() (which returns a numeric type) were that reading not confounded by the fact that it is possible for both bin() and hash() to return numerics.


    You can vote for expanded coverage of this use case at the Azure feedback forum.