Tags: azure, azure-data-explorer, kql, data-ingestion

How can I ingest data from Apache Avro into Azure Data Explorer?


For several days I've been trying to ingest Apache Avro-formatted data from blob storage into Azure Data Explorer.

I'm able to reference top-level JSON keys like $.Body (see the red underlined example in the screenshot below), but when it comes to nested JSON keys, Azure fails to parse them properly and displays nothing (as seen in the green column: I would expect $.Body.entityId to reference the key "entityId" inside the Body JSON).

Many thanks in advance for any help!

Here is a screenshot of the Azure Data Explorer web interface.

Edit 1

I already tried increasing the "Nested Levels" option to 2, but all I got was this error message with no further details. The error message doesn't even disappear when I decrease the level back to 1; I have to cancel and start the process all over again.

I also just noticed that the auto-generated columns have some strange types. They all seem to end up as type string... That seems a little odd to me as well.

Edit 2

Here is some KQL code.

This is the schema of my input .avro file, which I get from my Event Hubs capture:

{
  SequenceNumber: ...,
  Offset: ...,
  EnqueuedTimeUTC: ...,
  SystemProperties: ...,
  Properties: ...,
  Body: {
    entityId: ...,
    eventTime: ...,
    messageId: ...,
    data: ...
  }
}, ...

With these ingestion commands I can't reference the inner JSON keys. The top-level keys work perfectly fine.

// Create table command
////////////////////////////////////////////////////////////
.create table ['test_table']  (['Body']:dynamic, ['entityId']:string)

// Create mapping command
////////////////////////////////////////////////////////////
.create table ['test_table'] ingestion apacheavro mapping 'test_table_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}},{"column":"entityId", "Properties":{"Path":"$.Body.entityId"}}]'

// Ingest data into table command
///////////////////////////////////////////////////////////
.ingest async into table ['test_table'] (h'[SAS URL]') with (format='apacheavro',ingestionMappingReference='test_table_mapping',ingestionMappingType='apacheavro',tags="['503a2cfb-5b81-4c07-8658-639009870862']")

I would love to ingest the inner data fields into separate columns, instead of building a workaround with update policies.


Solution

  • Following the OP's updates

    Here is the Avro schema of an Event Hubs capture.
    As you can see, Body is of type bytes, so there is practically nothing you can do with it in this form, other than ingesting it as-is (as dynamic); a minimal sketch of such an ingestion follows the schema below.

    {
    
        "type":"record",
        "name":"EventData",
        "namespace":"Microsoft.ServiceBus.Messaging",
        "fields":[
                     {"name":"SequenceNumber","type":"long"},
                     {"name":"Offset","type":"string"},
                     {"name":"EnqueuedTimeUtc","type":"string"},
                     {"name":"SystemProperties","type":{"type":"map","values":["long","double","string","bytes"]}},
                     {"name":"Properties","type":{"type":"map","values":["long","double","string","bytes"]}},
                     {"name":"Body","type":["null","bytes"]}
                 ]
    }
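
    Given this schema, an ingestion mapping can only reference $.Body itself, not paths inside it. Here is a minimal sketch of such an as-is ingestion; the table name capture and mapping name capture_mapping are illustrative, not taken from the OP's setup:

    // Create a table that holds Body as-is (dynamic); 'capture' is an illustrative name
    .create table capture (Body:dynamic)

    // Apache Avro mapping that maps the whole Body field (no nested path)
    .create table capture ingestion apacheavro mapping 'capture_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}}]'

    // Ingest the capture blob using that mapping
    .ingest async into table capture (h'[SAS URL]') with (format='apacheavro', ingestionMappingReference='capture_mapping')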
    

    If you take a look at the ingested data, you'll see that the content of Body is an array of integers.
    Those integers are the decimal values of the characters that make up Body.

    capture
    | project Body
    | take 3
    
    Body
    [123,34,105,100,34,58,32,34,56,49,55,98,50,99,100,57,45,97,98,48,49,45,52,100,51,53,45,57,48,51,54,45,100,57,55,50,51,55,55,98,54,56,50,57,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,48,44,32,34,109,121,105,110,116,34,58,32,50,48,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,51,57,56,53,52,52,56,55,52,53,57,56,57,48,55,57,55,125]
    [123,34,105,100,34,58,32,34,57,53,100,52,100,55,56,48,45,97,99,100,55,45,52,52,57,50,45,98,97,54,100,45,52,56,49,54,97,51,56,100,52,56,56,51,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,49,44,32,34,109,121,105,110,116,34,58,32,56,56,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,54,53,53,51,55,51,51,56,49,57,54,53,50,52,52,49,125]
    [123,34,105,100,34,58,32,34,53,50,100,49,102,54,54,53,45,102,57,102,54,45,52,49,50,49,45,97,50,57,99,45,55,55,56,48,102,101,57,53,53,55,48,56,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,50,44,32,34,109,121,105,110,116,34,58,32,49,57,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,52,53,57,54,49,56,54,51,49,51,49,50,50,52,50,50,51,125]

    Body can be converted to text using make_string() and then parsed into JSON using todynamic():

    capture
    | project BodyJSON = todynamic(make_string(Body))
    | take 3
    
    BodyJSON
    {"id":"817b2cd9-ab01-4d35-9036-d972377b6829","dt":"2021-08-12T16:48:32.5962540Z","i":0,"myint":20,"myfloat":"0.398544874598908"}
    {"id":"95d4d780-acd7-4492-ba6d-4816a38d4883","dt":"2021-08-12T16:48:32.5962540Z","i":1,"myint":88,"myfloat":"0.65537338196524408"}
    {"id":"52d1f665-f9f6-4121-a29c-7780fe955708","dt":"2021-08-12T16:48:32.5962540Z","i":2,"myint":19,"myfloat":"0.45961863131224223"}
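
    From here, the inner fields can be projected into separate columns at query time; if they need to be materialized as table columns, the same expression could be placed in an update policy (which the OP preferred to avoid). The following is a sketch based on the field names visible in the sample rows above (id, dt, i, myint, myfloat), not on the OP's actual Body schema:

    // Convert the byte array to text, parse it as JSON, then split the fields into columns
    capture
    | extend BodyJSON = todynamic(make_string(Body))
    | project
        id      = tostring(BodyJSON.id),
        dt      = todatetime(BodyJSON.dt),
        i       = toint(BodyJSON.i),
        myint   = toint(BodyJSON.myint),
        myfloat = todouble(BodyJSON.myfloat)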