I'm capturing data in avro via Azure EventHub so it can be handled via a Azure Automation Runbook. The issue I'm facing is when I try to parse the avro file to be used in a powershell script.
Here the content of the file. enter image description here
Objavro.codecnullavro.schemaì{"type":"record","name":"EventData","namespace":"Microsoft.ServiceBus.Messaging","fields":[{"name":"SequenceNumber","type":"long"},{"name":"Offset","type":"string"},{"name":"EnqueuedTimeUtc","type":"string"},{"name":"SystemProperties","type":{"type":"map","values":["long","double","string","bytes"]}},{"name":"Properties","type":{"type":"map","values":["long","double","string","bytes","null"]}},{"name":"Body","type":["null","bytes"]}]} Êg;Ï7ˆ@žkß6Þ¿¼¬.
16904(6/25/2024 5:50:16 AM&x-opt-enqueued-time ¸–߉d ²{"value":[{"subscriptionId":"5exxxxa10333","subscriptionExpirationDateTime":"2024-06-28T00:23:31.567249+00:00","changeType":"created","resource":"Users/2ffxxxx81-a8d0xxxx83c/Messages/AAMkAxxxxxB7VrlAAA=","resourceData":{"@odata.type":"#Microsoft.Graph.Message","@odata.id":"Users/2ffxxxx-9f81-a8d****83c/Messages/AAMkAGY4xxxxxAAKB7VrlAAA=","@odata.etag":"W/\"CQAAABYAAxxxxxA38cxAAKB/4Iw\"","id":"AAMkAxxxxxxlAAA="},"clientState":"MlaxxxxyJ.MF7kmc3z","tenantId":"e1fxxxxxc83"}]}Êg;Ï7ˆ@žkß6Þ¿¼
I tried to parse the file with the pwsh module AvroTools but the read-avro cmdlet returns a PSCustom Object that I can't use:
Fields : {SequenceNumber, Offset, EnqueuedTimeUtc, SystemProperties…}
Count : 6
SchemaName : Microsoft.ServiceBus.Messaging.EventData
Name : EventData
Namespace : Microsoft.ServiceBus.Messaging
Fullname : Microsoft.ServiceBus.Messaging.EventData
Documentation :
Tag : Record
When checking the Fields, below the content:
Aliases :
Pos : 0
Documentation :
DefaultValue :
Ordering : ignore
Schema : {"type":"long"}
Name : SequenceNumber
Aliases :
Pos : 1
Documentation :
DefaultValue :
Ordering : ignore
Schema : {"type":"string"}
Name : Offset
Aliases :
Pos : 2
Documentation :
DefaultValue :
Ordering : ignore
Schema : {"type":"string"}
Name : EnqueuedTimeUtc
Aliases :
Pos : 3
Documentation :
DefaultValue :
Ordering : ignore
Schema : {"type":"map","values":["long","double","string","bytes"]}
Name : SystemProperties
Aliases :
Pos : 4
Documentation :
DefaultValue :
Ordering : ignore
Schema : {"type":"map","values":["long","double","string","bytes","null"]}
Name : Properties
Aliases :
Pos : 5
Documentation :
DefaultValue :
Ordering : ignore
Schema : ["null","bytes"]
Name : Body
I would like to retrieve the content as json like:
[{"subscriptionId":"5e6eec86-xxxxxxxxxxxxx-f7xxxx333","subscriptionExpirationDateTime":"2024-06-28T00:23:31.567249+00:00","changeType":"created","resource":"Users/2ff83-fxxxxxxxxxxxx-a8d0afceb83c/Messages/AAMkAGY4ZTZlNTI1xxxxxxxxxxxxxxxxxxxtRXSbFa_gJA38cxAAKB7VrlAAA=","resourceData":{"@odata.type":"#Microsoft.Graph.Message","@odata.id":"Users/2ff-xxxxxxxxxxxx81-a8d0afceb83c/Messages/AAMkAGY4ZxxxxxxxxxxxxxxxxxxxxxxxxCeBfyMDtRXSbFa_gJA38cxAAKB7VrlAAA=","@odata.etag":"W/\"CQAAABYAAACeBfyMDtRXSbFa+gJA38cxAAKB/4Iw\"","id":"AAMkAGY4ZTZxxxxxxxxxxxxxxxxxxxxxxxxbFa_gJA38cxAAKB7VrlAAA="},"clientState":"Mla8Q~xxxxxxxxxxxxxxxxxxxxxxxMF7kmc3z","tenantId":"e1f8axxxxxxxxxxxxxxxxxxxxc83"}]
It looks like you're using Read-AvroSchema
rather than Read-Avro
to parse your file (both cmdlets are part of the AvroTools
module that can be installed with, e.g., Install-Module -Scope CurrentUser AvroTools
):
Read-AvroSchema
reads metadata (structural data, like a database schema) from an Avro Object Container File (*.avro
), whereasRead-Avro
reads data, in the form of [pscustomobject]
instances (PowerShell's type for dynamically created "property bag" objects), whose properties reflect the fields in the input Avro data.Without having access to the source data in its original form (what is quoted in your question isn't valid as the content of an *.avro
file), I think what will work is:
# Extract the value of the "Body" field, assumed to be the JSON of interest
# as bytes representing UTF-8 encoding.
[System.Text.Encoding]::UTF8.GetString(
(Read-Avro yourFile.avro).Body
)
Note:
I'm assuming that the Body
field contains the JSON text of interest.
Since the type information lists of this field lists ["null", "bytes"]
, I'm assuming that the value is reported as a byte array.
I'm further assuming that the bytes represent the UTF-8 encoding of the JSON text; adjust as needed for a different character encoding.