I have one JSON object per row in an Azure data flow. I want to collect the values from every row into a single array and then flatten it, so that each element of the array is a single value rather than all the values for one row.
My input data looks like this:
Column |
---|
{"Energy to Utility_kWh_15m": "ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6", "Output Energy_kWh_15m": "849871d1-b5f5-4ae8-86ad-5030ce16cce5", "Plant Availability_%_15m": "5db1004a-bcdc-4973-9816-124262893d21"} |
{"Energy to Utility_kWh_15m": "97046418-371d-41d3-a213-5e9715847a34", "Output Energy_kWh_15m": "6dc86c06-1a5c-11e9-9358-42010afa015a", "Plant Availability_%_15m": "6dcac67c-1a5c-11e9-9358-42010afa015a"} |
... |
and I want my final output to look like:
New Column |
---|
"ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6" |
"849871d1-b5f5-4ae8-86ad-5030ce16cce5" |
"5db1004a-bcdc-4973-9816-124262893d21" |
"97046418-371d-41d3-a213-5e9715847a34" |
"6dc86c06-1a5c-11e9-9358-42010afa015a" |
"6dcac67c-1a5c-11e9-9358-42010afa015a" |
... |
so that I can use the data in a ForEach pipeline activity and loop through each id.
My current solution produces the expected output: each select activity following the flatten selects a specific column (one of the key-value pairs). This does not scale, because the number of select activities grows with the number of keys. I would like the solution to be dynamic, based on the keys in the JSON.
You can try the approach below, but it only works when the JSON strings contain no nested structures and the values contain no double-quote (") characters.
First, add a Derived Column transformation after your source. In it, create a new column sub_arr with the expression below.
slice(map(split(Column,'": "'),split(#item,'"')[1]),2)
This first splits the JSON string on '": "'. Then, for each resulting piece, it splits again on '"' and takes the first item. Finally, slice(..., 2) drops the first element, which is only the fragment before the first value. The result is an array of values for each JSON row.
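To make the steps easier to follow, here is a rough Python emulation of the same logic. It is not data flow code, and the function name sub_arr_for_row is made up for illustration. The sketch assumes data flow arrays are 1-based, so [1] is the first item and slice(..., 2) starts at the second element.

```python
def sub_arr_for_row(column: str) -> list[str]:
    """Emulate slice(map(split(Column, '": "'), split(#item, '"')[1]), 2)."""
    # split(Column, '": "'): cut the string at every key/value separator.
    pieces = column.split('": "')
    # split(#item, '"')[1]: keep the text before the first double quote
    # (index 1 in a 1-based data flow array is index 0 in Python).
    firsts = [piece.split('"')[0] for piece in pieces]
    # slice(..., 2): drop the first element, which is just the leading "{".
    return firsts[1:]


# Second sample row from the question.
row = (
    '{"Energy to Utility_kWh_15m": "97046418-371d-41d3-a213-5e9715847a34", '
    '"Output Energy_kWh_15m": "6dc86c06-1a5c-11e9-9358-42010afa015a", '
    '"Plant Availability_%_15m": "6dcac67c-1a5c-11e9-9358-42010afa015a"}'
)
print(sub_arr_for_row(row))
```

Because the logic relies on plain string splitting, a value that itself contains a double quote would be cut short, which is why the restriction above applies.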
Next, to combine the arrays from each row, add an Aggregate transformation and create the required res_arr column in the aggregate section with the expression below. There is no need to add any column in the Group By section.
flatten(collect(sub_arr))
This gives the expected result: a single array containing the values from all rows.
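As a rough illustration of what flatten(collect(sub_arr)) does, the Python sketch below emulates it on the two sample rows from the question. It is not data flow code, and the variable names are made up for illustration.

```python
from itertools import chain

# One sub_arr per input row, as produced by the derived column step.
sub_arrs = [
    [
        "ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6",
        "849871d1-b5f5-4ae8-86ad-5030ce16cce5",
        "5db1004a-bcdc-4973-9816-124262893d21",
    ],
    [
        "97046418-371d-41d3-a213-5e9715847a34",
        "6dc86c06-1a5c-11e9-9358-42010afa015a",
        "6dcac67c-1a5c-11e9-9358-42010afa015a",
    ],
]

# collect(sub_arr): with no Group By column, all rows fall into one group,
# so the result is a single array of arrays.
collected = list(sub_arrs)

# flatten(...): merge the nested arrays into one flat array.
res_arr = list(chain.from_iterable(collected))

for value in res_arr:
    print(value)
```

Each printed line corresponds to one id that a ForEach activity would loop over.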