I would like to parse a string in an Azure dataflow so that I can output a JSON object and store all values in an array.
First, I read a table from Dataverse in the dataflow, and one of my columns looks like the following:
| Column |
| --- |
| {'Energy to Utility_kWh_15m': '093e2b7d-c53a-4387-8b3e-26e4e0386357', 'Output Energy_kWh_15m': '93744353-f1ab-4030-b0e9-d4dce5b169aa', 'Plant Availability_%_15m': 'd7b04c9b-e2f7-47f9-ae75-b8027627740b'} |
| {'Energy to Utility_kWh_15m': '5e6da70c-00e9-482c-8ebf-5a020daedab0', 'Output Energy_kWh_15m': '988199db-b9dc-4ae4-ab22-3a1934c4507e', 'Plant Availability_%_15m': 'b79fac11-78a7-4056-aad0-e1dd6e6820f3'} |
Each row is stored as a string.
My understanding is that I should use a parse transformation in the dataflow, but I am unsure how to parse the key-value pairs: the documented examples, e.g. trade as boolean, customers as string[], do not work because of the structure of my keys, i.e. they are not single whole words. I have tried something like the following but receive null values for all rows:
({Plant Availability_%_15m} as string, {Energy to Utility_kWh_15m} as string, {Output Energy_kWh_15m} as string)
My current dataflow ends with an array of dictionaries, which is not what I want.
Instead, what I would like is for all dictionary values across all table rows to be appended to an array, that looks like the following:
['093e2b7d-c53a-4387-8b3e-26e4e0386357', '93744353-f1ab-4030-b0e9-d4dce5b169aa', 'd7b04c9b-e2f7-47f9-ae75-b8027627740b','5e6da70c-00e9-482c-8ebf-5a020daedab0', '988199db-b9dc-4ae4-ab22-3a1934c4507e', 'b79fac11-78a7-4056-aad0-e1dd6e6820f3', ...]
This is so that outside of the dataflow, I can use a ForEach activity to loop through each value.
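To make the desired output concrete, here is a small Python sketch of the flattening I am after, using the two sample rows above (Python's `ast.literal_eval` accepts the single-quoted dict strings directly; rows trimmed to two keys for brevity):

```python
import ast

# The sample rows from the table, each cell stored as a string.
rows = [
    "{'Energy to Utility_kWh_15m': '093e2b7d-c53a-4387-8b3e-26e4e0386357', "
    "'Output Energy_kWh_15m': '93744353-f1ab-4030-b0e9-d4dce5b169aa'}",
    "{'Energy to Utility_kWh_15m': '5e6da70c-00e9-482c-8ebf-5a020daedab0', "
    "'Output Energy_kWh_15m': '988199db-b9dc-4ae4-ab22-3a1934c4507e'}",
]

# Parse each single-quoted dict string and append every value to one flat array.
all_values = []
for row in rows:
    all_values.extend(ast.literal_eval(row).values())

print(all_values)
```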
Given the structure and column names of the JSON strings in your rows, it might not be possible to generate the JSON array, because your JSON strings contain different column names in each row. To generate a JSON array, all JSON string rows should contain the same columns.
I have tried the parse transformation with the above mapping and the same input, and got the same null values as you.
The reason for the null values is that your JSON uses single quotes instead of double quotes, so the parse mapping is not able to recognize these rows.
So, add a derived column transformation and use the below expression to replace the single quotes (') with double quotes (").
replace({Column Name}, "'", '"')
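The effect of this derived column can be sketched in Python: a strict JSON parser rejects the single-quoted string (which is why the parse transformation returned nulls), while the same string parses fine after the quote replacement. The sample value is one of the GUID entries from the question:

```python
import json

# A sample cell value as stored in Dataverse: single-quoted, so not valid JSON.
row = "{'Energy to Utility_kWh_15m': '093e2b7d-c53a-4387-8b3e-26e4e0386357'}"

# A strict JSON parser rejects single quotes, mirroring the null values
# the parse transformation produced.
try:
    json.loads(row)
except json.JSONDecodeError:
    print("single quotes are not valid JSON")

# The derived column expression replace({Column Name}, "'", '"') is
# equivalent to this string replacement:
fixed = row.replace("'", '"')
parsed = json.loads(fixed)
print(parsed["Energy to Utility_kWh_15m"])
```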
Now, use the parse transformation with your required columns. If a given column name is not present in the JSON string, it will be given null values, and that particular column will not be added to the generated JSON file.
As I mentioned above, this won't generate the JSON array. To generate the JSON string as an object with the required keys, use the below expression in the parse mapping.
({Plant Availability_%_15m} as string, {Energy to Utility_kWh_15m} as string)
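The null-for-missing-keys behavior of this mapping can be sketched in Python: each listed key is looked up in the parsed row, and keys absent from the row come back as `None` (null). The sample row and GUIDs are taken from the question; the two-key mapping matches the expression above:

```python
# A parsed row that happens to lack the 'Energy to Utility_kWh_15m' key.
row = {
    "Plant Availability_%_15m": "d7b04c9b-e2f7-47f9-ae75-b8027627740b",
    "Output Energy_kWh_15m": "93744353-f1ab-4030-b0e9-d4dce5b169aa",
}

# The keys requested in the parse mapping.
mapping = ["Plant Availability_%_15m", "Energy to Utility_kWh_15m"]

# Keys missing from the row become None, mirroring the parse
# transformation's null values for absent columns.
obj = {k: row.get(k) for k in mapping}
print(obj)
```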
The generated JSON file:
To get the JSON array, your JSON strings should look like the below.
| Column Name |
| --- |
| {'Plant Availability_%_15m': 'string_id', 'Output Energy_kWh_15m': 'string_id', ... } |
| {'Plant Availability_%_15m': 'string_id', 'Output Energy_kWh_15m': 'string_id', ... } |
After the derived column step, use an aggregate transformation with the below expression for a new column. Leave the group by section empty.
replace(replace(replace(toString(collect({Column Name})),'"{','{'),'}"','}'),'\\','')
After this step, it will give a single row containing the JSON array as a string. Now, use a parse transformation on this column (jsonstr) with the below expression.
({Plant Availability_%_15m} as string, {Output Energy_kWh_15m} as string)[]
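The aggregate-and-replace chain above can be sketched in Python, under the assumption that `toString(collect(...))` stringifies the collected rows as an array of quoted, backslash-escaped strings (approximated here with `json.dumps`). The three nested `replace()` calls then strip the quoting and escaping so the result is a plain JSON array of objects:

```python
import json

# Two rows after the single->double quote derived column step,
# using GUIDs from the question's sample data.
rows = [
    '{"Plant Availability_%_15m": "d7b04c9b-e2f7-47f9-ae75-b8027627740b", '
    '"Output Energy_kWh_15m": "93744353-f1ab-4030-b0e9-d4dce5b169aa"}',
    '{"Plant Availability_%_15m": "b79fac11-78a7-4056-aad0-e1dd6e6820f3", '
    '"Output Energy_kWh_15m": "988199db-b9dc-4ae4-ab22-3a1934c4507e"}',
]

# Assumption: toString(collect(...)) yields the rows as an array of
# escaped strings, e.g. ["{\"k\":\"v\"}", ...]; json.dumps approximates this.
collected = json.dumps(rows)

# The three nested replace() calls strip the stringification artifacts:
# unwrap "{ and }" around each object, then drop the escape backslashes.
jsonstr = collected.replace('"{', '{').replace('}"', '}').replace('\\', '')

# jsonstr is now a JSON array of objects that the parse transformation can read.
parsed = json.loads(jsonstr)
print(parsed)
```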
The JSON array will be generated as shown below after the parse transformation.