I'm trying to unnest array struct column in python. Datasource is a json file. I Would like to split tag column such that each key becomes a column and each value becomes its respective value for the entry. If a particular entry doesn't have a tag then corresponding value should be blank. Any suggestions on how to achieve this will be very helpful, TIA!
Current data structure:
{"Account ID": "101356", "AccountName": "Account1", "Tags": [{"Key": "AppGroupEmail", "Value": "testteam"}, {"Key": "SNOW", "Value": "TASK21002089571"}, {"Key": "ClientID", "Value": "220001"}]}
{"Account ID": "101357", "AccountName": "Account2", "Tags": [{"Key": "ClientID", "Value": "220007"}, {"Key": "BuildTicket", "Value": "TM21001412480"}, {"Key": "AccountType", "Value": "Dev"}]}
Expected outcome:
This code re-writes what appears to be JSON Lines-formatted data file to a CSV with the columns described:
import json
import csv
# Assumptions:
# 1. "Account ID", "AccountName", and "Tags" are keys in every JSON object.
# 2. "Tags" keys are unknown.
# 3. Input file can be completely loaded into memory in order to collect
# other column names from "Tags".
with open('input.jsonl') as file:
columns = ['Account ID', 'AccountName']
records = []
for line in file:
data = json.loads(line)
for d in data['Tags'].copy(): # original data is modified with new columns.
key, value = d['Key'], d['Value']
data[key] = value # Add "Tag" keys to data dict
if key not in columns:
columns.append(key) # collect "Tag" keys as column headers in order found.
del data['Tags'] # Unneeded now
records.append(data) # save the updated record
with open('output.csv', 'w', encoding='ascii', newline='') as file:
writer = csv.DictWriter(file, fieldnames=columns)
writer.writeheader()
writer.writerows(records)
output.csv:
Account ID,AccountName,AppGroupEmail,SNOW,ClientID,BuildTicket,AccountType
101356,Account1,testteam,TASK21002089571,220001,,
101357,Account2,,,220007,TM21001412480,Dev
Opened in Excel: