After trying to traverse values for my XML Payload, i am hitting dead-end.
I have a SOAP API XML response, which has common value's coming inside main record. I am able to extract records, but not able to populate common value's.
Can someone please guide me on an approach. Want to populate Field id="3" and "4" to all the records under the nest which is levelId="202".
Sharing my Current code below and full API output can be found at -> Sample API Response.
<Record contentId="28503182" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
<Record contentId="28503182" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
<Record contentId="28739688" levelId="62" levelGuid="b08s5b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69141</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Priviion</Field>
</Record>
<Record contentId="28739790" levelId="62" levelGuid="b085bs230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69155</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Accedress</Field>
</Record>
<Record contentId="28739790" levelId="62" levelGuid="b08d5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">8548</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Apss</Field>
</Record>
<Record contentId="28739790" levelId="62" levelGuid="bss5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">848</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Bions</Field>
</Record>
<Field id="3" guid="422cdb8e-1013-454a-b048-3cc630163a6e" type="9">
<Reference id="28151495">Class 2</Reference>
</Field>
<Field id="4" guid="7bcacb1c-4e3f-4598-a904-e2e0ae2be97d" type="1">6589</Field>
</Record>
</Records>
Code as of now:
# Extract data from XML and create a list of dictionaries
records = []
for record in root.findall('Record'):
record_data = {}
for field in record.findall('Field'):
field_id = int(field.get('id'))
if field_id in column_order:
record_data[field_id] = field.text
records.append(record_data)
# Create a DataFrame from the records
df = pd.DataFrame(records)
# Reorder the columns based on the desired order
df = df[column_order]
# Save DataFrame to a CSV file with encoded characters
df.to_csv('output.csv', encoding='utf-8-sig', index=False)
The current logic is able to extract values for id = 1 and 2, populates id 3 and 4 to the first record instead of all the records in that group
Current output looks like
ID,Name,Class,Code
68332,abc,Class 1,165664
68333,someothername,
69141,Priviion,Class 2,6589
69155,Accedress,
8548,Apss,
848,Bions,
instead of
ID,Name,Class,Code
68332,abc,Class 1,165664
68333,someothername,Class 1,165664
69141,Priviion,Class 2,6589
69155,Accedress,Class 2,6589
8548,Apss,Class 2,6589
848,Bions,Class 2,6589
The tags in your xmls seems to be a little off. From what I could understand, I combined both. Here I have used xml.etree.ElementTree
and numpy
to create DataFrame. You may use other than numpy.
import xml.etree.ElementTree as ET
import numpy as np
import pandas as pd
xml_text = '''<?xml version="1.0" encoding="utf-16"?>
<Records count="5">
<Metadata>
<FieldDefinitions>
<FieldDefinition id="1" guid="85a5f539-fd41-4682-bda3-588bdcdacca8" name="ID" alias="S_ID" />
<FieldDefinition id="2" guid="b2bf7611-6006-443c-9872-3e8067255104" name="Name" alias="Name_Updated" />
<FieldDefinition id="3" guid="b012f524-fb9c-44e2-88b0-904407f19ad0" name="Class" alias="common_class" />
<FieldDefinition id="4" guid="b012f524-fb9c-44e2-88b0-904407f19ad0" name="Code" alias="code" />
</FieldDefinitions>
</Metadata>
<LevelCounts>
<LevelCount id="202" guid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" count="1746" />
<LevelCount id="62" guid="b085b230-e20f-41df-a849-f5d6811447ea" count="9809" />
</LevelCounts>
<Record contentId="28111640" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
<Record contentId="28618177" levelId="62" levelGuid="b085b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">68332</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">abc</Field>
</Record>
<Record contentId="28618183" levelId="62" levelGuid="b085b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">68333</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">someothername</Field>
</Record>
<Field id="3" guid="422cdb8e-1013-454a-b048-3cc630163a6e" type="9">
<Reference id="28151495">Class 1</Reference>
</Field>
<Field id="4" guid="7bcacb1c-4e3f-4598-a904-e2e0ae2be97d" type="1">165664</Field>
</Record>
<Record contentId="28503182" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
<Record contentId="28739688" levelId="62" levelGuid="b08s5b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69141</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Priviion</Field>
</Record>
<Record contentId="28739790" levelId="62" levelGuid="b085bs230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69155</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Accedress</Field>
</Record>
<Record contentId="28739790" levelId="62" levelGuid="b08d5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">8548</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Apss</Field>
</Record>
<Record contentId="28739790" levelId="62" levelGuid="bss5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
<Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">848</Field>
<Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Bions</Field>
</Record>
<Field id="3" guid="422cdb8e-1013-454a-b048-3cc630163a6e" type="9">
<Reference id="28151495">Class 2</Reference>
</Field>
<Field id="4" guid="7bcacb1c-4e3f-4598-a904-e2e0ae2be97d" type="1">6589</Field>
</Record>
</Records>
'''
root = ET.fromstring(xml_text)
records = []
# searching for all <Record> in <Records> which is also root
for record in root.findall('Record'):
# searching for common class and code in <Record>
common_class = record.find('.//Field[@id="3"]//Reference').text
code = record.find('.//Field[@id="4"]').text
# searching for id and name in <Record> inside <Record>
for r in record.findall('Record'):
id = r.find('.//Field[@id="1"]').text
name = r.find('.//Field[@id="2"]').text
# appending to records
records.append([id, name, common_class, code])
# creating dataframe with records list and columns
df = pd.DataFrame(np.array(records), columns=['ID', 'Name', 'Class', 'Code'])
print(df)
Output:
ID Name Class Code
0 68332 abc Class 1 165664
1 68333 someothername Class 1 165664
2 69141 Priviion Class 2 6589
3 69155 Accedress Class 2 6589
4 8548 Apss Class 2 6589
5 848 Bions Class 2 6589