I need to iterate over the ObjectHeader tags and, when ObjectType/Id equals 1424, extract all the values inside ObjectVariant/ObjectValue/Characteristic/Name and ObjectVariant/ObjectValue/PropertyValue/Value and put them in a dictionary. The expected output looks like this: {"Var1": 10.4, "Var2": 15.6}
Here is a snippet from the XML I'm working with, which has 30k lines (hint: Id 1424 appears only once in the whole XML file).
<ObjectContext>
  <ObjectHeader>
    <ObjectType>
      <Id>1278</Id>
      <Name>ID_NAME</Name>
    </ObjectType>
    <ObjectVariant>
      <ObjectValue>
        <Characteristic>
          <Name>Var1</Name>
          <Description>Something about the name</Description>
        </Characteristic>
        <PropertyValue>
          <Value>10.6</Value>
          <Description>Something about the value</Description>
        </PropertyValue>
      </ObjectValue>
    </ObjectVariant>
  </ObjectHeader>
  <ObjectHeader>
    <ObjectType>
      <Id>1424</Id>
      <Name>ID_NAME</Name>
    </ObjectType>
    <ObjectVariant>
      <ObjectValue>
        <Characteristic>
          <Name>Var1</Name>
          <Description>Something about the name</Description>
        </Characteristic>
        <PropertyValue>
          <Value>10.4</Value>
          <Description>Something about the value</Description>
        </PropertyValue>
      </ObjectValue>
      <ObjectValue>
        <Characteristic>
          <Name>Var2</Name>
          <CharacteristicType>Something about the name</CharacteristicType>
        </Characteristic>
        <PropertyValue>
          <Value>15.6</Value>
          <Description>Something about the value</Description>
        </PropertyValue>
      </ObjectValue>
    </ObjectVariant>
  </ObjectHeader>
</ObjectContext>
Here is one possibility: write everything to a pandas DataFrame and then filter for the interesting values:
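import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse("xml_to_dict.xml")
root = tree.getroot()

columns = ["id", "name", "value"]
row_list = []

for objHead in root.findall('.//ObjectHeader'):
    # Walk every descendant in document order and remember the latest Id and Name.
    for elem in objHead.iter():
        if elem.tag == 'Id':
            obj_id = elem.text  # renamed from `id` to avoid shadowing the builtin
        if elem.tag == 'Name':
            # ObjectType/Name is seen first, then overwritten by Characteristic/Name.
            name = elem.text
        if elem.tag == 'Value':
            value = elem.text
            # A Value closes one (id, name, value) triple.
            row = obj_id, name, value
            row_list.append(row)

df = pd.DataFrame(row_list, columns=columns)
# Ids are read as text, so compare against the string "1424".
dff = df.query('id == "1424"')
print("Dictionary:", dict(list(zip(dff['name'], dff['value']))))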
Output:
Dictionary: {'Var1': '10.4', 'Var2': '15.6'}
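Note that the values above come out as strings, while the question asks for floats. As an alternative, here is a minimal sketch that skips pandas and reads the matching ObjectHeader directly with ElementTree. The helper name extract_values and the inline SAMPLE string are made up for illustration, and the sketch assumes every Value under the matching header can be parsed as a float:

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the real file (hypothetical sample, same tag layout).
SAMPLE = """
<ObjectContext>
  <ObjectHeader>
    <ObjectType><Id>1278</Id><Name>ID_NAME</Name></ObjectType>
    <ObjectVariant>
      <ObjectValue>
        <Characteristic><Name>Var1</Name></Characteristic>
        <PropertyValue><Value>10.6</Value></PropertyValue>
      </ObjectValue>
    </ObjectVariant>
  </ObjectHeader>
  <ObjectHeader>
    <ObjectType><Id>1424</Id><Name>ID_NAME</Name></ObjectType>
    <ObjectVariant>
      <ObjectValue>
        <Characteristic><Name>Var1</Name></Characteristic>
        <PropertyValue><Value>10.4</Value></PropertyValue>
      </ObjectValue>
      <ObjectValue>
        <Characteristic><Name>Var2</Name></Characteristic>
        <PropertyValue><Value>15.6</Value></PropertyValue>
      </ObjectValue>
    </ObjectVariant>
  </ObjectHeader>
</ObjectContext>
"""


def extract_values(root, target_id):
    """Return {Name: float(Value)} for the first ObjectHeader whose ObjectType/Id matches."""
    for header in root.iter("ObjectHeader"):
        # findtext returns None when the path is missing, so non-matching headers are skipped.
        if header.findtext("ObjectType/Id") != target_id:
            continue
        result = {}
        for obj_value in header.findall("ObjectVariant/ObjectValue"):
            name = obj_value.findtext("Characteristic/Name")
            value = obj_value.findtext("PropertyValue/Value")
            if name is not None and value is not None:
                result[name] = float(value)  # assumes numeric text
        return result  # the Id is unique, so stop at the first match
    return {}


root = ET.fromstring(SAMPLE)
result = extract_values(root, "1424")
print(result)
```

For the real file, replace ET.fromstring(SAMPLE) with ET.parse("xml_to_dict.xml").getroot(). Because Id 1424 appears only once, returning at the first match avoids scanning the remaining headers.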