python, xml, xml.etree

How to extract specific values from an XML file using Python's xml.etree.ElementTree, iterating until an Id is found inside a nested child node?


I need to iterate over the ObjectHeader tags, and when ObjectType/Id equals 1424, extract all the values in the tags ObjectVariant/ObjectValue/Characteristic/Name and ObjectVariant/ObjectValue/PropertyValue/Value and put them in a dictionary. The expected output is: {"Var1": 10.4, "Var2": 15.6}

Here is a snippet of the XML I'm working with; the full file has 30k lines (hint: Id 1424 appears only once in the whole file).

<ObjectContext>
    <ObjectHeader>
        <ObjectType>
            <Id>1278</Id>
            <Name>ID_NAME</Name>
        </ObjectType>
        <ObjectVariant>
            <ObjectValue>
                <Characteristic>
                    <Name>Var1</Name>
                    <Description>Something about the name</Description>
                </Characteristic>
                <PropertyValue>
                    <Value>10.6</Value>
                    <Description>Something about the value</Description>
                </PropertyValue>
            </ObjectValue>
        </ObjectVariant>
    </ObjectHeader>
    <ObjectHeader>
        <ObjectType>
            <Id>1424</Id>
            <Name>ID_NAME</Name>
        </ObjectType>
        <ObjectVariant>
            <ObjectValue>
                <Characteristic>
                    <Name>Var1</Name>
                    <Description>Something about the name</Description>
                </Characteristic>
                <PropertyValue>
                    <Value>10.4</Value>
                    <Description>Something about the value</Description>
                </PropertyValue>
            </ObjectValue>
            <ObjectValue>
                <Characteristic>
                    <Name>Var2</Name>
                    <CharacteristicType>Something about the name</CharacteristicType>
                </Characteristic>
                <PropertyValue>
                    <Value>15.6</Value>
                    <Description>Something about the value</Description>
                </PropertyValue>
            </ObjectValue>
        </ObjectVariant>
    </ObjectHeader>
</ObjectContext> 
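A minimal sketch of the lookup described above using only the standard library, assuming every ObjectValue holds one Characteristic/Name and one PropertyValue/Value. It relies on ElementTree's limited XPath support: `[Id='1424']` keeps an ObjectType whose Id child has that text, and `..` steps up to the parent ObjectHeader. The function name `extract_values` and the trimmed `SAMPLE_XML` string are illustrative, not part of the original post; values go through `float()` to match the expected output.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the snippet above (Description elements dropped).
# With the real file, use ET.parse(path).getroot() instead.
SAMPLE_XML = """<ObjectContext>
  <ObjectHeader>
    <ObjectType><Id>1278</Id><Name>ID_NAME</Name></ObjectType>
    <ObjectVariant>
      <ObjectValue>
        <Characteristic><Name>Var1</Name></Characteristic>
        <PropertyValue><Value>10.6</Value></PropertyValue>
      </ObjectValue>
    </ObjectVariant>
  </ObjectHeader>
  <ObjectHeader>
    <ObjectType><Id>1424</Id><Name>ID_NAME</Name></ObjectType>
    <ObjectVariant>
      <ObjectValue>
        <Characteristic><Name>Var1</Name></Characteristic>
        <PropertyValue><Value>10.4</Value></PropertyValue>
      </ObjectValue>
      <ObjectValue>
        <Characteristic><Name>Var2</Name></Characteristic>
        <PropertyValue><Value>15.6</Value></PropertyValue>
      </ObjectValue>
    </ObjectVariant>
  </ObjectHeader>
</ObjectContext>"""


def extract_values(root, object_id):
    """Return {Characteristic/Name: float(PropertyValue/Value)} for the
    ObjectHeader whose ObjectType/Id equals object_id, or {} if none does."""
    # Select the ObjectType whose <Id> child has the given text, then step up
    # to its parent ObjectHeader with "..". find() returns the first match,
    # which is enough because the Id appears only once in the file.
    header = root.find(f".//ObjectType[Id='{object_id}']/..")
    if header is None:
        return {}
    result = {}
    for obj_value in header.findall("./ObjectVariant/ObjectValue"):
        name = obj_value.findtext("./Characteristic/Name")
        value = obj_value.findtext("./PropertyValue/Value")
        if name is not None and value is not None:
            result[name] = float(value)
    return result


root = ET.fromstring(SAMPLE_XML)
print(extract_values(root, "1424"))  # {'Var1': 10.4, 'Var2': 15.6}
```

Because the search is anchored on ObjectType/Id, the Name elements under ObjectType never mix with the Characteristic names.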


Solution

  • One possibility is to load everything into a pandas DataFrame and then filter for the values of interest:

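    import pandas as pd
    import xml.etree.ElementTree as ET
    
    tree = ET.parse("xml_to_dict.xml")
    root = tree.getroot()
    
    columns = ["id", "name", "value"]
    row_list = []
    for objHead in root.findall('.//ObjectHeader'):
        # iter() walks the header in document order: ObjectType/Id comes first,
        # and each Characteristic/Name overwrites ObjectType/Name before its Value.
        obj_id = name = None  # reset per header; also avoids shadowing builtin id()
        for elem in objHead.iter():
            if elem.tag == 'Id':
                obj_id = elem.text
            elif elem.tag == 'Name':
                name = elem.text
            elif elem.tag == 'Value':
                # One row per Value, tagged with the enclosing header's Id.
                row_list.append((obj_id, name, elem.text))
    
    df = pd.DataFrame(row_list, columns=columns)
    dff = df.query('id == "1424"')  # Id is kept as text, so compare with a string
    
    # Values are strings here; wrap them in float() if numbers are needed.
    print("Dictionary:", dict(zip(dff['name'], dff['value'])))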
    

    Output:

    Dictionary: {'Var1': '10.4', 'Var2': '15.6'}
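Since Id 1424 appears only once, a streaming variant can stop as soon as the matching ObjectHeader has been parsed, instead of loading all 30k lines into a DataFrame. The sketch below uses `ET.iterparse` with "end" events; the function name `extract_values_streaming` is hypothetical, and it assumes the element names from the question with no XML namespaces. Values are converted with `float()` to match the expected output in the question.

```python
import io
import xml.etree.ElementTree as ET


def extract_values_streaming(source, object_id):
    """Return {Characteristic/Name: float(PropertyValue/Value)} for the first
    ObjectHeader whose ObjectType/Id equals object_id, or {} if none matches.

    `source` may be a file path or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # An "end" event fires once an element and its whole subtree are parsed.
        if elem.tag != "ObjectHeader":
            continue
        if elem.findtext("./ObjectType/Id") == object_id:
            result = {}
            for obj_value in elem.findall("./ObjectVariant/ObjectValue"):
                name = obj_value.findtext("./Characteristic/Name")
                value = obj_value.findtext("./PropertyValue/Value")
                if name is not None and value is not None:
                    result[name] = float(value)
            # The Id is unique, so stop reading the rest of the file here.
            return result
        # Not the wanted header: drop its children to keep memory usage low.
        elem.clear()
    return {}


# Trimmed copy of the question's snippet (Description elements dropped).
data = b"""<ObjectContext>
  <ObjectHeader>
    <ObjectType><Id>1278</Id><Name>ID_NAME</Name></ObjectType>
    <ObjectVariant><ObjectValue>
      <Characteristic><Name>Var1</Name></Characteristic>
      <PropertyValue><Value>10.6</Value></PropertyValue>
    </ObjectValue></ObjectVariant>
  </ObjectHeader>
  <ObjectHeader>
    <ObjectType><Id>1424</Id><Name>ID_NAME</Name></ObjectType>
    <ObjectVariant>
      <ObjectValue>
        <Characteristic><Name>Var1</Name></Characteristic>
        <PropertyValue><Value>10.4</Value></PropertyValue>
      </ObjectValue>
      <ObjectValue>
        <Characteristic><Name>Var2</Name></Characteristic>
        <PropertyValue><Value>15.6</Value></PropertyValue>
      </ObjectValue>
    </ObjectVariant>
  </ObjectHeader>
</ObjectContext>"""

print(extract_values_streaming(io.BytesIO(data), "1424"))  # {'Var1': 10.4, 'Var2': 15.6}
```

With a real file, the path can be passed directly, e.g. `extract_values_streaming("xml_to_dict.xml", "1424")`.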