xmlsoapxml-parsingsoap-client

Parsing XML in Python to populate common values


After trying to traverse values for my XML Payload, i am hitting dead-end.

I have a SOAP API XML response, which has common value's coming inside main record. I am able to extract records, but not able to populate common value's.

Can someone please guide me on an approach. Want to populate Field id="3" and "4" to all the records under the nest which is levelId="202".

Sharing my Current code below and full API output can be found at -> Sample API Response.

<Record contentId="28503182" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
    <Record contentId="28503182" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
        <Record contentId="28739688" levelId="62" levelGuid="b08s5b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
            <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69141</Field>
            <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Priviion</Field>
        </Record>
        <Record contentId="28739790" levelId="62" levelGuid="b085bs230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
            <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69155</Field>
            <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Accedress</Field>
        </Record>
        <Record contentId="28739790" levelId="62" levelGuid="b08d5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
            <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">8548</Field>
            <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Apss</Field>
        </Record>   
        <Record contentId="28739790" levelId="62" levelGuid="bss5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
            <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">848</Field>
            <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Bions</Field>
        </Record>           
        <Field id="3" guid="422cdb8e-1013-454a-b048-3cc630163a6e" type="9">
            <Reference id="28151495">Class 2</Reference>
        </Field>
        <Field id="4" guid="7bcacb1c-4e3f-4598-a904-e2e0ae2be97d" type="1">6589</Field>
    </Record>
</Records>

Code as of now:

    # Extract data from XML and create a list of dictionaries
    records = []
    for record in root.findall('Record'):
        record_data = {}
        for field in record.findall('Field'):
            field_id = int(field.get('id'))
            if field_id in column_order:
                record_data[field_id] = field.text
        records.append(record_data)

    # Create a DataFrame from the records
    df = pd.DataFrame(records)

    # Reorder the columns based on the desired order
    df = df[column_order]

    # Save DataFrame to a CSV file with encoded characters
    df.to_csv('output.csv', encoding='utf-8-sig', index=False)

The current logic is able to extract values for id = 1 and 2, populates id 3 and 4 to the first record instead of all the records in that group

Current output looks like

ID,Name,Class,Code
68332,abc,Class 1,165664
68333,someothername,
69141,Priviion,Class 2,6589
69155,Accedress,
8548,Apss,
848,Bions,

instead of

ID,Name,Class,Code
68332,abc,Class 1,165664
68333,someothername,Class 1,165664
69141,Priviion,Class 2,6589
69155,Accedress,Class 2,6589
8548,Apss,Class 2,6589
848,Bions,Class 2,6589

Solution

  • The tags in your xmls seems to be a little off. From what I could understand, I combined both. Here I have used xml.etree.ElementTree and numpy to create DataFrame. You may use other than numpy.

    import xml.etree.ElementTree as ET
    import numpy as np
    import pandas as pd
    
    
    xml_text = '''<?xml version="1.0" encoding="utf-16"?>
    <Records count="5">
        <Metadata>
            <FieldDefinitions>
                <FieldDefinition id="1" guid="85a5f539-fd41-4682-bda3-588bdcdacca8" name="ID" alias="S_ID" />
                <FieldDefinition id="2" guid="b2bf7611-6006-443c-9872-3e8067255104" name="Name" alias="Name_Updated" />
                <FieldDefinition id="3" guid="b012f524-fb9c-44e2-88b0-904407f19ad0" name="Class" alias="common_class" />
                <FieldDefinition id="4" guid="b012f524-fb9c-44e2-88b0-904407f19ad0" name="Code" alias="code" />
            </FieldDefinitions>
        </Metadata>
        <LevelCounts>
            <LevelCount id="202" guid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" count="1746" />
            <LevelCount id="62" guid="b085b230-e20f-41df-a849-f5d6811447ea" count="9809" />
        </LevelCounts>
        <Record contentId="28111640" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
            <Record contentId="28618177" levelId="62" levelGuid="b085b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
                <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">68332</Field>
                <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">abc</Field>
            </Record>
            <Record contentId="28618183" levelId="62" levelGuid="b085b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
                <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">68333</Field>
                <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">someothername</Field>
            </Record>
            <Field id="3" guid="422cdb8e-1013-454a-b048-3cc630163a6e" type="9">
                <Reference id="28151495">Class 1</Reference>
            </Field>
            <Field id="4" guid="7bcacb1c-4e3f-4598-a904-e2e0ae2be97d" type="1">165664</Field>
        </Record>
        
        <Record contentId="28503182" levelId="202" levelGuid="34df2128-c36c-4017-b6c3-de09dbb8bd0b" moduleId="400" parentId="0">
            <Record contentId="28739688" levelId="62" levelGuid="b08s5b230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
                <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69141</Field>
                <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Priviion</Field>
            </Record>
            <Record contentId="28739790" levelId="62" levelGuid="b085bs230-e20f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
                <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">69155</Field>
                <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Accedress</Field>
            </Record>
            <Record contentId="28739790" levelId="62" levelGuid="b08d5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
                <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">8548</Field>
                <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Apss</Field>
            </Record>   
            <Record contentId="28739790" levelId="62" levelGuid="bss5b230-es0f-41df-a849-f5d6811447ea" moduleId="167" parentId="0">
                <Field id="1" guid="a69370c5-b6a7-4e20-a073-dd6bbd131e43" type="6">848</Field>
                <Field id="2" guid="43dddeeb-052f-4f29-a179-28e3f9406852" type="1">Bions</Field>
            </Record>           
            <Field id="3" guid="422cdb8e-1013-454a-b048-3cc630163a6e" type="9">
                <Reference id="28151495">Class 2</Reference>
            </Field>
            <Field id="4" guid="7bcacb1c-4e3f-4598-a904-e2e0ae2be97d" type="1">6589</Field>
        </Record>
    </Records>
    '''
    
    root = ET.fromstring(xml_text)
    records = []
    # searching for all <Record> in <Records> which is also root
    for record in root.findall('Record'):
        # searching for common class and code in <Record>
        common_class = record.find('.//Field[@id="3"]//Reference').text
        code = record.find('.//Field[@id="4"]').text
        # searching for id and name in <Record> inside <Record>
        for r in record.findall('Record'):
            id = r.find('.//Field[@id="1"]').text
            name = r.find('.//Field[@id="2"]').text
            # appending to records
            records.append([id, name, common_class, code])
    # creating dataframe with records list and columns
    df = pd.DataFrame(np.array(records), columns=['ID', 'Name', 'Class', 'Code'])
    print(df)
    

    Output:

          ID           Name    Class    Code
    0  68332            abc  Class 1  165664
    1  68333  someothername  Class 1  165664
    2  69141       Priviion  Class 2    6589
    3  69155      Accedress  Class 2    6589
    4   8548           Apss  Class 2    6589
    5    848          Bions  Class 2    6589