
How to use Python to parse XML and find required custom fields


I've got a directory full of Salesforce objects in XML format. I'd like to identify the <fullName> and parent file of all the custom <fields> where <required> is true. Here is some truncated sample data; let's call it "Custom_Object__c":

<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
<deprecated>false</deprecated>
<description>descriptiontext</description>
<fields>
    <fullName>custom_field1</fullName>
    <required>false</required>
    <type>Text</type>
    <unique>false</unique>
</fields>
<fields>
    <fullName>custom_field2</fullName>
    <deprecated>false</deprecated>
    <visibleLines>5</visibleLines>
</fields>
<fields>
    <fullName>custom_field3</fullName>
    <required>false</required>
</fields>
<fields>
    <fullName>custom_field4</fullName>
    <deprecated>false</deprecated>
    <description>custom field 4 description</description>
    <externalId>true</externalId>
    <required>true</required>
    <scale>0</scale>
    <type>Number</type>
    <unique>false</unique>
</fields>
<fields>
    <fullName>custom_field5</fullName>
    <deprecated>false</deprecated>
    <description>Creator of this log message. Application-specific.</description>
    <externalId>true</externalId>
    <label>Origin</label>
    <length>255</length>
    <required>true</required>
    <type>Text</type>
    <unique>false</unique>
</fields>
<label>App Log</label>
<nameField>
    <displayFormat>LOG-{YYYYMMDD}-{00000000}</displayFormat>
    <label>Entry ID</label>
    <type>AutoNumber</type>
</nameField>
</CustomObject>

The desired output would be a dictionary with a format something like:

required_fields =  {'Custom_Object__1': 'custom_field4', 'Custom_Object__1': 'custom_field5',... etc for all the required fields in all files in the folder.}

or anything similar.

I've already gotten my list of objects through glob.glob, and I can get a list of all the children and their attributes with ElementTree but I'm struggling past there. I feel like I'm very close but I'd love a hand finishing this task off. Here is my code so far:

import os
import glob
import xml.etree.ElementTree as ET

os.chdir("/Users/paulsallen/workspace/fforce/FForce Dev Account/config/objects/")
objs = []


for file in glob.glob("*.object"):
    objs.append(file)

fields_dict = {}

for filename in objs:
    # Parse each file in turn (not the whole list), and avoid shadowing the built-in "object".
    root = ET.parse(filename).getroot()

....

and once I get the XML data parsed I don't know where to take it from there.


Solution

  • You really want to switch to using lxml here, because then you can use an XPath query:

    import glob
    import os

    from lxml import etree as ET
    
    os.chdir("/Users/paulsallen/workspace/fforce/FForce Dev Account/config/objects/")
    objs = glob.glob("*.object")
    
    fields_dict = {}
    
    for filename in objs:
        root = ET.parse(filename).getroot()
        # Select the text of every fullName whose parent also has a required child equal to "true".
        # root.nsmap[None] is the document's default namespace URI.
        required = root.xpath('.//n:fullName[../n:required/text()="true"]/text()',
            namespaces={'n': root.nsmap[None]})
        fields_dict[os.path.splitext(filename)[0]] = required
    

    With that code you end up with a dictionary of lists: each key is a filename (without the extension), and each value is a list of the names of that file's required fields.

    The XPath query looks for fullName elements in the default namespace that have a sibling required element whose text is 'true'. It then takes the text of each matching element, which gives a list we can store in the dictionary.
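
If installing lxml is not an option, roughly the same selection can be sketched with the standard library's xml.etree.ElementTree, which the question already imports. This is a minimal sketch and not the answer's method: the helper names required_field_names and collect_required_fields are hypothetical, the namespace URI is copied from the sample file's xmlns attribute, and unlike the XPath query above it only inspects fields elements.

```python
import glob
import os
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns attribute of the sample file (assumed
# to be the same for every *.object file in the directory).
NS = {"n": "http://soap.sforce.com/2006/04/metadata"}


def required_field_names(root):
    """Return the fullName text of each fields element whose required child is 'true'."""
    names = []
    for field in root.findall(".//n:fields", NS):
        # findtext returns None when the child is missing (e.g. custom_field2),
        # so fields without a required element are skipped.
        if field.findtext("n:required", namespaces=NS) == "true":
            names.append(field.findtext("n:fullName", namespaces=NS))
    return names


def collect_required_fields(directory):
    """Map each object file name (without extension) to its required field names."""
    result = {}
    for path in glob.glob(os.path.join(directory, "*.object")):
        key = os.path.splitext(os.path.basename(path))[0]
        result[key] = required_field_names(ET.parse(path).getroot())
    return result
```

Calling collect_required_fields on the objects directory should give the same dictionary-of-lists shape as the lxml version; for the sample data in the question, the list for that file would contain custom_field4 and custom_field5.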