python-3.x xsd xmlschema

Extract enumerations with documentation from XSD file in Python


I'm trying to write a function that returns the description of a given value from an XSD file with a structure like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
    <xs:element name="File">
        <xs:annotation>
            <xs:documentation>List of centers</xs:documentation>
        </xs:annotation>
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" name="Register">
                    <xs:annotation>
                        <xs:documentation>Center list registers</xs:documentation>
                    </xs:annotation>
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="CenterType">
                                <xs:annotation>
                                    <xs:documentation>Type of center  </xs:documentation>
                                </xs:annotation>
                                <xs:simpleType>
                                    <xs:restriction base="xs:int">
                                        <xs:totalDigits value="1"/>
                                        <xs:enumeration value="1">
                                            <xs:annotation>
                                                <xs:documentation>Own center</xs:documentation>
                                            </xs:annotation>
                                        </xs:enumeration>
                                        <xs:enumeration value="2">
                                            <xs:annotation>
                                                <xs:documentation>External center</xs:documentation>
                                            </xs:annotation>
                                        </xs:enumeration>
                                        <xs:enumeration value="3">
                                            <xs:annotation>
                                                <xs:documentation>Associated center</xs:documentation>
                                            </xs:annotation>
                                        </xs:enumeration>
                                        <xs:enumeration value="4">
                                            <xs:annotation>
                                                <xs:documentation>Other</xs:documentation>
                                            </xs:annotation>
                                        </xs:enumeration>
                                    </xs:restriction>
                                </xs:simpleType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

For example, if I call

get_value("CenterType", "1")

my function should return "Own center".

I'm using Python 3.8 with the xmlschema package.

I wrote this snippet, which prints the tag of every element:

import xmlschema

xsd_xml = xmlschema.XMLSchema(xsd_file)
fichero = xsd_xml.elements["File"][0]

for elem in fichero:
    print(elem.tag)

But I need to access the enumeration and documentation fields. How can I extract this data?


Solution

  • In the end, I solved my problem using lxml and the XML Schema namespace:

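    from lxml import etree as ET  # assumed alias; the original snippet did not show its import

    XS = "{http://www.w3.org/2001/XMLSchema}"

    def get_value(self, field: str, code: str, file: str) -> str:
        """Return the documentation of enumeration `code` in element `field`, or "N/A"."""
        xsd_xml = ET.parse(file)

        # Locate the xs:element declaration for the requested field.
        element = xsd_xml.find(f".//{XS}element[@name='{field}']")
        if element is None:
            return "N/A"

        # Locate the xs:enumeration facet with the requested value.
        enumeration = element.find(f".//{XS}enumeration[@value='{code}']")
        if enumeration is None:
            return "N/A"

        # The description is stored in xs:annotation/xs:documentation.
        documentation = enumeration.find(f".//{XS}documentation")
        if documentation is None or documentation.text is None:
            return "N/A"
        return documentation.text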
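
If several codes need to be looked up, it may be cheaper to parse the schema once and build a dictionary of all enumeration values for a field. The following is a minimal sketch of that idea using only the standard library's xml.etree.ElementTree instead of lxml. The helper name enumeration_docs and the trimmed SAMPLE schema are hypothetical and exist only for illustration. As in the function above, the search covers every xs:enumeration nested anywhere under the named element.

```python
import io
import xml.etree.ElementTree as ET

# Clark-notation prefix for the XML Schema namespace.
XS = "{http://www.w3.org/2001/XMLSchema}"


def enumeration_docs(xsd_source, field):
    """Return {enumeration value: documentation text} for the element named `field`.

    `xsd_source` is a file path or a file-like object (hypothetical helper).
    """
    root = ET.parse(xsd_source).getroot()
    element = root.find(f".//{XS}element[@name='{field}']")
    if element is None:
        return {}

    docs = {}
    # iter() walks all descendants, so nested restrictions are included too.
    for enum in element.iter(f"{XS}enumeration"):
        doc = enum.find(f"{XS}annotation/{XS}documentation")
        text = doc.text if doc is not None and doc.text else ""
        docs[enum.get("value")] = text.strip()
    return docs


# Trimmed-down sample schema, for illustration only.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="CenterType">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:enumeration value="1">
          <xs:annotation><xs:documentation>Own center</xs:documentation></xs:annotation>
        </xs:enumeration>
        <xs:enumeration value="2">
          <xs:annotation><xs:documentation>External center</xs:documentation></xs:annotation>
        </xs:enumeration>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""

docs = enumeration_docs(io.StringIO(SAMPLE), "CenterType")
print(docs.get("1", "N/A"))  # Own center
print(docs.get("9", "N/A"))  # N/A
```

The values are kept as strings because that is how they appear in the value attribute, so a lookup with the integer 1 would not match.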