Tags: python, xml, python-3.x, soap, zeep

How to parse SOAP XML with Python?


Goal: Get the values inside <Name> tags and print them out. Simplified XML below.

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <soap:Body>
      <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
         <GetStartEndPointResult>
            <Code>0</Code>
            <Message />
            <StartPoints>
               <Point>
                  <Id>545</Id>
                  <Name>Get Me</Name>
                  <Type>sometype</Type>
                  <X>333</X>
                  <Y>222</Y>
               </Point>
               <Point>
                  <Id>634</Id>
                  <Name>Get me too</Name>
                  <Type>sometype</Type>
                  <X>555</X>
                  <Y>777</Y>
               </Point>
            </StartPoints>
         </GetStartEndPointResult>
      </GetStartEndPointResponse>
   </soap:Body>
</soap:Envelope>

Attempt:

import requests
from xml.etree import ElementTree

response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')

# XML parsing here
dom = ElementTree.fromstring(response.text)
names = dom.findall('*/Name')
for name in names:
    print(name.text)

I have seen zeep recommended for parsing SOAP XML, but I found it hard to get my head around.


Solution

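  • The issue is the XML namespaces: every element in the response belongs to one, so an unqualified path such as */Name matches nothing. Qualify each step of the path instead: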

    import requests
    from xml.etree import ElementTree
    
    response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')
    
    # define namespace mappings to use as shorthand below
    namespaces = {
        'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
        'a': 'http://www.etis.fskab.se/v1.0/ETISws',
    }
    dom = ElementTree.fromstring(response.content)
    
    # reference the namespace mappings here by `<name>:`
    names = dom.findall(
        './soap:Body'
        '/a:GetStartEndPointResponse'
        '/a:GetStartEndPointResult'
        '/a:StartPoints'
        '/a:Point'
        '/a:Name',
        namespaces,
    )
    for name in names:
        print(name.text)
    

    The namespaces come from the xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" and xmlns="http://www.etis.fskab.se/v1.0/ETISws" attributes on the Envelope and GetStartEndPointResponse nodes respectively.

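    Keep in mind that a default namespace (one declared as xmlns="..." with no prefix) applies to the element that declares it and to all of its unprefixed descendants, even though their tags are not written as <namespace:tag>. That is why every step below GetStartEndPointResponse needs the a: prefix in the path.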

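    Note: I had to use response.content rather than response.body. A requests Response has no body attribute, and response.content returns the raw bytes, which lets the parser apply the encoding declared in the XML prolog.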
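
To see the namespace behaviour described above without calling the service, here is a minimal sketch that parses a trimmed copy of the XML from the question. The SAMPLE constant and the variable names are assumptions made for illustration. The sketch also uses the shorter './/a:Name' descendant search in place of the fully spelled-out path, which should give the same result for this document.

```python
from xml.etree import ElementTree

# Trimmed copy of the response from the question, inlined so that no
# network call is needed (illustrative data, not a live response).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
         <GetStartEndPointResult>
            <StartPoints>
               <Point><Id>545</Id><Name>Get Me</Name></Point>
               <Point><Id>634</Id><Name>Get me too</Name></Point>
            </StartPoints>
         </GetStartEndPointResult>
      </GetStartEndPointResponse>
   </soap:Body>
</soap:Envelope>"""

# Only the namespace that is actually used in the path needs a mapping.
namespaces = {'a': 'http://www.etis.fskab.se/v1.0/ETISws'}
dom = ElementTree.fromstring(SAMPLE)

# './/' searches all descendants, so the intermediate elements
# (Body, GetStartEndPointResponse, ...) do not have to be listed.
names = [el.text for el in dom.findall('.//a:Name', namespaces)]

# ElementTree stores every tag as '{namespace-uri}localname', which is why
# the unprefixed <Name> element still needs the 'a:' prefix in the path.
first_tag = dom.find('.//a:Name', namespaces).tag

# An unqualified search matches nothing, as in the original attempt.
unqualified = dom.findall('.//Name')

for name in names:
    print(name)
```

On Python 3.8 and later, dom.findall('.//{*}Name') matches Name elements in any namespace, which can be handy when the namespace URI is not known in advance.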