
Parsing XML document with namespaces in Python


I am trying to parse XML that uses namespaces and attributes. I'm using Python's XML library, and since I'm new to this I cannot find a solution. I searched this forum and found similar questions, but none with the same XML structure as mine.

This is my XML:

<?xml version='1.0' encoding='UTF-8'?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cec="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sbt="http://mfin.gov.rs/srbdt/srbdtext" xmlns:urn="oasis:names:specification:ubl:schema:xsd:Invoice-2">
  <cbc:ID>IF149-0111/24</cbc:ID>
  <cac:InvoiceLine>
    <cbc:ID>1</cbc:ID>
    <cbc:InvoicedQuantity unitCode="H87">3.00</cbc:InvoicedQuantity>
    <cbc:LineExtensionAmount currencyID="RSD">26574.00</cbc:LineExtensionAmount>
    <cac:TaxTotal>
      <cbc:TaxAmount currencyID="RSD">5314.80</cbc:TaxAmount>
      <cac:TaxSubtotal>
        <cbc:TaxAmount currencyID="RSD">5314.800</cbc:TaxAmount>
        <cbc:Percent>20.0</cbc:Percent>
        <cac:TaxCategory>
          <cbc:ID>S</cbc:ID>
          <cbc:Name>20%</cbc:Name>
          <cbc:Percent>20.0</cbc:Percent>
        </cac:TaxCategory>
      </cac:TaxSubtotal>
    </cac:TaxTotal>
    <cac:Item>
      <cbc:Description>[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select</cbc:Description>
      <cbc:Name>[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select</cbc:Name>
      <cac:ClassifiedTaxCategory>
        <cbc:ID>S</cbc:ID>
        <cbc:Name>20%</cbc:Name>
        <cbc:Percent>20.0</cbc:Percent>
      </cac:ClassifiedTaxCategory>
    </cac:Item>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:ID>2</cbc:ID>
    <cbc:InvoicedQuantity unitCode="H87">1.00</cbc:InvoicedQuantity>
    <cbc:LineExtensionAmount currencyID="RSD">600.00</cbc:LineExtensionAmount>
    <cac:TaxTotal>
      <cbc:TaxAmount currencyID="RSD">120.00</cbc:TaxAmount>
      <cac:TaxSubtotal>
        <cbc:TaxAmount currencyID="RSD">120.000</cbc:TaxAmount>
        <cbc:Percent>20.0</cbc:Percent>
        <cac:TaxCategory>
          <cbc:ID>S</cbc:ID>
          <cbc:Name>20%</cbc:Name>
          <cbc:Percent>20.0</cbc:Percent>
          <cbc:TaxExemptionReason></cbc:TaxExemptionReason>
          <cac:TaxScheme>
            <cbc:ID schemeID="UN/ECE 5153" schemeAgencyID="6">VAT</cbc:ID>
          </cac:TaxScheme>
        </cac:TaxCategory>
      </cac:TaxSubtotal>
    </cac:TaxTotal>
    <cac:Item>
      <cbc:Description>[U11124116] Usluga transporta</cbc:Description>
      <cbc:Name>[U11124116] Usluga transporta</cbc:Name>
      <cac:SellersItemIdentification>
        <cbc:ID>U11124116</cbc:ID>
      </cac:SellersItemIdentification>
      <cac:ClassifiedTaxCategory>
        <cbc:ID>S</cbc:ID>
        <cbc:Name>20%</cbc:Name>
        <cbc:Percent>20.0</cbc:Percent>
        <cbc:TaxExemptionReason></cbc:TaxExemptionReason>
        <cac:TaxScheme>
          <cbc:ID schemeID="UN/ECE 5153" schemeAgencyID="6">VAT</cbc:ID>
        </cac:TaxScheme>
      </cac:ClassifiedTaxCategory>
    </cac:Item>
    <cac:Price>
      <cbc:PriceAmount currencyID="RSD">600.00</cbc:PriceAmount>
      <cbc:BaseQuantity unitCode="H87">1.00</cbc:BaseQuantity>
    </cac:Price>
  </cac:InvoiceLine>
</Invoice>

I tried with:

import xml.etree.ElementTree as ET
tree = ET.parse("test.xml")
root = tree.getroot()
for x in root.findall('.//'):
    print(x.tag, " ", x.get('InvoiceLine'))

and I get this result:

{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}CustomizationID   None
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}ID   None
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate   None
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}DueDate   None

But I need to extract the following values from the "InvoiceLine" section:


Solution

  • This worked:

    namespaces = {
        'cbc': 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
        'cac': 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2',
    }
    
    # Extract the parameters you need
    for invoice_line in root.findall('.//cac:InvoiceLine', namespaces):
        invoiced_quantity = invoice_line.find('cbc:InvoicedQuantity', namespaces)
        line_extension_amount = invoice_line.find('cbc:LineExtensionAmount', namespaces)
        tax_total = invoice_line.find('cac:TaxTotal', namespaces)
        tax_amount = tax_total.find('cbc:TaxAmount', namespaces)
        tax_subtotal = tax_total.find('cac:TaxSubtotal', namespaces)
        percent = tax_subtotal.find('cbc:Percent', namespaces)
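The snippet above only locates the elements. Since the question also asks about attributes, here is a minimal sketch of reading both the text and the attributes. It assumes a trimmed copy of the first InvoiceLine embedded as a string; `SAMPLE` and `read_line_values` are names made up for this illustration. Unprefixed attributes such as `unitCode` are not in any namespace, so a plain `get()` should be enough.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first InvoiceLine from the question, embedded so the
# sketch runs without a file (SAMPLE is a hypothetical name).
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cac:InvoiceLine>
    <cbc:ID>1</cbc:ID>
    <cbc:InvoicedQuantity unitCode="H87">3.00</cbc:InvoicedQuantity>
    <cbc:LineExtensionAmount currencyID="RSD">26574.00</cbc:LineExtensionAmount>
    <cac:TaxTotal>
      <cbc:TaxAmount currencyID="RSD">5314.80</cbc:TaxAmount>
      <cac:TaxSubtotal>
        <cbc:TaxAmount currencyID="RSD">5314.800</cbc:TaxAmount>
        <cbc:Percent>20.0</cbc:Percent>
      </cac:TaxSubtotal>
    </cac:TaxTotal>
  </cac:InvoiceLine>
</Invoice>"""

namespaces = {
    'cbc': 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
    'cac': 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2',
}


def read_line_values(xml_text):
    """Return one dict per InvoiceLine with element text and attribute values."""
    root = ET.fromstring(xml_text)
    rows = []
    for invoice_line in root.findall('cac:InvoiceLine', namespaces):
        quantity = invoice_line.find('cbc:InvoicedQuantity', namespaces)
        amount = invoice_line.find('cbc:LineExtensionAmount', namespaces)
        # A path with several steps reaches nested elements in one call.
        percent = invoice_line.find(
            'cac:TaxTotal/cac:TaxSubtotal/cbc:Percent', namespaces)
        rows.append({
            'quantity': quantity.text,               # element content
            'unit_code': quantity.get('unitCode'),   # attribute value
            'line_amount': amount.text,
            'currency': amount.get('currencyID'),
            'percent': percent.text,
        })
    return rows


rows = read_line_values(SAMPLE)
print(rows)
```

With a real file you would get `root` from `ET.parse(...).getroot()` instead of `ET.fromstring`, as in the question.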
    

    Edit:

    I saved your XML and ran this:

    import xml.etree.ElementTree as ET
    
    # Define the file path
    file_path = 'input.xml'
    
    # Map the namespace prefixes used in the document to their URIs
    namespace = {
        'cbc': 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
        'cac': 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2'
    }
    
    # Load and parse the XML file
    tree = ET.parse(file_path)
    root = tree.getroot()
    
    # Extracting invoice ID
    invoice_id = root.find('cbc:ID', namespace).text
    
    # Extracting invoice lines
    invoice_lines = []
    for line in root.findall('cac:InvoiceLine', namespace):
        line_id = line.find('cbc:ID', namespace).text
        quantity = line.find('cbc:InvoicedQuantity', namespace).text
        line_amount = line.find('cbc:LineExtensionAmount', namespace).text
        item_description = line.find('cac:Item/cbc:Description', namespace).text
        item_name = line.find('cac:Item/cbc:Name', namespace).text
        
        invoice_lines.append({
            'line_id': line_id,
            'quantity': quantity,
            'line_amount': line_amount,
            'item_description': item_description,
            'item_name': item_name
        })
    
    # Output the extracted data
    print(f"Invoice ID: {invoice_id}")
    print("Invoice Lines:")
    for line in invoice_lines:
        print(line)
    

    It gave the following result:

    Invoice ID: IF149-0111/24
    Invoice Lines:
    {'line_id': '1', 'quantity': '3.00', 'line_amount': '26574.00', 'item_description': '[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select', 'item_name': '[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select'}
    {'line_id': '2', 'quantity': '1.00', 'line_amount': '600.00', 'item_description': '[U11124116] Usluga transporta', 'item_name': '[U11124116] Usluga transporta'}
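One caveat with the script above: `find(...).text` raises `AttributeError` when the element is missing, and in the question's XML only the second InvoiceLine contains `cac:Price`. A possible way around this, sketched under the assumption that a missing price should become `None`, is `findtext()` with a default. `SAMPLE` and `read_prices` are hypothetical names, and the conversion to `Decimal` is my own addition rather than something the question asked for.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Two trimmed InvoiceLine elements from the question; only the second one
# has a cac:Price child (SAMPLE is a hypothetical name).
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cac:InvoiceLine>
    <cbc:ID>1</cbc:ID>
    <cbc:LineExtensionAmount currencyID="RSD">26574.00</cbc:LineExtensionAmount>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:ID>2</cbc:ID>
    <cbc:LineExtensionAmount currencyID="RSD">600.00</cbc:LineExtensionAmount>
    <cac:Price>
      <cbc:PriceAmount currencyID="RSD">600.00</cbc:PriceAmount>
    </cac:Price>
  </cac:InvoiceLine>
</Invoice>"""

namespace = {
    'cbc': 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
    'cac': 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2',
}


def read_prices(xml_text):
    """Collect line id, line amount and (optional) unit price per InvoiceLine."""
    root = ET.fromstring(xml_text)
    result = []
    for line in root.findall('cac:InvoiceLine', namespace):
        # findtext returns the default instead of raising when nothing matches.
        price_text = line.findtext(
            'cac:Price/cbc:PriceAmount', default=None, namespaces=namespace)
        amount_text = line.findtext('cbc:LineExtensionAmount', namespaces=namespace)
        result.append({
            'line_id': line.findtext('cbc:ID', namespaces=namespace),
            # Decimal keeps the amount exact, unlike float.
            'line_amount': Decimal(amount_text),
            'price': Decimal(price_text) if price_text else None,
        })
    return result


prices = read_prices(SAMPLE)
for entry in prices:
    print(entry)
```

Note that `findtext()` returns an empty string for an element that exists but has no content (such as the empty `cbc:TaxExemptionReason` in the question), which is why the sketch tests `if price_text` rather than `is not None`.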