I am trying to parse XML that uses namespaces and attributes. I'm using the XML library in Python and, since I'm new to this, I can't find a solution. I searched this forum and found similar questions, but none with the same XML structure as mine.
This is my XML:
<?xml version='1.0' encoding='UTF-8'?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cec="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sbt="http://mfin.gov.rs/srbdt/srbdtext" xmlns:urn="oasis:names:specification:ubl:schema:xsd:Invoice-2">
<cbc:ID>IF149-0111/24</cbc:ID>
<cac:InvoiceLine>
<cbc:ID>1</cbc:ID>
<cbc:InvoicedQuantity unitCode="H87">3.00</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="RSD">26574.00</cbc:LineExtensionAmount>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="RSD">5314.80</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxAmount currencyID="RSD">5314.800</cbc:TaxAmount>
<cbc:Percent>20.0</cbc:Percent>
<cac:TaxCategory>
<cbc:ID>S</cbc:ID>
<cbc:Name>20%</cbc:Name>
<cbc:Percent>20.0</cbc:Percent>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:Item>
<cbc:Description>[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select</cbc:Description>
<cbc:Name>[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select</cbc:Name>
<cac:ClassifiedTaxCategory>
<cbc:ID>S</cbc:ID>
<cbc:Name>20%</cbc:Name>
<cbc:Percent>20.0</cbc:Percent>
</cac:ClassifiedTaxCategory>
</cac:Item>
</cac:InvoiceLine>
<cac:InvoiceLine>
<cbc:ID>2</cbc:ID>
<cbc:InvoicedQuantity unitCode="H87">1.00</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="RSD">600.00</cbc:LineExtensionAmount>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="RSD">120.00</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxAmount currencyID="RSD">120.000</cbc:TaxAmount>
<cbc:Percent>20.0</cbc:Percent>
<cac:TaxCategory>
<cbc:ID>S</cbc:ID>
<cbc:Name>20%</cbc:Name>
<cbc:Percent>20.0</cbc:Percent>
<cbc:TaxExemptionReason></cbc:TaxExemptionReason>
<cac:TaxScheme>
<cbc:ID schemeID="UN/ECE 5153" schemeAgencyID="6">VAT</cbc:ID>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:Item>
<cbc:Description>[U11124116] Usluga transporta</cbc:Description>
<cbc:Name>[U11124116] Usluga transporta</cbc:Name>
<cac:SellersItemIdentification>
<cbc:ID>U11124116</cbc:ID>
</cac:SellersItemIdentification>
<cac:ClassifiedTaxCategory>
<cbc:ID>S</cbc:ID>
<cbc:Name>20%</cbc:Name>
<cbc:Percent>20.0</cbc:Percent>
<cbc:TaxExemptionReason></cbc:TaxExemptionReason>
<cac:TaxScheme>
<cbc:ID schemeID="UN/ECE 5153" schemeAgencyID="6">VAT</cbc:ID>
</cac:TaxScheme>
</cac:ClassifiedTaxCategory>
</cac:Item>
<cac:Price>
<cbc:PriceAmount currencyID="RSD">600.00</cbc:PriceAmount>
<cbc:BaseQuantity unitCode="H87">1.00</cbc:BaseQuantity>
</cac:Price>
</cac:InvoiceLine>
</Invoice>
I tried with:
import xml.etree.ElementTree as ET
tree = ET.parse("test.xml")
root = tree.getroot()
for x in root.findall('.//'):
    print(x.tag, " ", x.get('InvoiceLine'))
and I get this result:
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}CustomizationID None
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}ID None
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate None
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}DueDate None
But I need to extract the following values from the "InvoiceLine" section:
This worked:
namespaces = {
'cbc': 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
'cac': 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2',
}
# Extract the parameters you need
for invoice_line in root.findall('.//cac:InvoiceLine', namespaces):
    invoiced_quantity = invoice_line.find('cbc:InvoicedQuantity', namespaces)
    line_extension_amount = invoice_line.find('cbc:LineExtensionAmount', namespaces)
    tax_total = invoice_line.find('cac:TaxTotal', namespaces)
    tax_amount = tax_total.find('cbc:TaxAmount', namespaces)
    tax_subtotal = tax_total.find('cac:TaxSubtotal', namespaces)
    percent = tax_subtotal.find('cbc:Percent', namespaces)
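The snippet above only locates the elements. The values themselves are in each element's text, and attributes such as unitCode and currencyID are read with get(). A minimal sketch of that step, assuming a trimmed-down copy of the invoice inlined as a string (the SAMPLE constant and the read_lines helper are made up for illustration and are not part of the original code):

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the invoice from the question, inlined for illustration.
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>IF149-0111/24</cbc:ID>
  <cac:InvoiceLine>
    <cbc:ID>1</cbc:ID>
    <cbc:InvoicedQuantity unitCode="H87">3.00</cbc:InvoicedQuantity>
    <cbc:LineExtensionAmount currencyID="RSD">26574.00</cbc:LineExtensionAmount>
    <cac:TaxTotal>
      <cbc:TaxAmount currencyID="RSD">5314.80</cbc:TaxAmount>
      <cac:TaxSubtotal>
        <cbc:TaxAmount currencyID="RSD">5314.800</cbc:TaxAmount>
        <cbc:Percent>20.0</cbc:Percent>
      </cac:TaxSubtotal>
    </cac:TaxTotal>
  </cac:InvoiceLine>
</Invoice>"""

# Prefix -> namespace URI map; the prefixes only need to match the queries below.
NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}


def read_lines(root):
    """Hypothetical helper: collect text values and attributes per invoice line."""
    rows = []
    for line in root.findall("cac:InvoiceLine", NS):
        quantity = line.find("cbc:InvoicedQuantity", NS)
        amount = line.find("cbc:LineExtensionAmount", NS)
        tax_amount = line.find("cac:TaxTotal/cbc:TaxAmount", NS)
        percent = line.find("cac:TaxTotal/cac:TaxSubtotal/cbc:Percent", NS)
        rows.append({
            "quantity": quantity.text,               # element text
            "unit_code": quantity.get("unitCode"),   # attribute value
            "amount": amount.text,
            "currency": amount.get("currencyID"),
            "tax_amount": tax_amount.text,
            "tax_percent": percent.text,
        })
    return rows


rows = read_lines(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

All values come back as strings, so converting them to numbers is left to the caller.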
Edit:
I saved your XML and ran this:
import xml.etree.ElementTree as ET
# Define the file path
file_path = 'input.xml'
# Map the prefixes used in the queries below to their namespace URIs
namespace = {
'cbc': 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
'cac': 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2'
}
# Load and parse the XML file
tree = ET.parse(file_path)
root = tree.getroot()
# Extracting invoice ID
invoice_id = root.find('cbc:ID', namespace).text
# Extracting invoice lines
invoice_lines = []
for line in root.findall('cac:InvoiceLine', namespace):
    line_id = line.find('cbc:ID', namespace).text
    quantity = line.find('cbc:InvoicedQuantity', namespace).text
    line_amount = line.find('cbc:LineExtensionAmount', namespace).text
    item_description = line.find('cac:Item/cbc:Description', namespace).text
    item_name = line.find('cac:Item/cbc:Name', namespace).text
    invoice_lines.append({
        'line_id': line_id,
        'quantity': quantity,
        'line_amount': line_amount,
        'item_description': item_description,
        'item_name': item_name
    })
# Output the extracted data
print(f"Invoice ID: {invoice_id}")
print("Invoice Lines:")
for line in invoice_lines:
    print(line)
which gave the following result:
Invoice ID: IF149-0111/24
Invoice Lines:
{'line_id': '1', 'quantity': '3.00', 'line_amount': '26574.00', 'item_description': '[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select', 'item_name': '[P11190420] Toner Cartridge Brother DCP5500/MFC L 5700/6800 TN3410/3480 Katun Select'}
{'line_id': '2', 'quantity': '1.00', 'line_amount': '600.00', 'item_description': '[U11124116] Usluga transporta', 'item_name': '[U11124116] Usluga transporta'}
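One caveat: line.find(...).text raises AttributeError when the element is missing, and in the XML from the question cac:Price and cac:SellersItemIdentification appear only in the second invoice line. A sketch of one way to tolerate that, using findtext with a default and an explicit None check; the shortened sample string and the optional_fields helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}

# Shortened sample: only the second line carries the optional elements.
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cac:InvoiceLine>
    <cbc:ID>1</cbc:ID>
    <cac:Item>
      <cbc:Name>Toner Cartridge</cbc:Name>
    </cac:Item>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:ID>2</cbc:ID>
    <cac:Item>
      <cbc:Name>[U11124116] Usluga transporta</cbc:Name>
      <cac:SellersItemIdentification>
        <cbc:ID>U11124116</cbc:ID>
      </cac:SellersItemIdentification>
    </cac:Item>
    <cac:Price>
      <cbc:PriceAmount currencyID="RSD">600.00</cbc:PriceAmount>
    </cac:Price>
  </cac:InvoiceLine>
</Invoice>"""


def optional_fields(line):
    """Hypothetical helper: read fields that may be absent from a line."""
    price = line.find("cac:Price/cbc:PriceAmount", NS)
    return {
        "line_id": line.findtext("cbc:ID", default=None, namespaces=NS),
        # findtext returns the default instead of raising when nothing matches
        "seller_item_id": line.findtext(
            "cac:Item/cac:SellersItemIdentification/cbc:ID",
            default=None,
            namespaces=NS,
        ),
        # Compare with None explicitly: an element without children is falsy
        "price": price.text if price is not None else None,
        "price_currency": price.get("currencyID") if price is not None else None,
    }


root = ET.fromstring(SAMPLE)
results = [optional_fields(line) for line in root.findall("cac:InvoiceLine", NS)]
for result in results:
    print(result)
```

With this approach missing fields show up as None instead of stopping the loop on the first incomplete line.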