python xml beautifulsoup

Parsing XML in Python: how to select a tag under the correct branch


I am trying to parse an XML file to get the text NeedThisValue!!! from one of the elements tagged <Value>. However, the file contains several <Value> tags. How can I get the one under the <Image> branch? Here is an example of my XML:

<Report xmlns=http://schemas.microsoft.com>
  <AutoRefresh>0</AutoRefresh>
  <DataSources>
    <DataSource Name="DataSource2">
      <Value>SourceAlpha</Value>
      <rd:SecurityType>None</rd:SecurityType>
    </DataSource>
  </DataSources>
  <Image Name="Image36">
    <Source>Embedded</Source>
        <Value>NeedThisValue!!!</Value>
        <Sizing>FitProportional</Sizing>
  </Image>
</Report>  

And I'm using this code:

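from bs4 import BeautifulSoup

# filepath is the path to the XML file
with open(filepath, 'r') as f:
    data = f.read()

Bs_data = BeautifulSoup(data, "xml")
# find_all returns every <Value> tag in the document, regardless of parent
b_unique = Bs_data.find_all('Value')
print(b_unique)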

The result is below; I need only the second one.

[<Value>SourceAlpha</Value>, <Value>NeedThisValue!!!</Value>]

Solution

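  • As an alternative to the accepted solution from @Igel, you can also get the value with lxml and xpath(). The example below feeds the snippet to lxml's lenient HTML parser, which tolerates the malformed markup (the unquoted xmlns value and the undeclared rd: prefix) but lowercases tag and attribute names, so the XPath expression must use lowercase names: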

    from lxml import html
    
    broken_xml = """<Report xmlns=http://schemas.microsoft.com>
      <AutoRefresh>0</AutoRefresh>
      <DataSources>
        <DataSource Name="DataSource2">
          <Value>SourceAlpha</Value>
          <rd:SecurityType>None</rd:SecurityType>
        </DataSource>
      </DataSources>
      <Image Name="Image36">
        <Source>Embedded</Source>
            <Value>NeedThisValue!!!</Value>
            <Sizing>FitProportional</Sizing>
      </Image>
    </Report>
    """
    
    tree = html.fromstring(broken_xml)
    print(html.tostring(tree, pretty_print=True).decode())
    
    value_elem = tree.xpath('//image[@name="Image36"]/value')[0]
    print(value_elem.text)
    

    Output:

    <report xmlns="http://schemas.microsoft.com">
      <autorefresh>0</autorefresh>
      <datasources>
        <datasource name="DataSource2">
          <value>SourceAlpha</value>
          <securitytype>None</securitytype>
        </datasource>
      </datasources>
      <image name="Image36">
        <source>Embedded</source>
            <value>NeedThisValue!!!</value>
            <sizing>FitProportional</sizing>
      </image>
    </report>
    
    
    NeedThisValue!!!
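
If the real file is well-formed XML, the same lookup might also be done with the standard library alone, keeping the original tag case. The following is only a minimal sketch under assumptions: the xmlns value is quoted, the rd: prefix is declared (the URI http://example.com/rd is a hypothetical placeholder, not taken from the question), and the prefix "r" in the ns mapping is an arbitrary name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the sample from the question.
# Assumption: the rd namespace URI below is a made-up placeholder.
xml_text = """<Report xmlns="http://schemas.microsoft.com"
        xmlns:rd="http://example.com/rd">
  <AutoRefresh>0</AutoRefresh>
  <DataSources>
    <DataSource Name="DataSource2">
      <Value>SourceAlpha</Value>
      <rd:SecurityType>None</rd:SecurityType>
    </DataSource>
  </DataSources>
  <Image Name="Image36">
    <Source>Embedded</Source>
    <Value>NeedThisValue!!!</Value>
    <Sizing>FitProportional</Sizing>
  </Image>
</Report>"""

root = ET.fromstring(xml_text)

# Map an arbitrary prefix to the document's default namespace so that
# the path expression can refer to the namespaced tags.
ns = {"r": "http://schemas.microsoft.com"}

# Select <Value> only when it is a direct child of <Image Name="Image36">.
value = root.find(".//r:Image[@Name='Image36']/r:Value", ns)
print(value.text)  # NeedThisValue!!!
```

Restricting the path to the <Image> parent is what excludes the <Value> under <DataSource>; the attribute test on Name is optional if the file contains only one <Image>.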