I'm trying to parse an XML file to get NeedThisValue!!! from one of the elements tagged <Value>. But there are several <Value> tags in the file. How can I get the right one, the one under the <Image> branch?
This is an example of my XML:
<Report xmlns=http://schemas.microsoft.com>
  <AutoRefresh>0</AutoRefresh>
  <DataSources>
    <DataSource Name="DataSource2">
      <Value>SourceAlpha</Value>
      <rd:SecurityType>None</rd:SecurityType>
    </DataSource>
  </DataSources>
  <Image Name="Image36">
    <Source>Embedded</Source>
    <Value>NeedThisValue!!!</Value>
    <Sizing>FitProportional</Sizing>
  </Image>
</Report>
And I'm using this code:
from bs4 import BeautifulSoup
with open(filepath, 'r') as f:
    data = f.read()
Bs_data = BeautifulSoup(data, "xml")
b_unique = Bs_data.find_all('Value')
print(b_unique)
The result is below; I need only the second one.
[<Value>SourceAlpha</Value>, <Value>NeedThisValue!!!</Value>]
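As an alternative to the accepted solution from @Igel, you can also get it with lxml and xpath(). Note that the HTML parser lowercases tag and attribute names, so the XPath expression has to use image, name and value in lowercase: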
from lxml import html
broken_xml = """<Report xmlns=http://schemas.microsoft.com>
<AutoRefresh>0</AutoRefresh>
<DataSources>
<DataSource Name="DataSource2">
<Value>SourceAlpha</Value>
<rd:SecurityType>None</rd:SecurityType>
</DataSource>
</DataSources>
<Image Name="Image36">
<Source>Embedded</Source>
<Value>NeedThisValue!!!</Value>
<Sizing>FitProportional</Sizing>
</Image>
</Report>
"""
tree = html.fromstring(broken_xml)
print(html.tostring(tree, pretty_print=True).decode())
value_elem = tree.xpath('//image[@name="Image36"]/value')[0]
print(value_elem.text)
Output:
<report xmlns="http://schemas.microsoft.com">
<autorefresh>0</autorefresh>
<datasources>
<datasource name="DataSource2">
<value>SourceAlpha</value>
<securitytype>None</securitytype>
</datasource>
</datasources>
<image name="Image36">
<source>Embedded</source>
<value>NeedThisValue!!!</value>
<sizing>FitProportional</sizing>
</image>
</report>
NeedThisValue!!!
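If the real file is well-formed XML, the same lookup can probably be done with the standard library alone. The sketch below is only an illustration under two assumptions that are not in the original sample: the xmlns URI is quoted, and the rd prefix is declared (the URI http://example.com/rd is a made-up placeholder). Because the document has a default namespace, the path needs a prefix mapping; the prefix r is an arbitrary name chosen here.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed variant of the sample. The namespace URI is quoted
# and the rd prefix is bound to a placeholder URI (hypothetical).
xml_text = """<Report xmlns="http://schemas.microsoft.com"
        xmlns:rd="http://example.com/rd">
  <AutoRefresh>0</AutoRefresh>
  <DataSources>
    <DataSource Name="DataSource2">
      <Value>SourceAlpha</Value>
      <rd:SecurityType>None</rd:SecurityType>
    </DataSource>
  </DataSources>
  <Image Name="Image36">
    <Source>Embedded</Source>
    <Value>NeedThisValue!!!</Value>
    <Sizing>FitProportional</Sizing>
  </Image>
</Report>
"""

root = ET.fromstring(xml_text)

# Map an arbitrary prefix to the document's default namespace.
ns = {"r": "http://schemas.microsoft.com"}

# Select only the <Value> that is a direct child of the named <Image>.
value_elem = root.find('.//r:Image[@Name="Image36"]/r:Value', ns)
image_value = value_elem.text if value_elem is not None else None
print(image_value)
```

Unlike the html parser route, this keeps the original tag case, but it will raise a ParseError on the unquoted xmlns from the question, so it only applies if the actual file is valid XML.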