I am trying to parse an XML file to get NeedThisValue!!! from one of the elements tagged <Value>. But there are several <Value> tags in the file. How can I get the right one, under the <Image> branch?
This is an example of my XML:
<Report xmlns=http://schemas.microsoft.com>
<AutoRefresh>0</AutoRefresh>
<DataSources>
<DataSource Name="DataSource2">
<Value>SourceAlpha</Value>
<rd:SecurityType>None</rd:SecurityType>
</DataSource>
</DataSources>
<Image Name="Image36">
<Source>Embedded</Source>
<Value>NeedThisValue!!!</Value>
<Sizing>FitProportional</Sizing>
</Image>
</Report>
And I'm using this code:
from bs4 import BeautifulSoup
with open(filepath, 'r') as f:
    data = f.read()
Bs_data = BeautifulSoup(data, "xml")
b_unique = Bs_data.find_all('Value')
print(b_unique)
The result is below; I need only the second one.
[<Value>SourceAlpha</Value>, <Value>NeedThisValue!!!</Value>]
As mentioned, you could be more specific in your selection:
Bs_data.select('Image Value')
To get just the first matching tag:
Bs_data.select_one('Image Value')
This uses CSS selectors to chain the tags.
from bs4 import BeautifulSoup
xml = '''<Report xmlns=http://schemas.microsoft.com>
<AutoRefresh>0</AutoRefresh>
<DataSources>
<DataSource Name="DataSource2">
<Value>SourceAlpha</Value>
<rd:SecurityType>None</rd:SecurityType>
</DataSource>
</DataSources>
<Image Name="Image36">
<Source>Embedded</Source>
<Value>NeedThisValue!!!</Value>
<Sizing>FitProportional</Sizing>
</Image>
</Report>'''
Bs_data = BeautifulSoup(xml, 'xml')
## iterating resultset
for item in Bs_data.select('Image Value'):
    print(item.get_text(strip=True))
## or using the first result only
print(Bs_data.select_one('Image Value').get_text(strip=True))
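The selector 'Image Value' matches a Value at any depth below Image. If only direct children should match, or if one particular Image is wanted, the selector can be tightened. The following is a minimal sketch, assuming lxml is installed for the "xml" parser and using a simplified, namespace-free sample that is not the original file; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Simplified sample for illustration (assumption: no namespaces, well-formed XML)
xml = """<Report>
  <DataSources>
    <DataSource Name="DataSource2">
      <Value>SourceAlpha</Value>
    </DataSource>
  </DataSources>
  <Image Name="Image36">
    <Source>Embedded</Source>
    <Value>NeedThisValue!!!</Value>
  </Image>
</Report>"""

soup = BeautifulSoup(xml, "xml")

# Child combinator: only a Value that is a direct child of Image
child_value = soup.select_one("Image > Value").get_text(strip=True)

# Attribute selector: only the Image whose Name attribute is "Image36"
by_name = soup.select_one('Image[Name="Image36"] > Value').get_text(strip=True)

print(child_value)
print(by_name)
```

Both calls should pick the Value under Image and skip the one under DataSource.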
In addition, as asked in a comment, you can extract an attribute value by treating the tag like a dictionary:
## iterating resultset of image tags
for item in Bs_data.select('Image'):
    print(item.get('Name'))
    print(item.Value.get_text(strip=True))
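If some Image tags might not contain a Value, the lookup returns None and calling get_text on it would fail. A possible way to guard against that is sketched below; image_values is a hypothetical helper name, the sample XML is made up for illustration, and lxml is assumed to be installed for the "xml" parser.

```python
from bs4 import BeautifulSoup


def image_values(xml_text):
    """Map each Image's Name attribute to the text of its Value tag.

    Hypothetical helper for illustration; Images without a Value are skipped.
    """
    soup = BeautifulSoup(xml_text, "xml")
    result = {}
    for image in soup.select("Image"):
        value = image.find("Value")  # None when this Image has no Value
        if value is not None:
            result[image.get("Name")] = value.get_text(strip=True)
    return result


# Made-up sample: the second Image deliberately has no Value tag
sample = """<Report>
  <Image Name="Image36"><Value>NeedThisValue!!!</Value></Image>
  <Image Name="Image37"><Source>External</Source></Image>
</Report>"""

print(image_values(sample))
```

Returning a dictionary keyed by Name makes it easy to look up a specific image afterwards, for example image_values(sample).get("Image36").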