I want to take the content of the first td of a table in an HTML document. For example, I have this table:
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
How can I use BeautifulSoup to get the text "This is a sample text"?
I use soup.findAll('table', attrs={'class': 'bp_ergebnis_tab_info'}) to get the whole table.
The target is http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
Note: since the HTML is a bit invalid, I think we will have to do some cleaning.
First find the table (as you are doing). Using find rather than findAll returns the first match directly instead of a list of all matches, so there is no need to add an extra [0] to take the first element of the list:
table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
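To make the difference between the two calls concrete, here is a minimal sketch. It assumes Beautiful Soup 4 (imported from bs4, where find_all is the current spelling of findAll) and the built-in "html.parser" backend; the html string is just the sample table from the question, not the live page.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (an assumption: the real page may differ).
html = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
attrs = {"class": "bp_ergebnis_tab_info"}

# find_all returns a list-like collection of every match.
all_tables = soup.find_all("table", attrs=attrs)

# find returns the first match itself, or None when nothing matches.
table = soup.find("table", attrs=attrs)

# Both approaches end up at the same element.
same_table = all_tables[0] is table
```

One thing to keep in mind: indexing with [0] raises an IndexError when there is no match, while find returns None, so the two fail in different ways.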
Then use find again to find the first td:
first_td = table.find('td')
Then use encode_contents() to extract the contents of the cell (it returns the cell's inner markup as an encoded string, which here is just the text):
text = first_td.encode_contents()
... and the job is done (though you may also want to use strip() to remove leading and trailing whitespace):
trimmed_text = text.strip()
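If you are on Python 3 with Beautiful Soup 4, note that encode_contents() gives you bytes rather than str. A possible alternative, not part of the steps above, is get_text(strip=True), which returns a plain string with the surrounding whitespace already removed. A small sketch, assuming the "html.parser" backend and a cut-down copy of the sample row:

```python
from bs4 import BeautifulSoup

# Cut-down sample markup based on the question.
html = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
first_td = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"}).find("td")

# encode_contents() returns the inner markup as bytes (UTF-8 by default).
raw = first_td.encode_contents()
trimmed_bytes = raw.strip()

# get_text(strip=True) returns a str with leading/trailing whitespace removed.
text = first_td.get_text(strip=True)
```

If the cell contained nested tags, encode_contents() would include that markup, whereas get_text() would return only the text.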
This should give:
>>> print trimmed_text
This is a sample text
>>>
as desired.
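Putting the steps together, something like the following should work. This is only a sketch: first_cell_text is a hypothetical helper name, it assumes Beautiful Soup 4 with the "html.parser" backend, and it uses get_text(strip=True) in place of encode_contents() plus strip(). It also returns None when the table or the cell is missing, since find returns None in that case.

```python
from bs4 import BeautifulSoup


def first_cell_text(html, table_class):
    """Return the stripped text of the first <td> in the first table
    with the given class, or None if either is missing.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": table_class})
    if table is None:
        return None
    first_td = table.find("td")
    if first_td is None:
        return None
    return first_td.get_text(strip=True)


# Sample markup copied from the question.
sample = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
"""

result = first_cell_text(sample, "bp_ergebnis_tab_info")
missing = first_cell_text(sample, "no_such_class")
print(result)
```

For the real page you would pass in the downloaded HTML instead of sample; how well this copes with the invalid markup there depends on the parser, which I have not tested.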