How can I reach the text div? The website has several divs, and I would like to get data from different ones. There are multiple article2 divs inside the col... divs, and I need the text from every one of them. My code doesn't work because I don't know how to reach the different divs (the col_6... and col_3... divs) at the same time.
My code:
article_title = div.find('div', attrs={'class':'article2'}).find('div', attrs={'class':'text'}).find('h1')
The site code:
<div class="row">
  <div class="col_6 ct_12 cd_12 cm_12">
    <a href="https://kronikaonline.ro/erdelyi-hirek/uralkodasanak-helyszinen-a-gyulafehervari-varban-allitanak-emleket-bethlen-gabor-erdelyi-fejedelemnek">
      <div class="article2" style="padding-top:0px;">
        <div class="text">
          <h1>TITLE</h1>
        </div>
      </div>
    </a>
  </div>
  <div class="col_3 ct_12 cd_12 cm_12">
  </div>
</div>
You could use find_all() or CSS selectors to select all your articles, then iterate over the ResultSet to get all the information you would like to scrape:
for a in soup.select('div.article2'):
    print(f"title: {a.h1.text}")
    print(f"url: {a.find_previous('a').get('href')}")
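The same loop written with find_all() instead of a CSS selector might look like the sketch below. It assumes the markup shown in the question; the sample HTML and its URLs are placeholders. It also swaps find_previous('a') for find_parent('a'), which only looks at the enclosing <a> tag rather than any earlier link in the document.

```python
from bs4 import BeautifulSoup

# Placeholder markup modeled on the question (URLs are hypothetical)
html = """
<div class="row">
  <div class="col_6 ct_12 cd_12 cm_12">
    <a href="url1"><div class="article2"><div class="text"><h1>TITLE1</h1></div></div></a>
  </div>
  <div class="col_3 ct_12 cd_12 cm_12">
    <a href="url2"><div class="article2"><div class="text"><h1>TITLE2</h1></div></div></a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

articles = []
# find_all() returns every matching tag, unlike find(), which stops at the first one
for article in soup.find_all("div", class_="article2"):
    link = article.find_parent("a")  # the <a> that wraps this article, if any
    articles.append({
        "title": article.h1.get_text(strip=True),
        "url": link.get("href") if link else None,
    })

print(articles)
```

Because class_="article2" matches the div no matter which column it sits in, both the col_6 and col_3 articles are collected in one pass.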
Example: extract the data and store it in a list of dicts:
from bs4 import BeautifulSoup
html = '''
<div class="row">
<div class="col_6 ct_12 cd_12 cm_12">
<a href="url1">
<div class="article2" style="padding-top:0px;">
<div class="text">
<h1>TITLE1</h1>
</div>
</div>
</a>
</div>
<div class="col_6 ct_12 cd_12 cm_12">
<a href="url2">
<div class="article2" style="padding-top:0px;">
<div class="text">
<h1>TITLE2</h1>
</div>
</div>
</a>
</div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
data = []
for a in soup.select('div.article2'):
    data.append({
        'title': a.h1.text,
        'url': a.find_previous('a').get('href')
    })
print(data)
Output:
[{'title': 'TITLE1', 'url': 'url1'}, {'title': 'TITLE2', 'url': 'url2'}]
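If you only want articles from particular columns rather than every article2 div on the page, a comma-separated CSS selector list can name the columns explicitly. This is a sketch assuming the col_6 and col_3 class names from the question's markup; the col_12 column and all URLs below are hypothetical placeholders added to show that other columns are skipped. select() returns the matches in document order.

```python
from bs4 import BeautifulSoup

# Placeholder markup: one article per column type, plus one in a column we skip
html = """
<div class="row">
  <div class="col_6 ct_12"><a href="url1"><div class="article2"><div class="text"><h1>TITLE1</h1></div></div></a></div>
  <div class="col_3 ct_12"><a href="url2"><div class="article2"><div class="text"><h1>TITLE2</h1></div></div></a></div>
  <div class="col_12 ct_12"><a href="url3"><div class="article2"><div class="text"><h1>TITLE3</h1></div></div></a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Selector list: article2 divs inside a col_6 column OR inside a col_3 column
selector = "div.col_6 div.article2, div.col_3 div.article2"
titles = [a.h1.get_text(strip=True) for a in soup.select(selector)]
print(titles)
```

Class selectors match whole class tokens, so div.col_6 does not accidentally match col_12 or similar names.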