I'm able to scrape a lot of data from a webpage, but I'm struggling to extract specific content from list items that have exactly the same tag, attributes, and values. Here is the HTML:
<li class="highlight">
Relationship Issues
</li>
<li class="highlight">
Depression
</li>
<li class="highlight">
Spirituality
</li>
<li class="">
ADHD
</li>
<li class="">
Alcohol Use
</li>
<li class="">
Anger Management
</li>
Using that HTML as a reference, I have the following:
import requests
from bs4 import BeautifulSoup
import html5lib  # parser backend; BeautifulSoup only needs it installed
import re
headers = {'User-Agent': 'Mozilla/5.0'}
URL = "website.com"
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html5lib')
specialties = soup.find_all('div', {'class': 'spec-list attributes-top'})
for x in specialties:
    # each find() returns the FIRST matching <li>, so all three get the same text
    Specialty_1 = x.find('li', {'class': 'highlight'}).text
    Specialty_2 = x.find('li', {'class': 'highlight'}).text
    Specialty_3 = x.find('li', {'class': 'highlight'}).text
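A minimal sketch of why every Specialty_n above ends up with the same text. It uses the <li> items from the question wrapped in a hypothetical <ul> (the real page structure is not shown) and html.parser instead of html5lib: find() always stops at the first match, while find_all() returns every matching tag in document order.

```python
from bs4 import BeautifulSoup

# The <li> items from the question; the <ul> wrapper is hypothetical.
html = """
<ul>
<li class="highlight">Relationship Issues</li>
<li class="highlight">Depression</li>
<li class="highlight">Spirituality</li>
<li class="">ADHD</li>
<li class="">Alcohol Use</li>
<li class="">Anger Management</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, so repeated calls give the same tag.
first = soup.find("li", class_="highlight").text.strip()
again = soup.find("li", class_="highlight").text.strip()

# find_all() returns every <li> whose class list contains "highlight".
specialties = [li.text.strip() for li in soup.find_all("li", class_="highlight")]

# The list can then be unpacked into the variables the question asks for.
Specialty_1, Specialty_2, Specialty_3 = specialties
print(first, again)
print(specialties)
```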
So the ideal outcome is: Specialty_1 = Relationship Issues; Specialty_2 = Depression; Specialty_3 = Spirituality
and
Issue_1 = ADHD; Issue_2 = Alcohol Use; Issue_3 = Anger Management
Would appreciate any and all help!
You could build on Andrej's dictionary idea: use an if/else on whether the highlight class is present to choose the key prefix, and extend the select() to include the additional section. The numbering also has to restart for the new section, e.g. by using a flag:
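results = {}
flag = False  # becomes True once the first Issue item is reached
counter = 1
for j in soup.select(".specialties-list li, .attributes-issues li"):
    # get() avoids a KeyError if an <li> has no class attribute at all
    if 'highlight' in j.get('class', []):
        results[f'Specialty_{counter}'] = j.text.strip()
    else:
        if not flag:
            # restart numbering for the Issues section
            counter = 1
            flag = True
        results[f'Issue_{counter}'] = j.text.strip()
    counter += 1
print(results)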
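A self-contained variant of the same dictionary idea, run on the question's <li> items rather than a live page. It assumes the highlight class is the only thing separating Specialties from Issues; the <ul> wrapper is hypothetical, and two independent counters replace the flag so each section's numbering does not depend on the other.

```python
from bs4 import BeautifulSoup

# The <li> items from the question; the <ul> wrapper is hypothetical.
html = """
<ul>
<li class="highlight">Relationship Issues</li>
<li class="highlight">Depression</li>
<li class="highlight">Spirituality</li>
<li class="">ADHD</li>
<li class="">Alcohol Use</li>
<li class="">Anger Management</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

results = {}
counters = {"Specialty": 0, "Issue": 0}  # one running count per prefix
for li in soup.find_all("li"):
    # Items with class="highlight" are Specialties; everything else is an Issue.
    prefix = "Specialty" if "highlight" in li.get("class", []) else "Issue"
    counters[prefix] += 1
    results[f"{prefix}_{counters[prefix]}"] = li.get_text(strip=True)

print(results)
```

Because each prefix keeps its own counter, the result stays correctly numbered even if Specialty and Issue items are interleaved on the page.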