I am an absolute beginner at web scraping with Python and know very little about programming in Python. I am trying to extract information about lawyers in Tennessee. The webpage contains multiple links, each of which leads to further links, and within those are the individual lawyers.
Could you kindly tell me the steps I should follow?
I have gotten as far as extracting the links on the first page, but I only need the links for the cities, whereas I have extracted every link that has an `href` attribute. How can I iterate over them and proceed further?
```
import requests  # was missing; needed for requests.get below
from bs4 import BeautifulSoup as bs
import pandas as pd

res = requests.get('https://attorneys.superlawyers.com/tennessee/', headers = {'User-agent': 'Super Bot 9000'})
soup = bs(res.content, 'lxml')
# Collects the href of every <a> tag on the page
links = [item['href'] for item in soup.select('a')]
print(links)
```
It is printing
```
C:\Users\laptop\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/laptop/.PyCharmCE2017.1/config/scratches/scratch_1.py
['https://www.superlawyers.com', 'https://attorneys.superlawyers.com', 'https://ask.superlawyers.com', 'https://video.superlawyers.com',....
```
All the links are extracted whereas I only need the links of the cities. Kindly help.
It would be faster to use the parent element's id and then select only the `a` tags within it:
```
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://attorneys.superlawyers.com/tennessee/')
soup = bs(r.content, 'lxml')
# Only anchors inside the element with id="browse_view" (the city list)
cities = [item['href'] for item in soup.select('#browse_view a')]
```
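
To iterate over the city links and go one level deeper, something along these lines might work. This is only a rough sketch: `extract_links`, `fetch` and `crawl` are hypothetical helper names, and reusing the `#browse_view a` selector on the city pages is an assumption, so inspect those pages and adjust the selector to whatever they actually use. I used `html.parser` here to avoid the `lxml` dependency; `'lxml'` should work as well.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup as bs

BASE_URL = 'https://attorneys.superlawyers.com/tennessee/'
HEADERS = {'User-agent': 'Super Bot 9000'}  # same header as in the question


def extract_links(html, selector, base_url=BASE_URL):
    """Return absolute URLs for every anchor matched by a CSS selector."""
    soup = bs(html, 'html.parser')
    return [
        urljoin(base_url, a['href'])   # resolves relative hrefs against the page URL
        for a in soup.select(selector)
        if a.get('href')               # skip anchors without an href
    ]


def fetch(url):
    """Download a page and return its HTML as text."""
    res = requests.get(url, headers=HEADERS, timeout=30)
    res.raise_for_status()
    return res.text


def crawl(start_url=BASE_URL, selector='#browse_view a', get_html=fetch):
    """Map each city URL to the links found on that city's page.

    ASSUMPTION: the city pages are scraped with the same selector as the
    state page; change `selector` if their markup differs.
    """
    result = {}
    for city_url in extract_links(get_html(start_url), selector, start_url):
        result[city_url] = extract_links(get_html(city_url), selector, city_url)
    return result
```

Calling `crawl()` with no arguments makes live requests, one per city, so it is probably worth adding a short pause between requests. The same pattern can be repeated for each deeper level until you reach the lawyer pages.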