I'm trying to collect links to personal profiles and contacts from the following website: https://www.dlapiper.com/en-us/people#t=All&sort=relevancy&numberOfResults=100&f:CountriesID=[United%20Kingdom]
I'm using Selenium with chromedriver for the scraping, and normally it works just fine. For this particular website, however, I can't get at the HTML source where the links to people's profiles would be visible.
I wrote a standard script which would normally work for any other dynamic website:
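import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

links = []
driver = webdriver.Chrome()
driver.get('https://www.dlapiper.com/en-gb/people#t=All&sort=%40lastname%20ascending&f:CountriesID=[United%20Kingdom]')
time.sleep(5)  # give the page time to load
# Reject cookies so the banner does not cover the page
cookies_button = driver.find_element(By.ID, "onetrust-reject-all-handler")
cookies_button.click()
time.sleep(5)  # give the search results time to render
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
# Collect the href of every anchor found in the page source
for item in soup.find_all('a'):
    links.append(item.get('href'))
print(links)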
However, the links from the people search block never show up in driver.page_source, even though I can find all the link elements when I use "Inspect" in Chrome. I have tried increasing the time.sleep() values, but that did not help.
I understand that a lot of JavaScript is executed on this page. Maybe I need to trigger some of it manually? Help would be much appreciated, as I don't know JavaScript.
The lawyers' contact details are in an iframe:
Frame ID: myIframe
Frame Name: Unused
Frame Title: People Index Hosted Search Page
Frame Source: https://www.dlapiper.com/en-US/coveosearchpages/people%20index%20hosted%20search%20page#t=All&sort=relevancy&f:CountriesID=[United%20States]
Frame Domain: www.dlapiper.com
Type: text/html
Mode: CSS1Compat
Language: en
Encoding: UTF-8
Modified: 06/03/2024 19:25:06
Load Time: 2.52 seconds
Source Size: 361 bytes
Position: 0 - 291 pixels
Viewport: 1903 x 1500 pixels
Is your script scanning inside child frames?
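If it is not, here is a minimal sketch of how the script could switch into that frame before collecting links. driver.page_source only returns the document Selenium is currently focused on, so content inside a child frame is not included until you switch to it. The frame ID "myIframe" comes from the frame details above and the cookie button ID from the question; the function names, the generic "a[href]" selector and the 20 second timeout are my own assumptions and not tested against the live site, so a selector specific to the result links would probably be more reliable.

```python
from urllib.parse import urljoin

# URL taken from the question's script
PEOPLE_URL = ("https://www.dlapiper.com/en-gb/people"
              "#t=All&sort=%40lastname%20ascending&f:CountriesID=[United%20Kingdom]")


def normalise_links(hrefs, base_url):
    """Resolve hrefs against base_url, skipping empty, fragment-only and
    javascript: values, and dropping duplicates while keeping order."""
    seen, result = set(), []
    for href in hrefs:
        if not href or href.startswith(("#", "javascript:")):
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def collect_iframe_links(url=PEOPLE_URL, frame_id="myIframe", timeout=20):
    """Hypothetical helper: open the page, switch into the search iframe
    and return the links found inside it."""
    # Imported here so normalise_links can be used without a browser.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        try:
            # Cookie banner button ID as used in the question.
            wait.until(EC.element_to_be_clickable(
                (By.ID, "onetrust-reject-all-handler"))).click()
        except TimeoutException:
            pass  # no banner appeared; carry on
        # Wait for the frame and move the driver's focus into it.
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, frame_id)))
        # Assumption: at least one anchor is rendered once results load.
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[href]")))
        anchors = driver.find_elements(By.CSS_SELECTOR, "a[href]")
        hrefs = [a.get_attribute("href") for a in anchors]
        # Return focus to the top-level document.
        driver.switch_to.default_content()
        return normalise_links(hrefs, url)
    finally:
        driver.quit()
```

Calling collect_iframe_links() should then give you the links from inside the frame rather than only those of the outer page. If you prefer to keep BeautifulSoup, driver.page_source read after the switch returns the frame's HTML instead. Opening the Frame Source URL listed above directly in driver.get() may also work, but I have not verified that.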