[SOLVED] Web scraping for multiple classes using python

I am trying to scrape the address from a 10-K filing document in HTML: https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm

The page has multiple div classes, and I want to scrape the address inside a span.

Expected output:

1600 Amphitheatre Parkway

I have tried a few things, such as the following:

from requests_html import HTMLSession

s = HTMLSession()
r = s.get('https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm')
r

# requests_html exposes find() (CSS selector), not find_all(); it returns a list of elements
add1 = r.html.find('div')
add1

However, inspecting the page shows that it has many nested layers, and I am new to HTML and Python. Please help.

Solution

You could do it like this, but I'm not sure it's very robust or applicable to other filings, given that the ids look auto-generated...

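from requests_html import HTMLSession
from bs4 import BeautifulSoup

# Fetch the filing and parse the raw HTML with BeautifulSoup
session = HTMLSession()
page = session.get('https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm')
soup = BeautifulSoup(page.content, 'html.parser')

# Look up the element that holds the address by its id
content = soup.find(id="d92517213e644-wk-Fact-0B11263160365DBABCF89969352EE602")
if content is not None:  # find() returns None when no element has this id
    print(content.text)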

Output:

1600 Amphitheatre Parkway
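
Since the id looks specific to this one document, a lookup that might carry over to other filings is to match on the element's name attribute instead. The sketch below is only an illustration under assumptions: it assumes the address is tagged with a name such as dei:EntityAddressAddressLine1, and SAMPLE_HTML is a made-up stand-in for the real page, whose markup may differ. FactFinder and find_fact are hypothetical helpers, written with the standard library's html.parser so the example runs without network access. With BeautifulSoup, the equivalent lookup would be soup.find(attrs={"name": "dei:EntityAddressAddressLine1"}).

```python
from html.parser import HTMLParser

# Made-up stand-in for the filing's cover page (assumption: real markup may differ).
SAMPLE_HTML = """
<div><div>
  <span><ix:nonNumeric name="dei:EntityAddressAddressLine1"
        id="d0-wk-Fact-EXAMPLE1">1600 Amphitheatre Parkway</ix:nonNumeric></span>
  <span><ix:nonNumeric name="dei:EntityAddressCityOrTown"
        id="d0-wk-Fact-EXAMPLE2">Mountain View</ix:nonNumeric></span>
</div></div>
"""


class FactFinder(HTMLParser):
    """Collect the text of the first element whose name attribute matches."""

    def __init__(self, fact_name):
        super().__init__()
        self.fact_name = fact_name
        self.target_tag = None  # tag of the matching element while we are inside it
        self.parts = []         # text chunks seen inside the matching element
        self.done = False       # set once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or self.target_tag is not None:
            return
        if dict(attrs).get("name") == self.fact_name:
            self.target_tag = tag

    def handle_data(self, data):
        if self.target_tag is not None:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if self.target_tag is not None and tag == self.target_tag:
            self.target_tag = None
            self.done = True


def find_fact(html, fact_name):
    """Return the stripped text of the first matching element, or None."""
    parser = FactFinder(fact_name)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None


print(find_fact(SAMPLE_HTML, "dei:EntityAddressAddressLine1"))
```

Matching on the name rather than the id does not care how deeply the span is nested inside the divs, which is the part that made the page look hard to navigate.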

Edit: I hadn't seen @baduker's answer and didn't know there was an API. He is right: use the API.