I am trying to select the Properties section header in this 10-K filing. Once it is selected, I intend to grab the text in that section (i.e. all text between the Properties and Legal Proceedings section headers).
When I run the code below I get the IndexError 'list index out of range', but I don't understand why, since the text "PROPERTIES" appears to be within a 'p' tag. I have also tried using 'id="ITEM_2_PROPERTIES"' instead of text=, but that didn't work either.
Where am I going wrong?
import requests
from bs4 import BeautifulSoup
url = 'https://www.sec.gov/ix?doc=/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
properties_header = soup.find_all('p', text="PROPERTIES")[0]
print(properties_header)
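It's because you're making a request to a JS-rendered page: the HTML that requests gets back is only the viewer shell, so it contains no p tag with the text PROPERTIES. find_all therefore returns an empty list, and indexing it with [0] raises the IndexError.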
However, if you change your target URL, there's one:
import requests
from bs4 import BeautifulSoup
url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
properties_header = soup.find_all('p', text="PROPERTIES")
print(properties_header)
Output:
[<p id="ITEM_2_PROPERTIES" style="margin-bottom:0pt;margin-top:0pt;font-weight:bold;font-style:normal;text-transform:none;font-variant: normal;font-family:Times New Roman;font-size:10pt;">PROPERTIES</p>]
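Once the header is found, the text up to the next section header can be collected by walking the document forward from it. The following is only a minimal sketch: the helper name text_between is made up for illustration, and the end id ITEM_3_LEGAL_PROCEEDINGS is an assumption modelled on the id shown in the output above, so check it against the actual filing. The sample HTML is invented so the snippet runs without a network request.

```python
from bs4 import BeautifulSoup, NavigableString


def text_between(soup, start_id, end_id):
    """Return the text found between two elements identified by their ids.

    Hypothetical helper: returns None if either element is missing.
    """
    start = soup.find(id=start_id)
    end = soup.find(id=end_id)
    if start is None or end is None:
        return None

    parts = []
    # next_elements walks every following node in document order,
    # so it works even when the two headers are not siblings.
    for node in start.next_elements:
        if node is end:
            break
        if isinstance(node, NavigableString):
            # Skip the header's own text ("PROPERTIES").
            if any(parent is start for parent in node.parents):
                continue
            text = node.strip()
            if text:
                parts.append(text)
    return '\n'.join(parts)


# Invented sample markup; the end id is an assumption, not taken from the filing.
sample = '''
<div><p id="ITEM_2_PROPERTIES">PROPERTIES</p></div>
<p>First paragraph.</p>
<table><tr><td>Cell text</td></tr></table>
<p id="ITEM_3_LEGAL_PROCEEDINGS">LEGAL PROCEEDINGS</p>
<p>Not included.</p>
'''
sample_soup = BeautifulSoup(sample, 'html.parser')
print(text_between(sample_soup, 'ITEM_2_PROPERTIES', 'ITEM_3_LEGAL_PROCEEDINGS'))
```

For the real filing, pass the soup built from the direct URL instead of sample_soup. Returning None when an id is missing avoids the kind of IndexError the question ran into.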
I got the new target URL from the browser's Developer Tools. It shows up when you turn JS back on. So I'd suggest targeting that URL in your future requests.
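Comparing the two URLs in this thread, the direct document path is simply the value of the doc query parameter in the viewer URL. As a rough sketch (direct_filing_url is a hypothetical helper, and it assumes other viewer links follow the same ix?doc=... pattern as the one here), the conversion can be done with the standard library:

```python
from urllib.parse import urlsplit, parse_qs, urljoin


def direct_filing_url(viewer_url):
    """Turn an 'ix?doc=/Archives/...' viewer URL into the direct document URL.

    Hypothetical helper; returns the input unchanged if there is no 'doc' parameter.
    """
    query = parse_qs(urlsplit(viewer_url).query)
    doc_path = query.get('doc', [None])[0]
    if doc_path is None:
        return viewer_url
    # doc_path is an absolute path, so urljoin keeps only the scheme and host.
    return urljoin(viewer_url, doc_path)


viewer = 'https://www.sec.gov/ix?doc=/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231.htm'
print(direct_filing_url(viewer))
```

This prints the same URL used in the working code above.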