The following correct and useful answer was provided to the question "How to filter on this artifact in the HTML?":
from bs4 import BeautifulSoup
import requests
page = requests.get("https://finance.yahoo.com/quote/GOOGL?p=GOOGL")
soup = BeautifulSoup(page.content, 'html.parser')
soup.select_one('fin-streamer[data-symbol="GOOGL"]')['value']
I find the same fin-streamer[data-symbol="GOOGL"] element on https://finance.yahoo.com/quote/GOOGL/key-statistics, but the adjusted code below does not work for that page:
from bs4 import BeautifulSoup
import requests
page = requests.get("https://finance.yahoo.com/quote/GOOGL/key-statistics")
soup = BeautifulSoup(page.content, 'html.parser')
soup.select_one('fin-streamer[data-symbol="GOOGL"]')['value']
I get this:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-3b1c0b9e0480> in <module>
3 page = requests.get("https://finance.yahoo.com/quote/GOOGL/key-statistics")
4 soup = BeautifulSoup(page.content, 'html.parser')
----> 5 soup.select_one('fin-streamer[data-symbol="GOOGL"]')['value']
TypeError: 'NoneType' object is not subscriptable
Could you help me find out why?
I am asking because I want to avoid loading and parsing the quote page, which I otherwise do not need; the key statistics page is the one I actually use.
First of all, always take a look at your soup to check that all the expected ingredients are in place.
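A minimal sketch of that check, assuming you would rather get a readable message than the TypeError. The helper name get_streamer_value is hypothetical, introduced only for illustration. It relies on the fact that select_one() returns None when nothing matches, and tests for that before indexing:

```python
from bs4 import BeautifulSoup


def get_streamer_value(html, symbol):
    """Return the value attribute of the fin-streamer tag for symbol, or None.

    Hypothetical helper for illustration; it only inspects the HTML it is given.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(f'fin-streamer[data-symbol="{symbol}"]')
    if tag is None:
        # Nothing matched: the page you received lacks the element,
        # so show part of the soup to see what the server actually sent.
        print("No fin-streamer found; start of page:", soup.get_text()[:200])
        return None
    # .get() returns None instead of raising KeyError if the attribute is absent
    return tag.get("value")


sample = '<fin-streamer data-symbol="GOOGL" value="123.45"></fin-streamer>'
print(get_streamer_value(sample, "GOOGL"))  # 123.45
print(get_streamer_value("<html><body>blocked</body></html>", "GOOGL"))
```

The second call prints the start of the page text and then None, which is the point where you would notice the server sent something other than what your browser shows.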
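You have to add a user-agent to your request headers to get the right source back from the server. Without it, the returned HTML does not contain the fin-streamer element, so select_one() returns None and subscripting it raises the TypeError: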
from bs4 import BeautifulSoup
import requests

# The user-agent header makes the server return the page that contains the fin-streamer element
page = requests.get("https://finance.yahoo.com/quote/GOOGL/key-statistics", headers={'user-agent':'some-agent'})
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.select_one('fin-streamer[data-symbol="GOOGL"]')['value'])
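For context, requests identifies itself with a default User-Agent of the form "python-requests/<version>" unless you override it, which is what the server sees when no header is passed. A minimal sketch, assuming you fetch several pages and want the header set once: a requests.Session carries it on every request. The value 'some-agent' is the placeholder from the answer above. The request is only prepared here, not sent, so you can confirm which header would go out:

```python
import requests

# requests sends "python-requests/<version>" as User-Agent unless told otherwise
print(requests.utils.default_user_agent())

session = requests.Session()
# Placeholder value taken from the answer above; it replaces the default header
session.headers.update({"user-agent": "some-agent"})

# Prepare (but do not send) a request to confirm the header the server would receive
req = requests.Request("GET", "https://finance.yahoo.com/quote/GOOGL/key-statistics")
prepared = session.prepare_request(req)
print(prepared.headers["User-Agent"])  # some-agent

# To actually fetch: page = session.get(req.url), then parse page.content as before
```

Session headers are case-insensitive, so the lowercase 'user-agent' key overwrites the default 'User-Agent' entry instead of adding a second one.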