So, I've recently learnt BeautifulSoup and decided to scrape stock data from Yahoo Finance as an exercise.
The code below only returns a static stock price that never updates:
import requests
from bs4 import BeautifulSoup

def priceTracker():
    ticker = 'TSLA'
    url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
    # No headers are passed here, so requests sends its defaults
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    # The price is the first <span> inside the first matching <div>
    price = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
    return price

# Fetch and print the price in an endless loop
while True:
    print(priceTracker())
I found a solution online where people passed a "headers" argument to the requests.get() call, and it worked:
import requests
from bs4 import BeautifulSoup

def priceTracker():
    # Identify the request as coming from a Firefox browser
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'}
    ticker = 'TSLA'
    url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    # The price is the first <span> inside the first matching <div>
    price = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
    return price

# Fetch and print the price in an endless loop
while True:
    print(priceTracker())
My question is: why do the scraped prices from Yahoo Finance only update when the "headers" argument is included? I don't understand why it behaves like that.
HTTP headers let the client and the server pass additional information with an HTTP request or response.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'}
ticker = 'TSLA'
url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
response = requests.get(url, headers=headers)
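Some sites require a 'User-Agent' header in the request before they will serve the regular page. Without one, requests identifies itself with its default User-Agent (python-requests/<version>), which a server can recognise as a script and may answer differently, for example with a cached or reduced page.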
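
To make the idea concrete, here is a minimal sketch of a server that answers differently depending on the User-Agent it receives. Everything in it is an assumption for illustration: ToyHandler, fetch and run_demo are hypothetical names, the "starts with Mozilla/" rule is made up, and nothing here claims to reproduce what Yahoo Finance actually does. It uses urllib from the standard library instead of requests so that it runs without network access or extra packages; with requests the equivalent is the same headers= argument shown above.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0"


class ToyHandler(BaseHTTPRequestHandler):
    """Hypothetical server that branches on the User-Agent header."""

    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        # Made-up rule for this demo: anything starting with "Mozilla/" counts as a browser.
        body = b"live page" if user_agent.startswith("Mozilla/") else b"fallback page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url, headers=None):
    """GET url with optional extra headers and return the body as text."""
    # An empty ProxyHandler makes sure the local request bypasses any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers or {})
    with opener.open(request) as response:
        return response.read().decode()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), ToyHandler)  # port 0: use any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        # urllib sends its own default User-Agent ("Python-urllib/x.y") here
        without_header = fetch(url)
        with_header = fetch(url, {"User-Agent": BROWSER_UA})
    finally:
        server.shutdown()
        server.server_close()
    return without_header, with_header


if __name__ == "__main__":
    print(run_demo())  # ('fallback page', 'live page')
```

The same URL returns two different bodies, and the only thing that changed between the two requests is the User-Agent header. That is the general mechanism by which adding the header can change what a scraper gets back.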