
Parsing a stock's recommended rating from Yahoo Finance


I'm looking to parse a specific Yahoo Finance stock page with a Python script (take https://finance.yahoo.com/quote/NOA?ltr=1 as an example) and write its "Recommended Rating" to a file. The recommended rating appears on the right-hand side of the page, about halfway down.

This is what I have so far:

    try:
        import urllib.request as urllib2  # Python 3
    except ImportError:
        import urllib2  # Python 2
    from bs4 import BeautifulSoup

    quote_page = 'https://finance.yahoo.com/quote/NOA?ltr=1'
    page = urllib2.urlopen(quote_page)
    soup = BeautifulSoup(page, "html.parser")
    # find() takes the tag name first; class_ matches any one CSS class on the element
    name_box = soup.find('div', class_='rating-text')
    name = name_box.text.strip()
    print(name)

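The tricky part is that I believe the recommended rating only appears on the page as innerHTML, i.e. it is filled in by JavaScript after the page loads, so it may not be present in the HTML that urlopen downloads. I'm not sure how I'd go about retrieving this data; a push in the right direction would be greatly appreciated!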
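One way to test that belief, sketched here with only the standard library: parse the raw HTML that urlopen returns and check whether the element carrying the rating-text class contains any text. The class name ClassTextFinder and the helper text_of_first_class are hypothetical, and the sketch assumes the rating lives in an element whose class list includes rating-text, as in the code above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = frozenset({'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                           'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'})


class ClassTextFinder(HTMLParser):
    """Collect the text inside the first element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while the parser is inside the matching element
        self.found = False  # becomes True once the matching element is seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the match
            return
        classes = (dict(attrs).get('class') or '').split()
        if not self.found and self.target_class in classes:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_of_first_class(html, target_class):
    """Return the stripped text of the first matching element, or None if it is absent."""
    finder = ClassTextFinder(target_class)
    finder.feed(html)
    finder.close()
    if not finder.found:
        return None
    return ''.join(finder.chunks).strip()
```

Called on the downloaded page, for example text_of_first_class(urllib2.urlopen(quote_page).read().decode('utf-8'), 'rating-text') (assuming the page is UTF-8), a result of None or an empty string while the browser shows a value would support the JavaScript explanation.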


Solution

  • Yahoo makes a GET request to the URL in the script below to load some of its data. If you open the Network tab of your browser's developer tools and refresh the page for the NOA stock, you should see a request named 'NOA?formatt...'. Click it and view the response to see some of the data. You'll need the requests module for the script below to work: pip install requests.

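    # get_mean_recs.py
    import csv
    import sys
    from datetime import datetime, timezone
    
    import requests
    
    def get_date():
        # Today's date in UTC, formatted DD-MM-YYYY (utcnow() is deprecated in Python 3.12)
        return datetime.now(timezone.utc).strftime('%d-%m-%Y')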
    
    lhs_url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'
    rhs_url = '?formatted=true&crumb=swg7qs5y9UP&lang=en-US&region=US&' \
              'modules=upgradeDowngradeHistory,recommendationTrend,' \
              'financialData,earningsHistory,earningsTrend,industryTrend&' \
              'corsDomain=finance.yahoo.com'
    
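    def get_mean_rec(ticker):
        """Return Yahoo's formatted mean analyst recommendation, or -1 on failure."""
        url = lhs_url + ticker + rhs_url
        r = requests.get(url, timeout=10)
        if not r.ok:
            return -1
        results = r.json().get('quoteSummary', {}).get('result') or []
        if not results:
            return -1
        # Tickers without analyst coverage may lack a 'recommendationMean' value
        rec = results[0].get('financialData', {}).get('recommendationMean', {})
        return rec.get('fmt', -1)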
    
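    def read_from_csv(fn):
        """Yield every non-empty ticker in the input CSV (any row, any column)."""
        with open(fn, 'r', newline='') as f:
            for line in csv.reader(f):
                for ticker in line:
                    ticker = ticker.strip()  # "NOA, AAPL" would otherwise yield " AAPL"
                    if ticker:
                        yield ticker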
    
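    def write_to_csv(fn, data):
        """Append rows to fn, writing a header row first if the file is new or empty."""
        if not data:
            return
        # newline='' stops the csv module from producing blank lines on Windows
        with open(fn, 'a', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=list(data[0].keys()))
            if f.tell() == 0:
                writer.writeheader()
            writer.writerows(data)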
    
    def assemble_dict(ticker):
        return {
            'ticker': ticker,
            'mean_rec': get_mean_rec(ticker),
            'utc_date': get_date()
        }
    
    def main():
        in_fn = sys.argv[1]
        out_fn = sys.argv[2]
        data = [assemble_dict(ticker) for ticker in read_from_csv(in_fn)]
        write_to_csv(out_fn, data)
    
    if __name__ == '__main__':
        main()
    

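    Usage (input.csv holds ticker symbols separated by commas and/or newlines; results are appended to output.csv):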

    python get_mean_recs.py input.csv output.csv
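The network-free parts of the script can be factored out so they are easy to check without contacting Yahoo. The sketch below builds the same query string with urllib.parse.urlencode instead of concatenating it by hand, and pulls recommendationMean out of an already-decoded JSON payload. The names build_quote_summary_url and extract_mean_rec are hypothetical, the crumb is taken as a parameter on the assumption that a value copied from a browser session is not permanent, and the payloads in the tests are hand-made to mirror only the keys the script reads.

```python
from urllib.parse import quote, urlencode

BASE_URL = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'

# Same modules the script above requests; only financialData is actually read.
MODULES = ('upgradeDowngradeHistory', 'recommendationTrend', 'financialData',
           'earningsHistory', 'earningsTrend', 'industryTrend')


def build_quote_summary_url(ticker, crumb, modules=MODULES):
    """Build the quoteSummary URL with properly escaped query parameters."""
    params = {
        'formatted': 'true',
        'crumb': crumb,  # assumption: session-specific, so passed in rather than hard-coded
        'lang': 'en-US',
        'region': 'US',
        'modules': ','.join(modules),
        'corsDomain': 'finance.yahoo.com',
    }
    # quote() escapes characters such as '^' that can appear in index tickers
    return BASE_URL + quote(ticker) + '?' + urlencode(params)


def extract_mean_rec(payload, default=-1):
    """Return financialData.recommendationMean.fmt from a decoded quoteSummary payload."""
    results = (payload.get('quoteSummary') or {}).get('result') or []
    if not results:
        return default
    rec = results[0].get('financialData', {}).get('recommendationMean', {})
    return rec.get('fmt', default)
```

With these helpers, get_mean_rec reduces to requests.get(build_quote_summary_url(ticker, crumb), timeout=10) followed by extract_mean_rec(r.json()). Alternatively, requests can do the escaping itself if the dictionary is passed through its params= argument.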