python python-3.x web-scraping python-requests

How to avoid copying and pasting hardcoded cookies from the network panel when making dynamic requests?


I've written a script using the requests module that fetches the names from the second column of the table named 'Mutual Funds' on this webpage.

The script works only when I include hardcoded cookies in the headers, copied from the network panel.

How can I automatically include cookies in the script instead of copying and pasting the hardcoded ones each time I run it?

import requests

link = 'https://query1.finance.yahoo.com/v1/finance/screener'
params = {
    'formatted': 'true',
    'useRecordsResponse': 'true',
    'lang': 'en-US',
    'region': 'US',
    'crumb': 'zKNQf6Chboq',
}

payload = {
    "size": 25, "offset": 0, "sortType": "DESC", "sortField": "fundnetassets",
    "includeFields": [
        "ticker", "companyshortname", "intradaypricechange", "percentchange",
        "intradayprice", "trailing_ytd_return", "trailing_3m_return",
        "annualreturnnavy1", "annualreturnnavy3", "annualreturnnavy5",
        "annualreportnetexpenseratio", "annualreportgrossexpenseratio",
        "fundnetassets", "performanceratingoverall", "fiftydaymovingavg",
        "twohundreddaymovingavg", "day_open_price", "fiftytwowklow", "fiftytwowkhigh",
    ],
    "topOperator": "AND",
    "query": {"operator": "and", "operands": [
        {"operator": "or", "operands": [{"operator": "eq", "operands": ["exchange", "NAS"]}]},
    ]},
    "quoteType": "MUTUALFUND",
}

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    # 'cookie': 'A1=d=AQABBFAteGcCENecbGhV1dr_-0vFWvPHWDQFEgEBCAFIh2exZxH3bmUB_eMBAAcIUC14Z_PHWDQ&S=AQAAAswrXYNs-vA3GNpmkVtrHlo;'
}

res = requests.post(link, params=params, json=payload, headers=headers)
print(res.status_code)
for item in res.json()['finance']['result']:
    for elem in item['records']:
        print(elem['companyName'])

Solution

  • To answer your question directly: you can use requests.Session() to manage cookies automatically across requests. The cookies set during the initial GET request to Yahoo Finance are then included automatically in subsequent requests, so you no longer need to hardcode them.

    import requests
    import json
    
    # Create a session to manage cookies
    session = requests.Session()
    
    # Initial GET request to Yahoo Finance to fetch the page
    yahoo_url = 'https://finance.yahoo.com/research-hub/screener/mutualfunds/'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    }
    
    response = session.get(yahoo_url, headers=headers)
    
    # API endpoint and parameters for the POST request
    api_url = 'https://query1.finance.yahoo.com/v1/finance/screener'
    
    ...
    
    # POST request using the session
    response = session.post(api_url, params=params, json=payload, headers=headers)
    

    Be aware that this alone will not make the request work, because Yahoo Finance uses a crumb system to prevent unauthorized API access, and the crumb must be obtained dynamically each time you access the API.

    So your next question should be: how do I fetch the crumb?
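A minimal sketch of that next step, assuming Yahoo's front end still fetches its crumb from the /v1/test/getcrumb endpoint (that URL, and using the screener page to prime the session's cookies, are assumptions on my part and may change or be restricted by Yahoo at any time):

```python
import requests

# Assumed crumb endpoint; this is what Yahoo's own front end appears to use,
# but it is not a documented, stable API.
CRUMB_URL = 'https://query1.finance.yahoo.com/v1/test/getcrumb'

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/131.0.0.0 Safari/537.36',
}


def fetch_crumb(session: requests.Session) -> str:
    """Prime the session with Yahoo cookies, then request a crumb.

    The GET to the screener page sets the cookies the crumb endpoint
    expects; without those cookies it typically returns an empty body
    or an error.
    """
    session.get(
        'https://finance.yahoo.com/research-hub/screener/mutualfunds/',
        headers=HEADERS,
    )
    res = session.get(CRUMB_URL, headers=HEADERS)
    res.raise_for_status()
    return res.text.strip()


# Usage sketch: replace the hardcoded crumb in your params with a fresh one.
# session = requests.Session()
# params['crumb'] = fetch_crumb(session)
# res = session.post(link, params=params, json=payload, headers=HEADERS)
```

Because the same Session object is reused, the cookies obtained while fetching the crumb are automatically sent along with the subsequent POST, which is what ties the crumb and the cookies together.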