python python-3.x web-scraping python-requests

How to avoid copying and pasting hardcoded cookies from the network panel when making dynamic requests?


I've written a script using the requests module that fetches the names from the second column of the table named 'Mutual Funds' on this webpage.

The script works only when I include hardcoded cookies in the headers, copied from the network panel.

How can I automatically include cookies in the script instead of copying and pasting the hardcoded ones each time I run it?

import requests

link = 'https://query1.finance.yahoo.com/v1/finance/screener'
params = {
    'formatted': 'true',
    'useRecordsResponse': 'true',
    'lang': 'en-US',
    'region': 'US',
    'crumb': 'zKNQf6Chboq',
}

payload = {
    "size": 25, "offset": 0, "sortType": "DESC", "sortField": "fundnetassets",
    "includeFields": [
        "ticker", "companyshortname", "intradaypricechange", "percentchange",
        "intradayprice", "trailing_ytd_return", "trailing_3m_return",
        "annualreturnnavy1", "annualreturnnavy3", "annualreturnnavy5",
        "annualreportnetexpenseratio", "annualreportgrossexpenseratio",
        "fundnetassets", "performanceratingoverall", "fiftydaymovingavg",
        "twohundreddaymovingavg", "day_open_price", "fiftytwowklow", "fiftytwowkhigh",
    ],
    "topOperator": "AND",
    "query": {"operator": "and", "operands": [
        {"operator": "or", "operands": [{"operator": "eq", "operands": ["exchange", "NAS"]}]},
    ]},
    "quoteType": "MUTUALFUND",
}

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    # 'cookie': 'A1=d=AQABBFAteGcCENecbGhV1dr_-0vFWvPHWDQFEgEBCAFIh2exZxH3bmUB_eMBAAcIUC14Z_PHWDQ&S=AQAAAswrXYNs-vA3GNpmkVtrHlo;'
}

res = requests.post(link, params=params, json=payload, headers=headers)
print(res.status_code)
for item in res.json()['finance']['result']:
    for elem in item['records']:
        print(elem['companyName'])

Solution

  • To answer your question directly: you can use requests.Session() to manage cookies automatically across requests. The cookies set during the initial GET request to Yahoo Finance are then included automatically in subsequent requests, so you no longer need to hardcode them.

    import requests
    import json
    
    # Create a session to manage cookies
    session = requests.Session()
    
    # Initial GET request to Yahoo Finance to fetch the page
    yahoo_url = 'https://finance.yahoo.com/research-hub/screener/mutualfunds/'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    }
    
    response = session.get(yahoo_url, headers=headers)
    
    # API endpoint and parameters for the POST request
    api_url = 'https://query1.finance.yahoo.com/v1/finance/screener'
    
    ...
    
    # POST request using the session
    response = session.post(api_url, params=params, json=payload, headers=headers)
    

    Be aware that this alone will not make the request work, because Yahoo Finance uses a crumb system to prevent unauthorized API access, and the crumb must be obtained dynamically each time you access the API.

    So your next question should be: how do I fetch the crumb?
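A minimal sketch of that next step, assuming Yahoo's front end still fetches its crumb from the /v1/test/getcrumb endpoint (that URL, and using the screener page to prime the session's cookies, are assumptions on my part and may change or be restricted by Yahoo at any time):

```python
import requests

# Assumed crumb endpoint; this is what Yahoo's own front end appears to use,
# but it is not a documented, stable API.
CRUMB_URL = 'https://query1.finance.yahoo.com/v1/test/getcrumb'

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/131.0.0.0 Safari/537.36',
}


def fetch_crumb(session: requests.Session) -> str:
    """Prime the session with Yahoo cookies, then request a crumb.

    The GET to the screener page sets the cookies the crumb endpoint
    expects; without those cookies it typically returns an empty body
    or an error.
    """
    session.get(
        'https://finance.yahoo.com/research-hub/screener/mutualfunds/',
        headers=HEADERS,
    )
    res = session.get(CRUMB_URL, headers=HEADERS)
    res.raise_for_status()
    return res.text.strip()


# Usage sketch: replace the hardcoded crumb in your params with a fresh one.
# session = requests.Session()
# params['crumb'] = fetch_crumb(session)
# res = session.post(link, params=params, json=payload, headers=HEADERS)
```

Because the same Session object is reused, the cookies obtained while fetching the crumb are automatically sent along with the subsequent POST, which is what ties the crumb and the cookies together.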