I am trying to access data (a quote value) from an e-commerce website using the 'requests' library in Python. The problem is that the website's cookies are dynamic, and my code requires a header to get a response. I can open the website and scrape it, but to do that I have to copy the header details from the browser. I want to automate this process so that I don't have to paste the cookie in manually every time I want to scrape. This is the link: "https://www.nseindia.com/get-quotes/equity?symbol=RELIANCE". I am trying to get the 'Intraday chart' data so I can store it in a DataFrame and plot it.
I am a beginner and I have never web scraped before.
This is what I have tried so far:
import requests
import pandas as pd
# I take this data from the website response headers
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br, zstd',
    'Accept-Language': 'en-US,en;q=0.5',
    'Cookie': 'Cookie Value',  # pasted manually from the browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
response = requests.get('https://www.nseindia.com/api/chart-databyindex?index=RELIANCEEQN', headers=headers)
# 'grapthData' is the key exactly as it appears in the API response
reliance = pd.DataFrame(response.json()['grapthData'], columns=['Timestamp', 'Price'])
reliance['Timestamp'] = pd.to_datetime(reliance['Timestamp'], unit='ms')
reliance.plot(x='Timestamp', y='Price')
To scrape a website with dynamic cookies, you need to acquire and refresh those cookies automatically instead of pasting them in by hand. One way to do this is with requests.Session, which stores any cookies the server sets and sends them back on subsequent requests. If you also need to parse HTML content, the BeautifulSoup library can help, though it isn't required for a JSON endpoint like this one.
Here's a more automated approach to handling dynamic cookies and headers:
Example:
import requests

# Initialize a session; it persists cookies across requests
session = requests.Session()

# Browser-like headers - NSE tends to reject requests without them
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.5',
})

# Initial request to the quote page so the server sets its dynamic cookies on the session
url = 'https://www.nseindia.com/get-quotes/equity?symbol=RELIANCE'
initial_response = session.get(url)

# Now make the actual request for the intraday chart data;
# the session automatically sends the cookies it just received
data_url = 'https://www.nseindia.com/api/chart-databyindex?index=RELIANCEEQN'
response = session.get(data_url)
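Whichever way the cookies are obtained, the JSON that comes back can be converted to a DataFrame just as in your original code. A minimal sketch of that conversion, using a hard-coded sample payload in place of a live response (the timestamps and prices below are made up; 'grapthData' is the key as your working response showed it):

```python
import pandas as pd

# Stand-in for response.json() from the chart-databyindex endpoint
payload = {
    "grapthData": [            # the API response spells the key this way
        [1715145000000, 2830.5],   # [epoch milliseconds, price]
        [1715145060000, 2831.0],
        [1715145120000, 2829.8],
    ]
}

# Build the DataFrame and convert the epoch timestamps to datetimes
reliance = pd.DataFrame(payload["grapthData"], columns=["Timestamp", "Price"])
reliance["Timestamp"] = pd.to_datetime(reliance["Timestamp"], unit="ms")
print(reliance.head())
```

With a live response you would pass `response.json()["grapthData"]` instead of the sample payload, then call `reliance.plot(x="Timestamp", y="Price")` as before.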
Using Selenium to automate the browser and extract cookies can be an effective approach to handle dynamic cookies. Here’s how you can use Selenium to open the webpage, retrieve the cookies, and then use these cookies in the requests library to fetch the required data.
pip install requests pandas selenium
Example:
from selenium import webdriver
import requests
import pandas as pd
import time
# Initialize the Selenium WebDriver
driver = webdriver.Chrome()
# Open the URL using Selenium
url = 'https://www.nseindia.com/get-quotes/equity?symbol=RELIANCE'
driver.get(url)
# Give the page time to load and set its cookies (a fixed sleep is crude but simple)
time.sleep(5)
# Extract cookies from the Selenium browser session
cookies = driver.get_cookies()
# Close the Selenium browser
driver.quit()
# Create a dictionary of cookies for the requests session
cookies_dict = {cookie['name']: cookie['value'] for cookie in cookies}
# Initialize a session to handle cookies
session = requests.Session()
# Set the headers for the session
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br, zstd',
    'Accept-Language': 'en-US,en;q=0.5',
    'X-Requested-With': 'XMLHttpRequest'
})
# Set the cookies for the session
session.cookies.update(cookies_dict)
# Make the actual request to get the intraday chart data
data_url = 'https://www.nseindia.com/api/chart-databyindex?index=RELIANCEEQN'
response = session.get(data_url)
This approach combines the automation capabilities of Selenium with the simplicity and efficiency of the requests library for making HTTP requests, ensuring you can handle dynamic cookies without manual intervention.
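One refinement worth knowing about: flattening driver.get_cookies() into a plain name→value dict discards each cookie's domain and path, which can matter when a site sets cookies on more than one domain. Copying the cookies one at a time with session.cookies.set preserves that metadata. A sketch, using a hand-built cookie list in place of a live Selenium session (the cookie names and values here are made-up examples, but the dict shape matches what Selenium returns):

```python
import requests

# Stand-in for driver.get_cookies(); Selenium returns a list of dicts like these
selenium_cookies = [
    {"name": "nsit", "value": "abc123", "domain": "www.nseindia.com", "path": "/"},
    {"name": "nseappid", "value": "xyz789", "domain": ".nseindia.com", "path": "/"},
]

session = requests.Session()
for cookie in selenium_cookies:
    # set() keeps each cookie scoped to its original domain and path
    session.cookies.set(
        cookie["name"],
        cookie["value"],
        domain=cookie.get("domain"),
        path=cookie.get("path", "/"),
    )

print({c.name: c.domain for c in session.cookies})
```

After this, session.get(data_url) behaves the same as in the example above, but the cookie jar mirrors the browser's more faithfully.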