python pandas csv http-status-code-429

Python HTTP error 429 (Too Many Requests)


I used to fetch a CSV file from a URL and load it directly into a pandas DataFrame like this:

import pandas as pd

grab_csv = 'https://XXXX.XX/data.csv'
pd_data = pd.read_csv(grab_csv).drop(columns=['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 4', 'Column 5', 'Column 6', 'Column 7'])

Since today, I get urllib.error.HTTPError: HTTP Error 429: Too Many Requests. What I tried in order to fix it:

import pandas as pd
import requests
from io import StringIO

grab_csv = 'https://XXXX.XX/data.csv'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
        
res_grab_data = requests.get(StringIO(grab_csv), headers=headers).text

pd_data = pd.read_csv(res_grab_data).drop(columns=['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 4', 'Column 5', 'Column 6', 'Column 7'])

This time, I get the error requests.exceptions.MissingSchema: Invalid URL '<_io.StringIO object at 0x0000012B7C622A20>': No schema supplied. Perhaps you meant http://<_io.StringIO object at 0x0000012B7C622A20>?.

Any idea how I can solve the HTTP Error 429 with pandas and requests?


Solution

  • The 429 response comes from the web server you are requesting the file from, almost certainly because you're issuing requests faster than it allows. It's not caused by a bug in your original code.

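    Your attempt at fixing it doesn't make much sense, though. StringIO lets you use an in-memory string as if it were a file object, so passing it to requests.get isn't a valid use: .get() expects the url parameter to be a string, which is why you get MissingSchema. Call requests.get(grab_csv, headers=headers) with the plain URL string instead. StringIO belongs one step later: pd.read_csv treats a plain string as a path or URL, so the downloaded text has to be wrapped, as in pd.read_csv(StringIO(res_grab_data)).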

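    I'd consult the documentation for the API you're using (if there is any), and slow down your rate of requests to stay within its limits. A 429 response may also include a Retry-After header that tells you how long to wait before trying again.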

    There is a neat Python package (aptly named ratelimit) that lets you decorate your function to enforce the rate limiting: https://pypi.org/project/ratelimit/
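
As a rough sketch of the advice above (one possible approach, not the only one), the code below passes the URL string to requests.get, waits and retries when the server answers 429, and only then wraps the downloaded text in StringIO for pandas. The names fetch_text and csv_text_to_frame and the parameters max_attempts, default_wait, get and sleep are made up for this illustration; the waiting times are assumptions and should be replaced by whatever limits the server documents.

```python
import time
from io import StringIO

import pandas as pd
import requests


def fetch_text(url, headers=None, max_attempts=5, default_wait=1.0,
               get=requests.get, sleep=time.sleep):
    """Return the body of `url`, waiting and retrying while the server answers 429.

    `get` and `sleep` are injectable only so the function can be tried offline.
    """
    for attempt in range(max_attempts):
        # Pass the URL string itself, not a StringIO object.
        response = get(url, headers=headers)
        if response.status_code != 429:
            response.raise_for_status()  # surface any other HTTP error
            return response.text
        if attempt == max_attempts - 1:
            break  # no point in waiting after the final attempt
        # Retry-After may be a number of seconds. It can also be an HTTP date,
        # which this sketch does not parse; it falls back to exponential backoff.
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            wait = float(retry_after)
        else:
            wait = default_wait * 2 ** attempt
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")


def csv_text_to_frame(csv_text, drop=()):
    """Parse CSV text into a DataFrame and drop the listed columns."""
    # This is where StringIO belongs: it makes the text look like a file.
    return pd.read_csv(StringIO(csv_text)).drop(columns=list(drop))
```

With the variables from the question, usage would look something like pd_data = csv_text_to_frame(fetch_text(grab_csv, headers=headers), drop=['Column 1', 'Column 2']). If the script downloads the file repeatedly, caching the result locally or throttling the calls (for example with the ratelimit package mentioned above) avoids hitting the limit in the first place.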