python, beautifulsoup

Python: Too many requests


I have made a Python program which parses reddit's subreddits page and makes a list of them. But whenever I run it, the reddit server gives me error 429, 'too many requests'.

How can I bring down the number of requests I make, so that I am not rate limited?

from bs4 import BeautifulSoup as bs
from time import sleep
import requests as req

html = req.get('http://www.reddit.com/')
print(html)
soup = bs(html.text, 'html.parser')

# http://www.reddit.com/subreddits/
link_to_sub_reddits = soup.find('a',id='sr-more-link')['href']

print(link_to_sub_reddits)

L=[]

for navigate_the_pages in range(1):

        res = req.get(link_to_sub_reddits)

        soup = bs(res.text, 'html.parser')
        # soup created
        print(soup.text)

        div = soup.body.find('div', class_='content')
        div = div.find('div', id='siteTable')

        cnt=0

        for iterator in div:

            div_thing = div.contents[cnt]
            
            if div_thing != '' and div_thing.name == 'div' and 'thing' in div_thing.get('class', []):
                
                div_entry = div_thing.find('a', class_=lambda c: c and 'entry' in c)
                # div with class='entry......'
                
                link = div_entry.find('a')['href']
                # link of the subreddit
                name_of_sub = link.split('/')[-2]
                # http://www.reddit.com/subreddits/
                # ['http:', '', 'www.reddit.com', 'subreddits', '']

                description = div_entry.find('strong').text
                # something about the community

                p_tagline = div_entry.find('p',class_='tagline')
                subscribers = p_tagline.find('span',class_='number').text

                L.append((name_of_sub, link, description, subscribers))

            elif div_thing != '' and div_thing.name == 'div' and 'nav-buttons' in div_thing.get('class', []):
                # case when we find 'nav' button

                link_to_sub_reddits = div_thing.find('a')['href']
                break

            cnt = cnt + 1
            sleep(10)

        sleep(10)

Solution

  • One likely reason is that reddit checks for a User-Agent header. Since you are not sending one, reddit flags the request as coming from a bot, and that is why you are getting the error. Try adding a User-Agent header to your request.
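A minimal sketch of that fix with `requests`: the header name is standard, but the User-Agent string below is only a placeholder (reddit asks for a unique, descriptive one of your own). The retry loop with a growing delay is an optional extra for any 429s that still slip through; the `fetch` helper and its parameters are illustrative, not an official API.

```python
import time
import requests

# Placeholder User-Agent -- substitute a descriptive string of your own.
# The default "python-requests/x.y" UA is what reddit appears to block.
HEADERS = {'User-Agent': 'my-subreddit-lister/0.1 (a learning project)'}

def fetch(url, retries=3, delay=10):
    """GET url with a custom User-Agent, backing off if we still get a 429."""
    res = None
    for attempt in range(retries):
        res = requests.get(url, headers=HEADERS)
        if res.status_code != 429:
            break
        time.sleep(delay * (attempt + 1))  # wait longer after each 429
    return res

if __name__ == '__main__':
    res = fetch('http://www.reddit.com/subreddits/')
    print(res.status_code)
```

In the program above you would then replace each `req.get(...)` call with `fetch(...)`, which also lets you drop the per-item `sleep(10)` inside the inner loop (no request is made there, so it only slows the parsing down).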