python, imdb, imdbpy

503 error when downloading data from the IMDb API


I am trying to download the plot for almost 25,000 movies using the IMDbPY module for Python. To speed things up, I'm using Pool from the multiprocessing module. However, after about 100 requests I get a 503 error with the following message: Service Temporarily Unavailable. After 10-15 minutes I can resume, but after approximately 20 requests the same error occurs again.

I am aware that this may simply be the API blocking me to prevent too many calls; however, I can't find any information on the web about the maximum number of requests per unit of time.

Do you have any idea how to process so many calls without being blocked? Also, do you know where I can find the documentation for the IMDb API?

Best


Solution

  • Please, don't do it.

    Scraping is forbidden by IMDb's terms of service, and IMDbPY was never intended to be used to mass-scrape the web site: in fact it's explicitly designed to fetch a single movie at a time.

    In theory, IMDbPY can manage the plain-text data files that IMDb distributes, but unfortunately IMDb recently changed both the format and the content of that data.

    IMDb has no APIs that I know of; if you have to manage such a huge portion of their data, you have to get a licence.

    Please consider using http://www.omdbapi.com/ instead.
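
    As a rough illustration of the OMDb suggestion, here is a minimal sketch that looks up plots one movie at a time with a pause between requests, instead of firing them in parallel from a Pool. It assumes the OMDb query parameters `i` (IMDb ID), `plot` and `apikey`, and a JSON reply with `Response` and `Plot` fields; check these against the OMDb documentation before relying on them. The function names, the one-second default delay and the placeholder key are all made up for this example.

```python
import json
import time
import urllib.parse
import urllib.request

OMDB_URL = "http://www.omdbapi.com/"


def build_url(imdb_id, api_key):
    """Build an OMDb lookup URL for one IMDb ID (parameter names are assumptions)."""
    query = urllib.parse.urlencode(
        {"i": imdb_id, "plot": "full", "apikey": api_key}
    )
    return f"{OMDB_URL}?{query}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_plots(imdb_ids, api_key, delay=1.0, fetch=fetch_json, sleep=time.sleep):
    """Look up plots sequentially, pausing `delay` seconds between requests.

    Returns a dict mapping each IMDb ID to its plot, or None when the
    service reports that the lookup failed.
    """
    plots = {}
    for n, imdb_id in enumerate(imdb_ids):
        if n:
            # Throttle: wait before every request except the first.
            sleep(delay)
        data = fetch(build_url(imdb_id, api_key))
        # Assumed reply shape: {"Response": "True", "Plot": "..."} on success.
        ok = data.get("Response") == "True"
        plots[imdb_id] = data.get("Plot") if ok else None
    return plots
```

    A call would look something like `fetch_plots(["tt0111161"], "YOUR_API_KEY")`, where the key is one you have obtained from OMDb yourself. The `fetch` and `sleep` arguments exist only so the loop can be tried without touching the network. Pick a delay that respects whatever request limits your key comes with.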