
Is there a way to extract IMDb reviews using IMDbPY?


I do not need the dataset; that is available on Kaggle. I want to extract a movie review from IMDb using IMDbPY or any other scraping method.

https://imdbpy.github.io/


Solution

  • This is not obvious from the IMDbPY docs, but you can always inspect what a movie object holds by looking at its keys. Not all of the information you might want is available right after you fetch a movie with IMDbPY; in your case, the reviews have to be added explicitly. The list returned by ia.get_movie_infoset() shows three information sets related to reviews: 'reviews', 'external reviews', and 'critic reviews'. Their keys are not present on the movie object until you request them with ia.update(). The example below shows how it is done.

    from imdb import IMDb
    
    # create an instance of the IMDb class
    ia = IMDb()
    
    the_matrix = ia.get_movie('0133093')
    print(sorted(the_matrix.keys()))  # keys available before any update
    
    # show all information sets that can be fetched for a movie;
    # each one adds its keys to the movie object once fetched with ia.update()
    print(ia.get_movie_infoset())
    ia.update(the_matrix, ['external reviews'])
    ia.update(the_matrix, ['reviews'])
    ia.update(the_matrix, ['critic reviews'])
    # show which keys were added by each information set
    print(the_matrix.infoset2keys['external reviews'])  # no external reviews, so no key is added
    print(the_matrix.infoset2keys['reviews'])  # many reviews; adds the key 'reviews'
    print(the_matrix.infoset2keys['critic reviews'])  # adds the keys 'metascore' and 'metacritic url'
    # print(the_matrix['reviews'])  # uncomment to print the (long) list of reviews
    print(sorted(the_matrix.keys()))  # check the new keys that were added
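Once the reviews are on the movie object, two small helpers may make the result easier to work with: one that diffs the key lists before and after an update (the same check the last print does by eye), and one that pulls the review text out of the 'reviews' list. This is a sketch, not part of IMDbPY: the helper names added_keys and review_texts are hypothetical, and the assumption that each review is a dict storing its body under 'content' should be verified by printing the_matrix['reviews'][0] first. The helpers themselves need no network access.

```python
from typing import Any, Iterable, List, Mapping


def added_keys(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Return the keys present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


def review_texts(reviews: Iterable[Mapping[str, Any]]) -> List[str]:
    """Collect the body text of each review dict.

    ASSUMPTION: the body is stored under the 'content' key. Reviews
    without that key (or with an empty body) are skipped instead of
    raising KeyError.
    """
    texts = []
    for review in reviews:
        body = review.get("content")
        if body:
            texts.append(body.strip())
    return texts


# Usage with IMDbPY (needs network access, so it is left as comments):
#   before = the_matrix.keys()
#   ia.update(the_matrix, ['reviews'])
#   print(added_keys(before, the_matrix.keys()))
#   for text in review_texts(the_matrix.get('reviews', []))[:3]:
#       print(text[:200])
```

Skipping entries without a body keeps the loop working if some reviews come back incomplete; if you would rather notice missing data, replace review.get("content") with review["content"] so a KeyError surfaces.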