Tags: web-scraping, beautifulsoup, imdb

How to extract title name and rating of a movie from IMDB database?


I'm very new to web scraping in Python. I want to extract the movie name, release year, and rating from IMDb. This is the IMDb chart page with the movies and their ratings: https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm. I use the BeautifulSoup and requests modules. Here is my code:

movies = bs.find('tbody',class_='lister-list').find_all('tr')

When I tried to extract the movie name, rating, and year, I got the same AttributeError for all of them. This is the relevant HTML, followed by my code and the traceback:

<td class="title column">
 <a href="/title/tt11564570/?pf_rd_m=A2FGELUUNOQJNL&amp;pf_rd_p=ea4e08e1-c8a3-47b5-ac3a-75026647c16e&amp;pf_rd_r=BQWZRBFAM81S7K6ZBPJP&amp;pf_rd_s=center-1&amp;pf_rd_t=15506&amp;pf_rd_i=moviemeter&amp;ref_=chtmvm_tt_1" title="Rian Johnson (dir.), Daniel Craig, Edward Norton">Glass Onion: une histoire à couteaux tirés</a>
 <span class="secondary info">(2022)</span>
 <div class="velocity">1
 <span class="secondary info">(
 <span class="global-sprite telemeter up"></span>
 1)</span>

 <td class="ratingColumn imdbRating">
 <strong title="7,3 based on 207 962 user ratings">7,3</strong>


title = movies.find('td',class_='titleColumn').a.text
rating = movies.find('td',class_='ratingColumn imdbRating').strong.text
year = movies.find('td',class_='titleColumn').span.text.strip('()')

AttributeError                            Traceback (most recent call last)
<ipython-input-9-2363bafd916b> in <module>
----> 1 title = movies.find('td',class_='titleColumn').a.text
      2 title

~\anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
   2287     def __getattr__(self, key):
   2288         """Raise a helpful exception to explain a common code fix."""
-> 2289         raise AttributeError(
   2290             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2291         )

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Can someone help me to solve the problem? Thanks in advance!


Solution

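  • find_all() returns a ResultSet (a list of tags), so you have to loop over it and extract the fields from each row rather than calling find() on the whole list. You can try the following example.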

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    
    data = []
    
    res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm")
    #print(res)
    soup = BeautifulSoup(res.content, "html.parser")
    
    for card in soup.select('.chart.full-width tbody tr'):
        data.append({
            "title": card.select_one('.titleColumn a').get_text(strip=True),
            "year": card.select_one('.titleColumn span').text,
            'rating': card.select_one('td[class="ratingColumn imdbRating"]').get_text(strip=True)
                })
    
    df = pd.DataFrame(data)
    print(df)
    #df.to_csv('out.csv', index=False)
    

    Output:

                                                title       year rating
    0                            Avatar: The Way of Water  (2022)    7.9
    1                                         Glass Onion  (2022)    7.2
    2                                            The Menu  (2022)    7.3
    3                                         White Noise  (2022)    5.8
    4                                   The Pale Blue Eye  (2022)    6.7
    ..                                                ...     ...    ...
    95                                          Zoolander  (2001)    6.5
    96                      Once Upon a Time in Hollywood  (2019)    7.6
    97  The Lord of the Rings: The Fellowship of the Ring  (2001)    8.8
    98                                     New Year's Eve  (2011)    5.6
    99                            Spider-Man: No Way Home  (2021)    8.2
    
    [100 rows x 3 columns]
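
The output above still carries the parentheses around the year, and the question's HTML shows a comma as the decimal separator in the rating ("7,3"). A minimal sketch of how the scraped strings might be cleaned before building the DataFrame follows. The helper names `clean_year` and `clean_rating` and the sample rows are hypothetical, and the empty-rating case is an assumption about titles that have no rating yet.

```python
def clean_year(raw: str) -> int:
    """Turn a scraped year such as '(2022)' into the integer 2022."""
    return int(raw.strip().strip("()"))


def clean_rating(raw: str):
    """Turn '7.9' or a comma-formatted '7,3' into a float; return None if empty."""
    text = raw.strip().replace(",", ".")
    return float(text) if text else None


# Hypothetical sample rows in the same shape as the scraped `data` list.
rows = [
    {"title": "Avatar: The Way of Water", "year": "(2022)", "rating": "7.9"},
    {"title": "Glass Onion", "year": "(2022)", "rating": "7,3"},
    {"title": "Unrated example", "year": "(2023)", "rating": ""},  # assumed case
]

cleaned = [
    {
        "title": row["title"],
        "year": clean_year(row["year"]),
        "rating": clean_rating(row["rating"]),
    }
    for row in rows
]
print(cleaned)
```

The cleaned list can be passed to pd.DataFrame in place of `data`, which gives numeric columns for the year and rating.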
    

    Update: To extract the data using the find_all and find methods instead:

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    headers = {'User-Agent':'Mozilla/5.0'}
    
    data = []
    
    res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm", headers=headers)
    #print(res)
    soup = BeautifulSoup(res.content, "html.parser")
    
    for card in soup.table.tbody.find_all("tr"):
        data.append({
            "title": card.find("td",class_="titleColumn").a.get_text(strip=True),
            "year": card.find("td",class_="titleColumn").span.get_text(strip=True),
            'rating': card.find('td',class_="ratingColumn imdbRating").get_text(strip=True)
                })
    
    df = pd.DataFrame(data)
    print(df)
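
For reference, the original AttributeError can be reproduced without any network access. The sketch below parses a small, hypothetical HTML fragment modeled on the table markup in the question (the titles and values are made up). It shows that calling find on the result of find_all fails, while calling find on each row works.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down fragment modeled on the markup in the question.
html = """
<table><tbody class="lister-list">
  <tr>
    <td class="titleColumn"><a href="#">First Movie</a>
      <span class="secondaryInfo">(2022)</span></td>
    <td class="ratingColumn imdbRating"><strong>7.3</strong></td>
  </tr>
  <tr>
    <td class="titleColumn"><a href="#">Second Movie</a>
      <span class="secondaryInfo">(2021)</span></td>
    <td class="ratingColumn imdbRating"><strong>6.5</strong></td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which behaves like a list of Tag objects.
movies = soup.find("tbody", class_="lister-list").find_all("tr")

# Calling .find on the whole ResultSet reproduces the error from the question.
try:
    movies.find("td", class_="titleColumn")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

# Calling .find on each individual row (a Tag) works.
titles = [
    row.find("td", class_="titleColumn").a.get_text(strip=True) for row in movies
]
years = [
    row.find("td", class_="titleColumn").span.get_text(strip=True).strip("()")
    for row in movies
]
print(error_message)
print(titles, years)
```

Indexing a single row, for example movies[0].find(...), works for the same reason: each element of the ResultSet is a Tag.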