I'm very new to web scrapping in python. I want to extract the movie name, release year, and ratings from the IMDB database. This is the website for IMBD with 250 movies and ratings https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm.I use the module, BeautifulSoup, and request. Here is my code
movies = bs.find('tbody',class_='lister-list').find_all('tr')
When I tried to extract the movie name, rating & year, I got the same attribute error for all of them.
<td class="title column">
<a href="/title/tt11564570/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=ea4e08e1-c8a3-47b5-ac3a-75026647c16e&pf_rd_r=BQWZRBFAM81S7K6ZBPJP&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1" title="Rian Johnson (dir.), Daniel Craig, Edward Norton">Glass Onion: une histoire à couteaux tirés</a>
<span class="secondary info">(2022)</span>
<div class="velocity">1
<span class="secondary info">(
<span class="global-sprite telemeter up"></span>
1)</span>
<td class="ratingColumn imdbRating">
<strong title="7,3 based on 207 962 user ratings">7,3</strong>strong text
title = movies.find('td',class_='titleColumn').a.text
rating = movies.find('td',class_='ratingColumn imdbRating').strong.text
year = movies.find('td',class_='titleColumn').span.text.strip('()')
AttributeError Traceback (most recent call last) <ipython-input-9-2363bafd916b> in <module> ----> 1 title = movies.find('td',class_='titleColumn').a.text 2 title
~\anaconda3\lib\site-packages\bs4\element.py in getattr(self, key) 2287 def getattr(self, key): 2288 """Raise a helpful exception to explain a common code fix.""" -> 2289 raise AttributeError( 2290 "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key 2291 )
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Can someone help me to solve the problem? Thanks in advance!
To get the ResultSets as list, you can try the next example.
from bs4 import BeautifulSoup
import requests
import pandas as pd
data = []
res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm.I")
#print(res)
soup = BeautifulSoup(res.content, "html.parser")
for card in soup.select('.chart.full-width tbody tr'):
data.append({
"title": card.select_one('.titleColumn a').get_text(strip=True),
"year": card.select_one('.titleColumn span').text,
'rating': card.select_one('td[class="ratingColumn imdbRating"]').get_text(strip=True)
})
df = pd.DataFrame(data)
print(df)
#df.to_csv('out.csv', index=False)
Output:
title year rating
0 Avatar: The Way of Water (2022) 7.9
1 Glass Onion (2022) 7.2
2 The Menu (2022) 7.3
3 White Noise (2022) 5.8
4 The Pale Blue Eye (2022) 6.7
.. ... ... ...
95 Zoolander (2001) 6.5
96 Once Upon a Time in Hollywood (2019) 7.6
97 The Lord of the Rings: The Fellowship of the Ring (2001) 8.8
98 New Year's Eve (2011) 5.6
99 Spider-Man: No Way Home (2021) 8.2
[100 rows x 3 columns]
Update: To extract data using find_all and find
method.
from bs4 import BeautifulSoup
import requests
import pandas as pd
headers = {'User-Agent':'Mozilla/5.0'}
data = []
res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm.I")
#print(res)
soup = BeautifulSoup(res.content, "html.parser")
for card in soup.table.tbody.find_all("tr"):
data.append({
"title": card.find("td",class_="titleColumn").a.get_text(strip=True),
"year": card.find("td",class_="titleColumn").span.get_text(strip=True),
'rating': card.find('td',class_="ratingColumn imdbRating").get_text(strip=True)
})
df = pd.DataFrame(data)
print(df)