HTML returned by the Python requests library:
<p class="text-muted ">
<span class="certificate">12</span>
<span class="ghost">|</span>
<span class="runtime">192 min</span>
<span class="ghost">|</span>
<span class="genre">Action, Adventure, Fantasy</span>
</p>
Code:
import requests

base_url = "https://www.imdb.com"
search_url = base_url + "/search/title/?"
params = {
    "title_type": "feature",
    "release_date": "2022-01-01,2022-12-31",  # Movies released in 2022
    "start": 1,  # Index of the first result
}
# Send GET request to IMDb search page
# response = urllib.request.urlopen(search_url + urllib.parse.urlencode(params))
response = requests.get(search_url, params=params)
print(response.text)
How do I get the exact HTML code? I have also tried urllib.request, without success.
Try setting the Accept-Language HTTP header to en-US:
import requests
from bs4 import BeautifulSoup

base_url = "https://www.imdb.com"
search_url = base_url + "/search/title/"
params = {
    "title_type": "feature",
    "release_date": "2022-01-01,2022-12-31",  # Movies released in 2022
    "start": 1,
}
headers = {
    'Accept-Language': 'en-US,en;q=0.5'
}
response = requests.get(search_url, params=params, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
# Print the first ten titles next to the certificate that follows each one
for title in soup.select('h3')[:10]:
    print(f"{title.get_text(strip=True, separator=' '):<60} {title.find_next(class_='certificate').text:<10}")
Prints:
1. Avatar: The Way of Water (2022) PG-13
2. The Blackening (2022) R
3. X (II) (2022) R
4. Sisu (2022) R
5. A Man Called Otto (2022) PG-13
6. Top Gun: Maverick (2022) PG-13
7. Chevalier (2022) PG-13
8. The Batman (2022) PG-13
9. Sanctuary (I) (2022) R
10. Everything Everywhere All at Once (2022) R
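One caveat about the loop above: find_next(class_='certificate') returns the next matching element anywhere later in the document, so a title without a certificate would pick up the certificate of the following entry, and the last such title would raise an AttributeError on None. A minimal sketch of a scoped lookup, run against made-up sample markup: the SAMPLE string, the "item" class and the helper name titles_with_certificates are all hypothetical, and it assumes that each h3 shares a parent element with its own metadata, which may not hold for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the real IMDb page structure.
SAMPLE = """
<div class="item">
  <h3>1. <a>First Movie</a> (2022)</h3>
  <p class="text-muted"><span class="runtime">100 min</span></p>
</div>
<div class="item">
  <h3>2. <a>Second Movie</a> (2022)</h3>
  <p class="text-muted"><span class="certificate">PG-13</span></p>
</div>
"""


def titles_with_certificates(html):
    """Return (title, certificate) pairs, using 'n/a' when none is present."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for title in soup.select("h3"):
        # Search only inside the title's parent element, so a missing
        # certificate is not filled in from the following entry.
        cert = title.parent.select_one(".certificate")
        rows.append((
            title.get_text(strip=True, separator=" "),
            cert.get_text(strip=True) if cert else "n/a",
        ))
    return rows


rows = titles_with_certificates(SAMPLE)
for name, cert in rows:
    print(f"{name:<60} {cert:<10}")
```

With the sample above, the first entry has no certificate and is reported as n/a instead of borrowing PG-13 from the second entry.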
For example, with a de-DE header I get:
headers = {
    'Accept-Language': 'de-DE,de;q=0.5'
}
...
Prints:
1. Avatar: The Way of Water (2022) 12
2. The Blackening (2022) R
3. X (II) (2022) 16
4. Sisu: Rache ist süss (2022) 18
5. Ein Mann namens Otto (2022) 12
6. Top Gun: Maverick (2022) 12
7. Chevalier (2022) PG-13
8. The Batman (2022) 12
9. Sanctuary (I) (2022) R
10. Everything Everywhere All at Once (2022) 16
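Since the question also mentions urllib.request, the same header can be attached there as well. The following is only a sketch using the standard library: it builds the request without sending it, the helper names accept_language and build_request are made up for illustration, and it assumes the server treats the header the same way regardless of which client sends it.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://www.imdb.com/search/title/"


def accept_language(locale, q=0.5):
    """Build a value such as 'en-US,en;q=0.5' from a locale like 'en-US'."""
    language = locale.split("-")[0]
    return f"{locale},{language};q={q}"


def build_request(locale, **params):
    """Return an unsent urllib Request carrying the Accept-Language header."""
    url = f"{SEARCH_URL}?{urlencode(params)}"
    return Request(url, headers={"Accept-Language": accept_language(locale)})


req = build_request("de-DE", title_type="feature", start=1)
print(req.full_url)
print(req.header_items())
```

Passing the returned object to urllib.request.urlopen would send it. Note that urllib stores header names capitalized as "Accept-language"; HTTP header names are case-insensitive, so this does not change the request's meaning.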