[SOLVED] Beatifulsoup not returning full html of the page

Beatifulsoup not returning full html of the page

I want to scrape few pages from amazon website like title,url,aisn and i run into a problem that script only parsing 15 products while on the page it is showing 50. i decided to print out all html to console and i saw that the html is ending at 15 products without any errors from the script. Here is the part of my script

keyword = "men jeans".replace(' ', '+')

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5'}
url = "https://www.amazon.com/s/field-keywords={}".format(keyword)

request = requests.session()
req = request.get(url, headers = headers)
sleep(3)
soup = BeautifulSoup(req.content, 'html.parser')
print(soup)

Solution

It's because few of the items are generated dynamically. There might be any better solution other than using selenium. However, as a workaround you can try the below way instead.

from selenium import webdriver
from bs4 import BeautifulSoup

def fetch_item(driver,keyword):
    driver.get(url.format(keyword.replace(" ", "+")))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for items in soup.select("[id^='result_']"):
        try:
            name = items.select_one("h2").text
        except AttributeError: name = ""
        print(name)

if __name__ == '__main__':
    url = "https://www.amazon.com/s/field-keywords={}"
    driver = webdriver.Chrome()
    try:
        fetch_item(driver,"men jeans")
    finally:
        driver.quit()

Upon running the above script you should get 56 names or something as result.