html web-scraping beautifulsoup hyperlink data-collection

Scrape image url


I am trying to scrape image source links from Amazon using Beautiful Soup, but I am not getting the right output. The page I am scraping is: https://www.amazon.in/s?bbn=1389401031&rh=n%3A1389401031%2Cp_36%3A1318505031&dc&qid=1622460176&rnid=1318502031&ref=lp_1389401031_nr_p_36_2

Below is the code:

import requests
from bs4 import BeautifulSoup

# Fetch the category page and parse it
base_url = requests.get("https://www.amazon.in/mobile-phones/b/?ie=UTF8&node=1389401031&ref_=nav_cs_mobiles_9292c6cb7b394d30b2467b8f631090a7")
base_url

soup = BeautifulSoup(base_url.content, 'html.parser')
search_url = soup.find_all("span", class_="a-list-item")
search_url

# Collect every link found inside the list items
urls = []
abz = []
for i in search_url:
    for j in i.find_all("a"):
        urls.append(j["href"])
urls

# Keep only the links whose URL contains the encoded rupee sign prefix
lst = [x for x in urls if "%E2%82%" in x]
links_to_scrap = lst[2:4]
links_to_scrap

# Visit each selected link and collect the image sources
img_links = []
for url in links_to_scrap:
    pname = requests.get("https://www.amazon.in/mobile-phones/b/ie=UTF8&node=1389401031&ref_=nav_cs_mobiles_9292c6cb7b394d30b2467b8f631090a7"+url)
    soupp = BeautifulSoup(pname.content, 'html.parser')
    image = soupp.find_all("div", class_="a-section aok-relative s-image-wide-3-2-aspect")
    for i in image:
        for j in i.find_all("img"):
            img_links.append(j["src"])
img_links

Solution

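  • To get the image URLs from this Amazon page, send a browser-like User-Agent header with the request and select the elements with the class s-image, as in this example: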

    import requests
    from bs4 import BeautifulSoup
    
    
    url = "https://www.amazon.in/s?bbn=1389401031&rh=n%3A1389401031%2Cp_36%3A1318505031&dc&qid=1622460176&rnid=1318502031&ref=lp_1389401031_nr_p_36_2"
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
    }
    
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
    
    for img in soup.select(".s-image"):
        print(img["src"])
    

    Prints:

    https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71A9Vo1BatL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71jG5HwkQQS._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71GQUxuSpnS._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/710jkZNub3S._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/716nHhG9SWL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/713asoeJn7S._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71jG5HwkQQS._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/618MEYCaUQL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71A9Vo1BatL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71sxlhYhKWL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/51UUJpcldDL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/81WVehzY2+L._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71nrZHQMZ7L._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/71U2SiHgbiL._AC_UL320_.jpg
    https://m.media-amazon.com/images/I/41QsvcpKaZL._AC_UL320_.jpg
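
The output above contains repeated URLs, and the loop in the question builds each request URL by appending an href to a full page URL. A minimal sketch of two small helpers that may help with both, assuming the collected hrefs are site-relative paths (not verified against the live page). The names `BASE_URL`, `absolute_links`, and `unique_in_order` are hypothetical, and the sample href is made up for illustration:

```python
from urllib.parse import urljoin

# Assumed site root; relative hrefs are resolved against it.
BASE_URL = "https://www.amazon.in"


def absolute_links(hrefs, base=BASE_URL):
    """Resolve each href against the site root (absolute URLs pass through)."""
    return [urljoin(base, href) for href in hrefs]


def unique_in_order(items):
    """Drop repeated items while keeping the first-seen order."""
    return list(dict.fromkeys(items))


# Hypothetical relative href, used only to show how urljoin resolves it.
print(absolute_links(["/s?rh=n%3A1389401031&ref=example"]))

# Three URLs copied from the output above; the first one appears twice.
imgs = [
    "https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg",
    "https://m.media-amazon.com/images/I/71A9Vo1BatL._AC_UL320_.jpg",
    "https://m.media-amazon.com/images/I/71hEzQGO5qL._AC_UL320_.jpg",
]
print(unique_in_order(imgs))
```

With helpers like these, each entry of `links_to_scrap` could be passed through `absolute_links` before calling `requests.get`, and the list of `src` values could be passed through `unique_in_order` before printing.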