python html scrapy aliexpress

Cannot scrape AliExpress HTML element


I would like to scrape an arbitrary offer from AliExpress, and I'm trying to use Scrapy and Selenium. The issue I face is that when I right-click an element in Chrome and choose Inspect, I see the real HTML, but when I right-click and choose View Source, I see something different: a mess of HTML, CSS, and JS.

As far as I understand, the content is loaded asynchronously, is that right? I guess this is why I can't find the element I'm looking for in the page source.

I tried using Selenium to load the page first and then get the content I want, but that failed. I'm trying to scroll down to the reviews section and get its content.

Is this some advanced anti-bot solution that they have, or is my approach wrong?

The code that I currently have:

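import logging
import time

import scrapy
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

logging.getLogger('scrapy').setLevel(logging.WARNING)


class MySpider(scrapy.Spider):
    name = 'myspider'

    start_urls = ['https://pl.aliexpress.com/item/32998115046.html']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before creating the browser.
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)

        scroll_retries = 20
        data = ''
        while scroll_retries > 0:
            try:
                # find_element_by_class_name was removed in Selenium 4.
                element = self.driver.find_element(By.CLASS_NAME, 'feedback-list-wrap')
                data = element.text
                scroll_retries = 0
            except NoSuchElementException:
                self.scroll_down(500)
                scroll_retries -= 1

        print("----------")
        print(data)
        print("----------")
        self.driver.quit()

    def scroll_down(self, pixels):
        # scrollBy moves relative to the current position; scrollTo(0, 500)
        # would jump to the same offset on every retry.
        self.driver.execute_script("window.scrollBy(0, arguments[0]);", pixels)
        time.sleep(2)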

Solution

  • If you watch the requests in the Network tab of the browser's inspect tool, you will find that the comments are loaded from a separate URL, so you can crawl that URL directly instead.
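
Once you have copied the comments URL from the Network tab, you will probably want to request it more than once, for example with a different page number each time. The following is a minimal sketch of building those URLs with the standard library only. The endpoint `https://example.com/feedback`, the `productId` parameter and the `page` parameter are all hypothetical placeholders, and pagination itself is an assumption; substitute whatever the real request in your Network tab uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(url, **params):
    """Return `url` with the given query parameters added or replaced."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(endpoint, pages, page_param="page"):
    """Build one URL per page, numbered from 1.

    `page_param` is a hypothetical parameter name; check the real request
    in the Network tab to see what the site actually expects.
    """
    return [with_query(endpoint, **{page_param: n}) for n in range(1, pages + 1)]


if __name__ == "__main__":
    # Placeholder endpoint, not the real AliExpress URL.
    for url in page_urls("https://example.com/feedback?productId=123", 3):
        print(url)
```

In a spider, each of these URLs could be passed to `scrapy.Request` with a callback that parses the response, which avoids driving a browser with Selenium for this part.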