
Automatically parsing multiple pages in Python


I'm trying to write a script that automatically parses product cards. It works for a single page. How can I make the script move on to the next page automatically?

I've seen several answers that use Selenium, but I couldn't figure them out.

Here's the code:

import random
import string
import csv 
import requests
from bs4 import BeautifulSoup

url = "https://game29.ru/products?category=926"

response = requests.get(url)

html = response.text

multi_class = {'class': ['row'], 'style': 'border: 2px solid #898989;border-radius: 7px;padding: 2px;margin-top: -2px;'}

soup = BeautifulSoup(html, "html.parser")

products = soup.find_all("div", {"class":"row"})

identifaer = "".join([random.choice(string.ascii_letters + string.digits) for n in range(32)])
ad_status = "Free"
category = "Игры, приставки и программы"
goods_type = "Игры для приставок"
ad_type = "Продаю своё"
adress = ""
discription = ""
condition = "Новое"
data_begin = "2024-04-03"
data_end = "2024-05-03" 
allow_email = "Нет"
contact_phone = ""
contact_method = "По телефону и в сообщениях"

all_products = []

for product in products:
    if product.attrs == multi_class:
        # Generate a fresh random 32-character ID for every product
        identifaer = "".join(random.choice(string.ascii_letters + string.digits) for _ in range(32))
        src = product.find("img")["src"]
        image = "https://www.game29.ru" + src
        # Skip products that only have the placeholder image
        if not src.endswith("zaglushka.png"):
            title = product.find("div", {"class":"cart-item-name"}).text
            price = product.find("div", {"class": "cart-item-price"}).text.strip().replace("руб.", "")
            all_products.append([identifaer, ad_status, category, goods_type, ad_type, adress, title, discription, condition, price, data_begin, data_end, allow_email, contact_phone, image, contact_method])

# names = ["Id", "AdStatus", "Category", "GoodsType", "Adtype", "Adress", "Title", "Discription", "Condition", "Price", "DataBegin", "DataEnd", "AllowEmail", "ContactPhone","ImageUrls", "ContactMethod"]

with open("data.csv", "a", newline='', encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    # writer.writerow(names)
    
    for product in all_products:
        writer.writerow(product)

I really thought Selenium would help, and the answer may well be there, but I don't understand it yet and I'm short on time. I would be glad if you could help me.


Solution

  • If you look at the website, you can see that clicking any page number adds a page= query parameter to the URL. For example, page 2 is at https://game29.ru/products?page=2&category=926. So you don't need Selenium here: put the parsing code in a function that processes one page, then call it from a loop that increments the page number. Something like:

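    import requests
    from bs4 import BeautifulSoup

    def parser(url):
        # Download the page and parse it with BeautifulSoup
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        products = soup.find_all("div", {"class": "cart-item-name"})
        # ... extract the fields from each product here ...
        # Return True if the page was processed, False if it had no products
        # (this assumes a page number past the last one shows no products)
        return bool(products)

    # The main loop is something like
    page_number = 1
    while True:
        url = f'https://game29.ru/products?page={page_number}&category=926'
        if not parser(url):
            break  # stop processing
        page_number += 1  # go to the next page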
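
The page loop above can also be written so that it collects the rows from every page and cannot run forever. The following is only a sketch under assumptions: `build_page_url`, `crawl_pages` and the `max_pages` limit are hypothetical names, not part of the original script, and `parse_page` stands for your own function that downloads one URL with requests and BeautifulSoup and returns a list of rows. It stops when a page returns no rows, when a page is identical to the previous one (in case the site simply repeats the last page), or when the safety limit is reached.

```python
from urllib.parse import urlencode

BASE_URL = "https://game29.ru/products"


def build_page_url(page_number, category=926):
    """Build the listing URL for one page, e.g. ...?page=2&category=926."""
    query = urlencode({"page": page_number, "category": category})
    return f"{BASE_URL}?{query}"


def crawl_pages(parse_page, max_pages=100):
    """Call parse_page(url) for pages 1, 2, 3, ... and collect all rows.

    parse_page is assumed to return a list of rows for the given URL.
    """
    all_rows = []
    previous_rows = None
    for page_number in range(1, max_pages + 1):
        rows = parse_page(build_page_url(page_number))
        # Stop on an empty page, or when the site repeats the previous page
        if not rows or rows == previous_rows:
            break
        all_rows.extend(rows)
        previous_rows = rows
    return all_rows
```

To use it, wrap the existing parsing code in a function that takes a URL and returns the list of product rows, pass that function to `crawl_pages`, and write the returned rows to data.csv once at the end.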