I'm trying to write a script that automatically parses product cards. I managed to handle a single page. How can I make the script move on to the next page automatically?
I've seen several answers where Selenium was used, but I couldn't figure it out.
Here's the code:
import random
import string
import csv
import requests
from bs4 import BeautifulSoup
url = "https://game29.ru/products?category=926"
response = requests.get(url)
html = response.text
multi_class = {'class': ['row'], 'style': 'border: 2px solid #898989;border-radius: 7px;padding: 2px;margin-top: -2px;'}
soup = BeautifulSoup(html, "html.parser")
products = soup.find_all("div", {"class":"row"})
identifaer = "".join([random.choice(string.ascii_letters + string.digits) for n in range(32)])
ad_status = "Free"
category = "Игры, приставки и программы"
goods_type = "Игры для приставок"
ad_type = "Продаю своё"
adress = ""
discription = ""
condition = "Новое"
data_begin = "2024-04-03"
data_end = "2024-05-03"
allow_email = "Нет"
contact_phone = ""
contact_method = "По телефону и в сообщениях"
all_products = []
for product in products:
    if product.attrs == multi_class:
        image = "https://www.game29.ru" + product.find("img")["src"]
        if "zaglushka.png" not in image:  # skip the placeholder image
            title = product.find("div", {"class": "cart-item-name"}).text
            price = product.find("div", {"class": "cart-item-price"}).text.strip().replace("руб.", "")
            all_products.append([identifaer, ad_status, category, goods_type, ad_type, adress, title, discription, condition, price, data_begin, data_end, allow_email, contact_phone, image, contact_method])

# names = ["Id", "AdStatus", "Category", "GoodsType", "Adtype", "Adress", "Title", "Discription", "Condition", "Price", "DataBegin", "DataEnd", "AllowEmail", "ContactPhone", "ImageUrls", "ContactMethod"]
with open("data.csv", "a", newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    # writer.writerow(names)
    for product in all_products:
        writer.writerow(product)
I really thought Selenium would help, and I suspect the answer lies there, but I don't understand it yet and don't have much time. I'd be glad if you could help.
If you look at the website, you can see that clicking any page number adds a page= query parameter to the URL. For example, page 2 is reached at https://game29.ru/products?page=2&category=926. So you can put the parsing code into a function that processes a single page, and call that function from a loop that increments the page number. Something like:
def parser(url):
    # add the BeautifulSoup parsing code here
    # return True or False to indicate whether the page contained products
    ...

# The main loop is something like
page_number = 1
while True:
    url = f'https://game29.ru/products?page={page_number}&category=926'
    if not parser(url):
        break  # stop processing
    page_number += 1  # go to the next page
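For completeness, here is a minimal sketch of how the pieces could fit together, reusing the selectors from your code. It deviates slightly from the True/False suggestion: parser() returns the parsed rows, so the caller can both detect the end of the listing and write the data. It assumes that a page past the last one simply returns no product cards; if the site behaves differently (for example, it keeps serving the last page), you would need a different stop condition. For brevity it only collects the title, price, and image; you would append your constant fields (identifaer, ad_status, and so on) the same way as in your current loop.

import csv
import requests
from bs4 import BeautifulSoup

def parser(url):
    # Fetch one listing page and return the parsed rows; an empty list means
    # no product cards were found, i.e. we assume we ran past the last page.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    rows = []
    for product in soup.find_all("div", {"class": "row"}):
        name_div = product.find("div", {"class": "cart-item-name"})
        price_div = product.find("div", {"class": "cart-item-price"})
        img = product.find("img")
        if not (name_div and price_div and img):
            continue  # not a product card, just a layout row
        image = "https://www.game29.ru" + img.get("src", "")
        if "zaglushka.png" in image:
            continue  # placeholder image, skip
        title = name_div.text.strip()
        price = price_div.text.strip().replace("руб.", "")
        rows.append([title, price, image])
    return rows

page_number = 1
with open("data.csv", "a", newline="") as csv_file:
    writer = csv.writer(csv_file, delimiter=",")
    while True:
        url = f"https://game29.ru/products?page={page_number}&category=926"
        rows = parser(url)
        if not rows:
            break  # assumed stop condition: no product cards on this page
        writer.writerows(rows)
        page_number += 1

Note that instead of comparing product.attrs to the full attribute dictionary, the sketch simply skips rows that lack a product name, price, or image, which is a bit more robust if the inline style on the site ever changes.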