How to skip a line when using BeautifulSoup find_all?


This is my code. It finds all the car links, which are relative (no "https://" and no domain name). However, one of them is a full link starting with "https://...". How can I write code that skips this one result, so that any link containing "https://" (or some other text) is ignored?

import requests
from bs4 import BeautifulSoup

for page_number in range(1, 10):
    url = f"xyz{page_number}"
    req = requests.get(url)
    src = req.text
    soup = BeautifulSoup(src, "lxml")
    get_car_links = soup.find_all(class_="info-container")
    for i in get_car_links:
        car_links = i.find("a", class_="title")
        car_datas = car_links.get("href")
        print(car_datas)

Solution

  • You can add an if condition inside the loop that checks each href and skips the ones containing "http".

    from bs4 import BeautifulSoup
    import requests

    # range() already advances page_number, so no manual increment is needed.
    for page_number in range(1, 10):
        url = f"xyz{page_number}"
        req = requests.get(url)
        soup = BeautifulSoup(req.text, "lxml")

        get_car_links = soup.find_all(class_="info-container")
        for i in get_car_links:
            car_links = i.find("a", class_="title")
            # Skip containers that have no matching <a> tag.
            if car_links is None:
                continue
            car_datas = car_links.get("href")
            # Skip missing hrefs and full links such as "https://...".
            if not car_datas or "http" in car_datas:
                continue
            print(car_datas)
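
The substring check above would also skip a relative link that merely contains "http" somewhere in its path. A stricter test, sketched here with the standard library's urllib.parse, treats a link as relative only when it has neither a scheme nor a host. The helper name is_relative_link and the sample hrefs are hypothetical and do not come from the original page.

```python
from urllib.parse import urlparse


def is_relative_link(href):
    """Return True if href has no scheme and no host (hypothetical helper)."""
    if not href:
        # None or an empty string: nothing usable to keep.
        return False
    parts = urlparse(href)
    # "https://site/x" has a scheme and a host; "//site/x" has only a host.
    return not parts.scheme and not parts.netloc


# Made-up hrefs for illustration only.
sample_hrefs = [
    "/cars/audi-a4",
    "https://example.com/cars/1",
    "/cars/http-tuning-kit",
    None,
]
relative_only = [h for h in sample_hrefs if is_relative_link(h)]
print(relative_only)
```

Inside the scraping loop, the condition `"http" in car_datas` could then be replaced by `not is_relative_link(car_datas)`, assuming only relative links are wanted.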