
Extracting titles using Selenium


I'm trying to scrape recipe titles from a website (allrecipes.com) using Selenium, but I can only extract some of the titles; the others come back as empty strings.

I’m using the following code snippet to retrieve the titles:

page_url = f'https://www.allrecipes.com/search?{keyword}={keyword}&offset={nb}&q={keyword}'

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

driver.get(page_url)

titles =  [element.get_attribute('data-tag') for element in driver.find_elements(By.CLASS_NAME, "card__content ")]
recipe_links = [element.get_attribute('href') for element in driver.find_elements(By.CSS_SELECTOR, 'a.comp.mntl-card-list-items.mntl-document-card.mntl-card.card.card--no-image')]

print(titles,recipe_links)
driver.quit()

This successfully extracts all recipe links and the first two titles, but the remaining titles come back as empty strings.

When I tried this code instead:

titles = driver.find_elements(By.XPATH, "//span[@class='card__title']")
for title in titles:
    print(title.get_attribute('outerHTML'))

it printed the HTML of the matched elements, with the titles present:

<span class="card__title">
    <span class="card__title-text ">Chicken Makhani (Indian Butter Chicken)</span>
</span>
title:  
<span class="card__title-text ">Chicken Makhani (Indian Butter Chicken)</span>
...
  1. Why am I getting empty strings for certain titles?
  2. How can I reliably retrieve all titles from the first page?

Solution

  • The main issue, in my opinion, is the OneTrust consent popup, which blocks the rest of the content and should be closed before you read the cards.

    WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[id="onetrust-reject-all-handler"]'))).click()
    

    Also consider changing how you select elements and collect the data: instead of building several lists in separate passes, select each card once and extract its child elements (title and link) in a single loop.

    Example:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()
    
    url = 'https://www.allrecipes.com/search?chicken=chicken&offset=0&q=chicken'
    driver.get(url)
    
    WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[id="onetrust-reject-all-handler"]'))).click()
    
    data = []
    
    for e in driver.find_elements(By.CSS_SELECTOR,'a[id^="mntl-card-list-items"]'):
        data.append(
            {
                'title' : e.find_element(By.CSS_SELECTOR,'.card__title-text').text,
                'url' : e.get_attribute('href')
            }
        )
    
    print(data)
    

    Output (a list of dicts, one per card):

    [{'title': 'Chicken Makhani (Indian Butter Chicken)', 'url': 'https://www.allrecipes.com/recipe/45957/chicken-makhani-indian-butter-chicken/'},
     {'title': 'Chicken Arroz Caldo (Chicken Rice Porridge)', 'url': 'https://www.allrecipes.com/recipe/212940/chicken-arroz-caldo-chicken-rice-porridge/'},
     {'title': 'Garlic Chicken Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/86047/garlic-chicken-fried-chicken/'},
     {'title': 'Chicken Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/16573/chicken-fried-chicken/'},
     {'title': 'Chicken Enchiladas with Cream of Chicken Soup', 'url': 'https://www.allrecipes.com/recipe/22737/chicken-enchiladas-v/'},
     {'title': 'Makhani Chicken (Indian Butter Chicken)', 'url': 'https://www.allrecipes.com/recipe/24782/makhani-chicken-indian-butter-chicken/'},
     {'title': 'Simple Baked Chicken Breasts', 'url': 'https://www.allrecipes.com/recipe/240208/simple-baked-chicken-breasts/'},
     {'title': 'Best Chicken Salad', 'url': 'https://www.allrecipes.com/recipe/8499/basic-chicken-salad/'},
     {'title': 'Crispy Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/8805/crispy-fried-chicken/'},
     {'title': "Chef John's Nashville Hot Chicken", 'url': 'https://www.allrecipes.com/recipe/254804/chef-johns-nashville-hot-chicken/'},
     {'title': 'Chicken Parmesan', 'url': 'https://www.allrecipes.com/recipe/223042/chicken-parmesan/'},
     {'title': 'Juicy Roasted Chicken', 'url': 'https://www.allrecipes.com/recipe/83557/juicy-roasted-chicken/'},
     {'title': 'Baked Chicken Schnitzel', 'url': 'https://www.allrecipes.com/recipe/244950/baked-chicken-schnitzel/'},
     {'title': 'Rotisserie Chicken', 'url': 'https://www.allrecipes.com/recipe/93168/rotisserie-chicken/'},
     {'title': 'Quick and Easy Chicken Noodle Soup', 'url': 'https://www.allrecipes.com/recipe/26460/quick-and-easy-chicken-noodle-soup/'},
     {'title': "General Tso's Chicken", 'url': 'https://www.allrecipes.com/recipe/91499/general-tsaos-chicken-ii/'},
     {'title': 'Baked Teriyaki Chicken', 'url': 'https://www.allrecipes.com/recipe/9023/baked-teriyaki-chicken/'},
     {'title': 'Buffalo Chicken Dip', 'url': 'https://www.allrecipes.com/recipe/68461/buffalo-chicken-dip/'},
     {'title': 'Chicken Cordon Bleu', 'url': 'https://www.allrecipes.com/recipe/8495/chicken-cordon-bleu-i/'},
     {'title': 'Southern Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/8635/southern-fried-chicken/'},
     {'title': "Chef John's Buttermilk Fried Chicken", 'url': 'https://www.allrecipes.com/recipe/220128/chef-johns-buttermilk-fried-chicken/'},
     {'title': 'Yummy Honey Chicken Kabobs', 'url': 'https://www.allrecipes.com/recipe/8626/yummy-honey-chicken-kabobs/'},
     {'title': 'Broccoli Chicken Casserole', 'url': 'https://www.allrecipes.com/recipe/8965/broccoli-chicken-casserole-i/'},
     {'title': 'Best Chicken Marinade', 'url': 'https://www.allrecipes.com/recipe/83793/best-chicken-marinade/'}]
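
If you want the example above to be a bit more forgiving, here is a minimal sketch of a defensive variant. It rests on a few assumptions: the consent banner may not appear on every run, some cards may lack a title node, and some titles may not be rendered visibly at the moment they are read. The last point matters because Selenium's `.text` only returns visible text, while the `textContent` property does not depend on visibility. Whether any of this applies to your empty strings is not confirmed for this page. The helper names `card_to_record` and `scrape` are made up for illustration; the selectors are the ones used above.

```python
# Value of selenium's By.CSS_SELECTOR, spelled out as a plain string so that
# card_to_record() does not need selenium to be imported.
TITLE_LOCATOR = ("css selector", ".card__title-text")


def card_to_record(card, title_locator=TITLE_LOCATOR):
    """Build {'title', 'url'} from one card element (hypothetical helper).

    `card` is expected to behave like a Selenium WebElement.
    """
    # find_elements returns an empty list instead of raising when the card
    # has no title node, so a single odd card does not abort the whole loop.
    nodes = card.find_elements(*title_locator)
    # textContent is read from the DOM and does not depend on visibility.
    title = (nodes[0].get_attribute("textContent") or "").strip() if nodes else ""
    return {"title": title, "url": card.get_attribute("href")}


def scrape(url, timeout=30):
    """Open `url`, dismiss the consent banner if present, return all records."""
    # Imported here so the helper above can be tried without a browser.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        try:
            WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable(
                    (By.CSS_SELECTOR, 'button[id="onetrust-reject-all-handler"]')
                )
            ).click()
        except TimeoutException:
            pass  # assumption: no banner was shown this time, so carry on
        cards = driver.find_elements(By.CSS_SELECTOR, 'a[id^="mntl-card-list-items"]')
        return [card_to_record(card) for card in cards]
    finally:
        driver.quit()  # always close the browser, even if scraping fails
```

Calling `scrape('https://www.allrecipes.com/search?chicken=chicken&offset=0&q=chicken')` should give the same kind of list of dicts as above, assuming the page structure has not changed. A card without a title node yields an empty title instead of raising `NoSuchElementException`.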