I'm trying to scrape recipe titles from a website (allrecipes.com) using Selenium, but I can only extract some of the titles; the others come back as empty strings.
I’m using the following code snippet to retrieve the titles:
page_url = f'https://www.allrecipes.com/search?{keyword}={keyword}&offset={nb}&q={keyword}'
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
driver.get(page_url)
titles = [element.get_attribute('data-tag') for element in driver.find_elements(By.CLASS_NAME, "card__content ")]
recipe_links = [element.get_attribute('href') for element in driver.find_elements(By.CSS_SELECTOR, 'a.comp.mntl-card-list-items.mntl-document-card.mntl-card.card.card--no-image')]
print(titles,recipe_links)
driver.quit()
This successfully extracts all recipe links and the first two titles, but the remaining titles come back as empty strings.
When I tried this code:
titles = driver.find_elements(By.XPATH, "//span[@class='card__title']")
for title in titles:
    print(title.get_attribute('outerHTML'))
This displayed the elements of the page, including the titles correctly:
<span class="card__title">
<span class="card__title-text ">Chicken Makhani (Indian Butter Chicken)</span>
</span>
title:
<span class="card__title-text ">Chicken Makhani (Indian Butter Chicken)</span>
...
In my opinion, the main issue is the OneTrust consent popup, which blocks the rest of the content and should be closed first:
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[id="onetrust-reject-all-handler"]'))).click()
Also consider changing how you select elements and collect the information: instead of building several lists in separate passes, select each card once and extract its child elements, so you get all the information in one go.
Example:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
url = 'https://www.allrecipes.com/search?chicken=chicken&offset=0&q=chicken'
driver.get(url)
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[id="onetrust-reject-all-handler"]'))).click()
data = []
for e in driver.find_elements(By.CSS_SELECTOR, 'a[id^="mntl-card-list-items"]'):
    data.append(
        {
            'title': e.find_element(By.CSS_SELECTOR, '.card__title-text').text,
            'url': e.get_attribute('href')
        }
    )
print(data)
Output (a list of dicts, one per recipe card):
[{'title': 'Chicken Makhani (Indian Butter Chicken)', 'url': 'https://www.allrecipes.com/recipe/45957/chicken-makhani-indian-butter-chicken/'}, {'title': 'Chicken Arroz Caldo (Chicken Rice Porridge)', 'url': 'https://www.allrecipes.com/recipe/212940/chicken-arroz-caldo-chicken-rice-porridge/'}, {'title': 'Garlic Chicken Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/86047/garlic-chicken-fried-chicken/'}, {'title': 'Chicken Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/16573/chicken-fried-chicken/'}, {'title': 'Chicken Enchiladas with Cream of Chicken Soup', 'url': 'https://www.allrecipes.com/recipe/22737/chicken-enchiladas-v/'}, {'title': 'Makhani Chicken (Indian Butter Chicken)', 'url': 'https://www.allrecipes.com/recipe/24782/makhani-chicken-indian-butter-chicken/'}, {'title': 'Simple Baked Chicken Breasts', 'url': 'https://www.allrecipes.com/recipe/240208/simple-baked-chicken-breasts/'}, {'title': 'Best Chicken Salad', 'url': 'https://www.allrecipes.com/recipe/8499/basic-chicken-salad/'}, {'title': 'Crispy Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/8805/crispy-fried-chicken/'}, {'title': "Chef John's Nashville Hot Chicken", 'url': 'https://www.allrecipes.com/recipe/254804/chef-johns-nashville-hot-chicken/'}, {'title': 'Chicken Parmesan', 'url': 'https://www.allrecipes.com/recipe/223042/chicken-parmesan/'}, {'title': 'Juicy Roasted Chicken', 'url': 'https://www.allrecipes.com/recipe/83557/juicy-roasted-chicken/'}, {'title': 'Baked Chicken Schnitzel', 'url': 'https://www.allrecipes.com/recipe/244950/baked-chicken-schnitzel/'}, {'title': 'Rotisserie Chicken', 'url': 'https://www.allrecipes.com/recipe/93168/rotisserie-chicken/'}, {'title': 'Quick and Easy Chicken Noodle Soup', 'url': 'https://www.allrecipes.com/recipe/26460/quick-and-easy-chicken-noodle-soup/'}, {'title': "General Tso's Chicken", 'url': 'https://www.allrecipes.com/recipe/91499/general-tsaos-chicken-ii/'}, {'title': 'Baked Teriyaki Chicken', 'url': 
'https://www.allrecipes.com/recipe/9023/baked-teriyaki-chicken/'}, {'title': 'Buffalo Chicken Dip', 'url': 'https://www.allrecipes.com/recipe/68461/buffalo-chicken-dip/'}, {'title': 'Chicken Cordon Bleu', 'url': 'https://www.allrecipes.com/recipe/8495/chicken-cordon-bleu-i/'}, {'title': 'Southern Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/8635/southern-fried-chicken/'}, {'title': "Chef John's Buttermilk Fried Chicken", 'url': 'https://www.allrecipes.com/recipe/220128/chef-johns-buttermilk-fried-chicken/'}, {'title': 'Yummy Honey Chicken Kabobs', 'url': 'https://www.allrecipes.com/recipe/8626/yummy-honey-chicken-kabobs/'}, {'title': 'Broccoli Chicken Casserole', 'url': 'https://www.allrecipes.com/recipe/8965/broccoli-chicken-casserole-i/'}, {'title': 'Best Chicken Marinade', 'url': 'https://www.allrecipes.com/recipe/83793/best-chicken-marinade/'}]
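For other keywords or result offsets, the hard-coded URL can be built from parameters instead. A small sketch using the standard library's `urlencode`, which also escapes spaces and special characters; `build_search_url` is a hypothetical helper name, and it assumes the site keeps accepting the same query parameters as the URL in the question:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.allrecipes.com/search"


def build_search_url(keyword, offset=0):
    """Return the search URL for `keyword`, starting at result number `offset`."""
    # Same three query parameters as the URL in the question, in the same order.
    params = {keyword: keyword, "offset": offset, "q": keyword}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("chicken"))
# https://www.allrecipes.com/search?chicken=chicken&offset=0&q=chicken
print(build_search_url("butter chicken", offset=24))
# https://www.allrecipes.com/search?butter+chicken=butter+chicken&offset=24&q=butter+chicken
```

The output above contains 24 entries, so stepping `offset` by 24 is a plausible way to reach the next page, but I have not confirmed that.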