python selenium parsing

How can I scrape more Google "People Also Ask" questions and answers with Selenium and Python than Google shows by default?


I found a good solution, but it only handles the questions and answers that Google shows by default, and I need more than that.

I am a novice Python developer. How can I get more questions and answers? Do I have to click first to reveal the required number and then parse them?


Solution

  • The following code parses the questions currently on screen, then asks whether you want to parse more. If you enter y, it clicks the last question so that more are loaded into the page. The questions are stored in the list questions and the answers in the list answers.

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    
    your_path = '...'  # path to the chromedriver executable
    driver = webdriver.Chrome(service=Service(your_path))
    
    driver.get('https://www.google.com/search?q=How%20to%20make%20bakery%3F&source=hp&ei=j0aZYYjRAvja2roPrcWcyAU&iflsig=ALs-wAMAAAAAYZlUn4NMUPjfIpQmrXSmjIDnaWjJXWIJ&ved=0ahUKEwjI1JDn0Kf0AhV4rVYBHa0iB1kQ4dUDCAc&uact=5&oq=How%20to%20make%20bakery%3F&gs_lcp=Cgdnd3Mtd2l6EAMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBNQAFgAYJMDaABwAHgAgAF-iAF-kgEDMC4xmAEAoAECoAEB&sclient=gws-wiz')
    
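    questions, answers = [], []
    while True:
        # Re-query on every pass: clicking a question makes Google append more of them
        elements = driver.find_elements(By.CSS_SELECTOR, "div[id*='RELATED_QUESTION']")
        if not elements:
            # Avoids a NameError further down when the page has no such block
            print('No questions found on the page')
            break
        for question in elements[len(questions):]:  # skip already parsed questions
            questions.append(question.text)
            # innerText is used here because .text only returns visible text
            txt = ''
            for answer in question.find_elements(By.CSS_SELECTOR, "div[id*='WEB_ANSWERS_RESULT']"):
                txt += answer.get_attribute('innerText')
            answers.append(txt)
        inp = input(f'{len(questions)} questions parsed, continue? (y/n) ')
        if inp == 'y':
            elements[-1].click()  # clicking the last question loads more below it
            time.sleep(2)  # give the new questions time to load
        else:
            break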
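
To fetch a fixed number of pairs without answering a prompt each time, the same parse-then-click loop can be driven by a target count. The following is only a sketch: `collect_pairs`, `read_visible`, `expand_more`, `target`, `max_rounds`, and `pause` are hypothetical names, not part of Selenium, and the browser-specific work is passed in as two callables so the loop itself can be tried without a browser.

```python
import time
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (question text, answer text)


def collect_pairs(
    read_visible: Callable[[], List[Pair]],
    expand_more: Callable[[], None],
    target: int,
    max_rounds: int = 20,
    pause: float = 2.0,
) -> List[Pair]:
    """Collect up to `target` pairs (hypothetical helper, not a Selenium API).

    read_visible() must return every pair currently on the page, in page order.
    expand_more() must trigger loading of more pairs, e.g. click the last question.
    """
    pairs: List[Pair] = []
    for _ in range(max_rounds):
        visible = read_visible()
        new = visible[len(pairs):]  # skip pairs collected in earlier rounds
        pairs.extend(new)
        if len(pairs) >= target:
            break  # enough pairs collected
        if not new:
            break  # nothing new appeared, so further clicks are unlikely to help
        expand_more()
        time.sleep(pause)  # assumed fixed delay while new questions load
    return pairs[:target]
```

With Selenium, `read_visible` could wrap the `find_elements` loop from the code above and return `(question, answer)` tuples, and `expand_more` could click the last `RELATED_QUESTION` element. The `max_rounds` cap and the stop-when-nothing-new check are there so the loop ends even if Google stops loading further questions.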