python, web-scraping, pyppeteer

Pyppeteer (Python) - click a tag and then scrape the page


I am new to Pyppeteer (Python) and I am trying to learn how to (in order):

  1. log into the page
  2. click a tag
  3. take the data from the tag I have clicked

The website is 'https://quotes.toscrape.com/login'

I think I managed to solve the first part, logging in. However, I am having difficulty with the second and third.

I would appreciate it if someone could guide me with Python examples. For example, clicking the tag 'inspirational' under the third quote (by Einstein) and taking all the quotes from the 'inspirational' page.

import asyncio
import nest_asyncio
nest_asyncio.apply()
from pyppeteer import launch

username = 'AAA'
password = 'BBB'
 
async def main():
 #   browser = await launch(headless=False, args=['--user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'])
    browser = await launch(headless=False)
    page = await browser.newPage()
    await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36')
    await page.goto('https://quotes.toscrape.com/login')
    
    await page.waitForSelector('#username')
    await page.focus('#username')
    await page.keyboard.type(username)
    
    await page.waitForSelector('#password')
    await page.focus('#password')
    await page.keyboard.type(password)
    
    # Passing bare coroutines to asyncio.wait() is deprecated; use gather()
    # so the click and the resulting navigation are awaited together
    await asyncio.gather(
        page.click('[type="submit"]'),
        page.waitForNavigation())
    
asyncio.get_event_loop().run_until_complete(main())

Solution

  • Add this to main():

     # Click the tag link and wait for the navigation it triggers
     await asyncio.gather(
         page.click('span.tag-item:nth-child(3) > a:nth-child(1)'),
         page.waitForNavigation())
     # JJeval() (alias for querySelectorAllEval()) runs the JS function
     # over every element matching the selector and returns the result
     quotetext = await page.JJeval('.quote .text',
                                   'nodes => nodes.map(n => n.innerText)')
     return quotetext
    
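    For reference, the JS expression just collects the `innerText` of every matched node. Here is a rough Python equivalent of that extraction step, run over a static HTML snippet with only the standard library (the markup below is made up for illustration, not taken from the live site):

    ```python
    from html.parser import HTMLParser

    class QuoteTextParser(HTMLParser):
        """Collects the text of every <span class="text"> element."""
        def __init__(self):
            super().__init__()
            self.in_text = False
            self.quotes = []

        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) tuples
            if tag == "span" and ("class", "text") in attrs:
                self.in_text = True
                self.quotes.append("")

        def handle_endtag(self, tag):
            if tag == "span" and self.in_text:
                self.in_text = False

        def handle_data(self, data):
            if self.in_text:
                self.quotes[-1] += data

    # Hypothetical sample markup mimicking the quote structure
    sample = """
    <div class="quote"><span class="text">Quote one.</span></div>
    <div class="quote"><span class="text">Quote two.</span></div>
    """

    parser = QuoteTextParser()
    parser.feed(sample)
    print(parser.quotes)  # ['Quote one.', 'Quote two.']
    ```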

    I wrote this based on their docs here https://miyakogi.github.io/pyppeteer/reference.html#browser-class

    Of course JS is a much better language for working with webpages, so for more complicated stuff I'd use a JS-based web scraper.
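
    Putting the question's login code and the answer together, a complete sketch might look like this (untested against the live site; the XPath clicks the 'inspirational' tag link by its text rather than by nth-child position, which is less brittle if the quote order ever changes):

    ```python
    import asyncio
    from pyppeteer import launch

    USERNAME = 'AAA'
    PASSWORD = 'BBB'

    async def scrape_tag():
        browser = await launch(headless=False)
        page = await browser.newPage()
        await page.goto('https://quotes.toscrape.com/login')

        # Log in: type() focuses the field and sends keystrokes in one call
        await page.type('#username', USERNAME)
        await page.type('#password', PASSWORD)
        await asyncio.gather(
            page.click('[type="submit"]'),
            page.waitForNavigation())

        # Click the 'inspirational' tag link by its text via XPath
        links = await page.xpath("//a[@class='tag' and text()='inspirational']")
        await asyncio.gather(
            links[0].click(),
            page.waitForNavigation())

        # Collect the text of every quote on the tag page
        quotes = await page.JJeval('.quote .text',
                                   'nodes => nodes.map(n => n.innerText)')
        await browser.close()
        return quotes

    if __name__ == '__main__':
        print(asyncio.get_event_loop().run_until_complete(scrape_tag()))
    ```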