web-scraping html-table puppeteer element href

Use Puppeteer to scrape paragraph inner text and image title from a table td


table

I have a table with this structure, and I want to scrape the image title and the paragraph text from the td with class 'description'. I have tried several ways with no luck. Please help me with this, I am really stuck here.

I think my question is clear. So far I have:

    let descs = await page.evaluate(() => {
        let desc = Array.from(document.querySelectorAll('tr.even td.description p'));
        return desc.filter((p) => p.innerText !== "").map(p => p.innerText.replace(/  |\r\n|\n|\r/gm, ""));
    });

With this code I am getting the paragraph text, but how can I also get the img title?


Solution

  • Given the provided HTML structure, I suggest selecting the td elements and calling $$eval with a mapping over them.

    Here, texts is produced by the same function you defined for p, and title is read by calling querySelector with the img[src] selector on each td element.

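    // Wait until at least one matching cell is present in the DOM.
    await page.waitForSelector('tr.even td.description');
    const data = await page.$$eval('tr.even td.description', tds =>
      tds.map(td => ({
        // Non-empty paragraph texts, with double spaces and line breaks removed.
        texts: Array.from(td.querySelectorAll('p'))
          .filter(p => p.innerText !== "")
          .map(p => p.innerText.replace(/  |\r\n|\n|\r/gm, "")),
        // Title of the first img[src] in the cell; undefined when the cell has no such img.
        title: td.querySelector('img[src]')?.getAttribute('title'),
      }))
    );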
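
The filter and replace steps applied to each paragraph can be tried in plain Node, without a browser. The following is only a minimal sketch: the helper names cleanText and cleanParagraphs and the sample strings are made up for illustration and are not part of Puppeteer or of the original page.

```javascript
// Hypothetical helper: applies the same regex as the snippet above.
// It removes every pair of consecutive spaces and every line break (\r\n, \n, \r).
function cleanText(text) {
  return text.replace(/  |\r\n|\n|\r/gm, '');
}

// Hypothetical helper mirroring the filter/map chain:
// drop empty strings first, then clean the remaining ones.
function cleanParagraphs(rawTexts) {
  return rawTexts.filter((t) => t !== '').map(cleanText);
}

// Made-up sample input standing in for the innerText of several <p> elements.
const sample = ['First line\n', '', '  indented text', 'a\r\nb'];
console.log(cleanParagraphs(sample));
// → [ 'First line', 'indented text', 'ab' ]
```

Note that line breaks are removed rather than replaced with a space, so text on adjacent lines is joined directly ('a\r\nb' becomes 'ab'). If that is not what you want, one option is to pass ' ' as the replacement string instead of "".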