javascript node.js puppeteer telegraf

Node.js Telegraf reply isn't working inside a function


I tried to make a scraping Telegram bot using Node and Telegraf. I tried replying to the message with the function's result, but the bot replied with [object Promise] even though I used return in the function. I also tried replying inside the if block, and the bot didn't reply at all. Everything I have tried is commented out in the code so you can see it.

Here is the code:

import puppeteer from "puppeteer";
import { Telegraf } from "telegraf";

let input = "zesht"
// let textReply = "martike"
const bot = new Telegraf("bot token here")

const getData = async function() {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(`https://fa.wikipedia.org/w/index.php?search=${input}&title=%D9%88%DB%8C%DA%98%D9%87%3A%D8%AC%D8%B3%D8%AA%D8%AC%D9%88&ns0=1` , { timeout : 0});

    
    const searchResult = await page.$(".mw-search-result-heading > a[data-serp-pos='0']" , {timeout : 0})
    // const maghaleLink = await searchResult.evaluate(el => el.href)
    
    if(searchResult == null){
       const maghale = await page.$('p');
       const textReply = await maghale.evaluate( el => el.textContent);
        // console.log(matn)
        // return matn
        // return await maghale.evaluate(el => el.textContent)
        bot.on("message" , async ctx=> { input = ctx.message.text , await ctx.reply(textReply)});
    }
    else{
        const maghaleLink = await searchResult.evaluate(el => el.href);
        await page.goto(maghaleLink);
        const maghale1 = await page.$('p');
        const textReply = await maghale1.evaluate( el => el.textContent);
        //  await maghale1.evaluate(el => el.textContent)
        // console.log(matn1)
        bot.on("message" , async ctx => { input = ctx.message.text ,await ctx.reply(textReply) }) ;
    }


    browser.close()
}

// const bot = new Telegraf("bot token here")

bot.start(ctx => ctx.reply("salam mashti"))
// bot.on("message" , async ctx => { await getData()} )
bot.on("message" , getData())
// bot.on("message" , async ctx=> { input = ctx.message.text; getData() ; await ctx.reply(textReply) })

bot.launch()

Expected

The bot replies with text scraped from Wikipedia.

What happened

The bot replies with nothing, replies with the text [object Promise], or gives an error saying: unhandled error while processing message


Solution

  • This is a good example of the power of breaking the problem into small steps. If you isolate your Telegraf and Puppeteer code, each sub-task can be solved on its own more easily, then combined to complete the main specification.

    Start by writing a Telegraf-free scraping function that accepts simple inputs and outputs. You can use this as if it were a black box library, isolated from your Telegraf client. This approach is simpler to code, test and maintain versus trying to do everything in one go and winding up with confusing bugs and tightly coupled logic.

    I can't read Persian ("fa"), so I used the English ("en") Wikipedia. You'll need to adapt this code a bit to your use case.

    import puppeteer from "puppeteer"; // 22.7.1
    
    const getData = async input => {
      const browser = await puppeteer.launch();
    
      try {
        const [page] = await browser.pages();
        await page.setJavaScriptEnabled(false);
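        // encode the search term so spaces and non-ASCII characters are safe in the URL
        const url = `https://en.wikipedia.org/w/index.php?search=${encodeURIComponent(input)}`;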
        await page.goto(url, {waitUntil: "domcontentloaded"});
        const notFound = await page.$(".mw-search-nonefound");
        // select the link inside the heading so that el.href is defined
        const results = await page.$(".mw-search-result-heading a");
        const disambiguation = await page.$("#disambigbox");
    
        if (notFound) {
          return await notFound.evaluate(el => el.textContent.trim());
        } else if (results) {
          const href = await results.evaluate(el => el.href);
          await page.goto(href, {waitUntil: "domcontentloaded"});
        } else if (disambiguation) {
          const href = await page.$eval(".mw-body-content a", el => el.href);
          await page.goto(href, {waitUntil: "domcontentloaded"});
        }
    
        return await page.$eval("p", el => el.textContent.trim());
      } finally {
        await browser.close();
      }
    };
    
    // test it:
    getData("foobar").then(data => console.log(data));
    getData("fbar").then(data => console.log(data));
    getData("asdjhasjkdhaskhdash").then(data => console.log(data));
    

    Now you have an async function that takes a string search term input and resolves to a string output, without Telegraf involved. Feel free to adjust this as necessary--I don't presume I've handled all the edge cases. I'd also probably rename it to searchWikipedia or something a bit more precise.

    Note that I'm avoiding timeout: 0, which can make the script hang indefinitely in a way that is difficult to debug. Never use timeout: 0. Provide a sensible value, catch the error, then log and handle it so you can figure out what's going wrong. In any case, page.$ runs instantly and doesn't accept a {timeout} option.
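
    As a rough sketch of that advice, the helper below navigates with a finite timeout and logs the failure instead of waiting forever. The name gotoWithTimeout and the 15-second default are my own assumptions, and page is expected to be a Puppeteer page supplied by the caller.

```javascript
// Hypothetical helper: navigate with a finite timeout and report failures.
// `page` is expected to be a Puppeteer Page; 15000 ms is an arbitrary default.
const gotoWithTimeout = async (page, url, timeoutMs = 15_000) => {
  try {
    await page.goto(url, {waitUntil: "domcontentloaded", timeout: timeoutMs});
    return true;
  } catch (err) {
    // Log the failure so a slow or unreachable page is visible,
    // then let the caller decide how to respond.
    console.error(`Navigation to ${url} failed:`, err.message);
    return false;
  }
};
```

    The caller can then return a friendly message when the result is false rather than leaving the bot silent.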

    I've also left out resource blocking (aborting requests for images, stylesheets and so on), which would speed things up further and lighten the load on the server.
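
    For reference, resource blocking is usually done with Puppeteer's request interception. This is only a sketch: the list of blocked types is my assumption, and blockResources and shouldBlock are hypothetical names.

```javascript
// Resource types that a text-only scrape does not need
// (an assumption; adjust the list for your pages).
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

const shouldBlock = resourceType => BLOCKED_TYPES.has(resourceType);

// `page` is expected to be a Puppeteer Page. Call this before page.goto().
const blockResources = async page => {
  await page.setRequestInterception(true);
  page.on("request", request => {
    if (shouldBlock(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
};
```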

    Taking a step back, you can probably do this more easily and efficiently with fetch rather than Puppeteer, since the data is in the static HTML, or even use the Wikipedia API, but I'll leave that as an exercise.
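
    If you want a starting point for the API route, here is a minimal sketch that uses the MediaWiki search endpoint followed by the REST summary endpoint. The function names are hypothetical, I haven't tuned it for edge cases like disambiguation pages, and you would pass "fa" as the language for Persian Wikipedia. It assumes Node 18+ for the global fetch.

```javascript
// Build a MediaWiki search URL that returns at most one result as JSON.
const buildSearchUrl = (input, lang = "en") => {
  const params = new URLSearchParams({
    action: "query",
    list: "search",
    srsearch: input,
    srlimit: "1",
    format: "json",
  });
  return `https://${lang}.wikipedia.org/w/api.php?${params}`;
};

// Build the REST summary URL for a page title.
const buildSummaryUrl = (title, lang = "en") =>
  `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title)}`;

// Hypothetical Puppeteer-free replacement for getData. `fetchImpl` defaults
// to the global fetch and can be swapped for a stub in tests.
const searchWikipedia = async (input, lang = "en", fetchImpl = fetch) => {
  const searchRes = await fetchImpl(buildSearchUrl(input, lang));
  if (!searchRes.ok) throw new Error(`search failed: ${searchRes.status}`);
  const hit = (await searchRes.json()).query?.search?.[0];
  if (!hit) return `No results found for "${input}"`;

  const summaryRes = await fetchImpl(buildSummaryUrl(hit.title, lang));
  if (!summaryRes.ok) throw new Error(`summary failed: ${summaryRes.status}`);
  return (await summaryRes.json()).extract;
};
```

    Because it has the same shape as getData (string in, promise of a string out), the Telegraf side doesn't need to change if you swap it in.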

    Now that the scraping function is tested and ready, you can forget about its inner workings and add Telegraf code to invoke it as a client:

    import puppeteer from "puppeteer";
    import {Telegraf} from "telegraf";
    
    const getData = async input => {
      // same as above -- you can even move this to a separate module and import it
    };
    
    const handleMessage = async ctx => {
      try {
        const replyText = await getData(ctx.message.text);
        await ctx.reply(replyText);
      } catch (err) {
        console.error(err);
        await ctx.reply("something went wrong");
      }
    };
    
    const bot = new Telegraf("bot token here");
    bot.start(ctx => ctx.reply("salam mashti"));
    bot.on("message", handleMessage);
    bot.launch();
    

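    Notice how I'm not calling the message handler function. I'm passing it as a value to be called by Telegraf. In your code, bot.on("message", getData()) calls getData immediately and passes the resulting promise instead of a function. Also, I'm not re-registering the on("message", ...) event handler after each message. The event handler should persist, so you only need to register it once.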
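
    To make the difference between calling and passing concrete, here is a small Telegraf-free illustration. The on function and handlers array are a made-up stand-in for an event registry, not Telegraf's actual implementation.

```javascript
const getData = async () => "scraped text";

// Calling an async function returns a Promise, not the resolved string.
const called = getData();
console.log(String(called)); // "[object Promise]"

// Minimal stand-in for an event registry (not Telegraf itself).
const handlers = [];
const on = (event, handler) => {
  if (typeof handler !== "function") {
    throw new TypeError(`handler for "${event}" must be a function`);
  }
  handlers.push(handler);
};

on("message", getData); // OK: passes the function itself

try {
  on("message", getData()); // passes a Promise, which is rejected here
} catch (err) {
  console.log(err.message);
}
```

    This is also where the [object Promise] reply comes from: replying with the un-awaited result of an async function sends the stringified promise.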

    I don't have Telegraf set up, so the Telegraf code above is a bit of a guess and may require a few tweaks. But due to the modular design, it should be easy to debug and verify without worrying about Puppeteer, which is now totally out of the picture.
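
    One way to check the handler without Telegraf or Puppeteer is to inject the scraper and pass in a fake context. This is a sketch under my own assumptions: makeHandleMessage and makeFakeCtx are hypothetical helpers, and the fake context only mimics the two properties the handler uses.

```javascript
// Hypothetical factory: builds a message handler around any scraper function.
const makeHandleMessage = getData => async ctx => {
  try {
    await ctx.reply(await getData(ctx.message.text));
  } catch (err) {
    console.error(err);
    await ctx.reply("something went wrong");
  }
};

// Fake context that records replies instead of talking to Telegram.
const makeFakeCtx = text => {
  const replies = [];
  return {
    message: {text},
    replies,
    reply: async msg => {
      replies.push(msg);
    },
  };
};

// Exercise the success path and the failure path with stub scrapers.
const demo = async () => {
  const ok = makeFakeCtx("foobar");
  await makeHandleMessage(async input => `result for ${input}`)(ok);

  const failing = makeFakeCtx("foobar");
  await makeHandleMessage(async () => {
    throw new Error("scrape failed");
  })(failing);

  return [ok.replies, failing.replies];
};
```

    In the real bot you would register makeHandleMessage(getData) with bot.on("message", ...) in place of handleMessage.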

    Disclosure: I'm the author of the linked blog posts.