php symfony web-scraping web-crawler symfony-panther

In Symfony/Panther, waitFor() throws an exception if it times out while scraping - I need it to continue if the element is not found


I have a database of clinics, and a URL for each clinic. All clinic pages share the same HTML/CSS, with different content to scrape.

However, some clinics have no content on their page, and this causes trouble for me.

I have:

$crawler = $this->client->request('GET', $clinic->url);
$this->client->waitFor('.facility');

If .facility is not present, waitFor() will throw an exception when it times out. I need the script to continue in that case rather than terminate.

I cannot count the facility items and check that way, since they are loaded via AJAX and are not present at the start of the page load.

What I have tried and researched:

Is it possible for symfony/panther to wait for some elements n times?

HowTo Wait - PHPWebDriver


Solution

  • You could just catch the exception, like this:

    try {
        $this->client->waitFor('.facility');
    } catch (TimeoutException $e) {
        // Log here that the wait was skipped because of a timeout;
        // PHP will continue from this point.
    }
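
    If you do this for many clinic URLs, it may be cleaner to wrap the pattern in a small helper. This is a minimal sketch, not part of Panther's API: the method name `waitForOptional` and the 10-second timeout are illustrative choices.

    ```php
    use Facebook\WebDriver\Exception\TimeoutException;
    use Symfony\Component\Panther\DomCrawler\Crawler;

    /**
     * Waits for $locator and returns the crawler, or null if the
     * element never appears, so the caller can simply continue.
     */
    private function waitForOptional(string $locator, int $timeout = 10): ?Crawler
    {
        try {
            return $this->client->waitFor($locator, $timeout);
        } catch (TimeoutException $e) {
            // Element did not appear in time; treat the page as empty.
            return null;
        }
    }
    ```

    The scraping loop then stays linear: `if ($this->waitForOptional('.facility') === null) { continue; }`.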
    

    At the top of your class you may need to add the following import (that is the exception class the code above appears to be using):

    use Facebook\WebDriver\Exception\TimeoutException;
    

    Also note that the function has other parameters that could be useful:

    /**
     * @param string $locator The path to an element to be waited for. Can be a CSS selector or Xpath expression.
     *
     * @throws NoSuchElementException
     * @throws TimeoutException
     */
    public function waitFor(string $locator, int $timeoutInSecond = 30, int $intervalInMillisecond = 250): PantherCrawler
    {
        // ...
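
    For example, lowering the timeout and polling interval keeps empty clinic pages from stalling the crawl for the full default of 30 seconds. The 5-second timeout and 100 ms interval here are illustrative values, not recommendations:

    ```php
    use Facebook\WebDriver\Exception\TimeoutException;

    try {
        // Give the AJAX content at most 5 seconds, checking every 100 ms.
        $this->client->waitFor('.facility', 5, 100);
    } catch (TimeoutException $e) {
        // No facilities rendered on this clinic page; move on to the next URL.
    }
    ```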