php laravel web-scraping resque

Logic code in PHP/Laravel with Job Queue system


I'm building a web scraper with Laravel 3 and use resque as the queue system.

Question: Where should I place the scraping logic code?

I currently have it in a controller function so I can test it by visiting its URL. This also lets Cron trigger recurring jobs by hitting that URL, since resque does not support recurring jobs. I need to keep this easy way of testing the scraping functions.

Attempt: Here's what I have in mind. How would you organize the code for this?

Worker Class

class ScraperWorker
{
    public function perform()
    {
        $url = $this->args['url'];
        Scraper::do_scrape($url);
    }
}

Scraping Class

class Scraper
{
    public static function do_scrape($url) {
        //some scraping code
    }
}   

Controller Class

For quick testing, and for Cron jobs to hit

class Scraper_Controller extends Base_Controller {

    public function test_scrape($url) {
        Scraper::do_scrape($url);
    }
}

Solution

  • I think you are on the right track. One thing you could change is making the Scraper class and its methods non-static. It would be no harder to use, but much easier to unit test. This becomes especially important later, when the Scraper grows more complex and needs configuration.

    PS. Check PHP-Spider: an extensible and configurable spider/scraper. It could save you a lot of work. Full disclosure: I wrote it.
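
    As a rough illustration of the non-static suggestion, here is a minimal sketch. The Fetcher interface, the FakeFetcher test double, and the title-extraction logic are all hypothetical and only stand in for the real scraping code; the point is that an instance with injected dependencies can be tested without touching the network.

```php
<?php
// Hypothetical interface: anything that can return the body of a URL.
interface Fetcher
{
    public function fetch($url);
}

// Hypothetical test double: serves canned HTML instead of making HTTP requests.
class FakeFetcher implements Fetcher
{
    private $pages;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
    }

    public function fetch($url)
    {
        // Unknown URLs yield an empty body.
        return isset($this->pages[$url]) ? $this->pages[$url] : '';
    }
}

// Non-static scraper: its collaborator is passed in through the constructor.
class Scraper
{
    private $fetcher;

    public function __construct(Fetcher $fetcher)
    {
        $this->fetcher = $fetcher;
    }

    // Placeholder scraping logic: returns the page title, or null if none is found.
    public function do_scrape($url)
    {
        $html = $this->fetcher->fetch($url);
        if (preg_match('/<title>(.*?)<\/title>/is', $html, $m)) {
            return trim($m[1]);
        }
        return null;
    }
}

// In a unit test, no network access is needed.
$scraper = new Scraper(new FakeFetcher(array(
    'http://example.com/' => '<html><head><title> Example </title></head></html>',
)));
$title = $scraper->do_scrape('http://example.com/');
```

    With this shape, the worker's perform() and the controller's test_scrape() would each build a Scraper with a real fetcher (not shown here) and call do_scrape($url) on it, so both entry points keep sharing the same logic.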