I'm making a web scraper using Laravel 3, with resque as the queue system.
Question: Where should I place the scraping logic code?
In the worker/job class?
In a library class that is called statically by the worker/job class?
In a controller function and have the worker/job class trigger the controller function?
I currently have it in the controller function so I can test it by going to its URL. This also allows recurring jobs using Cron, as resque does not support recurring jobs. Whatever the structure, I need to keep this easy way of testing the scraping functions.
Attempt: Here's what I am thinking of. How would you organize your code for this purpose?
Worker Class
class ScraperWorker
{
    public function perform()
    {
        // The queue fills in $this->args before calling perform()
        $url = $this->args['url'];
        Scraper::do_scrape($url);
    }
}
Scraping Class
class Scraper
{
    public static function do_scrape($url)
    {
        // some scraping code
    }
}
Controller Class
For quick testing, and for Cron jobs to hit
class Scraper_Controller extends Base_Controller
{
    public function test_scrape($url)
    {
        Scraper::do_scrape($url);
    }
}
I think you are on the right track. One thing you could change is making the Scraper and its methods NOT static. That would make it no harder to use, but a LOT easier to unit test. This becomes especially important later, when the Scraper becomes more complex and needs configuration.
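To illustrate the non-static suggestion, here is a minimal sketch, not a definitive implementation. It assumes the Scraper receives its page fetcher through the constructor. The `$fetcher` parameter, the `scrape()` method name, and the title-extraction logic are all hypothetical placeholders for your real scraping code.

```php
<?php
// Hypothetical sketch: $fetcher, scrape() and the <title> extraction are
// illustrative assumptions, not part of the original code.

class Scraper
{
    private $fetcher;

    // $fetcher is any callable that takes a URL and returns the page body.
    public function __construct($fetcher)
    {
        $this->fetcher = $fetcher;
    }

    public function scrape($url)
    {
        $html = call_user_func($this->fetcher, $url);

        // Placeholder for the real scraping code: return the <title> text.
        if (preg_match('/<title>(.*?)<\/title>/is', $html, $matches)) {
            return trim($matches[1]);
        }

        return null;
    }
}

class ScraperWorker
{
    // Filled in by the queue before perform() runs, as in the question.
    public $args = array();

    public function perform()
    {
        // In production, fetch pages over the network.
        $scraper = new Scraper('file_get_contents');
        return $scraper->scrape($this->args['url']);
    }
}

// In a unit test, pass a fake fetcher so no network request is made.
$fake_fetcher = function ($url) {
    return '<html><head><title> Hello </title></head></html>';
};

$scraper = new Scraper($fake_fetcher);
$title = $scraper->scrape('http://example.com');
```

The controller can build a Scraper the same way the worker does, so the quick test URL and the Cron entry point keep working. The gain is that a test can substitute the fetcher, which is not possible when `do_scrape` is a static call.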
PS. Check PHP-Spider: an extensible and configurable spider/scraper. It could save you a lot of work. Full disclosure: I wrote it.