I've implemented a web application that triggers Scrapy spiders through the scrapyd API (the web app and scrapyd run on the same server).
My web application stores the job ids returned by scrapyd in a DB, and my spiders store their items in the same DB.
My question is: how can I link, in the DB, the job id issued by scrapyd to the items produced by that crawl?
I could trigger my spider with an extra parameter - say, an ID generated by my web application, as in the sketch below - but I'm not sure that is the best solution. After all, there is no need to create that ID if scrapyd already issues one...
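For context, this is roughly how the scheduling call looks (a minimal sketch; "myproject" and "myspider" are placeholders, and scrapyd forwards any extra POST field to the spider as an argument, which is how that extra ID would travel):

    import requests

    # schedule.json starts the crawl and returns scrapyd's own job id
    resp = requests.post(
        "http://localhost:6800/schedule.json",
        data={
            "project": "myproject",   # placeholder project name
            "spider": "myspider",     # placeholder spider name
            "webapp_id": "42",        # the extra ID my web app could generate
        },
    )
    jobid = resp.json()["jobid"]      # this is what I store in the DB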
Thanks for your help
The question would be better phrased as "How can I get the job id of a scrapyd task at runtime?"
When scrapyd runs a spider, it actually gives the spider the job id as an argument; it should always be the last argument in sys.argv.
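More conveniently, scrapyd passes the job id to the spider as the _job argument (it launches the crawl with -a _job=<jobid>), so you can read it off the spider instead of parsing sys.argv. A minimal sketch, with "myspider" and the start URL as placeholders:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "myspider"
        start_urls = ["https://example.com"]  # placeholder

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # scrapyd sets _job when it launches the crawl; it is absent
            # when the spider is run directly with `scrapy crawl`
            self.job_id = getattr(self, "_job", None)

        def parse(self, response):
            self.logger.info("running as scrapyd job %s", self.job_id)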
Also, os.environ['SCRAPY_JOB'] should do the trick.
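To tie this back to the original question, a small item pipeline can stamp every item with the job id before it is stored. A minimal sketch, assuming dict-like items; job_id is a hypothetical field name, and SCRAPY_JOB is only set when the spider runs under scrapyd:

    import os

    class JobIdPipeline:
        def __init__(self):
            # SCRAPY_JOB is set by scrapyd for each crawl it launches
            self.job_id = os.environ.get("SCRAPY_JOB")

        def process_item(self, item, spider):
            item["job_id"] = self.job_id  # hypothetical field name
            return item

Enable it in your project settings, e.g. ITEM_PIPELINES = {"myproject.pipelines.JobIdPipeline": 100}, and you can join items to scrapyd job ids directly in the DB.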