[SOLVED] Iron worker and scrapy

I am trying to create an iron.io worker using scrapy.

According to iron.io, all of the code's dependencies have to be bundled with the worker itself.

I created a folder called module to hold all the third-party modules and installed scrapy into it via pip:

pip install scrapy -t module/

When I try to run scrapy via python module/scrapy/__init__.py, I get the following error:

Traceback (most recent call last):
  File "module/scrapy/__init__.py", line 10, in <module>
    __version__ = pkgutil.get_data(__package__, 'VERSION').decode('ascii').strip()
  File "/usr/lib/python2.7/pkgutil.py", line 578, in get_data
    loader = get_loader(package)
  File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib/python2.7/pkgutil.py", line 424, in iter_importers
    if fullname.startswith('.'):
AttributeError: 'NoneType' object has no attribute 'startswith'

Solution

You'd probably be better off using Scrapy from within your IronWorker code rather than invoking it from the command line, as shown on the front page of http://scrapy.org/ and in the tutorial: http://doc.scrapy.org/en/0.24/intro/tutorial.html
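
The traceback in the question is consistent with this: when a package's __init__.py is executed directly as a script, __package__ is left as None, so pkgutil.get_data(__package__, 'VERSION') has no package name to resolve. A minimal sketch with a throwaway package (the name demo_pkg is hypothetical, not part of Scrapy) shows the difference between running the file and importing the package:

```python
import os
import shutil
import subprocess
import sys
import tempfile


def package_name_seen(run_directly):
    """Return repr(__package__) as seen inside a throwaway package's __init__.py.

    run_directly=True mimics `python module/scrapy/__init__.py`;
    run_directly=False mimics a normal `import` of the package.
    """
    root = tempfile.mkdtemp()
    try:
        pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package name
        os.mkdir(pkg_dir)
        init_path = os.path.join(pkg_dir, "__init__.py")
        with open(init_path, "w") as handle:
            handle.write("print(repr(__package__))\n")

        if run_directly:
            # Executes the file as a plain script, outside any package context.
            command = [sys.executable, init_path]
        else:
            # Imports the package; the working directory is on sys.path for -c.
            command = [sys.executable, "-c", "import demo_pkg"]
        output = subprocess.check_output(command, cwd=root)
        return output.decode("ascii").strip()
    finally:
        shutil.rmtree(root)
```

Run directly, the file sees no package name; imported, it sees demo_pkg. That is why importing scrapy from a worker script avoids the error.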

To use this in IronWorker, after you've done the pip install, be sure to add:

pip 'scrapy'

to your .worker file. Then in your worker script, you'd import it:

import scrapy

Then use it as described in the tutorial linked above.
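
As a rough sketch of what such a worker script could look like, assuming the packages were installed into a module/ folder next to the script (as in the question) and assuming Scrapy 1.0 or later (the 0.24 release linked above exposes a different scripting API). The helper names add_vendor_dir and run_spider and the TitleSpider class are hypothetical, chosen only for illustration:

```python
import os
import sys


def add_vendor_dir(base_dir, folder="module"):
    """Put the bundled third-party folder at the front of sys.path."""
    vendor_dir = os.path.join(base_dir, folder)
    if vendor_dir not in sys.path:
        sys.path.insert(0, vendor_dir)
    return vendor_dir


def run_spider(start_url):
    """Crawl a single page in-process instead of via the scrapy command."""
    # Imported lazily so that add_vendor_dir() can run first.
    # Assumption: Scrapy >= 1.0, where CrawlerProcess can be driven this way.
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class TitleSpider(scrapy.Spider):  # hypothetical example spider
        name = "title"
        start_urls = [start_url]

        def parse(self, response):
            # Yield the page title as a plain dict item.
            yield {
                "url": response.url,
                "title": response.xpath("//title/text()").extract_first(),
            }

    process = CrawlerProcess()
    process.crawl(TitleSpider)
    process.start()  # blocks until the crawl has finished
```

In the worker script you would call add_vendor_dir(os.path.dirname(os.path.abspath(__file__))) once at startup and then run_spider() with the page you want to crawl.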