python scrapy iron.io

IronWorker and Scrapy


I am trying to create an iron.io worker using scrapy.

According to iron.io, all of the code's dependencies must be packaged with the worker itself.

I created a folder called module to hold all the third-party modules and installed Scrapy into it with pip:

pip install scrapy -t module/

When I try to run Scrapy via python module/scrapy/__init__.py, I get the following error:

Traceback (most recent call last):
  File "module/scrapy/__init__.py", line 10, in <module>
    __version__ = pkgutil.get_data(__package__, 'VERSION').decode('ascii').strip()
  File "/usr/lib/python2.7/pkgutil.py", line 578, in get_data
    loader = get_loader(package)
  File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib/python2.7/pkgutil.py", line 424, in iter_importers
    if fullname.startswith('.'):
AttributeError: 'NoneType' object has no attribute 'startswith'

Solution

  • You'd probably be better off using Scrapy from within your IronWorker code rather than invoking it from the command line, as shown on the front page of http://scrapy.org/ and in the tutorial: http://doc.scrapy.org/en/0.24/intro/tutorial.html

    To use this in IronWorker, after you've done the pip install, be sure to add:

    pip 'scrapy' 
    

    to your .worker file. Then in your worker script, you'd import it:

    import scrapy
    

    Then use it as described in the tutorial linked above.
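
For context, the traceback in the question comes from running __init__.py directly as a script: in that case __package__ is None, so pkgutil.get_data(__package__, 'VERSION') fails. Importing the package avoids this. Below is a minimal sketch of a worker script that does so, not a definitive implementation. It assumes the module/ folder from the question sits in the worker's working directory and that a recent Scrapy release is installed (the CrawlerProcess usage and dict items shown here are newer than the 0.24 tutorial linked above). The names add_vendor_dir, run_spider and TitleSpider are hypothetical, chosen only for illustration.

```python
import os
import sys


def add_vendor_dir(dirname="module"):
    """Put the `pip install -t` target directory at the front of sys.path.

    Assumes the directory is relative to the current working directory.
    """
    path = os.path.abspath(dirname)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


def run_spider(start_url):
    """Run a tiny spider in-process instead of via the command line."""
    # Imported here, after add_vendor_dir(), so the vendored copy is found.
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class TitleSpider(scrapy.Spider):  # hypothetical example spider
        name = "title"
        start_urls = [start_url]

        def parse(self, response):
            # Yield one item with the page URL and its <title> text.
            yield {
                "url": response.url,
                "title": response.css("title::text").get(),
            }

    process = CrawlerProcess()
    process.crawl(TitleSpider)
    process.start()  # blocks until the crawl has finished
```

In the worker script you would call add_vendor_dir() first and then run_spider("http://example.com/") with whatever start URL applies. If Scrapy is declared in the .worker file as described above and is importable without the extra path entry, the add_vendor_dir() step may be unnecessary.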