python, sqlite, sqlalchemy, scrapy, python-dataset

Database insertion fails without error with Scrapy



I'm working with Scrapy and dataset (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data), which is a layer on top of SQLAlchemy, trying to load data into an SQLite table as a follow-up to Sqlalchemy : Dynamically create table from Scrapy item.

Using the dataset package, I have:

import dataset
from sqlalchemy.exc import IntegrityError

# `settings` is my project settings module (it provides SETTINGS_PATH)


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
        db = dataset.connect(db_path)
        self.table = db[table_name].table

    def process_item(self, item, spider):
        try:
            print('TEST DATASET..')
            self.table.insert(dict(name='John Doe', age=46, country='China'))
            print('INSERTED')
        except IntegrityError:
            print('THIS IS A DUP')
        return item

After running my spider, I see the print statements from the try/except block with no errors, but after completion, when I look in the table, it is empty: no data has been stored. What am I doing wrong?


Solution

  • The code you posted does not work as-is for me:

    TypeError: __init__() takes exactly 2 arguments (1 given)
    

    That's because the __init__ method expects a table_name argument which is not being passed. You need to implement the from_crawler class method in the pipeline object, something like:

    @classmethod
    def from_crawler(cls, crawler):
        return cls(table_name=crawler.spider.name)
    

    That would create a pipeline object using the spider name as the table name; you can of course use any name you want.
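
    Note also that the pipeline has to be enabled in the project's settings.py, otherwise process_item is never called at all. A minimal entry, where the dotted path below is a placeholder for your own project module:

    ITEM_PIPELINES = {
        'myproject.pipelines.DynamicSQLlitePipeline': 300,
    }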

    Also, the line self.table = db[table_name].table should be replaced by self.table = db[table_name] (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data). db[table_name] is dataset's table wrapper, whose insert() actually executes and commits the statement; .table exposes the underlying raw SQLAlchemy Table, and calling insert(...) on that merely constructs an insert statement without ever executing it, which is why you saw no error and no data.
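
    Putting both fixes together, the pipeline would look something like this. This is only a sketch: it keeps the question's settings.SETTINGS_PATH path construction, so settings must still be importable in your project:

    import dataset
    from sqlalchemy.exc import IntegrityError

    class DynamicSQLlitePipeline(object):

        @classmethod
        def from_crawler(cls, crawler):
            # use the spider's name as the table name
            return cls(table_name=crawler.spider.name)

        def __init__(self, table_name):
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            self.table = db[table_name]  # dataset table wrapper, not .table

        def process_item(self, item, spider):
            try:
                self.table.insert(dict(name='John Doe', age=46, country='China'))
            except IntegrityError:
                print('THIS IS A DUP')
            return item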

    After that, the data is stored.
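
    If you want to double-check the contents without a GUI, the same dataset API can read the rows back. The table name below is a placeholder for whatever name your pipeline used (the spider's name in the example above), and the path must match the one built in the pipeline:

    import dataset

    db = dataset.connect("sqlite:///data.db")  # adjust the path to match the pipeline
    for row in db['table_name']:  # placeholder table name
        print(row)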