Tags: python, snowflake-cloud-data-platform, pandas-to-sql

Python and Snowflake error on appending into existing table on the cloud


I am trying to upload a dataframe into an existing table in the Snowflake cloud. Here is the dataframe:

columns_df.head()

[screenshot of the first rows of columns_df]

Now, when using to_sql() from pandas to append data into the existing table:

columns_df.to_sql('survey_metadata_column_names', index=False,  index_label=None, con=conn, schema='PUBLIC', if_exists='append', chunksize=300)

I received the following error:

DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting

TypeError: not all arguments converted during string formatting

Some of the column names contain dashes and underscores.


Solution

  • From the Snowflake documentation:

    To write data from a Pandas DataFrame to a Snowflake database, do one of the following:

    • Call the write_pandas() function.
    • Call the pandas.DataFrame.to_sql() method, and specify pd_writer as the method to use to insert the data into the database.

    Note the pd_writer method in the second bullet. I have noticed several issues using both methods, but they are the official solutions.

    from snowflake.connector.pandas_tools import pd_writer

    columns_df.to_sql('survey_metadata_column_names',
                      index=False,
                      index_label=None,
                      con=engine,  # must be an SQLAlchemy engine, not a raw connection
                      schema='PUBLIC',
                      if_exists='append',
                      chunksize=300,
                      method=pd_writer)
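
    If you do not already have an engine, here is a minimal sketch of building one with the snowflake-sqlalchemy package (every connection parameter below is a placeholder, not a value from the question):

    from sqlalchemy import create_engine
    from snowflake.sqlalchemy import URL

    # Placeholder credentials -- substitute your own account details.
    engine = create_engine(URL(
        account='my_account',
        user='my_user',
        password='my_password',
        database='MY_DATABASE',
        schema='PUBLIC',
        warehouse='MY_WAREHOUSE',
    ))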
    

    Or alternatively

    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    con = snowflake.connector.connect(...)
    success, nchunks, nrows, _ = write_pandas(con,
                                              columns_df,
                                              'survey_metadata_column_names',
                                              chunk_size=300,
                                              schema='PUBLIC')
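
    write_pandas returns a tuple you can use to confirm the load; a small usage sketch:

    # success is a bool; nchunks and nrows report what was actually written.
    if success:
        print(f"Wrote {nrows} rows in {nchunks} chunks")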
    

    Note that the first method requires an SQLAlchemy engine, while the second one works with a regular snowflake.connector connection.