pythonpandassqlitepandas-to-sql

Error "table ... already exits" when using to_sql(...,if_exists='append') with Pandas and SQLite


Using Panadas 2.2.3, sqlite3.version 2.6.0 and python 3.12.5, I get an error "table ... already exits" when using to_sql with if_exists='append'. I just try to append some data from a Pandas df to a SQLite DB table. Using if_exists='replace' produces the same result.

In order to make sure that the db connection is active and the columns match, I used some simple print statements in a first try block and the failing to.sql in a second try block. Also a "select statement" from the same table is used in the first block. The first block is executed without an exception and the second block throws the message 'table "groupedData" already exists': (See print('ERROR Try 2'))

Source code:

try:
    print(db_conn)
    print(table_grouped)
    data = [x.keys() for x in db_conn.cursor().execute(f'select * from {table_grouped};').fetchall()]
    print(data)
except Error as e:
    print('ERROR Try 1')
    print(e)

try: 
    print(df_grouped.head(5))
        
    df_grouped.to_sql(table_grouped, db_conn, if_exists='append', index=False) 
    #if_exists : {‘fail’, ‘replace’, ‘append’}
    db_conn.commit()

except Error as e:
    print('ERROR Try 2')
    print(e)

Output:

<sqlite3.Connection object at 0x000001C0E7C0EB60>
groupedData
[['CustomerID', 'TotalSalesValue', 'SalesDate']]
   CustomerID  TotalSalesValue  SalesDate
0       12345            400.0 2020-02-01
1       12345           1050.0 2020-02-04
2       12345             10.0 2020-02-10
3       12345            200.0 2021-02-01
4       12345             50.0 2021-02-04
ERROR Try 2
table "groupedData" already exists

Solution

  • We can see what is happening by logging the SQL statements made by Pandas. This minimal example:

    import sqlite3
    from sqlite3 import Error
    import pandas as pd
    
    table_name = 'tbl'
    
    df = pd.DataFrame([(1,)], columns=['a'])
    
    with sqlite3.connect(':memory:') as conn:
        # Log all (successful) SQL statements.
        conn.set_trace_callback(print)
    
        # Create table with differently cased name.
        CREATE = """CREATE TABLE Tbl (a int)"""
        conn.execute(CREATE)
    
        print('*** Updating table ***')
        try:
            df.to_sql(table_name, conn, if_exists='append', index=False)
            conn.commit()
        except Error as e:
            print(e)
    

    Produces this output:

    CREATE TABLE Tbl (a int)
    *** Updating table ***
    
            SELECT
                name
            FROM
                sqlite_master
            WHERE
                type IN ('table', 'view')
                AND name='tbl';
    table "tbl" already exists
    

    So we can see that when Pandas checks for the table's existence, it uses the exact name that is passed to to_sql(), so the existing table is not found. However when Pandas attempts to create the table, SQLite will raise an error* if a table already exists with the same case-insensitive name, as we can see in the SQLite CLI:

    sqlite> CREATE TABLE T (a int);
    sqlite> CREATE TABLE t (a int);
    Parse error: table t already exists
      CREATE TABLE t (a int);
                   ^--- error here
    

    Arguably Pandas could check in a case-insensitive way, as described here, but equally it could be argued that it is the programmer's responsibility to use consistent names.


    * Pandas raises a ValueError if it detects that a table already exists, however the code is trapping an SQLite Error, so the exception isn't being raised by Pandas.

    In fact, it seems that this issue has been raised before, and the Pandas developers elected to not make any changes.