Using Panadas 2.2.3, sqlite3.version 2.6.0 and python 3.12.5, I get an error "table ... already exits" when using to_sql with if_exists='append'. I just try to append some data from a Pandas df to a SQLite DB table. Using if_exists='replace' produces the same result.
In order to make sure that the db connection is active and the columns match, I used some simple print statements in a first try block and the failing to.sql in a second try block. Also a "select statement" from the same table is used in the first block. The first block is executed without an exception and the second block throws the message 'table "groupedData" already exists': (See print('ERROR Try 2'))
Source code:
try:
print(db_conn)
print(table_grouped)
data = [x.keys() for x in db_conn.cursor().execute(f'select * from {table_grouped};').fetchall()]
print(data)
except Error as e:
print('ERROR Try 1')
print(e)
try:
print(df_grouped.head(5))
df_grouped.to_sql(table_grouped, db_conn, if_exists='append', index=False)
#if_exists : {‘fail’, ‘replace’, ‘append’}
db_conn.commit()
except Error as e:
print('ERROR Try 2')
print(e)
Output:
<sqlite3.Connection object at 0x000001C0E7C0EB60>
groupedData
[['CustomerID', 'TotalSalesValue', 'SalesDate']]
CustomerID TotalSalesValue SalesDate
0 12345 400.0 2020-02-01
1 12345 1050.0 2020-02-04
2 12345 10.0 2020-02-10
3 12345 200.0 2021-02-01
4 12345 50.0 2021-02-04
ERROR Try 2
table "groupedData" already exists
We can see what is happening by logging the SQL statements made by Pandas. This minimal example:
import sqlite3
from sqlite3 import Error
import pandas as pd
table_name = 'tbl'
df = pd.DataFrame([(1,)], columns=['a'])
with sqlite3.connect(':memory:') as conn:
# Log all (successful) SQL statements.
conn.set_trace_callback(print)
# Create table with differently cased name.
CREATE = """CREATE TABLE Tbl (a int)"""
conn.execute(CREATE)
print('*** Updating table ***')
try:
df.to_sql(table_name, conn, if_exists='append', index=False)
conn.commit()
except Error as e:
print(e)
Produces this output:
CREATE TABLE Tbl (a int)
*** Updating table ***
SELECT
name
FROM
sqlite_master
WHERE
type IN ('table', 'view')
AND name='tbl';
table "tbl" already exists
So we can see that when Pandas checks for the table's existence, it uses the exact name that is passed to to_sql()
, so the existing table is not found. However when Pandas attempts to create the table, SQLite will raise an error* if a table already exists with the same case-insensitive name, as we can see in the SQLite CLI:
sqlite> CREATE TABLE T (a int);
sqlite> CREATE TABLE t (a int);
Parse error: table t already exists
CREATE TABLE t (a int);
^--- error here
Arguably Pandas could check in a case-insensitive way, as described here, but equally it could be argued that it is the programmer's responsibility to use consistent names†.
* Pandas raises a ValueError
if it detects that a table already exists, however the code is trapping an SQLite Error
, so the exception isn't being raised by Pandas.
† In fact, it seems that this issue has been raised before, and the Pandas developers elected to not make any changes.