I'm using pandas to read an XLSB file into a data frame before using 'to_sql't to push it to my SQLite Database. Engine is using pyxlsb and not an issue.
My issue is that my XLSB column headers (row 1) have a mixture of spaces before and after the column name. This means that the database also captures the errors in the create statement.
CREATE TABLE "EXAMPLE" (
"NAME " TEXT,
" AGE" TEXT,
"POST CODE " TEXT
)
So, do I need to fix the issue before I pd.read_excel? Can I 'to_sql' without using data frame headers? Can I 'to_sql' where the data frame headers don't match the table in the DB by doing a Drop & Insert?
Any guidance would be much appreciated; sadly, fixing the XLSB at source is not an option.
You can remove unnecessary whitespaces from your column headers after you read the data in from excel using strip()
:
df = pd.DataFrame({
'Name ': [],
' Age ': [],
'Post Code ':[]
})
df = df.rename(columns=lambda x: x.strip())
print(df.columns)
['Name', 'Age', 'Post Code']
Then you can use to_sql()
after the data is cleaned up