[SOLVED] Remove Leading/Trialing spaces from header row in Excel using Python

Remove Leading/Trialing spaces from header row in Excel using Python

I'm using pandas to read an XLSB file into a data frame before using 'to_sql't to push it to my SQLite Database. Engine is using pyxlsb and not an issue.

My issue is that my XLSB column headers (row 1) have a mixture of spaces before and after the column name. This means that the database also captures the errors in the create statement.

CREATE TABLE "EXAMPLE" (
  "NAME " TEXT,
  " AGE" TEXT,
  "POST CODE " TEXT
)

So, do I need to fix the issue before I pd.read_excel? Can I 'to_sql' without using data frame headers? Can I 'to_sql' where the data frame headers don't match the table in the DB by doing a Drop & Insert?

Any guidance would be much appreciated; sadly, fixing the XLSB at source is not an option.

Solution

You can remove unnecessary whitespaces from your column headers after you read the data in from excel using strip() :

df = pd.DataFrame({
    'Name  ': [],
    ' Age  ': [],
    'Post Code  ':[]
    })

df = df.rename(columns=lambda x: x.strip())


print(df.columns)

['Name', 'Age', 'Post Code']

Then you can use to_sql() after the data is cleaned up