pythonpandasdataframesqlite

Remove Leading/Trialing spaces from header row in Excel using Python


I'm using pandas to read an XLSB file into a data frame before using 'to_sql't to push it to my SQLite Database. Engine is using pyxlsb and not an issue.

My issue is that my XLSB column headers (row 1) have a mixture of spaces before and after the column name. This means that the database also captures the errors in the create statement.

CREATE TABLE "EXAMPLE" (
  "NAME " TEXT,
  " AGE" TEXT,
  "POST CODE " TEXT
)

So, do I need to fix the issue before I pd.read_excel? Can I 'to_sql' without using data frame headers? Can I 'to_sql' where the data frame headers don't match the table in the DB by doing a Drop & Insert?

Any guidance would be much appreciated; sadly, fixing the XLSB at source is not an option.


Solution

  • You can remove unnecessary whitespaces from your column headers after you read the data in from excel using strip() :

    df = pd.DataFrame({
        'Name  ': [],
        ' Age  ': [],
        'Post Code  ':[]
        })
    
    df = df.rename(columns=lambda x: x.strip())
    
    
    print(df.columns)
    
    ['Name', 'Age', 'Post Code']
    
    

    Then you can use to_sql() after the data is cleaned up