Tags: python, mysql, excel, csv, import-from-excel

Generate a table schema by inspecting an Excel (or CSV) file and import the data


How would I go about creating a MySQL table schema by inspecting an Excel (or CSV) file? Are there any ready-made Python libraries for the task?

Column headers would be sanitized into column names. Datatypes would be estimated based on the contents of each spreadsheet column. When done, the data would be loaded into the table.

I have an Excel file with ~200 columns that I want to start normalizing.
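On the ready-libraries question: pandas plus SQLAlchemy covers all three steps in a few lines. Below is a minimal sketch, not the approach taken in the solution; the file name, table name, and connection URL are placeholders. pandas infers a datatype for each column when reading the file, and to_sql() creates the table and loads the rows.

```python
# Minimal sketch: sanitize headers, let pandas infer dtypes, load into MySQL.
# "data.xlsx", "my_table", and the connection URL are placeholders.
import re

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_excel("data.xlsx")  # or pd.read_csv("data.csv")

# Sanitize headers into column names: non-alphanumerics become "_", lowercase.
df.columns = [re.sub(r"\W+", "_", str(c)).strip("_").lower() for c in df.columns]

# to_sql() maps the inferred pandas dtypes to SQL column types,
# creates the table, and inserts the data.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
df.to_sql("my_table", engine, if_exists="replace", index=False)
```

This trades fine-grained control over the schema for brevity; the solution below instead keeps everything in MySQL and types the columns by hand.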


Solution

  • Just for my own reference, I documented what I did below:

    1. xlrd is practical; however, I just saved the Excel data as CSV so I could use LOAD DATA INFILE.
    2. I copied the header row and started writing the import and normalization script.
    3. The script does a CREATE TABLE with all columns as TEXT, except for the primary key.
    4. It queries MySQL with LOAD DATA LOCAL INFILE, loading all the CSV data into the TEXT fields (see the first sketch after this list).
    5. Based on the output of PROCEDURE ANALYSE, I was able to ALTER TABLE to give the columns the right types and lengths (second sketch below). PROCEDURE ANALYSE returns ENUM for any column with few distinct values, which is not what I needed, but I found it useful later for normalization. Eyeballing 200 columns was a breeze with PROCEDURE ANALYSE; the table structure proposed by phpMyAdmin was junk.
    6. I wrote the normalization mostly as SELECT DISTINCT on a column, INSERTing the results into a separate table (last sketch below). First I added an FK column to the old table; just after each INSERT, I took its ID and UPDATEd the FK column. When the loop finished, I dropped the old column, leaving only the FK column. I did the same with multiple dependent columns. It was much faster than I expected.
    7. I ran (Django's) python manage.py inspectdb, copied the output to models.py, and added the ForeignKey fields by hand, since FKs do not exist on MyISAM. Then I wrote a little Python in views.py and urls.py, plus a few templates... TADA!
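A minimal sketch of steps 3-4, assuming PyMySQL, a data.csv with a header row, and placeholder names (import_raw, the credentials). Note that LOAD DATA LOCAL INFILE requires local_infile to be enabled on both the client and the server.

```python
# Sketch of steps 3-4: all-TEXT CREATE TABLE from the CSV header, then bulk load.
# "import_raw" and the credentials are placeholders.
import csv
import re

import pymysql

CSV_PATH = "data.csv"

# Read only the header row and sanitize it into column names.
with open(CSV_PATH, newline="") as f:
    headers = next(csv.reader(f))
cols = [re.sub(r"\W+", "_", h).strip("_").lower() for h in headers]

# Every column is TEXT except the surrogate primary key; MyISAM as per step 7.
col_defs = ",\n  ".join(f"`{c}` TEXT" for c in cols)
create_sql = (f"CREATE TABLE import_raw (\n"
              f"  id INT AUTO_INCREMENT PRIMARY KEY,\n"
              f"  {col_defs}\n) ENGINE=MyISAM")

conn = pymysql.connect(host="localhost", user="user", password="password",
                       database="mydb", local_infile=True)
with conn.cursor() as cur:
    cur.execute(create_sql)
    # Load all CSV rows into the TEXT fields, skipping the header line.
    cur.execute(f"""LOAD DATA LOCAL INFILE '{CSV_PATH}'
        INTO TABLE import_raw
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        IGNORE 1 LINES
        ({", ".join(f"`{c}`" for c in cols)})""")
conn.commit()
```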
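For step 5, a sketch of pulling the PROCEDURE ANALYSE suggestions from Python and then tightening one column, reusing the connection from the previous sketch. PROCEDURE ANALYSE was removed in MySQL 8.0, so this only applies up to 5.7; school_name is a hypothetical column name.

```python
# Sketch of step 5; reuses `conn` from the previous sketch.
# PROCEDURE ANALYSE exists only up to MySQL 5.7 (removed in 8.0).
with conn.cursor() as cur:
    # ANALYSE(10, 256): suggest ENUM only for columns with <= 10 distinct values.
    cur.execute("SELECT * FROM import_raw PROCEDURE ANALYSE(10, 256)")
    for row in cur.fetchall():
        field_name, optimal_type = row[0], row[-1]  # first and last output columns
        print(f"{field_name}: {optimal_type}")

# After eyeballing the suggestions, tighten columns one by one, e.g.
# (`school_name` is a hypothetical column):
with conn.cursor() as cur:
    cur.execute("ALTER TABLE import_raw MODIFY `school_name` VARCHAR(120) NOT NULL")
```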
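Finally, the step 6 loop for one hypothetical column, local_authority, again reusing the connection from above: distinct values go into a lookup table, each row's FK is back-filled, and then the old TEXT column is dropped.

```python
# Sketch of step 6 for one hypothetical column ("local_authority").
with conn.cursor() as cur:
    # Lookup table plus an FK column on the wide import table.
    cur.execute("""CREATE TABLE local_authority (
        id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(120) NOT NULL UNIQUE
    ) ENGINE=MyISAM""")
    cur.execute("ALTER TABLE import_raw ADD COLUMN local_authority_id INT")

    # One INSERT per distinct value; lastrowid is the new PK, which is
    # written back into the FK column of every matching row.
    cur.execute("SELECT DISTINCT local_authority FROM import_raw")
    for (value,) in cur.fetchall():
        cur.execute("INSERT INTO local_authority (name) VALUES (%s)", (value,))
        cur.execute("UPDATE import_raw SET local_authority_id = %s "
                    "WHERE local_authority = %s", (cur.lastrowid, value))

    # The old TEXT column is no longer needed.
    cur.execute("ALTER TABLE import_raw DROP COLUMN local_authority")
conn.commit()
```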

    Created https://blocl.uk/schools based on data from https://www.get-information-schools.service.gov.uk/Downloads