mysqlbashmysqlimport

Import multiple .csv files to mysql using bash


I would like to use a bash script to first copy files from a google drive to the local drive and rename (to remove the spaces and to add a prefix) and then import these .csv files into a mysql database. These .csv's have different columns so i need to get the column names from each .csv and create a table using these column names and then import the data.

I would appreciate any help.

I found some code here https://ericlondon.com/2011/04/10/a-bash-shell-script-to-import-a-large-number-of-csv-files-into-mysql.html which i have modified but i cant get it to work. I admit i dont understand what it all does.

This is what i currently have for my script:

#!/bin/bash

# show commands being executed, per debug
set -x

# define database connectivity
_db="dbname"
_db_user="user"
_db_password="password"

# delete existing files
rm -rf /path/to/*.csv

# copy files to local disk
cp /old/path/to/*.csv /path/to/

# define directory containing CSV files
_csv_directory="/path/to"

# go into directory
cd $_csv_directory || exit

# edit file name
rename "s/ //g" *.csv
rename "s/^/tp/g" *.csv

# get a list of CSV files in directory
_csv_files=`ls -1 *.csv`

# loop through csv files
for _csv_file in ${_csv_files[@]}
do

  # remove file extension
  _csv_file_extensionless=`echo "$_csv_file" | sed 's/\(.*\)\..*/\1/'`

  # define table name
  _table_name="${_csv_file_extensionless}"

  # get header columns from CSV file
  _header_columns=`head -1 $_csv_directory/$_csv_file | tr ',' '\n' | sed 's/"//' | sed 's/ /_/g'`
  _header_columns_string=`head -1 $_csv_directory/$_csv_file | sed 's/ /_/g' | sed 's/"//g' | sed 's/(//g' | sed 's/)//g'`

  # ensure table exists
  mysql -u $_db_user -p$_db_password $_db << eof
    CREATE TABLE IF NOT EXISTS \`$_table_name\` ENGINE=MyISAM DEFAULT CHARSET=utf8
eof

  # loop through header columns
  for _header in "${_header_columns[@]}"
  do

    # add column
    mysql -u $_db_user -p$_db_password $_db --execute="alter table \`$_table_name\` add column IF NOT EXISTS \`$_header\` text"

  done

  # import csv into mysql
  mysqlimport --fields-enclosed-by='"' --fields-terminated-by=',' --lines-terminated-by="\r\n" --columns=$_header_columns_string -u $_db_user -p$_db_password $_db $_csv_directory/$_csv_file

done
exit

This is the updated output i get for each .csv:

++ for _csv_file in ${_csv_files[@]}
+++ echo tpTriplexBC.csv
+++ sed 's/\(.*\)\..*/\1/'
++ _csv_file_extensionless=tpTriplexBC
++ _table_name=tpTriplexBC
+++ head -1 /path/to/tpTriplexBC.csv
+++ tr , '\n'
+++ sed 's/^"//'
+++ sed 's/"$//'
+++ sed 's/ /_/g'
+++ sed 's/(//g'
+++ sed 's/)//g'
++ _header_columns='ProductName
ProductID
Quantity
Tax
Paper_Type
PriceExpress
Production_Days_Express
PriceStandard
Production_Days_Standard
PriceSaver
Production_Days_Saver
PriceSameday
Production_Days_Sameday
Price_Just_Print
Price_File_Check
Price_File_Check_with_Proofing
Price_File_Assist
Artwork_Tax
Size
Flat_Width_mm
Flat_Height_mm
Finished_Width_mm
Finished_Height_mm
productionDataJSON
'pdatedAt"
+++ head -1 /path/to/tpTriplexBC.csv
+++ sed 's/ /_/g'
+++ sed 's/"//g'
+++ sed 's/(//g'
+++ sed 's/)//g'
++ _header_columns_string=$'ProductName,ProductID,Quantity,Tax,Paper_Type,PriceExpress,Production_Days_Express,PriceStandard,Production_Days_Standard,PriceSaver,Production_Days_Saver,PriceSameday,Production_Days_Sameday,Price_Just_Print,Price_File_Check,Price_File_Check_with_Proofing,Price_File_Assist,Artwork_Tax,Size,Flat_Width_mm,Flat_Height_mm,Finished_Width_mm,Finished_Height_mm,productionDataJSON,updatedAt\r'
++ mysql -u user -ppassword dbname
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `ProductName` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `ProductID` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Quantity` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Tax` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Paper_Type` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceExpress` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Express` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceStandard` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Standard` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceSaver` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Saver` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceSameday` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Sameday` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_Just_Print` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_File_Check` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_File_Check_with_Proofing` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_File_Assist` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Artwork_Tax` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Size` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Flat_Width_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Flat_Height_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Finished_Width_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Finished_Height_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `productionDataJSON` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexB` text'COLUMN IF NOT EXISTS `updatedAt"
'RROR 1166 (42000) at line 1: Incorrect column name 'updatedAt"
++ mysqlimport '--fields-enclosed-by="' --fields-terminated-by=, '--lines-terminated-by=\n' $'--columns=ProductName,ProductID,Quantity,Tax,Paper_Type,PriceExpress,Production_Days_Express,PriceStandard,Production_Days_Standard,PriceSaver,Production_Days_Saver,PriceSameday,Production_Days_Sameday,Price_Just_Print,Price_File_Check,Price_File_Check_with_Proofing,Price_File_Assist,Artwork_Tax,Size,Flat_Width_mm,Flat_Height_mm,Finished_Width_mm,Finished_Height_mm,productionDataJSON,updatedAt\r' -u user -ppassword dbname /path/to/tpTriplexBC.csv
mysqlimport: Error: 1054, Unknown column 'updatedAt' in 'field list', when using table: tpTriplexBC
++ exit

The columns are inserted to each table but no data. Also tables called -1 and ls are created which i dont want.

The first line of the csv is: "ProductName","ProductID","Quantity","Tax","Special Finish","Laminate","Sides Printed","Material","PriceStandard","Production Days Standard","PriceSaver","Production Days Saver","PriceSameday","Production Days Sameday","Price Just Print","Price File Check","Price File Check with Proofing","Price File Assist","Artwork Tax","Size","Flat Width (mm)","Flat Height (mm)","Finished Width (mm)","Finished Height (mm)","productionDataJSON","updatedAt"

Octal Dump output for the header:

0000000   "   P   r   o   d   u   c   t   N   a   m   e   "   ,   "   P
0000020   r   o   d   u   c   t   I   D   "   ,   "   Q   u   a   n   t
0000040   i   t   y   "   ,   "   T   a   x   "   ,   "   P   a   p   e
0000060   r       T   y   p   e   "   ,   "   P   r   i   c   e   E   x
0000100   p   r   e   s   s   "   ,   "   P   r   o   d   u   c   t   i
0000120   o   n       D   a   y   s       E   x   p   r   e   s   s   "
0000140   ,   "   P   r   i   c   e   S   t   a   n   d   a   r   d   "
0000160   ,   "   P   r   o   d   u   c   t   i   o   n       D   a   y
0000200   s       S   t   a   n   d   a   r   d   "   ,   "   P   r   i
0000220   c   e   S   a   v   e   r   "   ,   "   P   r   o   d   u   c
0000240   t   i   o   n       D   a   y   s       S   a   v   e   r   "
0000260   ,   "   P   r   i   c   e   S   a   m   e   d   a   y   "   ,
0000300   "   P   r   o   d   u   c   t   i   o   n       D   a   y   s
0000320       S   a   m   e   d   a   y   "   ,   "   P   r   i   c   e
0000340       J   u   s   t       P   r   i   n   t   "   ,   "   P   r
0000360   i   c   e       F   i   l   e       C   h   e   c   k   "   ,
0000400   "   P   r   i   c   e       F   i   l   e       C   h   e   c
0000420   k       w   i   t   h       P   r   o   o   f   i   n   g   "
0000440   ,   "   P   r   i   c   e       F   i   l   e       A   s   s
0000460   i   s   t   "   ,   "   A   r   t   w   o   r   k       T   a
0000500   x   "   ,   "   S   i   z   e   "   ,   "   F   l   a   t    
0000520   W   i   d   t   h       (   m   m   )   "   ,   "   F   l   a
0000540   t       H   e   i   g   h   t       (   m   m   )   "   ,   "
0000560   F   i   n   i   s   h   e   d       W   i   d   t   h       (
0000600   m   m   )   "   ,   "   F   i   n   i   s   h   e   d       H
0000620   e   i   g   h   t       (   m   m   )   "   ,   "   p   r   o
0000640   d   u   c   t   i   o   n   D   a   t   a   J   S   O   N   "
0000660   ,   "   u   p   d   a   t   e   d   A   t   "  \r  \n

Solution

  • Looking at the od output of your CSV header, appears that you have a carriage return and line feed at the end of your CSV lines (the \r\n sequence). Did the CSV come from processing under Windows? \r\n is a common text line termination under Windows, whereas Linux uses only \n. The \r gets included as part of the string and is not considered part of the line termination. It also can yield strange looking output since the \r by itself, when output, resets the cursor position to the first column and subsequent characters display over the top of whatever may have been displayed on that same line previously.

    I recommend running dos2unix on your CSV file first, then do your processing. I think that will clear up the odd error and help the processing of the rest of the file work properly as well.