mysqlload-data-infile

Load DATA INFILE taking approx 3min for a 2 GB file. updated_at column set to DEFAULT CURRENT_TIMESTAMP


all records have same TIMESTAMP value. i am not setting any value for updated_at column in the load data infile statement. Why such behaviour

Expecting the seconds to be different as it is not possible to insert 5 million records within 1 s.

2023-05-07 06:26:36 example. at least the seconds should be different for some records

mysql -h$1 -u$2 -p$3 --local-infile=1 -e "LOAD DATA LOCAL INFILE '$4' INTO TABLE abc FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES ( legal_name, pg_merchant_id, ip_addr)" 

table structure :

CREATE TABLE `abc` (
   `pg_merchant_id` varchar(10) NOT NULL,
   `legal_name` varchar(100) NOT NULL,
   `ip_addr` varchar(150) NOT NULL,
   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    KEY `updated_at_idx` (`updated_at`)
) 

Solution

  • CURRENT_TIMESTAMP will be the time the LOAD statement began. This is stated in the docs:

    https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_current-timestamp :

    CURRENT_TIMESTAMP and CURRENT_TIMESTAMP() are synonyms for NOW().

    https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_now :

    NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.) This differs from the behavior for SYSDATE(), which returns the exact time at which it executes.

    I suggest you set updated_at after the load finishes; if you do the load file and update in a transaction, they won't see the earlier values.

    Even then, the other team needs to know the longest any statement or transaction can take, and look for updated_at at least that far in the past. If they are just comparing to the current time, they will miss some updates or inserts.