mysqlmysql-workbenchmysqladministrator

Using mysqldump to format one insert per line?


This has been asked a few times but I cannot find a resolution to my problem. Basically when using mysqldump, which is the built in tool for the MySQL Workbench administration tool, when I dump a database using extended inserts, I get massive long lines of data. I understand why it does this, as it speeds inserts by inserting the data as one command (especially on InnoDB), but the formatting makes it REALLY difficult to actually look at the data in a dump file, or compare two files with a diff tool if you are storing them in version control etc. In my case I am storing them in version control as we use the dump files to keep track of our integration test database.

Now I know I can turn off extended inserts, so I will get one insert per line, which works, but any time you do a restore with the dump file it will be slower.

My core problem is that in the OLD tool we used to use (MySQL Administrator) when I dump a file, it does basically the same thing but it FORMATS that INSERT statement to put one insert per line, while still doing bulk inserts. So instead of this:

INSERT INTO `coupon_gv_customer` (`customer_id`,`amount`) VALUES (887,'0.0000'),191607,'1.0300');

you get this:

INSERT INTO `coupon_gv_customer` (`customer_id`,`amount`) VALUES 
 (887,'0.0000'),
 (191607,'1.0300');

No matter what options I try, there does not seem to be any way of being able to get a dump like this, which is really the best of both worlds. Yes, it take a little more space, but in situations where you need a human to read the files, it makes it MUCH more useful.

Am I missing something and there is a way to do this with MySQLDump, or have we all gone backwards and this feature in the old (now deprecated) MySQL Administrator tool is no longer available?


Solution

  • With the default mysqldump format, each record dumped will generate an individual INSERT command in the dump file (i.e., the sql file), each on its own line. This is perfect for source control (e.g., svn, git, etc.) as it makes the diff and delta resolution much finer, and ultimately results in a more efficient source control process. However, for significantly sized tables, executing all those INSERT queries can potentially make restoration from the sql file prohibitively slow.

    Using the --extended-insert option fixes the multiple INSERT problem by wrapping all the records into a single INSERT command on a single line in the dumped sql file. However, the source control process becomes very inefficient. The entire table contents is represented on a single line in the sql file, and if a single character changes anywhere in that table, source control will flag the entire line (i.e., the entire table) as the delta between versions. And, for large tables, this negates many of the benefits of using a formal source control system.

    So ideally, for efficient database restoration, in the sql file, we want each table to be represented by a single INSERT. For an efficient source control process, in the sql file, we want each record in that INSERT command to reside on its own line.

    My solution to this is the following back-up script:

    #!/bin/bash
    
    cd my_git_directory/
    
    ARGS="--host=myhostname --user=myusername --password=mypassword --opt --skip-dump-date"
    /usr/bin/mysqldump $ARGS --database mydatabase | sed 's$VALUES ($VALUES\n($g' | sed 's$),($),\n($g' > mydatabase.sql
    
    git fetch origin master
    git merge origin/master
    git add mydatabase.sql
    git commit -m "Daily backup."
    git push origin master
    

    The result is a sql file INSERT command format that looks like:

    INSERT INTO `mytable` VALUES
    (r1c1value, r1c2value, r1c3value),
    (r2c1value, r2c2value, r2c3value),
    (r3c1value, r3c2value, r3c3value);
    

    Some notes: