pythonmysqlmysql-connector-python

Accelerating slow MySQL import using python connector


I'm writing a python script which imports a large database into MySQL. So far I used this approach, and it works, but it is slow:

importdb = "mysql -h " + DB_HOST + " -u " + DB_USER + " -p" + shlex.quote(DB_USER_PASSWORD) + " " + DB_TARGET + " < " + os.getcwd() + "\\AllPrintings.sql"
os.system(importdb)

According to this and similar posts, import can be faster if default settings are changed before import:

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
# <import statement here>
COMMIT;
SET unique_checks=1;
SET foreign_key_checks=1;

The issue is that those settings are relevant only for specific connection. As I am using python script, I see two options, and in both I am failing to find a working solution:

  1. Use os.system and add SET commands - I tried to separate commands with ; but it seems to stop executing them after first ;

#doesn't work

importdb = "mysql -h " + DB_HOST + " -u " + DB_USER + " -p" + shlex.quote(DB_USER_PASSWORD)+ "; " + "use " + DB_TARGET + "; SET autocommit=0; SET unique_checks=0; SET FOREIGN_KEY_CHECKS=0; " + "source" + os.getcwd() + "\\AllPrintings.sql; SET autocommit=1; SET unique_checks=1; SET FOREIGN_KEY_CHECKS=1;"
os.system(importdb)
  1. Use import mysql.connector and add import statement: SET commands work, but not importing

#doesn't work

cur.execute(f"SET autocommit=0;")
cur.execute(f"SET unique_checks=0;")
cur.execute(f"SET FOREIGN_KEY_CHECKS=0;")
cur.execute(DB_TARGET + ' < ' + os.getcwd() + '\\AllPrintings.sql')
conn.commit()
cur.execute(f"SET autocommit=1;")
cur.execute(f"SET unique_checks=1;")
cur.execute(f"SET FOREIGN_KEY_CHECKS=1;");

I'm stuck and I don't know how to move further, all help will be appreciated...


EDIT & SOLUTION

On the basis of @mrrobot.viewsource answer below, the solution was to modify imported db with commands above. It decreased importing time of test database from 73 min to 2 min. My code:

with open(os.getcwd() + "\\AllPrintings.sql", "r+",encoding="utf8") as f:
        content = f.read()
        f.seek(0, 0)
        f.write("SET autocommit=0;\nSET unique_checks=0;\nSET FOREIGN_KEY_CHECKS=0;" + '\n' + content)

with open(os.getcwd() + "\\AllPrintings.sql", "a+", encoding="utf8") as f:
    f.write("\nSET unique_checks=1;\nSET FOREIGN_KEY_CHECKS=1;\n")

Solution

  • There are a few issues with your code. Did you try printing the variable importdb ? Here is what is happening in your script:

    PROBLEM:

    Your script (commented os.system(importdb) for obvious reasons):

    import shlex
    import os
    
    DB_HOST="1.1.1.1"
    DB_USER="root"
    DB_USER_PASSWORD="password"
    DB_TARGET="somedb"
    
    importdb = "mysql -h " + DB_HOST + " -u " + DB_USER + " -p" + shlex.quote(DB_USER_PASSWORD)+ "; " + "use " + DB_TARGET + "; SET autocommit=0; SET unique_checks=0; SET FOREIGN_KEY_CHECKS=0; " + "source" + os.getcwd() + "\\AllPrintings.sql; SET autocommit=1; SET unique_checks=1; SET FOREIGN_KEY_CHECKS=1;"
    print(importdb)
    #os.system(importdb)
    
    

    The output:

    mysql -h 1.1.1.1 -u root -ppassword; use somedb; SET autocommit=0; SET unique_checks=0; SET FOREIGN_KEY_CHECKS=0; source/some/path\AllPrintings.sql; SET autocommit=1; SET unique_checks=1; SET FOREIGN_KEY_CHECKS=1;
    

    From the output:

    1. the first ; i.e. mysql -h 1.1.1.1 -u root -ppassword; makes the control go to the MySQL client shell and you are no longer with python here which answers your below query.

    but it seems to stop executing them after first ;

    1. If you check the output source/some/path\AllPrintings.sql; - There should have been space between source and filename.

    2. The SET commands must be present inside your AllPrintings.sql file as in:

    # <AllPrintings.sql - start of file>
    
    SET autocommit=0;
    SET unique_checks=0;
    SET foreign_key_checks=0;
    
    # <your_queries in the sql file>
    
    COMMIT;
    SET unique_checks=1;
    SET foreign_key_checks=1;
    
    # <AllPrintings.sql - end of file>
    
    

    SOLUTION:

    Add the SET commands in the AllPrintings.sql file itself as mentioned in Point 3 and change your importdb variable as below

    importdb = "mysql -h " + DB_HOST + " -u " + DB_USER + " -p" + shlex.quote(DB_USER_PASSWORD)+ " -D " + DB_TARGET + " < " + os.getcwd() + "\\AllPrintings.sql"
    

    That should work for you.