python, mysql, key, executemany

Inserting into MySQL from Python using the mysql.connector module - How to use executemany() to insert rows and child rows?


I have a MySQL server running on a remote host. The connection to the host is fairly slow, which hurts the performance of my Python code. I find that using the executemany() function gives a big improvement over using a loop to insert many rows. My challenge is that for each row I insert into one table, I need to insert several rows into another table. My sample below does not contain much data, but my production data could be thousands of rows.

I know that this subject has been asked about many times in many places, but I don't see any kind of definitive answer, so I'm asking here...

My tables...

Table: makes
 pkey bigint auto_increment primary key
 make varchar(255) not null

Table: models
 pkey bigint auto_increment primary key
 make_key bigint not null
 model varchar(255) not null

...and the code...

... 
cars = {"Ford": ["F150", "Fusion", "Taurus"],
        "Chevrolet": ["Malibu", "Camaro", "Vega"],
        "Chrysler": ["300", "200"], 
        "Toyota": ["Prius", "Corolla"]}

# Fill makes table with car makes
sql_data = [(make,) for make in cars]  # executemany() needs a sequence of row tuples
sql = "INSERT INTO makes (make) VALUES (%s)"
cursor.executemany(sql, sql_data)
rows_added = len(sql_data)

# Find the primary key for the first row that was just added
sql = "SELECT LAST_INSERT_ID()"
cursor.execute(sql)
rows = cursor.fetchall()
first_key = rows[0][0]

# Fill the models table with the car models, linked to their make
this_key = first_key
sql_data = []
for make in cars:
    for model in cars[make]:
        sql_data.append((this_key, model))

    this_key += 1  # assumes the new pkeys are consecutive, in insertion order

sql = "INSERT INTO models (make_key, model) VALUES (%s, %s)"
cursor.executemany(sql, sql_data)
    
cursor.execute("COMMIT")
...

Solution

  • I have, more than once, measured about 10x speedup when batching inserts.

    If you are inserting 1 row in table A, then 100 rows in table B, don't worry about the speed of the 1 row; worry about the speed of the 100.
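
    In mysql.connector terms, that advice could look like the minimal sketch below: one cheap execute() per make, grabbing cursor.lastrowid each time, and a single executemany() for all of the models. The connection parameters are placeholders, not your real setup.

    import mysql.connector

    # Placeholder connection details; adjust to your environment.
    cnx = mysql.connector.connect(host="remote-host", user="user",
                                  password="secret", database="cars")
    cursor = cnx.cursor()

    cars = {"Ford": ["F150", "Fusion", "Taurus"],
            "Chevrolet": ["Malibu", "Camaro", "Vega"]}

    model_rows = []
    for make, models in cars.items():
        # A single-row insert per make: cursor.lastrowid is reliable here.
        cursor.execute("INSERT INTO makes (make) VALUES (%s)", (make,))
        make_key = cursor.lastrowid
        model_rows.extend((make_key, model) for model in models)

    # The batch that matters for speed: all models in one statement.
    cursor.executemany("INSERT INTO models (make_key, model) VALUES (%s, %s)",
                       model_rows)
    cnx.commit()

    Note this still costs one round trip per make; the normalization approach described below removes even those.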

    Yes, it is clumsy to get the ids generated by an insert. I have found no straightforward way; there is LAST_INSERT_ID(), but that works only for a single-row insert.
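
    One detail that helps here: after a multi-row INSERT, LAST_INSERT_ID() reports the id generated for the first inserted row, and whether the remaining ids are consecutive depends on the server's innodb_autoinc_lock_mode setting. A hedged sketch of checking this before relying on first_key arithmetic (connection details are placeholders):

    import mysql.connector

    cnx = mysql.connector.connect(host="remote-host", user="user",
                                  password="secret", database="cars")  # placeholders
    cursor = cnx.cursor()

    # Modes 0 and 1 give consecutive ids within one multi-row INSERT;
    # mode 2 (interleaved) does not guarantee that.
    cursor.execute("SELECT @@innodb_autoinc_lock_mode")
    if cursor.fetchone()[0] == 2:
        raise RuntimeError("interleaved mode: ids may not be consecutive")

    # mysql.connector folds a plain INSERT executemany() into ONE multi-row
    # statement, so LAST_INSERT_ID() / cursor.lastrowid report the id of
    # the FIRST inserted row.
    cursor.executemany("INSERT INTO makes (make) VALUES (%s)",
                       [("Ford",), ("Chevrolet",)])
    first_key = cursor.lastrowid  # id of "Ford"; "Chevrolet" is first_key + 1
    cnx.commit()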

    So, I have developed the following to do a batch of "normalization" inserts. This is where you have a table that maps strings to ids (and the same string is likely to show up repeatedly). It takes 2 steps: first a batch insert of the "new" strings, then fetch all the needed ids and copy them into the other table. The details are laid out here: http://mysql.rjweb.org/doc.php/staging_table#normalization (Sorry, I am not fluent in Python or the hundred other ways to talk to MySQL, so I can't give you Python code.)
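
    In Python, the two steps might look like the following. This is my sketch of the linked approach, not its exact SQL; it assumes a UNIQUE key on makes.make so that INSERT IGNORE skips duplicates (plain INSERT IGNORE can still "burn" auto_increment ids on duplicates, which the staging-table recipe avoids).

    import mysql.connector

    cnx = mysql.connector.connect(host="remote-host", user="user",
                                  password="secret", database="cars")  # placeholders
    cursor = cnx.cursor()

    cars = {"Ford": ["F150", "Fusion"], "Toyota": ["Prius", "Corolla"]}

    # Step 1: batch-insert only the strings; existing ones are skipped.
    cursor.executemany("INSERT IGNORE INTO makes (make) VALUES (%s)",
                       [(make,) for make in cars])

    # Step 2: read back the ids for every string we need and build a map.
    placeholders = ",".join(["%s"] * len(cars))
    cursor.execute("SELECT make, pkey FROM makes WHERE make IN ({})"
                   .format(placeholders), list(cars))
    make_ids = dict(cursor.fetchall())

    # The child rows can now be batched with the correct foreign keys.
    cursor.executemany("INSERT INTO models (make_key, model) VALUES (%s, %s)",
                       [(make_ids[make], model)
                        for make, models in cars.items() for model in models])
    cnx.commit()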

    Your use case is an example of "normalization"; I recommend doing it outside the main transaction. Note that my code takes care of multiple connections, avoiding 'burning' ids, etc.

    When you have subcategories ("make" + "model" or "city" + "state" + "country"), I recommend a single normalization table, not one for each.
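
    A sketch of what that could look like; the make_models table and its columns are illustrative names, not from the question (and note the smaller id type, per the next point):

    import mysql.connector

    cnx = mysql.connector.connect(host="remote-host", user="user",
                                  password="secret", database="cars")  # placeholders
    cursor = cnx.cursor()

    # One normalization table for the (make, model) pair instead of two.
    # SMALLINT UNSIGNED caps at 65,535, which is ample here and 6 bytes
    # smaller per id than BIGINT.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS make_models (
            pkey  SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            make  VARCHAR(255) NOT NULL,
            model VARCHAR(255) NOT NULL,
            UNIQUE KEY (make, model)
        )
    """)

    cars = {"Chrysler": ["300", "200"], "Toyota": ["Prius", "Corolla"]}
    cursor.executemany(
        "INSERT IGNORE INTO make_models (make, model) VALUES (%s, %s)",
        [(make, model) for make, models in cars.items() for model in models])
    cnx.commit()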

    In your example, pkey could be a 2-byte SMALLINT UNSIGNED (limit 64K) instead of a bulky 8-byte BIGINT.