Read CSV file into MySQL and use executemany instead of execute


I have a Python script that reads a large (4 GB!) CSV file into MySQL. It works as is, but it is dog slow: the CSV file has over 4 million rows, and inserting all of them into the database takes forever.

Could I get an example of how I would use executemany in this situation?

Here is my code:

    import csv
    import os
    from datetime import datetime

    # Assumed: `mydb` is an open MySQL connection (e.g. mysql.connector) and `cursor = mydb.cursor()`
    source = os.path.join('source_files', 'aws_bills', 'march-bill-original-2019.csv')
    try:
        with open(source, newline='') as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=',')
            next(csv_reader)  # skip the header row
            insert_sql = """ INSERT INTO billing_info (InvoiceId, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName, RateId, SubscriptionId, PricingPlanId, UsageType, Operation, AvailabilityZone, ReservedInstance, ItemDescription, UsageStartDate, UsageEndDate, UsageQuantity, BlendedRate, BlendedCost, UnBlendedRate, UnBlendedCost, ResourceId, Engagement, Name, Owner, Parent) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """
            for row_idx, row in enumerate(csv_reader):
                try:
                    # One INSERT, one COMMIT and one print per row: this is what makes it slow
                    cursor.execute(insert_sql, row)
                    #cursor.executemany(insert_sql, 100)  # executemany takes a list of rows, not a count
                    mydb.commit()
                    print('row', row_idx, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())
                except Exception as e:
                    print("MySQL Exception:", e)
            print("Done importing data.")
    except OSError as e:
        print("Could not read CSV file:", e)

Again, that code does insert the records into the database, but I am hoping to speed it up with executemany if someone can show me how.


Solution

Good night. I see the question is a little old, so I don't know if you still need this. I was doing something similar recently: I first converted the CSV into a list of rows, because executemany expects a sequence of parameter rows, and then ran the INSERT once, passing it the whole list. In your case it would look like this:

    import pandas as pd
    
    # `cursor` and `mydb` are the connection objects from the question
    # dtype=str keeps every column as text, like the original astype(str)
    df = pd.read_csv(r'path_your_csv', dtype=str)
    # Turn empty cells (NaN) into None so MySQL stores NULL instead of the string 'nan'
    List_Values = [[None if pd.isna(v) else v for v in row] for row in df.itertuples(index=False)]
    insert_sql = """ INSERT INTO billing_info (InvoiceId, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName, RateId, SubscriptionId, PricingPlanId, UsageType, Operation, AvailabilityZone, ReservedInstance, ItemDescription, UsageStartDate, UsageEndDate, UsageQuantity, BlendedRate, BlendedCost, UnBlendedRate, UnBlendedCost, ResourceId, Engagement, Name, Owner, Parent) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """
    
    cursor.executemany(insert_sql, List_Values)
    mydb.commit()  # mysql.connector does not autocommit by default
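The pandas version loads the entire 4GB file into memory first. Connector/Python also rewrites an executemany() INSERT into a single multi-row statement, so sending millions of rows at once may exceed the server's max_allowed_packet. Below is a minimal sketch of a streaming alternative. It assumes any DB-API 2.0 connection, reads the file with the same csv module the question uses, and sends fixed-size batches with one commit per batch. The helper names `batched` and `load_csv_in_batches` and the default batch size of 5000 are illustrative assumptions, not part of the original answer.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items (hypothetical helper)."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_csv_in_batches(conn, csv_path: str, insert_sql: str, batch_size: int = 5000) -> int:
    """Stream a CSV into a table with one executemany() and one commit per batch.

    Hypothetical helper. `conn` is any DB-API 2.0 connection; the placeholder
    style in `insert_sql` must match the driver (%s for MySQL, ? for sqlite3).
    Returns the number of data rows inserted.
    """
    cursor = conn.cursor()
    total = 0
    with open(csv_path, newline="") as csv_file:
        reader = csv.reader(csv_file, delimiter=",")
        next(reader, None)  # skip the header row, as in the question
        for batch in batched(reader, batch_size):
            # Only `batch_size` rows are held in memory and sent per statement
            cursor.executemany(insert_sql, batch)
            conn.commit()  # one commit per batch instead of per row
            total += len(batch)
    cursor.close()
    return total
```

With the objects from the question, the call would be `load_csv_in_batches(mydb, source, insert_sql)`. Rows are passed through exactly as the csv module reads them, so empty cells stay empty strings, as in the original per-row loop. On Python 3.12+ `itertools.batched` could replace the helper, though it yields tuples rather than lists.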