pythonmysqldataframe

MySQL update query for rows NOT matching any records in a dataframe?


I have a MySQL table

+----+------+------+------+
| ID | x    | y    | z    |
+----+------+------+------+
|  1 |    1 | a    |    2 |
|  2 |    1 | b    | NULL |
|  3 |    1 | c    | NULL |
|  4 |    2 | a    | NULL |
|  5 |    2 | b    |    1 |
|  6 |    2 | c    |    2 |
|  7 |    3 | a    |    1 |
|  8 |    3 | b    | NULL |
|  9 |    3 | c    | NULL |
+----+------+------+------+

in which I want to update values in the column z based on the pandas dataframe

lst = [{'ID': 3, 'x': 1, 'y': 'c'},
       {'ID': 4, 'x': 2, 'y': 'a'},
       {'ID': 8, 'x': 3, 'y': 'b'}]

df = pd.DataFrame(lst)

I want to set the value of z to 3 for those rows satisfying

  1. the present value for z is NULL, and
  2. the combination of the value for x and the value for y is NOT present in the dataframe.

This corresponds to the rows 2 and 9 in the table. Can I do this with a single query?

Being fairly new to MySQL, I tried the following:

import pandas as pd
import mysql.connector
from mysql.connector import errorcode

import configparser
config = configparser.ConfigParser()
config.read('config.ini')
mysql_host     = config['mysql']['host']
mysql_database = config['mysql']['database']
mysql_user     = config['mysql']['user']
mysql_password = config['mysql']['password']

connection = mysql.connector.connect(
    database = mysql_database,
    host     = mysql_host,
    user     = mysql_user,
    password = mysql_password
)
cursor = connection.cursor()

values = df[['x', 'y']].values.tolist()

query = "UPDATE my_table SET z = 3 WHERE z IS NULL AND NOT (x = %s AND y = %s)"

cursor.executemany(query, values)
connection.commit()

This does not work: it changes all NULL values to 3. Apparently cursor.executemany iterates over the items in the values list, so that makes sense. Is there a way where the items are not iterated over, but considered simultaneously, when the WHERE ... NOT part in the query is considered? And effectively only updating rows 2 and 9 in the table?

Of course I could also read part of the table into a dataframe, manipulate the dataframe according to my wishes, and based on that update the table. But it is less direct. Is there no direct way where the manipulation is done inside MySQL using a query and the dataframe?

I am looking for a way that applies to large tables and dataframes, not just the small ones in the example here. Thanks!


Solution

  • a possible solution is to

    
    import pandas as pd
    import mysql.connector
    from mysql.connector import errorcode
    
    import configparser
    config = configparser.ConfigParser()
    config.read('config.ini')
    mysql_host     = config['mysql']['host']
    mysql_database = config['mysql']['database']
    mysql_user     = config['mysql']['user']
    mysql_password = config['mysql']['password']
    
    connection = mysql.connector.connect(
        database = mysql_database,
        host     = mysql_host,
        user     = mysql_user,
        password = mysql_password
    )
    cursor = connection.cursor()
    
    
    temp_creator = '''CREATE  TEMPORARY TABLE IF NOT EXISTS val (ID int, x int, y varchar(1));'''
    cursor.execute(temp_creator)
     #load dataframe
    lst = [{'ID': 3, 'x': 1, 'y': 'c'},
           {'ID': 4, 'x': 2, 'y': 'a'},
           {'ID': 8, 'x': 3, 'y': 'b'}]
    df = pd.DataFrame(lst)
    values = df[['ID','x', 'y']].values.tolist()
    #import dataframe into teporary
    cursor.executemany("""
        INSERT INTO val
            (ID,x,y)
        VALUES (%s,%s,%s)""", values)
    
    connection.commit()
    
    
    query = """UPDATE my_table m JOIN val v ON m.z IS 
    NULL AND (m.x = v.x AND m.y <> v.y) SET m.z = 3 
    WHERE  m.ID > 0;"""
    
    cursor.execute(query)
    
    connection.commit()