I have a MySQL table
+----+------+------+------+
| ID | x | y | z |
+----+------+------+------+
| 1 | 1 | a | 2 |
| 2 | 1 | b | NULL |
| 3 | 1 | c | NULL |
| 4 | 2 | a | NULL |
| 5 | 2 | b | 1 |
| 6 | 2 | c | 2 |
| 7 | 3 | a | 1 |
| 8 | 3 | b | NULL |
| 9 | 3 | c | NULL |
+----+------+------+------+
in which I want to update values in the column z
based on the pandas dataframe
lst = [{'ID': 3, 'x': 1, 'y': 'c'},
{'ID': 4, 'x': 2, 'y': 'a'},
{'ID': 8, 'x': 3, 'y': 'b'}]
df = pd.DataFrame(lst)
I want to set the value of z
to 3
for those rows satisfying
z
is NULL
, andx
and the value for y
is NOT present in the dataframe.This corresponds to the rows 2 and 9 in the table. Can I do this with a single query?
Being fairly new to MySQL, I tried the following:
import pandas as pd
import mysql.connector
from mysql.connector import errorcode
import configparser
config = configparser.ConfigParser()
config.read('config.ini')
mysql_host = config['mysql']['host']
mysql_database = config['mysql']['database']
mysql_user = config['mysql']['user']
mysql_password = config['mysql']['password']
connection = mysql.connector.connect(
database = mysql_database,
host = mysql_host,
user = mysql_user,
password = mysql_password
)
cursor = connection.cursor()
values = df[['x', 'y']].values.tolist()
query = "UPDATE my_table SET z = 3 WHERE z IS NULL AND NOT (x = %s AND y = %s)"
cursor.executemany(query, values)
connection.commit()
This does not work: it changes all NULL
values to 3
.
Apparently cursor.executemany
iterates over the items in the values
list,
so that makes sense.
Is there a way where the items are not iterated over,
but considered simultaneously,
when the WHERE ... NOT
part in the query is considered?
And effectively only updating rows 2 and 9 in the table?
Of course I could also read part of the table into a dataframe, manipulate the dataframe according to my wishes, and based on that update the table. But it is less direct. Is there no direct way where the manipulation is done inside MySQL using a query and the dataframe?
I am looking for a way that applies to large tables and dataframes, not just the small ones in the example here. Thanks!
a possible solution is to
make a temporary table
insert your dataframe
and then run the update on that table
After the connection is closed the temporary table disappears.
In case you want to run multiple times in the same connction, you should truncate the table
import pandas as pd
import mysql.connector
from mysql.connector import errorcode
import configparser
config = configparser.ConfigParser()
config.read('config.ini')
mysql_host = config['mysql']['host']
mysql_database = config['mysql']['database']
mysql_user = config['mysql']['user']
mysql_password = config['mysql']['password']
connection = mysql.connector.connect(
database = mysql_database,
host = mysql_host,
user = mysql_user,
password = mysql_password
)
cursor = connection.cursor()
temp_creator = '''CREATE TEMPORARY TABLE IF NOT EXISTS val (ID int, x int, y varchar(1));'''
cursor.execute(temp_creator)
#load dataframe
lst = [{'ID': 3, 'x': 1, 'y': 'c'},
{'ID': 4, 'x': 2, 'y': 'a'},
{'ID': 8, 'x': 3, 'y': 'b'}]
df = pd.DataFrame(lst)
values = df[['ID','x', 'y']].values.tolist()
#import dataframe into teporary
cursor.executemany("""
INSERT INTO val
(ID,x,y)
VALUES (%s,%s,%s)""", values)
connection.commit()
query = """UPDATE my_table m JOIN val v ON m.z IS
NULL AND (m.x = v.x AND m.y <> v.y) SET m.z = 3
WHERE m.ID > 0;"""
cursor.execute(query)
connection.commit()