I am working on migrating a function implemented in Java Spring Boot to Python Django.
I encountered an issue while migrating a function that retrieves a list of keywords through a select query and updates these keywords using an update query.
Specifically, updating 450 rows takes around 0.1 seconds in Java, but it takes 2 seconds in Python.
Below are the methods I used to measure the time and the Java and Python code.
Measurement method
// java
long startTime = System.nanoTime();
// Execute the update
long endTime = System.nanoTime();
double duration = (endTime - startTime) / 1_000_000_000.0;;
String formattedDuration = String.format("%.2f seconds", duration);
log.info("processing_time: {}", formattedDuration);
// python
start_time = time.perf_counter()
// Execute the update
end_time = time.perf_counter()
processing_time = end_time - start_time
processing_time_formatted = f"{processing_time:.2f}"
logger.info(f"processing_time: {processing_time_formatted} seconds")
Logic
Java
// application.yml
spring:
datasource:
driver-class-name: org.mariadb.jdbc.Driver
url: jdbc:mariadb:...
username: ...
password: ...
jpa:
hibernate:
ddl-auto: none
properties:
hibernate:
show_sql: true
use_sql_comments: true
format_sql: true
dialect : org.hibernate.dialect.MariaDBDialect
open-in-view: false
sql:
init:
mode: never
// service
public void updateUserIntrKwd(String indvNum) {
// 2s
List<IntrKwdDto> kwdList = intrKwdMapper.selectIntrKwdList(indvNum);
// 0.001s
DatesAndMaxFrequency datesAndMaxFrequency = getDatesAndMaxFrequency(kwdList);
// 0.1s
intrKwdMapper.updateUserIntrKwd(kwdList, datesAndMaxFrequency);
}
// Mapper
<update id="updateUserIntrKwd" parameterType="map">
<foreach collection="kwdList" item="item" separator=";">
...
Python
// settings.py
DATABASES = {
"default": {
"ENGINE": "django.db.backends.mysql",
"NAME": "...",
"USER": "...",
"PASSWORD": "...",
"HOST": "...",
"PORT": 3306,
},
// service
def update_user_interest_keywords(indv_num: str) -> int:
// 2s
user_kwd_list = select_interest_keywords_list(indv_num)
// 0.001s
dates_and_max_frequency = get_dates_and_max_frequency(user_kwd_list)
// 2s
updated_rows =
execute_user_interest_keywords_query(user_kwd_list, dates_and_max_frequency)
return updated_rows
// query
def execute_user_interest_keywords_query(userKwdList: list[IntrKwd], datesAndMaxFrequency: DatesAndMaxFrequency):
base_query = """
query...
"""
def get_query_params(item):
return (
item.last_aper_yrmo,
...
)
params = [get_query_params(item) for item in userKwdList]
try:
with connections['...'].cursor() as cursor:
cursor.executemany(base_query, params)
updated_rows = cursor.rowcount
...
return updated_rows
They share the same database, and I have structured the Python code to be as similar to the Java code as possible.
Notably, the processing speed of the SELECT query is the same. The difference in performance only occurs with the UPDATE query.
I believe the only difference is the environment in which the servers are running.
Based on this thought, I have tried the following methods:
Switching the database library
I attempted to switch the database library from 'pymysql' to 'mysqlclient', following the advice that 'mysqlclient' might perform better.
Result: No significant change.
Changing options in 'settings.py'
'OPTIONS': {
'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
'autocommit': True,
'connect_timeout': 10,
'charset': 'utf8mb4',
'use_unicode': True,
},
'CONN_MAX_AGE': 60,
I added the above options to the 'DATABASES'
Result: No significant change.
Changing .executemany()
to .execute()
Based on the advice that .executemany()
might only perform well with INSERT and REPLACE statements, I modified the logic to use .execute()
instead.
Result: No significant change.
Installing the 'mariadb' library
The database in use is MariaDB version 10.11.1. Based on the advice that installing the 'mariadb' library might improve performance, I added the 'mariadb' library.
Result: No significant change.
A coding newbie desperately needs your help! Why on earth is the UPDATE query taking over ten times longer, even though the logic is exactly the same?
To long for a comment:
You're obviously using MariaDB Connector/Java, but not MariaDB Connector/Python, so it's like comparing apples with pears.
All official MariaDB Connectors support INSERT,UPDATE,DELETE in batch (bulk) mode, this feature is not supported by other Python drivers like pymysql, mysqlclient or MySQL Connector/Python. Depending on the kind of connection (remote or not) bulk mode is up to 15 times faster.
Using multiple execute() with autocommit will slow down performance - instead start a transaction, execute all statements and commit at the end.
Unfortunately Django misses native MariaDB support, my PR which I submitted some years ago was closed as "won't fix". A better alternative for MariaDB might be SQLAlchemy, which has it's own dialect for MariaDB Connector/Python and supports native bulk operations.