I'm using SQLAlchemy 2.0.3 with Python 3.10, and after a fresh container boot it takes ~2.2 s to execute a specific query; all subsequent calls of the same query take ~70 ms. I'm using PostgreSQL, and the same raw query takes 40-70 ms to execute in DataGrip. Here is the code:
self._Session = async_sessionmaker(self._engine, expire_on_commit=False)
...
@property
def session(self):
    return self._Session
...
async with PostgreSQL().session.begin() as session:
    total_functions = aliased(db_models.Function)
    finished_functions = aliased(db_models.Function)
    failed_functions = aliased(db_models.Function)

    stmt = (
        select(
            db_models.Job,
            func.count(distinct(total_functions.id)).label("total"),
            func.count(distinct(finished_functions.id)).label("finished"),
            func.count(distinct(failed_functions.id)).label("failed"),
        )
        .where(db_models.Job.project_id == project_id)
        .outerjoin(db_models.Job.packages)
        .outerjoin(db_models.Package.modules)
        .outerjoin(db_models.Module.functions.of_type(total_functions))
        .outerjoin(finished_functions, and_(
            finished_functions.module_id == db_models.Module.id,
            finished_functions.progress == db_models.FunctionProgress.FINISHED,
        ))
        .outerjoin(failed_functions, and_(
            failed_functions.module_id == db_models.Module.id,
            or_(
                failed_functions.state == db_models.FunctionState.FAILED,
                failed_functions.state == db_models.FunctionState.TERMINATED,
            ),
        ))
        .group_by(db_models.Job.id)
    )

    start = time.time()
    yappi.set_clock_type("WALL")
    with yappi.run():
        job_infos = await session.execute(stmt)
    yappi.get_func_stats().print_all()
    end = time.time()
Things I have tried and discovered:

Setting query_cache_size=0 on the engine. However, I'm not 100% sure it worked, since the documentation says: "ORM functions related to unit-of-work persistence as well as some attribute loading strategies will make use of individual per-mapper caches outside of the main cache."
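For reference, this is how I set it (a minimal sketch; the connection URL is a placeholder):

from sqlalchemy.ext.asyncio import create_async_engine

# query_cache_size=0 disables SQLAlchemy's main SQL compilation cache
engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost/tmp",  # placeholder URL
    query_cache_size=0,
)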
Profiling the query with yappi. Almost all of the wall time is spent inside asyncpg preparing the statement and introspecting types:

name                                 ncall  tsub      ttot      tavg
..urrency_py3k.py:130 greenlet_spawn 2/1 0.000000 2.324807 1.162403
..rm/session.py:2168 Session.execute 1 0.000028 2.324757 2.324757
..0 _UnixSelectorEventLoop._run_once 11 0.000171 2.318555 0.210778
..syncpg_cursor._prepare_and_execute 1 0.000054 2.318187 2.318187
..cAdapt_asyncpg_connection._prepare 1 0.000020 2.316333 2.316333
..nnection.py:533 Connection.prepare 1 0.000003 2.316154 2.316154
..nection.py:573 Connection._prepare 1 0.000017 2.316151 2.316151
..n.py:359 Connection._get_statement 2/1 0.001033 2.316122 1.158061
..ectors.py:452 EpollSelector.select 11 0.000094 2.315352 0.210487
..y:457 Connection._introspect_types 1 0.000025 2.314904 2.314904
..ction.py:1669 Connection.__execute 1 0.000027 2.314879 2.314879
..ion.py:1699 Connection._do_execute 1 2.314095 2.314849 2.314849
...py:2011 Session._execute_internal 1 0.000034 0.006174 0.006174
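To rule out SQLAlchemy's own compilation layer, the same statement can be timed over raw asyncpg (a sketch; the DSN and the SQL below are placeholders, not my real setup):

import asyncio
import time

import asyncpg

async def main():
    # Placeholder DSN; point this at the same database the app uses.
    conn = await asyncpg.connect("postgresql://user:password@localhost/tmp")
    for i in range(3):
        start = time.time()
        await conn.fetch("SELECT 1")  # substitute the SQL the ORM emits
        print(f"run {i}: {time.time() - start:.3f}s")
    await conn.close()

asyncio.run(main())

If the first run is still ~2 s here, the delay sits in the driver or the server rather than in SQLAlchemy.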
I have also seen that one can disable the cache per connection:

with engine.connect().execution_options(compiled_cache=None) as conn:
    conn.execute(table.select())

However, I'm working with the ORM layer and I'm not sure how to apply this in my case.
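My best guess at the ORM equivalent (an assumption on my part, untested) is to request the session's connection with the option applied before executing the statement:

# Assumption: Session.connection() forwards execution_options to the
# underlying Connection, so compiled_cache=None should apply to it.
async with PostgreSQL().session.begin() as session:
    await session.connection(execution_options={"compiled_cache": None})
    job_infos = await session.execute(stmt)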
Any ideas where this delay might come from?
After hours of googling I found this post. In short, the problem is caused by missing dependencies (in some Alpine Docker images) that are required by the JIT compilation PostgreSQL uses. For details, I really recommend reading the post and the real-life impact the author describes.
The actual solution for SQLAlchemy is to switch JIT off:
engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost/tmp",
    connect_args={"server_settings": {"jit": "off"}},
)
Reference to the docs.
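To confirm the setting actually reached the server, it can be checked on a pooled connection (a quick sketch, assuming the engine above):

from sqlalchemy import text

async with engine.connect() as conn:
    result = await conn.execute(text("SHOW jit"))
    print(result.scalar())  # expected: off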