Why does this Qt program that creates multiple QCoreApplication objects crash unless I reset a dead local variable?

Program

If the following program is run on Windows with a single command-line argument, it will crash:

# threading-crash.py
"""Reproduce a crash involving Qt and threading"""

from PyQt5 import QtCore

import sys
from threading import Thread

from typing import Optional


class WorkerManager(QtCore.QObject):
  # Signal emitted when thread is finished.
  worker_finished = QtCore.pyqtSignal()

  def start_worker(self) -> None:
    def worker() -> None:
      # Printing here is necessary for the crash to happen *reliably*,
      # though it still happens without it (just less often).
      print("Emitting worker_finished signal")

      self.worker_finished.emit()

    t = Thread(target=worker)
    t.start()


def run_test() -> None:
  # When using `mypy`, I cannot assign `None` to `app` at the end unless
  # the type is declared to be optional here.
  app: Optional[QtCore.QCoreApplication] = QtCore.QCoreApplication(sys.argv)
  assert(app)      # Pacify mypy.

  mgr = WorkerManager()

  def finished() -> None:
    # Terminate the `exec_` call below.
    assert(app)    # Pacify mypy.
    app.exit(0)

  # Make a queued connection since this is a cross-thread signal.  (This
  # is not necessary to reproduce the crash; auto does the same thing.)
  mgr.worker_finished.connect(
    finished, QtCore.Qt.QueuedConnection) # type: ignore

  # Start the worker thread, which will signal `finished`.
  mgr.start_worker()

  # Wait for the signal to be received.
  app.exec_()

  if len(sys.argv) == 1:
    # This fixes the crash!
    app = None


def main() -> None:
  for i in range(10):
    print(f"{i}: run_test")
    run_test()     # Crashes on the second call.


if __name__ == "__main__":
  main()


# EOF

Demonstration

On my system (and with the print call in worker) this program crashes or hangs 100% of the time in the second run_test call.

Example run:

$ python threading-crash.py CRASH
0: run_test
Emitting worker_finished signal
1: run_test
Emitting worker_finished signal
Segmentation fault
Exit 139

The exact behavior varies unpredictably; another example:

$ python threading-crash.py CRASH
0: run_test
Emitting worker_finished signal
1: run_test
Emitting worker_finished signal
Exception in thread Thread-2 (worker):
Traceback (most recent call last):
  File "D:\opt\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
Exit 127

Other possibilities include popping up an error dialog box ("The instruction at (hex) referenced memory at (hex)."), or just hanging completely.

In contrast, when run without arguments, thus activating the app = None line, it runs fine (even with a large iteration count like 1000):

$ python threading-crash.py
0: run_test
Emitting worker_finished signal
1: run_test
Emitting worker_finished signal
[...]
9: run_test
Emitting worker_finished signal

Other variations

Removing the print in start_worker makes the crash happen less frequently, but does not solve it.

Joining the worker thread at the end of start_worker (so there is no concurrency) removes the crash.

Joining the worker after app.exec_() does not help; it still crashes. Calling time.sleep(1) there (with or without the join) also does not help. This means the crash happens even though there is only one thread running at the time.

Disconnecting the worker_finished signal after app.exec_() does not help.

Adding a call to gc.collect() at the top of run_test has no effect.

Using QtCore.QThread instead of threading.Thread also has no effect on the crash.

Question

Why does this program crash? In particular:

Why does it not crash when I reset app to None? Shouldn't that (or something equivalent) automatically happen when run_test returns?
Is this a bug in my program, or a bug in Python or Qt?

Why am I making multiple QCoreApplication objects?

This example is reduced from a unit test suite. In that suite, each test is meant to be independent of any other, so those tests that need it create their own QCoreApplication object. The documentation does not appear to prohibit this.

Versions, etc.

$ python -V -V
Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)]

$ python -m pip list | grep -i qt
PyQt5               5.15.11
PyQt5-Qt5           5.15.2
PyQt5_sip           12.17.0
PyQt5-stubs         5.15.6.0

I'm running this on Windows 10 Home. The above examples use a Cygwin shell, but the same thing happens under cmd.exe. This is all using the native Windows port of Python.

Further simplified

In comments, @ekhumoro suggested replacing the thread with a timer, and to my surprise, the crash still happens! (I was evidently misled by the highly non-deterministic behavior, not all of which I've shared.) Here is a more minimal reproducer (with most typing annotations also removed):

# threading-crash.py
"""Reproduce a crash involving Qt and (not!) threading"""

from PyQt5 import QtCore
import sys


class WorkerManager(QtCore.QObject):
  # Signal emitted... never, now.
  the_signal = QtCore.pyqtSignal()


def run_test() -> None:
  app = QtCore.QCoreApplication(sys.argv)
  mgr = WorkerManager()

  def finished() -> None:
    # This call is required since it keeps `app` alive.
    app.exit(0)

  # Connect the signal (which is never emitted) to a local closure.
  mgr.the_signal.connect(finished)

  # Start and stop the event loop.
  QtCore.QTimer.singleShot(100, app.quit)
  app.exec_()

  if len(sys.argv) == 1:
    # This fixes the crash!
    app = None # type: ignore


def main() -> None:
  for i in range(4):
    print(f"{i}: run_test")
    run_test()     # Crashes on the second call.


if __name__ == "__main__":
  main()


# EOF

Now, the key element seems to be that we have a signal connected to a local closure, finished, that holds a reference to the QCoreApplication.

If the signal is disconnected before exec_() (i.e., right after it was connected), then no crash occurs. (Of course, that is not a solution to the original problem, since in the original program, the point of the signal was to cause exec_() to return.)

If the signal is disconnected after exec_(), then the program still crashes; the closure apparently lives on.

Solution

The fundamental problem is that this code creates two QCoreApplication objects whose lifetimes overlap. Evidently, that condition leads to random crashes on Windows.

The overlapping lifetimes come about because the code creates a cycle in the object graph that prevents garbage collection from cleaning the objects up. First, the run_test invocation creates a WorkerManager instance and has a local variable, mgr, pointing to it. Next, the WorkerManager has its the_signal signal connected to the finished closure. Finally, the finished closure has a pointer back to the run_test invocation, because it needs that to look up the local variable app.

The situation was depicted in an object diagram in the original post (image not reproduced here).

The reason setting app to None fixes the crash is that it breaks the link to the QCoreApplication object, thus allowing it to be destroyed. When I first discovered this, I found it surprising: I thought that run_test returning would be sufficient, because I had overlooked that the finished closure was still alive.
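The effect of the closure on the lifetime of app can be seen without Qt at all. The following is a minimal pure-Python sketch, not the PyQt mechanism itself: FakeApp and connected_slots are hypothetical stand-ins for QCoreApplication and for whatever keeps a connected slot reachable after run_test returns. It assumes CPython, where reference counting frees an object as soon as the last reference goes away.

```python
import weakref


class FakeApp:
    """Hypothetical stand-in for QCoreApplication (not a Qt class)."""

    def exit(self, code):
        pass


# Stand-in for whatever keeps a connected slot alive after run_test
# returns (an assumption made for illustration only).
connected_slots = []


def run_test(reset_app):
    app = FakeApp()
    app_ref = weakref.ref(app)  # Lets us observe the lifetime of `app`.

    def finished():
        # Referring to `app` here makes it a closure variable, so the
        # closure keeps the FakeApp object alive.
        app.exit(0)

    connected_slots.append(finished)

    if reset_app:
        # Clears the variable shared with `finished`, dropping the last
        # strong reference to the FakeApp object.
        app = None

    return app_ref


leaky = run_test(reset_app=False)
clean = run_test(reset_app=True)

print(leaky() is not None)  # True: the closure still holds the object.
print(clean() is None)      # True: the object was freed.
```

In this sketch, returning from run_test is not enough to release the object, because the surviving closure shares the app variable with the function that created it.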

Alternatively, if we disconnect the the_signal signal when it is received, then that breaks one edge of the cycle, allowing the rest of the cycle to be collected, which in turn allows the QCoreApplication to be destroyed.

As pointed out by @ekhumoro in a comment, there are at least two other ways to fix this. The first is to call exit(0) using QCoreApplication.instance() rather than using the app local variable, because then no closure is needed. The other is to connect the_signal directly to app.quit, which again avoids creating a closure.
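The first of these alternatives can be illustrated with a pure-Python sketch that does not use Qt. Everything in it is a hypothetical stand-in: FakeApp.instance() plays the role of QCoreApplication.instance(), and connected_slots plays the role of whatever keeps the slot reachable after run_test returns. It assumes CPython's reference counting.

```python
import weakref


class FakeApp:
    """Hypothetical stand-in for QCoreApplication (not a Qt class)."""

    _instance = None  # Weak reference to the most recent instance.

    def __init__(self):
        FakeApp._instance = weakref.ref(self)

    @classmethod
    def instance(cls):
        ref = cls._instance
        return ref() if ref is not None else None

    def exit(self, code):
        pass


# Stand-in for whatever keeps a connected slot alive (an assumption).
connected_slots = []


def run_test():
    app = FakeApp()

    def finished():
        # Looks the application up when called instead of capturing the
        # local variable `app`, so nothing here keeps the object alive.
        current = FakeApp.instance()
        if current is not None:
            current.exit(0)

    connected_slots.append(finished)
    return weakref.ref(app)


app_ref = run_test()
print(app_ref() is None)  # True: the object was freed on return.
```

Because finished no longer refers to app, the slot can outlive run_test without extending the lifetime of the application object.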

Interestingly, connecting the_signal to app.exit does not work. The exit method has a default argument of zero, but evidently the way that is implemented in a case like this involves creating a closure in order to pass the default argument to the underlying function.

This answer to "Repeatedly create/delete a PyQt6.QtWebEngineWidgets.QWebEngineView widget" explains that constructing multiple QApplication objects in a single process is in theory allowed, but there are some limitations to be aware of.
This answer to "Lifetime of object in lambda connected to pyqtSignal" explains some of the lifetime implications of closures and signals.