In my application I am using a non-event-thread to do some fairly heavy processing: using module docx2python to extract text content from multiple .docx files and then re-assemble batches of lines (10-line batches) to form "Lucene Documents" for the purpose of indexing with Elasticsearch.
I have a hotkey from configuring a menu item which lets me toggle between viewing the initial frame and a 2nd frame which represents what's going on in terms of executing tasks. This toggling has been working instantaneously... until I started actually working on the real tasks.
I can tell from the log messages that the toggle method is not the culprit: there can be a gap (about 1 s) between me pressing the key and seeing the log message at the start of the toggle method.
Just to show that I'm not making some textbook mistake in my use of threads, please have a look at this MRE below (F4 toggles). The trouble is, this MRE does NOT in fact illustrate the problem. I tried using various crunchy tasks which might potentially "congest" thread-handling, but nothing so far reproduces the phenomenon.
I thought of putting QApplication.processEvents()
in the worker thread at a point where it would be called frequently. This changed nothing.
Is anyone familiar with how hotkeys are handled? It feels like I'm missing some obvious "yield" technique (i.e. yield to force detection of hotkey activity...).
import sys, time, logging, pathlib
from PyQt5 import QtWidgets, QtCore
class MainWindow(QtWidgets.QMainWindow):
def __init__(self):
super().__init__()
self.setWindowTitle('Blip')
self.create_initial_gui_components()
self.add_standard_menu_components()
self.task_mgr = TaskManager()
self.task_mgr.thread.start()
self.alternate_view = False
def create_initial_gui_components(self):
self.resize(1200, 600)
central_widget = QtWidgets.QFrame(self)
self.setCentralWidget(central_widget)
central_widget.setLayout(QtWidgets.QVBoxLayout(central_widget))
central_widget.setStyleSheet('border: 5px solid green;')
# initially showing frame
self.frame1 = QtWidgets.QFrame()
central_widget.layout().addWidget(self.frame1)
self.frame1.setStyleSheet('border: 5px solid blue;')
# initially hidden frame
self.frame2 = QtWidgets.QFrame()
central_widget.layout().addWidget(self.frame2)
self.frame2.setStyleSheet('border: 5px solid red;')
self.frame2.hide()
def add_standard_menu_components(self):
self.menubar = QtWidgets.QMenuBar(self)
self.setMenuBar(self.menubar)
self.main_menu = QtWidgets.QMenu('&Main', self.menubar)
self.menubar.addMenu(self.main_menu)
# make the menu item with hotkey
self.main_menu.toggle_frame_view_action = self.make_new_menu_item_action('&Toggle view tasks', 'F4',
self.toggle_frame_view, self.main_menu, enabled=True)
def toggle_frame_view(self):
# this is the message which occurs after pressing F4, after a very bad gap, in the "real world" app
print('toggle...')
if self.alternate_view:
self.frame2.hide()
self.frame1.show()
else:
self.frame2.show()
self.frame1.hide()
self.alternate_view = not self.alternate_view
def make_new_menu_item_action(self, text, shortcut, connect_method, menu, enabled=True):
action = QtWidgets.QAction(text, menu)
action.setShortcut(shortcut)
menu.addAction(action)
action.triggered.connect(connect_method)
action.setEnabled(enabled)
return action
class TaskManager():
def __init__(self):
self.thread = QtCore.QThread()
self.task = LongTask()
self.task.moveToThread(self.thread)
self.thread.started.connect(self.task.run)
class LongTask(QtCore.QObject):
def run(self):
out_file_path = pathlib.Path('out.txt')
if out_file_path.is_file():
out_file_path.unlink()
# None of this works (has the desired effect of causing toggling latency)...
s = ''
for i in range(20):
print(f'{i} blip')
for j in range(1, 1000001):
for k in range(1, 1000001):
l = j/k
# performing I/O operations makes no difference
with open('out.txt', 'w') as f:
f.write(f'l: {l} len(s) {len(s)}\n')
s += f'{{once upon a {j}}}\n'
print('here BBB')
def main():
app = QtWidgets.QApplication(sys.argv)
window = MainWindow()
window.show()
app.exec()
if __name__ == '__main__':
main()
Later
Suspicion about the answer. When a "long task" is started, some gui elements have to be constructed and added to frame 2 (progress bar and stop button on a small strip of a QFrame - one for each live task). Will have to do some tests. But it seems fairly unlikely that this could have such a dramatic effect (1s of latency).
I think I finally detected the problem ... it's classic and it may be of interest to someone baffled by a similar situation.
In my worker thread (non-event thread) I had multiple time.sleep(.0001)
commands, the purpose of which is to allow other threads (including the event thread aka "main thread") to wrest control ... as previously explained here by Musicamante in reply to an earlier question by me.
The purpose of my worker thread, as mentioned, is to split apart Word (.docx) documents for analysis, using docx2python, and the analysis itself is "CPU-intensive". I was slightly surprised to find that, even for a very large file, this command took very little time:
docx_obj = docx2python.docx2python(docx_file_path)
... but what I finally found, with copious logging, is that this line was the real culprit:
if not self.generate_ldocs_from_structure_list(docx_obj.body):
...
... but it wasn't the calling of (my method) generate...
. It turns out that docx_obj.body
, although it seems to be just a reference to a property, actually involves the execution of a very lengthy operation. For the largest file in my test set, with 44,000 words, this operation took over 1 second! This was always the first document in my test set, and so occurred in about 1 second into the run.
Any attempt to toggle, using F4, while that is happening is therefore likely to be met with a very slow response. And short of going multiprocessing
(as recommended by ekhumoro in his comment), there appears to be no solution to this: docx_obj.body
won't let go.
Having said that, I just saw two interesting vids about the GIL: one is a lecture by David Beazley, forensically analysing the GIL and its operation. NB I do appreciate that PyQt itself is not running Python code, but C++ code: however, it appears that using worker threads in a Pythonic manner can cause mayhem if they can't be engineered to "let go" (docx2python
might be able to be engineered to do that: however I looked at the source code and found that the entire lengthy operation uses a copy.deepcopy
operation, so no way of inserting time.sleep()
in there...).
The other is this one, excitingly entitled "how to bypass the GIL", and explaining why doing that, and ultimately sticking with threads, might be preferable to going all multiprocess. Unfortunately the Python code numba
can handle in a "jitted" function with "nogil=True" appears limited to numerical operations.