c++, multithreading, qt, qimage, qtconcurrent

Queued async operations with QtConcurrent keep QImage memory from being freed


I'm writing an image-processing app with Qt 6.5.3. There's a producer (a camera) that keeps grabbing images and a consumer that runs detection on the grabbed images. As the detection can be quite slow, I use multithreading to speed up the whole pipeline. My code can be summarized as:

#include <QCoreApplication>
#include <QDebug>
#include <QImage>
#include <QThread>
#include <QTimer>
#include <QtConcurrent>
#include <atomic>

class Producer : public QObject {
  Q_OBJECT

public:
  Producer(QObject *parent = nullptr) : QObject(parent) {}

public slots:
  void produce() {
    constexpr auto count = 1000;
    for (int i = 0; i < count; ++i) {
      QImage img(2448, 2048, QImage::Format_Grayscale8);
      img.fill(0);
      emit imageReady(img);
    }
  }

signals:
  void imageReady(QImage image);
};

class Consumer : public QObject {
  Q_OBJECT

public:
  Consumer(QObject *parent = nullptr) : QObject(parent) {}
  int consumedCount() const { return count_; }

public slots:
  void onImageReady(QImage image) {
    QFuture<void> future = QtConcurrent::run([=] {
      QImage copy = image.copy(); // Make a deep copy first
      QThread::msleep(200);       // Mock detection on the copy
      qDebug() << ++count_;
    });
  }

private:
  std::atomic_int count_ = 0;
};

int main(int argc, char *argv[]) {
  QCoreApplication a(argc, argv);
  Producer producer;
  Consumer consumer;
  QObject::connect(&producer, &Producer::imageReady, &consumer,
                   &Consumer::onImageReady);
  QTimer::singleShot(0, &producer, &Producer::produce);
  return a.exec();
}

#include "main.moc"

Since the consumer is much slower than the producer, the process can use a lot of memory (~5 GB) while running, as images queue up waiting to be detected. This is expected.

What is strange is that even after all images are detected, the process memory usage is still high (~2GB).

At first, I thought there was a memory leak, but Valgrind's memcheck ruled that out.

==123984== Memcheck, a memory error detector
==123984== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==123984== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==123984== Command: ./MultithreadImage
==123984== Parent PID: 99641
==123984== 
==123984== 
==123984== Process terminating with default action of signal 2 (SIGINT)
==123984==    at 0x5CA9BCF: poll (poll.c:29)
==123984==    by 0x60091F5: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7200.4)
==123984==    by 0x5FB13E2: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7200.4)
==123984==    by 0x569B809: QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (qeventdispatcher_glib.cpp:393)
==123984==    by 0x53FCF6A: QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (qeventloop.cpp:182)
==123984==    by 0x53F97CD: QCoreApplication::exec() (qcoreapplication.cpp:1439)
==123984==    by 0x10B94F: main (main.cpp:55)
==123984== 
==123984== HEAP SUMMARY:
==123984==     in use at exit: 187,502 bytes in 332 blocks
==123984==   total heap usage: 16,974 allocs, 16,642 frees, 10,028,770,623 bytes allocated
==123984== 
==123984== LEAK SUMMARY:
==123984==    definitely lost: 0 bytes in 0 blocks
==123984==    indirectly lost: 0 bytes in 0 blocks
==123984==      possibly lost: 1,648 bytes in 7 blocks
==123984==    still reachable: 185,854 bytes in 325 blocks
==123984==                       of which reachable via heuristic:
==123984==                         newarray           : 328 bytes in 3 blocks
==123984==         suppressed: 0 bytes in 0 blocks
==123984== Rerun with --leak-check=full to see details of leaked memory
==123984== 
==123984== For lists of detected and suppressed errors, rerun with: -s
==123984== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

After some debugging, I found that the reference count inside QImage does not reach 0 until the app is about to quit, which is consistent with memcheck's result: the memory is not leaked, it's just being held somewhere in my app. On the other hand, if I comment out the QThread::msleep(200); line, making the consumer run almost in sync with the producer, the memory usage is fine. So something must go wrong when images are queued, but I can't find where it is in my code.


Solution

  • I can reproduce the behavior. However, it is not a memory leak or some structure holding onto an allocation. It is simply the malloc implementation not releasing the memory to the OS.

    Try this:

    #include <malloc.h>
    
    ...
    public slots:
      void onImageReady(QImage image) {
        QFuture<void> future = QtConcurrent::run([=] {
          QImage copy = image.copy(); // Make a deep copy first
          QThread::msleep(200);       // Mock detection on the copy
          int count = ++count_;
          qDebug() << count;
          if(count == 1000)
            malloc_trim(0);
        });
      }
    

    This should release the memory.
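    The effect can be reproduced without Qt at all. Below is a minimal, glibc-specific sketch (the block size and count are arbitrary stand-ins for the queued images): the last block is kept alive to pin the top of the heap, so the freed space below it stays on the allocator's free lists until it is trimmed explicitly:

    ```cpp
    #include <malloc.h>  // malloc_trim (glibc-specific)
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    int main() {
      // Allocate ~64 MB in mid-sized blocks; these come from the heap,
      // not from per-allocation mmap().
      std::vector<void*> blocks;
      for (int i = 0; i < 1000; ++i)
        blocks.push_back(std::malloc(64 * 1024));

      // Keep the last block so the top of the heap stays in use; the
      // freed space below it cannot be returned by shrinking the break.
      for (int i = 0; i < 999; ++i)
        std::free(blocks[i]);

      // Explicitly hand the retained free pages back to the kernel.
      int released = malloc_trim(0);
      std::printf("malloc_trim released memory: %s\n", released ? "yes" : "no");

      std::free(blocks.back());
      std::puts("done");
    }
    ```

    Watching RSS (e.g. in top) before and after the malloc_trim(0) call shows the same drop you would see in the Qt app.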

    Your real code wouldn't need to fiddle with this. The memory would be reused.

    Alternatives include:

        #include <sys/mman.h> // mmap, munmap
        
        using info_type = QPair<void*, qsizetype>;
        constexpr qsizetype size = 2448 * 2048;
        auto info = std::make_unique<info_type>(nullptr, size);
        info->first = mmap(
          nullptr, size, PROT_READ | PROT_WRITE,
          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (info->first == MAP_FAILED)
          return; // handle the failure as appropriate
        // The cleanup function runs once the QImage's reference count
        // drops to zero, returning the pages to the OS right away.
        QImageCleanupFunction cleanup = +[](void* vinfo) {
          std::unique_ptr<info_type> info{ static_cast<info_type*>(vinfo) };
          munmap(info->first, info->second);
        };
        QImage img { static_cast<uchar*>(info->first), 2448, 2048,
          2448 /*bytes per line*/, QImage::Format_Grayscale8,
          cleanup, info.get() };
        info.release(); // ownership passes to the QImage via the cleanup
    
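    Another glibc-specific knob, if you would rather not manage the mapping yourself (this assumes glibc on Linux): lowering M_MMAP_THRESHOLD makes malloc serve large requests directly via mmap, so each image buffer is unmapped and returned to the OS the moment it is freed. A sketch:

    ```cpp
    #include <malloc.h>  // mallopt, M_MMAP_THRESHOLD (glibc-specific)
    #include <cstdio>
    #include <cstdlib>

    int main() {
      // Serve every allocation of 128 KiB or more via mmap(); such
      // blocks bypass the heap free lists entirely and go straight
      // back to the kernel on free().
      int ok = mallopt(M_MMAP_THRESHOLD, 128 * 1024);
      std::printf("mallopt: %s\n", ok ? "ok" : "failed");

      // A 2448x2048 grayscale frame (~5 MB) is now mmap-backed.
      void *frame = std::malloc(2448 * 2048);
      std::free(frame);  // pages are unmapped immediately
      std::puts("done");
    }
    ```

    The trade-off is that mmap-backed allocations are slower (each one costs a syscall plus page faults on first touch), so this is only worth it when the buffers are large and short-lived.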

    Again, I don't think you have a real problem. Your real application would likely keep allocating and deallocating images and thereby reuse the memory anyway. It is only apparent in this test case because you stop allocating long before you finish deallocating, since the consumer is so much slower.
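    Separately, if the unbounded queue itself worries you (the ~5 GB peak), you can cap the number of in-flight frames so the producer blocks instead of queueing copies without limit. A sketch of the idea in standard C++ (std::counting_semaphore, C++20) rather than Qt types; the frame size and counts mirror your example:

    ```cpp
    #include <chrono>
    #include <cstdio>
    #include <semaphore>
    #include <thread>
    #include <vector>

    // Cap how many frames may be alive at once; the producer blocks in
    // acquire() instead of queueing unbounded copies.
    constexpr int kMaxInFlight = 4;
    std::counting_semaphore<kMaxInFlight> slots{kMaxInFlight};

    int main() {
      std::vector<std::thread> workers;
      for (int i = 0; i < 20; ++i) {  // producer loop (stand-in for the camera)
        slots.acquire();              // wait while kMaxInFlight frames are pending
        workers.emplace_back([] {
          std::vector<unsigned char> frame(2448 * 2048, 0);           // mock image
          std::this_thread::sleep_for(std::chrono::milliseconds(5));  // mock detection
          slots.release();            // slot freed: producer may grab the next frame
        });
      }
      for (auto &t : workers) t.join();
      std::puts("done");
    }
    ```

    With at most kMaxInFlight frames alive, peak memory stays bounded regardless of how far the consumer falls behind; in the Qt version the same effect could be achieved by having the producer wait on a semaphore that the detection lambda releases.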