c++ qt5 zlib quazip qiodevice

How to report progress of data read on a QuaGzipFile (QuaZIP library)


I am using QuaZIP 0.5.1 with Qt 5.1.1 for C++ on Ubuntu 12.04 x86_64.

My program reads a large gzipped binary file, usually 1 GB of uncompressed data or more, and performs some computations on it. It is not computationally intensive, and most of the time is spent on I/O. So if I can find out how much of the file has been read, I can show it on a progress bar and even provide an ETA.

I open the file with:

QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
{
    // report error
    return;
}

But QuaGzipFile provides no way to get either the file size or the current position.

I do not need the size and position in the uncompressed stream; the size and position in the compressed stream are fine, because a rough progress estimate is enough.

Currently, I can find the size of the compressed file using QFile(fileName).size(). I can also easily track the current position in the uncompressed stream by summing the return values of gzip.read(). But these two numbers are not comparable: one counts compressed bytes and the other counts uncompressed bytes.

I can modify the QuaZIP library and access its internal zlib-related state, if that helps.


Solution

  • There is no reliable way to determine the total size of the uncompressed stream. See this answer for details and possible workarounds.

    However, there is a way to get the position in the compressed stream:

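    QFile file(fileName);
    if (!file.open(QFile::ReadOnly)) {
      // report error
      return;
    }
    QuaGzipFile gzip;
    // Hand the already-open descriptor to QuaGzipFile so it can be queried later
    if (!gzip.open(file.handle(), QuaGzipFile::ReadOnly)) {
      // report error
      return;
    }
    const qint64 total = file.size();  // size of the compressed file
    while (true) {
      QByteArray buf = gzip.read(1000);
      if (buf.isEmpty()) { break; }  // end of stream (or read error)
      // process buf
      // A fresh QFile on the same descriptor asks the OS for the current offset
      QFile temp_file_object;
      temp_file_object.open(file.handle(), QFile::ReadOnly);
      double progress = total > 0 ? 100.0 * temp_file_object.pos() / total : 0.0;
      qDebug() << qRound(progress) << "%";
    }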
    

    The idea is to open the file manually and use its file descriptor to get the position. QFile cannot track external position changes, so file.pos() would always be 0. Creating temp_file_object from the same file descriptor forces QFile to query the actual file position. I could use a lower-level API (such as lseek()) to get the file position, but I think my way is more cross-platform.

    Note that this method is not very accurate and can report progress values higher than the real ones. That's because zlib may internally read and decode more data than you have consumed so far.