
Capnp: Move to previous position in BufferedInputStreamWrapper


I have a binary file containing multiple Capnp messages that I want to read. Reading them sequentially works well, but I also need to jump to a previously known position. The data consists of sequential images with metadata, including their timestamps, and I would like to be able to jump back and forth (like in a video player).

This is what I have tried:

int fd = open(filePath.c_str(), O_RDONLY);
kj::FdInputStream fdStream(fd);
kj::BufferedInputStreamWrapper bufferedStream(fdStream);
for (;;) {
  kj::ArrayPtr<const kj::byte> framePtr = bufferedStream.tryGetReadBuffer();

  if (framePtr != nullptr) {
    capnp::PackedMessageReader message(bufferedStream);
    // This should reset the buffer to the last read message?
    bufferedStream.read((void*)framePtr.begin(), framePtr.size());
    // ...
  }
  else {
    // reset to beginning
  }
}

But I get this error:

capnp/serialize.c++:186: failed: expected segmentCount < 512; Message has too many segments

I was assuming that tryGetReadBuffer() returns the position and size of the next packed message. But then again, how would the BufferedInputStream know what "a message" is?

Question: How can I get position and size of messages and read these messages later on from the BufferedInputStreamWrapper?

Alternative: read the whole file once, take ownership of the data, and save it to a vector, as described here (https://groups.google.com/forum/#!topic/capnproto/Kg_Su1NnPOY). Is that the better solution altogether?


Solution

  • BufferedInputStream is not seekable. In order to seek backwards, you will need to destroy bufferedStream and then seek the underlying file descriptor, e.g. with lseek(), then create a new buffered stream.

    Note that reading the current position (so that you can pass it to lseek() later to go back) is also tricky while a buffered stream is present, because the buffered stream will have read past that position in order to fill its buffer. You can calculate it by subtracting the number of bytes still sitting in the buffer, e.g.:

    // Determine current file position, so that we can seek to it later.
    // Query the buffer first: tryGetReadBuffer() may refill it from `fd`,
    // which moves the file offset. The evaluation order of the operands of
    // `-` is unspecified, so make the two calls in separate statements.
    size_t buffered = bufferedStream.tryGetReadBuffer().size();
    off_t messageStartPos = lseek(fd, 0, SEEK_CUR) - (off_t)buffered;

    // Read a message
    {
      capnp::PackedMessageReader message(bufferedStream);
      // ... do stuff with `message` ...

      // Note that `message` is destroyed at this }. It's important that this
      // happens before querying the buffered stream again, because
      // PackedMessageReader updates the buffer position in its destructor.
    }

    // Determine the end position of the message (if you need it).
    buffered = bufferedStream.tryGetReadBuffer().size();
    off_t messageEndPos = lseek(fd, 0, SEEK_CUR) - (off_t)buffered;
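The position bookkeeping above and the "destroy, lseek(), recreate" procedure can be modelled without Cap'n Proto. The sketch below uses a hypothetical `ReadAheadReader` as a stand-in for `kj::BufferedInputStreamWrapper` (it is not the kj implementation). Like the real wrapper, it pulls more bytes from the file descriptor than the caller has consumed, so the logical position is `lseek(fd, 0, SEEK_CUR)` minus the bytes still buffered, and seeking back means discarding the reader, repositioning the descriptor, and constructing a fresh reader. `std::optional` is used only to make the destroy/recreate step explicit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical stand-in for kj::BufferedInputStreamWrapper: it reads ahead
// from `fd` into an internal buffer, so the kernel's file offset runs ahead
// of the position the caller has actually consumed.
class ReadAheadReader {
 public:
  explicit ReadAheadReader(int fd, size_t capacity = 4096)
      : fd_(fd), buf_(capacity) {}

  // Bytes already pulled from `fd` but not yet consumed by the caller.
  size_t buffered() const { return end_ - pos_; }

  // Copy up to `n` bytes into `out`; returns how many were copied.
  size_t read(unsigned char* out, size_t n) {
    size_t copied = 0;
    while (copied < n) {
      if (pos_ == end_ && !refill()) break;  // end of file
      size_t chunk = std::min(n - copied, end_ - pos_);
      std::memcpy(out + copied, buf_.data() + pos_, chunk);
      pos_ += chunk;
      copied += chunk;
    }
    return copied;
  }

 private:
  bool refill() {
    ssize_t got = ::read(fd_, buf_.data(), buf_.size());
    if (got <= 0) return false;
    pos_ = 0;
    end_ = static_cast<size_t>(got);
    return true;
  }

  int fd_;
  std::vector<unsigned char> buf_;
  size_t pos_ = 0;
  size_t end_ = 0;
};

// Logical position = kernel file offset minus the bytes still buffered.
off_t logicalPosition(int fd, const ReadAheadReader& reader) {
  size_t pending = reader.buffered();  // query the buffer first...
  return ::lseek(fd, 0, SEEK_CUR) - static_cast<off_t>(pending);  // ...then fd
}

// Seek: drop the reader (and its read-ahead data), reposition the
// descriptor, then start a new reader with an empty buffer.
bool seekTo(int fd, std::optional<ReadAheadReader>& reader, off_t pos) {
  reader.reset();
  if (::lseek(fd, pos, SEEK_SET) != pos) return false;
  reader.emplace(fd);
  return true;
}
```

In the Cap'n Proto version, `ReadAheadReader` corresponds to `bufferedStream` and `buffered()` to `tryGetReadBuffer().size()`; the one difference is that the real `tryGetReadBuffer()` can itself refill an empty buffer, which is why it has to be called before `lseek()`.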
    

    As for this line in the question's loop:

        bufferedStream.read((void*)framePtr.begin(), framePtr.size());
    

    FWIW, the effect of this line is "advance past the current buffer and on to the next one". You don't want to do this when using PackedMessageReader, as it will already have advanced the stream itself. In fact, because PackedMessageReader might have already advanced past the current buffer, framePtr may now be invalid, and this line might segfault.


    As for the alternative from the question, reading the whole file once, taking ownership of the data, and keeping it in a vector (as in the linked mailing-list thread):

    If the file fits comfortably in RAM, then reading it upfront is usually fine, and probably a good idea if you expect to be seeking back and forth a lot.
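A minimal sketch of the read-it-all-upfront approach using only the standard library; `readWholeFile` is a hypothetical helper name. Once the bytes are in memory, "seeking" is just picking an offset into the vector, and with Cap'n Proto that memory would then back a `kj::ArrayInputStream`, as described in the last paragraph.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Load the entire file into memory once. Afterwards, jumping to any
// position is just indexing into the returned vector; no file I/O needed.
std::vector<unsigned char> readWholeFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                    std::istreambuf_iterator<char>());
}
```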

    Another option is to mmap() it. This makes it appear as if the file is in RAM, but the operating system will actually read in the contents on-demand when you access them.
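A minimal RAII sketch of the mmap() approach with plain POSIX calls (`open`, `fstat`, `mmap`, `munmap`). `MappedFile` is a hypothetical helper, not part of kj or Cap'n Proto. Presumably you would wrap `data()`/`size()` in a `kj::ArrayPtr<const kj::byte>` to get the buffer an `ArrayInputStream` reads from; treat that last step as untested.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file. The OS pages contents in on first
// access, so "opening" a large recording is cheap.
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<size_t>(st.st_size);
    if (size_ > 0) {  // mmap() rejects a zero length
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = static_cast<const unsigned char*>(p);
    }
    ::close(fd);  // the mapping stays valid after the descriptor is closed
  }

  ~MappedFile() {
    if (data_ != nullptr) ::munmap(const_cast<unsigned char*>(data_), size_);
  }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const unsigned char* data() const { return data_; }
  size_t size() const { return size_; }

 private:
  const unsigned char* data_ = nullptr;
  size_t size_ = 0;
};
```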

    However, I don't think this will actually simplify the code much. Now you'll be dealing with an ArrayInputStream (a subclass of BufferedInputStream). To "seek", you would create a new ArrayInputStream over a slice of the buffer that begins where you want to start reading.
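To sketch "seek by creating a new stream over a slice" without depending on Cap'n Proto, the code below uses a hypothetical `SliceReader` in place of `kj::ArrayInputStream` and a hypothetical one-byte length-prefixed record format in place of packed messages. The structure is meant to carry over. One sequential pass records where each message starts (total size minus the bytes the reader has not consumed yet), and jumping to frame k builds a fresh reader over the slice starting at `offsets[k]`. With the real library this would presumably look like `kj::ArrayInputStream in(buffer.slice(offset, buffer.size()));` followed by `capnp::PackedMessageReader message(in);`, with each reader destroyed before the remaining size is queried again, but that part is an untested assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reader over a borrowed byte slice (stand-in for
// kj::ArrayInputStream). It never owns the data.
class SliceReader {
 public:
  SliceReader(const unsigned char* begin, const unsigned char* end)
      : cur_(begin), end_(end) {}

  size_t remaining() const { return static_cast<size_t>(end_ - cur_); }

  // Hypothetical record format: 1 length byte, then that many payload bytes.
  // Returns false at end of input or on a truncated record.
  bool nextRecord(std::string& out) {
    if (remaining() < 1) return false;
    size_t len = *cur_;
    if (remaining() < 1 + len) return false;
    out.assign(reinterpret_cast<const char*>(cur_ + 1), len);
    cur_ += 1 + len;
    return true;
  }

 private:
  const unsigned char* cur_;
  const unsigned char* end_;
};

// One sequential pass: each record starts at (total size - unread bytes).
std::vector<size_t> buildIndex(const std::vector<unsigned char>& buf) {
  std::vector<size_t> offsets;
  SliceReader reader(buf.data(), buf.data() + buf.size());
  std::string record;
  for (;;) {
    size_t start = buf.size() - reader.remaining();
    if (!reader.nextRecord(record)) break;
    offsets.push_back(start);
  }
  return offsets;
}

// "Seek": build a brand-new reader over the slice beginning at `offset`.
// Returns an empty string if no complete record starts there.
std::string readRecordAt(const std::vector<unsigned char>& buf, size_t offset) {
  if (offset > buf.size()) return {};
  SliceReader reader(buf.data() + offset, buf.data() + buf.size());
  std::string record;
  if (!reader.nextRecord(record)) record.clear();
  return record;
}
```

Because the reader only borrows the buffer, creating one per jump is cheap; the index of offsets is what makes jumping back and forth (as in a video player) possible.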