What is the point of the offset variable in dispatch_data_apply for libdispatch?

I'm having trouble understanding the offset variable passed to the data applier when handling a dispatch_io_read call. The documentation says the offset is the logical offset from the base of the data object. The source code for dispatch_data_apply confirms that this variable always starts at 0 for the first applier call on a data object, and after that is simply the running sum of the preceding region lengths.

I guess I don't understand the purpose of this variable then. I had originally assumed this was the offset for the entire read, but it's not. It seems you have to keep track of the bytes read and offset by that amount to actually properly do a read in libdispatch.

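// Outside the dispatch_io_read handler...
// __block is needed so the blocks below can advance the pointer;
// a plain captured local is read-only inside a block.
__block char * currBufferPosition = destinationBuffer;

// The dispatch_io_read call and its handler...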
dispatch_io_read(channel, fileOffset, bytesRequested, queue, ^(bool done, dispatch_data_t data, int error) {
  // Note: Real code would handle error variable.
  dispatch_data_apply(data, ^bool(dispatch_data_t region, size_t offset, const void * buffer, size_t size) {
    memcpy(currBufferPosition, buffer, size);
    currBufferPosition += size;
    return true;
  });
});

My question is: Is this the right way of using the data returned by dispatch_data_apply? And if so, what is the purpose of the offset variable passed into the applier handler? The documentation does not seem clear about this to me.

Solution

A dispatch_data_t is a sequence of bytes. The bytes can be stored in multiple non-contiguous byte arrays. For example, bytes 0-6 can be stored in one array, while bytes 7-12 are stored in a separate array somewhere else in memory.

For efficiency, the dispatch_data_apply function lets you iterate over those arrays in-place (without copying out the data). On each call to your “applier”, you receive a pointer to one of the underlying storage arrays in the buffer argument. The size argument tells you how many bytes are in this particular array, and the offset argument tells you how (logically) far the first byte of this particular array is from the first byte of the entire dispatch_data_t.

Example:

#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        dispatch_data_t aData = dispatch_data_create("Hello, ", 7, nil, DISPATCH_DATA_DESTRUCTOR_DEFAULT);
        dispatch_data_t bData = dispatch_data_create("world!", 6, nil, DISPATCH_DATA_DESTRUCTOR_DEFAULT);
        dispatch_data_t cData = dispatch_data_create_concat(aData, bData);

        dispatch_data_apply(cData, ^bool(dispatch_data_t  _Nonnull region, size_t offset, const void * _Nonnull buffer, size_t size) {
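            // %s expects a char pointer, so cast the untyped buffer; the precision limits output to size bytes.
            printf("applying at offset %lu, buffer %p, size %lu, contents: [%*.*s]\n", (unsigned long)offset, buffer, (unsigned long)size, (int)size, (int)size, (const char *)buffer);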
            return true;
        });
    }
    return 0;
}

Output:

applying at offset 0, buffer 0x100407970, size 7, contents: [Hello, ]
applying at offset 7, buffer 0x1004087b0, size 6, contents: [world!]

Okay, so that's what the offset argument is for. Now how does this relate to dispatch_io_read?

Well, dispatch_io_read doesn't pass you the same bytes twice. Once it has passed you some bytes, it discards them. The next time it passes you bytes, they are fresh, newly-read bytes. If you want the old bytes, you have to keep them around yourself. If you want to know how many old bytes you were given before the current call to your callback, you have to keep that count yourself. That is not what the offset argument is for.

It's possible that when dispatch_io_read calls you, it passes you a dispatch_data_t that has stored its bytes in multiple non-contiguous arrays, so when you call dispatch_data_apply on it, your applier gets called multiple times, with different offsets and buffers and sizes. But those calls only get you access to the fresh new bytes for the current call to your callback, not to old bytes from any prior calls to your callback.
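
To make the bookkeeping concrete, here is a minimal sketch in plain C that models the behavior described above without using libdispatch at all. The types region_t, data_t and read_state_t, and the functions data_apply, copy_region, handle_chunk and run_demo, are all hypothetical stand-ins invented for this illustration. Only the idea comes from the discussion above: the applier's offset restarts at 0 for every data object, so the position in the overall read is your own running count plus that offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one contiguous region inside a data object. */
typedef struct {
    const char *bytes;
    size_t size;
} region_t;

/* Hypothetical stand-in for the data object passed to one handler call. */
typedef struct {
    const region_t *regions;
    size_t count;
} data_t;

/* Same shape as a libdispatch applier, minus the region argument. */
typedef int (*applier_fn)(void *ctx, size_t offset, const void *buffer, size_t size);

/* Models dispatch_data_apply: offset restarts at 0 for every data object
   and advances by the size of each region visited. */
static void data_apply(const data_t *data, applier_fn applier, void *ctx) {
    size_t offset = 0;
    for (size_t i = 0; i < data->count; i++) {
        const region_t *r = &data->regions[i];
        if (!applier(ctx, offset, r->bytes, r->size)) {
            return; /* applier asked to stop early */
        }
        offset += r->size;
    }
}

/* State the caller has to keep across handler calls. */
typedef struct {
    char *dest;
    size_t capacity;
    size_t bytes_before; /* bytes delivered by earlier handler calls */
} read_state_t;

/* Applier: the offset is relative to the current data object only,
   so the caller's own running count is added to it. */
static int copy_region(void *ctx, size_t offset, const void *buffer, size_t size) {
    read_state_t *st = ctx;
    assert(st->bytes_before + offset + size <= st->capacity);
    memcpy(st->dest + st->bytes_before + offset, buffer, size);
    return 1;
}

/* Models one invocation of the read handler with fresh bytes. */
static void handle_chunk(read_state_t *st, const data_t *data) {
    size_t total = 0;
    data_apply(data, copy_region, st);
    for (size_t i = 0; i < data->count; i++) {
        total += data->regions[i].size;
    }
    st->bytes_before += total; /* remember how much has been seen so far */
}

/* Two handler calls, each delivering a data object with two regions. */
size_t run_demo(char *dest, size_t capacity) {
    const region_t first[] = { { "Hello, ", 7 }, { "wor", 3 } };
    const region_t second[] = { { "ld", 2 }, { "!", 1 } };
    const data_t chunk1 = { first, 2 };
    const data_t chunk2 = { second, 2 };
    read_state_t st = { dest, capacity, 0 };

    handle_chunk(&st, &chunk1);
    handle_chunk(&st, &chunk2);
    return st.bytes_before;
}
```

Calling run_demo with a buffer of at least 13 bytes fills it with the bytes of "Hello, world!" (without a terminating NUL) and returns 13. In real libdispatch code, the analogous step would be to advance your own counter by dispatch_data_get_size(data) at the end of each handler call, and to use the applier's offset only to place regions within the current data object.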