Tags: c++, binary, wavelet, flatbuffers, haar-wavelet

Can Flatbuffers take advantage of 0's in vectors? Or are other wavelets better than the Haar transform?


I'm serializing some data and want to make the file as small as possible without losing the essential details. The first step was to switch from ASCII to a binary format, and I decided to try FlatBuffers. Stored as text files, the data were about 400 MB; with the schema shown below, the file is about 200 MB. That's a nice decrease in size, but smaller would of course be better.

The data consist of one ControlParams and 82 ControlData, and the intensities vector takes up most of the space, each being a matrix of roughly 128x5000 floats. So we're already around the theoretical binary size of 128 x 5000 x 82 x 4 bytes per float ~ 200 MB. The matrices are fairly dense in general, but here and there I can see rows that are entirely zero. Can FlatBuffers take advantage of these zeros to reduce the file size further? And are there other inefficiencies in the schema that someone can spot, since I'm just getting started with FlatBuffers?
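One manual option for those all-zero rows is to store only the nonzero rows together with their original row indices. This is only a sketch; the `SparseRows` struct and a corresponding `rowIndex:[int]` field in the schema are hypothetical additions, not part of the schema above:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sparse representation: keep only rows that contain
// at least one nonzero value, plus their original row numbers.
struct SparseRows {
    std::vector<int>   rowIndex;  // original row numbers kept
    std::vector<float> values;    // the kept rows, concatenated
};

SparseRows dropZeroRows(const std::vector<float>& mat,
                        std::size_t rows, std::size_t cols) {
    SparseRows s;
    for (std::size_t r = 0; r < rows; ++r) {
        bool allZero = true;
        for (std::size_t c = 0; c < cols; ++c) {
            if (mat[r * cols + c] != 0.0f) { allZero = false; break; }
        }
        if (!allZero) {
            s.rowIndex.push_back(static_cast<int>(r));
            s.values.insert(s.values.end(),
                            mat.begin() + static_cast<std::ptrdiff_t>(r * cols),
                            mat.begin() + static_cast<std::ptrdiff_t>((r + 1) * cols));
        }
    }
    return s;
}
```

Whether this pays off depends on how many rows are actually zero; with only a handful, the savings will be small.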

Another way to reduce the file size might be to investigate different wavelets for compressing the original intensities. I'm using the Haar transform now because I was able to write a C++ function for it, and found that 2x or possibly 4x compression was feasible. I'd like to try other wavelets, but would first like to know whether others have compared different wavelets against Haar and found they could get away with fewer coefficients.
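For reference, one level of the 1D Haar transform (averages in the first half of the output, differences in the second) can be written as follows. This is a generic sketch, not necessarily the question's actual implementation, and it assumes an even input length:

```cpp
#include <cstddef>
#include <vector>

// One level of the 1D Haar transform. Input length must be even.
// out[0..n/2)  = pairwise averages (approximation coefficients)
// out[n/2..n) = pairwise half-differences (detail coefficients)
std::vector<float> haarLevel1D(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    const std::size_t half = in.size() / 2;
    for (std::size_t i = 0; i < half; ++i) {
        out[i]        = (in[2 * i] + in[2 * i + 1]) * 0.5f;  // average
        out[half + i] = (in[2 * i] - in[2 * i + 1]) * 0.5f;  // detail
    }
    return out;
}
```

Compression comes from the detail coefficients being near zero for smooth data, so many of them can be thresholded away; smoother wavelets (e.g. Daubechies family) often concentrate energy into fewer coefficients, but that is exactly what would need to be measured on this data.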

namespace RTSerialization;

table ControlParams{
    extractStepSizeDa:float = 1.0005;
    smooth:bool = false;
    haarLevel:int = 10;
    deltaTimeSec:float;
}

table ControlData{
    mzAxis:[float];
    timeSec:[float];
    intensities:[float];
    scanFilter:string;
}

table ControlParamsAndData{
    params:ControlParams;
    dataSet:[ControlData];
}

root_type ControlParamsAndData;

Solution

  • Yes, your size is entirely determined by a single float array; the rest of the FlatBuffers format is irrelevant to the question of how to make it smaller.

    And no, FlatBuffers doesn't do any form of automatic compression, since its design is all about random access: any access to your float array should be O(1).

    So optimizing this data comes entirely down to you. You say the data is matrices: floats in matrices are often in limited ranges like -1 to 1, so they could be quantized into a short.

    Other forms of compression of course mean you'd have to do your own packing/unpacking.
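    A sketch of that quantization idea, assuming the value range is known up front (the function names and the choice of a symmetric 16-bit code are illustrative, not from the question):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Map floats in a known range [lo, hi] to 16-bit integers,
// halving storage relative to float32 at the cost of precision
// (roughly (hi - lo) / 65535 per value).
std::vector<int16_t> quantize(const std::vector<float>& in, float lo, float hi) {
    std::vector<int16_t> out;
    out.reserve(in.size());
    const float scale = 65535.0f / (hi - lo);
    for (float v : in) {
        const float clamped = std::min(hi, std::max(lo, v));
        out.push_back(static_cast<int16_t>(
            std::lround((clamped - lo) * scale) - 32768));
    }
    return out;
}

float dequantize(int16_t q, float lo, float hi) {
    return lo + (static_cast<float>(q) + 32768.0f) * (hi - lo) / 65535.0f;
}
```

The quantized shorts would then go in an `[short]` vector in the schema instead of `[float]`, with the range stored alongside so readers can reconstruct the values.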