Tags: protocol-buffers, proto, protobuf-java, protobuf-c

Will changing the ProtoBuffer Varint type from a bool type to an enum type representing all bit-mask values be forward compatible?


I want to make the following Protocol Buffers message forward compatible.

The current Storage message defines a state field as a bool type:

message Storage {
    bool state = 1;
}

Protobuf encodes Varint types, such as the bool and enum types, in the following format:

|1-bit sequence number|4-bit serial number|3-bit data type|n-bit payload|

For the Varint type, the data type value will become 000:

|X|XXXX|000|XXXX...|

Since the Storage message contains only one field, with a serial number of 1, the sequence number will be 0 because the serial number fits within this single byte. Hence, the above format will become:

|0|0001|000|XXXX...|

Now, if I set Storage.state = 0, it will be stored as follows:

|0|0001|000|<0 will not be encoded>

The Protobuffer value for the Storage message will become 0x8.

If I set Storage.state = 1, it will be stored as follows:

|0|0001|000|00000001|

The Protobuffer value for the Storage message will become 0x8 0x1.

Now, I want to change the above Storage.state definition from the bool type to an enum type as follows:

// BIT7 | BIT6 | BIT5 | BIT4 | BIT3 | BIT2 | BIT1 | BIT0 |
//-------------------------------------------------------
//  0   |  0   |  0   |  0   |  0   |  0   |  0   |  0   | = STATE0 (0)
//  0   |  0   |  0   |  0   |  0   |  0   |  0   |  1   | = STATE1 (1)
//  0   |  0   |  0   |  0   |  0   |  0   |  1   |  0   | = STATE2 (2)
//  0   |  0   |  0   |  0   |  0   |  0   |  1   |  1   | = STATE3 (3)
// ... and so on
//  1   |  1   |  1   |  1   |  1   |  1   |  1   |  0   | = STATE254 (254)
//  1   |  1   |  1   |  1   |  1   |  1   |  1   |  1   | = STATE255 (255)

enum State {
    STATE0 = 0;
    STATE1 = 1;
    STATE2 = 2;
    STATE3 = 3;
    // ... and so on
    STATE254 = 254;
    STATE255 = 255;
}

message Storage {
    State state = 1;
}

So now, in Protobuf encoding,

If I set Storage.state = State.STATE0, it will be stored as follows:

|0|0001|000|<0 will not be encoded>

The Protobuffer value for the Storage message will become 0x8.

If I set Storage.state = State.STATE1, it will be stored as follows:

|0|0001|000|00000001|

The Protobuffer value for the Storage message will become 0x8 0x1.

If I set Storage.state = State.STATE2, it will be stored as follows:

|0|0001|000|00000010|

The Protobuffer value for the Storage message will become 0x8 0x2.

If I set Storage.state = State.STATE255, it will be stored as follows:

|0|0001|000|11111111|

The Protobuffer value for the Storage message will become 0x8 0xFF.

Will this change still be forward compatible in both proto2 and proto3, and in both C and Java?

I based my question on the reference below: google protocol buffer -- the coding principle of protobuf II


Solution

  • I'm assuming that what you're actually trying to store here is: the bitwise state values - what might be a [Flags] enum in C# (mentioned purely to set context).

    Honestly, declaring an enum with a value per bit combination isn't a good idea; it will escalate very quickly, and it isn't intuitive to use. It also leaves potential for silly errors when copy/pasting large volumes of lines...

    // omitted... 212 lines - but would you spot the error?
    STATE213 = 213;
    STATE214 = 214;
    STATE215 = 214;
    STATE216 = 216;
    STATE217 = 217;
    // ... etc
    

    (OK, that specific error would only get past the compiler with the allow_alias option enabled, but: you get the point)


    In proto2, enums are expected to be recognised; when unexpected enum values are encountered, it gets a bit... hazy, with any of:

    1. parse failure
    2. treated as an unknown field (needing to be accessed via a separate API)
    3. silently handled and parsed via the integer value (which has the effect of preserving bit flags)

    Since not every flag combination will have an enum definition, what you want here is option 3, but that isn't guaranteed in all implementations.


    In proto3, the framework leans as far in the direction of 3 as possible, explicitly in the language specification, with the integer value being stored and retrieved (which has the effect of preserving bit flags), but it is also explicitly called out that some platforms do not allow open enum types - for example, Java.
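
The difference between options 2 and 3 in the list above can be sketched with a toy model in Python. This is only an illustration of the two behaviours, not how any real protobuf runtime is implemented; KNOWN_STATES, read_closed and read_open are hypothetical names, and the "falls back to the default" behaviour is an assumption about a typical closed-enum implementation.

```python
# Hypothetical enum that declares only three of the possible values.
KNOWN_STATES = {0: "STATE0", 1: "STATE1", 2: "STATE2"}


def read_closed(wire_value: int) -> dict:
    """Closed-enum behaviour (option 2): unrecognised numbers are set aside."""
    if wire_value in KNOWN_STATES:
        return {"state": wire_value, "unknown_fields": []}
    # Assumption: the field reads as its default (0) and the raw number is
    # parked as (field_number, value) in a separate unknown-field list.
    return {"state": 0, "unknown_fields": [(1, wire_value)]}


def read_open(wire_value: int) -> dict:
    """Open-enum behaviour (option 3): the integer is kept as the field value."""
    return {"state": wire_value, "unknown_fields": []}


closed = read_closed(217)
opened = read_open(217)
print(closed)
print(opened)
```

With the closed model the bit pattern 217 is no longer visible through the state field, which is the failure mode to worry about when the enum is used as a bit mask.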


    Because of this limitation, and since you mention Java in the tags, I would recommend simply using an integer directly. It will at least work similarly on all implementations. Compared to your proposed solution, it is at least as usable, and usually a lot more usable; consider how it works as an enum:

    obj.state = State.State217;
    

    vs as an integer:

    obj.state = 217;
    

    This will also allow bitwise combination/test/etc. operations to be used on the values, which isn't the case for closed enum types.
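
As a rough illustration of what the integer form allows, here is a small Python sketch. The BIT0..BIT7 constants are hypothetical names chosen to mirror the bit columns in the question; nothing here comes from the protobuf library.

```python
# Hypothetical single-bit constants, one per column of the question's table.
BIT0, BIT1, BIT2, BIT3 = 1 << 0, 1 << 1, 1 << 2, 1 << 3
BIT4, BIT5, BIT6, BIT7 = 1 << 4, 1 << 5, 1 << 6, 1 << 7

# Combine flags with bitwise OR; this particular combination is 217.
state = BIT7 | BIT6 | BIT4 | BIT3 | BIT0

# Test individual flags with bitwise AND.
has_bit3 = bool(state & BIT3)
has_bit1 = bool(state & BIT1)

# Clear or set a flag without touching the others.
cleared = state & ~BIT0
with_bit1 = state | BIT1

print(state, has_bit3, has_bit1, cleared, with_bit1)
```

The same operations on a closed enum would need a conversion to and from the integer value on every use.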


    As for whether bool, enum and int32/uint32/sint32 (and the 64-bit counterparts) are technically interchangeable (scale permitting): yes; they're all encoded as varint.
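
To make the "all encoded as varint" point concrete, here is a minimal hand-rolled sketch of the wire format in Python. It does not use the protobuf library; encode_varint and encode_varint_field are names made up for this example, and the sketch assumes non-negative values only. In this format the tag is itself a varint holding (field number << 3) | wire type, and the top bit of every byte is a continuation flag, so values of 128 and above take more than one byte.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)  # last byte: continuation bit stays clear
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint-typed field as tag followed by value."""
    wire_type_varint = 0
    tag = (field_number << 3) | wire_type_varint
    return encode_varint(tag) + encode_varint(value)


# The same field number and numeric value, whatever the declared type.
as_bool = encode_varint_field(1, int(True))
as_enum = encode_varint_field(1, 1)
as_uint32 = encode_varint_field(1, 1)
state_255 = encode_varint_field(1, 255)

print(as_bool.hex(), as_enum.hex(), as_uint32.hex(), state_255.hex())
```

Under these rules bool true, enum value 1 and uint32 1 all produce the bytes 0x08 0x01, while 255 comes out as 0x08 0xFF 0x01 because it no longer fits in seven bits. If an old reader still declares the field as bool, my understanding is that a non-zero varint is typically read as true, but that is worth verifying against the implementations you use.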