Tags: python, json, dictionary, protocol-buffers, protobuf-3

How to serialize default values in nested messages in Protobuf


As the title states, I have a protobuf message with another message nested inside it:

syntax = "proto3";

message Message
{
    message SubMessage {
        int32 number = 1;
    }
    
    SubMessage subMessage = 1;
}

My example.json is empty (which means default values everywhere):

{
}

In my Python script I read this message with:

import google.protobuf.json_format

# "example" is the protoc-generated module for the .proto above (import not shown).
with open("example.json", "r") as f:
    example_json = f.read()

example_message = example.Message()
google.protobuf.json_format.Parse(example_json, example_message)

When I check the value of example_message.subMessage.number, it is 0, which is correct.

Now I want to convert it into a dict in which all values are present, even the default ones. For the conversion I use google.protobuf.json_format.MessageToDict(). By default, MessageToDict() omits default values unless told otherwise (see the question "Protobuf doesn't serialize default values"), so I added the argument including_default_value_fields=True to the call:

google.protobuf.json_format.MessageToDict(example_message, including_default_value_fields=True)

which returns:

{}

instead of what I expected:

{'subMessage': {'number': 0}}

A comment in the code of protobuf (found here: https://github.com/protocolbuffers/protobuf/blob/master/python/google/protobuf/json_format.py) confirms this behaviour:

including_default_value_fields: If True, singular primitive fields, repeated fields, and map fields will always be serialized. If False, only serialize non-empty fields. Singular message fields and oneof fields are not affected by this option.
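
The behaviour described in that comment can be reproduced without a generated module by using a message type that ships with the protobuf package. This is only a sketch: `to_dict_with_defaults` is a hypothetical helper name, `type_pb2.Type` stands in for my own `Message`, and the fallback flag `always_print_fields_with_no_presence` reflects my understanding that newer protobuf releases renamed `including_default_value_fields`.

```python
from google.protobuf import json_format, type_pb2


def to_dict_with_defaults(message):
    """Call MessageToDict with whichever 'print defaults' flag this version accepts."""
    try:
        # Flag used in the question (older protobuf releases).
        return json_format.MessageToDict(
            message, including_default_value_fields=True
        )
    except TypeError:
        # Assumption: newer releases renamed the flag to this name.
        return json_format.MessageToDict(
            message, always_print_fields_with_no_presence=True
        )


# type_pb2.Type has a singular string field `name` and a singular
# message field `source_context` (JSON name: sourceContext).
result = to_dict_with_defaults(type_pb2.Type())

print("name" in result)           # singular primitive field
print("sourceContext" in result)  # singular message field
```

The primitive field shows up with its default value, while the unset singular message field is left out, which matches the empty dict I get for my own message.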

So what can I do to get a dict with all values, even when they are default values inside nested messages?


Interestingly, when my example.json looks like this:

{
    "subMessage" : {
        "number" : 0
    }
}

I get the expected output. However, I cannot guarantee that example.json will have every value written out, so this is not an option.


Solution

  • Based on the answer to "Looping over Protocol Buffers attributes in Python", I wrote a custom MessageToDict function that walks the message descriptor instead of relying on which fields are set:

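    def MessageToDict(message):
        """Convert a protobuf message to a dict, including all default values."""
        message_dict = {}
        
        # Walk every field declared in the .proto, whether it is set or not.
        for descriptor in message.DESCRIPTOR.fields:
            key = descriptor.name
            # Reading an unset field returns its default value; for an unset
            # message field this is an empty sub-message.
            value = getattr(message, descriptor.name)
            
            if descriptor.label == descriptor.LABEL_REPEATED:
                # Repeated field: convert element by element.
                # Note: map fields also report LABEL_REPEATED and iterate over
                # their keys only, so they are not converted properly here.
                message_list = []
                
                for sub_message in value:
                    if descriptor.type == descriptor.TYPE_MESSAGE:
                        message_list.append(MessageToDict(sub_message))
                    else:
                        message_list.append(sub_message)
                
                message_dict[key] = message_list
            else:
                if descriptor.type == descriptor.TYPE_MESSAGE:
                    # Singular message field: recurse so nested defaults appear.
                    message_dict[key] = MessageToDict(value)
                else:
                    message_dict[key] = value
        
        return message_dict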
    

    Given the message read from the empty example.json, this function returns:

    {'subMessage': {'number': 0}}
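
The same descriptor-walking idea can be tried without a generated module, using message types bundled with the protobuf package. The variant below is only a sketch, not a drop-in replacement: `message_to_full_dict` and `is_map_field` are hypothetical names, and the map handling is an extension that the function above does not have. `type_pb2.Type` and `struct_pb2.Struct` merely stand in for a real message.

```python
from google.protobuf import struct_pb2, type_pb2
from google.protobuf.descriptor import FieldDescriptor


def is_map_field(field):
    """True if the field is a map (stored as a repeated, auto-generated entry message)."""
    return (
        field.type == FieldDescriptor.TYPE_MESSAGE
        and field.message_type.GetOptions().map_entry
    )


def message_to_full_dict(message):
    """Convert a message to a dict keyed by proto field name, defaults included."""
    result = {}
    for field in message.DESCRIPTOR.fields:
        value = getattr(message, field.name)
        is_message = field.type == FieldDescriptor.TYPE_MESSAGE

        if is_map_field(field):
            # Iterating a map container yields its keys; look up each value.
            value_field = field.message_type.fields_by_name["value"]
            if value_field.type == FieldDescriptor.TYPE_MESSAGE:
                result[field.name] = {
                    key: message_to_full_dict(value[key]) for key in value
                }
            else:
                result[field.name] = {key: value[key] for key in value}
        elif field.label == FieldDescriptor.LABEL_REPEATED:
            if is_message:
                result[field.name] = [message_to_full_dict(item) for item in value]
            else:
                result[field.name] = list(value)
        elif is_message:
            # Unset singular message fields read as an empty sub-message.
            result[field.name] = message_to_full_dict(value)
        else:
            result[field.name] = value
    return result


# An empty Type message: the nested source_context still appears.
full = message_to_full_dict(type_pb2.Type())
print(full["source_context"])

# A Struct holds a map of string to Value messages.
struct = struct_pb2.Struct()
struct.fields["a"].number_value = 1.5
struct_dict = message_to_full_dict(struct)
print(struct_dict["fields"]["a"]["number_value"])
```

Two caveats apply to this approach in general: every member of a oneof is emitted with its default (visible in the `Value` entries of the Struct example), and a message type that contains itself as a singular field would recurse without end.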