
Can I reuse an existing protobuf binary when marshaling a message that includes it? (proto3)


The protobuf definitions look like this:

syntax = "proto3";

message HugeMessage {
    // omitted
}

message Request {
    string name = 1;
    HugeMessage payload = 2;
}

I receive a HugeMessage from somebody, and I want to pack it with additional fields and then transmit the result to someone else. To do that, I have to Unmarshal the HugeMessage binary into a Go structure, pack it into a Request, and Marshal it again. Because HugeMessage is so large, the cost of the Unmarshal and Marshal calls is unaffordable. Can I reuse the HugeMessage binary without changing the protobuf definitions?

package main

import (
    "os"

    "google.golang.org/protobuf/proto"
)

// Error handling is omitted for brevity.
func main() {
    // receive it from file or network, not important.
    bins, _ := os.ReadFile("hugeMessage.dump")
    var message HugeMessage
    _ = proto.Unmarshal(bins, &message) // slow
    request := Request{
        Name:    "xxxx",
        Payload: &message, // generated message fields are exported pointers
    }
    requestBinary, _ := proto.Marshal(&request) // slow
    // send it.
    _ = os.WriteFile("request.dump", requestBinary, 0644)
}


Solution

  • The short answer is: no, there is no simple or standard way to achieve this.

    The most obvious strategy is to do what you are already doing: unmarshal the HugeMessage, set it on the Request, then marshal again. The Go protobuf API surface doesn't really provide a means to do much beyond that, and with good reason.

    That said, there are ways to achieve what you're looking to do. But these aren't necessarily safe or reliable, so you have to weigh that cost vs the cost of what you have now.

    One way to avoid the unmarshal is to take advantage of the way a message is normally serialized:

    message Request {
        string name = 1;
        HugeMessage payload = 2;
    }
    

    ... is equivalent on the wire to

    message Request {
        string name = 1;
        bytes payload = 2;
    }
    

    ... where payload contains the result of calling Marshal(...) on some HugeMessage.

    So, if we have the following definitions:

    syntax = "proto3";
    
    message HugeMessage {
      bytes field1 = 1;
      string field2 = 2;
      int64 field3 = 3;
    }
    
    message Request {
      string name = 1;
      HugeMessage payload = 2;
    }
    
    message RawRequest {
      string name = 1;
      bytes payload = 2;
    }
    

    The following code:

    req1, err := proto.Marshal(&pb.Request{
        Name: "name",
        Payload: &pb.HugeMessage{
            Field1: []byte{1, 2, 3},
            Field2: "test",
            Field3: 948414,
        },
    })
    if err != nil {
        panic(err)
    }
    
    huge, err := proto.Marshal(&pb.HugeMessage{
        Field1: []byte{1, 2, 3},
        Field2: "test",
        Field3: 948414,
    })
    if err != nil {
        panic(err)
    }
    
    req2, err := proto.Marshal(&pb.RawRequest{
        Name:    "name",
        Payload: huge,
    })
    if err != nil {
        panic(err)
    }
    
    fmt.Printf("equal? %t\n", bytes.Equal(req1, req2))
    

    outputs equal? true

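    This works because the wire format encodes an embedded message exactly like a bytes field (a length-delimited record), and the protobuf language guide lists the two as compatible when the bytes hold an encoded message. What is not guaranteed is that two Marshal calls produce byte-identical output, since protobuf serialization is not canonical, so treat the equal? true result as an illustration rather than a contract. The RawRequest type also has to mirror the Request type exactly, which isn't ideal.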

    Another alternative is to construct the message more manually, e.g. with the protowire package. Again, this is error-prone, so proceed with caution.