
Changing protobuf field type from double to float


I have a proto message that I'm using in a service to store data in various places.

My message is like this:

message Matrix {
  double width = 1;
  double height = 2;
  repeated double entries = 3;
}

My team has decided that the Matrix message is too large, and changing the types to float seems like an easy way to achieve a payload size reduction. But when I change the proto definition to use float instead of double here, and try to read old data (in a Python reader), it looks corrupted.

One option I can think of is to add a new float option for each field:

message Matrix {
  oneof r_oneof {
    double width_d = 1;
    float width_f = 4;
  }
  oneof c_oneof {
    double height_d = 2;
    float height_f = 5;
  }
  // repeated fields are not allowed inside a oneof, so these stay top-level
  repeated double entries_d = 3;
  repeated float entries_f = 6;
}

Then my deserializing code can check which member of each oneof is set (and which of the two entries fields is non-empty). This works, but feels like a clunky design pattern.

Is there another way to provide backwards-compatibility with old data in this example?


Solution

  • I think you have the right idea. You will want to keep the old fields, with their field numbers unchanged, for as long as there is data stored in the old format. One of the great things about protocol buffers is that unset fields are essentially free (they are not serialized at all), so you can add as many new fields as you need to facilitate the migration.

    What I would do is add a new set of float fields and rename the double fields while preserving their field numbers:

    message Matrix {
      // Values in these fields should be transitioned to floats
      double deprecated_width = 1;
      double deprecated_height = 2;
      repeated double deprecated_entries = 3;
    
      float width = 4;
      float height = 5;
      repeated float entries = 6;
    }
    

    Whenever you read a Matrix from persistent storage, move any values from the deprecated fields to the non-deprecated fields and write back the result. This lets you migrate to floats incrementally.
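
The read-time migration described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the schema is proto3 (so a scalar value of 0.0 is indistinguishable from "unset"), a stored message holds either the old fields or the new ones but not both, and `MatrixStandIn` and `migrate_matrix` are hypothetical names. The stand-in dataclass exists only so the sketch runs without generated code; in practice you would pass the generated `Matrix` message instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatrixStandIn:
    """Hypothetical stand-in mirroring the fields of the generated Matrix class."""
    deprecated_width: float = 0.0
    deprecated_height: float = 0.0
    deprecated_entries: List[float] = field(default_factory=list)
    width: float = 0.0
    height: float = 0.0
    entries: List[float] = field(default_factory=list)


def migrate_matrix(m) -> bool:
    """Move values from the deprecated double fields into the float fields.

    Returns True if anything was moved, so the caller knows to write the
    message back to storage.
    """
    changed = False
    # Assumption: proto3 scalars have no presence, so 0.0 is treated as unset.
    if m.deprecated_width != 0.0:
        m.width = m.deprecated_width
        m.deprecated_width = 0.0
        changed = True
    if m.deprecated_height != 0.0:
        m.height = m.deprecated_height
        m.deprecated_height = 0.0
        changed = True
    if len(m.deprecated_entries) > 0:
        # Assumption: entries is empty whenever deprecated_entries is populated.
        m.entries.extend(m.deprecated_entries)
        # On a real message, m.ClearField("deprecated_entries") is an alternative.
        del m.deprecated_entries[:]
        changed = True
    return changed
```

A reader would call `migrate_matrix(matrix)` right after parsing and re-serialize only when it returns True. Keep in mind that values moved into float fields are rounded to 32-bit precision when they are serialized.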

    I will mention one more thing that you probably already know: protocol buffers don't care about field names. Only field numbers (and the types behind them) matter for binary serialization and deserialization. This means fields can be renamed freely, as long as the code that manipulates them is updated to match.
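
To make this concrete, here is a minimal sketch in plain Python that hand-encodes a single scalar field the way the binary wire format does. The helper names are hypothetical and not part of the protobuf library, and the tag helper only covers field numbers 1 to 15, whose tag fits in one byte.

```python
import struct

WIRETYPE_FIXED64 = 1  # wire type used by double
WIRETYPE_FIXED32 = 5  # wire type used by float


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """Encode a field tag; valid here only for field numbers 1..15."""
    if not 1 <= field_number <= 15:
        raise ValueError("this sketch only handles single-byte tags")
    return bytes([(field_number << 3) | wire_type])


def encode_double(field_number: int, value: float) -> bytes:
    """Tag byte followed by 8 little-endian bytes."""
    return encode_tag(field_number, WIRETYPE_FIXED64) + struct.pack("<d", value)


def encode_float(field_number: int, value: float) -> bytes:
    """Tag byte followed by 4 little-endian bytes."""
    return encode_tag(field_number, WIRETYPE_FIXED32) + struct.pack("<f", value)


old = encode_double(1, 3.0)  # like `double deprecated_width = 1;`
new = encode_float(4, 3.0)   # like `float width = 4;`
print(old.hex())  # 090000000000000840
print(new.hex())  # 2500004040
```

The field name never appears in the bytes. The tag carries only the field number and the wire type, which is why renaming is safe, and also why relabelling field 1 from double to float in place makes old data unreadable: the reader expects 4 bytes where the writer stored 8.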

    At some point in the future, when the migration has been completed, remove the migration code and delete the deprecated fields, but reserve their field numbers:

    message Matrix {
      reserved 1, 2, 3;
      float width = 4;
      float height = 5;
      repeated float entries = 6;
    }
    

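    Reserving the numbers makes the compiler reject any later attempt to reuse 1, 2, or 3 for a different field. Stray messages in the old format will not fail to parse; their old fields are simply treated as unknown fields and skipped, so they can never be misread as something else and cause data corruption.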

    Hope this helps!