Is it possible to get .proto files back from a generated _pb2.py with protoc? Would the same reverse engineering be possible for gRPC?
The format of the _pb2.py file varies between protobuf-python versions, but most of them contain a field called serialized_pb (newer generators pass the same bytes to AddSerializedFile(...) instead). This holds the whole structure of the .proto file in the FileDescriptorProto format:
serialized_pb=b'\n\x0c...'
This can be passed to the protoc compiler to generate code for other languages. However, it first has to be put inside a FileDescriptorSet to match the format protoc expects. This can be done using Python:
from google.protobuf import descriptor_pb2

# Wrap the FileDescriptorProto bytes in a FileDescriptorSet
fds = descriptor_pb2.FileDescriptorSet()
fds.file.add().ParseFromString(b'\n\x0c... serialized_pb data ....')
# Human-readable text format, for inspection
with open('myproto.txt', 'w') as f:
    f.write(str(fds))
# Binary descriptor set, for protoc --descriptor_set_in
with open('myproto.pb', 'wb') as f:
    f.write(fds.SerializeToString())
The snippet above saves a human-readable version to myproto.txt and a binary FileDescriptorSet that protoc can read to myproto.pb. The text representation looks like this:
file {
  name: "XYZ.proto"
  dependency: "dependencyXYZ.proto"
  message_type {
    name: "MyMessage"
    field {
      name: "myfield"
      number: 1
      label: LABEL_OPTIONAL
      type: TYPE_INT32
    }
...
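On the gRPC part of the question: service definitions are stored in the same FileDescriptorProto, so they should appear in this dump as service { ... } entries next to message_type. The sketch below lists the RPC methods found in a serialized descriptor. The sample descriptor it builds (XYZ.proto, package demo, MyService, GetThing) and the helper name list_rpc_methods are made up for illustration so that the snippet runs on its own; with a real file you would pass the serialized_pb bytes instead.

```python
from google.protobuf import descriptor_pb2

# Hypothetical stand-in for the bytes found in a real _pb2.py file.
sample = descriptor_pb2.FileDescriptorProto(name='XYZ.proto', package='demo')
svc = sample.service.add(name='MyService')
svc.method.add(
    name='GetThing',
    input_type='.demo.MyMessage',
    output_type='.demo.MyMessage',
    server_streaming=True,
)
serialized_pb = sample.SerializeToString()


def list_rpc_methods(data):
    """Return (service, method, input, output, client_stream, server_stream) tuples."""
    fdp = descriptor_pb2.FileDescriptorProto()
    fdp.ParseFromString(data)
    methods = []
    for service in fdp.service:
        for method in service.method:
            methods.append((
                service.name,
                method.name,
                method.input_type,
                method.output_type,
                method.client_streaming,
                method.server_streaming,
            ))
    return methods


for entry in list_rpc_methods(serialized_pb):
    print(entry)
```

This only recovers the service interface (method names, request and response types, streaming flags), which is what a .proto file describes. It says nothing about how the server implements those methods.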
For example, C++ headers can be generated from myproto.pb using:
protoc --cpp_out=. --descriptor_set_in=myproto.pb XYZ.proto
Note that XYZ.proto must match the name of the file in the descriptor set, which you can check in myproto.txt. However, this method quickly gets difficult if the file has dependencies, as all of those dependencies have to be collected in the same descriptor set. In some cases it may be easier to just use the textual representation to rewrite the .proto file by hand.
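If the generated modules can still be imported, collecting the dependencies can be automated. The following is a minimal sketch, assuming every _pb2.py file involved is importable: it walks the DESCRIPTOR object of a module and copies the file and all of its transitive imports into one FileDescriptorSet. The helper name build_descriptor_set is made up for this example, and google.protobuf.type_pb2 is used only as a stand-in for your own generated module because it ships with the protobuf package and has imports of its own.

```python
from google.protobuf import descriptor_pb2
from google.protobuf import type_pb2  # stand-in for your own XYZ_pb2 module


def build_descriptor_set(file_descriptor):
    """Collect a file and all of its transitive imports into one FileDescriptorSet."""
    fds = descriptor_pb2.FileDescriptorSet()
    seen = set()

    def visit(fd):
        if fd.name in seen:
            return
        seen.add(fd.name)
        # Add imported files first, then the file that imports them.
        for dep in fd.dependencies:
            visit(dep)
        fd.CopyToProto(fds.file.add())

    visit(file_descriptor)
    return fds


fds = build_descriptor_set(type_pb2.DESCRIPTOR)
print([f.name for f in fds.file])

if __name__ == '__main__':
    # Write the combined set for use with protoc --descriptor_set_in
    with open('myproto.pb', 'wb') as out:
        out.write(fds.SerializeToString())
```

The file name passed to protoc on the command line would then be the last entry printed, since the root file is added after its imports.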