pythonprotocol-buffersprotobuf-python

Failing to Import Files Compiled from Protobuf in Python


My directory structure is as follows:

 test
 |-test.py
 |-test.proto
 |-test_pb2.py
 |-__init__.py
 |-comm
    |-comm.proto
    |-comm_pb2.py
    |-__init__.py  

both __init__.py is empty and test.proto is like this:

package test;
import "comm/comm.proto";

message Test{
    optional comm.Foo foo = 1;
}

and comm.proto is like this:

package comm;

message Foo{}

i successfuly used command protoc --python_out=. -I. comm.proto in comm directory to compile comm.proto and protoc --python_out=. -I. test.proto in test directory to compile test.proto but when i tried to import test_pb2 in test.py, i encounter this error

TypeError: Couldn't build proto file into descriptor pool: Depends on file 'comm/comm.proto', but it has not been loaded

Can someone help me identify the reason and provide a solution, please?


Solution

  • So, I think how protoc works with Python is more complicated|confusing than the average bear! My recourse is to use Golang to see what its SDK does and then reverse engineer that back into Python.

    The documentation says that protoc ignores Protocol Buffers packages. See Defining your Protocol Format and the note:

    The .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects. In Python, packages are normally determined by directory structure, so the package you define in your .proto file will have no effect on the generated code. However, you should still declare one to avoid name collisions in the Protocol Buffers name space as well as in non-Python languages.

    I'm unfamiliar with Python Packages and Modules but ...

    1. package's

    When you define a package in a proto file, convention is that the proto file be in a folder that matches the package name. You can see this with e.g. Google's Well-known Types and e.g. Timestamp. This is defined in timestamp.proto to be in package google.protobuf and so it's in the a folder path google/protobuf and, by convention, it's called timestamp.proto (though the file name is arbitrary).

    Because message Test is in package test and message Foo is in package comm, the structure would be more correctly (!?) structured as:

    .
    ├── comm
    │   └── comm.proto
    └── test
        └── test.proto
    

    When you use protoc, you must define --proto_path for each distinct folder that contains package folders. In this case --proto_path=${PWD} which can be omitted.

    1. protoc

    This gives protoc:

    protoc \
    --python_out=${PWD} \
    --pyi_out=${PWD} \
    test/test.proto \
    comm/comm.proto
    

    NOTE I encourage you to include --pyi_out to get "intellisense" in Visual Studio Code.

    Or, my preference is to be more emphatic and include the root (${PWD}) of the proto package's:

    protoc \
    --proto_path=${PWD} \
    --python_out=${PWD} \
    --pyi_out=${PWD} \
    ${PWD}/test/test.proto \
    ${PWD}/comm/comm.proto
    
    1. Python packages

    Curiously, even though we were told that the package would not have an effect on the Python code, it does. The package test proto is output to test folder and package comm to comm folder.

    You will need to create an empty __init__.py in test (but not comm) for the code to work.

    1. Empty Message

    Your comm.Foo message is empty ({}) and so there's nothing to use.

    I tweaked it:

    package comm;
    
    message Foo{
        optional string bar = 1;
    }
    
    1. Basic Python example
    import test.test_pb2
    
    test = test.test_pb2.Test()
    
    test.foo.bar = "Hello"
    
    print(test)
    

    Yields:

    foo {
      bar: "Hello"
    }
    
    1. protos folder

    My recommendation is to use a folder e.g. protos to hold proto sources, i.e.

    .
    └── protos
        ├── comm
        │   └── comm.proto
        └── test
            └── test.proto
    

    And then e.g.:

    protoc \
    --proto_path=${PWD}/protos \
    --python_out=${PWD} \
    --pyi_out=${PWD} \
    ${PWD}/protos/test/test.proto \
    ${PWD}/protos/comm/comm.proto
    

    NOTE --proto_path extends to include protos and protos must be prefix on the proto sources but the generated code is still by package in ${PWD}

    .
    ├── comm
    │   ├── comm_pb2.py
    │   ├── comm_pb2.pyi
    ├── main.py
    ├── protos
    │   ├── comm
    │   │   └── comm.proto
    │   └── test
    │       └── test.proto
    └─── test
        ├── __init__.py
        ├── test_pb2.py
        └── test_pb2.pyi
    

    This has the advantage of being more universally applicable if you decide to generate e.g. Golang sources too.