tensorrt, nvidia-jetson-nano

Build a TensorRT engine on Jetson Nano


I have the code below to build an engine (the engine file has a .engine extension, not .trt) so I can run TensorRT on a Jetson Nano. Although I configured the engine with FP16, when I run inference I only get the correct class when the dtype of both the input and the output is FP32. If I use dtype FP16 for the input or the output, the predicted class is wrong. I used the weights of a ResNet-50 model in an ONNX file to build this engine. Could someone help and explain to me why that is? Thank you so much!

import os

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
ENGINE_FILE = "resnet50.engine"  # name of the cached, serialized engine file


def build_engine(onnx_file_path):
    engine = None

    runtime = trt.Runtime(TRT_LOGGER)
    engine_file_path = os.path.join(os.getcwd(), ENGINE_FILE)
    if os.path.exists(engine_file_path) and os.path.isfile(engine_file_path):
        # Reuse the previously serialized engine if one exists
        with open(engine_file_path, "rb") as fb:
            engine = runtime.deserialize_cuda_engine(fb.read())
    else:
        explicit_batch_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        # Initialize TensorRT builder, network and ONNX parser
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(explicit_batch_flag)
        parser = trt.OnnxParser(network, TRT_LOGGER)

        # Parse the ONNX model
        with open(onnx_file_path, "rb") as model:
            print("Beginning ONNX file parsing")
            if not parser.parse(model.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                raise RuntimeError("Failed to parse the ONNX file")
        print("Completed parsing of ONNX file")

        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 20  # 1 MiB workspace
        if builder.platform_has_fast_fp16:
            print(builder.platform_has_fast_fp16)
            config.set_flag(trt.BuilderFlag.FP16)

        # Build, cache and deserialize the engine
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)

        with open(engine_file_path, "wb") as f:
            f.write(plan)

    return engine

When I used engine.get_binding_dtype to get the dtype of the input and the output, both were FP32, so I think the error is in the build_engine function.
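For reference, the bindings can be inspected roughly like this (TensorRT 8.x binding API; engine is the object returned by build_engine):

for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    # Both bindings report trt.DataType.FLOAT (FP32) for the engine built above
    print(kind, engine.get_binding_name(i), engine.get_binding_dtype(i))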


Solution

  • When building a TensorRT engine, using FP16 and/or FP32 refers to the internal weights/computation precision used by the engine, not to the type of its input and output bindings.

    Using the trtexec command on a Jetson Xavier NX, I get:

    $ /usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --saveEngine=resnet16.engine --fp16
    ...
    [09/05/2023-17:25:09] [I] === Build Options ===
    [09/05/2023-17:25:09] [I] Max batch: explicit batch
    [09/05/2023-17:25:09] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
    [09/05/2023-17:25:09] [I] minTiming: 1
    [09/05/2023-17:25:09] [I] avgTiming: 8
    [09/05/2023-17:25:09] [I] Precision: FP32+FP16
    ...
    [09/05/2023-17:25:09] [I] Input(s)s format: fp32:CHW
    [09/05/2023-17:25:09] [I] Output(s)s format: fp32:CHW
    [09/05/2023-17:25:09] [I] Input build shapes: model
    [09/05/2023-17:25:09] [I] Input calibration shapes: model
    

    So you still need to pass FP32 as input and you will get FP32 as output.
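    As a minimal sketch (with hypothetical image and output_shape variables), the host buffers you copy to and from such an engine stay float32, even though the engine computes in FP16 internally:

    import numpy as np

    # Host-side buffers for an FP16-built engine that kept FP32 I/O bindings
    input_host = np.ascontiguousarray(image, dtype=np.float32)   # not np.float16
    output_host = np.empty(output_shape, dtype=np.float32)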

    You can use additional options to set the input/output formats:

    $ /usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --saveEngine=resnet16.engine \
        --inputIOFormats="fp16:chw" --outputIOFormats="fp16:chw" --fp16
    ...
    [09/05/2023-17:34:48] [I] Precision: FP32+FP16
    ...
    [09/05/2023-17:34:48] [I] Input(s): fp16:chw
    [09/05/2023-17:34:48] [I] Output(s): fp16:chw
    

    In Python, you will use:

    formats = 1 << int(trt.TensorFormat.CHW2)
    network.get_input(0).allowed_formats = formats
    network.get_output(0).allowed_formats = formats
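
    Note that, as far as I know, allowed_formats alone does not switch the binding dtype reported by engine.get_binding_dtype; the I/O tensors' dtype also has to be set at build time. A minimal sketch for the build_engine function from the question (assuming TensorRT 8.x; CHW2 is an FP16-vectorized layout, while trt.TensorFormat.LINEAR is the plain fp16:chw layout):

    # After parsing the ONNX model, before build_serialized_network:
    config.set_flag(trt.BuilderFlag.FP16)

    fp16_linear = 1 << int(trt.TensorFormat.LINEAR)
    for tensor in (network.get_input(0), network.get_output(0)):
        tensor.dtype = trt.float16           # binding dtype becomes FP16
        tensor.allowed_formats = fp16_linear

    With FP16 bindings, the host buffers passed at inference time must then be np.float16 arrays.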