I have the code below to build an engine (the engine file uses the .engine extension, not .trt) to use TensorRT on a Jetson Nano. Although I configured the engine with FP16, at inference time I only get the correct class when the dtype of both the input and the output is FP32. If I use dtype FP16 for the input or the output, the predicted class is wrong. I used the weights of a resnet50 model in an ONNX file to build this engine. Could someone help and explain to me why this is? Thank you so much!
def build_engine(onnx_file_path):
    engine = None
    runtime = trt.Runtime(TRT_LOGGER)
    engine_file_path = os.path.join(os.getcwd(), ENGINE_FILE)
    if os.path.exists(engine_file_path) and os.path.isfile(engine_file_path):
        with open(engine_file_path, "rb") as fb:
            engine = fb.read()
        engine = runtime.deserialize_cuda_engine(engine)
    else:
        explicit_batch_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        # initialize TensorRT engine and parse ONNX model
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(explicit_batch_flag)
        parser = trt.OnnxParser(network, TRT_LOGGER)
        # parse ONNX
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            parser.parse(model.read())
            print('Completed parsing of ONNX file')
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 20
        if builder.platform_has_fast_fp16:
            print(builder.platform_has_fast_fp16)
            config.set_flag(trt.BuilderFlag.FP16)
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        with open(engine_file_path, "wb") as f:
            f.write(plan)
    return engine
When I used engine.get_binding_dtype to check the input and output dtypes, both were FP32, so I think the error is in the build_engine function.
When building a TensorRT engine, using FP16 and/or FP32 refers to the internal weight/processing precision used by the engine, not to the dtype of the input and output bindings.
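You can see the same thing from the Python API: an engine built with the FP16 flag still reports FP32 bindings. A minimal check, assuming engine is the object returned by your build_engine (this snippet is mine, not part of your code):

import tensorrt as trt

# Inspect every binding of the deserialized engine; with the FP16 builder flag
# and no other changes, these still come back as DataType.FLOAT (FP32).
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(engine.get_binding_name(i), kind, engine.get_binding_dtype(i))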
Using the trtexec command on a Jetson Xavier NX, I get:
$ /usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --saveEngine=resnet16.engine --fp16
...
[09/05/2023-17:25:09] [I] === Build Options ===
[09/05/2023-17:25:09] [I] Max batch: explicit batch
[09/05/2023-17:25:09] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/05/2023-17:25:09] [I] minTiming: 1
[09/05/2023-17:25:09] [I] avgTiming: 8
[09/05/2023-17:25:09] [I] Precision: FP32+FP16
...
[09/05/2023-17:25:09] [I] Input(s)s format: fp32:CHW
[09/05/2023-17:25:09] [I] Output(s)s format: fp32:CHW
[09/05/2023-17:25:09] [I] Input build shapes: model
[09/05/2023-17:25:09] [I] Input calibration shapes: model
So you still need to pass fp32 as input, and you will get fp32 as output.
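In practice this means the host buffers you copy to and from the GPU must be float32; feeding FP16 bytes into an FP32 binding is the most likely reason you see the wrong class. A minimal inference sketch under that assumption (the infer helper, the pycuda setup, and the static-shape assumption are mine, not from your code):

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

def infer(engine, image):
    # Hypothetical helper: run one preprocessed image through the engine.
    # Bindings are FP32, so host arrays must be float32 even though the
    # engine was built with the FP16 flag. Assumes static input shapes.
    context = engine.create_execution_context()
    input_idx = 0 if engine.binding_is_input(0) else 1
    output_idx = 1 - input_idx

    h_input = np.ascontiguousarray(image, dtype=np.float32)
    h_output = np.empty(trt.volume(engine.get_binding_shape(output_idx)), dtype=np.float32)

    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)

    cuda.memcpy_htod(d_input, h_input)
    context.execute_v2([int(d_input), int(d_output)])
    cuda.memcpy_dtoh(h_output, d_output)
    return h_output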
You could use additional options to set the input/output formats:
$ /usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --saveEngine=resnet16.engine \
--inputIOFormats="fp16:chw" --outputIOFormats="fp16:chw" --fp16
...
[09/05/2023-17:34:48] [I] Precision: FP32+FP16
...
[09/05/2023-17:34:48] [I] Input(s): fp16:chw
[09/05/2023-17:34:48] [I] Output(s): fp16:chw
In Python, you will use:
formats = 1 << int(trt.TensorFormat.CHW2)
network.get_input(0).allowed_formats = formats
network.get_output(0).allowed_formats = formats
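Note that, as far as I know, setting allowed_formats alone is usually not enough to change the binding type; you also have to set the dtype of the network's I/O tensors before building. A sketch of what that could look like inside your build_engine, after parser.parse(...) and before build_serialized_network(...). I use LINEAR here instead of CHW2 because it keeps the plain contiguous layout a numpy array already has:

# allowed_formats restricts the memory layout; dtype is what actually turns
# the bindings into FP16. LINEAR = plain contiguous, CHW2 = FP16 vectorized layout.
formats = 1 << int(trt.TensorFormat.LINEAR)
network.get_input(0).allowed_formats = formats
network.get_input(0).dtype = trt.float16
network.get_output(0).allowed_formats = formats
network.get_output(0).dtype = trt.float16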