1. Problem

Error message:

[TensorRT] ERROR: CUDA cask failure at execution for trt_maxwell_scudnn_128x64_relu_small_nn_v1.
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 400 (invalid resource handle)
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 400 (invalid resource handle)

Solution:
The likely cause is a mismatch between the worker threads created by Flask and the CUDA context: the context created in the main thread is not current in the thread that handles the request, so execution fails with `invalid resource handle`.
Reference: https://github.com/jkjung-avt/tensorrt_demos/issues/213
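
The fix below follows the pattern from that issue: create one CUDA context in the main thread and hand it to the net object, which pushes/pops it around every CUDA call. A minimal sketch of the creation step, assuming pycuda is used as in the referenced repo (`Net` and `engine_path` are placeholder names, not from the original code):

import pycuda.driver as cuda

cuda.init()                                # initialize the CUDA driver API
cuda_ctx = cuda.Device(0).make_context()   # create a context on GPU 0; it becomes current in this thread
net = Net(engine_path, cuda_ctx=cuda_ctx)  # placeholder constructor: pass the shared context in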

  1. Add a CUDA context member to the net class, and make it current while allocating buffers:
# keep the CUDA context created in the main thread and push it
# so buffer allocation for the engine runs under that context
self.cuda_ctx = cuda_ctx
if self.cuda_ctx:
    self.cuda_ctx.push()

try:
    self.inputs, self.outputs, self.bindings, self.stream, self.input_shape, self.infer_dtype = \
        allocate_buffer(self.engine)
except Exception as e:
    raise RuntimeError("failed to allocate cuda/host buffer") from e
finally:
    # always restore the previous context, even if allocation failed
    if self.cuda_ctx:
        self.cuda_ctx.pop()
  2. At inference time, push the same context before running the engine and pop it afterwards:
# make the shared context current in the thread handling this request
if self.cuda_ctx:
    self.cuda_ctx.push()
try:
    # copy the preprocessed image into the host buffer and run the engine
    self.inputs[0].host = np.ascontiguousarray(img_resized)
    out = do_inference(self.context, self.bindings, self.inputs, self.outputs, self.stream)
    out = self.postprocess(out)
finally:
    if self.cuda_ctx:
        self.cuda_ctx.pop()
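
For reference, a hedged sketch of how the pieces fit together in a Flask service (the route name, `decode_image`, and `net.inference` are placeholder names; `net` is the object created with the shared `cuda_ctx` above):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    img_resized = decode_image(request.data)  # placeholder preprocessing
    out = net.inference(img_resized)          # runs the push -> infer -> pop sequence above
    return jsonify(out)                       # assumes the postprocessed output is JSON-serializable

Because Flask may run each request on a different thread, the push/pop inside the net is what keeps the engine bound to the context it was built under.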