If your case were like the example in that demo, quantization would be dead simple: once the model is trained, you only need to run the following few lines at inference time:

backend = "qnnpack"  # "fbgemm" for x86 servers, "qnnpack" for ARM/mobile
model.eval()  # static quantization expects the model in eval mode
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
# (a calibration pass over representative data belongs here, otherwise
#  the inserted observers never see real activation ranges)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
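Note that the demo skips calibration entirely, which is why every layer in the dump further down prints scale=1.0, zero_point=0. Between prepare and convert you're expected to push representative data through the model so the observers can record real activation ranges. A hedged sketch of that step (`calibration_loader` is an illustrative placeholder, not something from the demo):

```python
# Hedged sketch: eager-mode static quantization wants a calibration pass
# between prepare() and convert(). `calibration_loader` stands in for any
# iterable of representative (input, label) batches.
import torch

def calibrate(prepared_model, calibration_loader, num_batches=32):
    """Feed a few representative batches through the prepared model so
    the inserted observers can record real activation ranges."""
    prepared_model.eval()
    with torch.no_grad():
        for i, (images, _) in enumerate(calibration_loader):
            prepared_model(images)
            if i + 1 >= num_batches:
                break
```

After this, convert() bakes the observed ranges into each layer's scale and zero_point.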

But if you just follow the demo as-is, the first thing you run into is this error:

NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

Let's first look at the model before and after quantization. Before, it looks like this:

Blaze(
  (conv1): Sequential(
    (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (conv2): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=24, bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv3): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=24, bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv4): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=24, bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (shortcut): Sequential(
      (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (1): Conv2d(24, 48, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (conv5): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=48, bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv6): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=48, bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv7): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(48, 48, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=48, bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
    (shortcut): Sequential(
      (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (1): Conv2d(48, 96, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (conv8): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (conv9): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (conv10): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(96, 96, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=96, bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
    (shortcut): Sequential(
      (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (1): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (conv11): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (conv12): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (loc): Sequential(
    (0): Sequential(
      (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): Conv2d(96, 8, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): Sequential(
      (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (conf): Sequential(
    (0): Sequential(
      (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): Conv2d(96, 4, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): Sequential(
      (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): Conv2d(96, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (landm): Sequential(
    (0): Sequential(
      (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): Conv2d(96, 20, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): Sequential(
      (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): Conv2d(96, 60, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)

After quantization it looks like this:

Blaze(
  (conv1): Sequential(
    (0): QuantizedConv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), scale=1.0, zero_point=0, padding=(1, 1), bias=False)
    (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (conv2): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=24, bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv3): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=24, bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv4): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(2, 2), scale=1.0, zero_point=0, padding=(2, 2), groups=24, bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (shortcut): Sequential(
      (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (1): QuantizedConv2d(24, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
      (2): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (conv5): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=48, bias=False)
        (1): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv6): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=48, bias=False)
        (1): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (conv7): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(48, 48, kernel_size=(5, 5), stride=(2, 2), scale=1.0, zero_point=0, padding=(2, 2), groups=48, bias=False)
        (1): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(48, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
    (shortcut): Sequential(
      (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (1): QuantizedConv2d(48, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
      (2): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (conv8): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
        (1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (conv9): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
        (1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (conv10): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(2, 2), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
        (1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
    (shortcut): Sequential(
      (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (1): QuantizedConv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
      (2): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (conv11): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
        (1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (conv12): BlazeBlock(
    (actvation): ReLU(inplace=True)
    (conv): Sequential(
      (0): Sequential(
        (0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
        (1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
        (1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): ReLU(inplace=True)
    )
  )
  (loc): Sequential(
    (0): Sequential(
      (0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): QuantizedConv2d(96, 8, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
    )
    (1): Sequential(
      (0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
    )
  )
  (conf): Sequential(
    (0): Sequential(
      (0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): QuantizedConv2d(96, 4, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
    )
    (1): Sequential(
      (0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): QuantizedConv2d(96, 12, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
    )
  )
  (landm): Sequential(
    (0): Sequential(
      (0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): QuantizedConv2d(96, 20, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
    )
    (1): Sequential(
      (0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
      (1): ReLU(inplace=True)
      (2): QuantizedConv2d(96, 60, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
    )
  )
)

If you look closely, the ReLU layers appear unchanged. That actually seems to be expected: ReLU can run directly on quantized tensors, so convert leaves it as-is. What is more suspicious is that every layer shows scale=1.0, zero_point=0, which suggests the observers never saw any calibration data.

Back to that error: it happens because the input is never quantized at inference time, and quantized ops only accept quantized inputs. The link above actually mentions this too: you need to add these two lines to the model's __init__:

self.quant = torch.quantization.QuantStub()      # marks where tensors enter the quantized region
self.dequant = torch.quantization.DeQuantStub()  # marks where they leave it

And of course the forward also needs a few lines like these:

inputs = self.quant(inputs)
...
bbox_regressions = torch.cat([o.view(o.size(0), -1, 4) for o in loc], 1)
classifications = torch.cat([o.view(o.size(0), -1, 2) for o in conf], 1)
ldm_regressions = torch.cat([o.view(o.size(0), -1, 10) for o in landm], 1)
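Putting the two pieces together, here's a hedged toy sketch of the stub wiring (TinyNet and its layer sizes are made up for illustration, not the real Blaze model):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # QuantStub/DeQuantStub mark where tensors enter and leave the
        # quantized region; convert() turns them into real (de)quantize ops.
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.quant(x)             # quantize the input first
        x = self.relu(self.conv(x))
        return self.dequant(x)        # hand float tensors back to the caller
```

Before convert() is called, the stubs are plain pass-throughs, so the float model behaves exactly as before.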

Then, on running it again, the next error appears:

NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'QuantizedCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

This looks a lot like the previous error, but the cause is different: CUDA doesn't support quantized ops, so inference has to run on the CPU instead. After switching the device to CPU, the following error shows up:

NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, CUDA, Meta, MkldnnCPU, SparseCPU, SparseCUDA, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

This error again looks similar in form, yet the problem is different once more. The model has a shortcut, i.e. a residual connection like in ResNet, which involves an addition, and it's that add on quantized tensors that triggers the error. Following some advice from the forums:

NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'QuantizedCPU' backend - #2 by dalseeroh - quantization - PyTorch Forums
https://discuss.pytorch.org/t/notimplementederror-could-not-run-aten-empty-memory-format-with-arguments-from-the-quantizedcpu-backend/138618/2

add/subtract/multiply/divide are all unsupported on quantized tensors. To do the addition inside a quantized model, you have to dequant first, do the arithmetic, then quant again:

# inside BlazeBlock.forward: dequantize both branches before the add
h = self.dequant(h)
x = self.dequant(x)
z = h + x
z = self.actvation(z)

return self.quant(z)
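Working the arithmetic by hand shows why: the two branches are generally quantized with different scales, so their raw integers live on different grids and can't just be added. A pure-Python sketch of the dequant → add → requant pattern (all scales and values here are made up for illustration):

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization: real value -> clamped uint8 integer."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Affine dequantization: integer -> approximate real value."""
    return (q - zero_point) * scale

# Two branches quantized with different parameters:
h_real, h_scale, h_zp = 0.50, 0.02, 10
x_real, x_scale, x_zp = 0.30, 0.05, 3
h_q = quantize(h_real, h_scale, h_zp)   # -> 35
x_q = quantize(x_real, x_scale, x_zp)   # -> 9

# Adding the raw integers (35 + 9) is meaningless: which scale would the
# sum even use? The supported pattern: dequantize, add in float, requantize.
z = dequantize(h_q, h_scale, h_zp) + dequantize(x_q, x_scale, x_zp)
z_q = quantize(z, 0.02, 0)              # requantize with the output's own params
```

Inside a real quantized model the requantization parameters come from the output observer, not hand-picked constants like here.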

And with that, it finally runs.

Surprise

Then at prediction time I found that inference now takes 23 ms, while before quantization it was around 4 ms if I remember correctly. I'm not sure what I gained. And the accuracy:

==================== Results ====================
Easy   Val AP: 9.360055428376237e-08
Medium Val AP: 8.086679597040724e-07
Hard   Val AP: 3.3702511281364735e-07
=================================================

Rounded off, that's basically zero.

The only upside is that the saved model really is smaller: 785K before quantization, 384K after. (int8 weights alone would suggest something closer to a 4x reduction; presumably the BatchNorm layers that were never fused and the per-layer quantization parameters account for the gap.) At this point I gave up.

Just as dynamic quantization only supports certain layer types, static quantization only supports fusing certain orderings of layers. For example,

Convolution, Relu is supported. So can you do the reverse, say Relu, Convolution? Answer: no. It's just that picky. Only the following ops, in exactly these orders, can be fused:

Convolution, Batch normalization
Convolution, Batch normalization, Relu
Convolution, Relu
Linear, Relu
Batch normalization, Relu
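For reference, those patterns are exactly what torch.quantization.fuse_modules accepts. A small sketch on a toy Sequential (the module names "0", "1", "2" are just the Sequential's child indices, not anything from the Blaze model):

```python
import torch
import torch.nn as nn

m = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(inplace=True),
).eval()  # conv-bn folding requires eval mode

# Fuse the supported Conv -> BN -> ReLU pattern into one module; the fused
# positions after the first are replaced with nn.Identity placeholders.
fused = torch.quantization.fuse_modules(m, [["0", "1", "2"]])
```

Asking for the reverse order (ReLU before Conv) simply isn't in the supported pattern list, so fuse_modules rejects it.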

I also gave QAT a quick try; apparently QAT doesn't support CUDA either, which I can't wrap my head around.

That's all.