If things were as simple as the example in the demo, quantization would be trivial: take the trained model and run just a few lines like these at inference time
backend = "qnnpack"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
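For completeness, the calibration between prepare and convert just means running some representative data through the prepared model so the observers can record activation ranges. A minimal sketch, where calib_loader is a hypothetical DataLoader of your own:

import torch

model_static_quantized.eval()
with torch.no_grad():
    for images, _ in calib_loader:       # hypothetical calibration DataLoader
        model_static_quantized(images)   # observers record min/max here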
But if you just do exactly that, the first thing you hit is this error
NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
Let's first look at the model before and after quantization. Before quantization it looks like this
Blaze(
(conv1): Sequential(
(0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv2): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=24, bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv3): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=24, bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv4): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=24, bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(shortcut): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): Conv2d(24, 48, kernel_size=(1, 1), stride=(1, 1))
(2): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(conv5): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=48, bias=False)
(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv6): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=48, bias=False)
(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv7): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(48, 48, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=48, bias=False)
(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(48, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
(shortcut): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): Conv2d(48, 96, kernel_size=(1, 1), stride=(1, 1))
(2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(conv8): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(conv9): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(conv10): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=96, bias=False)
(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
(shortcut): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(conv11): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(conv12): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=96, bias=False)
(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(loc): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): Conv2d(96, 8, kernel_size=(1, 1), stride=(1, 1))
)
(1): Sequential(
(0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1))
)
)
(conf): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): Conv2d(96, 4, kernel_size=(1, 1), stride=(1, 1))
)
(1): Sequential(
(0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): Conv2d(96, 12, kernel_size=(1, 1), stride=(1, 1))
)
)
(landm): Sequential(
(0): Sequential(
(0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): Conv2d(96, 20, kernel_size=(1, 1), stride=(1, 1))
)
(1): Sequential(
(0): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): Conv2d(96, 60, kernel_size=(1, 1), stride=(1, 1))
)
)
)
And after quantization it looks like this
Blaze(
(conv1): Sequential(
(0): QuantizedConv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), scale=1.0, zero_point=0, padding=(1, 1), bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv2): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=24, bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv3): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=24, bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv4): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(2, 2), scale=1.0, zero_point=0, padding=(2, 2), groups=24, bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(shortcut): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): QuantizedConv2d(24, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
(2): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(conv5): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=48, bias=False)
(1): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv6): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(48, 48, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=48, bias=False)
(1): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv7): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(48, 48, kernel_size=(5, 5), stride=(2, 2), scale=1.0, zero_point=0, padding=(2, 2), groups=48, bias=False)
(1): QuantizedBatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(48, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
(shortcut): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): QuantizedConv2d(48, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
(2): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(conv8): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
(1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(conv9): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
(1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(conv10): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(2, 2), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
(1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
(shortcut): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): QuantizedConv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
(2): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(conv11): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
(1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(conv12): BlazeBlock(
(actvation): ReLU(inplace=True)
(conv): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=96, bias=False)
(1): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): ReLU(inplace=True)
(2): Sequential(
(0): QuantizedConv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), bias=False)
(1): QuantizedBatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): QuantizedConv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
(4): QuantizedBatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): ReLU(inplace=True)
)
)
(loc): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): QuantizedConv2d(96, 8, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
)
(1): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
)
)
(conf): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): QuantizedConv2d(96, 4, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
)
(1): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): QuantizedConv2d(96, 12, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
)
)
(landm): Sequential(
(0): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): QuantizedConv2d(96, 20, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
)
(1): Sequential(
(0): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=96)
(1): ReLU(inplace=True)
(2): QuantizedConv2d(96, 60, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
)
)
)
If you look closely, the ReLU modules appear unchanged. As far as I can tell that is expected: ReLU can operate directly on quantized tensors, so there is no quantized replacement to swap in.
Back to the error above. It happens because the input isn't quantized at inference time; quantized modules and quantized inputs come as a matched set. This is actually mentioned in the link above as well: you need to add these two lines in __init__
self.quant = torch.quantization.QuantStub()      # quantizes the float input on entry
self.dequant = torch.quantization.DeQuantStub()  # converts back to float on the way out
And of course forward needs a few lines as well (a self-contained sketch of the pattern follows this snippet)
inputs = self.quant(inputs)
...
bbox_regressions = torch.cat([o.view(o.size(0), -1, 4) for o in loc], 1)
classifications = torch.cat([o.view(o.size(0), -1, 2) for o in conf], 1)
ldm_regressions = torch.cat([o.view(o.size(0), -1, 10) for o in landm], 1)
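To make the stub pattern concrete, here is a minimal, self-contained sketch of the same idea on a toy module — not the actual Blaze code, names are illustrative:

import torch
import torch.nn as nn

class StubbedNet(nn.Module):
    # float in -> quant -> quantized body -> dequant -> float out
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.body = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)       # becomes a real quantize op after convert()
        x = self.body(x)
        return self.dequant(x)  # heads / postprocessing stay in float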
Run it again, and you get the error below
NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'QuantizedCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
Looks a lot like the error above, doesn't it? But the problem is different: this one is because CUDA doesn't support quantized inference, so the model has to run on the CPU instead. After switching the device to CPU, you get the following error
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, CUDA, Meta, MkldnnCPU, SparseCPU, SparseCUDA, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
This error's format also looks a bit like the ones above, but the cause is different yet again. The model has a shortcut, i.e. a residual structure like in ResNet, which involves an addition, and that addition is exactly what triggers the error. Per some advice from the forums:
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'QuantizedCPU' backend - #2 by dalseeroh - quantization - PyTorch Forums
https://discuss.pytorch.org/t/notimplementederror-could-not-run-aten-empty-memory-format-with-arguments-from-the-quantizedcpu-backend/138618/2
addition, subtraction, multiplication, and division are all unsupported on quantized tensors. To do them inside a quantized model you have to dequant first, do the arithmetic, and then quant again:
h = self.dequant(h)
x = self.dequant(x)
z = h + x
z = self.actvation(z)
return self.quant(z)
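As an aside, PyTorch also ships nn.quantized.FloatFunctional for exactly this residual-add situation; after convert() the add runs as a true quantized add instead of bouncing through dequant/quant. A minimal sketch on a hypothetical block (not the repo's BlazeBlock):

import torch
import torch.nn as nn

class ResidualAdd(nn.Module):
    def __init__(self):
        super().__init__()
        # observed like any other module; becomes a quantized add after convert()
        self.skip_add = nn.quantized.FloatFunctional()
        self.actvation = nn.ReLU(inplace=True)  # keeping the model's spelling

    def forward(self, h, x):
        z = self.skip_add.add(h, x)  # replaces the bare `h + x`
        return self.actvation(z)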
And then, it runs successfully.
Surprise
Then, at prediction time, I found inference now takes 23 ms; before quantization I remember it being around 4 ms. I have no idea what I was doing all this for. And the accuracy:
==================== Results ====================
Easy Val AP: 9.360055428376237e-08
Medium Val AP: 8.086679597040724e-07
Hard Val AP: 3.3702511281364735e-07
=================================================
Round it off and that's basically zero.
The only real upside is that the saved model did get smaller: 785K before quantization, 384K after. I gave up.
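For reference, the on-disk sizes above can be checked with something like this (the file names here are made up):

import os
import torch

torch.save(model.state_dict(), "blaze_float.pth")
torch.save(model_static_quantized.state_dict(), "blaze_quant.pth")
print(os.path.getsize("blaze_float.pth") // 1024, "K")  # ~785K in my case
print(os.path.getsize("blaze_quant.pth") // 1024, "K")  # ~384K in my case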
As for support: dynamic quantization only covers certain layer types, and static quantization likewise only supports certain fusion patterns, e.g.
Convolution, Relu is supported. You might ask, can I do it the other way around, say Relu, Convolution? Answer: no. It's just that picky; only the following ops, in exactly these orders, can be fused (a concrete fusion call follows the list):
Convolution, Batch normalization
Convolution, Batch normalization, Relu
Convolution, Relu
Linear, Relu
Batch normalization, Relu
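Concretely, eager-mode fusion goes through torch.quantization.fuse_modules, which takes submodule names in one of the orders above and should run before prepare(). A minimal sketch against this model's conv1 (the names come from the printout above):

import torch

# "conv1.0" is Conv2d, "conv1.1" is BatchNorm2d, "conv1.2" is ReLU
model_fused = torch.quantization.fuse_modules(
    model.eval(),                         # fusion for PTQ expects eval mode
    [["conv1.0", "conv1.1", "conv1.2"]],  # Conv + BN + ReLU, a supported order
    inplace=False,
)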
I also gave QAT a quick try; apparently QAT doesn't support CUDA either, which I find hard to understand.
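For reference, the eager-mode QAT recipe looks roughly like this — a minimal sketch, not a drop-in for this repo:

import torch

model.train()  # prepare_qat requires train mode
model.qconfig = torch.quantization.get_default_qat_qconfig("qnnpack")
model_prepared = torch.quantization.prepare_qat(model, inplace=False)
# ... fine-tune model_prepared for a few epochs with fake-quant inserted ...
model_prepared.eval()
model_qat = torch.quantization.convert(model_prepared, inplace=False)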
That's all.