Deep Learning Model Zoo

1. Image Classification

Dataset: ImageNet (1000 classes)

1.1 Quantization

 

Paddle-Lite latency of classification models (ms). A sketch of how the quantization strategies are invoked follows the table.

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
|---|---|---|---|---|---|---|---|---|
| Qualcomm 835 | MobileNetV1 | FP32 baseline | 96.1942 | 53.2058 | 32.4468 | 88.4955 | 47.95 | 27.5189 |
| Qualcomm 835 | MobileNetV1 | quant_aware | 60.8186 | 32.1931 | 16.4275 | 56.4311 | 29.5446 | 15.1053 |
| Qualcomm 835 | MobileNetV1 | quant_post | 60.5615 | 32.4016 | 16.6596 | 56.5266 | 29.7178 | 15.1459 |
| Qualcomm 835 | MobileNetV2 | FP32 baseline | 65.715 | 38.1346 | 25.155 | 61.3593 | 36.2038 | 22.849 |
| Qualcomm 835 | MobileNetV2 | quant_aware | 48.3655 | 30.2021 | 21.9303 | 46.1487 | 27.3146 | 18.3053 |
| Qualcomm 835 | MobileNetV2 | quant_post | 48.3495 | 30.3069 | 22.1506 | 45.8715 | 27.4105 | 18.2223 |
| Qualcomm 835 | ResNet50 | FP32 baseline | 526.811 | 319.6486 | 205.8345 | 506.1138 | 335.1584 | 214.8936 |
| Qualcomm 835 | ResNet50 | quant_aware | 475.4538 | 256.8672 | 139.699 | 461.7344 | 247.9506 | 145.9847 |
| Qualcomm 835 | ResNet50 | quant_post | 476.0507 | 256.5963 | 139.7266 | 461.9176 | 248.3795 | 149.353 |
| Qualcomm 855 | MobileNetV1 | FP32 baseline | 33.5086 | 19.5773 | 11.7534 | 31.3474 | 18.5382 | 10.0811 |
| Qualcomm 855 | MobileNetV1 | quant_aware | 36.7067 | 21.628 | 11.0372 | 14.0238 | 8.199 | 4.2588 |
| Qualcomm 855 | MobileNetV1 | quant_post | 37.0498 | 21.7081 | 11.0779 | 14.0947 | 8.1926 | 4.2934 |
| Qualcomm 855 | MobileNetV2 | FP32 baseline | 25.0396 | 15.2862 | 9.6609 | 22.909 | 14.1797 | 8.8325 |
| Qualcomm 855 | MobileNetV2 | quant_aware | 28.1583 | 18.3317 | 11.8103 | 16.9158 | 11.1606 | 7.4148 |
| Qualcomm 855 | MobileNetV2 | quant_post | 28.1631 | 18.3917 | 11.8333 | 16.9399 | 11.1772 | 7.4176 |
| Qualcomm 855 | ResNet50 | FP32 baseline | 185.3705 | 113.0825 | 87.0741 | 177.7367 | 110.0433 | 74.4114 |
| Qualcomm 855 | ResNet50 | quant_aware | 327.6883 | 202.4536 | 106.243 | 243.5621 | 150.0542 | 78.4205 |
| Qualcomm 855 | ResNet50 | quant_post | 328.2683 | 201.9937 | 106.744 | 242.6397 | 150.0338 | 79.8659 |
| Kirin 970 | MobileNetV1 | FP32 baseline | 101.2455 | 56.4053 | 35.6484 | 94.8985 | 51.7251 | 31.9511 |
| Kirin 970 | MobileNetV1 | quant_aware | 62.5012 | 32.1863 | 16.6018 | 57.7477 | 29.2116 | 15.0703 |
| Kirin 970 | MobileNetV1 | quant_post | 62.4412 | 32.2585 | 16.6215 | 57.825 | 29.2573 | 15.1206 |
| Kirin 970 | MobileNetV2 | FP32 baseline | 70.4176 | 42.0795 | 25.1939 | 68.9597 | 39.2145 | 22.6617 |
| Kirin 970 | MobileNetV2 | quant_aware | 52.9961 | 31.5323 | 22.1447 | 49.4858 | 28.0856 | 18.7287 |
| Kirin 970 | MobileNetV2 | quant_post | 53.0961 | 31.7987 | 21.8334 | 49.383 | 28.2358 | 18.3642 |
| Kirin 970 | ResNet50 | FP32 baseline | 586.8943 | 344.0858 | 228.2293 | 573.3344 | 351.4332 | 225.8006 |
| Kirin 970 | ResNet50 | quant_aware | 488.361 | 260.1697 | 142.416 | 479.5668 | 249.8485 | 138.1742 |
| Kirin 970 | ResNet50 | quant_post | 489.6188 | 258.3279 | 142.6063 | 480.0064 | 249.5339 | 138.5284 |
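
In this table, quant_post denotes offline (post-training) quantization and quant_aware denotes quantization-aware training, both applied with PaddleSlim before deployment through Paddle-Lite. As orientation only, here is a minimal sketch of the post-training path, assuming the PaddleSlim 1.x static-graph API of this model zoo's era; the module path, argument names, file paths and the calibration reader are illustrative assumptions and should be checked against the installed version.

```python
# Sketch only: offline (post-training) quantization with PaddleSlim 1.x.
# Paths, names and signatures below are assumptions for illustration.
import numpy as np
import paddle.fluid as fluid
from paddleslim.quant import quant_post

place = fluid.CPUPlace()
exe = fluid.Executor(place)

def calib_reader():
    # Stand-in calibration reader yielding samples shaped like the network
    # input; replace with a reader over a few hundred real images.
    for _ in range(32):
        yield [np.random.rand(3, 224, 224).astype('float32')]

quant_post(
    executor=exe,
    model_dir='./mobilenet_v1_fp32_infer',      # exported FP32 inference model (assumed path)
    quantize_model_path='./mobilenet_v1_int8',  # where the int8 model is written
    sample_generator=calib_reader,              # calibration data source
    batch_size=16,
    batch_nums=2)                               # 16 x 2 = 32 calibration samples
```

The saved quantized model is then converted with Paddle-Lite's model optimization tool and benchmarked on device to obtain latencies like those above; a quantization-aware training sketch appears in section 2.1.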

1.2 Pruning

Notes on the PaddleLite latency numbers:

Environment: Qualcomm Snapdragon 845 + armv8

Latency metric: Thread1/Thread2/Thread4 latency

PaddleLite version: v2.3

A pruning usage sketch follows the table.

| Model | Compression method | Top-1/Top-5 Acc | Model size (MB) | GFLOPs | PaddleLite latency (ms), Thread1\Thread2\Thread4 | TensorRT throughput (FPS) |
|---|---|---|---|---|---|---|
| MobileNetV1 | Baseline | 70.99%/89.68% | 17 | 1.11 | 66.052\35.8014\19.5762 | - |
| MobileNetV1 | uniform -50% | 69.4%/88.66% (-1.59%/-1.02%) | 9 | 0.56 | 33.5636\18.6834\10.5076 | - |
| MobileNetV1 | sensitive -30% | 70.4%/89.3% (-0.59%/-0.38%) | 12 | 0.74 | 46.5958\25.3098\13.6982 | - |
| MobileNetV1 | sensitive -50% | 69.8%/88.9% (-1.19%/-0.78%) | 9 | 0.56 | 37.9892\20.7882\11.3144 | - |
| MobileNetV2 | - | 72.15%/90.65% | 15 | 0.59 | 41.7874\23.375\13.3998 | - |
| MobileNetV2 | uniform -50% | 65.79%/86.11% (-6.35%/-4.47%) | 11 | 0.296 | 23.8842\13.8698\8.5572 | - |
| ResNet34 | - | 72.15%/90.65% | 84 | 7.36 | 217.808\139.943\96.7504 | 342.32 |
| ResNet34 | uniform -50% | 70.99%/89.95% (-1.36%/-0.87%) | 41 | 3.67 | 114.787\75.0332\51.8438 | 452.41 |
| ResNet34 | auto -55.05% | 70.24%/89.63% (-2.04%/-1.06%) | 33 | 3.31 | 105.924\69.3222\48.0246 | 457.25 |
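
In the compression-method column, "uniform -X%" prunes every prunable convolution by the same ratio, "sensitive" distributes per-layer ratios according to a sensitivity analysis, and "auto" searches the ratios automatically; the pruned model is then fine-tuned. The following is a minimal sketch of filter pruning on a toy static-graph network, assuming the PaddleSlim 1.x API; layer and parameter names are illustrative and signatures should be checked against the installed version.

```python
# Sketch only: filter pruning with PaddleSlim 1.x on a toy static-graph net.
# Names and signatures are assumptions to verify against your installation.
import paddle.fluid as fluid
from paddleslim.prune import Pruner

main_prog, startup_prog = fluid.Program(), fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    image = fluid.data(name='image', shape=[None, 3, 32, 32], dtype='float32')
    conv = fluid.layers.conv2d(
        image, num_filters=32, filter_size=3,
        param_attr=fluid.ParamAttr(name='conv1_weights'))
    out = fluid.layers.fc(conv, size=10)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)

# "uniform -50%" corresponds to pruning every prunable conv with ratio 0.5;
# "sensitive" pruning instead derives per-layer ratios from a sensitivity
# analysis (e.g. paddleslim.prune.sensitivity) before calling the pruner.
pruner = Pruner()
pruned_prog, _, _ = pruner.prune(
    main_prog,
    fluid.global_scope(),
    params=['conv1_weights'],
    ratios=[0.5],
    place=place)
# pruned_prog is then fine-tuned to recover accuracy before export.
```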

1.3 Distillation

| Model | Compression method | Top-1/Top-5 Acc | Model size (MB) |
|---|---|---|---|
| MobileNetV1 | student | 70.99%/89.68% | 17 |
| ResNet50_vd | teacher | 79.12%/94.44% | 99 |
| MobileNetV1 | ResNet50_vd distill | 72.77%/90.68% (+1.78%/+1.00%) | 17 |
| MobileNetV2 | student | 72.15%/90.65% | 15 |
| MobileNetV2 | ResNet50_vd distill | 74.28%/91.53% (+2.13%/+0.88%) | 15 |
| ResNet50 | student | 76.50%/93.00% | 99 |
| ResNet101 | teacher | 77.56%/93.64% | 173 |
| ResNet50 | ResNet101 distill | 77.29%/93.65% (+0.79%/+0.65%) | 99 |

Note: the "_vd" suffix indicates that the pretrained model was trained with Mixup; for an introduction to Mixup, see "mixup: Beyond Empirical Risk Minimization".
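
The "distill" rows use the standard teacher-student setup: the frozen teacher's softened predictions provide an extra supervision signal on top of the student's own task loss. Purely as an illustration of that signal (PaddleSlim wraps the same idea in its distillation API), here is a framework-free sketch of the soft-label loss:

```python
# Illustration only: the soft-label (knowledge distillation) loss behind the
# "distill" rows, written framework-free with numpy.
import numpy as np

def softmax(x, temperature=1.0):
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / temperature)
    return z / z.sum(axis=-1, keepdims=True)

def soft_label_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -np.mean(np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1))

# Toy example: a MobileNetV1 student matching a ResNet50_vd teacher on a
# batch of 8 ImageNet-style logits (random numbers stand in for real outputs).
teacher_logits = np.random.randn(8, 1000)
student_logits = np.random.randn(8, 1000)
loss = soft_label_loss(teacher_logits, student_logits)
print('distillation loss: %.4f' % loss)
# During training this term is added to the student's ordinary task loss.
```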

1.4 Search (NAS)

Dataset: ImageNet (1000 classes)

| Model | Compression method | Top-1/Top-5 Acc | Model size (MB) | GFLOPs |
|---|---|---|---|---|
| MobileNetV2 | - | 72.15%/90.65% | 15 | 0.59 |
| MobileNetV2 | SANAS | 71.518%/90.208% (-0.632%/-0.442%) | 14 | 0.295 |

Dataset: Cifar10

| Model | Compression method | Acc | Model parameters (MB) | Download |
|---|---|---|---|---|
| Darts | - | 97.135% | 3.767 | - |
| Darts_SA (based on the Darts search space) | SANAS | 97.276% (+0.141%) | 3.344 (-11.2%) | - |

Note: the tokens of MobileNetV2_NAS are [4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6, 2, 0, 3, 4, 5, 0, 4, 5, 5, 1, 4, 8, 0, 0]. The tokens of Darts_SA are [5, 5, 0, 5, 5, 10, 7, 7, 5, 7, 7, 11, 10, 12, 10, 0, 5, 3, 10, 8].
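
SANAS is PaddleSlim's simulated-annealing architecture search; the token lists above encode the sampled architecture within its search space, so with the same search space the tokens can be decoded back into a network. A minimal sketch, assuming the PaddleSlim 1.x SANAS interface (the search-space name, constructor arguments and method names should be verified against the installed version):

```python
# Sketch only: working with paddleslim.nas.SANAS tokens (PaddleSlim 1.x).
# The search-space name and method signatures are assumptions.
import paddle.fluid as fluid
from paddleslim.nas import SANAS

# Search space assumed for the MobileNetV2_NAS result above.
sanas = SANAS(configs=[('MobileNetV2Space')], server_addr=("", 8881))

# Decode the reported token list back into network-building functions.
tokens = [4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6, 2, 0, 3, 4, 5, 0, 4, 5, 5, 1, 4, 8, 0, 0]
archs = sanas.tokens2arch(tokens)

main_prog, startup_prog = fluid.Program(), fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
    net = image
    for build_block in archs:   # each entry builds one part of the network
        net = build_block(net)

# During a search, the loop is instead: archs = sanas.next_archs(), train and
# evaluate the candidate, then report the score with sanas.reward(score).
```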

2. Object Detection

2.1 Quantization

Dataset: COCO 2017. A quantization-aware training sketch follows the tables in this subsection.

| Model | Compression method | Dataset | Images/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) | TensorRT latency (V100, ms) |
|---|---|---|---|---|---|---|---|---|
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.1 | 95 | - |
| MobileNet-V1-YOLOv3 | quant_post | COCO | 8 | 27.9 (-1.4) | 28.0 (-1.3) | 26.0 (-1.0) | 25 | - |
| MobileNet-V1-YOLOv3 | quant_aware | COCO | 8 | 28.1 (-1.2) | 28.2 (-1.1) | 25.8 (-1.2) | 26.3 | - |
| R34-YOLOv3 | - | COCO | 8 | 36.2 | 34.3 | 31.4 | 162 | - |
| R34-YOLOv3 | quant_post | COCO | 8 | 35.7 (-0.5) | - | - | 42.7 | - |
| R34-YOLOv3 | quant_aware | COCO | 8 | 35.2 (-1.0) | 33.3 (-1.0) | 30.3 (-1.1) | 44 | - |
| R50-dcn-YOLOv3 obj365_pretrain | - | COCO | 8 | 41.4 | - | - | 177 | 18.56 |
| R50-dcn-YOLOv3 obj365_pretrain | quant_aware | COCO | 8 | 40.6 (-0.8) | 37.5 | 34.1 | 66 | 14.64 |

Dataset: WIDER FACE

| Model | Compression method | Images/GPU | Input size | Easy/Medium/Hard | Model size (KB) |
|---|---|---|---|---|---|
| BlazeFace | - | 8 | 640 | 91.5/89.2/79.7 | 815 |
| BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 |
| BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 |
| BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 |
| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 |
| BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 |
| BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 |
| BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 |
| BlazeFace-NAS | quant_aware | 8 | 640 | 83.1/79.7/64.2 (-0.6/-1.0/-1.6) | 71 |
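
quant_aware in these tables denotes quantization-aware training: the graph is rewritten with fake-quantization ops, the model is fine-tuned, and the graph is then frozen into a real int8 inference model. Below is a minimal sketch on a toy static-graph network, assuming the PaddleSlim 1.x API; the config keys and signatures are assumptions to verify against the installed version.

```python
# Sketch only: quantization-aware training (QAT) with PaddleSlim 1.x.
# Config keys and signatures are assumptions for illustration.
import paddle.fluid as fluid
from paddleslim.quant import quant_aware, convert

main_prog, startup_prog = fluid.Program(), fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
    conv = fluid.layers.conv2d(image, num_filters=16, filter_size=3)
    logits = fluid.layers.fc(conv, size=80)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)

config = {
    'weight_quantize_type': 'channel_wise_abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
}

# Insert fake-quant/dequant ops into the training and eval graphs.
quant_train_prog = quant_aware(main_prog, place, config, for_test=False)
quant_eval_prog = quant_aware(main_prog.clone(for_test=True), place, config,
                              for_test=True)

# ... fine-tune quant_train_prog for a few epochs here ...

# Freeze the fake-quant graph into a real int8 inference program that
# Paddle-Lite / TensorRT can consume.
int8_infer_prog = convert(quant_eval_prog, place, config)
```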

2.2 Pruning

Dataset: Pascal VOC & COCO 2017

Notes on the PaddleLite latency numbers:

Environment: Qualcomm Snapdragon 845 + armv8

Latency metric: Thread1/Thread2/Thread4 latency

PaddleLite version: v2.3

| Model | Compression method | Dataset | Images/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) | GFLOPs (608x608) | PaddleLite latency (ms) (608x608) | TensorRT throughput (FPS) (608x608) |
|---|---|---|---|---|---|---|---|---|---|---|
| MobileNet-V1-YOLOv3 | Baseline | Pascal VOC | 8 | 76.2 | 76.7 | 75.3 | 94 | 40.49 | 1238\796.943\520.101 | 60.04 |
| MobileNet-V1-YOLOv3 | sensitive -52.88% | Pascal VOC | 8 | 77.6 (+1.4) | 77.7 (+1.0) | 75.5 (+0.2) | 31 | 19.08 | 602.497\353.759\222.427 | 99.36 |
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.0 | 95 | 41.35 | - | - |
| MobileNet-V1-YOLOv3 | sensitive -51.77% | COCO | 8 | 26.0 (-3.3) | 25.1 (-4.2) | 22.6 (-4.4) | 32 | 19.94 | - | 73.93 |
| R50-dcn-YOLOv3 | - | COCO | 8 | 39.1 | - | - | 177 | 89.60 | - | 27.68 |
| R50-dcn-YOLOv3 | sensitive -9.37% | COCO | 8 | 39.3 (+0.2) | - | - | 150 | 81.20 | - | 30.08 |
| R50-dcn-YOLOv3 | sensitive -24.68% | COCO | 8 | 37.3 (-1.8) | - | - | 113 | 67.48 | - | 34.32 |
| R50-dcn-YOLOv3 obj365_pretrain | - | COCO | 8 | 41.4 | - | - | 177 | 89.60 | - | - |
| R50-dcn-YOLOv3 obj365_pretrain | sensitive -9.37% | COCO | 8 | 40.5 (-0.9) | - | - | 150 | 81.20 | - | - |
| R50-dcn-YOLOv3 obj365_pretrain | sensitive -24.68% | COCO | 8 | 37.8 (-3.3) | - | - | 113 | 67.48 | - | - |

2.3 Distillation

Dataset: Pascal VOC & COCO 2017

| Model | Compression method | Dataset | Images/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) |
|---|---|---|---|---|---|---|---|
| MobileNet-V1-YOLOv3 | - | Pascal VOC | 8 | 76.2 | 76.7 | 75.3 | 94 |
| ResNet34-YOLOv3 | - | Pascal VOC | 8 | 82.6 | 81.9 | 80.1 | 162 |
| MobileNet-V1-YOLOv3 | ResNet34-YOLOv3 distill | Pascal VOC | 8 | 79.0 (+2.8) | 78.2 (+1.5) | 75.5 (+0.2) | 94 |
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.0 | 95 |
| ResNet34-YOLOv3 | - | COCO | 8 | 36.2 | 34.3 | 31.4 | 163 |
| MobileNet-V1-YOLOv3 | ResNet34-YOLOv3 distill | COCO | 8 | 31.4 (+2.1) | 30.0 (+0.7) | 27.1 (+0.1) | 95 |

2.4 Search (NAS)

Dataset: WIDER FACE

| Model | Compression method | Images/GPU | Input size | Easy/Medium/Hard | Model size (KB) | Hardware latency (ms) |
|---|---|---|---|---|---|---|
| BlazeFace | - | 8 | 640 | 91.5/89.2/79.7 | 815 | 71.862 |
| BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | 21.117 |
| BlazeFace-NASV2 | SANAS | 8 | 640 | 87.0/83.7/68.5 | 389 | 22.558 |

Note: the hardware latency is estimated from the provided hardware latency table, which was measured with PaddleLite on a Snapdragon 855 chip. The detailed configuration of BlazeFace-NASV2 can be found here.
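
Because measuring every candidate on a phone inside the search loop would be slow, the latency is predicted by summing per-operator latencies from a table measured once on the target chip. Purely as an illustration of that idea (the dictionary format below is hypothetical, not PaddleSlim's actual table format):

```python
# Illustration only: predicting model latency from a per-operator latency
# table, as done conceptually during hardware-aware NAS. Keys and values
# below are made up for the example.
op_latency_ms = {
    ('conv2d', '3x640x640', 24): 1.85,
    ('depthwise_conv2d', '24x320x320', 24): 0.62,
    ('conv2d', '24x320x320', 48): 1.10,
}

def predict_latency(ops):
    """Sum the tabulated latency of every operator in a candidate network."""
    return sum(op_latency_ms[op] for op in ops)

candidate_ops = [('conv2d', '3x640x640', 24),
                 ('depthwise_conv2d', '24x320x320', 24),
                 ('conv2d', '24x320x320', 48)]
print('predicted latency: %.2f ms' % predict_latency(candidate_ops))
```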

3. Image Segmentation

Dataset: Cityscapes

3.1 Quantization

| Model | Compression method | mIoU | Model size (MB) |
|---|---|---|---|
| DeepLabv3+/MobileNetv1 | - | 63.26 | 6.6 |
| DeepLabv3+/MobileNetv1 | quant_post | 58.63 (-4.63) | 1.8 |
| DeepLabv3+/MobileNetv1 | quant_aware | 62.03 (-1.23) | 1.8 |
| DeepLabv3+/MobileNetv2 | - | 69.81 | 7.4 |
| DeepLabv3+/MobileNetv2 | quant_post | 67.59 (-2.22) | 2.1 |
| DeepLabv3+/MobileNetv2 | quant_aware | 68.33 (-1.48) | 2.1 |

Paddle-Lite latency of segmentation models (ms), input size 769x769

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
|---|---|---|---|---|---|---|---|---|
| Qualcomm 835 | Deeplabv3-MobileNetV1 | FP32 baseline | 1227.9894 | 734.1922 | 527.9592 | 1109.96 | 699.3818 | 479.0818 |
| Qualcomm 835 | Deeplabv3-MobileNetV1 | quant_aware | 848.6544 | 512.785 | 382.9915 | 752.3573 | 455.0901 | 307.8808 |
| Qualcomm 835 | Deeplabv3-MobileNetV1 | quant_post | 840.2323 | 510.103 | 371.9315 | 748.9401 | 452.1745 | 309.2084 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | FP32 baseline | 1282.8126 | 793.2064 | 653.6538 | 1193.9908 | 737.1827 | 593.4522 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | quant_aware | 976.0495 | 659.0541 | 513.4279 | 892.1468 | 582.9847 | 484.7512 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | quant_post | 981.44 | 658.4969 | 538.6166 | 885.3273 | 586.1284 | 484.0018 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | FP32 baseline | 568.8748 | 339.8578 | 278.6316 | 420.6031 | 281.3197 | 217.5222 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | quant_aware | 608.7578 | 347.2087 | 260.653 | 241.2394 | 177.3456 | 143.9178 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | quant_post | 609.0142 | 347.3784 | 259.9825 | 239.4103 | 180.1894 | 139.9178 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | FP32 baseline | 639.4425 | 390.1851 | 322.7014 | 477.7667 | 339.7411 | 262.2847 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | quant_aware | 703.7275 | 497.689 | 417.1296 | 394.3586 | 300.2503 | 239.9204 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | quant_post | 705.7589 | 474.4076 | 427.2951 | 394.8352 | 297.4035 | 264.6724 |
| Kirin 970 | Deeplabv3-MobileNetV1 | FP32 baseline | 1682.1792 | 1437.9774 | 1181.0246 | 1261.6739 | 1068.6537 | 690.8225 |
| Kirin 970 | Deeplabv3-MobileNetV1 | quant_aware | 1062.3394 | 1248.1014 | 878.3157 | 774.6356 | 710.6277 | 528.5376 |
| Kirin 970 | Deeplabv3-MobileNetV1 | quant_post | 1109.1917 | 1339.6218 | 866.3587 | 771.5164 | 716.5255 | 500.6497 |
| Kirin 970 | Deeplabv3-MobileNetV2 | FP32 baseline | 1771.1301 | 1746.0569 | 1222.4805 | 1448.9739 | 1192.4491 | 760.606 |
| Kirin 970 | Deeplabv3-MobileNetV2 | quant_aware | 1320.2905 | 921.4522 | 676.0732 | 1145.8801 | 821.5685 | 590.1713 |
| Kirin 970 | Deeplabv3-MobileNetV2 | quant_post | 1320.386 | 918.5328 | 672.2481 | 1020.753 | 820.094 | 591.4114 |

3.2 Pruning

Notes on the PaddleLite latency numbers:

Environment: Qualcomm Snapdragon 845 + armv8

Latency metric: Thread1/Thread2/Thread4 latency

PaddleLite version: v2.3

| Model | Compression method | mIoU | Model size (MB) | GFLOPs | PaddleLite latency (ms), Thread1\Thread2\Thread4 | TensorRT throughput (FPS) |
|---|---|---|---|---|---|---|
| fast-scnn | baseline | 69.64 | 11 | 14.41 | 1226.36\682.96\415.664 | 39.53 |
| fast-scnn | uniform -17.07% | 69.58 (-0.06) | 8.5 | 11.95 | 1140.37\656.612\415.888 | 42.01 |
| fast-scnn | sensitive -47.60% | 66.68 (-2.96) | 5.7 | 7.55 | 866.693\494.467\291.748 | 51.48 |

 

 