Network

The network is defined in /models/yolov3.yaml, as follows:

# parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# darknet53 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [32, 3, 1]],  # 0  ## 0
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2  ## 1
   [-1, 1, Bottleneck, [64]],  ## 2
   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  ## 3
   [-1, 2, Bottleneck, [128]],  ## 4
   [-1, 1, Conv, [256, 3, 2]],  # 5-P3/8  ## 5
   [-1, 8, Bottleneck, [256]],  ## 6
   [-1, 1, Conv, [512, 3, 2]],  # 7-P4/16  ## 7
   [-1, 8, Bottleneck, [512]],  ## 8
   [-1, 1, Conv, [1024, 3, 2]],  # 9-P5/32  ## 9
   [-1, 4, Bottleneck, [1024]],  # 10  ## 10
  ]

# YOLOv3 head
head:
  [[-1, 1, Bottleneck, [1024, False]],  ## 11
   [-1, 1, Conv, [512, [1, 1]]],  ## 12
   [-1, 1, Conv, [1024, 3, 1]],  ## 13
   [-1, 1, Conv, [512, 1, 1]],  ## 14
   [-1, 1, Conv, [1024, 3, 1]],  # 15 (P5/32-large)  ## 15

   [-2, 1, Conv, [256, 1, 1]],  ## 16
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  ## 17
   [[-1, 8], 1, Concat, [1]],  # cat backbone P4  ## 18
   [-1, 1, Bottleneck, [512, False]],  ## 19
   [-1, 1, Bottleneck, [512, False]],  ## 20
   [-1, 1, Conv, [256, 1, 1]],  ## 21
   [-1, 1, Conv, [512, 3, 1]],  # 22 (P4/16-medium)  ## 22

   [-2, 1, Conv, [128, 1, 1]],  ## 23
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  ## 24
   [[-1, 6], 1, Concat, [1]],  # cat backbone P3  ## 25
   [-1, 1, Bottleneck, [256, False]],  ## 26
   [-1, 2, Bottleneck, [256, False]],  # 27 (P3/8-small)  ## 27

   [[27, 22, 15], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)  ## 28
  ]


At first glance this is baffling, but if you patiently read it alongside the code it turns out to be quite clear.

The key is the comment # [from, number, module, args], which names the four fields of every layer entry.

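from says which layer the input comes from: -1 means the previous layer, -2 means the layer before that, and a non-negative number is the absolute index of a specific layer.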

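The layer index is the number I added after ## in the comments; it simply counts the entries from 0, backbone first and then head.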
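To make the from convention concrete, here is a minimal sketch that turns a from value into absolute layer indices. The helper name resolve_from is made up for illustration and is not part of the repository.

```python
def resolve_from(f, i):
    """Return the absolute indices of the layers that feed layer i.

    f is the `from` field of the yaml entry: an int or a list of ints.
    Negative values are relative to layer i; non-negative values are absolute.
    """
    sources = [f] if isinstance(f, int) else list(f)
    return [x if x >= 0 else i + x for x in sources]


print(resolve_from(-1, 12))       # [11]    -> the previous layer
print(resolve_from(-2, 16))       # [14]    -> two layers back
print(resolve_from([-1, 8], 18))  # [17, 8] -> Concat of layer 17 and backbone layer 8
```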

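number is how many times the module is repeated: 8, Bottleneck means eight Bottleneck blocks stacked in a row, much like the residual blocks in ResNet. When number is greater than 1 it is also scaled by depth_multiple, which is 1.0 here, so the values are used as written.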
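The repeat count is scaled by depth_multiple before the modules are built. The sketch below reproduces just that one expression from parse_model; the function name scale_depth and the depth_multiple values other than 1.0 are only for illustration.

```python
def scale_depth(n, gd):
    """Scale the repeat count n by the depth multiple gd, never going below 1."""
    return max(round(n * gd), 1) if n > 1 else n


print(scale_depth(8, 1.0))   # 8 -> yolov3.yaml uses depth_multiple 1.0, so nothing changes
print(scale_depth(8, 0.5))   # 4 -> a hypothetical shallower variant
print(scale_depth(2, 0.25))  # 1 -> round(0.5) is 0 in Python 3, then clamped to 1
print(scale_depth(1, 0.5))   # 1 -> single layers are never scaled
```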

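args are the arguments passed to the module. For Conv and Bottleneck the first one is the output channel count; the input channel count is filled in automatically by the parser.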

The code that parses yolov3.yaml is as follows:

# Conv, Bottleneck, Concat, Detect, etc. are module classes defined elsewhere in the repo.
def parse_model(d, ch):  # model_dict, input_channels(3)
    logger.info('\n%3s%18s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)

    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            try:
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings such as 'nc', 'anchors', 'None'
            except Exception:
                pass  # plain strings such as 'nearest' are left unchanged

        n = max(round(n * gd), 1) if n > 1 else n  # depth gain
        if m in [Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, DWConv, MixConv2d, Focus, CrossConv, BottleneckCSP,
                 C3, C3TR]:
            c1, c2 = ch[f], args[0]  # input channels come from the 'from' layer, output channels from args
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)

            args = [c1, c2, *args[1:]]
            if m in [BottleneckCSP, C3, C3TR]:
                args.insert(2, n)  # number of repeats
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum([ch[x] for x in f])  # concatenation adds up the channels of all inputs
        elif m is Detect:
            args.append([ch[x] for x in f])  # input channels of the three detection scales
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]  # e.g. nn.Upsample keeps the channel count

        m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        np = sum([x.numel() for x in m_.parameters()])  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params
        logger.info('%3s%18s%3s%10.0f %-40s%-30s' % (i, f, n, np, t, args))  # print
        # append to savelist; x % i makes a negative index absolute, e.g. at i = 16, -2 % 16 = 14
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
        layers.append(m_)
        if i == 0:
            ch = []  # from now on ch[k] is the output channel count of layer k
        ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
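
The first lines of parse_model derive the number of anchors per scale (na) and the number of output channels per scale (no) from the yaml. The standalone sketch below repeats that arithmetic with the values from yolov3.yaml. The make_divisible here is my own stand-in that rounds up to a multiple of the divisor, which is what I assume the repository helper does.

```python
import math

# Values copied from yolov3.yaml
nc = 80
width_multiple = 1.0
anchors = [
    [10, 13, 16, 30, 33, 23],       # P3/8
    [30, 61, 62, 45, 59, 119],      # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]

na = len(anchors[0]) // 2  # each anchor is a (width, height) pair
no = na * (nc + 5)         # per anchor: 4 box values + 1 objectness + nc class scores


def make_divisible(x, divisor):
    """Stand-in (assumption): round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


print(na)  # 3
print(no)  # 255
print(make_divisible(512 * width_multiple, 8))  # 512, unchanged because width_multiple is 1.0
```

So each of the three detection outputs has 255 channels, which is why the parser skips width scaling when c2 equals no.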


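save holds the indices of the layers whose output feature maps must be kept, that is, every layer that a later layer refers to with a from value other than -1.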
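
As a sanity check, the save list for this yaml can be computed without building any modules. The sketch below only copies the from column of the 29 entries and applies the same expression as parse_model; the variable names are mine.

```python
# `from` field of every entry in yolov3.yaml, in order (layers 0..28)
from_fields = (
    [-1] * 11                    # backbone, layers 0-10
    + [-1] * 5                   # head, layers 11-15
    + [-2, -1, [-1, 8]]          # layers 16-18
    + [-1] * 4                   # layers 19-22
    + [-2, -1, [-1, 6]]          # layers 23-25
    + [-1, -1]                   # layers 26-27
    + [[27, 22, 15]]             # layer 28, Detect
)

save = []
for i, f in enumerate(from_fields):
    # x % i converts a negative (relative) index into an absolute one
    save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)

print(sorted(save))  # [6, 8, 14, 15, 21, 22, 27]
```

Layers 6 and 8 feed the two Concat layers, 14 and 21 are the branch points read through -2, and 15, 22 and 27 are the three inputs of Detect.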


[Figure: YOLOv3 network structure diagram]

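The figure above shows the overall YOLOv3 network, except that its input size is 256. For an input size of 640, I list the data flow in the table below: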


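stage     layer  module           in_num                     out_num  k  s  out_shape
input     -      -                -                          -        -  -  [3,640,640]
backbone  0      Conv             3                          32       3  1  [32,640,640]
          1      Conv             32                         64       3  2  [64,320,320]
          2      Bottleneck(×1)   64                         64       -  -  [64,320,320]
          3      Conv             64                         128      3  2  [128,160,160]
          4      Bottleneck(×2)   128                        128      -  -  [128,160,160]
          5      Conv             128                        256      3  2  [256,80,80]
          6      Bottleneck(×8)   256                        256      -  -  [256,80,80]
          7      Conv             256                        512      3  2  [512,40,40]
          8      Bottleneck(×8)   512                        512      -  -  [512,40,40]
          9      Conv             512                        1024     3  2  [1024,20,20]
          10     Bottleneck(×4)   1024                       1024     -  -  [1024,20,20]
head      11     Bottleneck(×1)   1024                       1024     -  -  [1024,20,20]
          12     Conv             1024                       512      1  1  [512,20,20]
          13     Conv             512                        1024     3  1  [1024,20,20]
          14     Conv             1024                       512      1  1  [512,20,20]
          15     Conv             512                        1024     3  1  [1024,20,20]
head      16     [-2]Conv         512                        256      1  1  [256,20,20]
          17     nn.Upsample      256                        256      -  -  [256,40,40]
          18     [-1,8]Concat     [256,40,40] + [512,40,40]  768      -  -  [768,40,40]
          19     Bottleneck(×1)   768                        512      -  -  [512,40,40]
          20     Bottleneck(×1)   512                        512      -  -  [512,40,40]
          21     Conv             512                        256      1  1  [256,40,40]
          22     Conv             256                        512      3  1  [512,40,40]
head      23     [-2]Conv         256                        128      1  1  [128,40,40]
          24     nn.Upsample      128                        128      -  -  [128,80,80]
          25     [-1,6]Concat     [128,80,80] + [256,80,80]  384      -  -  [384,80,80]
          26     Bottleneck(×1)   384                        256      -  -  [256,80,80]
          27     Bottleneck(×2)   256                        256      -  -  [256,80,80]
Detect    28     [27]Conv         256                        255      1  1  [255,80,80]
                 [22]Conv         512                        255      1  1  [255,40,40]
                 [15]Conv         1024                       255      1  1  [255,20,20]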
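
The channel counts and feature-map sizes for a 640 input can be cross-checked with a few lines of plain Python. This is only a sketch under two assumptions of mine: a Conv with stride s divides height and width by s (padding keeps the size otherwise), and a Bottleneck keeps the spatial size, so both are treated as the same kind of layer here. The tuple layout and the name trace_shapes are made up for this example.

```python
def conv(c_out, s=1, f=-1):
    """Conv or Bottleneck entry: (from, kind, out_channels, stride)."""
    return (f, 'conv', c_out, s)


UP = (-1, 'up', None, None)  # nn.Upsample with scale factor 2


def cat(f):
    return (f, 'cat', None, None)


layers = [
    # backbone, layers 0-10
    conv(32), conv(64, 2), conv(64), conv(128, 2), conv(128), conv(256, 2),
    conv(256), conv(512, 2), conv(512), conv(1024, 2), conv(1024),
    # head, layers 11-15 (P5)
    conv(1024), conv(512), conv(1024), conv(512), conv(1024),
    # layers 16-22 (P4)
    conv(256, f=-2), UP, cat([-1, 8]), conv(512), conv(512), conv(256), conv(512),
    # layers 23-27 (P3)
    conv(128, f=-2), UP, cat([-1, 6]), conv(256), conv(256),
]


def trace_shapes(layers, size=640, in_ch=3):
    """Return the (C, H, W) output shape of every layer."""
    shapes = []
    for i, (f, kind, c_out, s) in enumerate(layers):
        srcs = [f] if isinstance(f, int) else f
        # shapes[-1] is the previous layer, shapes[8] is layer 8, and so on
        ins = [(in_ch, size, size)] if i == 0 else [shapes[x] for x in srcs]
        c, h, w = ins[0]
        if kind == 'conv':
            out = (c_out, h // s, w // s)
        elif kind == 'up':
            out = (c, h * 2, w * 2)
        else:  # 'cat': channels add up, spatial size must already match
            out = (sum(x[0] for x in ins), h, w)
        shapes.append(out)
    return shapes


shapes = trace_shapes(layers)
for idx in (27, 22, 15):  # the three inputs of Detect
    print(idx, shapes[idx])
# 27 (256, 80, 80)
# 22 (512, 40, 40)
# 15 (1024, 20, 20)
```

Detect then maps each of these three feature maps to 255 channels, giving 80×80, 40×40 and 20×20 prediction grids for strides 8, 16 and 32.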



