1. Network Structure
Notes:
(1) The outputs in the figure omit the batch-size dimension; e.g. the actual output of yolo1 is [bs, 3, 13, 13, 85].
(2) What the yolo layer does: during forward it only restructures the input features and does not change any values.
(3) The yolo layer's output: 3 is the number of anchors; 13*13 is the grid the image is divided into; 85 is the network prediction [x, y, w, h, obj, cls], i.e. the target's center coordinates, width/height, objectness, and class (80 classes in COCO).
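A minimal sketch of this layout with a dummy tensor (not the real network); the reshape matches the one YOLOLayer.forward performs later:
import torch

bs, na, no, ny, nx = 1, 3, 85, 13, 13                 # batch, anchors, outputs (4+1+80), grid
p = torch.randn(bs, na * no, ny, nx)                  # raw conv output: [1, 255, 13, 13]
p = p.view(bs, na, no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
print(p.shape)                                        # torch.Size([1, 3, 13, 13, 85])
xywh, obj, cls = p[..., :4], p[..., 4], p[..., 5:]    # box, objectness, class scores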
2. Loss
The total loss consists of three parts:
lbox (for the predicted x, y, w, h)
lobj (for the predicted obj)
lcls (for the predicted cls)
2.1 lobj
lobj is computed over all predictions.
Label: for each anchor in each grid cell, 1 if a qualifying target exists, 0 otherwise.
Qualifying means: the target's center lies in that grid cell, and the wh_iou between the anchor and the target size > iou_t (0.2).
How wh_iou is computed, where aw, ah are the anchor size and tw, th are the target size:
inter = min(aw,tw)*min(ah,th)
wh_iou = inter/(aw*ah+tw*th-inter)
Loss:
nn.BCEWithLogitsLoss, i.e. apply a sigmoid to the input and then compute binary cross-entropy.
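A quick standalone check of this equivalence (not from the repo):
import torch
import torch.nn as nn

logits = torch.randn(4)
target = torch.tensor([1., 0., 1., 0.])
a = nn.BCEWithLogitsLoss()(logits, target)            # sigmoid fused into the loss
b = nn.BCELoss()(torch.sigmoid(logits), target)       # explicit sigmoid, then BCE
print(torch.allclose(a, b))                           # True; the fused version is also more numerically stable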
2.2 lcls
lcls is computed only for predictions at positions holding a qualifying target.
Label: a one-hot vector of the target's class.
Loss:
Same as lobj: nn.BCEWithLogitsLoss.
2.3 lbox
lbox is computed only for predictions at positions holding a qualifying target.
All relevant values are scaled down by this yolo layer's stride, including the label's x, y, w, h and the anchors' w, h.
Taking yolo1 as an example: stride = 32, anchors [116,90], [156,198], [373,326] → [3.62500, 2.81250], [4.87500, 6.18750], [11.65625, 10.18750]
After scaling, the integer part of the label's x, y identifies the grid cell containing the center, and the fractional part is the offset from that cell's top-left corner; the sigmoid of the predicted x, y should approach this fractional part. The network therefore learns not the absolute x, y but the relative position within the cell.
The exp of the predicted w, h, multiplied by the anchor's w, h, should approach the label's w, h; the network learns not the absolute w, h but the target-to-anchor ratio. (See the numeric sketch below.)
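A small numeric sketch of this decoding with made-up label values (assumes torch >= 1.7 for torch.logit):
import torch

anchor = torch.tensor([3.6250, 2.8125])     # [116, 90] / stride 32
t_xy = torch.tensor([6.3, 6.7])             # label center in grid units: cell (6, 6), offset (0.3, 0.7)
t_wh = torch.tensor([4.0, 3.0])             # label size in grid units
# raw xy the network should produce: sigmoid(p_xy) == fractional offset
p_xy = torch.logit(t_xy - t_xy.floor())
# raw wh the network should produce: exp(p_wh) * anchor == t_wh
p_wh = torch.log(t_wh / anchor)
print(torch.sigmoid(p_xy), torch.exp(p_wh) * anchor)  # tensor([0.3, 0.7]), tensor([4., 3.])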
Loss: GIoU loss, lbox = mean(1 - GIoU), where
GIoU = IoU - (c_area - union)/c_area, IoU = inter/union
inter is the overlap area of the two boxes
union is the total area covered by the two boxes
c_area is the area of the smallest rectangle enclosing both boxes
GIoU ranges over (-1, 1]: when the two boxes overlap exactly, GIoU = 1; when two boxes of finite area are infinitely far apart, GIoU → -1. Unlike IoU, GIoU can measure how far apart two boxes are when they do not intersect.
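A minimal GIoU sketch for two single boxes in (x1, y1, x2, y2) format (the repo's bbox_iou additionally handles xywh input and batched tensors):
import torch

def giou_xyxy(b1, b2):
    iw = (torch.min(b1[2], b2[2]) - torch.max(b1[0], b2[0])).clamp(0)  # intersection width
    ih = (torch.min(b1[3], b2[3]) - torch.max(b1[1], b2[1])).clamp(0)  # intersection height
    inter = iw * ih
    union = ((b1[2] - b1[0]) * (b1[3] - b1[1]) +
             (b2[2] - b2[0]) * (b2[3] - b2[1]) - inter)
    cw = torch.max(b1[2], b2[2]) - torch.min(b1[0], b2[0])             # enclosing box width
    ch = torch.max(b1[3], b2[3]) - torch.min(b1[1], b2[1])             # enclosing box height
    c_area = cw * ch
    return inter / union - (c_area - union) / c_area

b = torch.tensor([0., 0., 2., 2.])
print(giou_xyxy(b, b))                                 # 1.0: identical boxes
print(giou_xyxy(b, torch.tensor([8., 0., 10., 2.])))   # -0.6: disjoint boxes give a negative GIoU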
Loss core code
pxy = ps[:, :2].sigmoid()
pwh = ps[:, 2:4].exp().clamp(max=1E3) * anchors[i]
pbox = torch.cat((pxy, pwh), 1)
giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True)
lbox += (1.0 - giou).mean()
lcls += BCEcls(ps[:, 5:], t) # t: one-hot class targets
lobj += BCEobj(pi[..., 4], tobj) # tobj: shape [bs, 3, 13, 13]; 1 where a target exists, 0 elsewhere
where:
BCEcls = nn.BCEWithLogitsLoss(pos_weight=ft([h['cls_pw']]), reduction=red) # 'cls_pw': 1.0
BCEobj = nn.BCEWithLogitsLoss(pos_weight=ft([h['obj_pw']]), reduction=red) # 'obj_pw': 1.0
pi holds all of the network's predictions
ps holds the predictions at qualifying positions
lobj is computed over all predictions; lcls and lbox only over the qualifying ones.
Qualifying means:
1. a target's center lies in the grid cell
2. wh_iou between the anchor and the target box > iou_t (0.2)
How wh_iou is computed:
inter = min(aw,tw)*min(ah,th)
wh_iou = inter/(aw*ah+tw*th-inter)
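Equivalently, as a tiny standalone function (hypothetical helper mirroring the repo's wh_iou):
import torch

def wh_iou_pair(anchor_wh, target_wh):
    # IoU from widths/heights alone, as if both boxes shared the same corner
    inter = torch.min(anchor_wh, target_wh).prod(-1)          # min(aw,tw) * min(ah,th)
    return inter / (anchor_wh.prod(-1) + target_wh.prod(-1) - inter)

print(wh_iou_pair(torch.tensor([3.6250, 2.8125]),            # yolo1 anchor [116, 90] / 32
                  torch.tensor([4.0, 3.0])))                 # ~0.85 > iou_t (0.2): this anchor qualifies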
3. Code Analysis
3.0. Darknet
The core of network construction is create_modules; the network-level forward is easier to follow after reading how each layer is built below. It ultimately produces yolo_out, containing the outputs of the 3 yolo layers.
Since ONNX_EXPORT = False, augment = False, and verbose = False,
the statements that only run when these three are True have been removed here to keep the code short.
# train.py 91
model = Darknet(cfg).to(device)
# models.py 225
class Darknet(nn.Module):
def __init__(self, cfg, img_size=(416, 416), verbose=False):
super(Darknet, self).__init__()
self.module_defs = parse_model_cfg(cfg)
self.module_list, self.routs = create_modules(self.module_defs, img_size, cfg)
self.yolo_layers = get_yolo_layers(self) # yolov3:[82, 94, 106]
self.version = np.array([0, 2, 5], dtype=np.int32) # (int32) version info: major, minor, revision
self.seen = np.array([0], dtype=np.int64) # (int64) number of images seen during training
self.info(verbose) if not ONNX_EXPORT else None # print model description
def forward(self, x, augment=False, verbose=False):
if not augment:
return self.forward_once(x)
def forward_once(self, x, augment=False, verbose=False):
img_size = x.shape[-2:] # height, width
yolo_out, out = [], []
if verbose:
print('0', x.shape)
str = ''
for i, module in enumerate(self.module_list):
name = module.__class__.__name__
if name in ['WeightedFeatureFusion', 'FeatureConcat']: # sum, concat
x = module(x, out) # WeightedFeatureFusion(), FeatureConcat()
elif name == 'YOLOLayer':
yolo_out.append(module(x, out))
else: # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc.
x = module(x)
out.append(x if self.routs[i] else [])
if self.training: # train
return yolo_out
else: # inference or test
x, p = zip(*yolo_out) # inference output, training output
x = torch.cat(x, 1) # cat yolo outputs
return x, p
3.1. parse_model_cfg
Input: path to the model config file
Output: a list of dicts, one dict per layer's configuration
# utils/parse_config.py 6
def parse_model_cfg(path):
# Parse the yolo *.cfg file and return module definitions path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3'
if not path.endswith('.cfg'): # add .cfg suffix if omitted
path += '.cfg'
if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted
path = 'cfg' + os.sep + path
with open(path, 'r') as f:
lines = f.read().split('\n')
lines = [x for x in lines if x and not x.startswith('#')]
# lstrip() removes leading whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
# rstrip() removes trailing whitespace
lines = [x.rstrip().lstrip() for x in lines]
mdefs = [] # module definitions
for line in lines:
if line.startswith('['): # This marks the start of a new block
mdefs.append({})
mdefs[-1]['type'] = line[1:-1].rstrip()
if mdefs[-1]['type'] == 'convolutional':
mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later)
else:
key, val = line.split("=")
key = key.rstrip()
if key == 'anchors': # return nparray
mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors
elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array
mdefs[-1][key] = [int(x) for x in val.split(',')]
else:
val = val.strip()
# isnumeric() is True only for strings made up entirely of digits; it cannot detect floats (a string containing a decimal point returns False)
# TODO: .isnumeric() actually fails to get the float case
if val.isnumeric(): # return int or float (this check is redundant here; these values should all be ints)
mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val)
else:
mdefs[-1][key] = val # return string
# check that every field is supported
supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups',
'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random',
'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind',
'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'probability']
f = [] # fields
for x in mdefs[1:]:
[f.append(k) for k in x if k not in f]
u = [x for x in f if x not in supported] # unsupported fields
assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)
return mdefs
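Example usage, assuming the repo's cfg/yolov3.cfg is present (dict contents shown are illustrative):
mdefs = parse_model_cfg('cfg/yolov3.cfg')
print(mdefs[0]['type'])   # 'net': the hyperparameter block, popped later by create_modules
print(mdefs[1])           # first real layer, e.g.:
# {'type': 'convolutional', 'batch_normalize': 1, 'filters': 32,
#  'size': 3, 'stride': 1, 'pad': 1, 'activation': 'leaky'}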
3.2. create_modules
在 yolov3.cfg 中,网络层类别有 [convolutional]、[shortcut]、[yolo]、[route]、[upsample]
1. [convolutional]
Conv2d + BatchNorm2d + LeakyReLU
Notes:
(1) Since size is always single-size, every conv layer is 'Conv2d', with its parameters taken from the config file.
(2) pad=1 in the file is not the padding value; the padding value is determined by kernel_size // 2.
(3) Every conv layer is followed by a BatchNorm layer, except the one directly before each yolo layer.
(4) routs is True for the conv layer directly before each yolo layer.
# models.py 8
def create_modules(module_defs, img_size, cfg):
# Constructs module list of layer blocks from module configuration in module_defs
img_size = [img_size] * 2 if isinstance(img_size, int) else img_size # expand if necessary
_ = module_defs.pop(0) # cfg training hyperparams (unused)
output_filters = [3] # input channels
module_list = nn.ModuleList()
routs = [] # list of layers which rout to deeper layers
yolo_index = -1
for i, mdef in enumerate(module_defs):
modules = nn.Sequential()
if mdef['type'] == 'convolutional':
bn = mdef['batch_normalize']
filters = mdef['filters']
k = mdef['size'] # kernel size
stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x'])
if isinstance(k, int): # single-size conv
modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1],
out_channels=filters,
kernel_size=k,
stride=stride,
padding=k // 2 if mdef['pad'] else 0,
groups=mdef['groups'] if 'groups' in mdef else 1,
bias=not bn))
else: # multiple-size conv
modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1],
out_ch=filters,
k=k,
stride=stride,
bias=not bn))
if bn:
modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4))
else:
routs.append(i) # detection output (goes into yolo layer)
if mdef['activation'] == 'leaky': # activation study https://github.com/ultralytics/yolov3/issues/441
modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True))
elif mdef['activation'] == 'swish':
modules.add_module('activation', Swish())
elif mdef['activation'] == 'mish':
modules.add_module('activation', Mish())
# other layer types
else:
print('Warning: Unrecognized Layer Type: ' + mdef['type'])
# Register module list and number of output filters
module_list.append(modules)
output_filters.append(filters)
routs_binary = [False] * (i + 1)
for i in routs:
routs_binary[i] = True
return module_list, routs_binary
2. [shortcut]
[shortcut]
from=-3
activation=linear
Notes:
(1) Used in res_blocks: adds the previous layer's output to the output of the layer specified by from.
(2) Appears in the final module_list as WeightedFeatureFusion().
(3) As the Adjust channels section at the end of forward shows, the output keeps the channel count of the previous (-1) layer's output.
(4) routs is True for the layer that from points to.
elif mdef['type'] == 'shortcut': # nn.Sequential() placeholder for 'shortcut' layer
layers = mdef['from']
filters = output_filters[-1]
routs.extend([i + l if l < 0 else l for l in layers])
modules = WeightedFeatureFusion(layers=layers, weight='weights_type' in mdef)
# utils/layers.py 38
class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
def __init__(self, layers, weight=False):
super(WeightedFeatureFusion, self).__init__()
self.layers = layers # layer indices
self.weight = weight # apply weights boolean
self.n = len(layers) + 1 # number of layers
if weight:
# nn.Parameter makes the weights trainable
self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights
def forward(self, x, outputs):
# Weights
if self.weight:
w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1)
x = x * w[0]
# Fusion
nx = x.shape[1] # input channels
for i in range(self.n - 1):
a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add
na = a.shape[1] # feature channels
# Adjust channels
if nx == na: # same shape
x = x + a
elif nx > na: # slice input
x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a
else: # slice feature
x = x + a[:, :nx]
return x
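A minimal usage sketch of the unweighted case, with fake feature maps standing in for the per-layer output cache:
import torch

fusion = WeightedFeatureFusion(layers=[-3])                # weight=False: plain residual add
outputs = [torch.randn(1, 64, 52, 52) for _ in range(5)]   # stand-in for the per-layer cache
x = torch.randn(1, 64, 52, 52)                             # previous layer's output
print(torch.allclose(fusion(x, outputs), x + outputs[-3])) # True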
3. [route]
[route]
layers = -1, 61
Notes:
(1) Concatenates the output features of the layers specified by layers; the output channel count is the sum of the inputs' channel counts. With a single layer it simply forwards that layer's output.
(2) Appears in the final module_list as FeatureConcat().
(3) routs is True for every layer that layers points to.
elif mdef['type'] == 'route': # nn.Sequential() placeholder for 'route' layer
layers = mdef['layers']
filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])
routs.extend([i + l if l < 0 else l for l in layers])
modules = FeatureConcat(layers=layers)
# utils/layers.py 28
class FeatureConcat(nn.Module):
def __init__(self, layers):
super(FeatureConcat, self).__init__()
self.layers = layers # layer indices
self.multiple = len(layers) > 1 # multiple layers flag
def forward(self, x, outputs):
return torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]]
4. [upsample]
[upsample]
stride=2
Initially img_size = (416, 416). Assuming ONNX_EXPORT were True:
the upsample after yolo1 would get g = 1/16, size = (26, 26)
the upsample after yolo2 would get g = 1/8, size = (52, 52)
# ONNX_EXPORT = False is set at the top of models.py
elif mdef['type'] == 'upsample':
if ONNX_EXPORT: # explicitly state size, avoid scale_factor
g = (yolo_index + 1) * 2 / 32 # gain
modules = nn.Upsample(size=tuple(int(x * g) for x in img_size))
else:
modules = nn.Upsample(scale_factor=mdef['stride'])
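A runnable check that the default branch (scale_factor=2) yields the sizes listed above:
import torch
import torch.nn as nn

up = nn.Upsample(scale_factor=2)
print(up(torch.zeros(1, 256, 13, 13)).shape)   # [1, 256, 26, 26], matching size=(26, 26) above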
5. [yolo]
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
The parameters are preprocessed, then the yolo layer is built with YOLOLayer.
(the try block is set aside for now)
elif mdef['type'] == 'yolo':
yolo_index += 1 # 0, 1, 2: index of this yolo layer
stride = [32, 16, 8] # P5, P4, P3 strides
if any(x in cfg for x in ['panet', 'yolov4', 'cd53']): # stride order reversed
stride = list(reversed(stride))
layers = mdef['from'] if 'from' in mdef else [] # layers = []
modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']], # anchor list
nc=mdef['classes'], # number of classes (80)
img_size=img_size, # (416, 416)
yolo_index=yolo_index, # 0, 1, 2...
layers=layers, # output layers
stride=stride[yolo_index])
# Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3)
try:
j = layers[yolo_index] if 'from' in mdef else -1
# If previous layer is a dropout layer, get the one before
if module_list[j].__class__.__name__ == 'Dropout':
j -= 1
bias_ = module_list[j][0].bias # shape(255,)
bias = bias_[:modules.no * modules.na].view(modules.na, -1) # shape(3,85)
bias[:, 4] += -4.5 # obj
bias[:, 5:] += math.log(0.6 / (modules.nc - 0.99)) # cls (sigmoid(p) = 1/nc)
module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad) # bias is a view of bias_, so editing bias also updates bias_
except:
print('WARNING: smart bias initialization failure.')
# YOLOLayer inputs, taking yolo1 as an example
anchors = array([[116,90], [156,198], [373,326]])
nc = 80
img_size = (416,416)
yolo_index = 0
layers = []
stride = 32
During training it only restructures the preceding conv layer's output:
Input: (batch_size, 255, 13, 13)
Output: (batch_size, 3, 13, 13, 85), i.e. (batch_size, anchors, grid, grid, classes + xywh)
The ASFF part of forward is set aside for now.
class YOLOLayer(nn.Module):
def __init__(self, anchors, nc, img_size, yolo_index, layers, stride):
super(YOLOLayer, self).__init__()
self.anchors = torch.Tensor(anchors)
self.index = yolo_index # index of this layer in layers
self.layers = layers # model output layer indices
self.stride = stride # layer stride
self.nl = len(layers) # number of output layers (comment says 3, but layers = [] here, so this is actually 0)
self.na = len(anchors) # number of anchors (3)
self.nc = nc # number of classes (80)
self.no = nc + 5 # number of outputs (85)
self.nx, self.ny, self.ng = 0, 0, 0 # initialize number of x, y gridpoints
self.anchor_vec = self.anchors / self.stride
self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2)
def create_grids(self, ng=(13, 13), device='cpu'):
self.nx, self.ny = ng # x and y grid size
self.ng = torch.tensor(ng, dtype=torch.float)
# build xy offsets
if not self.training:
yv, xv = torch.meshgrid([torch.arange(self.ny, device=device), torch.arange(self.nx, device=device)])
self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float()
if self.anchor_vec.device != device:
self.anchor_vec = self.anchor_vec.to(device)
self.anchor_wh = self.anchor_wh.to(device)
def forward(self, p, out):
ASFF = False # https://arxiv.org/abs/1911.09516
if ASFF:
i, n = self.index, self.nl # index in layers, number of layers
p = out[self.layers[i]]
bs, _, ny, nx = p.shape # bs, 255, 13, 13
if (self.nx, self.ny) != (nx, ny):
self.create_grids((nx, ny), p.device)
# outputs and weights
# w = F.softmax(p[:, -n:], 1) # normalized weights
w = torch.sigmoid(p[:, -n:]) * (2 / n) # sigmoid weights (faster)
# w = w / w.sum(1).unsqueeze(1) # normalize across layer dimension
# weighted ASFF sum
p = out[self.layers[i]][:, :-n] * w[:, i:i + 1]
for j in range(n):
if j != i:
p += w[:, j:j + 1] * \
F.interpolate(out[self.layers[j]][:, :-n], size=[ny, nx], mode='bilinear', align_corners=False)
else:
bs, _, ny, nx = p.shape # bs, 255, 13, 13
if (self.nx, self.ny) != (nx, ny):
self.create_grids((nx, ny), p.device) # during training this just sets self.nx, self.ny = nx, ny
# p.view(bs, 255, 13, 13) -> (bs, 3, 13, 13, 85) # (bs, anchors, grid, grid, classes + xywh)
p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction
if self.training:
return p
else: # inference
io = p.clone() # inference output
io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy
io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh # wh yolo method
io[..., :4] *= self.stride
torch.sigmoid_(io[..., 4:])
return io.view(bs, -1, self.no), p # view [1, 3, 13, 13, 85] as [1, 507, 85]
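A standalone shape check of both modes (eval is run first so create_grids builds self.grid while the grid size is still changing):
import torch

layer = YOLOLayer(anchors=[[116, 90], [156, 198], [373, 326]],
                  nc=80, img_size=(416, 416), yolo_index=0, layers=[], stride=32)
layer.eval()
io, praw = layer(torch.randn(2, 255, 13, 13), out=[])
print(io.shape, praw.shape)   # [2, 507, 85] decoded boxes, plus the raw [2, 3, 13, 13, 85]
layer.train()
p = layer(torch.randn(2, 255, 13, 13), out=[])
print(p.shape)                # [2, 3, 13, 13, 85]: training mode only reshapes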
3.3. Loss
# train.py 278
# Forward
pred = model(imgs)
# Loss
loss, loss_items = compute_loss(pred, targets, model)
if not torch.isfinite(loss):
print('WARNING: non-finite loss, ending training ', loss_items)
return results
# utils/utils.py 353
def compute_loss(p, targets, model):
Arguments:
1. pred: the network's predictions
shape: [number of yolo layers, batch_size, number of anchors, grid, grid, classes + xywh]
2. targets: the labels
shape: [nt, 6]
where nt is the number of targets in this batch of images, and the 6 entries are [image index, class, x, y, w, h]
# utils/utils.py 420
def build_targets(p, targets, model):
Its inputs are the same as compute_loss's.
Outputs: tcls, tbox, indices, anch
All 4 outputs are lists of 3 arrays (one per yolo layer); each array holds the information of the targets that qualify at that yolo layer.
Qualifying means wh_iou between anchors and targets > the model hyperparameter iou_t.
How wh_iou is computed:
inter = min(aw,tw)*min(ah,th)
wh_iou = inter/(aw*ah+tw*th-inter)
1. indices: a list of 3 tuples (one per yolo layer), each holding 4 tensors: image index, anchor index, y, x (all truncated to their integer parts; anchor index is 0, 1, or 2)
2. tcls: a list of 3 tensors holding the class indices
3. tbox: a list of 3 tensors holding [dx, dy, w, h], where dx, dy are the fractional parts of x, y
4. anch: a list of 3 tensors holding the anchor sizes
xywh and anchor sizes are all rescaled to the yolo layer's grid coordinates, e.g. (0,1) or (0,416) → (0,13). (A condensed sketch follows.)
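A condensed, hypothetical sketch of what build_targets computes for a single yolo layer (the function name and signature are illustrative; the repo's version loops over all three layers and handles more options):
import torch

def build_targets_one_layer(targets, anchor_vec, ng, iou_t=0.2):
    # targets: [nt, 6] = (image, class, x, y, w, h), xywh normalized to (0, 1)
    # anchor_vec: [na, 2], anchors already divided by this layer's stride; ng: grid size (e.g. 13)
    t = targets.clone()
    t[:, 2:6] *= ng                                        # scale xywh to grid units: (0,1) -> (0,13)
    inter = torch.min(anchor_vec[:, None], t[None, :, 4:6]).prod(2)            # [na, nt]
    wh_iou = inter / (anchor_vec.prod(1)[:, None] + t[:, 4:6].prod(1) - inter)
    a, j = (wh_iou > iou_t).nonzero(as_tuple=True)         # (anchor index, target index) pairs
    t = t[j]
    b, c = t[:, 0].long(), t[:, 1].long()                  # image index, class index
    gxy, gwh = t[:, 2:4], t[:, 4:6]
    gi, gj = gxy.long().t()                                # grid cell = integer part of xy
    indices = (b, a, gj, gi)
    tbox = torch.cat((gxy - gxy.floor(), gwh), 1)          # [dx, dy, w, h]
    return c, tbox, indices, anchor_vec[a]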
# utils/utils.py 239
def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False):
Output: giou
giou = inter/union - (c_area-union)/c_area
where:
inter is the overlap area of the two boxes
union is the total area covered by the two boxes
c_area is the area of the smallest rectangle enclosing both boxes
# utils/utils.py 353
def compute_loss(p, targets, model): # predictions, targets, model
ft = torch.cuda.FloatTensor if p[0].is_cuda else torch.Tensor
lcls, lbox, lobj = ft([0]), ft([0]), ft([0])
tcls, tbox, indices, anchors = build_targets(p, targets, model) # targets
h = model.hyp # hyperparameters
red = 'mean' # Loss reduction (sum or mean)
# Define criteria
BCEcls = nn.BCEWithLogitsLoss(pos_weight=ft([h['cls_pw']]), reduction=red) # 'cls_pw': 1.0
BCEobj = nn.BCEWithLogitsLoss(pos_weight=ft([h['obj_pw']]), reduction=red) # 'obj_pw': 1.0
# class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
cp, cn = smooth_BCE(eps=0.0) # cp=1.0 cn=0.0
# focal loss
g = h['fl_gamma'] # focal loss gamma 'fl_gamma': 0.0
if g > 0:
BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)
# per output
nt = 0 # targets
for i, pi in enumerate(p): # layer index, layer predictions
b, a, gj, gi = indices[i] # image, anchor, gridy, gridx
tobj = torch.zeros_like(pi[..., 0]) # target obj shape[bs, 3, 13, 13]
nb = b.shape[0] # number of targets
if nb:
nt += nb # cumulative targets
ps = pi[b, a, gj, gi] # use indices to select the predictions at positions where a target exists
# GIoU
pxy = ps[:, :2].sigmoid()
pwh = ps[:, 2:4].exp().clamp(max=1E3) * anchors[i]
pbox = torch.cat((pxy, pwh), 1) # predicted box
giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True) # giou(prediction, target)
lbox += (1.0 - giou).sum() if red == 'sum' else (1.0 - giou).mean() # giou loss
# Obj
# model.gr: could not find where it is set; its value is 0
# tobj shape [bs, 3, 13, 13]: 1 where a target exists, 0 elsewhere (since model.gr = 0)
tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype) # giou ratio
# Class
if model.nc > 1: # cls loss (only if multiple classes)
t = torch.full_like(ps[:, 5:], cn) # targets
t[range(nb), tcls[i]] = cp
lcls += BCEcls(ps[:, 5:], t) # BCE
# Append targets to text file
# with open('targets.txt', 'a') as file:
# [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]
lobj += BCEobj(pi[..., 4], tobj) # obj loss
lbox *= h['giou'] # 'giou': 3.54
lobj *= h['obj'] # 'obj': 64.3
lcls *= h['cls'] # 'cls': 37.4
if red == 'sum':
bs = tobj.shape[0] # batch size
g = 3.0 # loss gain
lobj *= g / bs
if nt:
lcls *= g / nt / model.nc
lbox *= g / nt
loss = lbox + lobj + lcls
return loss, torch.cat((lbox, lobj, lcls, loss)).detach()