Parameter Regularization (Weight Regularization)

Earlier approaches

L2/L1 Regularization

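In machine learning, the loss function is almost always followed by an extra term. The two most common choices are **L1 regularization and L2 regularization, also called the L1 norm and the L2 norm**.

L1 and L2 regularization can be viewed as penalty terms on the loss function. The "penalty" restricts some of the parameters that appear in the loss function.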

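  • L1 regularization is the **sum of the absolute values** of the elements of the weight vector w, usually written $\|w\|_1$
  • L2 regularization is the **square root of the sum of the squares** of the elements of the weight vector w, usually written $\|w\|_2$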
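
As a quick numerical check of the two definitions above, the following minimal sketch computes both norms for a small vector. The vector `w` is a made-up illustration chosen so the results are easy to verify by hand; it does not come from the original text.

```python
import torch

# Hypothetical example vector, chosen so both norms are easy to check by hand.
w = torch.tensor([3.0, -4.0])

l1_norm = torch.sum(torch.abs(w))        # |3| + |-4| = 7
l2_norm = torch.sqrt(torch.sum(w ** 2))  # sqrt(9 + 16) = 5

print(l1_norm.item(), l2_norm.item())
```

The same values can be obtained with `torch.norm(w, p=1)` and `torch.norm(w, p=2)`.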

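The effects of L1 and L2 regularization listed below are stated in many articles.

  • L1 regularization can produce a sparse weight matrix, that is, a sparse model, so it can be used for feature selection
  • L2 regularization can prevent the model from overfitting; to some extent, L1 can also prevent overfitting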
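
One commonly given intuition for the sparsity claim above is the shape of the gradients: the L1 penalty pushes every nonzero weight toward zero with the same magnitude, while the L2 penalty pushes in proportion to the weight itself, so small weights are barely touched. The sketch below only illustrates that difference with autograd. The example values are made up, and the squared form `0.5 * sum(w^2)` is used for the L2 penalty as an assumption, since it is the form most often used in practice.

```python
import torch

# Hypothetical weights: one tiny, one large.
w = torch.tensor([0.01, -2.0], requires_grad=True)

# Gradient of the L1 penalty sum(|w|) is sign(w): constant magnitude.
l1_penalty = torch.sum(torch.abs(w))
(grad_l1,) = torch.autograd.grad(l1_penalty, w)

# Gradient of the squared L2 penalty 0.5 * sum(w^2) is w itself.
l2_penalty = 0.5 * torch.sum(w ** 2)
(grad_l2,) = torch.autograd.grad(l2_penalty, w)

print(grad_l1)  # same magnitude for the tiny and the large weight
print(grad_l2)  # proportional to each weight
```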

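An implementation of L2 regularization:

import torch

reg = 1e-6
l2_loss = torch.zeros(1)  # start from an explicit zero, not uninitialized memory
for name, param in model.named_parameters():
    if 'bias' not in name:
        # Accumulate 0.5 * reg * ||param||^2 over all non-bias parameters
        l2_loss = l2_loss + (0.5 * reg * torch.sum(torch.pow(param, 2)))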

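An implementation of L1 regularization:

reg = 1e-6
l1_loss = torch.zeros(1)  # start from an explicit zero, not uninitialized memory
for name, param in model.named_parameters():
    if 'bias' not in name:
        # Accumulate reg * sum(|param|) over all non-bias parameters
        l1_loss = l1_loss + (reg * torch.sum(torch.abs(param)))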

Orthogonal Regularization

reg = 1e-6
orth_loss = torch.zeros(1)
for name, param in model.named_parameters():
    if 'bias' not in name:
        # Flatten each weight to 2D: (first dimension, everything else)
        param_flat = param.view(param.shape[0], -1)
        sym = torch.mm(param_flat, torch.t(param_flat))
        sym = sym - torch.eye(param_flat.shape[0], device=param_flat.device)
        # Sum absolute deviations from the identity so entries of opposite sign cannot cancel
        orth_loss = orth_loss + (reg * sym.abs().sum())
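
To make the orthogonality penalty more concrete, here is a small sketch that evaluates it on two hand-built matrices. The helper name `orth_penalty` is hypothetical, and the sketch assumes the absolute-value form of the penalty (the sum of |W Wᵀ − I|), which is zero exactly when the rows of the flattened weight are orthonormal.

```python
import torch

def orth_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Sum of absolute entries of W W^T - I for a weight flattened to 2D."""
    flat = weight.reshape(weight.shape[0], -1)
    sym = torch.mm(flat, flat.t()) - torch.eye(flat.shape[0])
    return sym.abs().sum()

orthogonal = torch.eye(3)    # rows are orthonormal
scaled = 2.0 * torch.eye(3)  # rows are orthogonal but not unit length

print(orth_penalty(orthogonal).item())  # W W^T = I, so the penalty vanishes
print(orth_penalty(scaled).item())      # W W^T = 4I, so W W^T - I = 3I
```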

Max Norm Constraint

Simply put, this constrains the values of w directly by capping the norm of the weights at a fixed maximum.

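def max_norm(model, max_val=3, eps=1e-8):
    # Rescale in place under no_grad; rebinding the loop variable `param`
    # would leave the model's parameters unchanged.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if 'bias' not in name:
                norm = param.norm(2, dim=0, keepdim=True)
                desired = torch.clamp(norm, 0, max_val)
                param.mul_(desired / (eps + norm))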
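
A max-norm constraint is usually applied right after each optimizer step. The sketch below shows the effect on a single layer. The function name `apply_max_norm`, the layer sizes, and the factor of 100 used to inflate the weights are all assumptions made for illustration. The point is that the parameters are rescaled in place, after which every column norm is at most `max_val`.

```python
import torch
import torch.nn as nn

def apply_max_norm(model: nn.Module, max_val: float = 3.0, eps: float = 1e-8) -> None:
    """Rescale non-bias parameters in place so each column norm is <= max_val."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if 'bias' not in name:
                norm = param.norm(2, dim=0, keepdim=True)
                desired = torch.clamp(norm, 0, max_val)
                param.mul_(desired / (eps + norm))

torch.manual_seed(0)
layer = nn.Linear(4, 2)
with torch.no_grad():
    layer.weight.mul_(100.0)  # inflate the weights so the constraint has something to do

apply_max_norm(layer, max_val=3.0)
col_norms = layer.weight.norm(2, dim=0)
print(col_norms)
```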

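L2 regularization

In PyTorch, the most direct way to apply L2 regularization is the optimizer's built-in weight_decay option. It sets the weight decay rate and plays the role of λ in L2 regularization.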

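optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)

# Alternatively, add the penalty to the loss by hand.
# `lambda` is a reserved word in Python, so the coefficient is named `lam`.
lam = torch.tensor(1.)
l2_reg = torch.tensor(0.)
for param in model.parameters():
    l2_reg = l2_reg + torch.norm(param)  # L2 norm (not squared) of each parameter tensor
loss = loss + lam * l2_reg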
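
To see what weight_decay corresponds to in terms of the loss, the sketch below compares one plain SGD step with `weight_decay` against one step with an explicit penalty of `0.5 * wd * sum(w^2)`. The toy data, layer sizes, and hyperparameters are made up. The penalty is applied to all parameters, biases included, because that is what `weight_decay` does when the optimizer is given `model.parameters()`. This is only a check for plain SGD and is not meant to describe every optimizer.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
x = torch.randn(8, 3)  # hypothetical toy inputs
y = torch.randn(8, 1)  # hypothetical toy targets
wd, lr = 0.1, 0.5

model_a = nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)  # identical starting point

# (a) Built-in weight decay.
opt_a = optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
nn.functional.mse_loss(model_a(x), y).backward()
opt_a.step()

# (b) Explicit penalty 0.5 * wd * sum(w^2) added to the loss, no weight_decay.
opt_b = optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = sum(0.5 * wd * p.pow(2).sum() for p in model_b.parameters())
(nn.functional.mse_loss(model_b(x), y) + penalty).backward()
opt_b.step()

same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```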

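Optimizers also support what PyTorch calls per-parameter options, which set options separately for each group of parameters to meet finer-grained requirements. It is simple to do: instead of passing an iterable of parameters as above, we pass an iterable of dicts. Each dict must contain a params key naming the parameters to optimize, and its other keys must match keyword arguments that the optimizer itself accepts.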

optim.SGD([
                {'params': model.base.parameters()},
                {'params': model.classifier.parameters(), 'lr': 1e-3}
            ], lr=1e-2, momentum=0.9)
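
# Apply weight decay to the weights only and leave the biases undecayed.
weight_p, bias_p = [], []
for name, p in model.named_parameters():
    if 'bias' in name:
        bias_p.append(p)
    else:
        weight_p.append(p)
# For PyTorch's built-in layers, parameter names are assigned automatically: weights contain
# "weight" and biases contain "bias", so the name is enough to tell them apart. This differs from
# TensorFlow, where users can name variables themselves (default names are also generated).
optim.SGD([
    {'params': weight_p, 'weight_decay': 1e-5},
    {'params': bias_p, 'weight_decay': 0},
], lr=1e-2, momentum=0.9)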
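
One way to confirm that the per-group settings took effect is to inspect `optimizer.param_groups`. The sketch below assumes a small hypothetical `nn.Sequential` model. Options not given in a group, such as `lr` here, fall back to the defaults passed to the optimizer.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical model with two Linear layers (two weights, two biases).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

weight_p, bias_p = [], []
for name, p in model.named_parameters():
    if 'bias' in name:
        bias_p.append(p)
    else:
        weight_p.append(p)

optimizer = optim.SGD([
    {'params': weight_p, 'weight_decay': 1e-5},
    {'params': bias_p, 'weight_decay': 0},
], lr=1e-2, momentum=0.9)

# Each group carries its own options; missing ones are filled from the defaults.
for group in optimizer.param_groups:
    print(len(group['params']), group['lr'], group['weight_decay'])
```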

L1 regularization

criterion = nn.CrossEntropyLoss()

classify_loss = criterion(input=out, target=batch_train_label)

# `lambda` is a reserved word in Python, so the coefficient is named `lam`.
lam = torch.tensor(1.)
l1_reg = torch.tensor(0.)
for param in model.parameters():
    l1_reg = l1_reg + torch.sum(torch.abs(param))

loss = classify_loss + lam * l1_reg

Defining a regularization class

import torch

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device='cuda'
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))
 
 
class Regularization(torch.nn.Module):
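    def __init__(self, model, weight_decay, p=2):
        '''
        :param model: the model
        :param weight_decay: regularization coefficient
        :param p: order of the norm, default 2;
                  p=2 gives L2 regularization, p=1 gives L1 regularization
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            # Raise instead of exiting with status 0, which would hide the error
            raise ValueError("weight_decay must be > 0")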
        self.model=model
        self.weight_decay=weight_decay
        self.p=p
        self.weight_list=self.get_weight(model)
        self.weight_info(self.weight_list)
 
    def to(self, device):
        '''
        Select the device to run on
        :param device: cuda or cpu
        :return: self
        '''
        self.device=device
        super().to(device)
        return self
 
    def forward(self, model):
        self.weight_list = self.get_weight(model)  # fetch the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss
 
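    def get_weight(self, model):
        '''
        Collect the model's weights as a list
        :param model: the model
        :return: list of (name, parameter) pairs whose name contains 'weight'
        '''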
        weight_list = []
        for name, param in model.named_parameters():
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list
 
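    def regularization_loss(self, weight_list, weight_decay, p=2):
        '''
        Compute the weighted sum of the p-norms of the weights
        :param weight_list: list of (name, parameter) pairs
        :param p: order of the norm, default 2
        :param weight_decay: regularization coefficient
        :return: the regularization loss
        '''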
        # weight_decay=Variable(torch.FloatTensor([weight_decay]).to(self.device),requires_grad=True)
        # reg_loss=Variable(torch.FloatTensor([0.]).to(self.device),requires_grad=True)
        # weight_decay=torch.FloatTensor([weight_decay]).to(self.device)
        # reg_loss=torch.FloatTensor([0.]).to(self.device)
        reg_loss=0
        for name, w in weight_list:
            reg_loss = reg_loss + torch.norm(w, p=p)  # p-norm of this weight tensor (L1 when p=1, L2 when p=2)
 
        reg_loss=weight_decay*reg_loss
        return reg_loss
 
    def weight_info(self, weight_list):
        '''
        Print the names of the weights being regularized
        :param weight_list: list of (name, parameter) pairs
        :return:
        '''
        print("---------------regularization weight---------------")
        for name ,w in weight_list:
            print(name)
        print("---------------------------------------------------")

Using the regularization class

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))
 
weight_decay = 100.0  # regularization coefficient
 
model = my_net().to(device)
# Initialize the regularizer
if weight_decay>0:
   reg_loss=Regularization(model, weight_decay, p=2).to(device)
else:
   print("no regularization")
 
 
criterion= nn.CrossEntropyLoss().to(device) # CrossEntropyLoss=softmax+cross entropy
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no need to pass weight_decay here
 
# train
batch_train_data=...
batch_train_label=...
 
out = model(batch_train_data)
 
# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
    loss = loss + reg_loss(model)
total_loss = loss  # keep the tensor: .item() returns a Python float, which has no .backward()

# backprop
optimizer.zero_grad()  # clear all accumulated gradients
total_loss.backward()
optimizer.step()
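
For readers who want something they can run as is, the following sketch condenses the same training step into a self-contained script. It is not the class above. The helper `norm_penalty`, the single `nn.Linear` model, the random batch, and the coefficient 0.01 are all assumptions made for illustration. The flow is the same: compute the data loss, add the norm penalty on the weights, then backpropagate through the combined tensor.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def norm_penalty(model: nn.Module, weight_decay: float, p: int = 2) -> torch.Tensor:
    """weight_decay times the sum of p-norms of all parameters named '*weight*'."""
    total = 0.0
    for name, param in model.named_parameters():
        if 'weight' in name:
            total = total + torch.norm(param, p=p)
    return weight_decay * total

torch.manual_seed(0)
model = nn.Linear(10, 3)                      # hypothetical stand-in for my_net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

batch_train_data = torch.randn(16, 10)        # made-up batch
batch_train_label = torch.randint(0, 3, (16,))

out = model(batch_train_data)
data_loss = criterion(out, batch_train_label)
reg = norm_penalty(model, weight_decay=0.01, p=2)
loss = data_loss + reg                        # still a tensor, so backward() works

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(data_loss.item(), reg.item(), loss.item())
```

Passing `p=1` switches the penalty to L1 without any other change.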