


What does "bias" refer to in a distilled network? Notes on the paper "Distilling the Knowledge in a Neural Network" (knowledge distillation)

If this bias is increased by 3.5



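Assuming you're referring to the previous sentence about logits, here is that sentence extended with "If this bias is increased by 3.5":

If this bias is increased by 3.5, very negative logits may still contain useful information from the knowledge learned by the cumbersome model, but that information may be harder to interpret or exploit.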

Here's a breakdown of the phrase:

"If this bias is increased by 3.5": In the paper itself, this phrase belongs to the MNIST experiment in which every example of the digit 3 is omitted from the transfer set. "This bias" is the learned bias term of the output unit (logit) for class 3 in the distilled model, not a general bias towards negative logits.
Explanation of the impact:

Because the distilled model never sees a 3 during training, the bias it learns for that class is much too low, so the class-3 logit is systematically underestimated and most of the test errors fall on 3s.
Adding 3.5 to that single bias raises the class-3 logit by the same amount for every input. The weights, and with them the knowledge transferred from the cumbersome model, are left unchanged.
With this correction the paper reports that the distilled model gets 98.6% of the test 3s correct, despite never having seen a 3 during training.
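As a rough numerical illustration of what adding 3.5 to one output bias does, the sketch below shifts a single logit and compares the softmax prediction before and after. Reading "this bias" as the bias of one output class follows the paper's MNIST experiment; the logit values and the helper name `shift_class_bias` are made up for illustration and do not come from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shift_class_bias(logits, class_index, delta):
    """Return a copy of `logits` with `delta` added to one class.

    Adding `delta` to an output unit's bias adds `delta` to that unit's
    logit for every input, so the effect can be applied to the logits directly.
    """
    shifted = list(logits)
    shifted[class_index] += delta
    return shifted

# Hypothetical logits for digits 0..9 on an image of a "3": the class-3
# logit is too low, so class 8 has the largest logit.
logits = [-2.0, -1.0, 0.5, 1.0, -3.0, 0.0, -2.5, -1.5, 3.0, -0.5]

before = softmax(logits)
after = softmax(shift_class_bias(logits, class_index=3, delta=3.5))

pred_before = before.index(max(before))  # class with the highest probability
pred_after = after.index(max(after))
print(pred_before, pred_after)
```

Only the bias of one class changes here; the relative ordering of all the other classes stays the same.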




CNN layers + fully connected layer (outputs the logits) + softmax layer (outputs the predicted probabilities P) + cross-entropy loss
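The last three stages of this pipeline can be sketched in a few lines, assuming the CNN layers have already produced a feature vector. The names `linear`, `features`, `weights` and `bias`, and all the numbers, are hypothetical; the point is only to show that the fully connected layer carries one bias term per class, added to that class's logit.

```python
import math

def linear(features, weights, bias):
    """Fully connected layer: logits[k] = sum_j features[j] * weights[j][k] + bias[k]."""
    return [
        sum(f * row[k] for f, row in zip(features, weights)) + bias[k]
        for k in range(len(bias))
    ]

def softmax(logits):
    """Numerically stable softmax: turns logits into probabilities P."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross-entropy loss against a one-hot target."""
    return -math.log(probs[target_index])

# Hypothetical CNN output (2 features) and a 3-class output layer.
features = [1.0, 2.0]
weights = [[0.5, -0.5, 0.0],
           [0.25, 0.0, -0.25]]
bias = [0.0, 0.1, -0.1]  # one bias per class

logits = linear(features, weights, bias)
probs = softmax(logits)
loss = cross_entropy(probs, target_index=0)
print(logits, probs, loss)
```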

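In a distillation setup, the Student network learns from the probabilities obtained by passing the Teacher network's logits through a temperature-controlled softmax, that is, \(q_i = \exp(z_i/T) / \sum_j \exp(z_j/T)\), where \(z_i\) are the Teacher's logits and \(T\) is the temperature. The Student network produces its own logits \(z_i^{'}\), from which the corresponding \({q}_i^{'}\) is computed in the same way. Once the Teacher's \(q_i\) and the Student's \({q}_i^{'}\) are available, the Student network is trained by minimizing \(KL({q}_i, {q}_i^{'})\).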
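A minimal sketch of this training signal, assuming the temperature-scaled softmax from the paper and a KL divergence taken from the Teacher distribution to the Student distribution. The logits and the temperature `T = 4.0` are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """q_i = exp(z_i / T) / sum_j exp(z_j / T), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

teacher_logits = [5.0, 1.0, -2.0]   # z_i from the Teacher (made up)
student_logits = [3.0, 0.5, -1.0]   # z_i' from the Student (made up)
T = 4.0                             # temperature (assumed value)

q_teacher = softmax_with_temperature(teacher_logits, T)
q_student = softmax_with_temperature(student_logits, T)

# Distillation loss that the Student would minimize for this example.
loss = kl_divergence(q_teacher, q_student)
print(q_teacher, q_student, loss)
```

In practice the gradient of this loss would be taken with respect to the Student's parameters only, with the Teacher's \(q_i\) treated as fixed targets.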

