Deep Learning Mean Squared Error Formula: Computing the Mean Squared Error

Reposted

mob6454cc620c34 2024-05-16 03:47:17


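Error Computation

Common loss functions include mean squared error, cross-entropy, KL divergence, and hinge loss. Of these, mean squared error and cross-entropy are the most common in deep learning: mean squared error is mainly used for regression problems, and cross-entropy for classification problems.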

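Mean Squared Error (MSE)

The mean squared error (MSE) function treats the output vector and the ground-truth vector as two points in a Cartesian coordinate system and measures the gap between the two vectors by the Euclidean distance between those points (more precisely, the squared Euclidean distance, averaged over the vector's dimensions):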

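$$\mathrm{MSE}(\boldsymbol{y}, \boldsymbol{o}) = \frac{1}{d_{\text{out}}} \sum_{i=1}^{d_{\text{out}}} (y_i - o_i)^2$$

where $\boldsymbol{y}$ is the ground-truth vector, $\boldsymbol{o}$ is the network output, and $d_{\text{out}}$ is the number of output dimensions.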
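To make the definition concrete, here is a minimal NumPy sketch of MSE for a single pair of vectors. The helper name `mse` and the sample values are illustrative assumptions, not part of the original article.

```python
import numpy as np


def mse(y, o):
    """Mean of the squared element-wise differences between y and o."""
    y = np.asarray(y, dtype=float)
    o = np.asarray(o, dtype=float)
    return np.mean((y - o) ** 2)


y = [0.0, 1.0, 0.0]  # ground-truth vector (illustrative values)
o = [0.1, 0.7, 0.2]  # network output (illustrative values)

# Squared Euclidean distance divided by the number of dimensions
print(mse(y, o))
```

The result equals the squared Euclidean distance between the two points divided by the vector length, which is why MSE is often described in terms of distance.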

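Functional implementation:

import tensorflow as tf
from tensorflow import keras

# MSE via the functional API
o = tf.random.normal([2, 10])  # simulated network output: 2 samples, 10 values each
y = tf.constant([1, 3])  # ground-truth class indices
y_onehot = tf.one_hot(y, depth=10)  # one-hot encode the ground truth
loss = keras.losses.MSE(y_onehot, o)  # per-sample MSE, shape [2]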
print(loss)

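TensorFlow's MSE function returns the mean squared error of each sample, so the result must be averaged once more over the sample dimension to obtain the MSE of the batch:

loss = tf.reduce_mean(loss)  # average over samples to get the batch MSE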
print(loss)
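The two-step reduction can be mirrored in plain NumPy to see the shapes involved. This is only a sketch of the arithmetic under the assumption that the per-sample loss averages over the last axis; it does not call TensorFlow, and the random values stand in for a real network output.

```python
import numpy as np

rng = np.random.default_rng(0)
o = rng.normal(size=(2, 10))    # stand-in for the network output
y_onehot = np.eye(10)[[1, 3]]   # one-hot rows for class indices 1 and 3

# Step 1: average the squared error over the last axis -> one value per sample
per_sample = np.mean((y_onehot - o) ** 2, axis=-1)

# Step 2: average over the samples -> a single scalar for the batch
batch = per_sample.mean()

print(per_sample.shape)
print(batch)
```

Because every sample has the same number of outputs, the mean of the per-sample values equals the mean over all elements at once.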

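Class-based implementation:

import tensorflow as tf
from tensorflow import keras

o = tf.random.normal([2, 10])  # simulated network output
y = tf.constant([1, 3])  # ground-truth class indices
y_onehot = tf.one_hot(y, depth=10)  # one-hot encode the ground truth
# Create the MSE loss object
criterion = keras.losses.MeanSquaredError()
loss = criterion(y_onehot, o)  # returns the batch MSE directly (a scalar)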
print(loss)

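Cross-Entropy Error

Entropy: the larger the entropy, the greater the uncertainty and the more information the distribution carries. The entropy of a distribution P(i) is defined as: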

$$H(P) = -\sum_{i} P(i) \log_2 P(i)$$

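H(P) can also be computed with a logarithm of a different base.
For example:
Consider a 4-class problem. If the true label of a sample is class 4, its one-hot encoding is [0, 0, 0, 1]. The class of this image is uniquely determined: the probability that it belongs to class 4 is P(y is 4 | x) = 1, so the uncertainty is 0: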

$$-0 \cdot \log_2 0 - 0 \cdot \log_2 0 - 0 \cdot \log_2 0 - 1 \cdot \log_2 1 = 0$$

(using the convention $0 \cdot \log_2 0 = 0$)

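For a deterministic distribution the entropy is 0, which is the lowest possible uncertainty. If the predicted probability distribution is instead [0.1, 0.1, 0.1, 0.7], the entropy is: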

$$-0.1 \cdot \log_2 0.1 - 0.1 \cdot \log_2 0.1 - 0.1 \cdot \log_2 0.1 - 0.7 \cdot \log_2 0.7 \approx 1.357$$

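Now consider a random classifier that predicts every class with equal probability: [0.25, 0.25, 0.25, 0.25]. In this case the entropy is exactly 2.
Since P(i) ∈ [0, 1], log2 P(i) ≤ 0, so entropy is always greater than or equal to 0. When the entropy reaches its minimum of 0, the uncertainty is 0. The one-hot encoded label distribution of a classification problem is an example of a distribution with entropy 0.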
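The three entropy values worked out above can be checked numerically. The sketch below is a minimal illustration; the helper name `entropy_bits` is an assumption made for this example, and terms with zero probability are skipped to follow the convention that 0 · log2 0 = 0.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability terms."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # 0 * log2(0) is treated as 0
    # sum of p * log2(1 / p) is the same as -sum of p * log2(p)
    return float(np.sum(nz * np.log2(1.0 / nz)))


print(entropy_bits([0, 0, 0, 1]))              # deterministic (one-hot) label
print(entropy_bits([0.1, 0.1, 0.1, 0.7]))      # fairly confident prediction
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # uniform random classifier
```

The values increase from the one-hot case to the uniform case, matching the intuition that a flatter distribution is more uncertain.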

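Cross-entropy applies the same idea to two distributions. For a true distribution p and a predicted distribution q it is defined as $H(p \| q) = -\sum_i p(i) \log q(i)$; with one-hot labels it reduces to the negative log of the probability predicted for the true class. A direct NumPy implementation, using the natural logarithm, where x holds the predicted probabilities:

import numpy as np


def cross_entropy_error(x, labels):
    x = np.array(x)  # predicted probabilities
    labels = np.array(labels)  # one-hot ground truth
    return np.sum(-labels * np.log(x))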

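The version above breaks down when a predicted probability is exactly 0: log(0) is undefined (NumPy returns -inf with a RuntimeWarning, and multiplying it by a 0 label yields nan). For this reason a very small value is usually added to every probability: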

import numpy as np


def cross_entropy_error(x, labels):
    x = np.array(x)
    delta = 1e-7  # small constant that keeps log() away from 0 so training can continue
    labels = np.array(labels)
    return np.sum(-labels * np.log(x + delta))
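A small check shows why the constant matters. The example below is a sketch with made-up probabilities; it compares the plain formula with the version that adds `1e-7` when one predicted probability is exactly 0.

```python
import numpy as np

labels = np.array([0.0, 0.0, 1.0])  # one-hot: the true class is index 2
x = np.array([0.0, 0.3, 0.7])       # predicted probabilities, one is exactly 0

# Without the constant: log(0) = -inf, and 0 * inf produces nan
with np.errstate(divide="ignore", invalid="ignore"):
    naive = np.sum(-labels * np.log(x))

# With the constant: every term is finite
stable = np.sum(-labels * np.log(x + 1e-7))

print(naive)
print(stable)
```

The stabilized result is essentially -log(0.7), since only the true class contributes to the sum.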

In practice, training is usually done with mini-batches. The mini-batch version of the cross-entropy error looks like this:

import numpy as np


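def mini_cross_entropy_error(x, labels):
    x = np.asarray(x)  # accept lists as well as arrays
    labels = np.asarray(labels)
    if x.ndim == 1:  # a single sample: promote it to a batch of size 1
        labels = labels.reshape(1, -1)
        x = x.reshape(1, -1)

    batch_size = x.shape[0]
    return -np.sum(labels * np.log(x + 1e-7)) / batch_size  # mean loss per sample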
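As a usage sketch, the batch loss should equal the average of the losses computed one sample at a time. The block repeats the function so that it is self-contained; the probabilities and labels are made-up values.

```python
import numpy as np


def mini_cross_entropy_error(x, labels):
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if x.ndim == 1:  # single sample -> batch of size 1
        x = x.reshape(1, -1)
        labels = labels.reshape(1, -1)
    batch_size = x.shape[0]
    return -np.sum(labels * np.log(x + 1e-7)) / batch_size


x = np.array([[0.1, 0.2, 0.7],
              [0.8, 0.1, 0.1]])   # predicted probabilities for 2 samples
labels = np.array([[0, 0, 1],
                   [1, 0, 0]])    # one-hot ground truth

batch_loss = mini_cross_entropy_error(x, labels)
per_sample = [mini_cross_entropy_error(xi, li) for xi, li in zip(x, labels)]

print(batch_loss)
print(np.mean(per_sample))
```

Dividing by `batch_size` keeps the loss on the same scale regardless of how many samples are in the batch.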

What if the labels are not in one-hot form?
You can convert them to one-hot form. Two ways to implement the conversion are shown below:

Method 1:

import numpy as np


def one_hot(x, depth):
    x = np.array(x).reshape((1, -1))
    labels = np.zeros((x.shape[1], depth))
    for i in range(x.shape[1]):
        labels[i, x[0, i]] = 1
    return labels


x = [2, 4, 7, 8, 1]
a = one_hot(x, 10)
print(a)

Method 2:

import numpy as np


def one_hot_(x, depth):
    x = np.array(x).reshape((1, -1))
    labels = np.zeros((x.shape[1], depth))
    labels[np.arange(x.shape[1]), x[0]] = 1
    return labels
 

x = [2, 4, 7, 8, 1]
a = one_hot_(x, 10)
print(a)
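A third option, not taken from the original article, is to index an identity matrix with the labels: row k of `np.eye(depth)` is already the one-hot vector for class k. The helper name `one_hot_eye` is hypothetical.

```python
import numpy as np


def one_hot_eye(x, depth):
    """One-hot encode integer labels by selecting rows of an identity matrix."""
    x = np.asarray(x, dtype=int).reshape(-1)  # flatten to a 1-D index array
    return np.eye(depth)[x]                   # row k is the one-hot vector for class k


a = one_hot_eye([2, 4, 7, 8, 1], 10)
print(a.shape)
print(a.argmax(axis=1))  # recovers the original labels
```

This is compact, but it builds a full depth-by-depth matrix first, so the indexing approach of Method 2 may be preferable when the number of classes is large.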

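Besides converting the labels to one-hot form, is there another solution? Yes: index the predicted probability of the true class directly, as in the following code: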

import numpy as np


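def mini_solve_not_onthot_cross_entropy_error(x, labels):
    x = np.array(x)
    labels = np.array(labels).reshape(-1)  # integer class indices, one per sample
    if x.ndim == 1:  # a single sample: promote it to a batch of size 1
        x = x.reshape((1, -1))
    batch_size = x.shape[0]
    # Pick the predicted probability of the true class for every sample
    return (-np.sum(np.log(x[np.arange(batch_size), labels] + 1e-7)) /
            batch_size)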

The code above handles both non-one-hot labels and mini-batches.
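To see that indexing by label and multiplying by a one-hot matrix compute the same quantity, the sketch below compares the two on a small batch. The helper names `ce_onehot` and `ce_index` and the sample values are assumptions made for this illustration.

```python
import numpy as np


def ce_onehot(x, labels):
    # labels is a one-hot matrix with the same shape as x
    return -np.sum(labels * np.log(x + 1e-7)) / x.shape[0]


def ce_index(x, labels):
    # labels is a 1-D array of integer class indices
    picked = x[np.arange(x.shape[0]), labels]  # probability of the true class
    return -np.sum(np.log(picked + 1e-7)) / x.shape[0]


x = np.array([[0.1, 0.2, 0.7],
              [0.8, 0.1, 0.1]])  # predicted probabilities for 2 samples
idx = np.array([2, 0])           # true class indices
onehot = np.eye(3)[idx]          # the same labels in one-hot form

print(ce_onehot(x, onehot))
print(ce_index(x, idx))
```

The index version skips the multiplications by zero, so it does less work while returning the same loss.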
