Typical Loss Functions
  • Mean Squared Error

  • Cross Entropy Loss

    • binary
    • multi-class
    • +softmax
MSE
  • $\text{loss} = \sum \big[y - (xw + b)\big]^2$

  • $L_2\text{-norm} = \big\lVert\, y - (xw + b) \,\big\rVert_2$

  • $\text{loss} = \operatorname{norm}\big(y - (xw + b)\big)^2$
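
A quick numerical check of the equivalence between the summed squared error and the squared L2 norm (the tensors below are made-up illustrative values, not taken from the text); note that tf.losses.MSE averages over the last axis instead of summing:

import tensorflow as tf

y = tf.constant([1.0, 2.0, 3.0])              # targets (illustrative values)
pred = tf.constant([1.5, 1.5, 2.0])           # stands in for xw + b
residual = y - pred

sum_sq = tf.reduce_sum(tf.square(residual))   # sum[y - (xw+b)]^2
norm_sq = tf.norm(residual, ord=2) ** 2       # ||y - (xw+b)||_2 squared
print(float(sum_sq), float(norm_sq))          # both ~1.5

# tf.losses.MSE is the *mean* over the last axis, i.e. sum_sq / 3 here
print(float(tf.losses.MSE(y, pred)))          # ~0.5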

Derivative

  • $\text{loss} = \sum \big[y - f_\theta(x)\big]^2$

  • $\dfrac{\partial\,\text{loss}}{\partial\theta} = -2 \sum \big[y - f_\theta(x)\big] \cdot \dfrac{\partial f_\theta(x)}{\partial\theta}$
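
This can be verified with autodiff for a linear model $f_\theta(x) = xw + b$ (shapes and values below are arbitrary, chosen only for illustration): the tape gradient with respect to $w$ should equal $-2X^\top(y - Xw - b)$.

import tensorflow as tf

x = tf.random.normal([5, 3])
w = tf.Variable(tf.random.normal([3, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.random.normal([5, 1])

with tf.GradientTape() as tape:
    pred = x @ w + b
    loss = tf.reduce_sum(tf.square(y - pred))        # sum, not mean, to match the formula

grad_w = tape.gradient(loss, w)
manual = -2.0 * tf.transpose(x) @ (y - pred)         # -2 * sum over samples of [y - f] * df/dw
print(float(tf.reduce_max(tf.abs(grad_w - manual))))  # ~0 up to float error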

MSE Gradient

import tensorflow as tf

x = tf.random.normal([2, 4])        # 2 samples, 4 features
w = tf.random.normal([4, 3])        # weights for 3 classes
b = tf.zeros([3])
y = tf.constant([2, 0])             # integer class labels

with tf.GradientTape() as tape:
    tape.watch([w, b])              # w and b are plain tensors, so watch them explicitly
    prob = tf.nn.softmax(x @ w + b, axis=1)
    loss = tf.reduce_mean(tf.losses.MSE(tf.one_hot(y, depth=3), prob))

grads = tape.gradient(loss, [w, b])
grads[0]

<tf.Tensor: id=92, shape=(4, 3), dtype=float32, numpy=
array([[ 0.01156707, -0.00927749, -0.00228957],
       [ 0.03556816, -0.03894382,  0.00337564],
       [-0.02537526,  0.01924876,  0.00612648],
       [-0.0074787 ,  0.00161515,  0.00586352]], dtype=float32)>
grads[1]
<tf.Tensor: id=90, shape=(3,), dtype=float32, numpy=array([-0.01552947,  0.01993286, -0.00440337], dtype=float32)>
Softmax
  • a "soft", differentiable version of max

  • amplifies the gaps: large values end up with most of the probability mass, while small values shrink toward zero and are squeezed closer together, as in the short demo below
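
A tiny demonstration of that effect (logits chosen arbitrarily):

import tensorflow as tf

logits = tf.constant([1.0, 2.0, 5.0])
print(tf.nn.softmax(logits).numpy())
# -> roughly [0.017, 0.047, 0.936]: the largest logit grabs almost all of the
#    probability mass, the smaller ones are squeezed together near zero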


Derivative

$p_i = \dfrac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}}$

  • $i = j$:

$\dfrac{\partial p_i}{\partial a_j} = \dfrac{\partial}{\partial a_j}\!\left(\dfrac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}}\right) = p_i\,(1 - p_j)$

  • $i \neq j$:

$\dfrac{\partial p_i}{\partial a_j} = \dfrac{\partial}{\partial a_j}\!\left(\dfrac{e^{a_i}}{\sum_{k=1}^{N} e^{a_k}}\right) = -p_j\, p_i$
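
Both cases can be checked against autodiff: the full Jacobian of softmax is $\operatorname{diag}(p) - p\,p^\top$, i.e. $p_i(1 - p_j)$ on the diagonal ($i = j$) and $-p_i p_j$ off it. A minimal sketch with arbitrary logits:

import tensorflow as tf

a = tf.Variable([1.0, 2.0, 3.0])              # arbitrary logits

with tf.GradientTape() as tape:
    p = tf.nn.softmax(a)

jac = tape.jacobian(p, a)                     # jac[i, j] = dp_i / da_j
manual = tf.linalg.diag(p) - tf.reshape(p, [3, 1]) * tf.reshape(p, [1, 3])
print(float(tf.reduce_max(tf.abs(jac - manual))))   # ~0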

Crossentropy Gradient

x = tf.random.normal([2, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])
y = tf.constant([2, 0])

with tf.GradientTape() as tape:
    tape.watch([w, b])
    logits = x @ w + b              # raw scores; softmax is applied inside the loss
    loss = tf.reduce_mean(
        tf.losses.categorical_crossentropy(tf.one_hot(y, depth=3),
                                           logits,
                                           from_logits=True))

grads = tape.gradient(loss, [w, b])
grads[0]

<tf.Tensor: id=226, shape=(4, 3), dtype=float32, numpy=
array([[-0.38076094,  0.33844548,  0.04231545],
       [-1.0262716 , -0.6730384 ,  1.69931   ],
       [ 0.20613424, -0.50421923,  0.298085  ],
       [ 0.5800004 , -0.22329211, -0.35670823]], dtype=float32)>
grads[1]
<tf.Tensor: id=224, shape=(3,), dtype=float32, numpy=array([-0.3719653 ,  0.53269935, -0.16073406], dtype=float32)>
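
One note on from_logits=True in the block above: passing raw logits lets the loss apply softmax internally in a numerically stable way. Applying tf.nn.softmax yourself and then calling the loss with from_logits=False gives the same value in exact arithmetic but is less stable for extreme logits. A small comparison with arbitrary values:

import tensorflow as tf

y_onehot = tf.constant([[0.0, 0.0, 1.0]])
logits = tf.constant([[2.0, 1.0, 0.1]])

# preferred: hand the raw logits to the loss
stable = tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True)

# same value mathematically, but explicit softmax + log is less stable for large logits
probs = tf.nn.softmax(logits, axis=1)
manual = tf.losses.categorical_crossentropy(y_onehot, probs, from_logits=False)

print(stable.numpy(), manual.numpy())          # both ~2.32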