How Neural Style Works

$CNN$ is a pre-trained deep convolutional neural network and $X$ is an input image, so $CNN(X)$ is the network fed with $X$. Let $F_{XL} \in CNN(X)$ denote the feature maps at layer $L$. We define the content of $X$ at layer $L$ by $F_{XL}$. If $Y$ is another image of the same size as $X$, the content distance at layer $L$ is defined as:

$$D_C^L(X,Y) = \|F_{XL} - F_{YL}\|^2 = \sum_i \big(F_{XL}(i) - F_{YL}(i)\big)^2 \tag{1}$$
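Equation (1) is just an elementwise squared difference between the two layer-$L$ feature maps. A minimal NumPy sketch (the feature-map values here are made up for illustration):

```python
import numpy as np

def content_distance(F_XL, F_YL):
    """Squared L2 distance between two feature maps, as in Eq. (1)."""
    diff = F_XL - F_YL
    return np.sum(diff ** 2)

# Two small "feature maps" of identical shape
F_XL = np.array([[1.0, 2.0], [3.0, 4.0]])
F_YL = np.array([[1.0, 0.0], [3.0, 2.0]])
print(content_distance(F_XL, F_YL))  # 8.0
```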

Here $F_{XL}(i)$ is the $i$-th element of $F_{XL}$. The Gram matrix $G_{XL}$ is a $K \times K$ matrix, where $K$ is the number of feature maps at layer $L$. The entry $G_{XL}(k,l)$ in row $k$, column $l$ is the inner product of the $k$-th and $l$-th feature maps $F_{XL}^k$ and $F_{XL}^l$:

$$G_{XL}(k,l) = \langle F_{XL}^k, F_{XL}^l \rangle = \sum_i F_{XL}^k(i) \cdot F_{XL}^l(i) \tag{2}$$
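With the $K$ feature maps stacked as rows of a matrix, Eq. (2) is a single matrix product. A small NumPy sketch (shapes and values are arbitrary):

```python
import numpy as np

# Feature maps at layer L: K feature maps, each flattened to N values.
# Rows are feature maps, so G = F @ F.T is the K x K matrix of Eq. (2).
K, N = 3, 4
rng = np.random.default_rng(0)
F = rng.standard_normal((K, N))

G = F @ F.T
# G[k, l] equals the inner product of feature maps k and l
print(np.allclose(G[0, 1], np.sum(F[0] * F[1])))  # True
```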

 

Here $F_{XL}^k(i)$ is the $i$-th element of $F_{XL}^k$. $G_{XL}(k,l)$ measures the correlation between feature maps $k$ and $l$, so $G_{XL}$ is the correlation matrix of the feature maps of $X$ at layer $L$. The style distance at layer $L$ is defined as:

$$D_S^L(X,Y) = \|G_{XL} - G_{YL}\|^2 = \sum_{k,l} \big(G_{XL}(k,l) - G_{YL}(k,l)\big)^2 \tag{3}$$
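Equation (3) compares Gram matrices rather than the feature maps themselves. A toy NumPy sketch (the two feature-map sets are chosen by hand):

```python
import numpy as np

def gram(F):
    """Gram matrix of feature maps stacked as rows (Eq. 2)."""
    return F @ F.T

def style_distance(F_XL, F_YL):
    """Squared L2 distance between Gram matrices (Eq. 3)."""
    return np.sum((gram(F_XL) - gram(F_YL)) ** 2)

# Y's feature maps are X's with the two maps swapped:
# different content, identical Gram matrix, so zero style distance.
F_XL = np.array([[1.0, 0.0], [0.0, 1.0]])
F_YL = np.array([[0.0, 1.0], [1.0, 0.0]])
print(style_distance(F_XL, F_YL))  # 0.0
```

This illustrates why the Gram matrix captures style: it discards the spatial arrangement of the features and keeps only their correlations.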

To minimize both the content distance $D_C(X,C)$ between the variable image $X$ and the content target $C$, and the style distance $D_S(X,S)$ between $X$ and the style target $S$, we compute the gradient at each desired layer and sum:

$$\nabla_{total}(X,S,C) = \sum_{L_C} w_{C,L_C} \cdot \nabla^{L_C}_{content}(X,C) + \sum_{L_S} w_{S,L_S} \cdot \nabla^{L_S}_{style}(X,S) \tag{4}$$

 

Here $L_C$ and $L_S$ are the desired content and style layers, and $w_{C,L_C}$ and $w_{S,L_S}$ are their associated weights. Gradient descent then updates $X$ as:

$$X \leftarrow X - \alpha \, \nabla_{total}(X,S,C) \tag{5}$$
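The update rule (5) is ordinary gradient descent on the pixels of $X$ themselves. As a sanity check, here is the same update applied to a toy objective $\|X-C\|^2 + \|X-S\|^2$ (the vectors, step size, and iteration count are made up for illustration); its minimizer is the midpoint $(C+S)/2$:

```python
import numpy as np

# Toy version of X <- X - alpha * grad: minimize
# ||X - C||^2 + ||X - S||^2, with gradient 2(X - C) + 2(X - S).
C = np.array([0.0, 0.0])
S = np.array([2.0, 4.0])
X = np.random.default_rng(1).standard_normal(2)
alpha = 0.1

for _ in range(200):
    grad = 2 * (X - C) + 2 * (X - S)
    X = X - alpha * grad

print(np.round(X, 3))  # converges to the midpoint [1. 2.]
```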

 

An implementation of image stylization based on TensorFlow:

# Import libraries
import numpy as np
import tensorflow as tf
import scipy.io as sio
from PIL import Image
import matplotlib.pyplot as plt

# Parameters
# The style weight and content weight control whether the result
# leans toward the style or toward the content
STYLE_WEIGHT=1.5
CONTENT_WEIGHT=1
# More style layers capture more style features; deeper content
# layers give more abstract features
STYLE_LAYERS=['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1']
CONTENT_LAYERS=['relu4_2', 'relu5_2']
VGG_PATH = 'imagenet-vgg-verydeep-19.mat'
# VGG model structure
VGG_LAYERS=(
        'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4','pool3',
        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4', 'pool5'
    )
POOL_TYPE='max'

Defining the network:

# Define the VGG-19 network
def net_vgg19(input_image,layers,vgg_path,pool_type='max'):
    # Load the pre-trained weights from the MatConvNet .mat file;
    # its layer order matches VGG_LAYERS, so index i lines up with name
    weights=sio.loadmat(vgg_path)['layers'][0]
    net=input_image
    network={}
    for i,name in enumerate(layers):
        layer_type=name[:4]
        if layer_type=='conv':
            kernels,bias=weights[i][0][0][0][0]
            # MatConvNet stores kernels as (width, height, in, out);
            # TensorFlow expects (height, width, in, out)
            kernels=np.transpose(kernels,(1,0,2,3))
            conv=tf.nn.conv2d(net,tf.constant(kernels),strides=(1,1,1,1),padding='SAME',name=name)
            net=tf.nn.bias_add(conv,bias.reshape(-1))
            net=tf.nn.relu(net)
        elif layer_type=='pool':
            if pool_type == 'avg':
                net=tf.nn.avg_pool(net, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),padding='SAME')
            else:
                net=tf.nn.max_pool(net, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),padding='SAME')

        network[name]=net

    return network

Defining the loss:

# Define the loss function
def loss_function(style_image,content_image,target_image):
    style_features=net_vgg19([style_image],VGG_LAYERS,VGG_PATH,POOL_TYPE)
    content_features=net_vgg19([content_image],VGG_LAYERS,VGG_PATH,POOL_TYPE)
    target_features=net_vgg19([target_image],VGG_LAYERS,VGG_PATH,POOL_TYPE)

    loss=0.0

    for layer in CONTENT_LAYERS:
        _,height,width,channel=map(lambda i:i.value,content_features[layer].get_shape())
        content_size=height*width*channel
        loss_content=tf.nn.l2_loss(target_features[layer]-content_features[layer])/content_size
        loss+=CONTENT_WEIGHT*loss_content

    for layer in STYLE_LAYERS:
        target_feature=target_features[layer]
        style_feature=style_features[layer]

        _,height,width,channel=map(lambda i:i.value,target_feature.get_shape())

        style_size=height*width*channel
        target_feature=tf.reshape(target_feature,(-1,channel))
        target_gram=tf.matmul(tf.transpose(target_feature),target_feature)/style_size

        style_feature=tf.reshape(style_feature,(-1,channel))
        style_gram=tf.matmul(tf.transpose(style_feature),style_feature)/style_size

        loss_style=tf.nn.l2_loss(target_gram-style_gram)/style_size

        loss+=STYLE_WEIGHT*loss_style

    return loss

Training:

# Define the stylize function and run the training
def stylize(style_image,content_image,learning_rate=0.1,epochs=100):

    target = tf.Variable(tf.random_normal(content_image.shape),dtype=tf.float32)
    style_input = tf.constant(style_image,dtype=tf.float32)
    content_input = tf.constant(content_image, dtype=tf.float32)

    cost=loss_function(style_input,content_input,target)
    # Define the optimizer
    train=tf.train.AdamOptimizer(learning_rate).minimize(cost)
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        tf.global_variables_initializer().run()
        for i in range(epochs):
            _,loss,target_img=sess.run([train,cost,target])

            if(i+1)%100==0:
                print('Iteration: %d, loss: %.8f'%(i+1,loss))
                image=np.clip(target_img+128,0,255).astype(np.uint8)
                img=Image.fromarray(image)
                plt.imshow(img)
                plt.axis('on')
                plt.title('Image')
                plt.show()

Loading the data:

style_image=Image.open('3-style.jpg')
style_image=np.array(style_image).astype(np.float32)-128.0
content_image=Image.open('shan2.jpg')
content_image=np.array(content_image).astype(np.float32)-128.0
stylize(style_image,content_image,0.2,1000)

This example runs 1000 iterations and displays the stylized result every 100 iterations, so you can watch the artistic effect evolve:
(Stylized result images, one per 100 iterations.)