NNVC Screen Content Coding with Fused Neural Networks: Neural Network Model Fusion

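Reposted

mob6454cc6dcf7f 2024-01-10 13:10:20

Tags: NNVC screen content coding with fused neural networks, python, tensorflow, deep learning, cost function. Category: neural networks, artificial intelligence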

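Neural Style Transfer

Preface
Main Text

Libraries, Packages, and Templates
Image Preprocessing
Building the Neural Style Transfer Model

The Pretrained VGG19 Model
A Custom Model Based on VGG19

Cost Functions

Content Cost Function
Style Cost Function
Total Cost Function

Training the Model
Results

Summary
Source Code
References

One stroke at a time, a whole world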

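Preface

The convolutional network built here uses the pretrained VGG19 model as the backbone of the neural style transfer model, so the whole training process is reasonably fast even on a CPU.
Environment and libraries: Windows 10 + PyCharm + TensorFlow + Keras + Python + CPU
tensorflow_hub offers a quick way to try neural style transfer, although reaching it may require a VPN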

import tensorflow as tf
import tensorflow_hub as hub

# If the request still times out even with a VPN, open the URL directly,
# download the compressed archive,
# extract it, and pass the path of the extracted folder in place of the URL
hub_module = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/1')
stylized_image = hub_module(tf.constant(content_image), tf.constant(style_image))[0]
# custom function tensor_to_image; implementation given below
tensor_to_image(stylized_image)
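
The helper tensor_to_image is called in this post, but its implementation is never listed. Here is a minimal sketch of what such a helper could look like, assuming the input is a float image with values in [0, 1] and an optional leading batch dimension of size 1. The name tensor_to_uint8_array and the choice to return a uint8 NumPy array are assumptions made for illustration, not the original author's code. A TensorFlow tensor can be passed in as tensor.numpy().

```python
import numpy as np


def tensor_to_uint8_array(tensor):
    """Hypothetical helper: convert a [0, 1] float image to a uint8 array.

    A leading batch dimension of size 1 is removed if present.
    """
    array = np.asarray(tensor, dtype=np.float32)
    if array.ndim > 3:
        if array.shape[0] != 1:
            raise ValueError("expected a batch containing exactly one image")
        array = array[0]
    # Clip first so out-of-range values do not wrap around in uint8.
    array = np.clip(array, 0.0, 1.0) * 255.0
    return np.round(array).astype(np.uint8)


# A batch holding one 1x2 RGB image, with two values outside [0, 1].
batch = np.array([[[[0.0, 0.5, 1.0], [1.2, -0.1, 0.25]]]])
pixels = tensor_to_uint8_array(batch)
print(pixels.shape, pixels.dtype)
```

The resulting array can be handed to plt.imshow directly, or wrapped with PIL's Image.fromarray if a PIL image is needed.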

Main Text

Libraries, Packages, and Templates

import time                         # used to time the training loop
import functools                    # higher-order functions and operations on callable objects

import numpy as np
import pandas as pd

import tensorflow as tf
import keras.preprocessing.image as process_im
from keras.applications import vgg19
from keras.models import Model
from tensorflow.keras import models
from tensorflow.keras import losses
from tensorflow.keras import layers
from tensorflow.keras import backend as k

from PIL import Image               # Python Imaging Library
import matplotlib.pyplot as plt     # plotting library
import IPython.display              # display utilities for the interactive IPython shell

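Image Preprocessing

Calling resize on a PIL Image object scales the image to a given height and width with high-quality resampling (the code below uses tf.image.resize instead)
img_to_array leaves the pixel values unchanged; the only difference is the element type, which is integer before the conversion and floating point after it (the image type that Keras and other machine learning frameworks work with. Keras introduced a special function called img_to_array which accepts an input image and then orders the channels correctly based on the image_data_format setting)
expand_dims adds a dimension to an array; reshape can achieve the same effect
Purpose: load the image; scale it proportionally so that the longest side is 512; return the image as an array with a leading batch dimension
Source: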

# Load an image, scale it so that the longest side is 512, and return it as a batched tensor
def load_img(img_path):
    """
    :param img_path: path to the image file
    :return: float32 tensor of shape (1, height, width, 3) with values in 0-1
    """
    max_dim = 512
    img = tf.io.read_file(img_path)  # raw encoded bytes
    img = tf.image.decode_image(img, channels=3)  # 0-255
    img = tf.image.convert_image_dtype(img, tf.float32)  # 0-1

    shape = tf.cast(tf.shape(img)[:-1], tf.float32)  # (height, width)
    long_dim = tf.reduce_max(shape)
    scale = max_dim / long_dim

    new_shape = tf.cast(shape * scale, tf.int32)

    img = tf.image.resize(img, new_shape)
    img = img[tf.newaxis, :]        # add a batch dimension so the network can process it
    return img
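
The scaling arithmetic in load_img and the equivalence of the different ways to add a batch dimension can be checked in isolation. This is a small NumPy sketch: scaled_shape is a hypothetical helper that mirrors the scale computation above and is not part of the original code.

```python
import numpy as np


def scaled_shape(height, width, max_dim=512):
    """Hypothetical helper: scale (height, width) so the longest side is max_dim."""
    scale = max_dim / max(height, width)
    # int() truncates toward zero, like tf.cast(..., tf.int32) in load_img.
    return int(height * scale), int(width * scale)


# A 1024x768 image is halved, so the 4:3 aspect ratio is preserved.
print(scaled_shape(1024, 768))

# Three equivalent ways to add a leading batch dimension.
image = np.zeros((4, 6, 3), dtype=np.float32)
a = np.expand_dims(image, axis=0)
b = image.reshape((1,) + image.shape)
c = image[np.newaxis, :]
print(a.shape, b.shape, c.shape)
```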

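Building the Neural Style Transfer Model

The Pretrained VGG19 Model

The function signature is shown below; it returns a Keras Model object

keras.applications.vgg19.VGG19(include_top=True,
                               weights='imagenet',
                               input_tensor=None,
                               input_shape=None,
                               pooling=None,
                               classes=1000)

I did not fully understand include_top at first. The documentation says it controls whether the fully connected layers at the top of the network are included, which still left me unsure, so I called model.summary() to find out. The result: it determines whether all of the fully connected layers are removed.
As for pooling, experiments show that when include_top=False it specifies which kind of pooling layer, if any, is appended after the last layer. Note that the last block already ends with a max pooling layer.
Our requirements for the pretrained VGG19 here are: no fully connected layers at the end (include_top=False), pretrained weights (weights='imagenet'), and default values for everything else.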
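
The effect of include_top and pooling can also be checked programmatically rather than by reading model.summary(). This sketch assumes TensorFlow 2.x with tf.keras available, and passes weights=None so that only the architecture is built and no ImageNet weights are downloaded. The variable names headless and pooled are illustrative.

```python
import tensorflow as tf

# weights=None builds the architecture without downloading pretrained weights.
headless = tf.keras.applications.VGG19(include_top=False, weights=None)
pooled = tf.keras.applications.VGG19(include_top=False, weights=None, pooling='avg')

# Without the top, the network ends at the max pooling layer of block 5.
print(headless.layers[-1].name)

# With pooling='avg', one global average pooling layer is appended after it.
print(type(pooled.layers[-1]).__name__)
print(pooled.output_shape)
```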

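A Custom Model Based on VGG19

The input of the whole model is simply the input of VGG19, while the outputs depend on the cost function of the neural style transfer network. We need to extract the content of the content image and the style of the style image (blog). It follows that for the content cost we extract the activations of one layer near the middle of the network, and for the style cost we extract the activations of each layer (here, the first convolution of each block).
It is worth noting that once the custom model is defined, layers that are completely unrelated to the chosen outputs are dropped automatically; this can be verified with model.summary()
For VGG19, the model structure is shown in the figure below:
The layers whose activations need to be extracted are as follows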

content_layers = ['block5_conv2']
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

With the above understood, we can call models.Model(inputs=, outputs=) to build the neural style transfer model we need.

content_layers = ['block5_conv2']
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']
# get the neural transfer model
def get_model(content_layers_names, style_layers_names):
    """
    :param content_layers_names: content layers names list
    :param style_layers_names: style layers name list
    :return: neural transfer model
    """
    vgg = tf.keras.applications.vgg19.VGG19(include_top=False)  # drop the fully connected top layers; ImageNet weights are loaded by default
    vgg.trainable = False  # freeze the parameters

    # get the content layer and style layer
    content_output = [vgg.get_layer(name=layer).output for layer in content_layers_names]
    style_output = [vgg.get_layer(name=layer).output for layer in style_layers_names]
    model_output = style_output + content_output  # list combine

    # get the neural transfer model
    neural_transfer_model = models.Model([vgg.input], model_output)
    return neural_transfer_model
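
Because get_model concatenates the style outputs before the content outputs, later code can split the model's output list by position using the number of style layers. A plain-Python sketch of that bookkeeping, using layer names in place of the actual tensors (output_names, style_part and content_part are names chosen for this illustration):

```python
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']
content_layers = ['block5_conv2']

# Same ordering as model_output in get_model: style outputs, then content outputs.
output_names = style_layers + content_layers
number_style = len(style_layers)

# Slicing at number_style recovers the two groups.
style_part = output_names[:number_style]
content_part = output_names[number_style:]
print(style_part)
print(content_part)
```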

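Cost Functions

Content Cost Function

The content is defined directly as the activations of the content layer, so the content cost is defined as the mean squared error (MSE) between activations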

# get content loss
def content_loss_layer(output_layer, target_layer):
    """
    :param output_layer:
    :param target_layer:
    :return: some layer content loss
    """
    # MSE, all elements
    content_loss = tf.reduce_mean(tf.square(output_layer - target_layer))
    return content_loss
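
To see what this loss measures, the same computation can be run on tiny arrays with NumPy. This is only an illustrative counterpart of content_loss_layer; the name content_loss_np and the toy shapes are assumptions.

```python
import numpy as np


def content_loss_np(output_features, target_features):
    """NumPy counterpart of content_loss_layer: MSE over all elements."""
    diff = np.asarray(output_features) - np.asarray(target_features)
    return float(np.mean(diff ** 2))


# Toy activations of shape (batch, height, width, channels).
target = np.zeros((1, 2, 2, 3))
output = np.full((1, 2, 2, 3), 2.0)

# Every element differs by 2, so the mean squared difference is 2 squared.
loss_different = content_loss_np(output, target)
loss_identical = content_loss_np(target, target)
print(loss_different, loss_identical)
```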

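Style Cost Function

Style is defined as the correlation between the activations of different channels. Each layer has one style matrix (the Gram matrix), and the style cost is defined much like a mean squared error between style matrices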

# get the gram matrix for style cost
def gram_matrix(layer):
    """
    :param layer: some layer of model
    :return: Gram matrix
    """
    channels = int(layer.shape[-1])             # number of channels: size of the last dimension of layer
    vector = tf.reshape(layer, [-1, channels])  # reshape to n*channel
    n = tf.shape(vector)[0]                     # n=nw*nh

    # vector.T*vector
    gram = tf.matmul(vector, vector, transpose_a=True)
    return gram / tf.cast(n, tf.float32)


# get some layer style loss
def style_loss_layer(output_layer, target_layer):
    """
    :param output_layer:
    :param target_layer:
    :return:
    """
    # get the gram matrix of the layer
    gram_output = gram_matrix(output_layer)
    gram_target = gram_matrix(target_layer)

    # use the gram to compute the loss of some layer
    # /(nw*nh*nc)
    style_loss = tf.reduce_mean(tf.square(gram_target - gram_output))
    return style_loss
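
A small NumPy version of the Gram matrix makes the idea concrete. The sketch below mirrors gram_matrix on a toy activation and also shows that rearranging the spatial positions leaves the Gram matrix unchanged, which is why it captures texture-like statistics rather than layout. The name gram_matrix_np and the toy values are assumptions for illustration.

```python
import numpy as np


def gram_matrix_np(features):
    """NumPy counterpart of gram_matrix for a (1, H, W, C) activation."""
    channels = features.shape[-1]
    vectors = features.reshape(-1, channels)        # (H*W, C)
    return vectors.T @ vectors / vectors.shape[0]   # (C, C), averaged over positions


# Toy activation: 2x2 spatial positions, 2 channels.
features = np.array([[[[1.0, 0.0], [0.0, 1.0]],
                      [[1.0, 1.0], [2.0, 0.0]]]])
gram = gram_matrix_np(features)
print(gram)

# Flipping the image spatially only reorders the positions being summed over.
flipped = features[:, ::-1, ::-1, :]
gram_flipped = gram_matrix_np(flipped)
print(np.allclose(gram, gram_flipped))
```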

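Total Cost Function

The total cost is a weighted sum of the content cost and the style cost
Why does get_features use layer[0] when extracting content_feature and style_feature? Answering this requires understanding the structure of each layer's activations: each one has shape (1, height, width, channels), so index 0 selects the single image in the batch. The version of the code shown here keeps the batch dimension rather than indexing it away.
The same question applies to neural_transfer_cost, where only index 0 of the output image's activations is taken
Source: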

# get the style layer output and content layer output
def get_features(neural_transfer_model, content_image, style_image):
    """
    :param neural_transfer_model: model returned by get_model
    :param content_image: content image batch, float32, 0-1
    :param style_image: style image batch, float32, 0-1
    :return: content features and style features, each a list of tensors
    """
    # 0-1 --> 0.-255., the range expected by vgg19.preprocess_input
    content_image = content_image * 255.
    style_image = style_image * 255.

    # preprocess the content image and style image
    content_image = tf.keras.applications.vgg19.preprocess_input(content_image)
    style_image = tf.keras.applications.vgg19.preprocess_input(style_image)

    # input the preprocessed content image and style image
    # get the output of the neural transfer model
    content_output = neural_transfer_model(content_image)
    style_output = neural_transfer_model(style_image)

    # extract the content layer output and style layer output
    # number_style is a module-level variable equal to len(style_layers)
    content_feature = [layer for layer in content_output[number_style:]]
    style_feature = [layer for layer in style_output[:number_style]]

    return content_feature, style_feature


# get the cost of neural transfer model
def neural_transfer_cost(neural_transfer_model, loss_weights, output, target_style_features, target_content_features):
    """

    :param neural_transfer_model:
    :param loss_weights:
    :param output:
    :param target_style_features:
    :param target_content_features:
    :return: model cost
    """
    # initial var
    style_weight, content_weight = loss_weights
    content_loss = 0
    style_loss = 0
    
    # get output image features
    # 0-1 --> 0.-255.
    output = output * 255.
    output = tf.keras.applications.vgg19.preprocess_input(output)
    output_features = neural_transfer_model(output)
    output_style_features = output_features[:number_style]
    output_content_features = output_features[number_style:]

    # get the style loss
    for i, j in zip(target_style_features, output_style_features):
        style_loss += style_loss_layer(i, j)
    style_loss *= 1. / number_style

    # get the content loss
    for i, j in zip(target_content_features, output_content_features):
        content_loss += content_loss_layer(i, j)
    content_loss *= 1. / number_content

    # get the total cost of neural transfer model
    total_cost = content_weight * content_loss + style_weight * style_loss

    return total_cost
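
Written out, the quantities computed by the code in this section are as follows. The symbols are introduced here for illustration and are not the author's notation: $F^{l}$ is the activation of layer $l$ for the target image reshaped to $n_l \times c_l$ (positions by channels), $\hat{F}^{l}$ is the same for the generated image, $G^{l}$ and $\hat{G}^{l}$ are the corresponding Gram matrices, $L_c$ and $L_s$ are the sets of content and style layers, and $w_c$, $w_s$ stand for content_weight and style_weight.

```latex
\begin{aligned}
J_{\text{content}}^{l} &= \frac{1}{n_l c_l} \sum_{i=1}^{n_l} \sum_{k=1}^{c_l} \left( \hat{F}^{l}_{ik} - F^{l}_{ik} \right)^2 \\
G^{l} &= \frac{1}{n_l} \left( F^{l} \right)^{\top} F^{l} \\
J_{\text{style}}^{l} &= \frac{1}{c_l^{2}} \sum_{j=1}^{c_l} \sum_{k=1}^{c_l} \left( \hat{G}^{l}_{jk} - G^{l}_{jk} \right)^2 \\
J &= w_c \cdot \frac{1}{|L_c|} \sum_{l \in L_c} J_{\text{content}}^{l} + w_s \cdot \frac{1}{|L_s|} \sum_{l \in L_s} J_{\text{style}}^{l}
\end{aligned}
```

The normalizations follow the code as written (tf.reduce_mean over all entries, and a division by the number of layers), which may differ from the constants used in other presentations of neural style transfer.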

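Training the Model

Pass the arguments as a dictionary, unpacked with ** at the call site
with tf.GradientTape() as tape: records the computation inside the context manager; tape can then be used to compute partial derivatives
tf.clip_by_value limits the values of a tensor to a given range
var.numpy() returns the current value of an eager tensor or variable as a NumPy array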
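
The three TensorFlow pieces listed above can be tried on a two-element variable before reading the full training loop. This sketch assumes TensorFlow 2.x in eager mode. To keep the numbers easy to follow it applies a plain gradient step with assign_sub instead of the Adam optimizer used in the post, and the loss is a toy sum of squares.

```python
import tensorflow as tf

x = tf.Variable([2.0, -3.0])

# Operations on a tf.Variable inside the context are recorded automatically.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(x))   # d(loss)/dx = 2 * x

grad = tape.gradient(loss, x)

# One plain gradient descent step with step size 0.25.
x.assign_sub(0.25 * grad)

# Keep the values inside [0, 1], as the training loop does for the image.
x.assign(tf.clip_by_value(x, 0.0, 1.0))

# .numpy() hands back the current value as a NumPy array.
values = x.numpy()
print(grad.numpy(), values)
```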

def run_style_transfer(content_image, style_image, epochs=500, content_weight=1e3, style_weight=1e-2):
    """

    :param content_image:
    :param style_image:
    :param epochs:
    :param content_weight:
    :param style_weight:
    :return:
    """
    # get the neural style model
    neural_transfer_model = get_model(content_layers_names=content_layers, style_layers_names=style_layers)

    # get the content layer and style layer features of content image and style image
    target_content_feature, target_style_feature = get_features(neural_transfer_model, content_image, style_image)

    # output image: start from the content image and keep it in the 0-1 range;
    # neural_transfer_cost rescales it to 0-255 and applies preprocess_input itself
    output = tf.Variable(content_image, dtype=tf.float32)

    # loss weights
    loss_weights = (style_weight, content_weight)

    dictionary = {'neural_transfer_model': neural_transfer_model,
                  'loss_weights': loss_weights,
                  'output': output,
                  'target_style_features': target_style_feature,
                  'target_content_features': target_content_feature}

    # get the optimizer
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

    start_time = time.time()
    start_50_epochs = time.time()

    for i in range(epochs):
        with tf.GradientTape() as tape:
            model_cost = neural_transfer_cost(**dictionary)
        grad = tape.gradient(model_cost, output)
        optimizer.apply_gradients([(grad, output)])
        clipped = tf.clip_by_value(output, 0, 1)
        output.assign(clipped)

        # for visualization
        if i % 50 == 0:
            end_50_epochs = time.time()
            print('Epoch:{}, duration:{}s'.format(i, end_50_epochs-start_50_epochs))
            print('Total loss: {:.4e}, '.format(model_cost))
            print('----------------------------------------------')
            start_50_epochs = time.time()

            plt.imshow(tensor_to_image(output))     # 0-1, (1 512 512 3) tf.Variable float32
            plt.show()

    IPython.display.clear_output(wait=True)

    end_time = time.time()
    print(str(end_time - start_time) + 's')
    return output
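
The call neural_transfer_cost(**dictionary) relies on Python's keyword unpacking: each key of the dictionary must match a parameter name of the function. A standalone sketch of the same pattern, with a hypothetical weighted_sum function standing in for the real cost function:

```python
def weighted_sum(content_loss, style_loss, content_weight, style_weight):
    """Hypothetical stand-in for the cost function: a weighted sum of two losses."""
    return content_weight * content_loss + style_weight * style_loss


# The keys must match the parameter names exactly.
arguments = {'content_loss': 2.0,
             'style_loss': 300.0,
             'content_weight': 1e3,
             'style_weight': 1e-2}

# Equivalent to weighted_sum(content_loss=2.0, style_loss=300.0, ...).
total = weighted_sum(**arguments)
print(total)
```

Building the dictionary once outside the loop, as run_style_transfer does, keeps the call inside the loop short. Since the dictionary holds a reference to the tf.Variable output, updates to the variable are seen on every iteration.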

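Results

The first attempt used a photo of the Canton Tower together with Van Gogh's The Starry Night. The result was not very good, and the parameters probably need tuning. I am leaving those images out because the result is rather ugly.
I then switched to a picture of a little yellow dog. The left image is after 10 iterations and the right image after roughly 100; the effect is quite good.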

[Figures 1-6: result images]

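Source:

# Display an image with an optional title
def show_im(img, title=None):
    """
    :param img: image tensor, with or without a leading batch dimension
    :param title: optional plot title
    """
    # drop the batch dimension if present
    image = tf.squeeze(img, axis=0) if len(img.shape) > 3 else img
    plt.imshow(image)
    if title:
        plt.title(title)
    plt.show()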


# get and show content image and style image 
content_path = './input/to.jpg'
style_path = './input/vagao.jpg'
content_image = load_img(content_path)
style_image = load_img(style_path)  # shape (1, height, width, 3) with longest side 512, tf.Tensor, float32, 0-1
show_im(content_image)
show_im(style_image)

# content layers and style layers
content_layers = ['block5_conv2']
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']
number_content = len(content_layers)
number_style = len(style_layers)

# run the style transfer
image_neural_transfer = run_style_transfer(content_image, style_image, epochs=1000)

# show the result
image_neural_transfer = tensor_to_image(image_neural_transfer)
plt.imshow(image_neural_transfer)
plt.show()

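Summary

This exercise covered two things: how to use a pretrained model, and how to build a neural style transfer model
I ran into quite a few problems during the implementation. I kept feeling that the output image was wrong, and suspected that the cause was not the model but a mistake in data type conversion @.@
Update the next day: the output image looked odd (the Canton Tower one). One problem is now solved: I worked out the formats for image input and image display. vgg19.preprocess_input() expects inputs in the range 0.-255., the generated output image is in the range 0.-1., and the cost function takes inputs in the range 0.-1., so this needs particular care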

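Source Code

On reflection, I decided not to post the complete source code here

# none

This article is reposted content. We respect the original author's copyright in the article. If there are errors in the content or any infringement concerns, the original author is welcome to contact us to correct the content or remove the article.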
