Analyzing TensorFlow model files

The files saved under the checkpoint_dir directory are structured as follows:

|--checkpoint_dir
|    |--checkpoint
|    |--MyModel.meta
|    |--MyModel.data-00000-of-00001
|    |--MyModel.index

MyModel.meta stores the graph structure. The .meta file is in pb (protocol buffer) format and contains variables, ops, collections, and so on.

The ckpt files are binary files that store the values of all variables, such as weights and biases.

Before TensorFlow 0.11 these were stored in a single .ckpt file. From 0.11 on, they are saved in two files:

MyModel.data-00000-of-00001
MyModel.index
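
If you want to see exactly which variables a checkpoint stores, a quick way (a small sketch, assuming TensorFlow 1.x and the files above) is to open it with tf.train.NewCheckpointReader:

import tensorflow as tf

# open the checkpoint by its prefix (not an individual .data/.index file)
reader = tf.train.NewCheckpointReader('./checkpoint_dir/MyModel')
# map from variable name to shape
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape, reader.get_tensor(name))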

We can also see that there is a checkpoint file under checkpoint_dir. It is a plain-text file that records the most recently saved checkpoint along with a list of the other checkpoint files.

At inference time, you can edit this file to point to whichever model you want to load.
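
For reference, the contents of this checkpoint file typically look something like the following (the exact names depend on what you have saved):

model_checkpoint_path: "MyModel-1000"
all_model_checkpoint_paths: "MyModel-500"
all_model_checkpoint_paths: "MyModel-1000"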


Saving a TensorFlow model

TensorFlow provides the tf.train.Saver class for saving models.

Note that in TensorFlow, variables live inside a Session: their values only exist within a Session, so you must pass the session in when saving a model.

saver = tf.train.Saver()
saver.save(sess,"./checkpoint_dir/MyModel")

A simple example:

import tensorflow as tf

w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, './checkpoint_dir/MyModel')

After running this, the following model files are created under checkpoint_dir:

checkpoint
MyModel.data-00000-of-00001
MyModel.index
MyModel.meta

Additionally, if you save the model after, say, 1000 iterations, you can record the step in the filenames simply by setting the global_step argument:

saver.save(sess, './checkpoint_dir/MyModel',global_step=1000)

The saved model files will then have -1000 appended to their names, like this:

checkpoint
MyModel-1000.data-00000-of-00001
MyModel-1000.index
MyModel-1000.meta

In real training we might save the model every 1000 iterations, but since the graph itself does not change, there is no need to re-save it every time. You can skip writing the graph as follows:

saver.save(sess, './checkpoint_dir/MyModel',global_step=step,write_meta_graph=False)
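
In a training loop, that pattern could look like the following minimal sketch (tf.assign_add here just stands in for a real training op):

import tensorflow as tf

x = tf.Variable(0.0, name='x')
train_op = tf.assign_add(x, 1.0)   # stand-in for a real training step

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # write the graph (.meta) once at the beginning
    saver.save(sess, './checkpoint_dir/MyModel')
    for step in range(1, 3001):
        sess.run(train_op)
        if step % 1000 == 0:
            # afterwards, only the variable values are written
            saver.save(sess, './checkpoint_dir/MyModel', global_step=step, write_meta_graph=False)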

Another practical option: if you want to keep only the 5 most recent checkpoints, but additionally preserve one checkpoint for every 2 hours of training:

tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=2)

Note: by default TensorFlow keeps only the 5 most recent checkpoint files; if you want to keep more, specify it with max_to_keep.

If we pass no arguments to tf.train.Saver, it saves all variables by default.

If you only want to save a subset of the variables rather than all of them, you can specify which variables/collections to save: when creating the tf.train.Saver instance, put the variables to be saved into a list or a dictionary and pass it to the Saver:

import tensorflow as tf
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver([w1,w2])
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, './checkpoint_dir/MyModel',global_step=1000)

Importing a TensorFlow model

First, load the graph:

saver=tf.train.import_meta_graph('./checkpoint_dir/MyModel-1000.meta')

Load the parameters

The graph alone is not much use; more importantly, we need the previously trained model parameters (i.e. the weights, biases, etc.):

import tensorflow as tf
with tf.Session() as sess:
  new_saver = tf.train.import_meta_graph('./checkpoint_dir/MyModel-1000.meta')
  new_saver.restore(sess, tf.train.latest_checkpoint('./checkpoint_dir'))

At this point, w1 and w2 have been loaded into the graph and can be accessed:

import tensorflow as tf
with tf.Session() as sess:    
    saver = tf.train.import_meta_graph('./checkpoint_dir/MyModel-1000.meta')
    saver.restore(sess,tf.train.latest_checkpoint('./checkpoint_dir'))
    print(sess.run('w1:0'))
##Model has been restored. Above statement will print the saved value

Running this prints (the values are random, so yours will differ):

[ 0.51480412 -0.56989086]

Using the restored TensorFlow model

Often we want to reuse an already-trained model, for example for prediction, fine-tuning, or further training.

In those cases we may need to access some intermediate results inside the trained model. This can be done with graph.get_tensor_by_name('w1:0'); note that 'w1:0' is the name of the tensor.

Suppose we have a simple network model, with the following code:

import tensorflow as tf


w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1= tf.Variable(2.0,name="bias") 

# define an op that we will restore later
w3 = tf.add(w1,w2)
w4 = tf.multiply(w3,b1,name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# create a Saver object to save all the variables
saver = tf.train.Saver()

# feed in data and run the op
print(sess.run(w4,feed_dict ={w1:4,w2:8}))
# prints 24.0 ==> (w1+w2)*b1

# now save the model
saver.save(sess, './checkpoint_dir/MyModel',global_step=1000)

Next, we use the graph.get_tensor_by_name() method to work with this saved model.

import tensorflow as tf

sess=tf.Session()
# first load the graph and the parameter variables
saver = tf.train.import_meta_graph('./checkpoint_dir/MyModel-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./checkpoint_dir'))


# access the placeholders and build a feed_dict with new values for them
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict ={w1:13.0,w2:17.0}

# next, access the op you want to run
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")

print(sess.run(op_to_restore,feed_dict))
# prints 60.0 ==> (13+17)*2

Note: when saving a model, only the variable values are saved; the values fed into placeholders are not saved.

If you do not just want to run the trained model, but also want to add some ops, or add some layers and train a new model, the following simple example shows how:

import tensorflow as tf

sess = tf.Session()
# first load the graph and the variables
saver = tf.train.import_meta_graph('./checkpoint_dir/MyModel-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./checkpoint_dir'))

# access the placeholders and build a feed_dict with new values for them
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict = {w1: 13.0, w2: 17.0}

# next, access the op you want to run
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")

# new ops can be added on top of the current graph
add_on_op = tf.multiply(op_to_restore, 2)

print (sess.run(add_on_op, feed_dict))
# prints 120.0 ==> (13+17)*2*2

If you only want to restore part of the graph and then add other ops for fine-tuning, just fetch the ops you need with graph.get_tensor_by_name() and build the new graph on top of them.

Here is a simple example: suppose we want to reuse the graph of a trained VGG network, replace its last layer, and change the number of outputs to 2 so that we can fine-tune on new data.

......
......
saver = tf.train.import_meta_graph('vgg.meta')
# access the graph
graph = tf.get_default_graph() 
 
# access the output tensor used for fine-tuning
fc7= graph.get_tensor_by_name('fc7:0')
 
# if you want the layers below fc7 to stay frozen, stop the gradients as follows
fc7 = tf.stop_gradient(fc7) # It's an identity function
fc7_shape= fc7.get_shape().as_list()

num_outputs = 2
weights = tf.Variable(tf.truncated_normal([fc7_shape[3], num_outputs], stddev=0.05))
biases = tf.Variable(tf.constant(0.05, shape=[num_outputs]))
output = tf.matmul(fc7, weights) + biases
pred = tf.nn.softmax(output)

# Now, you run this with fine-tuning data in sess.run()
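
To actually fine-tune, you would add a loss and an optimizer on top of this; a rough sketch, where labels is a hypothetical placeholder and only the newly added weights and biases are trained:

# hypothetical sketch of the fine-tuning step on the new 2-class head
labels = tf.placeholder(tf.float32, shape=[None, num_outputs])
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=labels))
# gradients into fc7 and below were already stopped with tf.stop_gradient above
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=[weights, biases])

with tf.Session() as sess:
    # restore the pretrained VGG weights into this session first (elided above),
    # then initialize only the newly added variables
    sess.run(tf.variables_initializer([weights, biases]))
    # sess.run([train_op, loss], feed_dict={...})  # feed your fine-tuning batches here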

Handwritten digit recognition example (implemented in TensorFlow)

First we need to download the VGG weight file: https://github.com/tensorflow/models/tree/1af55e018eebce03fb61bba9959a04672536107d/research/slim

Build our VGG16 network structure (the graph that gets saved in the .meta file):

import tensorflow as tf
slim = tf.contrib.slim

def vgg16(inputs):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                        weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        net = slim.max_pool2d(net, [2, 2], scope='pool5')
        return net

tf.contrib.slim helps us build network structures much more quickly.

For example, our VGG16 network above contains runs of consecutive convolutional layers; written out in the conventional way, the code would be very repetitive, whereas slim keeps it concise.

Take net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1') as an example:

the first argument is our input, the second is the number of repetitions, and the third is the layer to repeat; the fourth and fifth arguments are the number of filters and the kernel size of each repeated layer, and the scope keyword specifies the variable scope for this block of layers.
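
In other words, that single slim.repeat call is roughly equivalent to writing the two convolutional layers out by hand (a sketch, assuming slim's default sub-scope naming):

net = slim.conv2d(inputs, 64, [3, 3], scope='conv1/conv1_1')
net = slim.conv2d(net, 64, [3, 3], scope='conv1/conv1_2')

This is also why the pretrained VGG checkpoint contains variable names like vgg_16/conv1/conv1_1/weights.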

How does the network we build pick up the corresponding weights from the weight file?

This is determined entirely by variable names. To see what those names are, you need to look at the model's source code in the slim repository linked above. This is exactly why we specify scope when defining the model: so that our variable names match the names of the corresponding weights in the weight file.
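
A quick way to verify that the names line up is to print both sides: the names stored in the downloaded checkpoint and the names of the variables our graph creates (a small sketch, assuming the checkpoint path pretrain_models/vgg_16.ckpt used later in this post):

import tensorflow as tf
slim = tf.contrib.slim

# names stored in the pretrained checkpoint, e.g. 'vgg_16/conv1/conv1_1/weights'
for name, shape in tf.train.list_variables('pretrain_models/vgg_16.ckpt'):
    print(name, shape)

# names of the variables our own graph defines (run this after building the
# network under the 'vgg_16' scope); they must match the names above
for v in slim.get_trainable_variables():
    print(v.name)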

The main job of slim.arg_scope is to set default hyperparameters (activation_fn and so on) for the target functions (here slim.conv2d and slim.fully_connected).

Since our convolutional layers share many of the same hyperparameters, slim.arg_scope keeps the code compact; if a particular layer should not use the defaults, simply pass the desired values explicitly for that layer.
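
A minimal sketch of that behaviour: the first two layers below pick up the defaults from arg_scope, while the last one overrides activation_fn for itself only.

import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu, padding='SAME'):
    net = slim.conv2d(inputs, 64, [3, 3], scope='demo1')    # relu, SAME from arg_scope
    net = slim.conv2d(net, 64, [3, 3], scope='demo2')       # relu, SAME from arg_scope
    logits = slim.conv2d(net, 10, [1, 1], activation_fn=None, scope='demo3')  # explicit override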

def extract_features(inputs):
    input_image = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='input_image')
    with tf.variable_scope('vgg_16', reuse=tf.AUTO_REUSE):
        model = vgg16(input_image)
    # build an op that restores the pretrained weights into our trainable variables
    variable_restore_op = slim.assign_from_checkpoint_fn("pretrain_models/vgg_16.ckpt", slim.get_trainable_variables(), ignore_missing_vars=True)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        variable_restore_op(sess)
        # run the convolutional layers on the input data and return the extracted features
        feature_x = sess.run(model, feed_dict={input_image: inputs})

    return feature_x

This is where we load the pretrained model's weights.

The first argument to slim.assign_from_checkpoint_fn is the path to the weight file; the second is the set of variables that the weights from the file should be restored into; the third, ignore_missing_vars, set to True, means that variables which exist in our model definition but not in the pretrained checkpoint are simply ignored.

We then run the network in the session on the dataset passed in and return the features extracted by the convolutional layers.

Once we have the features, we build our own model on top of them, following the standard TensorFlow training workflow: we simply use the feature-extracted dataset as the model's input and train.

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])

    return X, Y

def forward_propagation(inputs):
    a = tf.contrib.layers.flatten(inputs)
    #fc1
    a1 = tf.contrib.layers.fully_connected(a, 4608)
    ad1 = tf.nn.dropout(a1,0.5)
    #fc2
    a2 = tf.contrib.layers.fully_connected(ad1, 1000)
    ad2 = tf.nn.dropout(a2, 0.5)
    #outputs
    z3 = tf.contrib.layers.fully_connected(ad2, 10, activation_fn=None)
    return z3

def compute_cost(Z3, Y):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3,labels=Y))
    return cost

def my_model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,num_epochs=30,minibatch_size=64,print_cost=True,isPlot=True):

    tf.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3

    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []
    X, Y = create_placeholders(n_H0,n_W0,n_C0,n_y)
    Z3 = forward_propagation(X)
    cost = compute_cost(Z3, Y)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    total_time = 0

    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(1,num_epochs+1):
            # start time of this epoch
            start_time = time.time()
            # accumulated cost over all minibatches in this epoch
            minibatches_cost = 0

            num_minibatches = int(m / minibatch_size)
            # shuffle the data and split it into minibatches of size minibatch_size
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch
                _, temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y:minibatch_Y})
                minibatches_cost += temp_cost/num_minibatches

            end_time = time.time()
            total_time += (end_time - start_time)
            if print_cost:
                if epoch % 10 == 0:
                    print("Epoch " + str(epoch) + ", cost: " + str(minibatches_cost) + "; time for this epoch: " + str(
                        end_time - start_time) + " s; total time over the last 10 epochs: " + str(total_time))
                    total_time = 0

            if epoch % 10 == 0:
                costs.append(minibatches_cost)

        saver.save(sess, "model_tf/my-model")
        if isPlot:
            plt.plot(np.squeeze(costs))
            plt.ylabel("cost")
            plt.xlabel("iterations (per tens)")
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()

        predict_op = tf.argmax(Z3, 1)
        corrent_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        accuracy = tf.reduce_mean(tf.cast(corrent_prediction, "float"))

        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})

        print("训练集准确度:" + str(train_accuracy))
        print("测试及准确度:" + str(test_accuracy))

    return (train_accuracy, test_accuracy)

Training specific layers while freezing the others

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import time
slim = tf.contrib.slim

import os
import h5py
import math
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input
import cv2
import random

def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)]
    return Y

def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):

    m = X.shape[0]                  # number of training examples
    mini_batches = []
    np.random.seed(seed)
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation,:,:,:]
    shuffled_Y = Y[permutation,:]
    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(m/mini_batch_size) # number of mini batches of size mini_batch_size in your partitionning
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:,:,:]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m,:,:,:]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    return mini_batches

def vgg16(inputs):

    with slim.arg_scope([slim.conv2d, slim.fully_connected],activation_fn=tf.nn.relu,weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        net = slim.max_pool2d(net, [2, 2], scope='pool5')
        # net = tf.contrib.layers.flatten(net)
        # net = slim.fully_connected(net, 4096, scope='fc6')
        net = slim.conv2d(net, 4096, [7, 7], padding="VALID", scope='fc6')
        net = slim.dropout(net, 0.5, scope='dropout6')
        # net = slim.fully_connected(net, 4096, scope='fc7')
        net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
        net = slim.dropout(net, 0.5, scope='dropout7')
        net = slim.conv2d(net, 10,[1, 1], activation_fn=None, scope='fc8')
        net = tf.squeeze(net, [1, 2])
        return net

def load_weights(input_image):
    # input_image = tf.placeholder(tf.float32, shape=[None, 96, 96, 3], name='input_image')
    with tf.variable_scope('vgg_16', reuse=tf.AUTO_REUSE):
        net = vgg16(input_image)

    variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
    length = len(variables)
    parameters = []
    for i in range(6):
        parameters.append(variables[length - 6 + i])

    variable_restore_op = slim.assign_from_checkpoint_fn("pretrain_models/vgg_16.ckpt",variables[:length-2],ignore_missing_vars=True)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        variable_restore_op(sess)
    print(parameters)
    return net, parameters

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    X = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
    Y = tf.placeholder(tf.float32, [None, n_y])

    return X, Y

def compute_cost(Z3, Y):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3,labels=Y))
    return cost

def my_model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,num_epochs=30,minibatch_size=64,print_cost=True,isPlot=True):

    tf.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3

    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []
    X, Y = create_placeholders(n_H0,n_W0,n_C0,n_y)
    Z, parameters = load_weights(X)
    cost = compute_cost(Z, Y)
    # only the parameters in var_list are trained
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=parameters )
    print(parameters)
    print("begin training...")
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    total_time = 0

    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(1,num_epochs+1):
            # start time of this epoch
            start_time = time.time()
            # accumulated cost over all minibatches in this epoch
            minibatches_cost = 0

            num_minibatches = int(m / minibatch_size)
            # shuffle the data and split it into minibatches of size minibatch_size
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                
                (minibatch_X, minibatch_Y) = minibatch
                _, temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y:minibatch_Y})
                minibatches_cost += temp_cost/num_minibatches

            end_time = time.time()
            total_time += (end_time - start_time)
            if print_cost:
                if epoch % 5 == 0:
                    print("Epoch " + str(epoch) + ", cost: " + str(minibatches_cost) + "; time for this epoch: " + str(
                        end_time - start_time) + " s; total time over the last 5 epochs: " + str(total_time))
                    total_time = 0

            if epoch % 5 == 0:
                costs.append(minibatches_cost)

        saver.save(sess, "model_tf/my-model")
        if isPlot:
            plt.plot(np.squeeze(costs))
            plt.ylabel("cost")
            plt.xlabel("iterations (per tens)")
            plt.title("Learning rate =" + str(learning_rate))
            plt.show()

        predict_op = tf.argmax(Z, 1)
        corrent_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        accuracy = tf.reduce_mean(tf.cast(corrent_prediction, "float"))

        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})

        print("训练集准确度:" + str(train_accuracy))
        print("测试及准确度:" + str(test_accuracy))

    return (train_accuracy, test_accuracy)

from keras.datasets import mnist
(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x = train_x[:1000]
train_y = train_y[:1000]
test_x = test_x[:500]
test_y = test_y[:500]

train_x = np.array(train_x, dtype="uint8")
test_x = np.array(test_x, dtype="uint8")

# convert the single-channel grayscale images into 3-channel RGB images
train_x = [cv2.cvtColor(cv2.resize(x, (224, 224)), cv2.COLOR_GRAY2BGR) for x in train_x]
train_x = np.concatenate([arr[np.newaxis] for arr in train_x]).astype('float32')

test_x = [cv2.cvtColor(cv2.resize(x, (224, 224)), cv2.COLOR_GRAY2BGR) for x in test_x]
test_x = np.concatenate([arr[np.newaxis] for arr in test_x]).astype('float32')

train_y = train_y.reshape(len(train_y),1).astype(int)
test_y = test_y.reshape(len(test_y),1).astype(int)
train_y = convert_to_one_hot(train_y,10)
test_y = convert_to_one_hot(test_y, 10)

print("训练集:")
print(train_x.shape)
print(train_y.shape)
print("测试集:")
print(test_x.shape)
print(test_y.shape)

# train the model
my_model(train_x, train_y, test_x, test_y)

This time, the VGG16 network we build needs to include the fully connected layers,

and we change the number of neurons in the output layer to the number of classes we need.

When restoring weights with slim.assign_from_checkpoint_fn, we restore every layer except the output layer: since we have changed the number of neurons in the output layer, the variables to restore are given as variables[:length-2]. The last layer consists of two variables, weights and biases, which is why we subtract 2.

As for freezing some of the layers: when creating the optimizer, we simply pass the variables of the layers we want to train as the var_list argument. In this example the layers to train are the fully connected layers at the end, which is why the load_weights function returns their variables (an alternative, scope-based way to collect them is sketched after the optimizer line below).

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost,var_list=parameters)
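
An equivalent way to collect the variables to train, instead of slicing the variable list by position as in load_weights, is to filter the trainable variables by scope name (a sketch, assuming the fc6/fc7/fc8 scopes defined in the network above):

# collect only the fully connected layers' variables by their scope names
train_vars = []
for scope in ['vgg_16/fc6', 'vgg_16/fc7', 'vgg_16/fc8']:
    train_vars += tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=scope)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=train_vars)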
