MXNet, despite being a powerful machine learning framework, has long lacked a visualization tool comparable to TensorFlow's TensorBoard. The DMLC community recently extracted part of TensorBoard's code into mxboard, a logging tool adapted to MXNet, so that logs written from MXNet can be visualized in TensorBoard.
MXNet's built-in visualization
Before getting to mxboard, it is worth looking at how MXNet already visualizes networks: MXNet integrates part of graphviz's functionality, so the full topology of a symbol-typed graph can be drawn.
Example 1:
import mxnet as mx

# Define the network (mxnet symbol API)
input_symbol = mx.symbol.Variable('input_data')
fc1 = mx.symbol.FullyConnected(data=input_symbol, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(data=fc1, name='relu1', act_type='relu')
fc2 = mx.symbol.FullyConnected(data=act1, name='fc2', num_hidden=64)
act2 = mx.symbol.Activation(data=fc2, name='relu2', act_type='relu')
net = mx.symbol.FullyConnected(data=act2, name='fc3', num_hidden=10)

# Plot the network
mx.viz.plot_network(net)
Example 2:
import mxnet as mx
from mxnet.gluon import nn

# Define the network (gluon API)
net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(128, activation='relu'))
    net.add(nn.Dense(64, activation='relu'))
    net.add(nn.Dense(10))

# A graph built with the gluon API must first be bound to a symbol before
# plotting; note that this step is not needed for training
input_symbol = mx.symbol.Variable('input_data')
net = net(input_symbol)

# Plot the network
mx.viz.plot_network(net)
In a Jupyter notebook this renders the network's structure diagram inline.
Things to note:
- The drawing relies on the graphviz library, so make sure the graphviz Python package is installed; it usually ships with mxnet
- The graphviz binary tools must also be installed, with their bin directory added to the system PATH; for details see: the PATH issue
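A quick way to confirm both pieces are in place (a sketch; these are just the standard graphviz commands, nothing project-specific):

```shell
# Confirm the graphviz Python binding is importable
python -c "import graphviz; print(graphviz.__version__)"
# Confirm the graphviz binaries (which do the actual rendering) are on PATH
dot -V
```

If `dot -V` fails while the Python import succeeds, the binary tools are missing from PATH, which is the most common cause of plot_network errors.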
Visualization with mxboard and TensorBoard
TensorFlow's TensorBoard needs little introduction: this powerful tool records various kinds of data produced during model training to files, then renders the resulting charts as web pages, giving a direct, intuitive view of the whole process.
mxboard implements TensorBoard's log-writing side. The basic idea: insert mxboard API calls into the MXNet training loop to record data in multiple forms (scalar, text, histogram, and so on), then open the web page with TensorBoard to view it. Note that mxboard is only a writer; the visualization itself still relies on TensorFlow's TensorBoard tool.
Installation
Install the corresponding Python packages:
pip install mxnet
pip install mxboard
pip install tensorflow
pip install tensorboard
After tensorboard is installed, a tensorboard executable is added to Python's scripts directory.
Usage
Taking MNIST digit recognition as the example, the training loop below logs the cross entropy, image batches, gradients, the network structure, and prediction accuracy, all for visualization.
Code:
import numpy as np
import mxnet as mx
from mxnet import gluon, autograd
from mxnet.gluon import nn
from mxboard import SummaryWriter

# Define the network (gluon API)
net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Dense(128, activation='relu'))
    net.add(nn.Dense(64, activation='relu'))
    net.add(nn.Dense(10))

# Hyperparameters
batch_size = 100
epochs = 10
learning_rate = 0.1
momentum = 0.9

# Load the data
def transformer(data, label):
    data = data.reshape((-1,)).astype(np.float32) / 255
    return data, label

train_data = gluon.data.DataLoader(
    gluon.data.vision.MNIST('./mnist', train=True, transform=transformer),
    batch_size=batch_size, shuffle=True, last_batch='discard')
val_data = gluon.data.DataLoader(
    gluon.data.vision.MNIST('./mnist', train=False, transform=transformer),
    batch_size=batch_size, shuffle=False)

# Evaluation function
def test_accuracy(ctx):
    metric = mx.metric.Accuracy()
    for data, label in val_data:  # uses the global val_data
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        output = net(data)  # uses the global net
        metric.update([label], [output])
    return metric.get()

# Training function
def train(epochs, ctx):
    # Initialize the parameters
    net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
    net.hybridize()  # note: a hybrid graph is needed here, so hybridize() is required
    # Define the trainer
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': learning_rate, 'momentum': momentum})
    metric = mx.metric.Accuracy()
    loss = gluon.loss.SoftmaxCrossEntropyLoss()
    # Collect the parameters so gradients can be logged during training
    params = net.collect_params()
    param_names = params.keys()
    # Create the mxboard writer; flush to disk every 2 seconds
    sw = SummaryWriter(logdir='./logs', flush_secs=2)
    # Global step counter
    global_step = 0
    for epoch in range(epochs):
        # Reset the metric at the start of each epoch
        metric.reset()
        for i, (data, label) in enumerate(train_data):
            # Copy the data to ctx if necessary (e.g. a GPU context)
            data = data.as_in_context(ctx)
            label = label.as_in_context(ctx)
            # Forward pass with gradient recording
            with autograd.record():
                output = net(data)
                L = loss(output, label)
            # Log the cross entropy
            sw.add_scalar(tag='cross_entropy', value=L.mean().asscalar(), global_step=global_step)
            global_step += 1
            # Backward pass
            L.backward()
            # Update the parameters
            trainer.step(data.shape[0])
            metric.update([label], [output])
            # Log the first minibatch of images
            if i == 0:
                sw.add_image('mnist_first_minibatch', data.reshape((batch_size, 1, 28, 28)), epoch)
        # After the first epoch, log the graph structure once
        if epoch == 0:
            sw.add_graph(net)
        # Log the gradient of every parameter, once per epoch
        grads = [p.grad() for p in net.collect_params().values()]
        assert len(grads) == len(param_names)
        for i, name in enumerate(param_names):
            sw.add_histogram(tag=name, values=grads[i], global_step=epoch, bins=1000)
        # Log the training accuracy
        name, train_acc = metric.get()
        print('[Epoch %d] Training: %s=%f' % (epoch, name, train_acc))
        sw.add_scalar(tag='accuracy_curves', value=('train_acc', train_acc), global_step=epoch)
        # Log the validation accuracy
        name, val_acc = test_accuracy(ctx)
        print('[Epoch %d] Validation: %s=%f' % (epoch, name, val_acc))
        sw.add_scalar(tag='accuracy_curves', value=('valid_acc', val_acc), global_step=epoch)
    # Close the mxboard writer
    sw.close()
    # Export the trained graph and parameters
    net.export("mynet", epoch)

# main
ctx = mx.cpu()
train(epochs, ctx)
File layout
After running, the project root contains the mnist data directory and the logs output directory.
Start the TensorBoard server from the project root:
tensorboard --logdir=./logs --host=127.0.0.1 --port=7000
Then open http://localhost:7000 in a browser to view the TensorBoard visualizations.
Results
Console output:
[Epoch 0] Training: accuracy=0.917067
[Epoch 0] Validation: accuracy=0.947300
[Epoch 1] Training: accuracy=0.965583
[Epoch 1] Validation: accuracy=0.966800
[Epoch 2] Training: accuracy=0.975200
[Epoch 2] Validation: accuracy=0.963200
[Epoch 3] Training: accuracy=0.978600
[Epoch 3] Validation: accuracy=0.968400
[Epoch 4] Training: accuracy=0.982917
[Epoch 4] Validation: accuracy=0.972100
[Epoch 5] Training: accuracy=0.983933
[Epoch 5] Validation: accuracy=0.971900
[Epoch 6] Training: accuracy=0.987033
[Epoch 6] Validation: accuracy=0.976100
[Epoch 7] Training: accuracy=0.987850
[Epoch 7] Validation: accuracy=0.977300
[Epoch 8] Training: accuracy=0.988933
[Epoch 8] Validation: accuracy=0.971300
[Epoch 9] Training: accuracy=0.990800
[Epoch 9] Validation: accuracy=0.973400
Log files
Visualizations
- Training and validation accuracy curves, plus the cross entropy
- The first minibatch of training images
- The network topology (this graph is interactive, nicer than the built-in one)
- Parameters during training (weights, biases, gradients)
All of these visualizations can be generated and monitored live while training runs.