Python network analysis, Python network data

Reposted

网络小墨舞风 2023-09-14 16:14:48

Article tags: python, network analysis, python filename batch processing, sed. Article category: Python, backend development

Python Neural Networks, Part 4: Data Reading and Neural Networks

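Data Reading

File reading workflow

Build the filename queue
Read and decode
Batch
Thread operations

Image Data

Image basics

The three elements of an image
Tensor shape

Image feature-value processing
Data format
Case study: reading dog images

Binary Data

Introduction to the CIFAR10 binary dataset
Reading the CIFAR10 binary data

NHWC and NCHW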

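Data Reading

There are three ways to get data into a TensorFlow program:

QueueRunner: a queue-based input pipeline reads the data from files at the beginning of the TensorFlow graph
Feeding: Python code supplies the data each time a step is run
Preloaded data: a constant or variable in the TensorFlow graph holds all of the data (suitable only for small datasets)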

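File reading workflow

Reading is done with multiple threads plus queues:

Stage 1: build the filename queue

Stage 2: read and decode

Stage 3: batch

Note: threads must be started to run these queue operations, so that enqueue and dequeue operations can proceed while the files are being read.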
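The three stages plus their threads can be imitated in plain Python to show how the queues hand data to each other. This is an analogy only and assumes nothing about TensorFlow internals. `run_pipeline`, `read_fn` and the `_DONE` sentinel are hypothetical names. `queue.Queue` stands in for TensorFlow's queues, and one worker thread plays the role of a QueueRunner.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of a stream


def run_pipeline(filenames, read_fn, batch_size, capacity=32):
    """Three-stage sketch: filename queue -> read/decode thread -> batches."""
    # Stage 1: the filename queue.
    filename_q = queue.Queue()
    for name in filenames:
        filename_q.put(name)
    filename_q.put(_DONE)

    # Stage 2: a worker thread reads/decodes one sample per filename and
    # pushes it into a bounded queue (comparable to a queue's `capacity`).
    sample_q = queue.Queue(maxsize=capacity)

    def reader():
        while True:
            name = filename_q.get()
            if name is _DONE:
                sample_q.put(_DONE)
                return
            sample_q.put(read_fn(name))

    worker = threading.Thread(target=reader)
    worker.start()

    # Stage 3: the main thread dequeues samples and groups them into batches.
    batches, current = [], []
    while True:
        sample = sample_q.get()
        if sample is _DONE:
            break
        current.append(sample)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    worker.join()  # reclaim the worker thread
    return batches  # a trailing partial batch is dropped in this sketch


if __name__ == "__main__":
    print(run_pipeline(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], str.upper, 2))
    # [['A.JPG', 'B.JPG'], ['C.JPG', 'D.JPG']]
```

The bounded middle queue is the important design point. A slow consumer makes the reader thread block instead of filling memory.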


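Build the filename queue

Put the filenames of the files to be read into a filename queue.

tf.train.string_input_producer(string_tensor, num_epochs=None, shuffle=True)
string_tensor: a 1-D tensor containing the filenames (including their paths)
num_epochs: how many passes to make over the data; the default None cycles through the filenames indefinitely (if it is set, the epoch counter is a local variable, so run tf.local_variables_initializer() before starting the queue runners)
shuffle: whether to randomly shuffle the filenames within each epoch
return: the filename queue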
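The combined effect of `num_epochs` and `shuffle` can be sketched as a plain Python generator. `filename_producer` is a hypothetical illustration, not a TensorFlow function. Each epoch yields every filename once, reshuffled per epoch, and `num_epochs=None` repeats forever.

```python
import random


def filename_producer(filenames, num_epochs=None, shuffle=True, seed=None):
    """Yield filenames epoch by epoch; num_epochs=None repeats forever."""
    if not filenames:
        raise ValueError("filenames must not be empty")
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = list(filenames)
        if shuffle:
            rng.shuffle(order)  # a fresh random order for every epoch
        yield from order
        epoch += 1


if __name__ == "__main__":
    print(list(filename_producer(["a.jpg", "b.jpg"], num_epochs=2, shuffle=False)))
    # ['a.jpg', 'b.jpg', 'a.jpg', 'b.jpg']
```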

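Read and decode

Read the file contents from the queue, then decode them.

Reading file contents
By default, a reader reads only one sample per call.

tf.TextLineReader
Reads text files such as comma-separated values (CSV) files, one line per read by default
return: a reader instance
tf.WholeFileReader: reads image files (one whole file per read)
return: a reader instance
tf.FixedLengthRecordReader(record_bytes): binary files
Reads binary files in which every record is a fixed number of bytes
record_bytes: integer, the number of bytes read per call (one sample)
return: a reader instance
tf.TFRecordReader: reads TFRecords files
return: a reader instance

All of these readers share the same method, read(file_queue), which returns a tuple of tensors (key: identifies where the record came from, based on the filename; value: the content, by default one sample).
Only one sample is read per call by default. To train on several samples per batch, batch them with tf.train.batch or tf.train.shuffle_batch.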
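The "one fixed-size record per read" behaviour of a fixed-length reader can be sketched with the standard library. `read_fixed_length_records` is a hypothetical helper, and `io.BytesIO` stands in for an opened binary file. The sketch simply stops at the end of the data and ignores any partial tail. It does not claim to reproduce TensorFlow's handling of malformed files.

```python
import io


def read_fixed_length_records(stream, record_bytes):
    """Yield one record of exactly `record_bytes` bytes per step."""
    while True:
        record = stream.read(record_bytes)
        if len(record) < record_bytes:  # end of data (a partial tail is ignored)
            return
        yield record


if __name__ == "__main__":
    fake_file = io.BytesIO(bytes(range(10)))  # stands in for a binary file
    for i, record in enumerate(read_fixed_length_records(fake_file, 3)):
        print(i, list(record))  # 0 [0, 1, 2] / 1 [3, 4, 5] / 2 [6, 7, 8]
```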

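Decoding the contents
Different file types produce different kinds of content, and each kind must be decoded accordingly into a uniform Tensor format.

tf.decode_csv: decodes the contents of text (CSV) files
tf.image.decode_jpeg(contents)
Decodes a JPEG-encoded image into a uint8 tensor
return: a uint8 tensor with 3-D shape [height, width, channels]
tf.image.decode_png(contents)
Decodes a PNG-encoded image into a uint8 tensor
return: a tensor with 3-D shape [height, width, channels]
tf.decode_raw: decodes the contents of binary files
Used together with tf.FixedLengthRecordReader; for example, tf.decode_raw(value, tf.uint8) reinterprets the raw bytes as uint8 values

Decoding stage: the decoders above produce tf.uint8 content. If another type is needed later, convert it with tf.cast().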
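The decode-then-cast idea can be mirrored with Python's `array` module as a rough analogy. Typecode `'B'` is an unsigned byte (uint8), and `'f'` is a C float, normally 32-bit. `decode_raw_uint8` and `cast_to_float32` are hypothetical names chosen to echo `tf.decode_raw(value, tf.uint8)` and `tf.cast(x, tf.float32)`. They are not TensorFlow code.

```python
from array import array


def decode_raw_uint8(record):
    """Reinterpret raw bytes as unsigned 8-bit integers (one value per byte)."""
    return array("B", record)


def cast_to_float32(values):
    """Convert integer values to 32-bit floats ('f' is a C float)."""
    return array("f", list(values))


if __name__ == "__main__":
    raw = decode_raw_uint8(b"\x07\x00\x80\xff")
    print(raw.tolist())                   # [7, 0, 128, 255]
    print(cast_to_float32(raw).tolist())  # [7.0, 0.0, 128.0, 255.0]
```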

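Batching

After decoding, you can fetch one sample at a time. To fetch several samples at once, the samples must be put into a new queue for batching.

tf.train.batch(tensors, batch_size, num_threads=1, capacity=32, name=None)
Reads a batch of the specified size (number of samples)
tensors: a list containing the tensors to be batched
batch_size: the number of samples dequeued per batch
num_threads: the number of threads that enqueue
capacity: integer, the maximum number of elements in the queue
return: tensors
tf.train.shuffle_batch(tensors, batch_size, capacity, min_after_dequeue, ...)
Like tf.train.batch, but the samples are shuffled randomly; min_after_dequeue is the minimum number of elements left in the queue after a dequeue, which controls how well the samples are mixed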
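Why `min_after_dequeue` matters can be seen in a small shuffle-buffer sketch. This is an approximation and not TensorFlow's implementation. `shuffle_batches` is a hypothetical helper. It ignores `capacity`, because its buffer never holds more than `min_after_dequeue + 1` items. Once the input ends it drains the remaining buffer, which is a choice of this sketch.

```python
import random


def shuffle_batches(samples, batch_size, min_after_dequeue, seed=None):
    """Batch samples in random order, keeping at least `min_after_dequeue`
    elements buffered while input is still arriving."""
    rng = random.Random(seed)
    buffer, batch, batches = [], [], []

    def take_one():
        # Remove a random element in O(1): swap it to the end, then pop.
        i = rng.randrange(len(buffer))
        buffer[i], buffer[-1] = buffer[-1], buffer[i]
        batch.append(buffer.pop())
        if len(batch) == batch_size:
            batches.append(list(batch))
            batch.clear()

    for sample in samples:
        buffer.append(sample)
        if len(buffer) > min_after_dequeue:  # dequeue only above the floor
            take_one()
    while buffer:  # input exhausted: drain what is left
        take_one()
    return batches  # a trailing partial batch is dropped


if __name__ == "__main__":
    print(shuffle_batches(range(10), batch_size=3, min_after_dequeue=4, seed=0))
```

A larger `min_after_dequeue` means each dequeued sample is drawn from a bigger pool, so the result is better mixed. The cost is more buffered samples and a slower start.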

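Thread operations

All of the queues used above are driven by tf.train.QueueRunner objects.
Each QueueRunner is responsible for one stage. The tf.train.start_queue_runners function asks every QueueRunner in the graph to start the threads that run its queue operations. (This must be done inside a session.)

tf.train.start_queue_runners(sess=None, coord=None)
Collects all queue runners in the graph and starts their threads
sess: the session to run in
coord: the thread coordinator
return: all of the started threads
tf.train.Coordinator()
Thread coordinator that manages and coordinates the threads
request_stop(): request that the threads stop
should_stop(): check whether a stop has been requested
join(threads=None, stop_grace_period_secs=120): wait for the threads to terminate
return: a coordinator instance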
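The request_stop / should_stop / join protocol can be reproduced with `threading.Event`. `SimpleCoordinator` and `start_workers` are hypothetical names. The sketch only waits in `join`. It does not copy tf.train.Coordinator's extra behaviour, such as reporting exceptions from worker threads.

```python
import threading
import time


class SimpleCoordinator:
    """Minimal stand-in for a thread coordinator, built on threading.Event."""

    def __init__(self):
        self._stop_event = threading.Event()

    def request_stop(self):
        self._stop_event.set()

    def should_stop(self):
        return self._stop_event.is_set()

    def join(self, threads, stop_grace_period_secs=120):
        # Wait for each thread for at most the grace period.
        for t in threads:
            t.join(stop_grace_period_secs)


def start_workers(coord, n_threads, counter, lock):
    """Start threads that keep 'enqueuing' (incrementing a counter) until stopped."""
    def loop():
        while not coord.should_stop():
            with lock:
                counter[0] += 1
            time.sleep(0.001)

    threads = [threading.Thread(target=loop) for _ in range(n_threads)]
    for t in threads:
        t.start()
    return threads


if __name__ == "__main__":
    coord = SimpleCoordinator()
    counter, lock = [0], threading.Lock()
    threads = start_workers(coord, 2, counter, lock)
    time.sleep(0.05)       # the session's work would happen here
    coord.request_stop()   # ask every worker to finish
    coord.join(threads)    # reclaim the threads
    print("work items:", counter[0])
```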

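Image Data

Image basics

Feature extraction:
Text → numbers (2-D array of shape (n_samples, m_features))
Dictionary → numbers (2-D array of shape (n_samples, m_features))
There are two kinds of images: black-and-white (grayscale) images and color images.
The most basic unit of an image is the pixel.

The three elements of an image

Image height, image width, and number of channels
Grayscale image: each pixel is a single number in [0, 255]; shape [height, width, 1]
Color image: each pixel is represented by 3 numbers in [0, 255]; shape [height, width, 3]

Tensor shape

An image can be represented as a 3-D tensor of shape [height, width, channel], where height is the height, width is the width, and channel is the number of channels.

Single image: [height, width, channel]
Multiple images: [batch, height, width, channel], where batch is the number of images in the batch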
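A minimal pure-Python sketch makes these shapes concrete with nested lists. `zeros` and `shape_of` are hypothetical helpers written for illustration. A batch is just one more outer dimension.

```python
def zeros(shape):
    """Build a nested list of zeros with the given shape, e.g. [H, W, C]."""
    if not shape:
        return 0
    return [zeros(shape[1:]) for _ in range(shape[0])]


def shape_of(nested):
    """Read the shape of a rectangular nested list by walking first elements."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


if __name__ == "__main__":
    gray = zeros([4, 5, 1])                        # one grayscale image
    color = zeros([4, 5, 3])                       # one color image
    batch = [zeros([4, 5, 3]) for _ in range(2)]   # [batch, height, width, channel]
    print(shape_of(gray), shape_of(color), shape_of(batch))
    # [4, 5, 1] [4, 5, 3] [2, 4, 5, 3]
```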

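Image feature-value processing

Resize images to a uniform size
For image recognition, every image sample must have the same number of features, so all image tensors must be converted to the same size. In addition, if an image has too many pixels, shrinking it reduces the pixel count and therefore the computational cost of training.

tf.image.resize_images(images, size)
Shrinks or enlarges images (bilinear interpolation by default, which returns float32)
images: image data, either a 4-D tensor of shape [batch, height, width, channels] or a 3-D tensor of shape [height, width, channels]
size: a 1-D int32 tensor of two elements, new_height and new_width, giving the new image size
return: images in the same 4-D or 3-D format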
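To show what resizing does to the pixel grid, here is a nearest-neighbour resize in pure Python. This is deliberately a simpler technique than TensorFlow's default bilinear method. `resize_nearest` is a hypothetical helper. Each output pixel copies the input pixel at `(i * H // new_H, j * W // new_W)`.

```python
def resize_nearest(image, new_height, new_width):
    """Resize an [height][width][channels] nested list by nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [list(image[i * height // new_height][j * width // new_width])
         for j in range(new_width)]
        for i in range(new_height)
    ]


if __name__ == "__main__":
    img = [[[1], [2]],
           [[3], [4]]]                       # 2x2 image, 1 channel
    big = resize_nearest(img, 4, 4)
    print(big[0])                            # [[1], [1], [2], [2]]
    print(resize_nearest(big, 2, 2) == img)  # True
```

Bilinear interpolation blends neighbouring pixels instead of copying one. That is why its output is floating point, while nearest-neighbour output keeps the original values.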

Data format

Storage: uint8 (saves space)
Matrix computation: float32 (improves precision)
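The space trade-off is easy to quantify. The sketch below computes storage size from the element size of Python's `array` typecodes, where `'B'` is uint8 and `'f'` is a C float, assumed here to be 4 bytes as on common platforms. `storage_bytes` is a hypothetical helper. The example shape matches the 100-image, 200×200×3 batch of the dog-image case study.

```python
from array import array


def storage_bytes(shape, typecode):
    """Bytes needed for a dense tensor of `shape` with array typecode
    'B' (uint8) or 'f' (C float, normally float32)."""
    count = 1
    for dim in shape:
        count *= dim
    return array(typecode).itemsize * count


if __name__ == "__main__":
    batch_shape = [100, 200, 200, 3]
    print(storage_bytes(batch_shape, "B"))  # 12000000
    print(storage_bytes(batch_shape, "f"))  # 48000000 where a C float is 4 bytes
```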

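Case study: reading dog images

Build the filename queue
Read and decode
Make the sample shapes and types uniform
Batch

Prepare 100 dog images in a ./dog directory.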


import os
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

def picture_read(file_list):
    """
    Dog image reading example
    :param file_list: list of image file paths
    :return: None
    """
    # 1. Build the filename queue
    file_queue = tf.train.string_input_producer(file_list)

    # 2. Read and decode
    reader = tf.WholeFileReader()
    # key: the filename; value: the raw encoded bytes of one image
    key, value = reader.read(file_queue)
    print("key:\n", key)
    print("value:\n", value)
    # Decode; channels=3 forces RGB output even for grayscale JPEGs
    image = tf.image.decode_jpeg(value, channels=3)
    print("image:\n", image)

    # Unify shape and type: resize to 200x200 (bilinear, returns float32)
    image_resize = tf.image.resize_images(image, [200, 200])
    print("image_resize:\n", image_resize)
    # Fix the static shape; tf.train.batch needs fully defined element shapes
    image_resize.set_shape(shape=[200, 200, 3])
    print("image_resized:\n", image_resize)

    # 3. Batch
    image_batch = tf.train.batch([image_resize], batch_size=100, num_threads=1, capacity=100)
    print("image_batch:\n", image_batch)

    # Open a session
    with tf.Session() as sess:
        # Thread coordinator
        coord = tf.train.Coordinator()
        # Start the queue-runner threads
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        key_new, value_new, image_new, image_resize_new, image_batch_new = sess.run(
            [key, value, image, image_resize, image_batch])
        print("key_new:\n", key_new)
        print("value_new:\n", value_new)
        print("image_new:\n", image_new)
        print("image_resize_new:\n", image_resize_new)
        print("image_batch_new:\n", image_batch_new)

        # Stop and reclaim the threads
        coord.request_stop()
        coord.join(threads)

if __name__ == '__main__':
    # List the filenames in the image directory
    filename = os.listdir("./dog")
    # Join directory and filename into full paths
    file_list = [os.path.join("./dog", file) for file in filename]
    print(file_list)
    picture_read(file_list)


Binary Data

Introduction to the CIFAR10 binary dataset


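Binary version data files
The binary version contains the files data_batch_1.bin, data_batch_2.bin, …, data_batch_5.bin, and test_batch.bin.
Each of these files has the following format; every sample contains feature values and a target value:

<1 × label><3072 × pixel>
…
<1 × label><3072 × pixel>

Every 3073 bytes form one sample: 1 target value + 3072 pixels. The first byte is the label of the first image, a number in the range 0-9; the next 3072 bytes are the image's pixel values. The first 1024 bytes are the red channel values, the next 1024 the green channel values, and the last 1024 the blue channel values. Each 1024-byte block is a 32×32 channel stored row by row.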
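The record layout described above can be checked with a short parser. `parse_cifar10_record` is a hypothetical helper. The demo feeds it a synthetic record rather than real CIFAR10 data.

```python
LABEL_BYTES = 1
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * CHANNELS   # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073


def parse_cifar10_record(record):
    """Split one 3073-byte record into (label, red, green, blue).

    Each colour plane is 1024 bytes: a 32x32 channel stored row by row.
    """
    if len(record) != RECORD_BYTES:
        raise ValueError(f"expected {RECORD_BYTES} bytes, got {len(record)}")
    label = record[0]                # first byte: the label
    pixels = record[LABEL_BYTES:]
    plane = HEIGHT * WIDTH           # 1024 bytes per channel
    red = pixels[0:plane]
    green = pixels[plane:2 * plane]
    blue = pixels[2 * plane:3 * plane]
    return label, red, green, blue


if __name__ == "__main__":
    # Synthetic record: label 7, then constant red, green and blue planes.
    fake = bytes([7]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
    label, r, g, b = parse_cifar10_record(fake)
    print(label, len(r), r[0], g[0], b[0])  # 7 1024 10 20 30
```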

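Reading the CIFAR10 binary data

Build the filename queue
Read the binary data and decode it
Adjust the image shape and data type, then return batches
Start the session and threads and run them
The image part of one sample (3072 bytes = 1024 r + 1024 g + 1024 b):
[[r[32,32]],
[g[32,32]],
[b[32,32]]]
shape=(3,32,32)=(channels,height,width), which must then be transposed into TensorFlow's conventional image layout (height,width,channels)
After the reshape and transpose (plus a cast to float32), the image's shape and type are ready.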
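The reshape-then-transpose step can be reproduced on a tiny 3-channel, 2×2 "image" in pure Python. The bytes are stored one channel plane after another. They must therefore first be reshaped as [channels, height, width], and only then transposed. Reshaping them straight to [height, width, channels] would mix values from different channels into one pixel. `reshape_chw` and `transpose_chw_to_hwc` are hypothetical helpers that mimic `tf.reshape` and `tf.transpose(x, [1, 2, 0])`.

```python
def reshape_chw(flat, channels, height, width):
    """Like reshaping channel-major data to [channels, height, width]."""
    return [[list(flat[(c * height + h) * width:(c * height + h + 1) * width])
             for h in range(height)]
            for c in range(channels)]


def transpose_chw_to_hwc(chw):
    """Like tf.transpose(x, [1, 2, 0]): [C][H][W] -> [H][W][C]."""
    channels, height, width = len(chw), len(chw[0]), len(chw[0][0])
    return [[[chw[c][h][w] for c in range(channels)]
             for w in range(width)]
            for h in range(height)]


if __name__ == "__main__":
    # 2x2 image, 3 channels: 4 red values, then 4 green, then 4 blue.
    flat = list(range(12))
    chw = reshape_chw(flat, 3, 2, 2)
    hwc = transpose_chw_to_hwc(chw)
    print(chw[0])     # [[0, 1], [2, 3]]  (the red plane)
    print(hwc[0][0])  # [0, 4, 8]  (r, g, b of the top-left pixel)
```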

Converting the image from its 1-D byte sequence into 3-D data involves the concepts of NHWC and NCHW.

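NHWC and NCHW

There are two layouts for image data when setting its shape:
With "NHWC", the order is [batch, height, width, channels]
With "NCHW", the order is [batch, channels, height, width]
N is the number of images in the batch, H the number of pixels in the vertical direction, W the number of pixels in the horizontal direction, and C the number of channels.

TensorFlow's default for a single image is [height, width, channels], i.e. NHWC for a batch.
For three RGB channels, the difference between the two formats is shown in the figure below: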

[Figure: NHWC vs. NCHW layout of a 3-channel RGB image]
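The two layouts differ in where element (n, h, w, c) sits in the flat buffer. In NHWC the channel values of one pixel are adjacent. In NCHW each channel is a contiguous H×W plane, which is the layout of a CIFAR10 record. `offset_nhwc` and `offset_nchw` are hypothetical helpers that compute these row-major offsets.

```python
def offset_nhwc(n, h, w, c, H, W, C):
    """Flat index of element (n, h, w, c) in an NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


def offset_nchw(n, h, w, c, H, W, C):
    """Flat index of the same element (n, h, w, c) in an NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


if __name__ == "__main__":
    H, W, C = 2, 2, 3
    # Where the three channel values of pixel (0, 0) of image 0 live:
    print([offset_nhwc(0, 0, 0, c, H, W, C) for c in range(C)])  # [0, 1, 2]
    print([offset_nchw(0, 0, 0, c, H, W, C) for c in range(C)])  # [0, 4, 8]
```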

Binary data files:

[Figure: the binary data files in the cifar-10-batches-bin directory]

import os
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

class Cifar(object):

    def __init__(self):
        # Image dimensions
        self.height = 32
        self.width = 32
        self.channels = 3

        # Byte counts
        self.image_bytes = self.height * self.width * self.channels
        self.label_bytes = 1
        self.all_bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self, file_list):
        # 1. Build the filename queue
        file_queue = tf.train.string_input_producer(file_list)
        # 2. Read and decode
        reader = tf.FixedLengthRecordReader(self.all_bytes)
        # key: identifies the record (based on the filename); value: one sample
        key, value = reader.read(file_queue)
        print("key:\n", key)
        print("value:\n", value)
        # Decode the raw bytes into uint8 values
        decoded = tf.decode_raw(value, tf.uint8)
        print("decoded:\n", decoded)

        # Slice out the target value (label) and the feature values (image)
        label = tf.slice(decoded, [0], [self.label_bytes])
        image = tf.slice(decoded, [self.label_bytes], [self.image_bytes])
        print("label:\n", label)
        print("image:\n", image)

        # The bytes are stored channel by channel, so reshape to [channels, height, width]
        image_reshaped = tf.reshape(image, shape=[self.channels, self.height, self.width])
        print("image_reshaped:\n", image_reshaped)

        # Transpose to TensorFlow's [height, width, channels] order
        image_transposed = tf.transpose(image_reshaped, [1, 2, 0])
        print("image_transposed:\n", image_transposed)

        # Cast the image to float32 for computation
        image_cast = tf.cast(image_transposed, tf.float32)

        # 3. Batch
        label_batch, image_batch = tf.train.batch([label, image_cast], batch_size=100, num_threads=1, capacity=100)
        print("label_batch:\n", label_batch)
        print("image_batch:\n", image_batch)

        # Open a session
        with tf.Session() as sess:
            # Start the queue-runner threads under a coordinator
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)

            key_new, value_new, decoded_new, label_new, image_new, image_reshaped_new, image_transposed_new = sess.run(
                [key, value, decoded, label, image, image_reshaped, image_transposed])
            label_value, image_value = sess.run([label_batch, image_batch])
            print("key_new:\n", key_new)
            print("value_new:\n", value_new)
            print("decoded_new:\n", decoded_new)
            print("label_new:\n", label_new)
            print("image_new:\n", image_new)
            print("image_reshaped_new:\n", image_reshaped_new)
            print("image_transposed_new:\n", image_transposed_new)
            print("label_value:\n", label_value)
            print("image_value:\n", image_value)

            # Stop and reclaim the threads
            coord.request_stop()
            coord.join(threads)
        return None


if __name__ == "__main__":
    file_name = os.listdir("./cifar-10-batches-bin")
    print("file_name:\n", file_name)
    # Build the list of file paths, keeping only the .bin files
    file_list = [os.path.join("./cifar-10-batches-bin", file) for file in file_name if file.endswith(".bin")]
    print("file_list:\n", file_list)

    # Instantiate Cifar and run the pipeline
    cifar = Cifar()
    cifar.read_and_decode(file_list)


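This article is reposted content, and we respect the original author's copyright. If there are content errors or infringement issues, the original author is welcome to contact us to have the content corrected or the article deleted.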
