Facial Expression Recognition with PyTorch: Testing Expression Recognition Capability

Reposted

mob6454cc7d4112 2023-12-01 12:40:19


Facial expression recognition (FER for short)
It is widely held that humans have six basic emotions: anger, happiness, sadness, surprise, disgust, and fear. Most expression recognition systems are built on these six emotions and their extensions.

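The main difficulties are:

1. Granularity of expressions: whether the faintest display of each emotion should be classified at all, and where the class boundaries lie. The product side has to supply the evaluation rules.

2. Diversity of expression categories: whether further emotion categories need to be added. In some scenarios the six emotions fall far short of expressing real human feeling.

For that reason, beyond basic expression recognition there is also research in narrower areas such as fine-grained, compound, and non-basic expression recognition.

3. Lack of robustness.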


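The Fer2013 facial expression dataset consists of 35,887 face images: 28,709 training images (Training), plus 3,589 public validation images (PublicTest) and 3,589 private validation images (PrivateTest). Each image is a fixed-size 48×48 grayscale image. There are 7 expressions, mapped to the numeric labels 0-6: 0 anger; 1 disgust; 2 fear; 3 happy; 4 sad; 5 surprised; 6 normal (neutral).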

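The dataset does not ship image files. Instead, the expression label, the pixel data, and the usage are stored in a csv file. The file opens with a header row that describes each column: the first column is the expression label, the second column holds the raw pixel data, and the last column is the usage (the split the row belongs to).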
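
To make the row layout concrete, here is a minimal sketch that parses one row into a label, a 48×48 pixel grid, and a usage string. The helper name `parse_fer_row`, the header names in the sample, and the sample row itself are assumptions made up for illustration; the sample is synthetic, not real FER2013 data.

```python
import csv
import io

def parse_fer_row(row):
    """Parse one CSV row into (label, 48x48 pixel grid, usage)."""
    label = int(row[0])
    pixels = [int(p) for p in row[1].split(" ")]
    if len(pixels) != 48 * 48:
        raise ValueError("expected 2304 pixel values, got {}".format(len(pixels)))
    # rebuild the flat pixel list as 48 rows of 48 pixels each
    image = [pixels[r * 48:(r + 1) * 48] for r in range(48)]
    return label, image, row[2]

# synthetic one-row file (hypothetical header names, made-up pixel values)
sample = "emotion,pixels,Usage\n3," + " ".join(["128"] * 2304) + ",Training\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
label, image, usage = parse_fer_row(next(reader))
print(label, len(image), len(image[0]), usage)
```

The conversion script later in this post does the same thing with NumPy, which is the better choice for the full file.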

https://www.kaggle.com/deadskull7/fer2013


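Our goal now is to convert this .csv file into HDF5 format so that we can train a convolutional neural network on it more easily. After unpacking fer2013.tar.gz, I set up the following directory structure for the files.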

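FER13 has seven classes in total: anger, disgust, fear, happy, sad, surprise, and neutral. However, the "disgust" class is severely imbalanced relative to the others: it has only 113 image samples, while every other class has more than 1,000. It is therefore suggested to merge "disgust" and "anger" into a single class (the two emotions look similar), which turns FER13 into a 6-class problem.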
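
The merge boils down to a small label remapping. The sketch below mirrors the logic used in the conversion script later in this post; the function name and the two name lists are assumptions added for illustration.

```python
# label names for the 7-class and the merged 6-class problem (illustrative)
SEVEN_CLASS = ["anger", "disgust", "fear", "happy", "sad", "surprised", "neutral"]
SIX_CLASS = ["anger", "fear", "happy", "sad", "surprised", "neutral"]

def merge_disgust_into_anger(label):
    """Map a 7-class label (0-6) to a 6-class label (0-5)."""
    # fold "disgust" (1) into "anger" (0)
    if label == 1:
        label = 0
    # shift the remaining labels down so they stay contiguous
    if label > 0:
        label -= 1
    return label

for old in range(7):
    new = merge_disgust_into_anger(old)
    print(old, SEVEN_CLASS[old], "->", new, SIX_CLASS[new])
```

Keeping the labels contiguous is not strictly required, but it lets the 6-class labels index straight into a 6-entry name list.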

Since we will convert fer2013.csv into a set of HDF5 datasets for training, validation, and testing, we need to define the paths to these output HDF5 files.


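The network we will implement to recognize emotions and facial expressions is inspired by the VGG family of networks:

1. The CONV layers in the network will only be 3×3.
2. As the network gets deeper, we double the number of filters learned by each CONV layer.

To help training, we apply some prior knowledge gained from VGG and ImageNet experiments:

1. We initialize the CONV and FC layers with the MSRA method (He et al.), which should let the network learn faster.
2. Since ELU and PReLU have been shown to improve classification accuracy, our experiments start from ELU rather than ReLU.
3. After every CONV layer we apply the activation and then batch normalization. The table summarizing the network, named EmotionVGGNet, leaves these layers out to save space.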
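
The two ingredients named above, He initialization and ELU, are easy to work through by hand. The following is a rough sketch in plain Python rather than the Keras implementation: it assumes the usual He-normal standard deviation sqrt(2 / fan_in) and the standard ELU definition with alpha = 1. The three-stage filter progression is also only an assumption used to illustrate the doubling.

```python
import math

def he_normal_std(fan_in):
    """Standard deviation used by He (MSRA) normal initialization."""
    return math.sqrt(2.0 / fan_in)

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# fan-in of a 3x3 CONV layer that sees 32 input channels
fan_in = 3 * 3 * 32
print("He std for fan_in={}: {:.4f}".format(fan_in, he_normal_std(fan_in)))

# unlike ReLU, ELU keeps a small negative output (bounded below by -alpha)
print(elu(2.0), elu(-2.0))

# filter counts when each stage doubles the previous one (assumed 3 stages)
print([32 * 2 ** stage for stage in range(3)])
```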

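The first experiment starts from the SGD optimizer with a base learning rate of 1e-2, a momentum term of 0.9, and Nesterov acceleration. The (default) Xavier/Glorot method initializes the weights of the CONV and FC layers. The only data augmentation is horizontal flipping; no other augmentation (rotation, zoom, and so on) is applied.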
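
For readers who have not seen the update rule written out, here is a minimal sketch of one SGD step with Nesterov momentum, using the learning rate and momentum quoted above. It follows the common "velocity" formulation and is only an illustration on a toy one-dimensional problem, not the optimizer code used in the experiments.

```python
def nesterov_step(w, v, grad, lr=1e-2, momentum=0.9):
    """One SGD update with Nesterov momentum; returns the new (w, v)."""
    # update the velocity with the current gradient
    v = momentum * v - lr * grad
    # look ahead along the new velocity, then apply the gradient step
    w = w + momentum * v - lr * grad
    return w, v

# toy problem: minimize f(w) = w^2, whose gradient is 2w
w, v = 5.0, 0.0
for _ in range(500):
    w, v = nesterov_step(w, v, 2.0 * w)
print(w)
```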

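Because SGD caused learning to stall once the learning rate was lowered, I swapped it for Adam with a base learning rate of 1e-3. Apart from the optimizer, every parameter of this second experiment matches the first. At epoch 30 I noticed a large gap between training loss and validation loss, so I stopped training, lowered the learning rate from 1e-3 to 1e-4, and let the network train for another 15 epochs.

The result was not ideal, though. There is clear overfitting: training loss keeps falling while validation loss not only stalls but keeps rising. Even so, at the end of epoch 45 the network reached 66.34% accuracy, better than SGD. If I could find a way to curb the overfitting, the Adam approach would likely perform well here.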


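A common remedy for overfitting is to gather more training data that is representative of the validation and test sets. Because the FER2013 dataset is pre-compiled, collecting more data is not an option; instead we can apply data augmentation to reduce overfitting. In the third experiment I kept the Adam optimizer but added augmentation such as a random rotation range of 10 degrees and a zoom range of 0.1. (zoom_range is either a float or a list of the form [lower, upper] that sets the extent of the random zoom; a float is equivalent to [lower, upper] = [1 - zoom_range, 1 + zoom_range].) With the new augmentation scheme in place, I repeated the second experiment: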
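
The zoom_range rule in parentheses can be spelled out in a few lines. This is a small sketch of the rule as described above; the helper name `zoom_bounds` is made up and is not part of Keras.

```python
def zoom_bounds(zoom_range):
    """Return the (lower, upper) zoom factors implied by zoom_range."""
    # a single number z means "zoom by a random factor in [1 - z, 1 + z]"
    if isinstance(zoom_range, (int, float)):
        return (1.0 - zoom_range, 1.0 + zoom_range)
    # otherwise expect an explicit [lower, upper] pair
    lower, upper = zoom_range
    return (float(lower), float(upper))

print(zoom_bounds(0.1))         # the setting used in the third experiment
print(zoom_bounds([0.8, 1.2]))  # an explicit pair is passed through
```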

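According to the training plot, saturation set in around epoch 35. At that point I stopped training, lowered Adam's learning rate from 1e-3 to 1e-4, and resumed. Accuracy dropped at first and then climbed again, so I stopped once more at epoch 60, lowered the learning rate from 1e-4 to 1e-5, and resumed for another 15 epochs, 75 epochs in total.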
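
The stop-and-restart procedure amounts to a step schedule. The sketch below only summarizes it: in the experiment the changes were made by hand, and treating "around epoch 35" as exactly epoch 35 is an assumption.

```python
def learning_rate_for_epoch(epoch):
    """Learning rate used at a given 0-based epoch of the third experiment."""
    if epoch < 35:
        return 1e-3
    if epoch < 60:
        return 1e-4
    return 1e-5

TOTAL_EPOCHS = 75
schedule = [learning_rate_for_epoch(e) for e in range(TOTAL_EPOCHS)]

# number of epochs spent at each learning rate
for lr in (1e-3, 1e-4, 1e-5):
    print(lr, schedule.count(lr))
```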

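As we can see, the risk of overfitting is now gone. The downside is that we see no major accuracy gains beyond epoch 45. In summary, by applying data augmentation we stabilized learning, reduced overfitting, and reached 67.53% classification accuracy on the validation set.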

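For the final experiment with FER2013 and EmotionVGGNet I decided to make a few changes:

1. I swapped Xavier/Glorot initialization (the Keras default) for MSRA/He initialization.
2. I replaced all ReLUs with ELUs to further improve accuracy.
3. Given the data imbalance caused by the "disgust" label, I merged "anger" and "disgust" into a single class. To merge the two classes, I re-ran build_dataset.py with NUM_CLASSES set to six instead of seven.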


# Usage: python build_dataset.py
# Reads the fer2013.csv dataset file and writes a set of HDF5 files, one each for the training, validation, and test splits.

from config import emotion_config as config
from pyimage.io import HDF5DatasetWriter
import numpy as np

# open the input file for reading
print("[INFO] loading input data...")
f = open(config.INPUT_PATH)
# f is a file pointer to the input fer2013.csv; advancing it by one line skips the CSV header
next(f)

# initialize the image and label lists for the training, validation, and test sets
(trainImages, trainLabels) = ([], [])
(valImages, valLabels) = ([], [])
(testImages, testLabels) = ([], [])

# fer2013.csv stores one image per row in three columns:
#	column 1: the expression label (0-6)
#	column 2: the raw pixel data of a 48x48 grayscale image
#	column 3: the usage (Training, PublicTest, or PrivateTest)
# loop over every row of the input file
for row in f:
	# extract the label (column 1), the image (column 2), and the usage (column 3) from the row
	(label, image, usage) = row.strip().split(",")
	label = int(label)

	# by default FER13 is treated as a 7-class problem; to merge anger and disgust,
	# the disgust label is changed from 1 to 0, which leaves 6 class labels instead of 7
	if config.NUM_CLASSES == 6:
		# merge "anger" and "disgust"
		if label == 1:
			label = 0
		# subtract 1 from every label above zero so the labels stay contiguous
		# (not required, but it helps when interpreting the results)
		if label > 0:
			label -= 1

	# the image is a string of 2304 space-separated integers representing a square 48x48 image;
	# split the string into a list and convert it to unsigned 8-bit integers
	image = np.array(image.split(" "), dtype="uint8")
	# reshape the flattened pixel list into a 48x48 (grayscale) image
	image = image.reshape((48, 48))

	# training image: the usage column is Training
	if usage == "Training":
		trainImages.append(image)
		trainLabels.append(label)

	# validation image: the usage column is PrivateTest
	elif usage == "PrivateTest":
		valImages.append(image)
		valLabels.append(label)

	# otherwise this must be a test image: the usage column is PublicTest
	else:
		testImages.append(image)
		testLabels.append(label)

# build a list that pairs the training, validation, and test images with their labels;
# each entry is a 3-tuple of images, labels, and output HDF5 path
datasets = [
	(trainImages, trainLabels, config.TRAIN_HDF5),
	(valImages, valLabels, config.VAL_HDF5),
	(testImages, testLabels, config.TEST_HDF5)
]
# loop over the dataset tuples: instantiate an HDF5DatasetWriter for each one,
# then write its images and labels to disk in HDF5 format
for (images, labels, outputPath) in datasets:
	# create the HDF5 writer
	print("[INFO] building {}...".format(outputPath))
	# the dataset holds len(images) grayscale images of 48x48 pixels
	writer = HDF5DatasetWriter((len(images), 48, 48), outputPath)
	# loop over the images and add them to the dataset
	for (image, label) in zip(images, labels):
		writer.add([image], [label])
	# close the HDF5 writer
	writer.close()

# close the input file
f.close()

from os import path

# define the base path to the emotion dataset
BASE_PATH = "../datasets/fer2013/"

# use the base path to define the path to the input fer2013.csv file
INPUT_PATH = path.sep.join([BASE_PATH, "fer2013/fer2013.csv"])

# define the number of classes: 7 keeps every class, 6 merges "disgust" into "anger"
# because "disgust" has far fewer samples than the other classes
# NUM_CLASSES = 7
NUM_CLASSES = 6

# define the paths to the output training, validation, and test HDF5 files
TRAIN_HDF5 = path.sep.join([BASE_PATH, "hdf5/train.hdf5"])
VAL_HDF5 = path.sep.join([BASE_PATH, "hdf5/val.hdf5"])
TEST_HDF5 = path.sep.join([BASE_PATH, "hdf5/test.hdf5"])

# define the batch size
BATCH_SIZE = 128

# define the path where output logs are stored
# OUTPUT_PATH = path.sep.join([BASE_PATH, "./output"])
OUTPUT_PATH = "./output"

# Usage:
# 	python train_recognizer.py --checkpoints fer2013/checkpoints
# 	python train_recognizer.py --checkpoints fer2013/checkpoints --model fer2013/checkpoints/epoch_20.hdf5 --start-epoch 20
# Trains a CNN to recognize emotions

import matplotlib
# set the matplotlib backend so that figures can be saved in the background
matplotlib.use("Agg")

from config import emotion_config as config
from pyimage.preprocessing import ImageToArrayPreprocessor
from pyimage.callbacks import EpochCheckpoint
from pyimage.callbacks import TrainingMonitor
from pyimage.io import HDF5DatasetGenerator
from pyimage.nn.conv import EmotionVGGNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import load_model
import tensorflow.keras.backend as K
import argparse
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
# path to the output checkpoint directory
ap.add_argument("-c", "--checkpoints", required=False, default="./checkpoints", help="path to output checkpoint directory")
# path to a specific model checkpoint to load; defaults to None so that a fresh model is built when it is omitted
ap.add_argument("-m", "--model", type=str, required=False, default=None, help="path to *specific* model checkpoint to load")
# epoch at which to restart training
ap.add_argument("-s", "--start-epoch", type=int, default=0, required=False, help="epoch to restart training at")
args = vars(ap.parse_args())

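"""
ImageDataGenerator()
	The image generator from the keras.preprocessing.image module. It can also augment the data batch by batch,
	which enlarges the effective dataset and improves generalization, for example through rotation, distortion, and normalization.
		rotation_range: rotation range
		width_shift_range: horizontal shift range
		height_shift_range: vertical shift range
		zoom_range: zoom range
		fill_mode: fill mode (constant, nearest, reflect)
		horizontal_flip: horizontal flip
		vertical_flip: vertical flip

Data augmentation is applied to the training set to reduce overfitting and improve classification accuracy.
It is not applied to the validation set, which is only rescaled.
valAug = ImageDataGenerator(rescale=1 / 255.0)
	rescale is the scaling attribute (it is also part of the training augmenter).
	When fer2013.csv was converted to HDF5 datasets, the images were stored as raw, unnormalized grayscale images, so pixel values lie in [0, 255].
	Common practice is to either (1) perform mean normalization or (2) scale the pixels into a narrower range.
	The Keras ImageDataGenerator class can perform this scaling for us automatically:
	setting rescale=1 / 255.0 multiplies every image by that factor, which scales the pixels down to [0, 1].
"""
# construct the training and validation image generators for data augmentation, then initialize the image preprocessor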
trainAug = ImageDataGenerator(rotation_range=10, zoom_range=0.1, horizontal_flip=True, rescale=1 / 255.0, fill_mode="nearest")
valAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()

# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(config.TRAIN_HDF5, config.BATCH_SIZE, aug=trainAug, preprocessors=[iap], classes=config.NUM_CLASSES)
valGen = HDF5DatasetGenerator(config.VAL_HDF5, config.BATCH_SIZE, aug=valAug, preprocessors=[iap], classes=config.NUM_CLASSES)

# if no specific model checkpoint was supplied, initialize the network and compile the model
if args["model"] is None:
	print("[INFO] compiling model...")
	model = EmotionVGGNet.build(width=48, height=48, depth=1, classes=config.NUM_CLASSES)
	opt = Adam(learning_rate=1e-3)
	model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# otherwise, load the model checkpoint from disk
else:
	print("[INFO] loading {}...".format(args["model"]))
	model = load_model(args["model"])
	# update the learning rate
	print("[INFO] old learning rate: {}".format(K.get_value(model.optimizer.lr)))
	K.set_value(model.optimizer.lr, 1e-5)
	print("[INFO] new learning rate: {}".format(K.get_value(model.optimizer.lr)))

# construct the paths used by the callbacks
figPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.png"])
jsonPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.json"])
# build the list of callbacks that persist checkpoints to disk and log accuracy/loss over time
callbacks = [
	EpochCheckpoint(args["checkpoints"], every=5, startAt=args["start_epoch"]),
	TrainingMonitor(figPath, jsonPath=jsonPath, startAt=args["start_epoch"])
]

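# train the network (recent tf.keras versions deprecate fit_generator; model.fit accepts generators directly)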
model.fit_generator(
	trainGen.generator(),
	steps_per_epoch=trainGen.numImages // config.BATCH_SIZE,
	validation_data=valGen.generator(),
	validation_steps=valGen.numImages // config.BATCH_SIZE,
	epochs=15,
	max_queue_size=10,
	callbacks=callbacks,
	verbose=1)

# close the databases
trainGen.close()
valGen.close()

# Usage: python test_recognizer.py --model fer2013/checkpoints/epoch_75.hdf5
# Evaluates the performance of the CNN

from config import emotion_config as config
from pyimage.preprocessing import ImageToArrayPreprocessor
from pyimage.io import HDF5DatasetGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import load_model
import argparse

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
# path to the specific model checkpoint to load
ap.add_argument("-m", "--model", type=str, required=False, default="./checkpoints/epoch_75.hdf5", help="path to model checkpoint to load")
args = vars(ap.parse_args())

# initialize the test data generator and the image preprocessor; the test images are stored as raw
# pixels in [0, 255], so rescale=1 / 255.0 scales them to [0, 1] exactly as during training
testAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()
# initialize the test dataset generator
testGen = HDF5DatasetGenerator(config.TEST_HDF5, config.BATCH_SIZE, aug=testAug, preprocessors=[iap], classes=config.NUM_CLASSES)

# load the model checkpoint from disk
print("[INFO] loading {}...".format(args["model"]))
model = load_model(args["model"])

# evaluate the network
(loss, acc) = model.evaluate_generator(
	testGen.generator(),
	steps=testGen.numImages // config.BATCH_SIZE,
	max_queue_size=10
)
print("[INFO] accuracy: {:.2f}".format(acc * 100)) # accuracy: 65.49
# close the test database
testGen.close()
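
One detail of the evaluation call is worth a quick calculation: with integer division, `steps` only counts full batches, so the last partial batch is never evaluated. The sketch below works through the numbers for a 3,589-image split and the batch size of 128 from the config; the helper name `batches` is made up for illustration.

```python
import math

def batches(num_images, batch_size, drop_remainder=True):
    """Number of batches needed to cover num_images."""
    if drop_remainder:
        # mirrors numImages // BATCH_SIZE in the scripts
        return num_images // batch_size
    # counts the final partial batch as well
    return math.ceil(num_images / batch_size)

num_test = 3589
batch_size = 128
full = batches(num_test, batch_size)
print("full batches:", full)
print("images evaluated:", full * batch_size)
print("images skipped:", num_test - full * batch_size)
print("batches incl. partial:", batches(num_test, batch_size, drop_remainder=False))
```

Whether the handful of skipped images matters depends on how precisely the accuracy needs to be reported.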

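# Usage: python emotion_detector.py --cascade haarcascade_frontalface_default.xml --model output/epoch_75.hdf5
# 1. Detect faces in real time (as in the smile detector).
# 2. Apply our CNN to recognize the dominant emotion and display the probability distribution over the emotions.
# Most importantly, this CNN can run in real time on our device.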

from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--cascade", required=False, default="./haarcascade_frontalface_default.xml", help="path to where the face cascade resides")
ap.add_argument("-m", "--model", required=False, default="./checkpoints/epoch_75.hdf5",help="path to pre-trained emotion detector CNN")
ap.add_argument("-v", "--video", help="path to the (optional) video file")
args = vars(ap.parse_args())

# load the face detector cascade
detector = cv2.CascadeClassifier(args["cascade"])
# load the emotion detector CNN
model = load_model(args["model"])
# define the list of emotion labels
EMOTIONS = ["angry", "scared", "happy", "sad", "surprised", "neutral"]

# if no video path was supplied, grab a reference to the webcam
if not args.get("video", False):
	# capture video from the webcam
	camera = cv2.VideoCapture(0)
# otherwise, load the video
else:
	# load the local video file
	camera = cv2.VideoCapture(args["video"])

# loop over the frames
while True:
	# grab the current frame
	(grabbed, frame) = camera.read()
	# if we are reading a video file and no frame was grabbed, the video has ended
	if args.get("video") and not grabbed:
		break
	# resize the frame; with width=300 the height need not be given, because
	# imutils.resize scales it automatically to preserve the aspect ratio
	frame = imutils.resize(frame, width=300)
	# convert to grayscale
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

	# initialize the canvas used for visualization, then clone the frame so that we can draw on it
	canvas = np.zeros((220, 300, 3), dtype="uint8")
	frameClone = frame.copy()
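	"""
	def detectMultiScale(self, image, scaleFactor=None, minNeighbors=None, flags=None, minSize=None, maxSize=None): 
	    image: the image to search, usually grayscale to speed up detection;
	    scaleFactor: the scale ratio of the search window between two successive scans. The default 1.1 enlarges the search window by 10% each time;
	    minNeighbors:
	            the minimum number of neighboring rectangles that make up a detection (default 3).
	            Candidates made up of fewer than min_neighbors - 1 rectangles are discarded.
	            If min_neighbors is 0, the function returns every candidate rectangle without any grouping,
	            which is typically used when the caller applies its own grouping procedure;
	    flags:
	            either use the default or CV_HAAR_DO_CANNY_PRUNING. With CV_HAAR_DO_CANNY_PRUNING,
	            the function uses Canny edge detection to reject regions with too many or too few edges,
	            since such regions are unlikely to contain a face;
	    minSize and maxSize limit the size of the returned regions.
	    
	detector.detectMultiScale returns (x, y, w, h) for every face
		to draw a bounding box around a face: img = cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),5)
		(x,y) is (startX, startY)
		(x+w,y+h) is (endX, endY)
	"""
	# detect faces in the input frame
	rects = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)
	# no len(rects) > 0 check is needed: the loop body is simply skipped when no face is found
	# loop over every detected face
	for rect in rects:
		# rects holds one (x, y, w, h) box per face, so the area of a face is w * h
		# to process only one face when the frame contains several people, pick the largest one instead of looping:
		# rect = max(rects, key=lambda r: r[2] * r[3])
		(fX, fY, fW, fH) = rect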

		# extract the face ROI from the image, then preprocess it for the network
		roi = gray[fY:fY + fH, fX:fX + fW]
		roi = cv2.resize(roi, (48, 48))
		roi = roi.astype("float") / 255.0
		roi = img_to_array(roi)
		roi = np.expand_dims(roi, axis=0)

		# make a prediction, then look up the label
		preds = model.predict(roi)[0]
		label = EMOTIONS[preds.argmax()]

		# loop over the labels and probabilities and draw them
		for (i, (emotion, prob)) in enumerate(zip(EMOTIONS, preds)):
			# construct the label text
			text = "{}: {:.2f}%".format(emotion, prob * 100)

			# draw the label + probability bar on the canvas
			w = int(prob * 300)
			cv2.rectangle(canvas, (5, (i * 35) + 5), (w, (i * 35) + 35), (0, 0, 255), -1)
			cv2.putText(canvas, text, (10, (i * 35) + 23), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (255, 255, 255), 2)

		# draw the label and the bounding box on the frame
		cv2.putText(frameClone, label, (fX, fY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
		cv2.rectangle(frameClone, (fX, fY), (fX + fW, fY + fH), (0, 0, 255), 2)

	# show our classifications + probabilities
	cv2.imshow("Face", frameClone)
	cv2.imshow("Probabilities", canvas)

	# stop the loop if the "q" key is pressed
	if cv2.waitKey(1) & 0xFF == ord("q"):
		break

# release the camera and close any open windows
camera.release()
cv2.destroyAllWindows()
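
If only one face per frame should be classified, a common choice is to keep the largest detection. Since detectMultiScale returns boxes as (x, y, w, h), the area is simply w * h. The following is a minimal sketch with made-up boxes; `largest_face` is a hypothetical helper, not part of OpenCV or of the script above.

```python
def largest_face(rects):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    if len(rects) == 0:
        return None
    # each box is (x, y, w, h), so its area is w * h
    return max(rects, key=lambda r: r[2] * r[3])

# made-up detections for illustration
rects = [(10, 20, 40, 40), (100, 30, 80, 90), (200, 200, 60, 50)]
print(largest_face(rects))
```

Computing the area from corner differences, as one would for (startX, startY, endX, endY) boxes, would rank these detections differently, so it is worth checking which box format a detector returns.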

pyimage

pyimage/callbacks/epochcheckpoint.py

# import the necessary packages
from tensorflow.keras.callbacks import Callback
import os

class EpochCheckpoint(Callback):
	def __init__(self, outputPath, every=5, startAt=0):
		# call the parent constructor
		super(EpochCheckpoint, self).__init__()

		# store the base output path for the model, the number of
		# epochs that must pass before the model is serialized to
		# disk and the current epoch value
		self.outputPath = outputPath
		self.every = every
		self.intEpoch = startAt

	def on_epoch_end(self, epoch, logs={}):
		# check to see if the model should be serialized to disk
		if (self.intEpoch + 1) % self.every == 0:
			p = os.path.sep.join([self.outputPath,
				"epoch_{}.hdf5".format(self.intEpoch + 1)])
			self.model.save(p, overwrite=True)

		# increment the internal epoch counter
		self.intEpoch += 1
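
To see which checkpoint files this callback produces, the counter logic can be simulated without Keras. This is a small sketch that reproduces only the modulo test from on_epoch_end; the function name `saved_epochs` is made up for illustration.

```python
def saved_epochs(start_at, num_epochs, every=5):
    """Epoch numbers at which the checkpoint callback would save a model."""
    saved = []
    int_epoch = start_at
    for _ in range(num_epochs):
        # same test as in on_epoch_end: save after every `every`-th epoch
        if (int_epoch + 1) % every == 0:
            saved.append(int_epoch + 1)
        int_epoch += 1
    return saved

# a fresh run of 15 epochs, then a run resumed at epoch 60 for 15 more
print(saved_epochs(start_at=0, num_epochs=15))
print(saved_epochs(start_at=60, num_epochs=15))
```

Passing the right --start-epoch when resuming is what keeps the file names (epoch_65.hdf5, epoch_70.hdf5, and so on) aligned with the true epoch count.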

pyimage/callbacks/trainingmonitor.py

# import the necessary packages
from tensorflow.keras.callbacks import BaseLogger
import matplotlib.pyplot as plt
import numpy as np
import json
import os

class TrainingMonitor(BaseLogger):
	def __init__(self, figPath, jsonPath=None, startAt=0):
		# store the output path for the figure, the path to the JSON
		# serialized file, and the starting epoch
		super(TrainingMonitor, self).__init__()
		self.figPath = figPath
		self.jsonPath = jsonPath
		self.startAt = startAt

	def on_train_begin(self, logs={}):
		# initialize the history dictionary
		self.H = {}

		# if the JSON history path exists, load the training history
		if self.jsonPath is not None:
			if os.path.exists(self.jsonPath):
				self.H = json.loads(open(self.jsonPath).read())

				# check to see if a starting epoch was supplied
				if self.startAt > 0:
					# loop over the entries in the history log and
					# trim any entries that are past the starting
					# epoch
					for k in self.H.keys():
						self.H[k] = self.H[k][:self.startAt]

	def on_epoch_end(self, epoch, logs={}):
		# loop over the logs and update the loss, accuracy, etc.
		# for the entire training process
		for (k, v) in logs.items():
			l = self.H.get(k, [])
			l.append(float(v))
			self.H[k] = l

		# check to see if the training history should be serialized
		# to file
		if self.jsonPath is not None:
			f = open(self.jsonPath, "w")
			f.write(json.dumps(self.H))
			f.close()

		# ensure at least two epochs have passed before plotting
		# (epoch starts at zero)
		if len(self.H["loss"]) > 1:
			# plot the training loss and accuracy
			N = np.arange(0, len(self.H["loss"]))
			plt.style.use("ggplot")
			plt.figure()
			plt.plot(N, self.H["loss"], label="train_loss")
			plt.plot(N, self.H["val_loss"], label="val_loss")
			plt.plot(N, self.H["accuracy"], label="train_acc")
			plt.plot(N, self.H["val_accuracy"], label="val_acc")
			plt.title("Training Loss and Accuracy [Epoch {}]".format(
				len(self.H["loss"])))
			plt.xlabel("Epoch #")
			plt.ylabel("Loss/Accuracy")
			plt.legend()

			# save the figure
			plt.savefig(self.figPath)
			plt.close()

pyimage/datasets/simpledatasetloader.py

# import the necessary packages
import numpy as np
import cv2
import os

class SimpleDatasetLoader:
	def __init__(self, preprocessors=None):
		# store the image preprocessor
		self.preprocessors = preprocessors

		# if the preprocessors are None, initialize them as an
		# empty list
		if self.preprocessors is None:
			self.preprocessors = []

	def load(self, imagePaths, verbose=-1):
		# initialize the list of features and labels
		data = []
		labels = []

		# loop over the input images
		for (i, imagePath) in enumerate(imagePaths):
			# load the image and extract the class label assuming
			# that our path has the following format:
			# /path/to/dataset/{class}/{image}.jpg
			image = cv2.imread(imagePath)
			label = imagePath.split(os.path.sep)[-2]

			# check to see if our preprocessors are not None
			if self.preprocessors is not None:
				# loop over the preprocessors and apply each to
				# the image
				for p in self.preprocessors:
					image = p.preprocess(image)

			# treat our processed image as a "feature vector"
			# by updating the data list followed by the labels
			data.append(image)
			labels.append(label)

			# show an update every `verbose` images
			if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
				print("[INFO] processed {}/{}".format(i + 1,
					len(imagePaths)))

		# return a tuple of the data and labels
		return (np.array(data), np.array(labels))

pyimage/io/hdf5datasetgenerator.py

# import the necessary packages
from tensorflow.keras.utils import to_categorical
import numpy as np
import h5py

class HDF5DatasetGenerator:
	def __init__(self, dbPath, batchSize, preprocessors=None,
		aug=None, binarize=True, classes=2):
		# store the batch size, preprocessors, and data augmentor,
		# whether or not the labels should be binarized, along with
		# the total number of classes
		self.batchSize = batchSize
		self.preprocessors = preprocessors
		self.aug = aug
		self.binarize = binarize
		self.classes = classes

		# open the HDF5 database for reading and determine the total
		# number of entries in the database
		self.db = h5py.File(dbPath, "r")
		self.numImages = self.db["labels"].shape[0]

	def generator(self, passes=np.inf):
		# initialize the epoch count
		epochs = 0

		# keep looping infinitely -- the model will stop once we have
		# reach the desired number of epochs
		while epochs < passes:
			# loop over the HDF5 dataset
			for i in np.arange(0, self.numImages, self.batchSize):
				# extract the images and labels from the HDF dataset
				images = self.db["images"][i: i + self.batchSize]
				labels = self.db["labels"][i: i + self.batchSize]

				# check to see if the labels should be binarized
				if self.binarize:
					labels = to_categorical(labels,
						self.classes)

				# check to see if our preprocessors are not None
				if self.preprocessors is not None:
					# initialize the list of processed images
					procImages = []

					# loop over the images
					for image in images:
						# loop over the preprocessors and apply each
						# to the image
						for p in self.preprocessors:
							image = p.preprocess(image)

						# update the list of processed images
						procImages.append(image)

					# update the images array to be the processed
					# images
					images = np.array(procImages)

				# if the data augmenator exists, apply it
				if self.aug is not None:
					(images, labels) = next(self.aug.flow(images,
						labels, batch_size=self.batchSize))

				# yield a tuple of images and labels
				yield (images, labels)

			# increment the total number of epochs
			epochs += 1

	def close(self):
		# close the database
		self.db.close()

pyimage/io/hdf5datasetwriter.py

# import the necessary packages
import h5py
import os

class HDF5DatasetWriter:
	def __init__(self, dims, outputPath, dataKey="images",
		bufSize=1000):
		# check to see if the output path exists, and if so, raise
		# an exception
		if os.path.exists(outputPath):
			raise ValueError("The supplied `outputPath` already "
				"exists and cannot be overwritten. Manually delete "
				"the file before continuing.", outputPath)

		# open the HDF5 database for writing and create two datasets:
		# one to store the images/features and another to store the
		# class labels
		self.db = h5py.File(outputPath, "w")
		self.data = self.db.create_dataset(dataKey, dims,
			dtype="float")
		self.labels = self.db.create_dataset("labels", (dims[0],),
			dtype="int")

		# store the buffer size, then initialize the buffer itself
		# along with the index into the datasets
		self.bufSize = bufSize
		self.buffer = {"data": [], "labels": []}
		self.idx = 0

	def add(self, rows, labels):
		# add the rows and labels to the buffer
		self.buffer["data"].extend(rows)
		self.buffer["labels"].extend(labels)

		# check to see if the buffer needs to be flushed to disk
		if len(self.buffer["data"]) >= self.bufSize:
			self.flush()

	def flush(self):
		# write the buffers to disk then reset the buffer
		i = self.idx + len(self.buffer["data"])
		self.data[self.idx:i] = self.buffer["data"]
		self.labels[self.idx:i] = self.buffer["labels"]
		self.idx = i
		self.buffer = {"data": [], "labels": []}

	def storeClassLabels(self, classLabels):
		# create a dataset to store the actual class label names,
		# then store the class labels
		dt = h5py.special_dtype(vlen=str) # `vlen=unicode` for Py2.7
		labelSet = self.db.create_dataset("label_names",
			(len(classLabels),), dtype=dt)
		labelSet[:] = classLabels

	def close(self):
		# check to see if there are any other entries in the buffer
		# that need to be flushed to disk
		if len(self.buffer["data"]) > 0:
			self.flush()

		# close the dataset
		self.db.close()

pyimage/nn/conv/alexnet.py

# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K

class AlexNet:
	@staticmethod
	def build(width, height, depth, classes, reg=0.0002):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

		# Block #1: first CONV => RELU => POOL layer set
		model.add(Conv2D(96, (11, 11), strides=(4, 4),
			input_shape=inputShape, padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
		model.add(Dropout(0.25))

		# Block #2: second CONV => RELU => POOL layer set
		model.add(Conv2D(256, (5, 5), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
		model.add(Dropout(0.25))

		# Block #3: CONV => RELU => CONV => RELU => CONV => RELU
		model.add(Conv2D(384, (3, 3), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(384, (3, 3), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(256, (3, 3), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
		model.add(Dropout(0.25))

		# Block #4: first set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(4096, kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# Block #5: second set of FC => RELU layers
		model.add(Dense(4096, kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes, kernel_regularizer=l2(reg)))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

pyimage/nn/conv/deepergooglenet.py

# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import concatenate
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K

class DeeperGoogLeNet:
	@staticmethod
	def conv_module(x, K, kX, kY, stride, chanDim,
		padding="same", reg=0.0005, name=None):
		# initialize the CONV, BN, and RELU layer names
		(convName, bnName, actName) = (None, None, None)

		# if a layer name was supplied, prepend it
		if name is not None:
			convName = name + "_conv"
			bnName = name + "_bn"
			actName = name + "_act"

		# define a CONV => BN => RELU pattern
		x = Conv2D(K, (kX, kY), strides=stride, padding=padding,
			kernel_regularizer=l2(reg), name=convName)(x)
		x = BatchNormalization(axis=chanDim, name=bnName)(x)
		x = Activation("relu", name=actName)(x)

		# return the block
		return x

	@staticmethod
	def inception_module(x, num1x1, num3x3Reduce, num3x3,
		num5x5Reduce, num5x5, num1x1Proj, chanDim, stage,
		reg=0.0005):
		# define the first branch of the Inception module which
		# consists of 1x1 convolutions
		first = DeeperGoogLeNet.conv_module(x, num1x1, 1, 1,
			(1, 1), chanDim, reg=reg, name=stage + "_first")

		# define the second branch of the Inception module which
		# consists of 1x1 and 3x3 convolutions
		second = DeeperGoogLeNet.conv_module(x, num3x3Reduce, 1, 1,
			(1, 1), chanDim, reg=reg, name=stage + "_second1")
		second = DeeperGoogLeNet.conv_module(second, num3x3, 3, 3,
			(1, 1), chanDim, reg=reg, name=stage + "_second2")

		# define the third branch of the Inception module which
		# are our 1x1 and 5x5 convolutions
		third = DeeperGoogLeNet.conv_module(x, num5x5Reduce, 1, 1,
			(1, 1), chanDim, reg=reg, name=stage + "_third1")
		third = DeeperGoogLeNet.conv_module(third, num5x5, 5, 5,
			(1, 1), chanDim, reg=reg, name=stage + "_third2")

		# define the fourth branch of the Inception module which
		# is the POOL projection
		fourth = MaxPooling2D((3, 3), strides=(1, 1),
			padding="same", name=stage + "_pool")(x)
		fourth = DeeperGoogLeNet.conv_module(fourth, num1x1Proj,
			1, 1, (1, 1), chanDim, reg=reg, name=stage + "_fourth")

		# concatenate across the channel dimension
		x = concatenate([first, second, third, fourth], axis=chanDim,
			name=stage + "_mixed")

		# return the block
		return x

	@staticmethod
	def build(width, height, depth, classes, reg=0.0005):
		# initialize the input shape to be "channels last" and the
		# channels dimension itself
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

		# define the model input, followed by a sequence of CONV =>
		# POOL => (CONV * 2) => POOL layers
		inputs = Input(shape=inputShape)
		x = DeeperGoogLeNet.conv_module(inputs, 64, 5, 5, (1, 1),
			chanDim, reg=reg, name="block1")
		x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
			name="pool1")(x)
		x = DeeperGoogLeNet.conv_module(x, 64, 1, 1, (1, 1),
			chanDim, reg=reg, name="block2")
		x = DeeperGoogLeNet.conv_module(x, 192, 3, 3, (1, 1),
			chanDim, reg=reg, name="block3")
		x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
			name="pool2")(x)

		# apply two Inception modules followed by a POOL
		x = DeeperGoogLeNet.inception_module(x, 64, 96, 128, 16,
			32, 32, chanDim, "3a", reg=reg)
		x = DeeperGoogLeNet.inception_module(x, 128, 128, 192, 32,
			96, 64, chanDim, "3b", reg=reg)
		x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
			name="pool3")(x)

		# apply five Inception modules followed by POOL
		x = DeeperGoogLeNet.inception_module(x, 192, 96, 208, 16,
			48, 64, chanDim, "4a", reg=reg)
		x = DeeperGoogLeNet.inception_module(x, 160, 112, 224, 24,
			64, 64, chanDim, "4b", reg=reg)
		x = DeeperGoogLeNet.inception_module(x, 128, 128, 256, 24,
			64, 64, chanDim, "4c", reg=reg)
		x = DeeperGoogLeNet.inception_module(x, 112, 144, 288, 32,
			64, 64, chanDim, "4d", reg=reg)
		x = DeeperGoogLeNet.inception_module(x, 256, 160, 320, 32,
			128, 128, chanDim, "4e", reg=reg)
		x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
			name="pool4")(x)

		# apply a POOL layer (average) followed by dropout
		x = AveragePooling2D((4, 4), name="pool5")(x)
		x = Dropout(0.4, name="do")(x)

		# softmax classifier
		x = Flatten(name="flatten")(x)
		x = Dense(classes, kernel_regularizer=l2(reg),
			name="labels")(x)
		x = Activation("softmax", name="softmax")(x)

		# create the model
		model = Model(inputs, x, name="googlenet")

		# return the constructed network architecture
		return model

pyimage/nn/conv/emotionvggnet.py

# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import ELU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

"""
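The network we implement for recognizing emotions and facial expressions is inspired by the VGG family of networks:
1. The CONV layers in the network use only 3×3 kernels.
2. As the network gets deeper, the number of filters learned by each CONV layer doubles.

To help the network train, we apply some prior knowledge gained from the VGG and ImageNet experiments:
1. CONV and FC layers are initialized with the MSRA (He et al.) method, which helps the network learn faster.
	Dense(64, kernel_initializer="he_normal")
	Conv2D(32, (3, 3), padding="same", kernel_initializer="he_normal")
2. Since ELU and PReLU have been reported to improve classification accuracy, our experiments start with ELU rather than ReLU.
	ELU()
3. The table summarizes the network named EmotionVGGNet. After every CONV layer we apply an activation followed by batch normalization (these layers are left out of the table to save space).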
"""
class EmotionVGGNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

		"""
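		First block of EmotionVGGNet:
			The first CONV layer learns 32 3×3 filters, followed by an ELU activation and batch normalization.
			The second CONV layer follows the same pattern: 32 3×3 filters, then ELU and batch normalization.
			Max pooling is then applied, followed by a Dropout layer with a probability of 25%.
		"""
		# Block #1: first CONV => ELU => BN => CONV => ELU => BN =>
		# POOL => DROPOUT layer set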
		model.add(Conv2D(32, (3, 3), padding="same", kernel_initializer="he_normal", input_shape=inputShape))
		model.add(ELU())
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(32, (3, 3), kernel_initializer="he_normal", padding="same"))
		model.add(ELU())
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		""" The second block of EmotionVGGNet is identical to the first, except that each CONV layer now learns 64 filters instead of 32 """
		# Block #2: second CONV => ELU => BN => CONV => ELU => BN =>
		# POOL => DROPOUT layer set
		model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal", padding="same"))
		model.add(ELU())
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal", padding="same"))
		model.add(ELU())
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
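		model.add(Dropout(0.25))

		"""
		The third block of EmotionVGGNet applies the same pattern again, increasing the number of filters from 64 to 128.
		As the CNN gets deeper it has more features to learn, so it needs more filters:
		"""
		# Block #3: third CONV => ELU => BN => CONV => ELU => BN =>
		# POOL => DROPOUT layer set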
		model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal", padding="same"))
		model.add(ELU())
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal", padding="same"))
		model.add(ELU())
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
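		model.add(Dropout(0.25))

		"""
		Next we build the first fully connected layer, which learns 64 hidden nodes, followed by an ELU activation and batch normalization.
		The second FC layer is applied in the same way.
		"""
		# Block #4: first set of FC => ELU => BN => DROPOUT layers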
		model.add(Flatten())
		model.add(Dense(64, kernel_initializer="he_normal"))
		model.add(ELU())
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# Block #5: second set of FC => ELU => BN => DROPOUT layers
		model.add(Dense(64, kernel_initializer="he_normal"))
		model.add(ELU())
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		""" Finally, an FC layer with one node per class is followed by a softmax classifier to obtain the output class probabilities """
		# Block #6: softmax classifier
		model.add(Dense(classes, kernel_initializer="he_normal"))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model
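
To see where the input size of the first fully connected layer comes from, the sketch below traces the feature-map shape through the three CONV blocks of EmotionVGGNet. It assumes the 48×48 grayscale FER2013 input described earlier in this article. The helper name `trace_emotionvggnet` is hypothetical and makes no Keras calls; it only applies the arithmetic implied by `padding="same"` convolutions (size unchanged) and 2×2 max pooling (size halved).

```python
def trace_emotionvggnet(height=48, width=48, filters=(32, 64, 128)):
    """Return the (height, width, channels) shape after each CONV block.

    Assumes 3x3 convolutions with padding="same" (spatial size unchanged)
    and 2x2 max pooling with stride 2 (spatial size halved, rounded down).
    """
    shapes = []
    for f in filters:
        height, width = height // 2, width // 2
        shapes.append((height, width, f))
    return shapes


shapes = trace_emotionvggnet()
h, w, c = shapes[-1]

# number of values handed to Flatten(), and the weights plus biases
# of the first Dense(64) layer that follows it
flat_features = h * w * c
dense1_params = flat_features * 64 + 64

print(shapes)
print(flat_features, dense1_params)
```

Under these assumptions most of the parameters of the two 64-node FC layers sit in the first one, because it is connected to every value of the flattened feature map.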

pyimage/nn/conv/fcheadnet.py

# import the necessary packages
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense

class FCHeadNet:
	@staticmethod
	def build(baseModel, classes, D):
		# initialize the head model that will be placed on top of
		# the base, then add a FC layer
		headModel = baseModel.output
		headModel = Flatten(name="flatten")(headModel)
		headModel = Dense(D, activation="relu")(headModel)
		headModel = Dropout(0.5)(headModel)

		# add a softmax layer
		headModel = Dense(classes, activation="softmax")(headModel)

		# return the model
		return headModel

pyimage/nn/conv/lenet.py

# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

class LeNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model
		model = Sequential()
		inputShape = (height, width, depth)

		# if we are using "channels first", update the input shape
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)

		# first set of CONV => RELU => POOL layers
		model.add(Conv2D(20, (5, 5), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

		# second set of CONV => RELU => POOL layers
		model.add(Conv2D(50, (5, 5), padding="same"))
		model.add(Activation("relu"))
		model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

		# first (and only) set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(500))
		model.add(Activation("relu"))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

pyimage/nn/conv/minigooglenet.py

# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import concatenate
from tensorflow.keras import backend as K

class MiniGoogLeNet:
	@staticmethod
	def conv_module(x, K, kX, kY, stride, chanDim, padding="same"):
		# define a CONV => BN => RELU pattern
		x = Conv2D(K, (kX, kY), strides=stride, padding=padding)(x)
		x = BatchNormalization(axis=chanDim)(x)
		x = Activation("relu")(x)

		# return the block
		return x

	@staticmethod
	def inception_module(x, numK1x1, numK3x3, chanDim):
		# define two CONV modules, then concatenate across the
		# channel dimension
		conv_1x1 = MiniGoogLeNet.conv_module(x, numK1x1, 1, 1,
			(1, 1), chanDim)
		conv_3x3 = MiniGoogLeNet.conv_module(x, numK3x3, 3, 3,
			(1, 1), chanDim)
		x = concatenate([conv_1x1, conv_3x3], axis=chanDim)

		# return the block
		return x

	@staticmethod
	def downsample_module(x, K, chanDim):
		# define the CONV module and POOL, then concatenate
		# across the channel dimensions
		conv_3x3 = MiniGoogLeNet.conv_module(x, K, 3, 3, (2, 2),
			chanDim, padding="valid")
		pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
		x = concatenate([conv_3x3, pool], axis=chanDim)

		# return the block
		return x

	@staticmethod
	def build(width, height, depth, classes):
		# initialize the input shape to be "channels last" and the
		# channels dimension itself
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

		# define the model input and first CONV module
		inputs = Input(shape=inputShape)
		x = MiniGoogLeNet.conv_module(inputs, 96, 3, 3, (1, 1),
			chanDim)

		# two Inception modules followed by a downsample module
		x = MiniGoogLeNet.inception_module(x, 32, 32, chanDim)
		x = MiniGoogLeNet.inception_module(x, 32, 48, chanDim)
		x = MiniGoogLeNet.downsample_module(x, 80, chanDim)

		# four Inception modules followed by a downsample module
		x = MiniGoogLeNet.inception_module(x, 112, 48, chanDim)
		x = MiniGoogLeNet.inception_module(x, 96, 64, chanDim)
		x = MiniGoogLeNet.inception_module(x, 80, 80, chanDim)
		x = MiniGoogLeNet.inception_module(x, 48, 96, chanDim)
		x = MiniGoogLeNet.downsample_module(x, 96, chanDim)

		# two Inception modules followed by global POOL and dropout
		x = MiniGoogLeNet.inception_module(x, 176, 160, chanDim)
		x = MiniGoogLeNet.inception_module(x, 176, 160, chanDim)
		x = AveragePooling2D((7, 7))(x)
		x = Dropout(0.5)(x)

		# softmax classifier
		x = Flatten()(x)
		x = Dense(classes)(x)
		x = Activation("softmax")(x)

		# create the model
		model = Model(inputs, x, name="googlenet")

		# return the constructed network architecture
		return model
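
The `AveragePooling2D((7, 7))` layer in MiniGoogLeNet only collapses the feature map to 1×1 if the input reaches that layer at 7×7. The sketch below checks this for a 32×32 input, which is an assumption (the input size is not fixed by `build`). The helper names `conv_out` and `trace_minigooglenet` are hypothetical; the filter counts are copied from `build` above.

```python
def conv_out(size, kernel, stride):
    """Output size of an unpadded ("valid") convolution or pooling."""
    return (size - kernel) // stride + 1


def trace_minigooglenet(size=32):
    """Return (spatial_size, channels) after each module of build()."""
    channels = 96  # first conv_module, padding="same"
    trace = [(size, channels)]
    # ("inception", numK1x1, numK3x3) or ("downsample", K)
    layers = [
        ("inception", 32, 32), ("inception", 32, 48), ("downsample", 80),
        ("inception", 112, 48), ("inception", 96, 64),
        ("inception", 80, 80), ("inception", 48, 96), ("downsample", 96),
        ("inception", 176, 160), ("inception", 176, 160),
    ]
    for layer in layers:
        if layer[0] == "inception":
            # the 1x1 and 3x3 branches are concatenated; size is unchanged
            channels = layer[1] + layer[2]
        else:
            # strided 3x3 CONV (K filters) concatenated with max pooling,
            # which keeps the incoming channel count
            channels = layer[1] + channels
            size = conv_out(size, 3, 2)
        trace.append((size, channels))
    return trace


trace = trace_minigooglenet()
final_size, final_channels = trace[-1]
print(trace)
```

With a different input size the final feature map is no longer 7×7, so the pooling window would have to be adjusted accordingly.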

pyimage/nn/conv/minivggnet.py

# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

class MiniVGGNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

		# first CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(32, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(32, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		# second CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))

		# first (and only) set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(512))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

pyimage/nn/conv/resnet.py

# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import ZeroPadding2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import add
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K

class ResNet:
	@staticmethod
	def residual_module(data, K, stride, chanDim, red=False,
		reg=0.0001, bnEps=2e-5, bnMom=0.9):
		# the shortcut branch of the ResNet module should be
		# initialized as the input (identity) data
		shortcut = data

		# the first block of the ResNet module are the 1x1 CONVs
		bn1 = BatchNormalization(axis=chanDim, epsilon=bnEps,
			momentum=bnMom)(data)
		act1 = Activation("relu")(bn1)
		conv1 = Conv2D(int(K * 0.25), (1, 1), use_bias=False,
			kernel_regularizer=l2(reg))(act1)

		# the second block of the ResNet module are the 3x3 CONVs
		bn2 = BatchNormalization(axis=chanDim, epsilon=bnEps,
			momentum=bnMom)(conv1)
		act2 = Activation("relu")(bn2)
		conv2 = Conv2D(int(K * 0.25), (3, 3), strides=stride,
			padding="same", use_bias=False,
			kernel_regularizer=l2(reg))(act2)

		# the third block of the ResNet module is another set of 1x1
		# CONVs
		bn3 = BatchNormalization(axis=chanDim, epsilon=bnEps,
			momentum=bnMom)(conv2)
		act3 = Activation("relu")(bn3)
		conv3 = Conv2D(K, (1, 1), use_bias=False,
			kernel_regularizer=l2(reg))(act3)

		# if we are to reduce the spatial size, apply a CONV layer to
		# the shortcut
		if red:
			shortcut = Conv2D(K, (1, 1), strides=stride,
				use_bias=False, kernel_regularizer=l2(reg))(act1)

		# add together the shortcut and the final CONV
		x = add([conv3, shortcut])

		# return the addition as the output of the ResNet module
		return x

	@staticmethod
	def build(width, height, depth, classes, stages, filters,
		reg=0.0001, bnEps=2e-5, bnMom=0.9, dataset="cifar"):
		# initialize the input shape to be "channels last" and the
		# channels dimension itself
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

		# set the input and apply BN
		inputs = Input(shape=inputShape)
		x = BatchNormalization(axis=chanDim, epsilon=bnEps,
			momentum=bnMom)(inputs)

		# check if we are utilizing the CIFAR dataset
		if dataset == "cifar":
			# apply a single CONV layer
			x = Conv2D(filters[0], (3, 3), use_bias=False,
				padding="same", kernel_regularizer=l2(reg))(x)

		# check to see if we are using the Tiny ImageNet dataset
		elif dataset == "tiny_imagenet":
			# apply CONV => BN => ACT => POOL to reduce spatial size
			x = Conv2D(filters[0], (5, 5), use_bias=False,
				padding="same", kernel_regularizer=l2(reg))(x)
			x = BatchNormalization(axis=chanDim, epsilon=bnEps,
				momentum=bnMom)(x)
			x = Activation("relu")(x)
			x = ZeroPadding2D((1, 1))(x)
			x = MaxPooling2D((3, 3), strides=(2, 2))(x)

		# loop over the number of stages
		for i in range(0, len(stages)):
			# initialize the stride, then apply a residual module
			# used to reduce the spatial size of the input volume
			stride = (1, 1) if i == 0 else (2, 2)
			# (pass reg through so the build() argument is honored)
			x = ResNet.residual_module(x, filters[i + 1], stride,
				chanDim, red=True, reg=reg, bnEps=bnEps, bnMom=bnMom)

			# loop over the number of layers in the stage
			for j in range(0, stages[i] - 1):
				# apply a ResNet module
				x = ResNet.residual_module(x, filters[i + 1],
					(1, 1), chanDim, reg=reg, bnEps=bnEps, bnMom=bnMom)

		# apply BN => ACT => POOL
		x = BatchNormalization(axis=chanDim, epsilon=bnEps,
			momentum=bnMom)(x)
		x = Activation("relu")(x)
		x = AveragePooling2D((8, 8))(x)

		# softmax classifier
		x = Flatten()(x)
		x = Dense(classes, kernel_regularizer=l2(reg))(x)
		x = Activation("softmax")(x)

		# create the model
		model = Model(inputs, x, name="resnet")

		# return the constructed network architecture
		return model
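
The `stages` and `filters` arguments of `ResNet.build` are easier to read once they are expanded into the list of residual modules they produce. The sketch below does that in plain Python. The configuration `stages=(9, 9, 9)`, `filters=(64, 64, 128, 256)` and the 32×32 input are illustrative assumptions, not values taken from this article, and `resnet_plan` is a hypothetical helper.

```python
def resnet_plan(stages, filters, size=32):
    """Expand stages/filters into one tuple per residual module.

    Each tuple is (stage, module, stride, bottleneck_filters,
    output_filters, spatial_size_after_module).
    """
    plan = []
    for i, num_modules in enumerate(stages):
        out_filters = filters[i + 1]
        for j in range(num_modules):
            # only the first module of every stage after the first
            # one uses a stride of 2
            stride = 2 if (j == 0 and i > 0) else 1
            # a strided "same" convolution maps n to ceil(n / stride)
            size = -(-size // stride)
            # the 1x1 and 3x3 CONVs of the bottleneck use K * 0.25 filters
            plan.append((i, j, stride, int(out_filters * 0.25),
                         out_filters, size))
    return plan


plan = resnet_plan(stages=(9, 9, 9), filters=(64, 64, 128, 256))
for entry in plan[::9]:
    print(entry)
```

For this assumed configuration the last module outputs an 8×8 map, which is the size that `AveragePooling2D((8, 8))` expects in order to reduce it to 1×1.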

pyimage/nn/conv/shallownet.py

# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

class ShallowNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last"
		model = Sequential()
		inputShape = (height, width, depth)

		# if we are using "channels first", update the input shape
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)

		# define the first (and only) CONV => RELU layer
		model.add(Conv2D(32, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))

		# softmax classifier
		model.add(Flatten())
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model

pyimage/nn/mxconv/mxalexnet.py

# import the necessary packages
import mxnet as mx

class MxAlexNet:
	@staticmethod
	def build(classes):
		# data input
		data = mx.sym.Variable("data")

		# Block #1: first CONV => RELU => POOL layer set
		conv1_1 = mx.sym.Convolution(data=data, kernel=(11, 11),
			stride=(4, 4), num_filter=96)
		act1_1 = mx.sym.LeakyReLU(data=conv1_1, act_type="elu")
		bn1_1 = mx.sym.BatchNorm(data=act1_1)
		pool1 = mx.sym.Pooling(data=bn1_1, pool_type="max",
			kernel=(3, 3), stride=(2, 2))
		do1 = mx.sym.Dropout(data=pool1, p=0.25)

		# Block #2: second CONV => RELU => POOL layer set
		conv2_1 = mx.sym.Convolution(data=do1, kernel=(5, 5),
			pad=(2, 2), num_filter=256)
		act2_1 = mx.sym.LeakyReLU(data=conv2_1, act_type="elu")
		bn2_1 = mx.sym.BatchNorm(data=act2_1)
		pool2 = mx.sym.Pooling(data=bn2_1, pool_type="max",
			kernel=(3, 3), stride=(2, 2))
		do2 = mx.sym.Dropout(data=pool2, p=0.25)

		# Block #3: (CONV => RELU) * 3 => POOL
		conv3_1 = mx.sym.Convolution(data=do2, kernel=(3, 3),
			pad=(1, 1), num_filter=384)
		act3_1 = mx.sym.LeakyReLU(data=conv3_1, act_type="elu")
		bn3_1 = mx.sym.BatchNorm(data=act3_1)
		conv3_2 = mx.sym.Convolution(data=bn3_1, kernel=(3, 3),
			pad=(1, 1), num_filter=384)
		act3_2 = mx.sym.LeakyReLU(data=conv3_2, act_type="elu")
		bn3_2 = mx.sym.BatchNorm(data=act3_2)
		conv3_3 = mx.sym.Convolution(data=bn3_2, kernel=(3, 3),
			pad=(1, 1), num_filter=256)
		act3_3 = mx.sym.LeakyReLU(data=conv3_3, act_type="elu")
		bn3_3 = mx.sym.BatchNorm(data=act3_3)
		pool3 = mx.sym.Pooling(data=bn3_3, pool_type="max",
			kernel=(3, 3), stride=(2, 2))
		do3 = mx.sym.Dropout(data=pool3, p=0.25)

		# Block #4: first set of FC => RELU layers
		flatten = mx.sym.Flatten(data=do3)
		fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=4096)
		act4_1 = mx.sym.LeakyReLU(data=fc1, act_type="elu")
		bn4_1 = mx.sym.BatchNorm(data=act4_1)
		do4 = mx.sym.Dropout(data=bn4_1, p=0.5)

		# Block #5: second set of FC => RELU layers
		fc2 = mx.sym.FullyConnected(data=do4, num_hidden=4096)
		act5_1 = mx.sym.LeakyReLU(data=fc2, act_type="elu")
		bn5_1 = mx.sym.BatchNorm(data=act5_1)
		do5 = mx.sym.Dropout(data=bn5_1, p=0.5)

		# softmax classifier
		fc3 = mx.sym.FullyConnected(data=do5, num_hidden=classes)
		model = mx.sym.SoftmaxOutput(data=fc3, name="softmax")

		# return the network architecture
		return model

pyimage/nn/mxconv/mxgooglenet.py

# import the necessary packages
import mxnet as mx

class MxGoogLeNet:
	@staticmethod
	def conv_module(data, K, kX, kY, pad=(0, 0), stride=(1, 1)):
		# define the CONV => BN => RELU pattern
		conv = mx.sym.Convolution(data=data, kernel=(kX, kY),
			num_filter=K, pad=pad, stride=stride)
		bn = mx.sym.BatchNorm(data=conv)
		act = mx.sym.Activation(data=bn, act_type="relu")

		# return the block
		return act

	@staticmethod
	def inception_module(data, num1x1, num3x3Reduce, num3x3,
		num5x5Reduce, num5x5, num1x1Proj):
		# the first branch of the Inception module consists of 1x1
		# convolutions
		conv_1x1 = MxGoogLeNet.conv_module(data, num1x1, 1, 1)

		# the second branch of the Inception module is a set of 1x1
		# convolutions followed by 3x3 convolutions
		conv_r3x3 = MxGoogLeNet.conv_module(data, num3x3Reduce, 1, 1)
		conv_3x3 = MxGoogLeNet.conv_module(conv_r3x3, num3x3, 3, 3,
			pad=(1, 1))

		# the third branch of the Inception module is a set of 1x1
		# convolutions followed by 5x5 convolutions
		conv_r5x5 = MxGoogLeNet.conv_module(data, num5x5Reduce, 1, 1)
		conv_5x5 = MxGoogLeNet.conv_module(conv_r5x5, num5x5, 5, 5,
			pad=(2, 2))

		# the final branch of the Inception module is the POOL +
		# projection layer set
		pool = mx.sym.Pooling(data=data, pool_type="max", pad=(1, 1),
			kernel=(3, 3), stride=(1, 1))
		conv_proj = MxGoogLeNet.conv_module(pool, num1x1Proj, 1, 1)

		# concatenate the filters across the channel dimension
		concat = mx.sym.Concat(*[conv_1x1, conv_3x3, conv_5x5,
			conv_proj])

		# return the block
		return concat

	@staticmethod
	def build(classes):
		# data input
		data = mx.sym.Variable("data")

		# Block #1: CONV => POOL => CONV => CONV => POOL
		conv1_1 = MxGoogLeNet.conv_module(data, 64, 7, 7,
			pad=(3, 3), stride=(2, 2))
		pool1 = mx.sym.Pooling(data=conv1_1, pool_type="max",
			pad=(1, 1), kernel=(3, 3), stride=(2, 2))
		conv1_2 = MxGoogLeNet.conv_module(pool1, 64, 1, 1)
		conv1_3 = MxGoogLeNet.conv_module(conv1_2, 192, 3, 3,
			pad=(1, 1))
		pool2 = mx.sym.Pooling(data=conv1_3, pool_type="max",
			pad=(1, 1), kernel=(3, 3), stride=(2, 2))

		# Block #3: (INCEP * 2) => POOL
		in3a = MxGoogLeNet.inception_module(pool2, 64, 96, 128, 16,
			32, 32)
		in3b = MxGoogLeNet.inception_module(in3a, 128, 128, 192, 32,
			96, 64)
		pool3 = mx.sym.Pooling(data=in3b, pool_type="max",
			pad=(1, 1), kernel=(3, 3), stride=(2, 2))

		# Block #4: (INCEP * 5) => POOL
		in4a = MxGoogLeNet.inception_module(pool3, 192, 96, 208, 16,
			48, 64)
		in4b = MxGoogLeNet.inception_module(in4a, 160, 112, 224, 24,
			64, 64)
		in4c = MxGoogLeNet.inception_module(in4b, 128, 128, 256, 24,
			64, 64)
		in4d = MxGoogLeNet.inception_module(in4c, 112, 144, 288, 32,
			64, 64)
		in4e = MxGoogLeNet.inception_module(in4d, 256, 160, 320, 32,
			128, 128)
		pool4 = mx.sym.Pooling(data=in4e, pool_type="max",
			pad=(1, 1), kernel=(3, 3), stride=(2, 2))

		# Block #5: (INCEP * 2) => POOL => DROPOUT
		in5a = MxGoogLeNet.inception_module(pool4, 256, 160, 320, 32,
			128, 128)
		in5b = MxGoogLeNet.inception_module(in5a, 384, 192, 384, 48,
			128, 128)
		pool5 = mx.sym.Pooling(data=in5b, pool_type="avg",
			kernel=(7, 7), stride=(1, 1))
		do = mx.sym.Dropout(data=pool5, p=0.4)

		# softmax classifier
		flatten = mx.sym.Flatten(data=do)
		fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=classes)
		model = mx.sym.SoftmaxOutput(data=fc1, name="softmax")

		# return the network architecture
		return model

if __name__ == "__main__":
	# render a visualization of the network
	model = MxGoogLeNet.build(1000)
	v = mx.viz.plot_network(model, shape={"data": (1, 3, 224, 224)},
		node_attrs={"shape": "rect", "fixedsize": "false"})
	v.render()
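
The filter arguments of `inception_module` and the padding values in `build` determine the shape of every tensor in MxGoogLeNet. The sketch below recomputes them for the 224×224 input used in the visualization above. It is plain Python with two hypothetical helpers (`out_size`, `inception_channels`) and assumes floor rounding for output sizes; the filter counts are copied from `build`.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Floor-based output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def inception_channels(num1x1, num3x3Reduce, num3x3,
                       num5x5Reduce, num5x5, num1x1Proj):
    """Channels after concatenating the four branches.

    The two "reduce" counts only size the internal 1x1 layers, so
    they do not appear in the output width.
    """
    return num1x1 + num3x3 + num5x5 + num1x1Proj


# spatial size after each size-changing layer
sizes = []
size = out_size(224, 7, stride=2, pad=3)       # conv1_1
sizes.append(size)
for _ in range(4):                              # pool1 .. pool4
    size = out_size(size, 3, stride=2, pad=1)
    sizes.append(size)
sizes.append(out_size(size, 7))                 # pool5 (7x7 average)

# output channels of each Inception module
modules = {
    "3a": (64, 96, 128, 16, 32, 32),
    "3b": (128, 128, 192, 32, 96, 64),
    "4a": (192, 96, 208, 16, 48, 64),
    "4b": (160, 112, 224, 24, 64, 64),
    "4c": (128, 128, 256, 24, 64, 64),
    "4d": (112, 144, 288, 32, 64, 64),
    "4e": (256, 160, 320, 32, 128, 128),
    "5a": (256, 160, 320, 32, 128, 128),
    "5b": (384, 192, 384, 48, 128, 128),
}
channels = {name: inception_channels(*args)
            for name, args in modules.items()}

print(sizes)
print(channels)
```

The 1×1, padded 3×3 and padded 5×5 convolutions inside the modules keep the spatial size, which is why only the first convolution and the pooling layers appear in the size trace.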

pyimage/nn/mxconv/mxresnet.py

# import the necessary packages
import mxnet as mx

class MxResNet:
	# uses "bottleneck" module with pre-activation (He et al. 2016)
	@staticmethod
	def residual_module(data, K, stride, red=False, bnEps=2e-5,
		bnMom=0.9):
		# the shortcut branch of the ResNet module should be
		# initialized as the input (identity) data
		shortcut = data

		# the first block of the ResNet module are 1x1 CONVs
		bn1 = mx.sym.BatchNorm(data=data, fix_gamma=False,
			eps=bnEps, momentum=bnMom)
		act1 = mx.sym.Activation(data=bn1, act_type="relu")
		conv1 = mx.sym.Convolution(data=act1, pad=(0, 0),
			kernel=(1, 1), stride=(1, 1), num_filter=int(K * 0.25),
			no_bias=True)

		# the second block of the ResNet module are 3x3 CONVs
		bn2 = mx.sym.BatchNorm(data=conv1, fix_gamma=False,
			eps=bnEps, momentum=bnMom)
		act2 = mx.sym.Activation(data=bn2, act_type="relu")
		conv2 = mx.sym.Convolution(data=act2, pad=(1, 1),
			kernel=(3, 3), stride=stride, num_filter=int(K * 0.25),
			no_bias=True)

		# the third block of the ResNet module is another set of 1x1
		# CONVs
		bn3 = mx.sym.BatchNorm(data=conv2, fix_gamma=False,
			eps=bnEps, momentum=bnMom)
		act3 = mx.sym.Activation(data=bn3, act_type="relu")
		conv3 = mx.sym.Convolution(data=act3, pad=(0, 0),
			kernel=(1, 1), stride=(1, 1), num_filter=K, no_bias=True)

		# if we are to reduce the spatial size, apply a CONV layer
		# to the shortcut
		if red:
			shortcut = mx.sym.Convolution(data=act1, pad=(0, 0),
				kernel=(1, 1), stride=stride, num_filter=K,
				no_bias=True)

		# add together the shortcut and the final CONV
		add = conv3 + shortcut

		# return the addition as the output of the ResNet module
		return add

	@staticmethod
	def build(classes, stages, filters, bnEps=2e-5, bnMom=0.9):
		# data input
		data = mx.sym.Variable("data")

		# Block #1: BN => CONV => ACT => POOL, then initialize the
		# "body" of the network
		bn1_1 = mx.sym.BatchNorm(data=data, fix_gamma=True,
			eps=bnEps, momentum=bnMom)
		conv1_1 = mx.sym.Convolution(data=bn1_1, pad=(3, 3),
			kernel=(7, 7), stride=(2, 2), num_filter=filters[0],
			no_bias=True)
		bn1_2 = mx.sym.BatchNorm(data=conv1_1, fix_gamma=False,
			eps=bnEps, momentum=bnMom)
		act1_2 = mx.sym.Activation(data=bn1_2, act_type="relu")
		pool1 = mx.sym.Pooling(data=act1_2, pool_type="max",
			pad=(1, 1), kernel=(3, 3), stride=(2, 2))
		body = pool1

		# loop over the number of stages
		for i in range(0, len(stages)):
			# initialize the stride, then apply a residual module
			# used to reduce the spatial size of the input volume
			stride = (1, 1) if i == 0 else (2, 2)
			body = MxResNet.residual_module(body, filters[i + 1],
				stride, red=True, bnEps=bnEps, bnMom=bnMom)

			# loop over the number of layers in the stage
			for j in range(0, stages[i] - 1):
				# apply a ResNet module
				body = MxResNet.residual_module(body, filters[i + 1],
					(1, 1), bnEps=bnEps, bnMom=bnMom)

		# apply BN => ACT => POOL
		bn2_1 = mx.sym.BatchNorm(data=body, fix_gamma=False,
			eps=bnEps, momentum=bnMom)
		act2_1 = mx.sym.Activation(data=bn2_1, act_type="relu")
		pool2 = mx.sym.Pooling(data=act2_1, pool_type="avg",
			global_pool=True, kernel=(7, 7))

		# softmax classifier
		flatten = mx.sym.Flatten(data=pool2)
		fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=classes)
		model = mx.sym.SoftmaxOutput(data=fc1, name="softmax")

		# return the network architecture
		return model

if __name__ == "__main__":
	# render a visualization of the network
	model = MxResNet.build(1000, (3, 4, 6, 3),
		(64, 256, 512, 1024, 2048))
	v = mx.viz.plot_network(model, shape={"data": (1, 3, 224, 224)},
		node_attrs={"shape": "rect", "fixedsize": "false"})
	v.render()

pyimage/nn/mxconv/mxsqueezenet.py

# import the necessary packages
import mxnet as mx

class MxSqueezeNet:
	@staticmethod
	def squeeze(input, numFilter):
		# the first part of a FIRE module consists of a number of 1x1
		# filter squeezes on the input data followed by an activation
		squeeze_1x1 = mx.sym.Convolution(data=input, kernel=(1, 1),
			stride=(1, 1), num_filter=numFilter)
		act_1x1 = mx.sym.LeakyReLU(data=squeeze_1x1,
			act_type="elu")

		# return the activation for the squeeze
		return act_1x1

	@staticmethod
	def fire(input, numSqueezeFilter, numExpandFilter):
		# construct the 1x1 squeeze followed by the 1x1 expand
		squeeze_1x1 = MxSqueezeNet.squeeze(input, numSqueezeFilter)
		expand_1x1 = mx.sym.Convolution(data=squeeze_1x1,
			kernel=(1, 1), stride=(1, 1), num_filter=numExpandFilter)
		relu_expand_1x1 = mx.sym.LeakyReLU(data=expand_1x1,
			act_type="elu")

		# construct the 3x3 expand
		expand_3x3 = mx.sym.Convolution(data=squeeze_1x1, pad=(1, 1),
			kernel=(3, 3), stride=(1, 1), num_filter=numExpandFilter)
		relu_expand_3x3 = mx.sym.LeakyReLU(data=expand_3x3,
			act_type="elu")

		# the output of the FIRE module is the concatenation of the
		# activation for the 1x1 and 3x3 expands along the channel
		# dimension
		output = mx.sym.Concat(relu_expand_1x1, relu_expand_3x3,
			dim=1)

		# return the output of the FIRE module
		return output

	@staticmethod
	def build(classes):
		# data input
		data = mx.sym.Variable("data")

		# Block #1: CONV => RELU => POOL
		conv_1 = mx.sym.Convolution(data=data, kernel=(7, 7),
			stride=(2, 2), num_filter=96)
		relu_1 = mx.sym.LeakyReLU(data=conv_1, act_type="elu")
		pool_1 = mx.sym.Pooling(data=relu_1, kernel=(3, 3),
			stride=(2, 2), pool_type="max")

		# Block #2-4: (FIRE * 3) => POOL
		fire_2 = MxSqueezeNet.fire(pool_1, numSqueezeFilter=16,
			numExpandFilter=64)
		fire_3 = MxSqueezeNet.fire(fire_2, numSqueezeFilter=16,
			numExpandFilter=64)
		fire_4 = MxSqueezeNet.fire(fire_3, numSqueezeFilter=32,
			 numExpandFilter=128)
		pool_4 = mx.sym.Pooling(data=fire_4, kernel=(3, 3),
			stride=(2, 2), pool_type="max")

		# Block #5-8: (FIRE * 4) => POOL
		fire_5 = MxSqueezeNet.fire(pool_4, numSqueezeFilter=32,
			numExpandFilter=128)
		fire_6 = MxSqueezeNet.fire(fire_5, numSqueezeFilter=48,
			numExpandFilter=192)
		fire_7 = MxSqueezeNet.fire(fire_6, numSqueezeFilter=48,
			numExpandFilter=192)
		fire_8 = MxSqueezeNet.fire(fire_7, numSqueezeFilter=64,
			numExpandFilter=256)
		pool_8 = mx.sym.Pooling(data=fire_8, kernel=(3, 3),
			stride=(2, 2), pool_type="max")

		# Block #9-10: FIRE => DROPOUT => CONV => RELU => POOL
		fire_9 = MxSqueezeNet.fire(pool_8, numSqueezeFilter=64,
			numExpandFilter=256)
		do_9 = mx.sym.Dropout(data=fire_9, p=0.5)
		conv_10 = mx.sym.Convolution(data=do_9, num_filter=classes,
			kernel=(1, 1), stride=(1, 1))
		relu_10 = mx.sym.LeakyReLU(data=conv_10, act_type="elu")
		pool_10 = mx.sym.Pooling(data=relu_10, kernel=(13, 13),
			pool_type="avg")

		# softmax classifier
		flatten = mx.sym.Flatten(data=pool_10)
		model = mx.sym.SoftmaxOutput(data=flatten, name="softmax")

		# return the network architecture
		return model
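
The 13×13 kernel of `pool_10` in MxSqueezeNet is tied to the input resolution. The sketch below traces the spatial size for a 227×227 input, which is an assumption since `build` does not fix the input shape, and lists the channel count produced by each FIRE module. `out_size` and `fire_channels` are hypothetical helpers and no MXNet calls are made.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Floor-based output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def fire_channels(numSqueezeFilter, numExpandFilter):
    """Output channels of a FIRE module.

    The 1x1 and 3x3 expands each produce numExpandFilter channels and
    are concatenated; the squeeze width is internal to the module.
    """
    return 2 * numExpandFilter


size = out_size(227, 7, stride=2)        # conv_1
conv1_size = size

pooled = []
for _ in range(3):                        # pool_1, pool_4, pool_8
    size = out_size(size, 3, stride=2)
    pooled.append(size)

final_size = out_size(size, 13)           # pool_10 (13x13 average)

# (numSqueezeFilter, numExpandFilter) for fire_2 .. fire_9
fires = [(16, 64), (16, 64), (32, 128), (32, 128),
         (48, 192), (48, 192), (64, 256), (64, 256)]
fire_out = [fire_channels(s, e) for s, e in fires]

print(conv1_size, pooled, final_size)
print(fire_out)
```

Because `conv_10` already outputs one channel per class, the averaged 1×1 map can be flattened and passed to the softmax without a fully connected layer.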

pyimage/nn/mxconv/mxvggnet.py

# import the necessary packages
import mxnet as mx

class MxVGGNet:
	@staticmethod
	def build(classes):
		# data input
		data = mx.sym.Variable("data")

		# Block #1: (CONV => RELU) * 2 => POOL
		conv1_1 = mx.sym.Convolution(data=data, kernel=(3, 3),
			pad=(1, 1), num_filter=64, name="conv1_1")
		act1_1 = mx.sym.LeakyReLU(data=conv1_1, act_type="prelu",
			name="act1_1")
		bn1_1 = mx.sym.BatchNorm(data=act1_1, name="bn1_1")
		conv1_2 = mx.sym.Convolution(data=bn1_1, kernel=(3, 3),
			pad=(1, 1), num_filter=64, name="conv1_2")
		act1_2 = mx.sym.LeakyReLU(data=conv1_2, act_type="prelu",
			name="act1_2")
		bn1_2 = mx.sym.BatchNorm(data=act1_2, name="bn1_2")
		pool1 = mx.sym.Pooling(data=bn1_2, pool_type="max",
			kernel=(2, 2), stride=(2, 2), name="pool1")
		do1 = mx.sym.Dropout(data=pool1, p=0.25)

		# Block #2: (CONV => RELU) * 2 => POOL
		conv2_1 = mx.sym.Convolution(data=do1, kernel=(3, 3),
			pad=(1, 1), num_filter=128, name="conv2_1")
		act2_1 = mx.sym.LeakyReLU(data=conv2_1, act_type="prelu",
			name="act2_1")
		bn2_1 = mx.sym.BatchNorm(data=act2_1, name="bn2_1")
		conv2_2 = mx.sym.Convolution(data=bn2_1, kernel=(3, 3),
			pad=(1, 1), num_filter=128, name="conv2_2")
		act2_2 = mx.sym.LeakyReLU(data=conv2_2, act_type="prelu",
			name="act2_2")
		bn2_2 = mx.sym.BatchNorm(data=act2_2, name="bn2_2")
		pool2 = mx.sym.Pooling(data=bn2_2, pool_type="max",
			kernel=(2, 2), stride=(2, 2), name="pool2")
		do2 = mx.sym.Dropout(data=pool2, p=0.25)

		# Block #3: (CONV => RELU) * 3 => POOL
		conv3_1 = mx.sym.Convolution(data=do2, kernel=(3, 3),
			pad=(1, 1), num_filter=256, name="conv3_1")
		act3_1 = mx.sym.LeakyReLU(data=conv3_1, act_type="prelu",
			name="act3_1")
		bn3_1 = mx.sym.BatchNorm(data=act3_1, name="bn3_1")
		conv3_2 = mx.sym.Convolution(data=bn3_1, kernel=(3, 3),
			pad=(1, 1), num_filter=256, name="conv3_2")
		act3_2 = mx.sym.LeakyReLU(data=conv3_2, act_type="prelu",
			name="act3_2")
		bn3_2 = mx.sym.BatchNorm(data=act3_2, name="bn3_2")
		conv3_3 = mx.sym.Convolution(data=bn3_2, kernel=(3, 3),
			pad=(1, 1), num_filter=256, name="conv3_3")
		act3_3 = mx.sym.LeakyReLU(data=conv3_3, act_type="prelu",
			name="act3_3")
		bn3_3 = mx.sym.BatchNorm(data=act3_3, name="bn3_3")
		pool3 = mx.sym.Pooling(data=bn3_3, pool_type="max",
			kernel=(2, 2), stride=(2, 2), name="pool3")
		do3 = mx.sym.Dropout(data=pool3, p=0.25)

		# Block #4: (CONV => RELU) * 3 => POOL
		conv4_1 = mx.sym.Convolution(data=do3, kernel=(3, 3),
			pad=(1, 1), num_filter=512, name="conv4_1")
		act4_1 = mx.sym.LeakyReLU(data=conv4_1, act_type="prelu",
			name="act4_1")
		bn4_1 = mx.sym.BatchNorm(data=act4_1, name="bn4_1")
		conv4_2 = mx.sym.Convolution(data=bn4_1, kernel=(3, 3),
			pad=(1, 1), num_filter=512, name="conv4_2")
		act4_2 = mx.sym.LeakyReLU(data=conv4_2, act_type="prelu",
			name="act4_2")
		bn4_2 = mx.sym.BatchNorm(data=act4_2, name="bn4_2")
		conv4_3 = mx.sym.Convolution(data=bn4_2, kernel=(3, 3),
			pad=(1, 1), num_filter=512, name="conv4_3")
		act4_3 = mx.sym.LeakyReLU(data=conv4_3, act_type="prelu",
			name="act4_3")
		bn4_3 = mx.sym.BatchNorm(data=act4_3, name="bn4_3")
		pool4 = mx.sym.Pooling(data=bn4_3, pool_type="max",
			kernel=(2, 2), stride=(2, 2), name="pool4")
		do4 = mx.sym.Dropout(data=pool4, p=0.25)

		# Block #5: (CONV => PReLU => BN) * 3 => POOL => DROPOUT
		conv5_1 = mx.sym.Convolution(data=do4, kernel=(3, 3),
			pad=(1, 1), num_filter=512, name="conv5_1")
		act5_1 = mx.sym.LeakyReLU(data=conv5_1, act_type="prelu",
			name="act5_1")
		bn5_1 = mx.sym.BatchNorm(data=act5_1, name="bn5_1")
		conv5_2 = mx.sym.Convolution(data=bn5_1, kernel=(3, 3),
			pad=(1, 1), num_filter=512, name="conv5_2")
		act5_2 = mx.sym.LeakyReLU(data=conv5_2, act_type="prelu",
			name="act5_2")
		bn5_2 = mx.sym.BatchNorm(data=act5_2, name="bn5_2")
		conv5_3 = mx.sym.Convolution(data=bn5_2, kernel=(3, 3),
			pad=(1, 1), num_filter=512, name="conv5_3")
		act5_3 = mx.sym.LeakyReLU(data=conv5_3, act_type="prelu",
			name="act5_3")
		bn5_3 = mx.sym.BatchNorm(data=act5_3, name="bn5_3")
		pool5 = mx.sym.Pooling(data=bn5_3, pool_type="max",
			kernel=(2, 2), stride=(2, 2), name="pool5")
		do5 = mx.sym.Dropout(data=pool5, p=0.25)

		# Block #6: FC => PReLU => BN => DROPOUT
		flatten = mx.sym.Flatten(data=do5, name="flatten")
		fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=4096,
			name="fc1")
		act6_1 = mx.sym.LeakyReLU(data=fc1, act_type="prelu",
			name="act6_1")
		bn6_1 = mx.sym.BatchNorm(data=act6_1, name="bn6_1")
		do6 = mx.sym.Dropout(data=bn6_1, p=0.5)

		# Block #7: FC => PReLU => BN => DROPOUT
		fc2 = mx.sym.FullyConnected(data=do6, num_hidden=4096,
			name="fc2")
		act7_1 = mx.sym.LeakyReLU(data=fc2, act_type="prelu",
			name="act7_1")
		bn7_1 = mx.sym.BatchNorm(data=act7_1, name="bn7_1")
		do7 = mx.sym.Dropout(data=bn7_1, p=0.5)

		# softmax classifier
		fc3 = mx.sym.FullyConnected(data=do7, num_hidden=classes,
			name="fc3")
		model = mx.sym.SoftmaxOutput(data=fc3, name="softmax")

		# return the network architecture
		return model
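
As a rough check on the architecture above, the spatial size of the feature map that reaches `flatten` can be traced by hand. The sketch below is plain Python, not MXNet. The helper names and the two input sizes (224, the usual VGG input, and 48, the FER2013 image size) are assumptions for illustration, since the listing does not state the input shape. It also assumes the pooling layers round down, which is MXNet's default "valid" convention.

```python
def conv_out(n, kernel=3, pad=1, stride=1):
    # output size of a convolution along one spatial dimension
    return (n + 2 * pad - kernel) // stride + 1

def pool_out(n, kernel=2, stride=2):
    # output size of an unpadded pooling layer that rounds down
    return (n - kernel) // stride + 1

def trace_sizes(n, convs_per_block=(2, 2, 3, 3, 3)):
    # record the spatial size after each of the five CONV/POOL blocks
    sizes = [n]
    for num_convs in convs_per_block:
        for _ in range(num_convs):
            n = conv_out(n)  # 3x3 kernel with pad 1 keeps the size
        n = pool_out(n)      # 2x2 pooling with stride 2 halves it
        sizes.append(n)
    return sizes

sizes_224 = trace_sizes(224)
sizes_48 = trace_sizes(48)

# the last block has 512 filters, so this is the length of the
# flattened vector fed to fc1
flat_224 = 512 * sizes_224[-1] ** 2
flat_48 = 512 * sizes_48[-1] ** 2

print(sizes_224, flat_224)
print(sizes_48, flat_48)
```

Under these assumptions a 48×48 input is reduced to a single 1×1 position by the fifth pooling layer, so the two 4096-unit FC layers would see only 512 inputs, while a 224×224 input yields 7×7×512 = 25088.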

pyimage/nn/neuralnetwork.py

# import the necessary packages
import numpy as np

class NeuralNetwork:
	def __init__(self, layers, alpha=0.1):
		# initialize the list of weights matrices, then store the
		# network architecture and learning rate
		self.W = []
		self.layers = layers
		self.alpha = alpha

		# start looping from the index of the first layer but
		# stop before we reach the last two layers
		for i in np.arange(0, len(layers) - 2):
			# randomly initialize a weight matrix connecting the
			# number of nodes in each respective layer together,
			# adding an extra node for the bias
			w = np.random.randn(layers[i] + 1, layers[i + 1] + 1)
			self.W.append(w / np.sqrt(layers[i]))

		# the last two layers are a special case where the input
		# connections need a bias term but the output does not
		w = np.random.randn(layers[-2] + 1, layers[-1])
		self.W.append(w / np.sqrt(layers[-2]))

	def __repr__(self):
		# construct and return a string that represents the network
		# architecture
		return "NeuralNetwork: {}".format(
			"-".join(str(l) for l in self.layers))

	def sigmoid(self, x):
		# compute and return the sigmoid activation value for a
		# given input value
		return 1.0 / (1 + np.exp(-x))

	def sigmoid_deriv(self, x):
		# compute the derivative of the sigmoid function ASSUMING
		# that `x` has already been passed through the `sigmoid`
		# function
		return x * (1 - x)

	def fit(self, X, y, epochs=1000, displayUpdate=100):
		# insert a column of 1's as the last entry in the feature
		# matrix -- this little trick allows us to treat the bias
		# as a trainable parameter within the weight matrix
		X = np.c_[X, np.ones((X.shape[0]))]

		# loop over the desired number of epochs
		for epoch in np.arange(0, epochs):
			# loop over each individual data point and train
			# our network on it
			for (x, target) in zip(X, y):
				self.fit_partial(x, target)

			# check to see if we should display a training update
			if epoch == 0 or (epoch + 1) % displayUpdate == 0:
				loss = self.calculate_loss(X, y)
				print("[INFO] epoch={}, loss={:.7f}".format(
					epoch + 1, loss))

	def fit_partial(self, x, y):
		# construct our list of output activations for each layer
		# as our data point flows through the network; the first
		# activation is a special case -- it's just the input
		# feature vector itself
		A = [np.atleast_2d(x)]

		# FEEDFORWARD:
		# loop over the layers in the network
		for layer in np.arange(0, len(self.W)):
			# feedforward the activation at the current layer by
			# taking the dot product between the activation and
			# the weight matrix -- this is called the "net input"
			# to the current layer
			net = A[layer].dot(self.W[layer])

			# computing the "net output" is simply applying our
			# non-linear activation function to the net input
			out = self.sigmoid(net)

			# once we have the net output, add it to our list of
			# activations
			A.append(out)

		# BACKPROPAGATION
		# the first phase of backpropagation is to compute the
		# difference between our *prediction* (the final output
		# activation in the activations list) and the true target
		# value
		error = A[-1] - y

		# from here, we need to apply the chain rule and build our
		# list of deltas `D`; the first entry in the deltas is
		# simply the error of the output layer times the derivative
		# of our activation function for the output value
		D = [error * self.sigmoid_deriv(A[-1])]

		# once you understand the chain rule it becomes super easy
		# to implement with a `for` loop -- simply loop over the
		# layers in reverse order (ignoring the last two since we
		# already have taken them into account)
		for layer in np.arange(len(A) - 2, 0, -1):
			# the delta for the current layer is equal to the delta
			# of the *previous layer* dotted with the weight matrix
			# of the current layer, followed by multiplying the delta
			# by the derivative of the non-linear activation function
			# for the activations of the current layer
			delta = D[-1].dot(self.W[layer].T)
			delta = delta * self.sigmoid_deriv(A[layer])
			D.append(delta)

		# since we looped over our layers in reverse order we need to
		# reverse the deltas
		D = D[::-1]

		# WEIGHT UPDATE PHASE
		# loop over the layers
		for layer in np.arange(0, len(self.W)):
			# update our weights by taking the dot product of the layer
			# activations with their respective deltas, then multiplying
			# this value by some small learning rate and adding to our
			# weight matrix -- this is where the actual "learning" takes
			# place
			self.W[layer] += -self.alpha * A[layer].T.dot(D[layer])

	def predict(self, X, addBias=True):
		# initialize the output prediction as the input features -- this
		# value will be (forward) propagated through the network to
		# obtain the final prediction
		p = np.atleast_2d(X)

		# check to see if the bias column should be added
		if addBias:
			# insert a column of 1's as the last entry in the feature
			# matrix (bias)
			p = np.c_[p, np.ones((p.shape[0]))]

		# loop over our layers in the network
		for layer in np.arange(0, len(self.W)):
			# computing the output prediction is as simple as taking
			# the dot product between the current activation value `p`
			# and the weight matrix associated with the current layer,
			# then passing this value through a non-linear activation
			# function
			p = self.sigmoid(np.dot(p, self.W[layer]))

		# return the predicted value
		return p

	def calculate_loss(self, X, targets):
		# make predictions for the input data points then compute
		# the loss
		targets = np.atleast_2d(targets)
		predictions = self.predict(X, addBias=False)
		loss = 0.5 * np.sum((predictions - targets) ** 2)

		# return the loss
		return loss
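
The comment on `sigmoid_deriv` is easy to overlook: the method expects a value that has already been passed through `sigmoid`. The following sketch, written with the standard `math` module rather than numpy, compares the analytic form against a numerical derivative. The test point `x = 0.5` and the step size `h` are arbitrary choices for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1 + math.exp(-x))

def sigmoid_deriv(a):
    # `a` must be an activation, i.e. a = sigmoid(x), not the raw x
    return a * (1 - a)

x = 0.5
h = 1e-6

# central-difference estimate of d(sigmoid)/dx at x
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# correct usage: feed the activation
analytic = sigmoid_deriv(sigmoid(x))

# incorrect usage: feeding the raw input gives a different number
wrong = sigmoid_deriv(x)

print(numeric, analytic, wrong)
```

This is why `fit_partial` calls `self.sigmoid_deriv(A[layer])` on the stored activations instead of on the net inputs.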

pyimage/nn/perceptron.py

# import the necessary packages
import numpy as np

class Perceptron:
	def __init__(self, N, alpha=0.1):
		# initialize the weight matrix and store the learning rate
		self.W = np.random.randn(N + 1) / np.sqrt(N)
		self.alpha = alpha

	def step(self, x):
		# apply the step function
		return 1 if x > 0 else 0

	def fit(self, X, y, epochs=10):
		# insert a column of 1's as the last entry in the feature
		# matrix -- this little trick allows us to treat the bias
		# as a trainable parameter within the weight matrix
		X = np.c_[X, np.ones((X.shape[0]))]

		# loop over the desired number of epochs
		for epoch in np.arange(0, epochs):
			# loop over each individual data point
			for (x, target) in zip(X, y):
				# take the dot product between the input features
				# and the weight matrix, then pass this value
				# through the step function to obtain the prediction
				p = self.step(np.dot(x, self.W))

				# only perform a weight update if our prediction
				# does not match the target
				if p != target:
					# determine the error
					error = p - target

					# update the weight matrix
					self.W += -self.alpha * error * x

	def predict(self, X, addBias=True):
		# ensure our input is a matrix
		X = np.atleast_2d(X)

		# check to see if the bias column should be added
		if addBias:
			# insert a column of 1's as the last entry in the feature
			# matrix (bias)
			X = np.c_[X, np.ones((X.shape[0]))]

		# take the dot product between the input features and the
		# weight matrix, then pass the value through the step
		# function; `step` compares a single value, so pass one
		# data point at a time
		return self.step(np.dot(X, self.W))
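
To see the update rule in `fit` at work, here is a dependency-free sketch that trains on the bitwise OR dataset. It mirrors the logic of the class but is not the class itself: the function names are hypothetical, and the weights start at zero (an assumption made for reproducibility, whereas the class draws them at random).

```python
def step(x):
    return 1 if x > 0 else 0

def train_perceptron(X, y, alpha=0.1, epochs=20):
    # one weight per feature plus one for the bias column
    W = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, target in zip(X, y):
            xb = list(x) + [1.0]  # append the bias input
            p = step(sum(w * v for w, v in zip(W, xb)))
            # only update when the prediction is wrong
            if p != target:
                error = p - target
                W = [w - alpha * error * v for w, v in zip(W, xb)]
    return W

def predict(W, x):
    xb = list(x) + [1.0]
    return step(sum(w * v for w, v in zip(W, xb)))

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_or = [0, 1, 1, 1]

W = train_perceptron(X, y_or)
preds = [predict(W, x) for x in X]
print(W, preds)
```

OR is linearly separable, so the loop stops changing the weights after a few epochs. The same code run on XOR targets would never classify all four points correctly, which is the usual motivation for the multi-layer `NeuralNetwork` class.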

pyimage/preprocessing/aspectawarepreprocessor.py

# import the necessary packages
import imutils
import cv2

class AspectAwarePreprocessor:
	def __init__(self, width, height, inter=cv2.INTER_AREA):
		# store the target image width, height, and interpolation
		# method used when resizing
		self.width = width
		self.height = height
		self.inter = inter

	def preprocess(self, image):
		# grab the dimensions of the image and then initialize
		# the deltas to use when cropping
		(h, w) = image.shape[:2]
		dW = 0
		dH = 0

		# if the width is smaller than the height, then resize
		# along the width (i.e., the smaller dimension) and then
		# update the deltas to crop the height to the desired
		# dimension
		if w < h:
			image = imutils.resize(image, width=self.width,
				inter=self.inter)
			dH = int((image.shape[0] - self.height) / 2.0)

		# otherwise, the height is smaller than the width so
		# resize along the height and then update the deltas
		# crop along the width
		else:
			image = imutils.resize(image, height=self.height,
				inter=self.inter)
			dW = int((image.shape[1] - self.width) / 2.0)

		# now that our images have been resized, we need to
		# re-grab the width and height, followed by performing
		# the crop
		(h, w) = image.shape[:2]
		image = image[dH:h - dH, dW:w - dW]

		# finally, resize the image to the provided spatial
		# dimensions to ensure our output image is always a fixed
		# size
		return cv2.resize(image, (self.width, self.height),
			interpolation=self.inter)
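
The final `cv2.resize` in `preprocess` may look redundant, but integer truncation can leave the crop one pixel off. The sketch below reproduces only the arithmetic, without OpenCV. The function name is hypothetical, and it assumes `imutils.resize` truncates the scaled dimension with `int()`.

```python
def aspect_aware_plan(h, w, target_w, target_h):
    # returns the resized shape, the crop deltas, and the shape after
    # cropping, all as (height, width) pairs
    dW, dH = 0, 0
    if w < h:
        # resize along the width, then crop the height
        r = target_w / float(w)
        new_w, new_h = target_w, int(h * r)
        dH = int((new_h - target_h) / 2.0)
    else:
        # resize along the height, then crop the width
        r = target_h / float(h)
        new_w, new_h = int(w * r), target_h
        dW = int((new_w - target_w) / 2.0)
    crop_h = (new_h - dH) - dH
    crop_w = (new_w - dW) - dW
    return (new_h, new_w), (dH, dW), (crop_h, crop_w)

# a portrait image, 400 pixels high and 300 wide, target 256x256
plan = aspect_aware_plan(400, 300, 256, 256)
print(plan)
```

For this input the crop comes out at 257 rows instead of 256, and the closing resize is what brings it to the exact target size.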

pyimage/preprocessing/croppreprocessor.py

# import the necessary packages
import numpy as np
import cv2

class CropPreprocessor:
	def __init__(self, width, height, horiz=True, inter=cv2.INTER_AREA):
		# store the target image width, height, whether or not
		# horizontal flips should be included, along with the
		# interpolation method used when resizing
		self.width = width
		self.height = height
		self.horiz = horiz
		self.inter = inter

	def preprocess(self, image):
		# initialize the list of crops
		crops = []

		# grab the width and height of the image then use these
		# dimensions to define the four corner crops of the image
		(h, w) = image.shape[:2]
		coords = [
			[0, 0, self.width, self.height],
			[w - self.width, 0, w, self.height],
			[w - self.width, h - self.height, w, h],
			[0, h - self.height, self.width, h]]

		# compute the center crop of the image as well
		dW = int(0.5 * (w - self.width))
		dH = int(0.5 * (h - self.height))
		coords.append([dW, dH, w - dW, h - dH])

		# loop over the coordinates, extract each of the crops,
		# and resize each of them to a fixed size
		for (startX, startY, endX, endY) in coords:
			crop = image[startY:endY, startX:endX]
			crop = cv2.resize(crop, (self.width, self.height),
				interpolation=self.inter)
			crops.append(crop)

		# check to see if the horizontal flips should be taken
		if self.horiz:
			# compute the horizontal mirror flips for each crop
			mirrors = [cv2.flip(c, 1) for c in crops]
			crops.extend(mirrors)

		# return the set of crops
		return np.array(crops)
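
The crop coordinates computed in `preprocess` can be inspected without loading an image. This sketch repeats the coordinate arithmetic in plain Python. The function name and the 256×256 input with a 227×227 crop are illustrative assumptions.

```python
def crop_coords(h, w, crop_w, crop_h):
    # four corner crops as [startX, startY, endX, endY]
    coords = [
        [0, 0, crop_w, crop_h],
        [w - crop_w, 0, w, crop_h],
        [w - crop_w, h - crop_h, w, h],
        [0, h - crop_h, crop_w, h],
    ]
    # center crop
    dW = int(0.5 * (w - crop_w))
    dH = int(0.5 * (h - crop_h))
    coords.append([dW, dH, w - dW, h - dH])
    return coords

coords = crop_coords(256, 256, 227, 227)
sizes = [(endX - startX, endY - startY)
         for (startX, startY, endX, endY) in coords]
print(coords)
print(sizes)
```

With an odd size difference the center crop is one pixel larger than the corners, which is why every crop is resized to the target size before being stored. Five crops plus their horizontal mirrors give the ten crops returned when `horiz` is true.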

pyimage/preprocessing/imagetoarraypreprocessor.py

# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array

class ImageToArrayPreprocessor:
	def __init__(self, dataFormat=None):
		# store the image data format
		self.dataFormat = dataFormat

	def preprocess(self, image):
		# apply the Keras utility function that correctly rearranges
		# the dimensions of the image
		return img_to_array(image, data_format=self.dataFormat)

pyimage/preprocessing/meanpreprocessor.py

# import the necessary packages
import cv2

class MeanPreprocessor:
	def __init__(self, rMean, gMean, bMean):
		# store the Red, Green, and Blue channel averages across a
		# training set
		self.rMean = rMean
		self.gMean = gMean
		self.bMean = bMean

	def preprocess(self, image):
		# split the image into its respective Red, Green, and Blue
		# channels
		(B, G, R) = cv2.split(image.astype("float32"))

		# subtract the means for each channel
		R -= self.rMean
		G -= self.gMean
		B -= self.bMean

		# merge the channels back together and return the image
		return cv2.merge([B, G, R])

pyimage/preprocessing/patchpreprocessor.py

# import the necessary packages
from sklearn.feature_extraction.image import extract_patches_2d

class PatchPreprocessor:
	def __init__(self, width, height):
		# store the target width and height of the image
		self.width = width
		self.height = height

	def preprocess(self, image):
		# extract a random crop from the image with the target width
		# and height
		return extract_patches_2d(image, (self.height, self.width),
			max_patches=1)[0]

pyimage/preprocessing/simplepreprocessor.py

# import the necessary packages
import cv2

class SimplePreprocessor:
	def __init__(self, width, height, inter=cv2.INTER_AREA):
		# store the target image width, height, and interpolation
		# method used when resizing
		self.width = width
		self.height = height
		self.inter = inter

	def preprocess(self, image):
		# resize the image to a fixed size, ignoring the aspect
		# ratio
		return cv2.resize(image, (self.width, self.height),
			interpolation=self.inter)

pyimage/utils/captchahelper.py

# import the necessary packages
import imutils
import cv2

def preprocess(image, width, height):
	# grab the dimensions of the image
	(h, w) = image.shape[:2]

	# if the width is greater than the height then resize along
	# the width
	if w > h:
		image = imutils.resize(image, width=width)

	# otherwise, the height is greater than the width so resize
	# along the height
	else:
		image = imutils.resize(image, height=height)

	# determine the padding values for the width and height to
	# obtain the target dimensions
	padW = int((width - image.shape[1]) / 2.0)
	padH = int((height - image.shape[0]) / 2.0)

	# pad the image then apply one more resizing to handle any
	# rounding issues
	image = cv2.copyMakeBorder(image, padH, padH, padW, padW,
		cv2.BORDER_REPLICATE)
	image = cv2.resize(image, (width, height))

	# return the pre-processed image
	return image
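
The "rounding issues" mentioned in the comment can be shown with numbers alone. The following sketch follows the padding arithmetic of `preprocess` without touching OpenCV. The function name and the sample sizes are hypothetical, and it assumes `imutils.resize` truncates the scaled dimension with `int()`.

```python
def pad_plan(h, w, width, height):
    # returns the resized shape, the padding, and the padded shape,
    # all as (height, width) pairs
    if w > h:
        r = width / float(w)
        new_w, new_h = width, int(h * r)
    else:
        r = height / float(h)
        new_w, new_h = int(w * r), height
    padW = int((width - new_w) / 2.0)
    padH = int((height - new_h) / 2.0)
    padded = (new_h + 2 * padH, new_w + 2 * padW)
    return (new_h, new_w), (padH, padW), padded

# even gap: padding lands exactly on the 28x28 target
even = pad_plan(50, 30, 28, 28)

# odd gap: padding falls one column short of the target
odd = pad_plan(50, 31, 28, 28)

print(even)
print(odd)
```

In the second case the padded image is 27 pixels wide, so the last `cv2.resize` call is what guarantees the exact output size.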

pyimage/utils/imagenethelper.py

# import the necessary packages
import numpy as np
import os

class ImageNetHelper:
	def __init__(self, config):
		# store the configuration object
		self.config = config

		# build the label mappings and validation blacklist
		self.labelMappings = self.buildClassLabels()
		self.valBlacklist = self.buildBlackist()

	def buildClassLabels(self):
		# load the contents of the file that maps the WordNet IDs
		# to integers, then initialize the label mappings dictionary
		rows = open(self.config.WORD_IDS).read().strip().split("\n")
		labelMappings = {}

		# loop over the labels
		for row in rows:
			# split the row into the WordNet ID, label integer, and
			# human readable label
			(wordID, label, hrLabel) = row.split(" ")

			# update the label mappings dictionary using the word ID
			# as the key and the label as the value, subtracting `1`
			# from the label since MATLAB is one-indexed while Python
			# is zero-indexed
			labelMappings[wordID] = int(label) - 1

		# return the label mappings dictionary
		return labelMappings

	def buildBlackist(self):
		# load the list of blacklisted image IDs and convert them to
		# a set
		rows = open(self.config.VAL_BLACKLIST).read()
		rows = set(rows.strip().split("\n"))

		# return the blacklisted image IDs
		return rows

	def buildTrainingSet(self):
		# load the contents of the training input file that lists
		# the partial image ID and image number, then initialize
		# the list of image paths and class labels
		rows = open(self.config.TRAIN_LIST).read().strip()
		rows = rows.split("\n")
		paths = []
		labels = []

		# loop over the rows in the input training file
		for row in rows:
			# break the row into the partial path and image
			# number (the image number is sequential and is
			# essentially useless to us)
			(partialPath, imageNum) = row.strip().split(" ")

			# construct the full path to the training image, then
			# grab the word ID from the path and use it to determine
			# the integer class label
			path = os.path.sep.join([self.config.IMAGES_PATH,
				"train", "{}.JPEG".format(partialPath)])
			wordID = partialPath.split("/")[0]
			label = self.labelMappings[wordID]

			# update the respective paths and label lists
			paths.append(path)
			labels.append(label)

		# return a tuple of image paths and associated integer class
		# labels
		return (np.array(paths), np.array(labels))

	def buildValidationSet(self):
		# initialize the list of image paths and class labels
		paths = []
		labels = []

		# load the contents of the file that lists the partial
		# validation image filenames
		valFilenames = open(self.config.VAL_LIST).read()
		valFilenames = valFilenames.strip().split("\n")

		# load the contents of the file that contains the *actual*
		# ground-truth integer class labels for the validation set
		valLabels = open(self.config.VAL_LABELS).read()
		valLabels = valLabels.strip().split("\n")

		# loop over the validation data
		for (row, label) in zip(valFilenames, valLabels):
			# break the row into the partial path and image number
			(partialPath, imageNum) = row.strip().split(" ")

			# if the image number is in the blacklist set then we
			# should ignore this validation image
			if imageNum in self.valBlacklist:
				continue

			# construct the full path to the validation image, then
			# update the respective paths and labels lists
			path = os.path.sep.join([self.config.IMAGES_PATH, "val",
				"{}.JPEG".format(partialPath)])
			paths.append(path)
			labels.append(int(label) - 1)

		# return a tuple of image paths and associated integer class
		# labels
		return (np.array(paths), np.array(labels))
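
The parsing done by `buildClassLabels` and `buildTrainingSet` is easier to follow on a couple of rows. In the sketch below the file contents are inlined as strings. The rows, the `images` base directory, and the use of `/` as the separator are made up for illustration and only imitate the layout the code expects.

```python
# stand-ins for the files behind config.WORD_IDS and config.TRAIN_LIST
word_ids = "n02119789 1 kit_fox\nn02100735 2 English_setter"
train_list = "n02119789/n02119789_1 1\nn02100735/n02100735_7 2"

# WordNet ID -> zero-based integer label
label_mappings = {}
for row in word_ids.strip().split("\n"):
    word_id, label, hr_label = row.split(" ")
    label_mappings[word_id] = int(label) - 1

paths, labels = [], []
for row in train_list.strip().split("\n"):
    partial_path, image_num = row.strip().split(" ")
    # the directory part of the partial path is the WordNet ID
    word_id = partial_path.split("/")[0]
    paths.append("/".join(["images", "train", partial_path + ".JPEG"]))
    labels.append(label_mappings[word_id])

print(label_mappings)
print(paths, labels)
```

Note that the three-way unpacking only works when the human-readable label contains no spaces, as in these sample rows.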

pyimage/utils/ranked.py

# import the necessary packages
import numpy as np

def rank5_accuracy(preds, labels):
	# initialize the rank-1 and rank-5 accuracies
	rank1 = 0
	rank5 = 0

	# loop over the predictions and ground-truth labels
	for (p, gt) in zip(preds, labels):
		# sort the probabilities by their index in descending
		# order so that the more confident guesses are at the
		# front of the list
		p = np.argsort(p)[::-1]

		# check if the ground-truth label is in the top-5
		# predictions
		if gt in p[:5]:
			rank5 += 1

		# check to see if the ground-truth is the #1 prediction
		if gt == p[0]:
			rank1 += 1

	# compute the final rank-1 and rank-5 accuracies
	rank1 /= float(len(preds))
	rank5 /= float(len(preds))

	# return a tuple of the rank-1 and rank-5 accuracies
	return (rank1, rank5)
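
A small worked example helps confirm what `rank5_accuracy` counts. The sketch below is a pure-Python equivalent that uses `sorted` in place of `np.argsort`. The function name and the three probability vectors are made up for illustration.

```python
def rank5_accuracy_py(preds, labels):
    rank1 = 0
    rank5 = 0
    for p, gt in zip(preds, labels):
        # class indices ordered from most to least probable
        order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
        if gt in order[:5]:
            rank5 += 1
        if gt == order[0]:
            rank1 += 1
    return rank1 / float(len(preds)), rank5 / float(len(preds))

preds = [
    [0.05, 0.10, 0.60, 0.05, 0.10, 0.10],  # label 2 is the top guess
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # label 5 is ranked last
    [0.10, 0.40, 0.20, 0.15, 0.10, 0.05],  # label 2 is ranked second
]
labels = [2, 5, 2]

rank1, rank5 = rank5_accuracy_py(preds, labels)
print(rank1, rank5)
```

One of the three samples is a rank-1 hit and two are rank-5 hits, so the function returns roughly 0.33 and 0.67. With only six or seven classes, as in FER2013, rank-5 accuracy is close to trivial, so rank-1 is the number to watch.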

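This article is reposted content, and we respect the original author's copyright in it. If it contains errors or raises infringement concerns, the original author is welcome to contact us to have the content corrected or the article removed.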
