Facial expression recognition (FER)
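It is generally held that humans have six basic emotions: anger, happiness, sadness, surprise, disgust, and fear. Most expression recognition systems are built on these six emotions and their extended categories.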
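The main difficulties are:
1. Granularity of expressions: should the faintest display of each emotion be classified at all? The boundaries between classes need evaluation rules defined by the product team.
2. Diversity of categories: should other emotion categories be added? In some scenarios the six basic emotions fall far short of representing real human emotions.
For this reason, besides basic expression recognition there is also research into finer-grained areas such as fine-grained expression recognition, compound expression recognition, and non-basic expression recognition.
3. Lack of robustness.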
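The FER2013 facial expression dataset contains 35,886 face images: 28,708 training images (Training) and 3,589 images each in the public test set (PublicTest) and the private test set (PrivateTest). Every image is a 48×48 grayscale image. There are 7 expressions, encoded as the integer labels 0-6: 0 anger; 1 disgust; 2 fear; 3 happy; 4 sad; 5 surprised; 6 neutral.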
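The dataset does not ship the images as image files. Instead, the expression label, pixel data, and usage of every image are stored in a CSV file (the first figure shows the beginning of the CSV file). The first row is a header describing what each column contains: the first column is the expression label, the second column holds the raw pixel data, and the last column is the usage (Training, PublicTest, or PrivateTest).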
https://www.kaggle.com/deadskull7/fer2013
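To make the CSV layout concrete, here is a minimal sketch that turns one data row into a label, a 48×48 pixel array, and a usage string. It assumes the three-column layout described above (integer label, space-separated pixel string, usage tag); `parse_row` is a hypothetical helper name and the sample row is synthetic, not taken from the real file.

```python
import numpy as np

def parse_row(row, size=48):
    """Split one CSV data row into (label, image, usage).

    Assumes: integer label, space-separated pixel string, usage tag.
    """
    label, pixels, usage = row.strip().split(",")
    # 2304 space-separated integers -> flat list -> 48x48 uint8 grayscale image
    values = [int(p) for p in pixels.split()]
    if len(values) != size * size:
        raise ValueError("expected {} pixels, got {}".format(size * size, len(values)))
    image = np.array(values, dtype=np.uint8).reshape((size, size))
    return int(label), image, usage

# synthetic row: label 3 (happy), pixel values 0..2303 wrapped modulo 256
sample = "3," + " ".join(str(i % 256) for i in range(48 * 48)) + ",Training\n"
label, image, usage = parse_row(sample)
print(label, image.shape, image.dtype, usage)  # 3 (48, 48) uint8 Training
```

The reshape is row-major, so pixel `k` of the string lands at row `k // 48`, column `k % 48`; this is the same conversion build_dataset.py performs further down.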
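Our goal now is to convert this .csv file into HDF5 format so that we can train a convolutional neural network on it more easily. After extracting the fer2013.tar.gz file, I set up the following directory structure for these files.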
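FER13 has seven classes in total: angry, disgust, fear, happy, sad, surprise, and neutral. However, the "disgust" class is severely imbalanced relative to the others: it has only 113 image samples, while every other class has more than 1,000. The recommended fix is to merge "disgust" and "anger" into a single class (the two emotions look visually similar), which turns FER13 into a 6-class problem.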
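The merge can be written as a small label-mapping function. This sketch mirrors the remapping that build_dataset.py applies when NUM_CLASSES is 6; the function name `merge_disgust` is hypothetical.

```python
# Labels as stored in fer2013.csv (7 classes, indices 0-6).
ORIGINAL = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def merge_disgust(label):
    """Map a 7-class FER2013 label to the 6-class scheme.

    "disgust" (1) is folded into "angry" (0); every label above 0 is then
    shifted down by one so the labels stay contiguous (0-5).
    """
    if label == 1:
        label = 0
    if label > 0:
        label -= 1
    return label

mapping = {name: merge_disgust(i) for i, name in enumerate(ORIGINAL)}
print(mapping)
# {'angry': 0, 'disgust': 0, 'fear': 1, 'happy': 2, 'sad': 3, 'surprise': 4, 'neutral': 5}
```

The resulting order (angry, fear, happy, sad, surprise, neutral) is the order that any label list used at inference time, such as EMOTIONS in emotion_detector.py, has to follow.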
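Since we will convert the fer2013.csv file into a set of HDF5 datasets for training, validation, and testing, we need to define the paths to these output HDF5 files.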
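The network we will implement to recognize emotions and facial expressions is inspired by the VGG family of networks:
1. The CONV layers in the network will only be 3×3.
2. As the network gets deeper, we double the number of filters learned by each CONV layer.
To help the network train, we apply some prior knowledge gained from the VGG and ImageNet experiments in Chapter 8:
1. We initialize the CONV and FC layers with the MSRA (He et al.) method, which lets the network learn faster.
2. Since ELU and PReLU were shown to improve classification accuracy in all of those experiments, we simply start with ELU instead of ReLU.
3. The table contains a summary of the network, named EmotionVGGNet. After every CONV layer we apply an activation followed by batch normalization (these layers are left out of the table to save space).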
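As a quick check on the "double the filters as the network deepens" rule, the sketch below traces the feature-map shape through the three CONV blocks of EmotionVGGNet (listed in emotionvggnet.py later in this document) for a 48×48×1 input. It assumes "same" padding and one 2×2 max pool per block, as in that listing; the function itself is only illustrative.

```python
def emotion_vggnet_shapes(size=48, filters=(32, 64, 128), pool=2):
    """Trace (height, width, channels) after each CONV block.

    Assumes 3x3 convolutions with padding="same" (spatial size unchanged)
    followed by a single 2x2 max pool (stride 2) per block.
    """
    shapes = []
    h = w = size
    for f in filters:
        # the two 3x3 "same" CONV layers keep h and w and set the channel count to f;
        # the 2x2 max pool then halves both spatial dimensions
        h, w = h // pool, w // pool
        shapes.append((h, w, f))
    return shapes

shapes = emotion_vggnet_shapes()
print(shapes)  # [(24, 24, 32), (12, 12, 64), (6, 6, 128)]
h, w, c = shapes[-1]
print("flattened features:", h * w * c)  # flattened features: 4608
```

So the Flatten layer hands 4,608 values to the first 64-unit fully connected layer.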
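In the first experiment, we start with the SGD optimizer with a base learning rate of 1e-2, a momentum term of 0.9, and Nesterov acceleration applied. The (default) Xavier/Glorot initialization method is used to initialize the weights in the CONV and FC layers. In addition, the only data augmentation is horizontal flipping; no other augmentation (such as rotation or zoom) is used.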
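For reference, the Nesterov-momentum update used in this first experiment can be sketched in plain Python. The two formulas follow the update rule documented for `tf.keras.optimizers.SGD` with `nesterov=True`; the quadratic toy objective and the helper name are assumptions for illustration only.

```python
def sgd_nesterov_step(w, velocity, grad, lr=1e-2, momentum=0.9):
    """One SGD step with Nesterov momentum.

    Update rule (as documented for tf.keras SGD with nesterov=True):
        velocity = momentum * velocity - lr * g
        w = w + momentum * velocity - lr * g
    """
    velocity = momentum * velocity - lr * grad
    w = w + momentum * velocity - lr * grad
    return w, velocity

# toy objective f(w) = w**2 with gradient 2 * w, starting from w = 1
w, v = 1.0, 0.0
for _ in range(300):
    w, v = sgd_nesterov_step(w, v, 2 * w)
print(abs(w) < 1e-3)  # True: the iterate has converged toward the minimum at 0
```

The first step from `w = 1.0` gives `velocity = -0.02` and `w = 0.962`: the gradient term is applied once directly and once more through the look-ahead `momentum * velocity`.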
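Since SGD caused learning to stagnate when the learning rate was lowered, I decided to replace it with Adam at a base learning rate of 1e-3. Apart from the optimizer, all other parameters of this second experiment are the same as in the first. Around epoch 30, I started to notice a large gap between the training loss and the validation loss, so I stopped training, lowered the learning rate from 1e-3 to 1e-4, and let the network train for another 15 epochs.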
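However, the results were not ideal. As we can see, there is clear overfitting: the training loss keeps decreasing, while the validation loss not only stagnates but actually increases. That said, by the end of epoch 45 the network still reached 66.34% accuracy, better than SGD. If I could find a way to curb the overfitting, the Adam optimizer would likely perform well in this setting.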
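The decision to stop training was made by eyeballing the loss curves. As a hedged sketch, the same judgement can be expressed as a "patience" rule over the training history that TrainingMonitor serializes to JSON (keys `loss` and `val_loss`, one entry per epoch): report the last epoch at which validation loss improved, once it has failed to improve for `patience` consecutive epochs. The helper name, the patience value, and the numbers below are hypothetical; Keras's own `EarlyStopping` callback implements a similar rule.

```python
def find_divergence(history, patience=3):
    """Return the 0-based epoch where val_loss last improved, once it has
    failed to improve for `patience` consecutive epochs; otherwise None.

    history: dict with "loss" and "val_loss" lists, one entry per epoch,
    in the format TrainingMonitor writes to its JSON file.
    """
    best = float("inf")
    best_epoch = None
    waited = 0
    for epoch, val_loss in enumerate(history["val_loss"]):
        if val_loss < best:
            best, best_epoch, waited = val_loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch
    return None

# synthetic curves: training loss keeps falling, validation loss bottoms out at epoch 4
history = {
    "loss":     [1.8, 1.5, 1.3, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6],
    "val_loss": [1.7, 1.5, 1.4, 1.3, 1.25, 1.27, 1.30, 1.34, 1.40],
}
print(find_divergence(history))  # 4
```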
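A common way to address overfitting is to collect more training data that is representative of your validation and test sets. However, since the FER2013 dataset is pre-compiled, gathering additional data is not possible. Instead, we can apply data augmentation to help reduce overfitting. In the third experiment, I kept the Adam optimizer but added data augmentation with a random rotation range of 10 degrees and a zoom range of 0.1 (zoom_range: a float or a list of the form [lower, upper] giving the range of random zoom; a float is equivalent to [lower, upper] = [1 - zoom_range, 1 + zoom_range]). With this new augmentation scheme, I repeated the second experiment: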
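The zoom_range convention above can be spelled out in a few lines. This is a minimal sketch, assuming the behavior documented for Keras's `ImageDataGenerator`: a float becomes `[1 - z, 1 + z]`, and each image gets its own random rotation angle and independent x/y zoom factors drawn uniformly from those ranges. `zoom_bounds` is a hypothetical helper, not part of Keras.

```python
import random

def zoom_bounds(zoom_range):
    """Normalize a zoom_range value to a (lower, upper) pair.

    A float z means [1 - z, 1 + z]; a two-element list or tuple is used as-is.
    """
    if isinstance(zoom_range, (int, float)):
        return (1 - zoom_range, 1 + zoom_range)
    if len(zoom_range) == 2:
        return (zoom_range[0], zoom_range[1])
    raise ValueError("zoom_range must be a float or [lower, upper]")

lower, upper = zoom_bounds(0.1)  # approximately (0.9, 1.1)

# per-image random draws in the spirit of rotation_range=10 and zoom_range=0.1:
# an angle in [-10, 10] degrees and independent zoom factors for x and y
rng = random.Random(42)
angle = rng.uniform(-10, 10)
zx, zy = rng.uniform(lower, upper), rng.uniform(lower, upper)
print("angle={:.2f} zx={:.3f} zy={:.3f}".format(angle, zx, zy))
```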
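As the figure below shows, saturation starts to set in around epoch 35. At that point I stopped training, lowered Adam's learning rate from 1e-3 to 1e-4, and resumed training. This first caused a dip in accuracy, after which accuracy climbed again, so I stopped training once more at epoch 60, lowered the learning rate from 1e-4 to 1e-5, and resumed training for another 15 epochs, for 75 epochs in total.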
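The stop, lower, resume procedure amounts to a piecewise-constant learning-rate schedule. A minimal sketch, assuming the boundaries at epochs 35 and 60 described above; a function like this could be wrapped for Keras's `LearningRateScheduler` callback, although the runs in this write-up were resumed by hand from checkpoints.

```python
def lr_for_epoch(epoch, boundaries=(35, 60), values=(1e-3, 1e-4, 1e-5)):
    """Piecewise-constant learning rate.

    Epochs before boundaries[0] use values[0], epochs in
    [boundaries[0], boundaries[1]) use values[1], and later epochs values[2].
    """
    for boundary, value in zip(boundaries, values):
        if epoch < boundary:
            return value
    return values[-1]

schedule = [lr_for_epoch(e) for e in range(75)]
print(schedule[0], schedule[34], schedule[35], schedule[59], schedule[60], schedule[74])
# 0.001 0.001 0.0001 0.0001 1e-05 1e-05
```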
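As we can see, we are no longer at risk of overfitting; the downside is that we did not see any major gains in accuracy past epoch 45. In summary, by applying data augmentation we were able to stabilize training, reduce overfitting, and reach 67.53% classification accuracy on the validation set.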
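In the final FER2013 and EmotionVGGNet experiment, I decided to make a few changes:
1. I swapped Xavier/Glorot initialization (the Keras default) for MSRA/He initialization.
2. I replaced all ReLUs with ELUs to further improve accuracy.
3. Given the data imbalance caused by the "disgust" label, I merged "anger" and "disgust" into a single category. To merge the two classes, I set NUM_CLASSES in the config file (emotion_config.py) to six instead of seven and reran build_dataset.py.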
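The first two changes can be made concrete with a few lines of plain Python: the weight scale of the Glorot/Xavier initializer (Keras's default `glorot_uniform`) versus He/MSRA (`he_normal`) for a 3×3 CONV layer, and the ELU activation. The formulas are the standard definitions; the layer size (32 filters on 32 input channels, as in the second CONV layer of EmotionVGGNet's first block) is only an example.

```python
import math

def glorot_uniform_limit(fan_in, fan_out):
    # Glorot/Xavier uniform: weights drawn from U(-limit, limit)
    return math.sqrt(6.0 / (fan_in + fan_out))

def he_normal_stddev(fan_in):
    # He/MSRA normal: stddev = sqrt(2 / fan_in) (Keras draws from a truncated normal)
    return math.sqrt(2.0 / fan_in)

def elu(x, alpha=1.0):
    # ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise (saturates at -alpha)
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# 3x3 kernels, 32 input channels, 32 filters: fan_in = fan_out = 3 * 3 * 32 = 288
fan_in = fan_out = 3 * 3 * 32
print(round(glorot_uniform_limit(fan_in, fan_out), 4))  # 0.1021
print(round(he_normal_stddev(fan_in), 4))               # 0.0833
print(elu(2.0), round(elu(-2.0), 4))                    # 2.0 -0.8647
```

He initialization scales only by fan_in with a factor of 2, which compensates for rectifying activations zeroing out (ReLU) or squashing (ELU) roughly half of the inputs.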
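# Usage: python build_dataset.py
# Reads the fer2013.csv dataset file and writes a set of HDF5 files: one each for the training, validation, and test splits.
from config import emotion_config as config
from pyimage.io import HDF5DatasetWriter
import numpy as np
# open the input file for reading (skipping the header), then initialize the data and label lists for the training, validation, and test sets
print("[INFO] loading input data...")
f = open(config.INPUT_PATH)
# calling next() on the file object advances it by one line, which skips the CSV header row
next(f)  # works on both Python 2.7 and Python 3
# initialize the image and label lists for the training, validation, and test sets respectively
(trainImages, trainLabels) = ([], [])
(valImages, valLabels) = ([], [])
(testImages, testLabels) = ([], [])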
"""
Fer2013人脸表情数据集由35886张人脸表情图片组成,测试图(Training)28708张,公共验证图(PublicTest)和私有验证图(PrivateTest)各3589张,
每张图片是由大小固定为48×48的灰度图像组成,共有7种表情,分别对应于数字标签0-6,具体表情对应的标签和中英文如下:
0 anger 生气; 1 disgust 厌恶; 2 fear 恐惧; 3 happy 开心; 4 sad 伤心;5 surprised 惊讶; 6 normal 中性。
数据集并没有直接给出图片,而是将表情、图片数据、用途的数据保存到csv文件中:
第一行是表头,说明每列数据的含义
第一列表示表情标签
第二列即为图片数据,是原始的图片数据
第三列为用途。
"""
# loop over the rows of the input file
for row in f:
    # extract the label (column 1), the pixel string (column 2), and the usage (column 3) from the row
    (label, image, usage) = row.strip().split(",")
    label = int(label)
    # by default we treat FER13 as a 7-class problem; to merge anger and disgust, the disgust label must change from 1 to 0
    # with "disgust" folded into "anger" there are 6 class labels in total instead of 7
    if config.NUM_CLASSES == 6:
        # merge "anger" and "disgust"
        if label == 1:
            label = 0
        # if the label is greater than zero, subtract 1 so the labels stay contiguous (0-5);
        # this is not strictly required, but it makes the results easier to interpret
        if label > 0:
            label -= 1
    # each image column is a string of 2304 space-separated integers representing a square 48x48 image:
    # split the string into a list and convert it to unsigned 8-bit integers
    image = np.array(image.split(" "), dtype="uint8")
    # reshape the flattened pixel list into a 48x48 (grayscale) image
    image = image.reshape((48, 48))
    # training image: the usage column is "Training"
    if usage == "Training":
        trainImages.append(image)
        trainLabels.append(label)
    # validation image: the usage column is "PrivateTest"
    elif usage == "PrivateTest":
        valImages.append(image)
        valLabels.append(label)
    # otherwise this must be a test image: the usage column is "PublicTest"
    else:
        testImages.append(image)
        testLabels.append(label)
# build a list pairing the training, validation, and test images with their labels and output HDF5 path;
# each entry is a 3-tuple of (images, labels, output HDF5 path)
datasets = [
    (trainImages, trainLabels, config.TRAIN_HDF5),
    (valImages, valLabels, config.VAL_HDF5),
    (testImages, testLabels, config.TEST_HDF5)
]
# loop over the dataset tuples: instantiate an HDF5DatasetWriter, then write the images and labels to disk in HDF5 format
for (images, labels, outputPath) in datasets:
    # create the HDF5 writer
    print("[INFO] building {}...".format(outputPath))
    # the dataset holds len(images) grayscale images of size 48x48
    writer = HDF5DatasetWriter((len(images), 48, 48), outputPath)
    # loop over the images and add them to the dataset
    for (image, label) in zip(images, labels):
        writer.add([image], [label])
    # close the HDF5 writer
    writer.close()
# close the input file
f.close()
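# config/emotion_config.py
from os import path
# define the base path to the emotion dataset
BASE_PATH = "../datasets/fer2013/"
# use the base path to define the path to the input emotion CSV file
INPUT_PATH = path.join(BASE_PATH, "fer2013/fer2013.csv")
"""
FER13 has seven classes: angry, disgust, fear, happy, sad, surprise, and neutral.
However, "disgust" is severely under-represented: it has only 113 image samples (every other class has more than 1,000).
The recommended fix is to merge "disgust" and "anger" into one class (the two emotions look visually similar), which turns FER13 into a 6-class problem.
"""
# define the number of classes (set to 6 if "disgust" is merged into "anger")
# NUM_CLASSES = 7
NUM_CLASSES = 6
"""Since we convert fer2013.csv into a set of HDF5 datasets for training, validation, and testing, we need to define the paths to the output HDF5 files"""
# define the paths to the output training, validation, and test HDF5 files
TRAIN_HDF5 = path.join(BASE_PATH, "hdf5/train.hdf5")
VAL_HDF5 = path.join(BASE_PATH, "hdf5/val.hdf5")
TEST_HDF5 = path.join(BASE_PATH, "hdf5/test.hdf5")
# define the batch size
BATCH_SIZE = 128
# define the path where output logs are stored
# OUTPUT_PATH = path.join(BASE_PATH, "output")
OUTPUT_PATH = "./output"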
# Usage:
# python train_recognizer.py --checkpoints fer2013/checkpoints
# python train_recognizer.py --checkpoints fer2013/checkpoints --model fer2013/checkpoints/epoch_20.hdf5 --start-epoch 20
# train a CNN to recognize emotions
import matplotlib
# set the matplotlib backend so figures can be saved in the background
matplotlib.use("Agg")
from config import emotion_config as config
from pyimage.preprocessing import ImageToArrayPreprocessor
from pyimage.callbacks import EpochCheckpoint
from pyimage.callbacks import TrainingMonitor
from pyimage.io import HDF5DatasetGenerator
from pyimage.nn.conv import EmotionVGGNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import load_model
import tensorflow.keras.backend as K
import argparse
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
# path to the output checkpoint directory
ap.add_argument("-c", "--checkpoints", required=False, default="./checkpoints", help="path to output checkpoint directory")
# path to a specific model checkpoint to load; leave it unset (None) to build and train a new model from scratch
ap.add_argument("-m", "--model", type=str, required=False, default=None, help="path to *specific* model checkpoint to load")
# epoch at which to restart training
ap.add_argument("-s", "--start-epoch", type=int, default=0, required=False, help="epoch to restart training at")
args = vars(ap.parse_args())
"""
ImageDataGenerator()
keras.preprocessing.image模块中的图片生成器,同时也可以在batch中对数据进行增强,扩充数据集大小,增强模型的泛化能力。
比如进行旋转,变形,归一化等等。
rotation_range(): 旋转范围
width_shift_range(): 水平平移范围
height_shift_range(): 垂直平移范围
zoom_range(): 缩放范围
fill_mode: 填充模式, constant, nearest, reflect
horizontal_flip(): 水平反转
vertical_flip(): 垂直翻转
将在训练集中应用“数据增强”来帮助减少过度拟合和提高模型的分类精度,并且将“数据增强”应用于验证集。
valAug = ImageDataGenerator(rescale=1 / 255.0)
rescale缩放属性(也是训练数据增强器的一部分)。
因为之前的将fer2013.csv文件转换为HDF5数据集。我们把这些图像作为原始的、未归一化的RGB图像,这意味着像素值被允许存在于[0,255]范围内。
然而,通常的做法是:(1)执行平均归一化 (2)缩放像素到一个更狭窄的变化区间。
Keras提供的图像数据生成器类可以自动为我们执行此缩放。
我们只需要设定rescale=1/255.0的缩放比,这样每幅图像都将以此比率为倍数,从而将像素缩小到[0,1]。
"""
# construct the training and validation image generators for data augmentation, then initialize the image preprocessor
trainAug = ImageDataGenerator(rotation_range=10, zoom_range=0.1, horizontal_flip=True, rescale=1 / 255.0, fill_mode="nearest")
valAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()
# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(config.TRAIN_HDF5, config.BATCH_SIZE, aug=trainAug, preprocessors=[iap], classes=config.NUM_CLASSES)
valGen = HDF5DatasetGenerator(config.VAL_HDF5, config.BATCH_SIZE, aug=valAug, preprocessors=[iap], classes=config.NUM_CLASSES)
# if no specific checkpoint model file was supplied, initialize the network and compile the model
if args["model"] is None:
    print("[INFO] compiling model...")
    model = EmotionVGGNet.build(width=48, height=48, depth=1, classes=config.NUM_CLASSES)
    opt = Adam(learning_rate=1e-3)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# otherwise, load the checkpoint model file from disk
else:
    print("[INFO] loading {}...".format(args["model"]))
    model = load_model(args["model"])
    # update the learning rate (1e-5 is the value for the final stage; adjust it for each restart)
    print("[INFO] old learning rate: {}".format(K.get_value(model.optimizer.learning_rate)))
    K.set_value(model.optimizer.learning_rate, 1e-5)
    print("[INFO] new learning rate: {}".format(K.get_value(model.optimizer.learning_rate)))
# construct the paths for the training plot and the JSON training history
figPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.png"])
jsonPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.json"])
# build the list of callbacks: serialize checkpoints to disk and log accuracy/loss over time
callbacks = [
    EpochCheckpoint(args["checkpoints"], every=5, startAt=args["start_epoch"]),
    TrainingMonitor(figPath, jsonPath=jsonPath, startAt=args["start_epoch"])
]
# train the network (in TF 2.x, model.fit accepts Python generators; fit_generator is deprecated)
model.fit(
    trainGen.generator(),
    steps_per_epoch=trainGen.numImages // config.BATCH_SIZE,
    validation_data=valGen.generator(),
    validation_steps=valGen.numImages // config.BATCH_SIZE,
    # number of epochs for this training stage
    epochs=15,
    callbacks=callbacks,
    verbose=1)
# close the databases
trainGen.close()
valGen.close()
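A note on `steps_per_epoch`: with `numImages // BATCH_SIZE`, one Keras epoch is slightly shorter than one pass over the HDF5 file, because HDF5DatasetGenerator also yields a final partial batch. The sketch below only does the integer arithmetic for the split sizes quoted at the top of this document (28,708 training and 3,589 validation images; merging "disgust" into "anger" does not change these counts). `steps_for` is a hypothetical helper.

```python
import math

def steps_for(num_images, batch_size):
    """Return (floor steps, leftover images, ceil steps) for one pass over a split."""
    floor_steps = num_images // batch_size
    leftover = num_images - floor_steps * batch_size
    return floor_steps, leftover, math.ceil(num_images / batch_size)

BATCH_SIZE = 128
results = {name: steps_for(n, BATCH_SIZE) for name, n in [("train", 28708), ("val", 3589)]}
for name, r in results.items():
    print(name, r)
# train (224, 36, 225)
# val (28, 5, 29)
```

Because the generator keeps cycling, the partial batches (36 training and 5 validation images) are still seen, but epoch boundaries gradually drift relative to the file. Using the ceiling value as the step count would make each Keras epoch line up with exactly one pass.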
# Usage: python test_recognizer.py --model fer2013/checkpoints/epoch_75.hdf5
# evaluate the performance of the CNN
from config import emotion_config as config
from pyimage.preprocessing import ImageToArrayPreprocessor
from pyimage.io import HDF5DatasetGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import load_model
import argparse
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
# path to the specific model checkpoint to load
ap.add_argument("-m", "--model", type=str, required=False, default="./checkpoints/epoch_75.hdf5", help="path to model checkpoint to load")
args = vars(ap.parse_args())
"""
testAug = ImageDataGenerator(rescale=1 / 255.0)
rescale缩放属性(也是训练数据增强器的一部分)。
因为之前的将fer2013.csv文件转换为HDF5数据集。我们把这些图像作为原始的、未归一化的RGB图像,这意味着像素值被允许存在于[0,255]范围内。
然而,通常的做法是:(1)执行平均归一化 (2)缩放像素到一个更狭窄的变化区间。
Keras提供的图像数据生成器类可以自动为我们执行此缩放。
我们只需要设定rescale=1/255.0的缩放比,这样每幅图像都将以此比率为倍数,从而将像素缩小到[0,1]。
"""
# initialize the test data generator and the image preprocessor
testAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()
# initialize the test dataset generator
testGen = HDF5DatasetGenerator(config.TEST_HDF5, config.BATCH_SIZE, aug=testAug, preprocessors=[iap], classes=config.NUM_CLASSES)
# load the checkpoint model from disk
print("[INFO] loading {}...".format(args["model"]))
model = load_model(args["model"])
# evaluate the network (in TF 2.x, model.evaluate accepts Python generators; evaluate_generator is deprecated)
(loss, acc) = model.evaluate(
    testGen.generator(),
    steps=testGen.numImages // config.BATCH_SIZE
)
print("[INFO] accuracy: {:.2f}".format(acc * 100))  # accuracy: 65.49
# close the test database
testGen.close()
# Usage: python emotion_detector.py --cascade haarcascade_frontalface_default.xml --model output/epoch_75.hdf5
# 1. Detect faces in real time (as in the smile detector).
# 2. Apply our CNN to recognize the dominant emotion and display the probability distribution over all emotions.
# Most importantly, the CNN can run in real time on our device.
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--cascade", required=False, default="./haarcascade_frontalface_default.xml", help="path to where the face cascade resides")
ap.add_argument("-m", "--model", required=False, default="./checkpoints/epoch_75.hdf5", help="path to pre-trained emotion detector CNN")
ap.add_argument("-v", "--video", help="path to the (optional) video file")
args = vars(ap.parse_args())
# load the face detector cascade
detector = cv2.CascadeClassifier(args["cascade"])
# load the emotion detection CNN
model = load_model(args["model"])
# define the list of emotion labels (6 classes, in the order produced by build_dataset.py with anger and disgust merged)
EMOTIONS = ["angry", "scared", "happy", "sad", "surprised", "neutral"]
# if no video path was supplied, grab a reference to the webcam
if not args.get("video", False):
    # open the webcam stream
    camera = cv2.VideoCapture(0)
# otherwise, load the video
else:
    # open the local video file
    camera = cv2.VideoCapture(args["video"])
while True:
# 抓取当前帧
(grabbed, frame) = camera.read()
# 如果我们正在观看视频,但没有抓取框架,则说明视频已到达结尾
if args.get("video") and not grabbed:
break
# resize设置width=300时,可以无需同时设置height,因为height会自动根据width所设置的值按照原图的宽高比例进行自适应地缩放调整到合适的值
# 调整框架大小
frame = imutils.resize(frame, width=300)
# 转换为灰度
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# 初始化用于可视化的画布,然后拷贝帧,以便我们可以在其上绘制
canvas = np.zeros((220, 300, 3), dtype="uint8")
frameClone = frame.copy()
"""
def detectMultiScale(self, image, scaleFactor=None, minNeighbors=None, flags=None, minSize=None, maxSize=None):
image:待检测图片,一般为灰度图像加快检测速度;
scaleFactor:表示在前后两次相继的扫描中,搜索窗口的比例系数。默认为1.1即每次搜索窗口依次扩大10%;
minNeighbors:
表示构成检测目标的相邻矩形的最小个数(默认为3个)。
如果组成检测目标的小矩形的个数和小于 min_neighbors - 1 都会被排除。
如果min_neighbors 为 0, 则函数不做任何操作就返回所有的被检候选矩形框,
这种设定值一般用在用户自定义对检测结果的组合程序上;
flags:
要么使用默认值,要么使用CV_HAAR_DO_CANNY_PRUNING,如果设置为
CV_HAAR_DO_CANNY_PRUNING,那么函数将会使用Canny边缘检测来排除边缘过多或过少的区域,
因此这些区域通常不会是人脸所在区域;
minSize和maxSize用来限制得到的目标区域的范围。
detector.detectMultiScale 返回每张人脸的(x,y,w,h)
如果需要在脸部周围绘制边界框:img = cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),5)
(x,y)即 (startX, startY)
(x+w,y+h)即 (endX, endY)
"""
# 在输入帧中检测人脸,然后克隆该框,以便我们可以在其上绘制
rects = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)
# 确保继续之前找到至少一张脸
# if len(rects) > 0:
#遍历每个检测到的人脸
for rect in rects:
"""
rects:包含每张人脸的(x,y,w,h)
计算脸部面积:(x[2] - x[0]) * (x[3] - x[1]) 即 (w - x) * (h - y)
"""
# 假如帧画面有多个人的话,那么该方式从帧画面中只取出其中一个人脸
# 确定最大的脸部面积
# rect = sorted(rects, reverse=True, key=lambda x: (x[2] - x[0]) * (x[3] - x[1]))[0]
(fX, fY, fW, fH) = rect
# 从图像中提取面部ROI,然后为网络进行预处理
roi = gray[fY:fY + fH, fX:fX + fW]
roi = cv2.resize(roi, (48, 48))
roi = roi.astype("float") / 255.0
roi = img_to_array(roi)
roi = np.expand_dims(roi, axis=0)
# 做出预测,然后查找标签
preds = model.predict(roi)[0]
label = EMOTIONS[preds.argmax()]
# 遍历标签和概率并绘制它们
for (i, (emotion, prob)) in enumerate(zip(EMOTIONS, preds)):
# 构造标签文本
text = "{}: {:.2f}%".format(emotion, prob * 100)
# 在画布上绘制标签+概率栏
w = int(prob * 300)
cv2.rectangle(canvas, (5, (i * 35) + 5), (w, (i * 35) + 35), (0, 0, 255), -1)
cv2.putText(canvas, text, (10, (i * 35) + 23), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (255, 255, 255), 2)
# 在框架上画标签
cv2.putText(frameClone, label, (fX, fY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
cv2.rectangle(frameClone, (fX, fY), (fX + fW, fY + fH), (0, 0, 255), 2)
# 显示我们的分类+概率
cv2.imshow("Face", frameClone)
cv2.imshow("Probabilities", canvas)
# 如果按下“ q”键,则停止循环
if cv2.waitKey(1) & 0xFF == ord("q"):
break
# 清理相机并关闭所有打开的窗口
camera.release()
cv2.destroyAllWindows()
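If only one face per frame should be classified (the commented-out `sorted(...)` line in the detection loop), the box with the largest area can be picked directly. Because detectMultiScale returns boxes as (x, y, w, h), the area is `w * h`; an expression like `(x[2] - x[0]) * (x[3] - x[1])` would only be correct for (startX, startY, endX, endY) boxes. A minimal sketch with hypothetical detections; `largest_face` is not an OpenCV function.

```python
def largest_face(rects):
    """Return the (x, y, w, h) box with the largest area, or None if there is none.

    detectMultiScale boxes are (x, y, w, h), so the area is simply w * h.
    """
    if len(rects) == 0:
        return None
    return max(rects, key=lambda r: r[2] * r[3])

# hypothetical detections: (x, y, w, h)
rects = [(10, 20, 40, 40), (100, 30, 60, 55), (200, 50, 30, 35)]
print(largest_face(rects))  # (100, 30, 60, 55)
```

`len(rects) == 0` covers both an empty tuple and an empty NumPy array, the two forms detectMultiScale can return.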
pyimage
pyimage/callbacks/epochcheckpoint.py
# import the necessary packages
from tensorflow.keras.callbacks import Callback
import os
class EpochCheckpoint(Callback):
    def __init__(self, outputPath, every=5, startAt=0):
        # call the parent constructor (super() must start from this class,
        # otherwise Callback.__init__ is skipped)
        super(EpochCheckpoint, self).__init__()
        # store the base output path for the model, the number of
        # epochs that must pass before the model is serialized to
        # disk and the current epoch value
        self.outputPath = outputPath
        self.every = every
        self.intEpoch = startAt
    def on_epoch_end(self, epoch, logs=None):
        # check to see if the model should be serialized to disk
        if (self.intEpoch + 1) % self.every == 0:
            p = os.path.sep.join([self.outputPath,
                "epoch_{}.hdf5".format(self.intEpoch + 1)])
            self.model.save(p, overwrite=True)
        # increment the internal epoch counter
        self.intEpoch += 1
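Because EpochCheckpoint's counter starts at `--start-epoch`, resuming from `epoch_35.hdf5` with `--start-epoch 35` keeps the file numbering continuous. The sketch below re-implements only that counter logic (no Keras needed) for the 35/25/15-epoch stages described in the experiments. The stopping points 35 and 60 are multiples of `every=5`, so a checkpoint exists exactly where training was resumed. The stage lengths come from the text; `checkpoint_epochs` is a hypothetical helper.

```python
def checkpoint_epochs(start_at, num_epochs, every=5):
    """List the epoch numbers EpochCheckpoint would save during one training run.

    Mirrors its counter: the counter starts at start_at, and epoch_{n}.hdf5 is
    written whenever n = counter + 1 is a multiple of every.
    """
    saved = []
    counter = start_at
    for _ in range(num_epochs):
        if (counter + 1) % every == 0:
            saved.append(counter + 1)
        counter += 1
    return saved

# the three stages: 35 epochs, then resume at 35 for 25, then resume at 60 for 15
for start, n in [(0, 35), (35, 25), (60, 15)]:
    print(start, checkpoint_epochs(start, n))
# 0 [5, 10, 15, 20, 25, 30, 35]
# 35 [40, 45, 50, 55, 60]
# 60 [65, 70, 75]
```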
pyimage/callbacks/trainingmonitor.py
# import the necessary packages
from tensorflow.keras.callbacks import BaseLogger
import matplotlib.pyplot as plt
import numpy as np
import json
import os
class TrainingMonitor(BaseLogger):
    def __init__(self, figPath, jsonPath=None, startAt=0):
        # store the output path for the figure, the path to the JSON
        # serialized file, and the starting epoch
        super(TrainingMonitor, self).__init__()
        self.figPath = figPath
        self.jsonPath = jsonPath
        self.startAt = startAt
    def on_train_begin(self, logs=None):
        # initialize the history dictionary
        self.H = {}
        # if the JSON history path exists, load the training history
        if self.jsonPath is not None:
            if os.path.exists(self.jsonPath):
                with open(self.jsonPath) as f:
                    self.H = json.load(f)
                # check to see if a starting epoch was supplied
                if self.startAt > 0:
                    # loop over the entries in the history log and
                    # trim any entries that are past the starting
                    # epoch
                    for k in self.H.keys():
                        self.H[k] = self.H[k][:self.startAt]
    def on_epoch_end(self, epoch, logs=None):
        # loop over the logs and update the loss, accuracy, etc.
        # for the entire training process
        for (k, v) in (logs or {}).items():
            l = self.H.get(k, [])
            l.append(float(v))
            self.H[k] = l
        # check to see if the training history should be serialized
        # to file
        if self.jsonPath is not None:
            with open(self.jsonPath, "w") as f:
                f.write(json.dumps(self.H))
        # ensure at least two epochs have passed before plotting
        # (epoch starts at zero)
        if len(self.H["loss"]) > 1:
            # plot the training loss and accuracy
            N = np.arange(0, len(self.H["loss"]))
            plt.style.use("ggplot")
            plt.figure()
            plt.plot(N, self.H["loss"], label="train_loss")
            plt.plot(N, self.H["val_loss"], label="val_loss")
            plt.plot(N, self.H["accuracy"], label="train_acc")
            plt.plot(N, self.H["val_accuracy"], label="val_acc")
            plt.title("Training Loss and Accuracy [Epoch {}]".format(
                len(self.H["loss"])))
            plt.xlabel("Epoch #")
            plt.ylabel("Loss/Accuracy")
            plt.legend()
            # save the figure
            plt.savefig(self.figPath)
            plt.close()
pyimage/datasets/simpledatasetloader.py
# import the necessary packages
import numpy as np
import cv2
import os
class SimpleDatasetLoader:
    def __init__(self, preprocessors=None):
        # store the image preprocessor
        self.preprocessors = preprocessors
        # if the preprocessors are None, initialize them as an
        # empty list
        if self.preprocessors is None:
            self.preprocessors = []
    def load(self, imagePaths, verbose=-1):
        # initialize the list of features and labels
        data = []
        labels = []
        # loop over the input images
        for (i, imagePath) in enumerate(imagePaths):
            # load the image and extract the class label assuming
            # that our path has the following format:
            # /path/to/dataset/{class}/{image}.jpg
            image = cv2.imread(imagePath)
            label = imagePath.split(os.path.sep)[-2]
            # check to see if our preprocessors are not None
            if self.preprocessors is not None:
                # loop over the preprocessors and apply each to
                # the image
                for p in self.preprocessors:
                    image = p.preprocess(image)
            # treat our processed image as a "feature vector"
            # by updating the data list followed by the labels
            data.append(image)
            labels.append(label)
            # show an update every `verbose` images
            if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
                print("[INFO] processed {}/{}".format(i + 1,
                    len(imagePaths)))
        # return a tuple of the data and labels
        return (np.array(data), np.array(labels))
pyimage/io/hdf5datasetgenerator.py
# import the necessary packages
from tensorflow.keras.utils import to_categorical
import numpy as np
import h5py
class HDF5DatasetGenerator:
    def __init__(self, dbPath, batchSize, preprocessors=None,
                 aug=None, binarize=True, classes=2):
        # store the batch size, preprocessors, and data augmentor,
        # whether or not the labels should be binarized, along with
        # the total number of classes
        self.batchSize = batchSize
        self.preprocessors = preprocessors
        self.aug = aug
        self.binarize = binarize
        self.classes = classes
        # open the HDF5 database for reading and determine the total
        # number of entries in the database
        self.db = h5py.File(dbPath, "r")
        self.numImages = self.db["labels"].shape[0]
    def generator(self, passes=np.inf):
        # initialize the epoch count
        epochs = 0
        # keep looping infinitely -- the model will stop once we have
        # reached the desired number of epochs
        while epochs < passes:
            # loop over the HDF5 dataset
            for i in np.arange(0, self.numImages, self.batchSize):
                # extract the images and labels from the HDF dataset
                images = self.db["images"][i: i + self.batchSize]
                labels = self.db["labels"][i: i + self.batchSize]
                # check to see if the labels should be binarized
                if self.binarize:
                    labels = to_categorical(labels,
                        self.classes)
                # check to see if our preprocessors are not None
                if self.preprocessors is not None:
                    # initialize the list of processed images
                    procImages = []
                    # loop over the images
                    for image in images:
                        # loop over the preprocessors and apply each
                        # to the image
                        for p in self.preprocessors:
                            image = p.preprocess(image)
                        # update the list of processed images
                        procImages.append(image)
                    # update the images array to be the processed
                    # images
                    images = np.array(procImages)
                # if the data augmentor exists, apply it
                if self.aug is not None:
                    (images, labels) = next(self.aug.flow(images,
                        labels, batch_size=self.batchSize))
                # yield a tuple of images and labels
                yield (images, labels)
            # increment the total number of epochs
            epochs += 1
    def close(self):
        # close the database
        self.db.close()
pyimage/io/hdf5datasetwriter.py
# import the necessary packages
import h5py
import os
class HDF5DatasetWriter:
    def __init__(self, dims, outputPath, dataKey="images",
                 bufSize=1000):
        # check to see if the output path exists, and if so, raise
        # an exception
        if os.path.exists(outputPath):
            raise ValueError("The supplied `outputPath` already "
                "exists and cannot be overwritten. Manually delete "
                "the file before continuing.", outputPath)
        # open the HDF5 database for writing and create two datasets:
        # one to store the images/features and another to store the
        # class labels
        self.db = h5py.File(outputPath, "w")
        self.data = self.db.create_dataset(dataKey, dims,
            dtype="float")
        self.labels = self.db.create_dataset("labels", (dims[0],),
            dtype="int")
        # store the buffer size, then initialize the buffer itself
        # along with the index into the datasets
        self.bufSize = bufSize
        self.buffer = {"data": [], "labels": []}
        self.idx = 0
    def add(self, rows, labels):
        # add the rows and labels to the buffer
        self.buffer["data"].extend(rows)
        self.buffer["labels"].extend(labels)
        # check to see if the buffer needs to be flushed to disk
        if len(self.buffer["data"]) >= self.bufSize:
            self.flush()
    def flush(self):
        # write the buffers to disk then reset the buffer
        i = self.idx + len(self.buffer["data"])
        self.data[self.idx:i] = self.buffer["data"]
        self.labels[self.idx:i] = self.buffer["labels"]
        self.idx = i
        self.buffer = {"data": [], "labels": []}
    def storeClassLabels(self, classLabels):
        # create a dataset to store the actual class label names,
        # then store the class labels
        dt = h5py.special_dtype(vlen=str)  # `vlen=unicode` for Py2.7
        labelSet = self.db.create_dataset("label_names",
            (len(classLabels),), dtype=dt)
        labelSet[:] = classLabels
    def close(self):
        # check to see if there are any other entries in the buffer
        # that need to be flushed to disk
        if len(self.buffer["data"]) > 0:
            self.flush()
        # close the dataset
        self.db.close()
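HDF5DatasetWriter buffers rows in memory and writes them to the HDF5 datasets in slices of `bufSize`. The pure-Python sketch below mirrors its add/flush/close bookkeeping without h5py, to show which index ranges get written for a split of 3,589 images (the validation split size quoted earlier). `BufferSimulator` is a hypothetical name; it only records the slices instead of writing them.

```python
class BufferSimulator:
    """Stand-in for HDF5DatasetWriter's buffering logic.

    Instead of writing to an HDF5 file, it records the [start, end) index
    range of every flush so the bookkeeping can be inspected.
    """

    def __init__(self, buf_size=1000):
        self.buf_size = buf_size
        self.buffer = []
        self.idx = 0
        self.flushes = []

    def add(self, rows):
        self.buffer.extend(rows)
        # flush once the buffer reaches buf_size, as HDF5DatasetWriter.add does
        if len(self.buffer) >= self.buf_size:
            self.flush()

    def flush(self):
        end = self.idx + len(self.buffer)
        self.flushes.append((self.idx, end))
        self.idx = end
        self.buffer = []

    def close(self):
        # write whatever is left over, as HDF5DatasetWriter.close does
        if self.buffer:
            self.flush()

# build_dataset.py adds one image at a time; simulate the 3,589-image validation split
sim = BufferSimulator(buf_size=1000)
for i in range(3589):
    sim.add([i])
sim.close()
print(sim.flushes)  # [(0, 1000), (1000, 2000), (2000, 3000), (3000, 3589)]
```

Without the final `close()`, the last 589 rows would never reach the file, which is why build_dataset.py calls `writer.close()` for every split. Note also that the writer creates the image dataset with `dtype="float"`, so the uint8 pixels are stored on disk as 64-bit floats.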
pyimage/nn/conv/alexnet.py
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
class AlexNet:
    @staticmethod
    def build(width, height, depth, classes, reg=0.0002):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # Block #1: first CONV => RELU => POOL layer set
        model.add(Conv2D(96, (11, 11), strides=(4, 4),
            input_shape=inputShape, padding="same",
            kernel_regularizer=l2(reg)))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
        model.add(Dropout(0.25))
        # Block #2: second CONV => RELU => POOL layer set
        model.add(Conv2D(256, (5, 5), padding="same",
            kernel_regularizer=l2(reg)))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
        model.add(Dropout(0.25))
        # Block #3: CONV => RELU => CONV => RELU => CONV => RELU
        model.add(Conv2D(384, (3, 3), padding="same",
            kernel_regularizer=l2(reg)))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(384, (3, 3), padding="same",
            kernel_regularizer=l2(reg)))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(256, (3, 3), padding="same",
            kernel_regularizer=l2(reg)))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
        model.add(Dropout(0.25))
        # Block #4: first set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(4096, kernel_regularizer=l2(reg)))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # Block #5: second set of FC => RELU layers
        model.add(Dense(4096, kernel_regularizer=l2(reg)))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # softmax classifier
        model.add(Dense(classes, kernel_regularizer=l2(reg)))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
pyimage/nn/conv/deepergooglenet.py
# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import concatenate
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
class DeeperGoogLeNet:
    @staticmethod
    def conv_module(x, K, kX, kY, stride, chanDim,
                    padding="same", reg=0.0005, name=None):
        # initialize the CONV, BN, and RELU layer names
        (convName, bnName, actName) = (None, None, None)
        # if a layer name was supplied, prepend it
        if name is not None:
            convName = name + "_conv"
            bnName = name + "_bn"
            actName = name + "_act"
        # define a CONV => BN => RELU pattern
        x = Conv2D(K, (kX, kY), strides=stride, padding=padding,
            kernel_regularizer=l2(reg), name=convName)(x)
        x = BatchNormalization(axis=chanDim, name=bnName)(x)
        x = Activation("relu", name=actName)(x)
        # return the block
        return x
    @staticmethod
    def inception_module(x, num1x1, num3x3Reduce, num3x3,
                         num5x5Reduce, num5x5, num1x1Proj, chanDim, stage,
                         reg=0.0005):
        # define the first branch of the Inception module which
        # consists of 1x1 convolutions
        first = DeeperGoogLeNet.conv_module(x, num1x1, 1, 1,
            (1, 1), chanDim, reg=reg, name=stage + "_first")
        # define the second branch of the Inception module which
        # consists of 1x1 and 3x3 convolutions
        second = DeeperGoogLeNet.conv_module(x, num3x3Reduce, 1, 1,
            (1, 1), chanDim, reg=reg, name=stage + "_second1")
        second = DeeperGoogLeNet.conv_module(second, num3x3, 3, 3,
            (1, 1), chanDim, reg=reg, name=stage + "_second2")
        # define the third branch of the Inception module which
        # are our 1x1 and 5x5 convolutions
        third = DeeperGoogLeNet.conv_module(x, num5x5Reduce, 1, 1,
            (1, 1), chanDim, reg=reg, name=stage + "_third1")
        third = DeeperGoogLeNet.conv_module(third, num5x5, 5, 5,
            (1, 1), chanDim, reg=reg, name=stage + "_third2")
        # define the fourth branch of the Inception module which
        # is the POOL projection
        fourth = MaxPooling2D((3, 3), strides=(1, 1),
            padding="same", name=stage + "_pool")(x)
        fourth = DeeperGoogLeNet.conv_module(fourth, num1x1Proj,
            1, 1, (1, 1), chanDim, reg=reg, name=stage + "_fourth")
        # concatenate across the channel dimension
        x = concatenate([first, second, third, fourth], axis=chanDim,
            name=stage + "_mixed")
        # return the block
        return x
    @staticmethod
    def build(width, height, depth, classes, reg=0.0005):
        # initialize the input shape to be "channels last" and the
        # channels dimension itself
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # define the model input, followed by a sequence of CONV =>
        # POOL => (CONV * 2) => POOL layers
        inputs = Input(shape=inputShape)
        x = DeeperGoogLeNet.conv_module(inputs, 64, 5, 5, (1, 1),
            chanDim, reg=reg, name="block1")
        x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
            name="pool1")(x)
        x = DeeperGoogLeNet.conv_module(x, 64, 1, 1, (1, 1),
            chanDim, reg=reg, name="block2")
        x = DeeperGoogLeNet.conv_module(x, 192, 3, 3, (1, 1),
            chanDim, reg=reg, name="block3")
        x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
            name="pool2")(x)
        # apply two Inception modules followed by a POOL
        x = DeeperGoogLeNet.inception_module(x, 64, 96, 128, 16,
            32, 32, chanDim, "3a", reg=reg)
        x = DeeperGoogLeNet.inception_module(x, 128, 128, 192, 32,
            96, 64, chanDim, "3b", reg=reg)
        x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
            name="pool3")(x)
        # apply five Inception modules followed by POOL
        x = DeeperGoogLeNet.inception_module(x, 192, 96, 208, 16,
            48, 64, chanDim, "4a", reg=reg)
        x = DeeperGoogLeNet.inception_module(x, 160, 112, 224, 24,
            64, 64, chanDim, "4b", reg=reg)
        x = DeeperGoogLeNet.inception_module(x, 128, 128, 256, 24,
            64, 64, chanDim, "4c", reg=reg)
        x = DeeperGoogLeNet.inception_module(x, 112, 144, 288, 32,
            64, 64, chanDim, "4d", reg=reg)
        x = DeeperGoogLeNet.inception_module(x, 256, 160, 320, 32,
            128, 128, chanDim, "4e", reg=reg)
        x = MaxPooling2D((3, 3), strides=(2, 2), padding="same",
            name="pool4")(x)
        # apply a POOL layer (average) followed by dropout
        x = AveragePooling2D((4, 4), name="pool5")(x)
        x = Dropout(0.4, name="do")(x)
        # softmax classifier
        x = Flatten(name="flatten")(x)
        x = Dense(classes, kernel_regularizer=l2(reg),
            name="labels")(x)
        x = Activation("softmax", name="softmax")(x)
        # create the model
        model = Model(inputs, x, name="googlenet")
        # return the constructed network architecture
        return model
pyimage/nn/conv/emotionvggnet.py
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import ELU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
"""
我们将要实现的用于识别各种情绪和面部表情的网络是受VGG网络家族的启发:
1.网络中的CONV层将仅为3×3。
2.随着网络的加深,我们会将每个CONV层学习的过滤器数量增加一倍。
为了帮助网络训练,我们将应用从VGG和ImageNet实验获得的一些先验知识:
1.我们使用MSRA (He等人)的方法初始化CONV和FC层,这样做将使我们的网络学习更快。
Dense(64, kernel_initializer="he_normal")
Conv2D(32, (3, 3), padding="same", kernel_initializer="he_normal")
2.由于已证明ELU和PReLU可以提高所有分类的分类准确性,在我们的实验中,我们仅以ELU而非ReLU开始。
ELU()
3.表中包含了名为EmotionVGGNet的网络摘要。每次CONV层之后,我们将应用激活,然后进行批量归一化(将这些层排除在表外以节省空间)。
"""
class EmotionVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        """
        Block #1 of EmotionVGGNet: the first CONV layer learns 32 3x3
        filters, followed by an ELU activation and batch normalization.
        The second CONV layer repeats the pattern (32 3x3 filters, ELU,
        batch normalization). Max pooling is then applied, followed by a
        Dropout layer with a 25% drop probability.
        """
        # Block #1: first CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(32, (3, 3), padding="same",
            kernel_initializer="he_normal", input_shape=inputShape))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        """
        Block #2 of EmotionVGGNet is identical to Block #1, except that the
        CONV layers now learn 64 filters instead of 32.
        """
        # Block #2: second CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        """
        Block #3 of EmotionVGGNet applies the same pattern again, increasing
        the number of filters from 64 to 128: as the CNN gets deeper, it has
        more features to learn and therefore needs more filters.
        """
        # Block #3: third CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        """
        Next, we construct the first fully connected layer, which learns 64
        hidden nodes, followed by an ELU activation and batch normalization.
        The second FC layer is applied in the same way.
        """
        # Block #4: first set of FC => ELU layers
        model.add(Flatten())
        model.add(Dense(64, kernel_initializer="he_normal"))
        model.add(ELU())
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # Block #5: second set of FC => ELU layers
        model.add(Dense(64, kernel_initializer="he_normal"))
        model.add(ELU())
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        """
        Finally, a FC layer with one node per class, followed by a softmax
        classifier, produces the output class-label probabilities.
        """
        # Block #6: softmax classifier
        model.add(Dense(classes, kernel_initializer="he_normal"))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
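A quick way to sanity-check the architecture above is to trace the feature-map size and the parameter count by hand. The sketch below does this in plain Python under stated assumptions: a 48x48x1 FER2013 input, 6 classes (after merging "disgust" into "anger"), "channels last" ordering, and Keras defaults (2x2 max pooling with stride 2, a bias on every Conv2D/Dense layer, and only gamma/beta counted as trainable in each BatchNormalization layer). The helper names are hypothetical and not part of the pyimage package.

```python
# Hypothetical helpers for tracing EmotionVGGNet by hand (not part of pyimage).

def conv_params(k, c_in, c_out):
    # k x k x c_in weights per filter, plus one bias per filter
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    # one weight per input/output pair, plus one bias per output
    return n_in * n_out + n_out


def bn_params(channels):
    # gamma and beta are trainable; the moving mean/variance are not
    return 2 * channels


def emotion_vggnet_summary(size=48, depth=1, classes=6):
    total = 0
    channels = depth
    for filters in (32, 64, 128):      # Blocks #1-#3
        for _ in range(2):             # two CONV => ELU => BN per block
            total += conv_params(3, channels, filters) + bn_params(filters)
            channels = filters         # "same" padding keeps the size
        size //= 2                     # 2x2 max pooling halves the size
    flat = size * size * channels      # Flatten
    total += dense_params(flat, 64) + bn_params(64)   # Block #4
    total += dense_params(64, 64) + bn_params(64)     # Block #5
    total += dense_params(64, classes)                # softmax classifier
    return size, flat, total


size, flat, total = emotion_vggnet_summary()
print("final map: {0}x{0}, flattened: {1}, trainable params: {2}".format(
    size, flat, total))
```

Three pooling stages shrink 48x48 to 6x6, so Flatten hands 6 x 6 x 128 = 4,608 features to the first Dense(64) layer, which alone holds about half of the trainable weights. `model.summary()` on the real Keras model remains the authoritative check, since it also lists the non-trainable BatchNormalization statistics.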
pyimage/nn/conv/fcheadnet.py
# import the necessary packages
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
class FCHeadNet:
    @staticmethod
    def build(baseModel, classes, D):
        # initialize the head model that will be placed on top of
        # the base, then add a FC layer
        headModel = baseModel.output
        headModel = Flatten(name="flatten")(headModel)
        headModel = Dense(D, activation="relu")(headModel)
        headModel = Dropout(0.5)(headModel)
        # add a softmax layer
        headModel = Dense(classes, activation="softmax")(headModel)
        # return the model
        return headModel
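FCHeadNet places Dropout(0.5) between the new hidden FC layer and the softmax classifier. Keras documents its Dropout layer as zeroing inputs at the given rate during training and scaling the remaining inputs by 1 / (1 - rate), with no effect at inference. The sketch below reproduces that behaviour ("inverted dropout") for a plain list of activations; it is an illustration of the idea, not Keras's implementation.

```python
import random


def inverted_dropout(values, rate, training=True, rng=random):
    # training: zero each unit with probability `rate` and scale the
    # survivors by 1 / (1 - rate) so the expected activation is unchanged
    # inference: pass the input through untouched
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(42)
print(inverted_dropout([1.0] * 10, rate=0.5, rng=rng))  # entries are 0.0 or 2.0
print(inverted_dropout([1.0] * 10, rate=0.5, training=False))
```

Scaling at training time (rather than at test time) is what lets the same weights be used unchanged at inference.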
pyimage/nn/conv/lenet.py
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
class LeNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model
        model = Sequential()
        inputShape = (height, width, depth)
        # if we are using "channels first", update the input shape
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
        # first set of CONV => RELU => POOL layers
        model.add(Conv2D(20, (5, 5), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # second set of CONV => RELU => POOL layers
        model.add(Conv2D(50, (5, 5), padding="same"))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(500))
        model.add(Activation("relu"))
        # softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
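LeNet's two building blocks are a convolution followed by 2x2 max pooling with stride 2. The sketch below spells out what those operations compute on a single channel in plain Python. For brevity it uses no padding ("valid"), whereas LeNet uses padding="same", which would first zero-pad the border so the output keeps the input size; a real Conv2D filter also sums over all input channels and adds a bias. The toy image and kernel are made up for illustration.

```python
def conv2d_valid(image, kernel):
    # single-channel cross-correlation with no padding: what Conv2D computes
    # for one input channel of one filter (the layer then sums over input
    # channels and adds a bias)
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def max_pool_2x2(fmap):
    # 2x2 window with stride 2: keep the largest value of each window
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


image = [[r * 5 + c for c in range(5)] for r in range(5)]  # toy 5x5 input
kernel = [[1, 0], [0, -1]]                                  # toy 2x2 filter
fmap = conv2d_valid(image, kernel)   # 4x4 feature map
print(fmap)
print(max_pool_2x2(fmap))            # 2x2 after pooling
```

The toy kernel subtracts each pixel's lower-right neighbour, so on this evenly increasing image every response is the same constant, and pooling simply halves each spatial dimension.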
pyimage/nn/conv/minigooglenet.py
# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import concatenate
from tensorflow.keras import backend as K
class MiniGoogLeNet:
    @staticmethod
    def conv_module(x, K, kX, kY, stride, chanDim, padding="same"):
        # define a CONV => BN => RELU pattern
        x = Conv2D(K, (kX, kY), strides=stride, padding=padding)(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = Activation("relu")(x)
        # return the block
        return x
    @staticmethod
    def inception_module(x, numK1x1, numK3x3, chanDim):
        # define two CONV modules, then concatenate across the
        # channel dimension
        conv_1x1 = MiniGoogLeNet.conv_module(x, numK1x1, 1, 1,
            (1, 1), chanDim)
        conv_3x3 = MiniGoogLeNet.conv_module(x, numK3x3, 3, 3,
            (1, 1), chanDim)
        x = concatenate([conv_1x1, conv_3x3], axis=chanDim)
        # return the block
        return x
    @staticmethod
    def downsample_module(x, K, chanDim):
        # define the CONV module and POOL, then concatenate
        # across the channel dimensions
        conv_3x3 = MiniGoogLeNet.conv_module(x, K, 3, 3, (2, 2),
            chanDim, padding="valid")
        pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
        x = concatenate([conv_3x3, pool], axis=chanDim)
        # return the block
        return x
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the input shape to be "channels last" and the
        # channels dimension itself
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # define the model input and first CONV module
        inputs = Input(shape=inputShape)
        x = MiniGoogLeNet.conv_module(inputs, 96, 3, 3, (1, 1),
            chanDim)
        # two Inception modules followed by a downsample module
        x = MiniGoogLeNet.inception_module(x, 32, 32, chanDim)
        x = MiniGoogLeNet.inception_module(x, 32, 48, chanDim)
        x = MiniGoogLeNet.downsample_module(x, 80, chanDim)
        # four Inception modules followed by a downsample module
        x = MiniGoogLeNet.inception_module(x, 112, 48, chanDim)
        x = MiniGoogLeNet.inception_module(x, 96, 64, chanDim)
        x = MiniGoogLeNet.inception_module(x, 80, 80, chanDim)
        x = MiniGoogLeNet.inception_module(x, 48, 96, chanDim)
        x = MiniGoogLeNet.downsample_module(x, 96, chanDim)
        # two Inception modules followed by global POOL and dropout
        x = MiniGoogLeNet.inception_module(x, 176, 160, chanDim)
        x = MiniGoogLeNet.inception_module(x, 176, 160, chanDim)
        x = AveragePooling2D((7, 7))(x)
        x = Dropout(0.5)(x)
        # softmax classifier
        x = Flatten()(x)
        x = Dense(classes)(x)
        x = Activation("softmax")(x)
        # create the model
        model = Model(inputs, x, name="googlenet")
        # return the constructed network architecture
        return model
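MiniGoogLeNet ends with AveragePooling2D((7, 7)), which only reduces the map to 1x1 if the last Inception modules produce 7x7 features. The sketch below traces (spatial size, channels) through the network, assuming a 32x32 input (a CIFAR-10-sized image; the source does not state the input size) and TensorFlow's documented output-size rules for "same" and "valid" padding. The helper names are hypothetical.

```python
import math


def out_size(size, kernel, stride, padding):
    # TensorFlow's output-size rules along one spatial axis
    if padding == "same":
        return math.ceil(size / stride)
    return math.ceil((size - kernel + 1) / stride)  # "valid"


def inception(state, numK1x1, numK3x3):
    size, _ = state
    # both branches use stride 1 and "same" padding: only channels change
    return size, numK1x1 + numK3x3


def downsample(state, K):
    size, channels = state
    # 3x3/stride-2 "valid" CONV with K filters next to a 3x3/stride-2
    # "valid" max pool that keeps the input channels
    conv = out_size(size, 3, 2, "valid")
    pool = out_size(size, 3, 2, "valid")
    assert conv == pool, "branches must match spatially to be concatenated"
    return conv, K + channels


def trace_minigooglenet(size=32):
    state = (out_size(size, 3, 1, "same"), 96)  # first CONV module
    state = inception(state, 32, 32)
    state = inception(state, 32, 48)
    state = downsample(state, 80)
    for k1, k3 in ((112, 48), (96, 64), (80, 80), (48, 96)):
        state = inception(state, k1, k3)
    state = downsample(state, 96)
    state = inception(state, 176, 160)
    state = inception(state, 176, 160)
    return state  # (spatial size, channels) entering AveragePooling2D


print(trace_minigooglenet())
```

The two downsample modules take 32x32 to 15x15 and then 7x7, which is why the global average pool uses a 7x7 window; a different input size would need a different pool size.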
pyimage/nn/conv/minivggnet.py
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
class MiniVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # first CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(32, (3, 3), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # second CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
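MiniVGGNet, like EmotionVGGNet, applies BatchNormalization(axis=chanDim) after each activation; chanDim tells Keras which axis holds the channels, so statistics are computed per channel across the mini-batch (and, for CONV layers, across spatial positions). The sketch below is a minimal training-time forward pass for one channel. eps=1e-3 mirrors the Keras default; gamma and beta are learned parameters, and at inference Keras uses moving averages of the mean and variance instead of batch statistics, which this sketch does not model.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    # training-time batch normalization for ONE channel: `values` holds all
    # of that channel's activations in the mini-batch
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])  # zero mean, (almost) unit variance
```

The small eps keeps the division safe when a channel's activations are nearly constant, at the cost of a variance slightly below 1.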
pyimage/nn/conv/resnet.py
# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import ZeroPadding2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import add
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
class ResNet:
    @staticmethod
    def residual_module(data, K, stride, chanDim, red=False,
        reg=0.0001, bnEps=2e-5, bnMom=0.9):
        # the shortcut branch of the ResNet module should be
        # initialized as the input (identity) data
        shortcut = data
        # the first block of the ResNet module are the 1x1 CONVs
        bn1 = BatchNormalization(axis=chanDim, epsilon=bnEps,
            momentum=bnMom)(data)
        act1 = Activation("relu")(bn1)
        conv1 = Conv2D(int(K * 0.25), (1, 1), use_bias=False,
            kernel_regularizer=l2(reg))(act1)
        # the second block of the ResNet module are the 3x3 CONVs
        bn2 = BatchNormalization(axis=chanDim, epsilon=bnEps,
            momentum=bnMom)(conv1)
        act2 = Activation("relu")(bn2)
        conv2 = Conv2D(int(K * 0.25), (3, 3), strides=stride,
            padding="same", use_bias=False,
            kernel_regularizer=l2(reg))(act2)
        # the third block of the ResNet module is another set of 1x1
        # CONVs
        bn3 = BatchNormalization(axis=chanDim, epsilon=bnEps,
            momentum=bnMom)(conv2)
        act3 = Activation("relu")(bn3)
        conv3 = Conv2D(K, (1, 1), use_bias=False,
            kernel_regularizer=l2(reg))(act3)
        # if we are to reduce the spatial size, apply a CONV layer to
        # the shortcut
        if red:
            shortcut = Conv2D(K, (1, 1), strides=stride,
                use_bias=False, kernel_regularizer=l2(reg))(act1)
        # add together the shortcut and the final CONV
        x = add([conv3, shortcut])
        # return the addition as the output of the ResNet module
        return x
    @staticmethod
    def build(width, height, depth, classes, stages, filters,
        reg=0.0001, bnEps=2e-5, bnMom=0.9, dataset="cifar"):
        # initialize the input shape to be "channels last" and the
        # channels dimension itself
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # set the input and apply BN
        inputs = Input(shape=inputShape)
        x = BatchNormalization(axis=chanDim, epsilon=bnEps,
            momentum=bnMom)(inputs)
        # check if we are utilizing the CIFAR dataset
        if dataset == "cifar":
            # apply a single CONV layer
            x = Conv2D(filters[0], (3, 3), use_bias=False,
                padding="same", kernel_regularizer=l2(reg))(x)
        # check to see if we are using the Tiny ImageNet dataset
        elif dataset == "tiny_imagenet":
            # apply CONV => BN => ACT => POOL to reduce spatial size
            x = Conv2D(filters[0], (5, 5), use_bias=False,
                padding="same", kernel_regularizer=l2(reg))(x)
            x = BatchNormalization(axis=chanDim, epsilon=bnEps,
                momentum=bnMom)(x)
            x = Activation("relu")(x)
            x = ZeroPadding2D((1, 1))(x)
            x = MaxPooling2D((3, 3), strides=(2, 2))(x)
        # loop over the number of stages
        for i in range(0, len(stages)):
            # initialize the stride, then apply a residual module
            # used to reduce the spatial size of the input volume
            # (pass `reg` through so every module uses the same L2 strength)
            stride = (1, 1) if i == 0 else (2, 2)
            x = ResNet.residual_module(x, filters[i + 1], stride,
                chanDim, red=True, reg=reg, bnEps=bnEps, bnMom=bnMom)
            # loop over the number of layers in the stage
            for j in range(0, stages[i] - 1):
                # apply a ResNet module
                x = ResNet.residual_module(x, filters[i + 1],
                    (1, 1), chanDim, reg=reg, bnEps=bnEps, bnMom=bnMom)
        # apply BN => ACT => POOL
        x = BatchNormalization(axis=chanDim, epsilon=bnEps,
            momentum=bnMom)(x)
        x = Activation("relu")(x)
        x = AveragePooling2D((8, 8))(x)
        # softmax classifier
        x = Flatten()(x)
        x = Dense(classes, kernel_regularizer=l2(reg))(x)
        x = Activation("softmax")(x)
        # create the model
        model = Model(inputs, x, name="resnet")
        # return the constructed network architecture
        return model
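ResNet.build ends with AveragePooling2D((8, 8)), which only collapses the map to 1x1 if the last stage produces 8x8 features. The sketch below traces the spatial size and the bottleneck width (int(K * 0.25), as in residual_module) stage by stage. The input sizes (32x32 for "cifar", 64x64 for "tiny_imagenet") and the stages/filters tuples are hypothetical values chosen only to illustrate the arithmetic.

```python
import math


def same_size(size, stride):
    # "same" padding: output = ceil(input / stride)
    return math.ceil(size / stride)


def resnet_plan(size, stages, filters, dataset="cifar"):
    if dataset == "tiny_imagenet":
        size = same_size(size, 1)       # 5x5 CONV, "same", stride 1
        size += 2                       # ZeroPadding2D((1, 1))
        size = (size - 3) // 2 + 1      # 3x3 max pool, stride 2, "valid"
    plan = []
    for i, num_modules in enumerate(stages):
        stride = 1 if i == 0 else 2     # later stages halve the resolution
        size = same_size(size, stride)
        K = filters[i + 1]
        # (spatial size, output filters, bottleneck width, modules)
        plan.append((size, K, int(K * 0.25), num_modules))
    return plan


# hypothetical configurations, chosen only to illustrate the arithmetic
print(resnet_plan(32, (9, 9, 9), (64, 64, 128, 256)))
print(resnet_plan(64, (3, 4, 6), (64, 128, 256, 512), dataset="tiny_imagenet"))
```

With three stages, both configurations end at 8x8: the CIFAR-style stem keeps 32x32, while the Tiny ImageNet stem halves 64x64 to 32x32 before the stages start.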
pyimage/nn/conv/shallownet.py
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
class ShallowNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last"
        model = Sequential()
        inputShape = (height, width, depth)
        # if we are using "channels first", update the input shape
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
        # define the first (and only) CONV => RELU layer
        model.add(Conv2D(32, (3, 3), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        # softmax classifier
        model.add(Flatten())
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
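ShallowNet, like every Keras model in this listing, ends in Dense(classes) followed by Activation("softmax"). Such models are usually trained with categorical cross-entropy, although the compile step is not shown here, so the loss is an assumption. The sketch below shows a numerically stable softmax and the per-sample cross-entropy in plain Python; it illustrates the math, not how Keras implements it internally.

```python
import math


def softmax(logits):
    # subtracting the largest logit avoids overflow in exp(); the shift
    # cancels out when dividing by the sum
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(probs, label):
    # loss for one sample whose true class index is `label`
    return -math.log(probs[label])


p = softmax([1000.0, 1001.0, 1002.0])  # naive math.exp(1000.0) would overflow
print([round(x, 4) for x in p])
print(categorical_crossentropy(p, 2))
```

Because only differences between logits matter, the shifted computation returns the same probabilities as the textbook formula while staying finite.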
pyimage/nn/mxconv/mxalexnet.py
# import the necessary packages
import mxnet as mx
class MxAlexNet:
    @staticmethod
    def build(classes):
        # data input
        data = mx.sym.Variable("data")
        # Block #1: first CONV => RELU => POOL layer set
        conv1_1 = mx.sym.Convolution(data=data, kernel=(11, 11),
            stride=(4, 4), num_filter=96)
        act1_1 = mx.sym.LeakyReLU(data=conv1_1, act_type="elu")
        bn1_1 = mx.sym.BatchNorm(data=act1_1)
        pool1 = mx.sym.Pooling(data=bn1_1, pool_type="max",
            kernel=(3, 3), stride=(2, 2))
        do1 = mx.sym.Dropout(data=pool1, p=0.25)
        # Block #2: second CONV => RELU => POOL layer set
        conv2_1 = mx.sym.Convolution(data=do1, kernel=(5, 5),
            pad=(2, 2), num_filter=256)
        act2_1 = mx.sym.LeakyReLU(data=conv2_1, act_type="elu")
        bn2_1 = mx.sym.BatchNorm(data=act2_1)
        pool2 = mx.sym.Pooling(data=bn2_1, pool_type="max",
            kernel=(3, 3), stride=(2, 2))
        do2 = mx.sym.Dropout(data=pool2, p=0.25)
        # Block #3: (CONV => RELU) * 3 => POOL
        conv3_1 = mx.sym.Convolution(data=do2, kernel=(3, 3),
            pad=(1, 1), num_filter=384)
        act3_1 = mx.sym.LeakyReLU(data=conv3_1, act_type="elu")
        bn3_1 = mx.sym.BatchNorm(data=act3_1)
        conv3_2 = mx.sym.Convolution(data=bn3_1, kernel=(3, 3),
            pad=(1, 1), num_filter=384)
        act3_2 = mx.sym.LeakyReLU(data=conv3_2, act_type="elu")
        bn3_2 = mx.sym.BatchNorm(data=act3_2)
        conv3_3 = mx.sym.Convolution(data=bn3_2, kernel=(3, 3),
            pad=(1, 1), num_filter=256)
        act3_3 = mx.sym.LeakyReLU(data=conv3_3, act_type="elu")
        bn3_3 = mx.sym.BatchNorm(data=act3_3)
        pool3 = mx.sym.Pooling(data=bn3_3, pool_type="max",
            kernel=(3, 3), stride=(2, 2))
        do3 = mx.sym.Dropout(data=pool3, p=0.25)
        # Block #4: first set of FC => RELU layers
        flatten = mx.sym.Flatten(data=do3)
        fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=4096)
        act4_1 = mx.sym.LeakyReLU(data=fc1, act_type="elu")
        bn4_1 = mx.sym.BatchNorm(data=act4_1)
        do4 = mx.sym.Dropout(data=bn4_1, p=0.5)
        # Block #5: second set of FC => RELU layers
        fc2 = mx.sym.FullyConnected(data=do4, num_hidden=4096)
        act5_1 = mx.sym.LeakyReLU(data=fc2, act_type="elu")
        bn5_1 = mx.sym.BatchNorm(data=act5_1)
        do5 = mx.sym.Dropout(data=bn5_1, p=0.5)
        # softmax classifier
        fc3 = mx.sym.FullyConnected(data=do5, num_hidden=classes)
        model = mx.sym.SoftmaxOutput(data=fc3, name="softmax")
        # return the network architecture
        return model
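MxAlexNet gets its ELU from mx.sym.LeakyReLU(act_type="elu"), while EmotionVGGNet uses the Keras ELU layer. ELU is the identity for positive inputs and alpha * (e^x - 1) for the rest, so negative activations saturate smoothly at -alpha instead of being clipped to 0. As far as I know, MXNet's LeakyReLU takes that alpha from its `slope` argument (documented default 0.25), whereas Keras's ELU defaults to alpha=1.0; check the documentation for your versions. The sketch prints both variants next to ReLU.

```python
import math


def elu(x, alpha=1.0):
    # identity for positive inputs; alpha * (e^x - 1) otherwise, which
    # saturates smoothly at -alpha instead of clipping to 0 like ReLU
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def relu(x):
    return max(0.0, x)


for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, relu(x), round(elu(x), 4), round(elu(x, alpha=0.25), 4))
```

The non-zero output for negative inputs keeps gradients flowing where ReLU would be flat, which is the usual motivation for swapping it in.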
pyimage/nn/mxconv/mxgooglenet.py
# import the necessary packages
import mxnet as mx
class MxGoogLeNet:
    @staticmethod
    def conv_module(data, K, kX, kY, pad=(0, 0), stride=(1, 1)):
        # define the CONV => BN => RELU pattern
        conv = mx.sym.Convolution(data=data, kernel=(kX, kY),
            num_filter=K, pad=pad, stride=stride)
        bn = mx.sym.BatchNorm(data=conv)
        act = mx.sym.Activation(data=bn, act_type="relu")
        # return the block
        return act
    @staticmethod
    def inception_module(data, num1x1, num3x3Reduce, num3x3,
        num5x5Reduce, num5x5, num1x1Proj):
        # the first branch of the Inception module consists of 1x1
        # convolutions
        conv_1x1 = MxGoogLeNet.conv_module(data, num1x1, 1, 1)
        # the second branch of the Inception module is a set of 1x1
        # convolutions followed by 3x3 convolutions
        conv_r3x3 = MxGoogLeNet.conv_module(data, num3x3Reduce, 1, 1)
        conv_3x3 = MxGoogLeNet.conv_module(conv_r3x3, num3x3, 3, 3,
            pad=(1, 1))
        # the third branch of the Inception module is a set of 1x1
        # convolutions followed by 5x5 convolutions
        conv_r5x5 = MxGoogLeNet.conv_module(data, num5x5Reduce, 1, 1)
        conv_5x5 = MxGoogLeNet.conv_module(conv_r5x5, num5x5, 5, 5,
            pad=(2, 2))
        # the final branch of the Inception module is the POOL +
        # projection layer set
        pool = mx.sym.Pooling(data=data, pool_type="max", pad=(1, 1),
            kernel=(3, 3), stride=(1, 1))
        conv_proj = MxGoogLeNet.conv_module(pool, num1x1Proj, 1, 1)
        # concatenate the filters across the channel dimension
        concat = mx.sym.Concat(*[conv_1x1, conv_3x3, conv_5x5,
            conv_proj])
        # return the block
        return concat
    @staticmethod
    def build(classes):
        # data input
        data = mx.sym.Variable("data")
        # Block #1: CONV => POOL => CONV => CONV => POOL
        conv1_1 = MxGoogLeNet.conv_module(data, 64, 7, 7,
            pad=(3, 3), stride=(2, 2))
        pool1 = mx.sym.Pooling(data=conv1_1, pool_type="max",
            pad=(1, 1), kernel=(3, 3), stride=(2, 2))
        conv1_2 = MxGoogLeNet.conv_module(pool1, 64, 1, 1)
        conv1_3 = MxGoogLeNet.conv_module(conv1_2, 192, 3, 3,
            pad=(1, 1))
        pool2 = mx.sym.Pooling(data=conv1_3, pool_type="max",
            pad=(1, 1), kernel=(3, 3), stride=(2, 2))
        # Block #3: (INCEP * 2) => POOL
        in3a = MxGoogLeNet.inception_module(pool2, 64, 96, 128, 16,
            32, 32)
        in3b = MxGoogLeNet.inception_module(in3a, 128, 128, 192, 32,
            96, 64)
        pool3 = mx.sym.Pooling(data=in3b, pool_type="max",
            pad=(1, 1), kernel=(3, 3), stride=(2, 2))
        # Block #4: (INCEP * 5) => POOL
        in4a = MxGoogLeNet.inception_module(pool3, 192, 96, 208, 16,
            48, 64)
        in4b = MxGoogLeNet.inception_module(in4a, 160, 112, 224, 24,
            64, 64)
        in4c = MxGoogLeNet.inception_module(in4b, 128, 128, 256, 24,
            64, 64)
        in4d = MxGoogLeNet.inception_module(in4c, 112, 144, 288, 32,
            64, 64)
        in4e = MxGoogLeNet.inception_module(in4d, 256, 160, 320, 32,
            128, 128)
        pool4 = mx.sym.Pooling(data=in4e, pool_type="max",
            pad=(1, 1), kernel=(3, 3), stride=(2, 2))
        # Block #5: (INCEP * 2) => POOL => DROPOUT
        in5a = MxGoogLeNet.inception_module(pool4, 256, 160, 320, 32,
            128, 128)
        in5b = MxGoogLeNet.inception_module(in5a, 384, 192, 384, 48,
            128, 128)
        pool5 = mx.sym.Pooling(data=in5b, pool_type="avg",
            kernel=(7, 7), stride=(1, 1))
        do = mx.sym.Dropout(data=pool5, p=0.4)
        # softmax classifier
        flatten = mx.sym.Flatten(data=do)
        fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=classes)
        model = mx.sym.SoftmaxOutput(data=fc1, name="softmax")
        # return the network architecture
        return model
if __name__ == "__main__":
    # render a visualization of the network
    model = MxGoogLeNet.build(1000)
    v = mx.viz.plot_network(model, shape={"data": (1, 3, 224, 224)},
        node_attrs={"shape": "rect", "fixedsize": "false"})
    v.render()
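With the 224x224 input used in the __main__ visualization, the stride-2 stem CONV and the four stride-2 max pools take the map down to 7x7, which is exactly the window of pool5's average pool. The sketch below traces those sizes and counts the channels each Inception module outputs (the sum of its four branches, since the reduce layers only feed their own branch). The floor-based formula follows MXNet's documented output-size rule for Convolution and for Pooling with the default "valid" convention; treat it as an assumption to confirm with infer_shape or the plotted network.

```python
def mx_out(size, kernel, stride=1, pad=0):
    # floor-based output size of MXNet Convolution, and of Pooling with its
    # default "valid" pooling convention
    return (size + 2 * pad - kernel) // stride + 1


def inception_channels(num1x1, num3x3Reduce, num3x3, num5x5Reduce, num5x5,
                       num1x1Proj):
    # the reduce layers only feed their own branch; the concatenation sees
    # the 1x1, 3x3, 5x5 and pool-projection outputs
    return num1x1 + num3x3 + num5x5 + num1x1Proj


def googlenet_sizes(size=224):
    sizes = []
    size = mx_out(size, 7, stride=2, pad=3)   # conv1_1
    sizes.append(size)
    # pool1, pool2, pool3 and pool4 are all 3x3, stride 2, pad 1; the
    # CONV and Inception layers between them keep the spatial size
    for _ in range(4):
        size = mx_out(size, 3, stride=2, pad=1)
        sizes.append(size)
    return sizes


print(googlenet_sizes())                                  # last value feeds pool5
print(inception_channels(64, 96, 128, 16, 32, 32))        # in3a
print(inception_channels(384, 192, 384, 48, 128, 128))    # in5b
```

The final Inception module emits 1,024 channels, so after pool5 and Flatten the classifier sees a 1,024-dimensional vector.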
pyimage/nn/mxconv/mxresnet.py
# import the necessary packages
import mxnet as mx
class MxResNet:
    # uses "bottleneck" module with pre-activation (He et al. 2016)
    @staticmethod
    def residual_module(data, K, stride, red=False, bnEps=2e-5,
        bnMom=0.9):
        # the shortcut branch of the ResNet module should be
        # initialized as the input (identity) data
        shortcut = data
        # the first block of the ResNet module are 1x1 CONVs
        bn1 = mx.sym.BatchNorm(data=data, fix_gamma=False,
            eps=bnEps, momentum=bnMom)
        act1 = mx.sym.Activation(data=bn1, act_type="relu")
        conv1 = mx.sym.Convolution(data=act1, pad=(0, 0),
            kernel=(1, 1), stride=(1, 1), num_filter=int(K * 0.25),
            no_bias=True)
        # the second block of the ResNet module are 3x3 CONVs
        bn2 = mx.sym.BatchNorm(data=conv1, fix_gamma=False,
            eps=bnEps, momentum=bnMom)
        act2 = mx.sym.Activation(data=bn2, act_type="relu")
        conv2 = mx.sym.Convolution(data=act2, pad=(1, 1),
            kernel=(3, 3), stride=stride, num_filter=int(K * 0.25),
            no_bias=True)
        # the third block of the ResNet module is another set of 1x1
        # CONVs
        bn3 = mx.sym.BatchNorm(data=conv2, fix_gamma=False,
            eps=bnEps, momentum=bnMom)
        act3 = mx.sym.Activation(data=bn3, act_type="relu")
        conv3 = mx.sym.Convolution(data=act3, pad=(0, 0),
            kernel=(1, 1), stride=(1, 1), num_filter=K, no_bias=True)
        # if we are to reduce the spatial size, apply a CONV layer
        # to the shortcut
        if red:
            shortcut = mx.sym.Convolution(data=act1, pad=(0, 0),
                kernel=(1, 1), stride=stride, num_filter=K,
                no_bias=True)
        # add together the shortcut and the final CONV
        add = conv3 + shortcut
        # return the addition as the output of the ResNet module
        return add
    @staticmethod
    def build(classes, stages, filters, bnEps=2e-5, bnMom=0.9):
        # data input
        data = mx.sym.Variable("data")
        # Block #1: BN => CONV => ACT => POOL, then initialize the
        # "body" of the network
        bn1_1 = mx.sym.BatchNorm(data=data, fix_gamma=True,
            eps=bnEps, momentum=bnMom)
        conv1_1 = mx.sym.Convolution(data=bn1_1, pad=(3, 3),
            kernel=(7, 7), stride=(2, 2), num_filter=filters[0],
            no_bias=True)
        bn1_2 = mx.sym.BatchNorm(data=conv1_1, fix_gamma=False,
            eps=bnEps, momentum=bnMom)
        act1_2 = mx.sym.Activation(data=bn1_2, act_type="relu")
        pool1 = mx.sym.Pooling(data=act1_2, pool_type="max",
            pad=(1, 1), kernel=(3, 3), stride=(2, 2))
        body = pool1
        # loop over the number of stages
        for i in range(0, len(stages)):
            # initialize the stride, then apply a residual module
            # used to reduce the spatial size of the input volume
            stride = (1, 1) if i == 0 else (2, 2)
            body = MxResNet.residual_module(body, filters[i + 1],
                stride, red=True, bnEps=bnEps, bnMom=bnMom)
            # loop over the number of layers in the stage
            for j in range(0, stages[i] - 1):
                # apply a ResNet module
                body = MxResNet.residual_module(body, filters[i + 1],
                    (1, 1), bnEps=bnEps, bnMom=bnMom)
        # apply BN => ACT => POOL
        bn2_1 = mx.sym.BatchNorm(data=body, fix_gamma=False,
            eps=bnEps, momentum=bnMom)
        act2_1 = mx.sym.Activation(data=bn2_1, act_type="relu")
        pool2 = mx.sym.Pooling(data=act2_1, pool_type="avg",
            global_pool=True, kernel=(7, 7))
        # softmax classifier
        flatten = mx.sym.Flatten(data=pool2)
        fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=classes)
        model = mx.sym.SoftmaxOutput(data=fc1, name="softmax")
        # return the network architecture
        return model
if __name__ == "__main__":
    # render a visualization of the network
    model = MxResNet.build(1000, (3, 4, 6, 3),
        (64, 256, 512, 1024, 2048))
    v = mx.viz.plot_network(model, shape={"data": (1, 3, 224, 224)},
        node_attrs={"shape": "rect", "fixedsize": "false"})
    v.render()
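The __main__ block builds MxResNet with stages (3, 4, 6, 3) and filters (64, 256, 512, 1024, 2048), the ResNet-50 layout from He et al. The sketch below shows where the "50" comes from (three weighted layers per bottleneck module plus the stem CONV and the final FC layer, with the 1x1 shortcut projections conventionally left out of the count) and how a 224x224 input shrinks stage by stage, using the floor-based MXNet output-size rule as an assumption.

```python
def bottleneck_depth(stages):
    # each bottleneck module has three weighted CONV layers (1x1, 3x3, 1x1);
    # add the 7x7 stem CONV and the final FC layer (by convention the 1x1
    # shortcut projections are not counted)
    return sum(stages) * 3 + 2


def resnet_sizes(size=224, num_stages=4):
    size = (size + 2 * 3 - 7) // 2 + 1   # conv1_1: 7x7, stride 2, pad 3
    size = (size + 2 * 1 - 3) // 2 + 1   # pool1: 3x3, stride 2, pad 1
    sizes = []
    for i in range(num_stages):
        stride = 1 if i == 0 else 2      # first module of each later stage
        size = (size + 2 * 1 - 3) // stride + 1   # 3x3 CONV, pad 1
        sizes.append(size)
    return sizes


print(bottleneck_depth((3, 4, 6, 3)))   # stages used in __main__ above
print(resnet_sizes())                   # spatial size after each stage
```

The last stage leaves a 7x7 map, which the global average pool (global_pool=True) reduces to 1x1 regardless of the nominal kernel size.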
pyimage/nn/mxconv/mxsqueezenet.py
# import the necessary packages
import mxnet as mx
class MxSqueezeNet:
    @staticmethod
    def squeeze(input, numFilter):
        # the first part of a FIRE module consists of a number of 1x1
        # filter squeezes on the input data followed by an activation
        squeeze_1x1 = mx.sym.Convolution(data=input, kernel=(1, 1),
            stride=(1, 1), num_filter=numFilter)
        act_1x1 = mx.sym.LeakyReLU(data=squeeze_1x1,
            act_type="elu")
        # return the activation for the squeeze
        return act_1x1
    @staticmethod
    def fire(input, numSqueezeFilter, numExpandFilter):
        # construct the 1x1 squeeze followed by the 1x1 expand
        squeeze_1x1 = MxSqueezeNet.squeeze(input, numSqueezeFilter)
        expand_1x1 = mx.sym.Convolution(data=squeeze_1x1,
            kernel=(1, 1), stride=(1, 1), num_filter=numExpandFilter)
        relu_expand_1x1 = mx.sym.LeakyReLU(data=expand_1x1,
            act_type="elu")
        # construct the 3x3 expand
        expand_3x3 = mx.sym.Convolution(data=squeeze_1x1, pad=(1, 1),
            kernel=(3, 3), stride=(1, 1), num_filter=numExpandFilter)
        relu_expand_3x3 = mx.sym.LeakyReLU(data=expand_3x3,
            act_type="elu")
        # the output of the FIRE module is the concatenation of the
        # activation for the 1x1 and 3x3 expands along the channel
        # dimension
        output = mx.sym.Concat(relu_expand_1x1, relu_expand_3x3,
            dim=1)
        # return the output of the FIRE module
        return output
    @staticmethod
    def build(classes):
        # data input
        data = mx.sym.Variable("data")
        # Block #1: CONV => RELU => POOL
        conv_1 = mx.sym.Convolution(data=data, kernel=(7, 7),
            stride=(2, 2), num_filter=96)
        relu_1 = mx.sym.LeakyReLU(data=conv_1, act_type="elu")
        pool_1 = mx.sym.Pooling(data=relu_1, kernel=(3, 3),
            stride=(2, 2), pool_type="max")
        # Block #2-4: (FIRE * 3) => POOL
        fire_2 = MxSqueezeNet.fire(pool_1, numSqueezeFilter=16,
            numExpandFilter=64)
        fire_3 = MxSqueezeNet.fire(fire_2, numSqueezeFilter=16,
            numExpandFilter=64)
        fire_4 = MxSqueezeNet.fire(fire_3, numSqueezeFilter=32,
            numExpandFilter=128)
        pool_4 = mx.sym.Pooling(data=fire_4, kernel=(3, 3),
            stride=(2, 2), pool_type="max")
        # Block #5-8: (FIRE * 4) => POOL
        fire_5 = MxSqueezeNet.fire(pool_4, numSqueezeFilter=32,
            numExpandFilter=128)
        fire_6 = MxSqueezeNet.fire(fire_5, numSqueezeFilter=48,
            numExpandFilter=192)
        fire_7 = MxSqueezeNet.fire(fire_6, numSqueezeFilter=48,
            numExpandFilter=192)
        fire_8 = MxSqueezeNet.fire(fire_7, numSqueezeFilter=64,
            numExpandFilter=256)
        pool_8 = mx.sym.Pooling(data=fire_8, kernel=(3, 3),
            stride=(2, 2), pool_type="max")
        # Block #9-10: FIRE => DROPOUT => CONV => RELU => POOL
        fire_9 = MxSqueezeNet.fire(pool_8, numSqueezeFilter=64,
            numExpandFilter=256)
        do_9 = mx.sym.Dropout(data=fire_9, p=0.5)
        conv_10 = mx.sym.Convolution(data=do_9, num_filter=classes,
            kernel=(1, 1), stride=(1, 1))
        relu_10 = mx.sym.LeakyReLU(data=conv_10, act_type="elu")
        pool_10 = mx.sym.Pooling(data=relu_10, kernel=(13, 13),
            pool_type="avg")
        # softmax classifier
        flatten = mx.sym.Flatten(data=pool_10)
        model = mx.sym.SoftmaxOutput(data=flatten, name="softmax")
        # return the network architecture
        return model
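pool_10 averages with a fixed 13x13 kernel, so it reduces the map to 1x1 only for a particular input size. Under MXNet's floor-based output-size rule (an assumption to check against your MXNet version), a 227x227 input reaches pool_10 as a 13x13 map, while a 224x224 input would give 12x12, smaller than the kernel, so it is worth confirming the image size your data iterator produces. The sketch also counts the channels a FIRE module emits; the helper names are hypothetical.

```python
def mx_out(size, kernel, stride=1, pad=0):
    # floor-based output size of MXNet Convolution, and of Pooling with its
    # default "valid" pooling convention
    return (size + 2 * pad - kernel) // stride + 1


def squeezenet_pool10_input(size):
    size = mx_out(size, 7, stride=2)   # conv_1: 7x7, stride 2, no padding
    size = mx_out(size, 3, stride=2)   # pool_1
    # FIRE modules keep the size (1x1 convs and padded 3x3 convs)
    size = mx_out(size, 3, stride=2)   # pool_4
    size = mx_out(size, 3, stride=2)   # pool_8
    return size                        # conv_10 is 1x1, so pool_10 sees this


def fire_channels(numExpandFilter):
    # the 1x1 and 3x3 expand outputs are concatenated along the channels
    return 2 * numExpandFilter


print(squeezenet_pool10_input(227), squeezenet_pool10_input(224))
print(fire_channels(256))   # fire_9 output channels
```

Because conv_10 maps the channels directly to `classes`, the averaged 1x1 map is already the per-class score vector, with no FC layer needed.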
pyimage/nn/mxconv/mxvggnet.py
# import the necessary packages
import mxnet as mx
class MxVGGNet:
    @staticmethod
    def build(classes):
        # data input
        data = mx.sym.Variable("data")
        # Block #1: (CONV => RELU) * 2 => POOL
        conv1_1 = mx.sym.Convolution(data=data, kernel=(3, 3),
            pad=(1, 1), num_filter=64, name="conv1_1")
        act1_1 = mx.sym.LeakyReLU(data=conv1_1, act_type="prelu",
            name="act1_1")
        bn1_1 = mx.sym.BatchNorm(data=act1_1, name="bn1_1")
        conv1_2 = mx.sym.Convolution(data=bn1_1, kernel=(3, 3),
            pad=(1, 1), num_filter=64, name="conv1_2")
        act1_2 = mx.sym.LeakyReLU(data=conv1_2, act_type="prelu",
            name="act1_2")
        bn1_2 = mx.sym.BatchNorm(data=act1_2, name="bn1_2")
        pool1 = mx.sym.Pooling(data=bn1_2, pool_type="max",
            kernel=(2, 2), stride=(2, 2), name="pool1")
        do1 = mx.sym.Dropout(data=pool1, p=0.25)
        # Block #2: (CONV => RELU) * 2 => POOL
        conv2_1 = mx.sym.Convolution(data=do1, kernel=(3, 3),
            pad=(1, 1), num_filter=128, name="conv2_1")
        act2_1 = mx.sym.LeakyReLU(data=conv2_1, act_type="prelu",
            name="act2_1")
        bn2_1 = mx.sym.BatchNorm(data=act2_1, name="bn2_1")
        conv2_2 = mx.sym.Convolution(data=bn2_1, kernel=(3, 3),
            pad=(1, 1), num_filter=128, name="conv2_2")
        act2_2 = mx.sym.LeakyReLU(data=conv2_2, act_type="prelu",
            name="act2_2")
        bn2_2 = mx.sym.BatchNorm(data=act2_2, name="bn2_2")
        pool2 = mx.sym.Pooling(data=bn2_2, pool_type="max",
            kernel=(2, 2), stride=(2, 2), name="pool2")
        do2 = mx.sym.Dropout(data=pool2, p=0.25)
        # Block #3: (CONV => RELU) * 3 => POOL
        conv3_1 = mx.sym.Convolution(data=do2, kernel=(3, 3),
            pad=(1, 1), num_filter=256, name="conv3_1")
        act3_1 = mx.sym.LeakyReLU(data=conv3_1, act_type="prelu",
            name="act3_1")
        bn3_1 = mx.sym.BatchNorm(data=act3_1, name="bn3_1")
        conv3_2 = mx.sym.Convolution(data=bn3_1, kernel=(3, 3),
            pad=(1, 1), num_filter=256, name="conv3_2")
        act3_2 = mx.sym.LeakyReLU(data=conv3_2, act_type="prelu",
            name="act3_2")
        bn3_2 = mx.sym.BatchNorm(data=act3_2, name="bn3_2")
        conv3_3 = mx.sym.Convolution(data=bn3_2, kernel=(3, 3),
            pad=(1, 1), num_filter=256, name="conv3_3")
        act3_3 = mx.sym.LeakyReLU(data=conv3_3, act_type="prelu",
            name="act3_3")
        bn3_3 = mx.sym.BatchNorm(data=act3_3, name="bn3_3")
        pool3 = mx.sym.Pooling(data=bn3_3, pool_type="max",
            kernel=(2, 2), stride=(2, 2), name="pool3")
        do3 = mx.sym.Dropout(data=pool3, p=0.25)
        # Block #4: (CONV => RELU) * 3 => POOL
        conv4_1 = mx.sym.Convolution(data=do3, kernel=(3, 3),
            pad=(1, 1), num_filter=512, name="conv4_1")
        act4_1 = mx.sym.LeakyReLU(data=conv4_1, act_type="prelu",
            name="act4_1")
        bn4_1 = mx.sym.BatchNorm(data=act4_1, name="bn4_1")
        conv4_2 = mx.sym.Convolution(data=bn4_1, kernel=(3, 3),
            pad=(1, 1), num_filter=512, name="conv4_2")
        act4_2 = mx.sym.LeakyReLU(data=conv4_2, act_type="prelu",
            name="act4_2")
        bn4_2 = mx.sym.BatchNorm(data=act4_2, name="bn4_2")
        conv4_3 = mx.sym.Convolution(data=bn4_2, kernel=(3, 3),
            pad=(1, 1), num_filter=512, name="conv4_3")
        act4_3 = mx.sym.LeakyReLU(data=conv4_3, act_type="prelu",
            name="act4_3")
        bn4_3 = mx.sym.BatchNorm(data=act4_3, name="bn4_3")
        pool4 = mx.sym.Pooling(data=bn4_3, pool_type="max",
            kernel=(2, 2), stride=(2, 2), name="pool4")
        do4 = mx.sym.Dropout(data=pool4, p=0.25)
        # Block #5: (CONV => RELU) * 3 => POOL
        conv5_1 = mx.sym.Convolution(data=do4, kernel=(3, 3),
            pad=(1, 1), num_filter=512, name="conv5_1")
        act5_1 = mx.sym.LeakyReLU(data=conv5_1, act_type="prelu",
            name="act5_1")
        bn5_1 = mx.sym.BatchNorm(data=act5_1, name="bn5_1")
        conv5_2 = mx.sym.Convolution(data=bn5_1, kernel=(3, 3),
            pad=(1, 1), num_filter=512, name="conv5_2")
        act5_2 = mx.sym.LeakyReLU(data=conv5_2, act_type="prelu",
            name="act5_2")
        bn5_2 = mx.sym.BatchNorm(data=act5_2, name="bn5_2")
        conv5_3 = mx.sym.Convolution(data=bn5_2, kernel=(3, 3),
            pad=(1, 1), num_filter=512, name="conv5_3")
        act5_3 = mx.sym.LeakyReLU(data=conv5_3, act_type="prelu",
            name="act5_3")
        bn5_3 = mx.sym.BatchNorm(data=act5_3, name="bn5_3")
        pool5 = mx.sym.Pooling(data=bn5_3, pool_type="max",
            kernel=(2, 2), stride=(2, 2), name="pool5")
        do5 = mx.sym.Dropout(data=pool5, p=0.25)
        # Block #6: FC => RELU layers
        flatten = mx.sym.Flatten(data=do5, name="flatten")
        fc1 = mx.sym.FullyConnected(data=flatten, num_hidden=4096,
            name="fc1")
        act6_1 = mx.sym.LeakyReLU(data=fc1, act_type="prelu",
            name="act6_1")
        bn6_1 = mx.sym.BatchNorm(data=act6_1, name="bn6_1")
        do6 = mx.sym.Dropout(data=bn6_1, p=0.5)
        # Block #7: FC => RELU layers
        fc2 = mx.sym.FullyConnected(data=do6, num_hidden=4096,
            name="fc2")
        act7_1 = mx.sym.LeakyReLU(data=fc2, act_type="prelu",
            name="act7_1")
        bn7_1 = mx.sym.BatchNorm(data=act7_1, name="bn7_1")
        do7 = mx.sym.Dropout(data=bn7_1, p=0.5)
        # softmax classifier
        fc3 = mx.sym.FullyConnected(data=do7, num_hidden=classes,
            name="fc3")
        model = mx.sym.SoftmaxOutput(data=fc3, name="softmax")
        # return the network architecture
        return model
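MxVGGNet uses mx.sym.LeakyReLU(act_type="prelu"), i.e. PReLU (He et al.): a leaky ReLU whose negative slope is learned during training instead of being fixed. The sketch below shows the forward pass and the two gradients needed to learn the slope, using a single scalar slope `a` and one plain SGD step as a simplification; MXNet keeps the slope as a layer parameter and updates it with whatever optimizer you configure.

```python
def prelu(x, a):
    # identity for positive inputs, learnable slope `a` for the rest
    return x if x > 0 else a * x


def prelu_grads(x, a, upstream=1.0):
    # gradients of prelu(x, a) w.r.t. x and a, times the upstream gradient
    dx = upstream if x > 0 else upstream * a
    da = 0.0 if x > 0 else upstream * x
    return dx, da


# one plain SGD step on the slope, driven by a single negative input
a, lr = 0.25, 0.1
dx, da = prelu_grads(-2.0, a)
a -= lr * da
print(prelu(-2.0, 0.25), dx, da, a)
```

Only negative inputs contribute to the slope's gradient, so a layer whose inputs are mostly positive will barely move its slope.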
pyimage/nn/neuralnetwork.py
# import the necessary packages
import numpy as np

class NeuralNetwork:
    def __init__(self, layers, alpha=0.1):
        # initialize the list of weights matrices, then store the
        # network architecture and learning rate
        self.W = []
        self.layers = layers
        self.alpha = alpha

        # start looping from the index of the first layer but
        # stop before we reach the last two layers
        for i in np.arange(0, len(layers) - 2):
            # randomly initialize a weight matrix connecting the
            # number of nodes in each respective layer together,
            # adding an extra node for the bias
            w = np.random.randn(layers[i] + 1, layers[i + 1] + 1)
            self.W.append(w / np.sqrt(layers[i]))

        # the last two layers are a special case where the input
        # connections need a bias term but the output does not
        w = np.random.randn(layers[-2] + 1, layers[-1])
        self.W.append(w / np.sqrt(layers[-2]))

    def __repr__(self):
        # construct and return a string that represents the network
        # architecture
        return "NeuralNetwork: {}".format(
            "-".join(str(l) for l in self.layers))

    def sigmoid(self, x):
        # compute and return the sigmoid activation value for a
        # given input value
        return 1.0 / (1 + np.exp(-x))

    def sigmoid_deriv(self, x):
        # compute the derivative of the sigmoid function ASSUMING
        # that `x` has already been passed through the `sigmoid`
        # function
        return x * (1 - x)

    def fit(self, X, y, epochs=1000, displayUpdate=100):
        # insert a column of 1's as the last entry in the feature
        # matrix -- this little trick allows us to treat the bias
        # as a trainable parameter within the weight matrix
        X = np.c_[X, np.ones((X.shape[0]))]

        # loop over the desired number of epochs
        for epoch in np.arange(0, epochs):
            # loop over each individual data point and train
            # our network on it
            for (x, target) in zip(X, y):
                self.fit_partial(x, target)

            # check to see if we should display a training update
            if epoch == 0 or (epoch + 1) % displayUpdate == 0:
                loss = self.calculate_loss(X, y)
                print("[INFO] epoch={}, loss={:.7f}".format(
                    epoch + 1, loss))

    def fit_partial(self, x, y):
        # construct our list of output activations for each layer
        # as our data point flows through the network; the first
        # activation is a special case -- it's just the input
        # feature vector itself
        A = [np.atleast_2d(x)]

        # FEEDFORWARD:
        # loop over the layers in the network
        for layer in np.arange(0, len(self.W)):
            # feedforward the activation at the current layer by
            # taking the dot product between the activation and
            # the weight matrix -- this is called the "net input"
            # to the current layer
            net = A[layer].dot(self.W[layer])

            # computing the "net output" is simply applying our
            # non-linear activation function to the net input
            out = self.sigmoid(net)

            # once we have the net output, add it to our list of
            # activations
            A.append(out)

        # BACKPROPAGATION
        # the first phase of backpropagation is to compute the
        # difference between our *prediction* (the final output
        # activation in the activations list) and the true target
        # value
        error = A[-1] - y

        # from here, we need to apply the chain rule and build our
        # list of deltas `D`; the first entry in the deltas is
        # simply the error of the output layer times the derivative
        # of our activation function for the output value
        D = [error * self.sigmoid_deriv(A[-1])]

        # once you understand the chain rule it becomes super easy
        # to implement with a `for` loop -- simply loop over the
        # layers in reverse order (ignoring the last two since we
        # already have taken them into account)
        for layer in np.arange(len(A) - 2, 0, -1):
            # the delta for the current layer is equal to the delta
            # of the *previous layer* dotted with the weight matrix
            # of the current layer, followed by multiplying the delta
            # by the derivative of the non-linear activation function
            # for the activations of the current layer
            delta = D[-1].dot(self.W[layer].T)
            delta = delta * self.sigmoid_deriv(A[layer])
            D.append(delta)

        # since we looped over our layers in reverse order we need to
        # reverse the deltas
        D = D[::-1]

        # WEIGHT UPDATE PHASE
        # loop over the layers
        for layer in np.arange(0, len(self.W)):
            # update our weights by taking the dot product of the layer
            # activations with their respective deltas, then multiplying
            # this value by some small learning rate and adding to our
            # weight matrix -- this is where the actual "learning" takes
            # place
            self.W[layer] += -self.alpha * A[layer].T.dot(D[layer])

    def predict(self, X, addBias=True):
        # initialize the output prediction as the input features -- this
        # value will be (forward) propagated through the network to
        # obtain the final prediction
        p = np.atleast_2d(X)

        # check to see if the bias column should be added
        if addBias:
            # insert a column of 1's as the last entry in the feature
            # matrix (bias)
            p = np.c_[p, np.ones((p.shape[0]))]

        # loop over our layers in the network
        for layer in np.arange(0, len(self.W)):
            # computing the output prediction is as simple as taking
            # the dot product between the current activation value `p`
            # and the weight matrix associated with the current layer,
            # then passing this value through a non-linear activation
            # function
            p = self.sigmoid(np.dot(p, self.W[layer]))

        # return the predicted value
        return p

    def calculate_loss(self, X, targets):
        # make predictions for the input data points then compute
        # the loss
        targets = np.atleast_2d(targets)
        predictions = self.predict(X, addBias=False)
        loss = 0.5 * np.sum((predictions - targets) ** 2)

        # return the loss
        return loss
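`sigmoid_deriv` returns `x * (1 - x)` and assumes its argument is already a sigmoid output, because the derivative of the sigmoid satisfies σ'(z) = σ(z)(1 − σ(z)). That is why `fit_partial` passes the stored activations `A[...]` rather than the net inputs. The standalone check below compares this formula with a central finite difference; it re-declares both helpers outside the class so it runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def sigmoid_deriv(a):
    # `a` must already be sigmoid(z), as in NeuralNetwork.sigmoid_deriv
    return a * (1 - a)

z = np.linspace(-4, 4, 9)
h = 1e-5
# central finite difference approximation of d sigmoid / dz
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_deriv(sigmoid(z))
print(np.max(np.abs(numeric - analytic)))

# passing the raw net input instead of the activation gives wrong values
wrong = sigmoid_deriv(z)
```

The `wrong` line shows the failure mode: at z = 0 it yields 0 instead of the correct 0.25, so forgetting to apply the sigmoid first silently corrupts every delta.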
pyimage/nn/perceptron.py
# import the necessary packages
import numpy as np

class Perceptron:
    def __init__(self, N, alpha=0.1):
        # initialize the weight matrix and store the learning rate
        self.W = np.random.randn(N + 1) / np.sqrt(N)
        self.alpha = alpha

    def step(self, x):
        # apply the step function element-wise, so it works for a
        # single net input (in `fit`) as well as for a batch of net
        # inputs (in `predict`)
        return np.where(x > 0, 1, 0)

    def fit(self, X, y, epochs=10):
        # insert a column of 1's as the last entry in the feature
        # matrix -- this little trick allows us to treat the bias
        # as a trainable parameter within the weight matrix
        X = np.c_[X, np.ones((X.shape[0]))]

        # loop over the desired number of epochs
        for epoch in np.arange(0, epochs):
            # loop over each individual data point
            for (x, target) in zip(X, y):
                # take the dot product between the input features
                # and the weight matrix, then pass this value
                # through the step function to obtain the prediction
                p = self.step(np.dot(x, self.W))

                # only perform a weight update if our prediction
                # does not match the target
                if p != target:
                    # determine the error
                    error = p - target

                    # update the weight matrix
                    self.W += -self.alpha * error * x

    def predict(self, X, addBias=True):
        # ensure our input is a matrix
        X = np.atleast_2d(X)

        # check to see if the bias column should be added
        if addBias:
            # insert a column of 1's as the last entry in the feature
            # matrix (bias)
            X = np.c_[X, np.ones((X.shape[0]))]

        # take the dot product between the input features and the
        # weight matrix, then pass the value through the step
        # function
        return self.step(np.dot(X, self.W))
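To see the perceptron update rule `W += -alpha * (p - target) * x` in action, the self-contained sketch below trains it on the bitwise OR dataset. One deliberate assumption differs from the class above: weights start at zero instead of a scaled random draw, so the run is deterministic. OR is linearly separable and converges in a few epochs; XOR is not, so no weight vector can classify all four XOR points.

```python
import numpy as np

def train_perceptron(X, y, alpha=0.1, epochs=10):
    # append a bias column of 1's, same trick as Perceptron.fit
    X = np.c_[X, np.ones(X.shape[0])]
    # zero initialization (assumption, for reproducibility)
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = 1 if np.dot(x, W) > 0 else 0
            # update only on a mistake
            if p != target:
                W += -alpha * (p - target) * x
    return W

def predict(X, W):
    X = np.c_[X, np.ones(X.shape[0])]
    return (X.dot(W) > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])
W = train_perceptron(X, y_or)
print(W, predict(X, W))

# XOR is not linearly separable, so training cannot fit all four points
W_xor = train_perceptron(X, np.array([0, 1, 1, 0]), epochs=50)
print(predict(X, W_xor))
```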
pyimage/preprocessing/aspectawarepreprocessor.py
# import the necessary packages
import imutils
import cv2

class AspectAwarePreprocessor:
    def __init__(self, width, height, inter=cv2.INTER_AREA):
        # store the target image width, height, and interpolation
        # method used when resizing
        self.width = width
        self.height = height
        self.inter = inter

    def preprocess(self, image):
        # grab the dimensions of the image and then initialize
        # the deltas to use when cropping
        (h, w) = image.shape[:2]
        dW = 0
        dH = 0

        # if the width is smaller than the height, then resize
        # along the width (i.e., the smaller dimension) and then
        # update the deltas to crop the height to the desired
        # dimension
        if w < h:
            image = imutils.resize(image, width=self.width,
                inter=self.inter)
            dH = int((image.shape[0] - self.height) / 2.0)

        # otherwise, the height is smaller than the width so
        # resize along the height and then update the deltas
        # to crop along the width
        else:
            image = imutils.resize(image, height=self.height,
                inter=self.inter)
            dW = int((image.shape[1] - self.width) / 2.0)

        # now that our images have been resized, we need to
        # re-grab the width and height, followed by performing
        # the crop
        (h, w) = image.shape[:2]
        image = image[dH:h - dH, dW:w - dW]

        # finally, resize the image to the provided spatial
        # dimensions to ensure our output image is always a fixed
        # size
        return cv2.resize(image, (self.width, self.height),
            interpolation=self.inter)
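The final `cv2.resize` in `AspectAwarePreprocessor` is not redundant: after scaling the shorter side, the symmetric crop can leave the longer side one pixel too large when the leftover is odd. The sketch below reproduces only the size arithmetic without touching pixels. It assumes `imutils.resize` scales the other side by the same ratio and truncates with `int()`; `aspect_aware_plan` is a hypothetical helper name.

```python
def aspect_aware_plan(h, w, width, height):
    """Return (resized (h, w), (dH, dW), cropped (h, w)) before the final resize."""
    dH, dW = 0, 0
    if w < h:
        # resize along the width, then crop the height symmetrically
        newW, newH = width, int(h * (width / float(w)))
        dH = int((newH - height) / 2.0)
    else:
        # resize along the height, then crop the width symmetrically
        newW, newH = int(w * (height / float(h))), height
        dW = int((newW - width) / 2.0)
    cropped = (newH - 2 * dH, newW - 2 * dW)
    return (newH, newW), (dH, dW), cropped

# landscape 400x300 image, 256x256 target
print(aspect_aware_plan(300, 400, 256, 256))
# portrait 300x400 image, 256x256 target
print(aspect_aware_plan(400, 300, 256, 256))
```

In both cases the crop ends up 257 pixels on the long side, so the last resize is what guarantees the exact 256×256 output.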
pyimage/preprocessing/croppreprocessor.py
# import the necessary packages
import numpy as np
import cv2

class CropPreprocessor:
    def __init__(self, width, height, horiz=True, inter=cv2.INTER_AREA):
        # store the target image width, height, whether or not
        # horizontal flips should be included, along with the
        # interpolation method used when resizing
        self.width = width
        self.height = height
        self.horiz = horiz
        self.inter = inter

    def preprocess(self, image):
        # initialize the list of crops
        crops = []

        # grab the width and height of the image then use these
        # dimensions to define the four corner crops of the image
        (h, w) = image.shape[:2]
        coords = [
            [0, 0, self.width, self.height],
            [w - self.width, 0, w, self.height],
            [w - self.width, h - self.height, w, h],
            [0, h - self.height, self.width, h]]

        # compute the center crop of the image as well
        dW = int(0.5 * (w - self.width))
        dH = int(0.5 * (h - self.height))
        coords.append([dW, dH, w - dW, h - dH])

        # loop over the coordinates, extract each of the crops,
        # and resize each of them to a fixed size
        for (startX, startY, endX, endY) in coords:
            crop = image[startY:endY, startX:endX]
            crop = cv2.resize(crop, (self.width, self.height),
                interpolation=self.inter)
            crops.append(crop)

        # check to see if the horizontal flips should be taken
        if self.horiz:
            # compute the horizontal mirror flips for each crop
            mirrors = [cv2.flip(c, 1) for c in crops]
            crops.extend(mirrors)

        # return the set of crops
        return np.array(crops)
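The coordinate arithmetic in `CropPreprocessor` can be checked without loading an image. The sketch below reuses the same formulas for a 256×256 input and 227×227 crops (sizes chosen only for illustration): the four corner crops are exact, but the center crop comes out one pixel larger because `w - width` is odd and `int(0.5 * 29)` truncates. This explains why every crop is passed through `cv2.resize`. With `horiz=True` the five crops plus their mirrors give ten views per image.

```python
def ten_crop_coords(h, w, width, height):
    # four corner crops as (startX, startY, endX, endY)
    coords = [
        [0, 0, width, height],
        [w - width, 0, w, height],
        [w - width, h - height, w, h],
        [0, h - height, width, h]]
    # center crop with the same symmetric deltas as CropPreprocessor
    dW = int(0.5 * (w - width))
    dH = int(0.5 * (h - height))
    coords.append([dW, dH, w - dW, h - dH])
    return coords

coords = ten_crop_coords(256, 256, 227, 227)
sizes = [(ex - sx, ey - sy) for (sx, sy, ex, ey) in coords]
print(sizes)
# horizontal flips double the count: 5 crops + 5 mirrors
print(2 * len(coords))
```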
pyimage/preprocessing/imagetoarraypreprocessor.py
# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array

class ImageToArrayPreprocessor:
    def __init__(self, dataFormat=None):
        # store the image data format
        self.dataFormat = dataFormat

    def preprocess(self, image):
        # apply the Keras utility function that correctly rearranges
        # the dimensions of the image
        return img_to_array(image, data_format=self.dataFormat)
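For the 48×48 grayscale FER2013 faces, the relevant effect of `img_to_array` in the default channels-last layout is a cast to `float32` plus an explicit channel axis, turning `(48, 48)` into `(48, 48, 1)`, which a Keras `Conv2D` input expects. The NumPy sketch below approximates only that case for NumPy inputs. It is an assumption-laden stand-in, not the Keras implementation, which also accepts PIL images and transposes for `channels_first`.

```python
import numpy as np

def to_array_channels_last(image):
    # cast to float32 and make sure a channel axis exists: (H, W) -> (H, W, 1)
    x = np.asarray(image, dtype="float32")
    if x.ndim == 2:
        x = x.reshape((x.shape[0], x.shape[1], 1))
    return x

face = np.zeros((48, 48), dtype="uint8")   # one grayscale FER2013 face
color = np.zeros((48, 48, 3), dtype="uint8")
print(to_array_channels_last(face).shape, to_array_channels_last(color).shape)
```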
pyimage/preprocessing/meanpreprocessor.py
# import the necessary packages
import cv2

class MeanPreprocessor:
    def __init__(self, rMean, gMean, bMean):
        # store the Red, Green, and Blue channel averages across a
        # training set
        self.rMean = rMean
        self.gMean = gMean
        self.bMean = bMean

    def preprocess(self, image):
        # split the image into its respective Red, Green, and Blue
        # channels
        (B, G, R) = cv2.split(image.astype("float32"))

        # subtract the means for each channel
        R -= self.rMean
        G -= self.gMean
        B -= self.bMean

        # merge the channels back together and return the image
        return cv2.merge([B, G, R])
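The split/subtract/merge in `MeanPreprocessor` is equivalent to one broadcast subtraction, as long as the means are ordered B, G, R to match OpenCV's channel order. The sketch below shows that vectorized form together with a hypothetical `channel_means` helper that computes the per-channel averages over a training batch of shape `(N, H, W, 3)`. Random data stands in for real images here.

```python
import numpy as np

def channel_means(images):
    # images: (N, H, W, 3) in OpenCV's BGR order
    b, g, r = images.astype("float64").mean(axis=(0, 1, 2))
    return r, g, b

def subtract_means(image, rMean, gMean, bMean):
    # broadcasting subtracts one scalar per channel, in B, G, R order
    return image.astype("float32") - np.array([bMean, gMean, rMean], dtype="float32")

rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(4, 8, 8, 3)).astype(np.uint8)
rMean, gMean, bMean = channel_means(train)
out = subtract_means(train[0], rMean, gMean, bMean)
print(out.shape, out.dtype)
```

After subtraction, the training set has (approximately) zero mean per channel, which is the point of computing the averages on the training split only.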
pyimage/preprocessing/patchpreprocessor.py
# import the necessary packages
from sklearn.feature_extraction.image import extract_patches_2d

class PatchPreprocessor:
    def __init__(self, width, height):
        # store the target width and height of the image
        self.width = width
        self.height = height

    def preprocess(self, image):
        # extract a random crop from the image with the target width
        # and height
        return extract_patches_2d(image, (self.height, self.width),
            max_patches=1)[0]
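With `max_patches=1`, `extract_patches_2d` returns a single patch taken from a random position, which acts as random-crop data augmentation. The sketch below expresses the same idea with plain NumPy slicing: choose a top-left corner so that the patch fits entirely inside the image, then slice. The 42×42 patch size and the `random_patch` name are illustrative assumptions.

```python
import numpy as np

def random_patch(image, height, width, rng=None):
    # choose a top-left corner so the whole patch fits inside the image
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    y = rng.integers(0, h - height + 1)
    x = rng.integers(0, w - width + 1)
    return image[y:y + height, x:x + width]

img = np.arange(48 * 48).reshape(48, 48)
patch = random_patch(img, 42, 42, np.random.default_rng(42))
print(patch.shape)
```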
pyimage/preprocessing/simplepreprocessor.py
# import the necessary packages
import cv2

class SimplePreprocessor:
    def __init__(self, width, height, inter=cv2.INTER_AREA):
        # store the target image width, height, and interpolation
        # method used when resizing
        self.width = width
        self.height = height
        self.inter = inter

    def preprocess(self, image):
        # resize the image to a fixed size, ignoring the aspect
        # ratio
        return cv2.resize(image, (self.width, self.height),
            interpolation=self.inter)
pyimage/utils/captchahelper.py
# import the necessary packages
import imutils
import cv2

def preprocess(image, width, height):
    # grab the dimensions of the image, then initialize
    # the padding values
    (h, w) = image.shape[:2]

    # if the width is greater than the height then resize along
    # the width
    if w > h:
        image = imutils.resize(image, width=width)

    # otherwise, the height is greater than the width so resize
    # along the height
    else:
        image = imutils.resize(image, height=height)

    # determine the padding values for the width and height to
    # obtain the target dimensions
    padW = int((width - image.shape[1]) / 2.0)
    padH = int((height - image.shape[0]) / 2.0)

    # pad the image then apply one more resizing to handle any
    # rounding issues
    image = cv2.copyMakeBorder(image, padH, padH, padW, padW,
        cv2.BORDER_REPLICATE)
    image = cv2.resize(image, (width, height))

    # return the pre-processed image
    return image
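The "rounding issues" handled by the last `cv2.resize` come from the symmetric padding: when the gap between the resized side and the target is odd, `int()` truncates and the padded image comes out one pixel short. The sketch below reproduces the padding step on an already-resized 2D grayscale array. It uses `np.pad(..., mode="edge")`, which repeats the border pixels the same way `cv2.BORDER_REPLICATE` does. The function name `pad_to_target` is hypothetical.

```python
import numpy as np

def pad_to_target(image, width, height):
    # same padding arithmetic as captchahelper.preprocess (2D input assumed)
    padW = int((width - image.shape[1]) / 2.0)
    padH = int((height - image.shape[0]) / 2.0)
    # "edge" replicates border pixels, like cv2.BORDER_REPLICATE
    return np.pad(image, ((padH, padH), (padW, padW)), mode="edge")

even = pad_to_target(np.ones((28, 16)), 28, 28)
odd = pad_to_target(np.ones((28, 15)), 28, 28)
print(even.shape, odd.shape)
```

The odd case yields a 28×27 image, which is exactly what the extra resize corrects.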
pyimage/utils/imagenethelper.py
# import the necessary packages
import numpy as np
import os

class ImageNetHelper:
    def __init__(self, config):
        # store the configuration object
        self.config = config

        # build the label mappings and validation blacklist
        self.labelMappings = self.buildClassLabels()
        self.valBlacklist = self.buildBlacklist()

    def buildClassLabels(self):
        # load the contents of the file that maps the WordNet IDs
        # to integers, then initialize the label mappings dictionary
        rows = open(self.config.WORD_IDS).read().strip().split("\n")
        labelMappings = {}

        # loop over the labels
        for row in rows:
            # split the row into the WordNet ID, label integer, and
            # human readable label
            (wordID, label, hrLabel) = row.split(" ")

            # update the label mappings dictionary using the word ID
            # as the key and the label as the value, subtracting `1`
            # from the label since MATLAB is one-indexed while Python
            # is zero-indexed
            labelMappings[wordID] = int(label) - 1

        # return the label mappings dictionary
        return labelMappings

    def buildBlacklist(self):
        # load the list of blacklisted image IDs and convert them to
        # a set
        rows = open(self.config.VAL_BLACKLIST).read()
        rows = set(rows.strip().split("\n"))

        # return the blacklisted image IDs
        return rows

    def buildTrainingSet(self):
        # load the contents of the training input file that lists
        # the partial image ID and image number, then initialize
        # the list of image paths and class labels
        rows = open(self.config.TRAIN_LIST).read().strip()
        rows = rows.split("\n")
        paths = []
        labels = []

        # loop over the rows in the input training file
        for row in rows:
            # break the row into the partial path and image
            # number (the image number is sequential and is
            # essentially useless to us)
            (partialPath, imageNum) = row.strip().split(" ")

            # construct the full path to the training image, then
            # grab the word ID from the path and use it to determine
            # the integer class label
            path = os.path.sep.join([self.config.IMAGES_PATH,
                "train", "{}.JPEG".format(partialPath)])
            wordID = partialPath.split("/")[0]
            label = self.labelMappings[wordID]

            # update the respective paths and label lists
            paths.append(path)
            labels.append(label)

        # return a tuple of image paths and associated integer class
        # labels
        return (np.array(paths), np.array(labels))

    def buildValidationSet(self):
        # initialize the list of image paths and class labels
        paths = []
        labels = []

        # load the contents of the file that lists the partial
        # validation image filenames
        valFilenames = open(self.config.VAL_LIST).read()
        valFilenames = valFilenames.strip().split("\n")

        # load the contents of the file that contains the *actual*
        # ground-truth integer class labels for the validation set
        valLabels = open(self.config.VAL_LABELS).read()
        valLabels = valLabels.strip().split("\n")

        # loop over the validation data
        for (row, label) in zip(valFilenames, valLabels):
            # break the row into the partial path and image number
            (partialPath, imageNum) = row.strip().split(" ")

            # if the image number is in the blacklist set then we
            # should ignore this validation image
            if imageNum in self.valBlacklist:
                continue

            # construct the full path to the validation image, then
            # update the respective paths and labels lists
            path = os.path.sep.join([self.config.IMAGES_PATH, "val",
                "{}.JPEG".format(partialPath)])
            paths.append(path)
            labels.append(int(label) - 1)

        # return a tuple of image paths and associated integer class
        # labels
        return (np.array(paths), np.array(labels))
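The two parsing steps that `ImageNetHelper` performs, turning `WordNet ID / 1-based label / name` rows into a 0-based mapping and turning a `partial path / image number` row into a full path plus label, can be exercised on in-memory strings. In the sketch below, the sample rows and the `datasets/imagenet/Data/CLS-LOC` root are illustrative assumptions, and the helper names are hypothetical. The parsing logic mirrors `buildClassLabels` and `buildTrainingSet`.

```python
import os

def parse_label_mappings(text):
    # each row: "<WordNet ID> <1-based label> <human-readable label>"
    mappings = {}
    for row in text.strip().split("\n"):
        (wordID, label, hrLabel) = row.split(" ")
        mappings[wordID] = int(label) - 1  # shift to 0-based
    return mappings

def parse_train_row(row, images_path, mappings):
    # each row: "<partial path> <image number>"; the number is unused
    (partialPath, imageNum) = row.strip().split(" ")
    path = os.path.sep.join([images_path, "train", "{}.JPEG".format(partialPath)])
    return path, mappings[partialPath.split("/")[0]]

mappings = parse_label_mappings("n02119789 1 kit_fox\nn02100735 2 English_setter")
path, label = parse_train_row("n02100735/n02100735_1 1",
                              "datasets/imagenet/Data/CLS-LOC", mappings)
print(mappings, path, label)
```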
pyimage/utils/ranked.py
# import the necessary packages
import numpy as np

def rank5_accuracy(preds, labels):
    # initialize the rank-1 and rank-5 accuracies
    rank1 = 0
    rank5 = 0

    # loop over the predictions and ground-truth labels
    for (p, gt) in zip(preds, labels):
        # sort the probabilities by their index in descending
        # order so that the more confident guesses are at the
        # front of the list
        p = np.argsort(p)[::-1]

        # check if the ground-truth label is in the top-5
        # predictions
        if gt in p[:5]:
            rank5 += 1

        # check to see if the ground-truth is the #1 prediction
        if gt == p[0]:
            rank1 += 1

    # compute the final rank-1 and rank-5 accuracies
    rank1 /= float(len(preds))
    rank5 /= float(len(preds))

    # return a tuple of the rank-1 and rank-5 accuracies
    return (rank1, rank5)
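The per-sample loop in `rank5_accuracy` can also be written as a single vectorized pass: sort every row of scores once, keep the first five indices, and compare them against the labels with broadcasting. The sketch below does this. The two score rows are made-up values over seven classes (the FER2013 label count) and serve only to show one rank-1 hit and one rank-5-only hit.

```python
import numpy as np

def rank5_accuracy_vectorized(preds, labels):
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    # indices of the five highest-scoring classes per row, best first
    top5 = np.argsort(preds, axis=1)[:, ::-1][:, :5]
    rank1 = np.mean(top5[:, 0] == labels)
    rank5 = np.mean(np.any(top5 == labels[:, None], axis=1))
    return (float(rank1), float(rank5))

preds = np.array([
    [0.10, 0.60, 0.05, 0.05, 0.05, 0.05, 0.10],  # true class 1 ranked first
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 4 ranked fifth
])
labels = np.array([1, 4])
print(rank5_accuracy_vectorized(preds, labels))
```

With only seven classes, rank-5 accuracy is a very lenient metric, so rank-1 is the number that matters for FER.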