by Sigurður Skúli

(Making your own Face Recognition System)

Face recognition is the latest trend when it comes to user authentication. Apple recently launched its new iPhone X, which uses Face ID to authenticate users. OnePlus 5 is getting the Face Unlock feature from the OnePlus 5T soon. And Baidu is using face recognition instead of ID cards to allow its employees to enter their offices. These applications may seem like magic to a lot of people, but in this article we aim to demystify the subject by teaching you how to make your own simplified version of a face recognition system in Python.

GitHub link for those who do not like reading and only want the code

(Background)

Before we get into the details of the implementation, I want to discuss FaceNet, which is the network we will be using in our system.

(FaceNet)

FaceNet is a neural network that learns a mapping from face images to a compact Euclidean space where distances correspond to a measure of face similarity. In other words, the more similar two face images are, the smaller the distance between them.

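As a toy illustration of "similarity as distance", the sketch below measures the Euclidean (L2) distance between encodings with NumPy. The three-dimensional vectors and the helper name `euclidean_distance` are hypothetical and chosen for readability; the FaceNet encodings used later in this article have 128 values.

```python
import numpy as np

def euclidean_distance(enc_a, enc_b):
    """Return the L2 (Euclidean) distance between two face encodings."""
    return float(np.linalg.norm(np.asarray(enc_a) - np.asarray(enc_b)))

# Hypothetical 3-d encodings (real FaceNet encodings have 128 values).
alice_1 = np.array([0.10, 0.80, 0.20])
alice_2 = np.array([0.12, 0.78, 0.25])  # another photo of the same person
bob = np.array([0.90, 0.10, 0.40])      # a different person

same_person = euclidean_distance(alice_1, alice_2)
different_person = euclidean_distance(alice_1, bob)

# Images of the same person should map to nearby points in the embedding space.
print(f"same person:      {same_person:.3f}")
print(f"different person: {different_person:.3f}")
```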

(Triplet Loss)

FaceNet is trained with a distinct loss function called Triplet Loss. Triplet Loss minimises the distance between an anchor and a positive (an image that contains the same identity) and maximises the distance between the anchor and a negative (an image that contains a different identity).

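The Triplet Loss equation appeared as an image (Figure 1) in the original article. Reconstructed from the FaceNet paper, and consistent with the triplet_loss code later in this article, the loss summed over N training triplets should read:

$$
\mathcal{L} = \sum_{i=1}^{N} \left[ \left\lVert f(a^{(i)}) - f(p^{(i)}) \right\rVert_2^2 - \left\lVert f(a^{(i)}) - f(n^{(i)}) \right\rVert_2^2 + \alpha \right]_+
$$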

  • f(a) refers to the output encoding of the anchor
  • f(p) refers to the output encoding of the positive
  • f(n) refers to the output encoding of the negative
  • alpha is a constant margin that stops the network from settling on the trivial solution f(a) - f(p) = f(a) - f(n) = 0, for example by mapping every image to the same encoding.
  • […]+ is equal to max(0, sum), where sum is the expression inside the brackets; negative values are clipped to zero, so triplets that already satisfy the margin contribute no loss.
(Siamese Networks)

FaceNet is a Siamese Network. A Siamese Network is a type of neural network architecture that learns how to differentiate between two inputs. This allows it to learn which images are similar and which are not. These images could, for example, contain faces.

Siamese networks consist of two identical neural networks that share exactly the same weights. First, each network takes one of the two input images as input. Then, the outputs of the last layer of each network are sent to a function that determines whether the images contain the same identity.

In FaceNet, this is done by calculating the distance between the two outputs.

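To make the weight sharing concrete, here is a minimal NumPy sketch of the Siamese idea. A single random linear layer followed by L2 normalisation stands in for the real FaceNet encoder; the weight matrix `W` and the functions `encode` and `siamese_distance` are illustrative assumptions, not part of FaceNet or the utility files. Both inputs go through the same encoder, and the outputs are compared by distance.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# One set of weights shared by both "branches" of the Siamese network.
# A random linear layer stands in for the real FaceNet encoder (hypothetical).
W = rng.normal(size=(128, 64))

def encode(x):
    """Map a flattened 64-value input to a unit-length 128-d encoding using the shared W."""
    z = W @ x
    return z / np.linalg.norm(z)

def siamese_distance(x1, x2):
    """Run both inputs through the SAME encoder and compare the outputs."""
    return float(np.linalg.norm(encode(x1) - encode(x2)))

x = rng.normal(size=64)
x_noisy = x + 0.01 * rng.normal(size=64)  # a slightly perturbed copy of x
y = rng.normal(size=64)                   # an unrelated input

print(siamese_distance(x, x_noisy))  # small: near-identical inputs
print(siamese_distance(x, y))        # larger: unrelated inputs
```

Because there is only one `W`, "two identical networks" is really one network applied twice, which is why the comparison between the two outputs is meaningful.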

(Implementation)

Now that we have clarified the theory, we can jump straight into the implementation.

In our implementation we’re going to be using Keras and TensorFlow. Additionally, we’re using two utility files from deeplearning.ai’s repo to abstract away all interactions with the FaceNet network:

  • fr_utils.py contains functions to feed images to the network and get the encodings of images
  • inception_blocks_v2.py contains functions to prepare and compile the FaceNet network
(Compiling the FaceNet network)

The first thing we have to do is compile the FaceNet network so that we can use it for our face recognition system.

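import os
import glob
import numpy as np
import cv2
import tensorflow as tf
from fr_utils import *             # helpers for loading weights and encoding images
from inception_blocks_v2 import *  # faceRecoModel
from keras import backend as K

# Image tensors are laid out as (channels, height, width)
K.set_image_data_format('channels_first')

# Build the FaceNet model for 96x96 RGB input images
FRmodel = faceRecoModel(input_shape=(3, 96, 96))

def triplet_loss(y_true, y_pred, alpha = 0.3):
    # y_true is required by the Keras loss signature but is not used here
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    # Squared L2 distances: anchor-positive and anchor-negative
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)

    # [pos_dist - neg_dist + alpha]+ summed over the batch
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))

    return loss

FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])

# Load the pre-trained FaceNet weights into the model
load_weights_from_FaceNet(FRmodel)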

We’ll start by initialising our network with an input shape of (3, 96, 96). This means that the Red-Green-Blue (RGB) channels are the first dimension of the image volume fed to the network, and that every image fed to the network must be 96x96 pixels.

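Because the model expects channels-first input, an image loaded with OpenCV (which returns height × width × channels arrays in BGR order) has to be resized to 96x96 pixels (for example with cv2.resize) and reordered before it reaches the network. The utility functions in fr_utils.py take care of this for you; the sketch below only illustrates the kind of preprocessing involved. The helper name `to_channels_first` and the scaling of pixel values to [0, 1] are assumptions, and the exact steps in fr_utils.py may differ.

```python
import numpy as np

def to_channels_first(bgr_image):
    """Convert a 96x96 BGR image (H, W, C) into a (1, 3, 96, 96) RGB batch.

    Hypothetical helper illustrating channels-first preprocessing;
    not taken from fr_utils.py.
    """
    if bgr_image.shape != (96, 96, 3):
        raise ValueError(f"expected a 96x96x3 image, got {bgr_image.shape}")
    rgb = bgr_image[..., ::-1]               # BGR -> RGB (reverse the channel axis)
    chw = np.transpose(rgb, (2, 0, 1))       # (H, W, C) -> (C, H, W)
    scaled = chw.astype(np.float32) / 255.0  # pixel values to [0, 1]
    return scaled[np.newaxis, ...]           # add a batch dimension

# A fake all-blue BGR image: channel 0 (blue in BGR) is 255, the others 0.
fake = np.zeros((96, 96, 3), dtype=np.uint8)
fake[..., 0] = 255
batch = to_channels_first(fake)
print(batch.shape)  # (1, 3, 96, 96)
```

After the BGR-to-RGB flip, the blue values end up in the last channel (index 2) of the channels-first batch.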

Next, we’ll define the Triplet Loss function. The function in the code snippet above implements the Triplet Loss equation from the Background section.

If you are unfamiliar with any of the TensorFlow functions used to perform the calculation (tf.reduce_sum, tf.square, tf.subtract, tf.add and tf.maximum), I’d recommend reading their documentation, as it will improve your understanding of the code. However, comparing the function to the equation in Figure 1 should be enough.

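To check the TensorFlow function against the equation by hand, here is a NumPy version of the same computation on a small, made-up batch. It mirrors triplet_loss above (squared L2 distances, margin alpha = 0.3, clipping at zero, summed over the batch); the function name `triplet_loss_np` and the toy 2-d encodings are hypothetical.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.3):
    """NumPy mirror of the Keras triplet_loss: sum of max(0, pos_dist - neg_dist + alpha)."""
    pos_dist = np.sum(np.square(anchor - positive), axis=-1)  # squared L2, per triplet
    neg_dist = np.sum(np.square(anchor - negative), axis=-1)
    basic_loss = pos_dist - neg_dist + alpha
    return float(np.sum(np.maximum(basic_loss, 0.0)))

# Two hypothetical triplets of 2-d encodings (one row per triplet).
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.1, 0.0], [0.5, 0.0]])
negative = np.array([[1.0, 0.0], [0.6, 0.0]])

# Triplet 1: 0.01 - 1.00 + 0.3 = -0.69 -> clipped to 0 (already well separated)
# Triplet 2: 0.25 - 0.36 + 0.3 =  0.19 -> contributes 0.19
result = triplet_loss_np(anchor, positive, negative)
print(result)  # approximately 0.19 (up to floating-point rounding)
```

Only the second triplet, whose negative is barely farther from the anchor than its positive, contributes to the loss.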

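Once we have our loss function, we can compile our face recognition model using Keras, with the Adam optimizer minimising the loss calculated by the Triplet Loss function. Finally, load_weights_from_FaceNet loads pre-trained FaceNet weights into the model, so we don't have to train the network ourselves.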

(Preparing a Database)

Now that we have compiled FaceNet, we are going to prepare a database of individuals we want our system to recognise. We are going to use all the images contained in our images directory for our database of individuals.

NOTE: We are only going to use one image of each individual in our implementation. This works because the pre-trained FaceNet network produces encodings that are discriminative enough to recognise a person from a single reference image!

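def prepare_database():
    database = {}

    # Use every file in the images directory; the file name
    # (without extension) becomes the person's identity
    for file in glob.glob("images/*"):
        identity = os.path.splitext(os.path.basename(file))[0]
        database[identity] = img_path_to_encoding(file, FRmodel)

    return database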

For each image, we use the file name (without its extension) as the identity and convert the image data to an encoding of 128 floating-point numbers. We do this by calling the function img_path_to_encoding, which takes a path to an image, feeds the image to our face recognition network and returns the network's output: the encoding of the image.

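The data structure itself is just a plain dictionary from identity to encoding. The sketch below builds one without the network so the structure is easy to see: `encode_fn` is a hypothetical stand-in for img_path_to_encoding(file, FRmodel), `fake_encoder` returns made-up 128-value vectors, and the file paths are invented.

```python
import os
import numpy as np

def prepare_database_from_paths(paths, encode_fn):
    """Map identity (file name without extension) -> encoding.

    encode_fn is a hypothetical stand-in for img_path_to_encoding(path, FRmodel).
    """
    database = {}
    for path in paths:
        identity = os.path.splitext(os.path.basename(path))[0]
        database[identity] = encode_fn(path)
    return database

def fake_encoder(path):
    """Hypothetical encoder: a deterministic 128-value vector per path."""
    seed = sum(path.encode("utf-8"))
    return np.random.default_rng(seed).normal(size=128)

db = prepare_database_from_paths(["images/alice.jpg", "images/bob.png"], fake_encoder)
print(sorted(db))         # ['alice', 'bob']
print(db["alice"].shape)  # (128,)
```

Passing the encoder in as a parameter is only a convenience for this sketch; it lets the database logic be exercised without loading FaceNet.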

Once we have added the encoding for each image to our database, our system can finally start recognising individuals!

(Recognising a Face)

As discussed in the Background section, FaceNet is trained to minimise the distance between images of the same individual and maximise the distance between images of different individuals. Our implementation uses this property to determine which individual in the database a new image fed to our system most likely shows.

def who_is_it(image, database, model):
    # Encode the new image with FaceNet
    encoding = img_to_encoding(image, model)

    min_dist = 100
    identity = None

    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        # Euclidean distance between the new encoding and the stored one
        dist = np.linalg.norm(db_enc - encoding)

        print('distance for %s is %s' % (name, dist))

        # Keep track of the closest match so far
        if dist < min_dist:
            min_dist = dist
            identity = name

    # Reject the closest match if it is still too far away
    if min_dist > 0.52:
        return None
    else:
        return identity

The function above feeds the new image into a utility function called img_to_encoding. The function processes an image using FaceNet and returns the encoding of the image. Now that we have the encoding, we can find the individual that the image most likely belongs to.

To find the individual, we go through our database and calculate the distance between our new image and each individual in the database. The individual with the lowest distance to the new image is then chosen as the most likely candidate.

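The loop in who_is_it can also be written in vectorised NumPy form, which computes all distances in one call. This is a sketch assuming every stored encoding is a NumPy array of the same length (possibly with a leading batch dimension, which np.ravel removes); `find_closest` is a hypothetical name, and 0.52 is the threshold used in this article.

```python
import numpy as np

def find_closest(encoding, database, threshold=0.52):
    """Return (identity or None, min_dist) for the closest database entry."""
    if not database:
        return None, float("inf")
    names = list(database)
    # Stack all stored encodings into an (N, D) matrix.
    stored = np.stack([np.ravel(database[name]) for name in names])
    # Euclidean distance from the query to every stored encoding at once.
    dists = np.linalg.norm(stored - np.ravel(encoding), axis=1)
    best = int(np.argmin(dists))
    min_dist = float(dists[best])
    # Same decision rule as who_is_it: reject matches above the threshold.
    return (names[best] if min_dist <= threshold else None), min_dist

# Hypothetical 3-d encodings for illustration.
database = {
    "alice": np.array([0.0, 0.0, 1.0]),
    "bob": np.array([1.0, 0.0, 0.0]),
}
match = find_closest(np.array([0.1, 0.0, 0.95]), database)
unknown = find_closest(np.array([0.0, 1.0, 0.0]), database)
print(match)    # alice, with a small distance
print(unknown)  # None: too far from everyone in the database
```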

Finally, we must determine whether the candidate image and the new image actually contain the same person, because by the end of our loop we have only found the most likely individual, and the closest match is not necessarily a true match. This is where the following code snippet comes into play.

if min_dist > 0.52:
    return None
else:
    return identity
  • If the distance is above 0.52, then we determine that the individual in the new image does not exist in our database.
  • But, if the distance is equal to or below 0.52, then we determine they are the same individual!

The tricky part here is that I arrived at the value 0.52 through trial and error on my specific dataset. The best value might be much lower or slightly higher, depending on your implementation and data, so I recommend trying out different values and seeing what fits your system best!

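One way to make this trial and error more systematic is to collect distances for pairs of images whose labels you know (same person or different people) and sweep candidate thresholds, keeping the one that classifies the most pairs correctly. The sketch below does this with hypothetical distances and a hypothetical helper `best_threshold`; with real data you would compute the distances from FaceNet encodings of your own images.

```python
import numpy as np

def best_threshold(distances, same_person, candidates):
    """Pick the threshold that maximises accuracy on labelled pairs.

    A pair is predicted "same person" when its distance <= threshold,
    matching the decision rule used in who_is_it.
    """
    distances = np.asarray(distances, dtype=float)
    same_person = np.asarray(same_person, dtype=bool)
    best_t, best_acc = None, -1.0
    for t in candidates:
        predictions = distances <= t
        acc = float(np.mean(predictions == same_person))
        if acc > best_acc:  # strict ">" keeps the smallest best threshold
            best_t, best_acc = float(t), acc
    return best_t, best_acc

# Hypothetical distances for labelled image pairs.
dists = [0.31, 0.42, 0.48, 0.57, 0.62, 0.71, 0.83]
same = [True, True, True, False, True, False, False]

candidates = np.arange(30, 90, 5) / 100  # 0.30, 0.35, ..., 0.85
t, acc = best_threshold(dists, same, candidates)
print(f"threshold={t:.2f}, accuracy={acc:.2f}")
```

Plain accuracy treats false accepts and false rejects as equally bad; for an authentication system you may prefer to weight false accepts more heavily when choosing the threshold.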

(Building a System using Face Recognition)

Now that we know the details on how we recognise a person using a face recognition algorithm, we can start having some fun with it.

The GitHub repository I linked to at the beginning of this article contains a demo that uses a laptop’s webcam to feed video frames to our face recognition algorithm. Once the algorithm recognises an individual in the frame, the demo plays an audio message that welcomes the user by the name of their image in the database. Figure 3 shows an example of the demo in action.

(Conclusion)

By now you should be familiar with how face recognition systems work and how to make your own simplified face recognition system in Python using a pre-trained version of the FaceNet network!

If you want to play around with the demonstration in the GitHub repository and add images of people you know, go ahead and fork the repository.

Have some fun with the demonstration and impress all your friends with your awesome knowledge of face recognition!

Originally published at: https://www.freecodecamp.org/news/making-your-own-face-recognition-system-29a8e728107c/