Computer Vision Fundamentals

My introduction to Computer Vision came in 2017, while I was doing the Self-Driving Car Nanodegree from Udacity. The first semester was mainly about Computer Vision and Deep Learning, which sparked my interest in the subject. This post gives a basic introduction to Computer Vision and then covers camera calibration and affine transformations.

The goal of computer vision is to help machines see and understand the content of digital images. It deals with perceiving and understanding the world around you through images. Each digital image is made up of pixels, which are the smallest building blocks of an image. Mathematically, each pixel holds values for different features, such as color. A simplified example is an image in the RGB color scheme, where every pixel contains values for red, green, and blue. In this case the image can be seen as a matrix whose values can be used by different algorithms. A video stream is just a sequence of 2D images played over time.
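To make the "image as a matrix" idea concrete, here is a minimal sketch using NumPy only. The tiny 2x3 image and its pixel values are made up for illustration; a real image loaded with OpenCV would have the same kind of shape, just larger.

```python
import numpy as np

# A tiny 2x3 RGB image: the shape is (height, width, channels).
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = (255, 0, 0)   # top-left pixel is pure red
image[1, 2] = (0, 0, 255)   # bottom-right pixel is pure blue

height, width, channels = image.shape
red_channel = image[:, :, 0]   # 2D matrix holding only the red values

# A video is then a stack of such frames over time:
# (frames, height, width, channels).
video = np.stack([image, image, image])
```

Note that OpenCV's imread returns the channels in BGR order rather than RGB, which matters when you index individual channels.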

Different algorithms can be used to extract information from images and videos. These algorithms might look at different features in the image and apply different techniques:

  • Color detection: different colors are encoded as different numeric values.
  • Edge detection: helps the computer distinguish between different object shapes, sizes, etc.
  • Masking/unmasking: using only a specified region of interest. For example, if you are looking for lane lines in dashcam footage, you might only want to look at the lower half of the image.
  • Shape and feature extraction: using colors and shapes to identify objects.
  • Machine/deep learning: a model can also use these features to learn about different objects by itself.
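Two of the techniques above, color detection and masking, can be sketched with plain NumPy. The 4x4 frame, the threshold of 200, and the "lower half" region are all assumptions chosen for illustration, not values from a real pipeline.

```python
import numpy as np

# Hypothetical 4x4 RGB frame standing in for a dashcam image.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[3, 1] = (255, 255, 255)   # a bright "lane" pixel in the lower half
frame[0, 2] = (255, 255, 255)   # a bright pixel in the upper half (e.g. sky)

# Color detection: keep pixels whose R, G and B values all exceed a threshold.
threshold = 200   # assumed value
bright = np.all(frame > threshold, axis=2)

# Masking: only the lower half of the image is the region of interest.
roi = np.zeros(frame.shape[:2], dtype=bool)
roi[frame.shape[0] // 2:, :] = True

# Pixels that are both bright and inside the region of interest.
lane_candidates = bright & roi
```

Only the bright pixel in the lower half survives; the one in the upper half is discarded by the mask.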

How do you apply these techniques and algorithms? There are different libraries, but OpenCV is one of the most versatile and widely used. It is open source, was originally developed by Intel, and supports various programming languages such as Python and C++. It is well documented, and because of its large user base, online help is readily available.

These algorithms and techniques can then be used for various Computer Vision tasks such as:

  • Object Classification: What broad category of object is in this image?
  • Object Identification: Which type of a given object is in this image?
  • Object Verification: Is the object in the image?
  • Object Detection: Where are the objects in the image?
  • Object Landmark Detection: What are the key points for the object in the image?
  • Object Segmentation: What pixels belong to the object in the image?
  • Object Recognition: What objects are in this image and where are they?

But before these algorithms can be applied, some image processing is required. Image processing is an integral part of computer vision. Images are processed:

  • to prepare the image for the algorithm
  • to clean up an image or a dataset for the algorithm to use
  • to generate new images for machine/deep learning to use
  • to better understand the scene, for example by using a perspective transform

Camera Model

The image coming out of any camera must first be undistorted. Image distortion occurs when a camera looks at 3D objects in the real world and transforms them into a 2D image; this transformation isn't perfect. Distortion changes the apparent shape and size of these 3D objects. So the first step in analysing camera images is to undo this distortion so that you can get correct and useful information out of them.

[Figure. Image courtesy: Udacity]

In a pinhole camera, a 2D image is formed when light from 3D objects in the real world passes through a small aperture (the pinhole) and lands on the screen. The image formed is inverted, as shown in the figure. The 3D object points are related to the 2D image points through the camera matrix.

[Figure: pinhole camera model. Image courtesy: Udacity]

However, most cameras don't just use a pinhole. They use lenses, which cause distortion.

Types of Distortion

Radial Distortion

Real cameras use curved lenses to form an image, and light rays often bend a little too much or too little at the edges of these lenses. This creates an effect that distorts the edges of images, so that lines or objects appear more or less curved than they actually are. This is called radial distortion, and it’s the most common type of distortion.

[Figure: radial distortion. Image courtesy: OpenCV]

Another type of distortion is tangential distortion. This occurs when a camera's lens is not aligned perfectly parallel to the imaging plane, where the camera film or sensor is. This makes an image look tilted, so that some objects appear farther away or closer than they actually are.

[Figure: tangential distortion. Courtesy: Udacity]

Distortion Coefficients and Correction

The first step in distortion correction is finding the distortion coefficients. Three coefficients are needed to correct for radial distortion (k1, k2, and k3) and two for tangential distortion (p1 and p2):

$$\text{distortion coefficients} = (k_1,\ k_2,\ p_1,\ p_2,\ k_3)$$

To correct the appearance of radially distorted points in an image, one can use a correction formula:

$$x_{\text{corrected}} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
$$y_{\text{corrected}} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

where (x, y) is a point in the distorted image, k1, k2, and k3 are the radial distortion coefficients of the lens, and r^2 = x^2 + y^2.

To undistort these points, OpenCV calculates r, the distance between a point in the undistorted (corrected) image and the center of the image distortion. This center is often the center of the image and is sometimes referred to as the distortion center.

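Similarly, the tangential distortion correction can be applied as:

$$x_{\text{corrected}} = x + \left[\,2 p_1 x y + p_2 (r^2 + 2x^2)\,\right]$$
$$y_{\text{corrected}} = y + \left[\,p_1 (r^2 + 2y^2) + 2 p_2 x y\,\right]$$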
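The radial and tangential terms can be evaluated together in a few lines of NumPy. This is only a sketch of the polynomial model described above: the function name apply_distortion_model is hypothetical, and adding the two corrections into a single expression is an assumption about how they are combined, not a copy of OpenCV's implementation.

```python
import numpy as np

def apply_distortion_model(x, y, k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    """Evaluate the radial and tangential terms for a point (x, y).

    x and y may be floats or NumPy arrays of normalised coordinates.
    """
    r2 = x**2 + y**2
    # Radial term: 1 + k1*r^2 + k2*r^4 + k3*r^6
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    # Tangential terms
    x_tan = 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    y_tan = p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    return x * radial + x_tan, y * radial + y_tan

# With all coefficients at zero the point is unchanged.
same = apply_distortion_model(0.5, -0.2)

# A purely radial example: r^2 = 1, so x is scaled by 1 + k1.
radial_only = apply_distortion_model(1.0, 0.0, k1=0.1)

# The function also works element-wise on arrays of points.
xs, ys = apply_distortion_model(np.array([0.0, 1.0]), np.array([0.0, 1.0]), p1=0.1)
```

Points near the center (small r) barely move, while points near the edges move more, which matches the observation that distortion is strongest at the edges of the image.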

The coordinates x and y used here are normalised image coordinates. Normalised image coordinates are calculated from pixel coordinates by translating to the optical center and dividing by the focal length in pixels. The focal lengths and the optical center are collected in the camera matrix:

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where fx and fy are the camera focal lengths and cx and cy are the coordinates of the optical center.
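The conversion between pixel coordinates and normalised image coordinates can be sketched as follows. The intrinsic values (fx, fy, cx, cy) are invented for illustration, and the helper names pixel_to_normalized and normalized_to_pixel are hypothetical.

```python
import numpy as np

# Assumed intrinsics, for illustration only.
fx, fy = 800.0, 800.0   # focal lengths in pixels
cx, cy = 640.0, 360.0   # optical center in pixels

camera_matrix = np.array([[fx, 0.0, cx],
                          [0.0, fy, cy],
                          [0.0, 0.0, 1.0]])

def pixel_to_normalized(u, v, K):
    """Translate to the optical center, then divide by the focal length."""
    return (u - K[0, 2]) / K[0, 0], (v - K[1, 2]) / K[1, 1]

def normalized_to_pixel(x, y, K):
    """Inverse mapping: multiply by the focal length, then add the center."""
    return K[0, 0] * x + K[0, 2], K[1, 1] * y + K[1, 2]

center = pixel_to_normalized(640.0, 360.0, camera_matrix)
right_edge = pixel_to_normalized(1440.0, 360.0, camera_matrix)
round_trip = normalized_to_pixel(*right_edge, camera_matrix)
```

The optical center maps to (0, 0), which is why r in the distortion formulas measures the distance from the distortion center.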

The distortion coefficient k3 is required to accurately reflect major radial distortion (like in wide angle lenses). However, for minor radial distortion, which most regular camera lenses have, k3 has a value close to or equal to zero and is negligible. So, in OpenCV, you can choose to ignore this coefficient; this is why it appears at the end of the distortion values array: [k1, k2, p1, p2, k3].

Methodology

For distortion correction, the most common approach is to use chessboard images. The process involves mapping distorted points to undistorted points in order to measure the amount of distortion. A chessboard is a great place to start because it has many reference points (corners) that can be used to identify distortion at various locations in the image. It's better to do this for multiple images in order to cover the whole field of view. The general recommendation is to use more than 20 images.

1- Get a chessboard and take pictures of it (>20) from different angles to have a starting set. The basic idea is to find the corners in the distorted chessboard images and map them to undistorted corners in the real world.

2- Start by preparing the object points, which are the undistorted coordinates of the chessboard corners in the world. Assume the chessboard is fixed on the (x, y) plane at z=0, so that the object points are the same for each calibration image. Thus objp is just a replicated array of coordinates, and a copy of it is appended to objpoints every time the corners are detected in an image.

3- All chessboard corners in a test image are detected with the OpenCV function findChessboardCorners() and drawn with drawChessboardCorners().

[Figure: output from findChessboardCorners() and drawChessboardCorners()]

4- imgpoints, which holds the corners as seen in the distorted 2D image, is appended with the (x, y) pixel position of each corner in the image plane after each successful chessboard detection.

5- Output objpoints and imgpoints are used to compute the camera calibration and distortion coefficients using the cv2.calibrateCamera() function.

6- This distortion correction is applied to the test image using the cv2.undistort() function, which gives this result:

[Figure: test image after distortion correction]

After that, the camera matrix and distortion coefficients are written to wide_dist_pickle.p to be used later for distortion correction of camera images.
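The object points from step 2 can be sketched on their own, without any images. The 9x6 board size (inner corners) and the loop over three pretend detections are assumptions for illustration.

```python
import numpy as np

nx, ny = 9, 6   # inner corners per row and per column (assumed board size)

# One row per corner: (x, y, z) with z fixed at 0 because the board is flat.
objp = np.zeros((nx * ny, 3), np.float32)
objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)

# The same grid is appended once per image in which corners were found.
objpoints = []
for _ in range(3):   # pretend corners were found in three images
    objpoints.append(objp)
```

The grid runs along x first: (0,0,0), (1,0,0), ..., (8,0,0), then (0,1,0), and so on up to (8,5,0). The units are "chessboard squares", which is enough for estimating the distortion coefficients.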

Functions for distortion correction

Finding chessboard corners (for an 8x6 board):

ret, corners = cv2.findChessboardCorners(gray, (8,6), None)

Drawing detected corners on an image:

img = cv2.drawChessboardCorners(img, (8,6), corners, ret)

Camera calibration, given object points, image points, and the shape of the grayscale image:

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

Undistorting a test image:

dst = cv2.undistort(img, mtx, dist, None, mtx)

The image size passed into the calibrateCamera function is the width and height of the image, in that order. One way to get these values is to reverse the grayscale image's shape array, which is stored as (height, width): gray.shape[::-1].

Another way to get the image size is to take it directly from the color image, by reversing the first two values of the color image's shape array with img.shape[1::-1]. The slice is needed because color images have three dimensions (height, width, and depth), whereas grayscale images have only two.
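The two slicing expressions can be checked on blank arrays. The 1280x720 size is an assumption; the arrays stand in for what cv2.imread and cv2.cvtColor would return.

```python
import numpy as np

color = np.zeros((720, 1280, 3), dtype=np.uint8)   # stand-in for a color image
gray = np.zeros((720, 1280), dtype=np.uint8)       # stand-in for its grayscale version

# Both expressions give (width, height), the order calibrateCamera expects.
size_from_gray = gray.shape[::-1]
size_from_color = color.shape[1::-1]
```

Using color.shape[::-1] instead would give (3, 1280, 720), which is why the slice starts at index 1 for color images.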

Full Code

import glob
import pickle

import cv2
import matplotlib.pyplot as plt
import numpy as np

# Prepare object points, like (0,0,0), (1,0,0), (2,0,0), ..., (8,5,0)
objp = np.zeros((6*9, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

# Arrays to store object points and image points from all the images
objpoints = []  # 3D points in real-world space
imgpoints = []  # 2D points in the image plane

# Make a list of calibration images
images = glob.glob('/Users/architrastogi/Documents/camera_cal/calibration*.jpg')

# Step through the list and search for chessboard corners
for idx, fname in enumerate(images):
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (9, 6), None)

    # If found, add object points and image points
    if ret:
        objpoints.append(objp)
        imgpoints.append(corners)
        cv2.startWindowThread()
        # Draw the corners on the image
        cv2.drawChessboardCorners(img, (9, 6), corners, ret)
        write_name = 'corners_found' + str(idx) + '.jpg'
        print(write_name)

# Test undistortion on an image
img = cv2.imread('/Users/architrastogi/Documents/camera_cal/calibration2.jpg')
img_size = (img.shape[1], img.shape[0])

# Do camera calibration given object points and image points
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, img_size, None, None)

# Apply the undistortion to an image based on the calibration
dst = cv2.undistort(img, mtx, dist, None, mtx)

# Write out the image
cv2.imwrite('/Users/architrastogi/Documents/output_images/test_undist.jpg', dst)

# Save the camera calibration result for later use (we won't worry about rvecs / tvecs)
dist_pickle = {"mtx": mtx, "dist": dist}
with open("/Users/architrastogi/Documents/camera_cal/wide_dist_pickle.p", "wb") as pickle_file:
    pickle.dump(dist_pickle, pickle_file)

# Visualize undistortion (OpenCV loads images as BGR, so convert to RGB for matplotlib)
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
ax1.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
ax1.set_title('Original Image', fontsize=30)
ax2.imshow(cv2.cvtColor(dst, cv2.COLOR_BGR2RGB))
ax2.set_title('Undistorted Image', fontsize=30)
plt.show()
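Reading the saved calibration back later could look roughly like this. It is a sketch only: the matrix and coefficient values are made-up placeholders rather than results of a real calibration, and the file is written to a temporary directory so the example runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np

# Placeholder values standing in for the mtx and dist returned by calibrateCamera.
mtx = np.array([[800.0, 0.0, 640.0],
                [0.0, 800.0, 360.0],
                [0.0, 0.0, 1.0]])
dist = np.array([[-0.24, -0.02, -0.001, 0.0003, 0.0]])   # [k1, k2, p1, p2, k3]

# Hypothetical location; the post itself saves to a path under Documents.
path = os.path.join(tempfile.mkdtemp(), "wide_dist_pickle.p")

# Save, as in the full code.
with open(path, "wb") as pickle_file:
    pickle.dump({"mtx": mtx, "dist": dist}, pickle_file)

# Load the calibration back in a later session.
with open(path, "rb") as pickle_file:
    loaded = pickle.load(pickle_file)

mtx_loaded, dist_loaded = loaded["mtx"], loaded["dist"]
# These can then be passed to cv2.undistort(img, mtx_loaded, dist_loaded, None, mtx_loaded).
```

Saving the calibration once means the chessboard images only need to be processed a single time per camera.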

Affine Transformations

Once distortion correction is done, the images can be used for computer vision work. Geometric transformations can be applied to them for various purposes. The most common are affine transformations. Affine transformations are those that can be expressed as a matrix multiplication (linear transformation) followed by a vector addition (translation). The reasons you might want to apply transformations include:

  • To enhance the dataset: sometimes machine/deep learning algorithms need a bigger dataset than is available. In those cases one can augment the dataset by applying these transformations to the images.
  • To extract some particular information: you might only be interested in rotated figures, or you might need a bird's eye view for your algorithm.
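The definition of an affine transformation, a matrix multiplication followed by a vector addition, can be written out directly in NumPy. The helper name affine, the scale factor of 2, and the offset (10, 5) are assumptions for illustration.

```python
import numpy as np

def affine(points, A, b):
    """Apply x' = A x + b to an (N, 2) array of points."""
    return points @ A.T + b

A = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # linear part: uniform scaling by 2
b = np.array([10.0, 5.0])    # translation part

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
moved = affine(square, A, b)

# The same transformation packed into a single 2x3 matrix [A | b],
# applied to points written as (x, y, 1).
M = np.hstack([A, b.reshape(2, 1)])
homogeneous = np.hstack([square, np.ones((4, 1))])
moved_via_M = homogeneous @ M.T
```

Packing A and b into one 2x3 matrix is the form that cv2.warpAffine takes as its transformation argument.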

Different Affine Transformations and Their Implementation

OpenCV provides two transformation functions, cv2.warpAffine and cv2.warpPerspective. cv2.warpAffine takes a 2x3 transformation matrix, while cv2.warpPerspective takes a 3x3 one.

Scaling

Scaling is a linear transformation that enlarges or shrinks objects by a scale factor that is the same in all directions. Scaling is just resizing of the image. OpenCV comes with a function cv2.resize() for this purpose.

Translation

A translation moves every point by a constant distance in a specified direction. Mathematically, the transformation matrix M can be represented as

$$M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}$$

where tx and ty are the translations in x and y.
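What a translation does to the pixel grid can be sketched in NumPy without OpenCV. The helper translate is hypothetical and only handles non-negative shifts smaller than the image; filling the uncovered area with zeros (black) is an assumption that matches the usual default look of a shifted image.

```python
import numpy as np

def translate(image, tx, ty):
    """Shift image content right by tx and down by ty pixels.

    Assumes 0 <= tx < width and 0 <= ty < height; uncovered pixels stay 0.
    """
    rows, cols = image.shape[:2]
    if not (0 <= tx < cols and 0 <= ty < rows):
        raise ValueError("shift must be non-negative and smaller than the image")
    out = np.zeros_like(image)
    out[ty:, tx:] = image[:rows - ty, :cols - tx]
    return out

img = np.zeros((3, 3), dtype=np.uint8)
img[0, 0] = 9                      # mark the top-left pixel

shifted = translate(img, tx=1, ty=2)

# The same move expressed with the matrix M acting on the point (x, y, 1).
M = np.float32([[1, 0, 1], [0, 1, 2]])
new_xy = M @ np.float32([0, 0, 1])
```

The marked pixel moves from (x=0, y=0) to (x=1, y=2), which is row 2, column 1 in array indexing.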

If the original picture looks like this:

[Figure: original image]

then the translated result looks like this:

[Figure: grayscaled translated image]

Sample code to achieve this could look like:

import cv2
import numpy as np

# Read the image as grayscale (flag 0)
img = cv2.imread('/Users/architrastogi/Documents/blog/michigan.jpeg', 0)
rows, cols = img.shape
# Translation matrix: shift 100 px in x and 50 px in y
M = np.float32([[1, 0, 100], [0, 1, 50]])
dst = cv2.warpAffine(img, M, (cols, rows))
cv2.imwrite('/Users/architrastogi/Documents/blog/michigan_trans.jpeg', dst)
cv2.imshow('img', dst)
cv2.waitKey(0)
cv2.destroyAllWindows()

Rotation

Rotation is a circular transformation around a point or an axis. We can specify the angle of rotation to rotate our image around a point or an axis.

The rotation transformation matrix can be defined as

$$M = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

where θ (theta) is the angle of rotation.
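The rotation matrix can be tried on a single point with NumPy. The helper name rotation_matrix is hypothetical, and this is the plain 2x2 rotation about the origin, not the 2x3 matrix that cv2.getRotationMatrix2D returns.

```python
import numpy as np

def rotation_matrix(theta_deg):
    """2x2 rotation about the origin by theta_deg degrees."""
    theta = np.deg2rad(theta_deg)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

R = rotation_matrix(90)

# Rotating the unit vector along x by 90 degrees gives the unit vector along y.
rotated = R @ np.array([1.0, 0.0])

# Rotations preserve lengths and angles: R times its transpose is the identity.
identity_check = R @ R.T
```

cv2.getRotationMatrix2D additionally takes a center and a scale, and folds the shift needed to rotate about that center into a third column. Its documentation also defines positive angles as counter-clockwise with the origin at the top-left corner, so its signs differ from the textbook matrix above.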

[Figure: grayscaled image rotated by 90 degrees]

Sample code to achieve this:

import cv2

# Read the image as grayscale (flag 0)
img = cv2.imread('/Users/architrastogi/Documents/blog/michigan.jpeg', 0)
rows, cols = img.shape
# Rotate by 90 degrees about the image center, with scale 1
M = cv2.getRotationMatrix2D((cols/2, rows/2), 90, 1)
dst = cv2.warpAffine(img, M, (cols, rows))
cv2.imwrite('/Users/architrastogi/Documents/blog/michigan_rot.jpeg', dst)
cv2.imshow('img', dst)
cv2.waitKey(0)
cv2.destroyAllWindows()

Perspective transform

A perspective transform maps the points in a given image to different, desired image points with a new perspective. One of the most common uses of a perspective transform is to convert an image to a bird's eye view.

[Figure: perspective transform to a bird's eye view. Courtesy: Udacity]

Aside from creating a bird’s eye view representation of an image, a perspective transform can also be used for all kinds of different view points.

[Figure: perspective transform to a different viewpoint. Courtesy: Udacity]

The difference between camera calibration and a perspective transform is that in a perspective transform you map image points to different image points, while in calibration you map object points to image points. OpenCV provides dedicated functions for perspective transforms.

Compute the perspective transform, M, given source and destination points:

M = cv2.getPerspectiveTransform(src, dst)

Compute the inverse perspective transform:

Minv = cv2.getPerspectiveTransform(dst, src)

Warp an image using the perspective transform, M:

warped = cv2.warpPerspective(img, M, img_size, flags=cv2.INTER_LINEAR)

You can either pick the source points manually or detect them programmatically.
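To show what "mapping four source points to four destination points" means mathematically, here is a NumPy sketch that solves for a 3x3 perspective matrix from four point pairs. It illustrates the underlying linear system and is not OpenCV's implementation; the helper names perspective_matrix and warp_point and the trapezoid coordinates are hypothetical.

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply H to a single point and divide by the third coordinate."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# A trapezoid (like a lane seen from a dashcam) mapped to a rectangle (bird's eye view).
src = [(2, 0), (4, 0), (6, 4), (0, 4)]
dst = [(0, 0), (6, 0), (6, 4), (0, 4)]

M = perspective_matrix(src, dst)
Minv = perspective_matrix(dst, src)   # inverse transform: swap the point lists
```

Swapping src and dst gives the inverse mapping, which mirrors how Minv is obtained with cv2.getPerspectiveTransform(dst, src) in the snippet above.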

That's all for now.

Written while listening to Father John Misty

Translated from: https://medium.com/swlh/i-see-you-computer-vision-fundamentals-64cc662d0b05
