OpenCV: detecting geometric shapes



A simple yet powerful pipeline for detecting shapes in scanned documents

What is this about?

One of the most rapidly growing subfields of Artificial Intelligence is Natural Language Processing (NLP). It deals with the interactions between computers and human (natural) languages, in particular how to program computers to process and make sense of large amounts of natural language data.

Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation, among others. Of these, information extraction problems such as NER (Named Entity Recognition) are fast becoming one of the cornerstone applications of NLP. In this post, I am going to share a solution to one of the trickiest problems that come up while performing NER.

Why do we need a custom solution?

Photo by Rock'n Roll Monkey on Unsplash

Recent developments in Deep Learning have led to an explosion of sophisticated techniques for entity extraction and other NLP-related tasks. More often than not, enterprise-grade OCR software (ABBYY, ADLIB, etc.) is used to transform massive volumes of unstructured and image-based documents into fully searchable PDF and PDF/A assets. Subsequently, one can use state-of-the-art algorithms (BERT, ELMo, etc.) to create highly contextual language models that interpret the extracted information and achieve the NLP objective.

In reality, though, not all documents consist solely of language-based data. A document can have many other non-linguistic elements, such as radio buttons, a signature block, or some other geometric shape, that may contain useful information but cannot easily be interpreted by OCR or by any of the aforementioned algorithms. So there is a need for a specialized solution to identify and interpret such elements, and that's our Why.

An example of check boxes and radio buttons in a document

How do we do it?

Now, this is where things get interesting. How do we extract and identify such elements from a scanned document? To answer this, the author proposes a three-step architecture that could potentially be used to detect any shape (a universal shape detector? Maybe). It's a pretty straightforward approach, and one that promises good accuracy.

Step 1: Convert the documents (PDFs, etc.) to image files. Write heuristic code based on OpenCV APIs to extract all potential image segments. This code should be optimized for coverage (recall) rather than accuracy (precision).

Step 2: Label the images extracted in Step 1 accordingly. Create a CNN-based deep learning network and train it on the labelled images. This step takes care of accuracy.

Step 3: Create an Sklearn pipeline that integrates both of the above steps, so that when a document is ingested, all potential image segments are extracted and the trained CNN model is then used to predict which of them have the desired shape.

A high-level overview of the solution

Design Considerations

It's important that the OpenCV code identifies as many image segments of the desired shape as possible. Essentially, we need a wide detection range and need not worry about false positives; they will be taken care of by the subsequent ConvNet model. We also need to parameterize the classes/functions as fully as possible, which makes it easy to configure the code for a variety of documents going forward. I have chosen a CNN for image classification because it is easy and quick to model, but any other algorithm can be used as long as performance and accuracy are within acceptable limits.
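Because Step 1 is tuned for coverage, its main quality metric is recall: how many of the true checkboxes/radio buttons end up among the candidates. A minimal sketch for measuring that against a few hand-labelled boxes follows. It is not part of the author's code; boxes are assumed to be `(x, y, w, h)` tuples, and the IoU ≥ 0.5 matching rule is a common but arbitrary choice.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def candidate_recall(candidates, ground_truth, min_iou=0.5):
    """Fraction of labelled boxes matched by at least one candidate box.

    Extra candidates that match nothing (false positives) are not penalised,
    because the CNN in Step 2 is expected to reject them.
    """
    if not ground_truth:
        return 1.0
    found = sum(
        any(iou(gt, c) >= min_iou for c in candidates) for gt in ground_truth
    )
    return found / len(ground_truth)
```

Tuning the OpenCV parameters then amounts to pushing this recall close to 1.0 while keeping the number of candidates manageable for the CNN.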

Pipelining plays a pivotal role in structuring ML code. It helps streamline the workflow and enforces the order in which steps are executed. Moreover, production-level code should always be pipelined.

Let's take 3 steps

Step #1: The OpenCV code

This code serves a dual purpose: 1) creating training/test data (when executed standalone) and 2) extracting image segments when integrated into the pipeline.

The extraction code can currently detect 2 types of elements (radio buttons and checkboxes), but additional shapes can easily be supported by adding new methods to the ShapeFinder class; one such method identifies squares/rectangles, aka checkboxes (go here to see the complete code base).

*Use pdf2image to convert PDFs to images. I have not included this step in the Git repo because my data was already in image format.

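import os

import numpy as np
from pdf2image import convert_from_path

# Windows only: folder containing the Poppler binaries.
# On Linux/macOS, where Poppler is on the PATH, pass poppler_path=None.
POPPLER_PATH = r'C:\Program Files (x86)\poppler-0.68.0_x86\poppler-0.68.0\bin'


def Img2Pdf(dirname, dpi=300, poppler_path=POPPLER_PATH):
    """Despite its name, converts every page of every PDF in dirname to an image array."""
    images = []
    for filename in sorted(os.listdir(dirname)):
        # Check the extension of each file, not of the directory name
        if os.path.splitext(filename)[1].lower() != '.pdf':
            continue
        pages = convert_from_path(os.path.join(dirname, filename), dpi=dpi,
                                  poppler_path=poppler_path)
        # convert_from_path returns PIL images; OpenCV works on NumPy arrays
        images.extend(np.array(page) for page in pages)
    return images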

Now let's talk about step #2, i.e., the Convolutional Neural Network.

Since the extracted image segments have relatively small dimensions, a simple 3-layer CNN will do, but we still need to add some regularization and use the Adam optimizer to train it.
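A minimal Keras sketch of such a network is given here. The 32×32 grayscale input, the filter counts, the L2 weight decay, and the dropout rate are all assumptions rather than the author's settings; the output is a single sigmoid unit for the binary "desired shape vs. not" decision.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_shape_classifier(input_shape=(32, 32, 1), l2=1e-4, dropout=0.3,
                           learning_rate=1e-3):
    """Three conv blocks and a sigmoid output, regularised with L2 and dropout."""
    reg = keras.regularizers.l2(l2)
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, padding="same", activation="relu", kernel_regularizer=reg),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu", kernel_regularizer=reg),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu", kernel_regularizer=reg),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(dropout),
        layers.Dense(1, activation="sigmoid"),  # P(segment has the desired shape)
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Training would then be the usual `model.fit(x_train, y_train, validation_split=0.2, epochs=..., batch_size=...)` call on the labelled crops, with 0/1 labels.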

The network should be trained separately on each type of image sample for better accuracy. You may create a new network when a new shape is added, but for now I have used the same one for both checkboxes and radio buttons. It is currently only a binary classifier, but further categorization is also possible, for example:

  • Ticked checkbox
  • Empty checkbox
  • Others
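Moving beyond binary classification mostly changes the network's head: a `Dense(3, activation="softmax")` output trained with `sparse_categorical_crossentropy` instead of one sigmoid unit. Turning the softmax scores back into labels could look like the following sketch; the class names and the rule of falling back to "others" when the top score is below a confidence threshold are assumptions, not part of the original design.

```python
import numpy as np

CLASS_NAMES = ("ticked_checkbox", "empty_checkbox", "others")  # hypothetical label set


def decode_predictions(probs, class_names=CLASS_NAMES, min_confidence=0.6,
                       fallback="others"):
    """Map an (N, C) array of softmax scores to N class names.

    Rows whose best score is below min_confidence are reported as `fallback`,
    so uncertain crops are not silently forced into a checkbox class.
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return [class_names[i] if c >= min_confidence else fallback
            for i, c in zip(best, conf)]
```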

Finally, in step #3 we will stitch everything together in a single Sklearn pipeline and expose it through the predict function.

One important piece of functionality that I have not covered is associating each checkbox or radio button with its corresponding text in the document. Frankly, detecting elements without that association is useless in real-world applications. I will leave this as an open challenge for you, but think of it as a text-proximity problem.
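As a starting point for that challenge, here is one way to frame it as a proximity problem. It assumes OCR has already produced word or line boxes with their text (for example from Tesseract), all as `(x, y, w, h)` in the same pixel coordinates as the detected elements. The preference for text to the right on the same line and the 150-pixel cap are assumptions about typical form layouts, not a tested rule.

```python
from math import hypot


def box_gap(a, b):
    """Shortest distance between two axis-aligned (x, y, w, h) boxes (0 if they touch)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return hypot(dx, dy)


def nearest_label(box, text_boxes, max_dist=150):
    """Pick the OCR text that most plausibly labels a checkbox/radio button.

    text_boxes is a list of (text, (x, y, w, h)) pairs. Text that starts to the
    right of the element and overlaps it vertically (the usual "[ ] Yes" layout)
    is preferred; otherwise the nearest text wins. Returns None if nothing lies
    within max_dist pixels.
    """
    x, y, w, h = box
    best, best_score = None, float("inf")
    for text, tb in text_boxes:
        dist = box_gap(box, tb)
        if dist > max_dist:
            continue
        tx, ty, tw, th = tb
        same_line = ty < y + h and y < ty + th  # vertical overlap
        to_right = tx >= x + w / 2
        score = dist if (same_line and to_right) else dist + max_dist
        if score < best_score:
            best, best_score = text, score
    return best
```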

Final thoughts

‘One size doesn’t always fit all’, and this is especially true here; think of this code as a kind of template. As-is, it is not intended to work for everyone, and that’s perfectly fine, but the approach will always work for the given documents/shapes, provided some effort is put into fine-tuning the parameters and creating the training data.

Link to Git

Drop your feedback in the comments!

Translated from: https://medium.com/swlh/extraction-of-geometrical-elements-using-opencv-convnets-48fd92168dfe
