How do I choose between Tesseract and OpenCV?

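I recently came across Tesseract and OpenCV. It seems that Tesseract is a full-fledged OCR engine, while OpenCV can be used as a framework for building OCR applications/services.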

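I tried Tesseract on some of my images, and its accuracy seemed decent. Later I found a very simple tutorial that uses OpenCV to perform OCR in Python, and it really impressed me. Within a few minutes I had trained the system, and its accuracy was good. Of course, taking this approach means the system would need extensive training on a large training set.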
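The general idea behind that kind of tutorial, training a classifier on labelled character images and then labelling new glyphs by similarity, can be sketched without OpenCV at all. The version below is plain Python and assumes k-nearest neighbours over tiny, already segmented 3x3 glyphs compared by Hamming distance; `flatten`, `hamming`, `knn_classify` and the `TRAIN` data are hypothetical illustrations, not the tutorial's actual code. It also shows why the training set matters so much: a sample is only recognised correctly if something close to it was seen during training.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A glyph is a fixed-size grid of 0/1 pixels, flattened row by row.
Glyph = Tuple[int, ...]


def flatten(rows: Sequence[str]) -> Glyph:
    """Convert rows such as '.#.' into a flat tuple (# = ink = 1)."""
    return tuple(1 if ch == "#" else 0 for row in rows for ch in row)


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(train: List[Tuple[Glyph, str]], sample: Glyph, k: int = 3) -> str:
    """Label `sample` by majority vote among its k nearest training glyphs.

    Ties in the vote go to the label seen first, i.e. the closer neighbour.
    """
    nearest = sorted(train, key=lambda item: hamming(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: one clean 3x3 example per digit.
TRAIN = [
    (flatten([".#.", ".#.", ".#."]), "1"),
    (flatten(["###", "..#", "..#"]), "7"),
]
```

For example, `knn_classify(TRAIN, flatten([".#.", ".#.", "##."]), k=1)` labels a slightly noisy "1" correctly because it differs from the stored "1" by a single pixel. A glyph unlike anything in `TRAIN` would still be forced onto one of the known labels, which is the weakness of this approach without a large, varied training set.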

My specific questions are:

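  • How do I choose between Tesseract and building a custom OCR application with OpenCV?
  • Training datasets for Tesseract are available in many languages. Is there something similar for OpenCV, so that I don't have to implement OCR from scratch?
  • Which one is better suited for what is meant to become a commercial application?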

Any suggestions?

  • Tesseract is an OCR engine. It is used, developed and funded by Google specifically to read text from images, perform basic document segmentation, and operate on specific kinds of image input (a single word, line, paragraph or page, limited dictionaries, etc.).

  • OpenCV, on the other hand, is a computer vision library with features for feature extraction and data classification. You can create a simple letter segmenter and classifier that performs basic OCR, but the result is not a very good OCR engine. (I've built one from scratch in Python before; it is really inaccurate for input that deviates from its training data.)
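The "simple letter segmenter" half of such a home-made engine can be illustrated with a vertical projection: scan the columns of a binarized image and cut wherever a column contains no ink. This is a minimal plain-Python sketch, assuming a 0/1 image (1 = ink) and characters that never touch or overlap; `parse`, `segment_columns` and `crop` are hypothetical names, and an OpenCV version would more commonly find contours (`cv2.findContours`) and take their bounding boxes (`cv2.boundingRect`). Its assumptions are exactly where such an engine breaks down: touching, italic or broken characters produce wrong cuts.

```python
from typing import List, Optional, Sequence, Tuple


def parse(rows: Sequence[str]) -> List[List[int]]:
    """Turn rows such as '#..#' into a 0/1 image (# = ink = 1)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def segment_columns(image: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of runs of inked columns."""
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: does any row have ink in column x?
    has_ink = [any(row[x] for row in image) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a character begins
        elif not ink and start is not None:
            segments.append((start, x))  # a blank column ends it
            start = None
    if start is not None:
        segments.append((start, width))  # character touches the right edge
    return segments


def crop(image: Sequence[Sequence[int]], start: int, end: int) -> List[List[int]]:
    """Cut out columns [start, end) of every row."""
    return [list(row[start:end]) for row in image]
```

Each crop would then be resized to a fixed grid and handed to a character classifier; two characters printed close enough to share an inked column would come out as one segment.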

If you want to get a basic understanding of how hard OCR is, try OpenCV. Tesseract is for real OCR.

I am the author of the digit recognition tutorial you mentioned, and I would say that it is in no way a substitute for Tesseract.

Tesseract is a really good OCR engine, maybe the best open-source OCR engine.

The tutorial you mentioned is just an attempt to understand the simplest workings of OCR.

So, if you are building an OCR app, I would recommend using OpenCV to preprocess the image and then applying the Tesseract engine.

The two can be complementary. Consider the paper describing Tesseract: https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf

It highlights that "Since HP had independently-developed page layout analysis technology that was used in products, (and therefore not released for open-source) Tesseract never needed its own page layout analysis. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions defined."

This kind of task can be performed by OpenCV, with the resulting image handed off to Tesseract. You can find sample code of this type in the opencv_contrib Git repo: https://github.com/Itseez/opencv_contrib/tree/master/modules/text/samples. The samples use the Tesseract API to convert images to text.
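Since the paper says Tesseract assumes a binary input image, a typical OpenCV preprocessing step is global binarization with Otsu's method, e.g. `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, before the result is passed to Tesseract (for instance through the third-party `pytesseract.image_to_string` wrapper, which the answer itself does not mention). To show what that OpenCV call computes, here is a plain-Python sketch of Otsu's threshold on a flat list of 8-bit grayscale values; `otsu_threshold` and `binarize` are hypothetical helper names, and real pages often need more than a single global threshold.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    Assumes 8-bit grayscale values in the range 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        # Move all pixels with value t into the dark class.
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # dark class still empty
        if weight_light == 0:
            break  # every pixel is dark; no split left to score
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map pixels <= t to 0 (black) and pixels > t to 255 (white)."""
    return [255 if p > t else 0 for p in pixels]
```

For a page that is mostly light paper with some dark ink, e.g. `page = [200] * 90 + [10] * 10`, `binarize(page, otsu_threshold(page))` keeps the ink black and turns the paper white, which is the kind of clean binary image the paper describes as Tesseract's expected input.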

OpenCV is a library for CV, used to analyze and process images in general. Tesseract is a library for OCR, which is a specialized subset of CV that's dedicated to extracting text from images.

From OpenCV.org:

.....used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc

From Tesseract Github:

.....can be used directly, or (for programmers) using an API to extract typed, handwritten or printed text from images. It supports a wide variety of languages.