Pytesseract: "TesseractNotFoundError: tesseract is not installed or it's not in your path", how do I fix this?

小开

You need to install Tesseract.

https://github.com/tesseract-ocr/tesseract/wiki

See the installation documentation at the link above.

小开

From https://pypi.org/project/pytesseract/:

pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

小开

First, you should install the binary:

On Linux

sudo apt-get update
sudo apt-get install libleptonica-dev tesseract-ocr tesseract-ocr-dev libtesseract-dev python3-pil tesseract-ocr-eng tesseract-ocr-script-latn

On Mac

brew install tesseract

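On Windows

Download the binary from https://github.com/UB-Mannheim/tesseract/wiki, then add pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe' to your script (the r prefix keeps the backslashes literal).

Then install the Python package using pip: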

pip install tesseract
pip install tesseract-ocr

References: https://pypi.org/project/pytesseract/ (installation section) and https://tesseract-ocr.github.io/tessdoc/installation.html

小开

Install tesseract using the following command:

pip install tesseract

小开

On Windows:

pip install tesseract

pip install tesseract-ocr

and check the pytesseract.py file stored on your system at usr/appdata/local/programs/site-pakages/python/python36/lib/pytesseract/pytesseract.py, then compile the file.

小开

You can install this package: https://github.com/UB-Mannheim/tesseract/wiki. Then go to the path C:\Program Files (x86)\Tesseract-OCR\tesseract.exe and run the tesseract file. I think this will help you.

小开

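Best answer

I see the steps are scattered across different answers. Based on my recent experience with this pytesseract error on Windows, I am writing the steps in order to make it easier to resolve the error:

1. Install tesseract using the Windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki

2. Note the tesseract path from the installation. The default installation path at the time of this edit was: C:\Users\USER\AppData\Local\Tesseract-OCR. It may change, so please check the installation path.

3. pip install pytesseract

4. Set the tesseract path in the script before calling image_to_string: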

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
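
Instead of hard-coding the path from step 2, it could be resolved at runtime. The following is a minimal sketch, not part of pytesseract: the helper name find_tesseract and the candidate list are assumptions for illustration. It checks PATH first with shutil.which and then falls back to the listed install locations.

```python
import os
import shutil


def find_tesseract(candidates, which=shutil.which, exists=os.path.isfile):
    """Return the first usable tesseract executable path, or None.

    `candidates` is a list of full paths to try when tesseract is not on PATH.
    `which` and `exists` are injectable so the lookup can be exercised
    without a real installation.
    """
    on_path = which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if exists(path):
            return path
    return None


# Hypothetical candidate locations taken from the answers in this thread;
# adjust them to wherever the installer actually put tesseract.exe.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]

if __name__ == "__main__":
    found = find_tesseract(CANDIDATES)
    if found is None:
        print("tesseract executable not found; install it or fix the path")
    else:
        print("tesseract found at:", found)
        # With pytesseract installed, the result would be applied like this:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = found
```

If the function returns None, the binary itself is most likely missing and step 1 needs to be repeated.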

小开

Step 1:

Install Tesseract on your system according to your operating system. The latest installers can be found at https://github.com/UB-Mannheim/tesseract/wiki

Step 2: Install the dependency libraries using: pip install pytesseract, pip install opencv-python, pip install numpy

Step 3: Sample code

import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string

# Path of working folder on disk. Replace with your working folder
src_path = "C:\\Users\\<user>\\PycharmProjects\\ImageToText\\input\\"

# If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
TESSDATA_PREFIX = 'C:/Program Files (x86)/Tesseract-OCR'


def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)

    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)

    # Write image after removed noise
    cv2.imwrite(src_path + "removed_noise.png", img)

    # Apply threshold to get image with only black and white
    #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)

    # Write the image after apply opencv to do some ...
    cv2.imwrite(src_path + "thres.png", img)

    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))

    # Remove template file
    #os.remove(temp)

    return result


print('--- Start recognize text from image ---')
print(get_string(src_path + "image.png"))
print("------ Done -------")

小开

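For Windows only

1 - You need to have Tesseract OCR installed on your computer.

Get it from here: https://github.com/UB-Mannheim/tesseract/wiki

Download the suitable version.

2 - Add the Tesseract path to your system environment, i.e. edit the system variables.

3 - Run pip install pytesseract and pip install tesseract

4 - Add this line to your Python script every time: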

pytesseract.pytesseract.tesseract_cmd = 'C:/OCR/Tesseract-OCR/tesseract.exe' # your path may be different

5 - Run the code.

小开

On Windows, the command path must be redirected to the default Windows tesseract installation.

On a 32-bit system, add this line after the import statements.

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

On a 64-bit system, add this line instead.

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

小开

On Mac, you can install it as follows.

brew install tesseract

小开

# {Windows 10 instructions}
# Before you use the script you need to install the dependencies.
# 1. Download tesseract from the official link:
#    https://github.com/UB-Mannheim/tesseract/wiki
# 2. Install tesseract.
#    I chose this path
#    (replace the user string in the path below with the name of the user
#    you are using on your current machine):
#    C:\Users\user\AppData\Local\Tesseract-OCR\
# 3. Install pillow for your Python version.
#    * The best way for me is to install it in this form (I am using Python 3.7
#      and in my CMD I run this version of Python by typing py -3.7).
#    * If you are using another version of Python, first look at how you start
#      Python from your CMD.
#    * On some machines Python is started from the CMD differently.
#    [examples]
#    =================================
#    PYTHON VERSION 3.7
#    python, python3.7, python -3.7, python 3.7, python3, python -3, python 3,
#    py3.7, py -3.7, py 3.7, py3, py -3, py 3
#    PYTHON VERSION 3.6
#    python, python3.6, python -3.6, python 3.6, python3, python -3, python 3,
#    py3.6, py -3.6, py 3.6, py3, py -3, py 3
#    PYTHON VERSION 2.7
#    python, python2.7, python -2.7, python 2.7, python2, python -2, python 2,
#    py2.7, py -2.7, py 2.7, py2, py -2, py 2
#    ================================
#    We are using pip to install the dependencies.
#    Because I start Python 3.7 with the following line:
#        py -3.7
#    open the CMD on a Windows machine and type the following line:
#        py -3.7 -m pip install pillow
# 4. Install pytesseract and tesseract for your Python version.
#    * The best way for me is to install them in this form (I am using Python 3.7
#      and in my CMD I run this version of Python by typing py -3.7).
#    We are using pip to install the dependencies.
#    Open the CMD on a Windows machine and type the following lines:
#        py -3.7 -m pip install pytesseract
#        py -3.7 -m pip install tesseract

#!/usr/bin/python

from PIL import Image
import pytesseract
import os
import getpass


def extract_text_from_image(image_file_name_arg):
    # IMPORTANT
    # This path applies if you followed the installation instructions above;
    # it is the path on my machine.
    # If you don't put the right path for tesseract.exe the script will not work.
    username = getpass.getuser()
    # The line above gets the username for your machine automatically.
    tesseract_exe_path_installation = "C:\\Users\\" + username + "\\AppData\\Local\\Tesseract-OCR\\tesseract.exe"
    pytesseract.pytesseract.tesseract_cmd = tesseract_exe_path_installation

    # Specify the directory of your image files manually, or use the line below
    # if the images are in the folder "images" in the script directory.
    # image_dir="D:\\GIT\\ai_example\\extract_text_from_image\\images"
    image_dir = os.getcwd() + "\\images"
    dir_seperator = "\\"
    image_file_name = image_file_name_arg
    # If your images are in a different format, change the extension (ex. ".png").
    image_ext = ".jpg"
    image_path_dir = image_dir + dir_seperator + image_file_name + image_ext

    print("=============================================================================")
    print("image used is in the following path dir:")
    print("\t" + image_path_dir)
    print("=============================================================================")

    img = Image.open(image_path_dir)
    text = pytesseract.image_to_string(img, lang="eng")
    print(text)


# Change the name "image_1" to the name of your image, without the extension.
# image_file_name_arg="image_1"
image_file_name_arg = "image_2"
# image_file_name_arg="image_3"
# image_file_name_arg="image_4"
# image_file_name_arg="image_5"
extract_text_from_image(image_file_name_arg)

# ==================================
# CREATED BY: SHERIFI
# e-mail: sherif_co@yahoo.com
# git-link for script: https://github.com/sherifi/ai_example.git
# ==================================

小开

For Ubuntu 18.04

If you get an error like

tesseract is not installed or it's not in your path and OSError: [Errno 12] Cannot allocate memory

it may be caused by a swap memory allocation issue.

You can check this answer on allocating more swap memory. Hope it helps :)

https://askubuntu.com/questions/920595/fallocate-fallocate-failed-text-file-busy-in-ubuntu-17-04?answertab=active#tab-top

小开

On 64-bit Windows, just add the following to the PATH environment variable: "C:\Program Files\Tesseract-OCR" and it will work.
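
If editing the system PATH is not convenient, a similar effect can be had for a single Python process. This is only a sketch: with_dir_on_path is a hypothetical helper name, and the directory is the install location assumed in this answer.

```python
import os


def with_dir_on_path(directory, path_value, sep=os.pathsep):
    """Return `path_value` with `directory` prepended, unless already present."""
    parts = [p for p in path_value.split(sep) if p]
    if directory in parts:
        return sep.join(parts)
    return sep.join([directory] + parts)


if __name__ == "__main__":
    # Assumed install directory; check where the installer put Tesseract.
    tesseract_dir = r"C:\Program Files\Tesseract-OCR"
    os.environ["PATH"] = with_dir_on_path(tesseract_dir, os.environ.get("PATH", ""))
    # Programs launched from this interpreter afterwards should see the
    # updated PATH; the change does not persist after the process exits.
    print(os.environ["PATH"].split(os.pathsep)[0])
```

The comparison is a plain string match, so a differently cased or trailing-slash spelling of the same directory would be added a second time.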

小开

I was able to solve this by updating the tesseract_cmd variable in the pytesseract.py file with the bin/tesseract path.

小开

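I faced the same problem on Windows. I tried updating the PATH environment variable, which did not work.

What worked for me was modifying pytesseract.py, which can be found at C:\Program Files\Python37\Lib\site-packages\pytesseract or usually under C:\Users\YOUR USER\APPDATA\Python

I changed one line as follows:

#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:\Program Files\Tesseract-OCR\\tesseract.exe'

Note that I had to put an extra \ before tesseract, because otherwise Python interprets \t as a tab character and you get the following error message:

TesseractNotFoundError: C:\Program Files\Tesseract-OCR    esseract.exe is not installed or it's not in your path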
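
The escaping problem described in this answer can be reproduced without tesseract installed. A small sketch, using the path from this answer:

```python
# Only the last backslash is left unescaped here, so it combines with the
# following "t" into a TAB character.
broken = 'C:\\Program Files\\Tesseract-OCR\tesseract.exe'

# Three spellings that keep the path intact.
escaped = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
raw = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
forward = 'C:/Program Files/Tesseract-OCR/tesseract.exe'

print("\t" in broken)   # True: the path contains a TAB
print("\t" in raw)      # False
print(raw == escaped)   # True: both spell the same string
print(repr(broken))     # repr makes the embedded TAB visible
```

A raw string (the r prefix) or forward slashes avoids having to double every backslash by hand.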

小开

This error occurs because tesseract is not installed on your computer.

If you are using Ubuntu, install tesseract with the following command:

sudo apt-get install tesseract-ocr

For Mac:

brew install tesseract

小开

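Perhaps this happens because, even though Tesseract is installed correctly, you have not installed your language, as was my case. Fortunately, this is very easy to fix, and I did not even need to mess with tesseract_cmd.

sudo apt-get install tesseract-ocr -y
sudo apt-get install tesseract-ocr-spa -y
tesseract --list-langs

Note that in the second line we specify -spa for Spanish.

If the installation succeeded, you should get a list of available languages, such as:

List of available languages (3):
eng
osd
spa

I found this in a blog post (in Spanish). There is also a post about installing Spanish on Windows (apparently not as easy).

Note: since the question uses lang='eng', this may not be the answer in that specific case. But the same error can occur in other situations, which is why I posted this answer here.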
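
The language check can also be done from Python. This is a sketch, assuming `tesseract --list-langs` prints a header line followed by one language code per line, as in the output quoted in this answer; parse_list_langs and installed_languages are hypothetical helper names.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages(cmd="tesseract"):
    """Run tesseract and return its languages, or None if the binary is missing."""
    try:
        proc = subprocess.run(
            [cmd, "--list-langs"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # Assumption: the list is on stdout; fall back to stderr if stdout is empty.
    return parse_list_langs(proc.stdout or proc.stderr)


if __name__ == "__main__":
    print(installed_languages())
```

If the code passed as lang= to image_to_string is missing from the returned list, installing the matching tesseract-ocr-<code> package is the fix this answer describes.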

小开

This question already has many good answers, but I would like to share a great website that helped me when I could not solve "TesseractNotFoundError: tesseract is not installed or it's not in your path". Please refer to this site: https://www.thetopsites.net/article/50655738.shtml

I realized I got this error because I had installed pytesseract with pip but forgot to install the binary. You are probably missing tesseract-ocr on your machine; check the installation instructions here: https://github.com/tesseract-ocr/tesseract/wiki

On Mac, you can install it with Homebrew:

brew install tesseract

After that it should work fine!

On Windows 10, the following worked for me:

Download tesseract from this link and install it. The Windows version can be found here: https://github.com/UB-Mannheim/tesseract/wiki

Find the script file pytesseract.py under C:\Users\User\Anaconda3\Lib\site-packages\pytesseract and open it. Change the line tesseract_cmd = 'tesseract' to: tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe' (this is the path where you installed Tesseract-OCR, so check where you installed it and update the path accordingly).

You may also need to add C:/Program Files (x86)/Tesseract-OCR/ to the environment variables.

Hope this works for you!

小开

For Windows users only:

Install tesseract using:

pip install tesseract

Then add this line to your code, noting the "\\":

pytesseract.pytesseract.tesseract_cmd = "C:\Program Files (x86)\Tesseract-OCR\\tesseract.exe"

小开

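This solution worked for me on Ubuntu:

I installed tesseract on Ubuntu by following this link:

https://medium.com/quantrium-tech/installing-tesseract-4-on-ubuntu-18-04-b6fcd0cbd78f

Later I added the traineddata language files to tessdata by following this link:

Tesseract running error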

小开

The latest version of the pip module, pytesseract==0.3.7, seems to have a problem. I downgraded it to pytesseract==0.3.6 and no longer see the error.
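
To see which pytesseract version is installed before deciding whether to pin it, the standard library's importlib.metadata (Python 3.8+) can be used. A minimal sketch; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pytesseract:", installed_version("pytesseract"))
```

Pinning the version is then done with pip install pytesseract==0.3.6, as this answer suggests.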

小开

I was able to get this working just by installing tesseract with conda.

conda install -c conda-forge tesseract

小开

Simple steps for Windows:

1. Download the Windows version from https://github.com/UB-Mannheim/tesseract/wiki

2. Install it

3. Write the following in your .py file (check the installation location)

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img_text = pytesseract.image_to_string(Image.open(filename))

小开

For me, it worked with single quotes:

pytesseract.pytesseract.tesseract_cmd =r'C:/Program Files/Tesseract-OCR/tesseract.exe'

Putting it in double quotes actually inserted unwanted characters automatically.

小开

Linux distribution (Ubuntu)

Try this:

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

小开

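The tips above did not help me, because the error described in this thread occurred with pytesseract installed in PyCharm under Python 2.7. Oddly, tesseract worked from the command line, so the installation itself was correct.

I solved the problem with the following steps:

Download pytesseract.py from the repository https://github.com/madmaze/pytesseract

Remove all syntax errors related to interpreter differences (2.7 vs. 3.*), including the try/except handling

Import the edited script into the program as a self-written script, and configure the tesseract_cmd variable as recommended in the repository.

After that, the image-to-text conversion worked in Python 2.7.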

小开

Anaconda installation:

Works on Mac, Linux, and Windows

conda-forge / packages / tesseract 4.1.1

Step 1:

conda install -c conda-forge tesseract

Step 2: Find the tesseract path (if you do not know it yet)

import os

for r, s, f in os.walk("/"):
    for i in f:
        if "tesseract" in i:
            print(os.path.join(r, i))

For example, my tesseract path is /anaconda/bin/tesseract

Step 3: Set the tesseract path

pytesseract.pytesseract.tesseract_cmd = r'/anaconda/bin/tesseract'
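
Walking the whole filesystem can be slow. Since conda installs into the active environment, a narrower search may be enough. This is a sketch under an assumption about the usual conda layout (bin/ on Linux and macOS, Library\bin\ on Windows); verify the locations on your own machine. conda_tesseract_candidates is a hypothetical helper name.

```python
import os
import sys


def conda_tesseract_candidates(prefix=sys.prefix, windows=(os.name == "nt")):
    """Return likely tesseract locations inside an environment prefix."""
    if windows:
        # Assumed layout for conda packages on Windows.
        return [os.path.join(prefix, "Library", "bin", "tesseract.exe")]
    return [os.path.join(prefix, "bin", "tesseract")]


if __name__ == "__main__":
    for path in conda_tesseract_candidates():
        status = "found" if os.path.isfile(path) else "missing"
        print(path, "->", status)
```

sys.prefix points at the environment of the running interpreter, so the script has to be started from the environment where tesseract was installed.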

小开

I tried this on my Raspberry Pi. I just changed the path from:

'C:/Program Files/Tesseract-OCR/tesseract.exe'

(because that one is for Windows) to this:

/usr/local/lib/python3.7/dist-packages

because this is the path I see every time I run this command:

pip3 show pytesseract

小开

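I faced the same error while installing tesseract on Windows.

Based on the problem I recently solved, I followed these steps:

1. Install tesseract using the Windows installer, which can be found at the given link: https://github.com/UB-Mannheim/tesseract/wiki

2. Note the tesseract path from the installation. The default installation path at the time of this edit was: C:\Users\USER\AppData\Local\Tesseract-OCR. It may change, so please check the installation path.

If it still shows the error or a "not installed" error after installation, press Windows + R and run your file path (C:\Program Files\Tesseract-OCR\tesseract.exe); that worked for me.

3. pip install pytesseract

4. Set the tesseract path in the script before calling image_to_string:

For a Windows file path: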

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

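To install OpenCV, please refer to the linked question.

For Linux installation

1. Run the following commands:

$ sudo apt install tesseract-ocr
$ sudo apt install libtesseract-dev
$ tesseract --version

2. After running these commands, the output should look like this:

tesseract 4.0.0-beta.1
 leptonica-1.75.3

3. Once tesseract is installed successfully, you can check it by running the following command:

$ tesseract --list-langs

4. The following result can be expected:

List of available languages (2):
eng
osd

5. The Linux file path looks like this:

pytesseract.pytesseract.tesseract_cmd = r'/home/user/bin/tesseract'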
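
The tesseract --version check can also be scripted as a quick pre-flight test before calling pytesseract. A sketch; tesseract_version is a hypothetical helper, and the default command name assumes tesseract is on PATH.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> --version`, or None if it cannot run."""
    try:
        proc = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, check=False
        )
    except (FileNotFoundError, PermissionError):
        return None
    # Assumption: the banner is on stdout; fall back to stderr if stdout is empty.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract is not installed or it's not in your PATH")
    else:
        print(version)
```

Passing a full path as cmd, for example the value you intend to assign to tesseract_cmd, tests that exact executable instead of whatever is on PATH.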

小开

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

This helped in my case.