Create a config file (e.g. "letters") in the tessdata/configs directory, usually /usr/share/tesseract/tessdata/configs or /usr/share/tesseract-ocr/tessdata/configs.
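A minimal sketch of what creating that file could look like. CONFIG_DIR is a placeholder I am introducing for the real tessdata/configs directory, and the lowercase-letter whitelist is only an example value; a Tesseract config file holds one "variable value" pair per line.

```shell
# Assumption: CONFIG_DIR stands in for the real tessdata/configs directory
# (writing there usually needs root). A scratch directory is used by default.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"

# One "variable value" pair per line; this whitelist is only an example.
printf 'tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz\n' > "$CONFIG_DIR/letters"

# Show what was written.
cat "$CONFIG_DIR/letters"

# With the file in the real configs directory, name the config last on the
# command line (input.tif and output are placeholder names):
#   tesseract input.tif output letters
```

Keeping the whitelist in a config file avoids retyping a long -c switch on every run.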
To use a whitelist in version 4.0, whether from a config file or via the -c tessedit_char_whitelist=... command-line switch, you have to set the OCR Engine mode to "Original Tesseract only" (--oem 0). This is because the new "Neural nets LSTM" mode doesn't respect the whitelist setting.
Example of a proper command line for version 4.0:
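As a hedged sketch of such a command line: input.tif, the output base name, and the lowercase whitelist are all placeholders, not values taken from a real run. The command is printed first and only executed when tesseract and the image are actually present.

```shell
# Placeholders: substitute your own image and character set.
IMAGE="${IMAGE:-input.tif}"
WHITELIST="abcdefghijklmnopqrstuvwxyz"

# --oem 0 selects the original (legacy) engine, which honors the whitelist in 4.0.
set -- tesseract "$IMAGE" output --oem 0 -c "tessedit_char_whitelist=$WHITELIST"
CMD_LINE="$*"
echo "$CMD_LINE"

# Run it only if tesseract is installed and the image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    "$@"
fi
```

Note that --oem 0 only works when the installed traineddata file contains the legacy model; LSTM-only data files cannot be used with it.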
UPDATE: In newer versions (4.0), the Windows installer and some Linux installers ship a corrupted eng.traineddata file by default. A temporary workaround is to replace the tessdata\eng.traineddata file with the one from an older version. This file should be about 30 MB. Otherwise you'll get the error "Tesseract couldn't load any languages!" or similar.
Update for Tesseract 4.1.1
However, the bug above is fixed in Tesseract 4.1.1, where the whitelist works like a charm.
I am using Ubuntu 18.04.4 LTS. The default tesseract is version 4, and I could not use the whitelist with it. After upgrading to version 5, the whitelist worked. The first run below is without the whitelist, the second with it.
tesseract sample.jpg stdout -l eng --oem 3 --psm 7
Warning: Invalid resolution 0 dpi. Using 70 instead.
LL £036 GL)
tesseract sample.jpg stdout -l eng --oem 3 --psm 7 -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
Warning: Invalid resolution 0 dpi. Using 70 instead.
L4036GL
My answer is derived wholly from the accepted answer and is added here to benefit any .NET Windows developers using the Tesseract NuGet package. However, take note of my bullet 2, which applies to anybody using any kind of Tesseract on Windows.
1. Create a configs folder inside your tessdata folder, where the other training data is located.
2. Add a letters file inside the configs folder. Use an editor like TextPad that will help you save it in UNIX format, ANSI encoding (I had initially tried UTF-8 / IBM PC and tesseract was puking an error into my Tests output).
3. Just like your training files, ensure that in the Properties panel the letters file has its Build Action set to Content and is marked to copy to the output directory.
4. Invoke your tesseract engine class thusly:
var ocrEng = new TesseractEngine("./tessdata", "eng", EngineMode.Default, "letters");
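If a Unix-like shell is at hand on Windows (Git Bash or WSL, for instance), the line-ending part of bullet 2 can also be handled from the command line instead of an editor. This is only a sketch: it works on a scratch copy in a temporary directory, and the file content is an example whitelist, not something prescribed by the steps above.

```shell
# Scratch location for the demo; point CONFIG_FILE at your real letters file instead.
WORK_DIR="$(mktemp -d)"
CONFIG_FILE="$WORK_DIR/letters"

# Stand-in for a file saved by a Windows editor: note the CRLF line ending.
printf 'tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz\r\n' > "$CONFIG_FILE"

# Strip carriage returns so every line ends in a bare LF (UNIX format).
tr -d '\r' < "$CONFIG_FILE" > "$CONFIG_FILE.unix"
mv "$CONFIG_FILE.unix" "$CONFIG_FILE"

# Count any carriage returns that remain; 0 means the file is clean.
CR_COUNT=$(tr -cd '\r' < "$CONFIG_FILE" | wc -c | tr -d ' ')
echo "carriage returns left: $CR_COUNT"
```

A whitelist made up of plain ASCII characters is byte-for-byte identical in ANSI and in UTF-8 without a byte order mark, so the line endings and a stray byte order mark are the usual things to check.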