ocr

ocr ( string pathImage , string pathImageMagick , string _pathTesseract , string _debug ) : string

Reads the text in an image using an OCR process.
Returns null if there is an OCR error.
Note: If text blocks sit side by side, lines from the different blocks may come out interleaved. To avoid this, partition the image into sub-blocks of text.
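The partitioning idea can be sketched in a few lines. This is a Python illustration, not this library's API: the helper name column_boxes is hypothetical, and it assumes the text is laid out in equal-width vertical columns, which real documents often are not. It only computes crop rectangles; cropping and running OCR on each rectangle are left to your imaging tool.

```python
def column_boxes(width: int, height: int, columns: int) -> list[tuple[int, int, int, int]]:
    """Split an image area into equal-width vertical strips.

    Returns one (left, top, right, bottom) box per column, in reading order.
    Hypothetical helper: assumes evenly spaced columns.
    """
    if width <= 0 or height <= 0 or columns <= 0:
        raise ValueError("width, height and columns must be positive")
    boxes = []
    for i in range(columns):
        left = i * width // columns          # integer boundaries, no gaps
        right = (i + 1) * width // columns   # last box ends exactly at width
        boxes.append((left, 0, right, height))
    return boxes


# Two side-by-side text blocks on a 1000x500 image
print(column_boxes(1000, 500, 2))
```

Running OCR on each box separately, then concatenating the results in order, keeps the lines of one column from mixing with those of its neighbor.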

Note 2: As with other AI-based image processing functions, guard against very large images, which can take a long time to process. Use resizeImage to reduce them first.
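As a rough illustration of that size guard, the Python sketch below computes the target dimensions for a downscale. The helper name fit_within and the 2000-pixel limit are assumptions chosen for the example, not values taken from this library; pick a limit that keeps your text legible.

```python
def fit_within(width: int, height: int, max_side: int = 2000) -> tuple[int, int]:
    """Return dimensions scaled down so the longer side is at most max_side.

    Hypothetical helper; max_side=2000 is an arbitrary example limit.
    The aspect ratio is preserved and images are never enlarged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough
    scale = max_side / longest
    # Keep at least 1 pixel on the short side of extreme aspect ratios
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # a 4000x3000 scan is halved
```

The resulting width and height can then be passed to whatever resize step you use before calling the OCR.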

Installation

To use this function, you must first install Tesseract (UB Mannheim binaries) and ImageMagick.

Example

text = ocr(path("desktop")+"invoice.jpg", path("program_files")+"ImageMagick-7.1.0-Q16-HDRI\\magick.exe")


Problems

If you run this function in parallel, beware of CPU overload. When several Tesseract instances run at once, Tesseract may in some cases handle them poorly and keep switching between processes, so the whole job takes much longer. To avoid this, use getPerformanceStats to check that CPU usage is not above 90% before starting another instance. This problem seems to occur on Linux but not on Windows.
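The check-before-launch pattern might look like the following Python sketch. It is not this library's getPerformanceStats: the names cpu_load_ratio and wait_for_cpu are hypothetical, and the 1-minute load average divided by the core count is only an approximation of CPU usage, available on Unix-like systems such as Linux.

```python
import os
import time
from typing import Callable


def cpu_load_ratio() -> float:
    """Approximate CPU usage as 1-minute load average per core (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return load_1min / (os.cpu_count() or 1)


def wait_for_cpu(
    threshold: float = 0.9,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 60.0,
    read_load: Callable[[], float] = cpu_load_ratio,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until the load drops below threshold.

    Returns True once the CPU is free enough, or False if the timeout
    elapses first. read_load and sleep are injectable for testing.
    """
    waited = 0.0
    while read_load() >= threshold:
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True
```

A worker would call wait_for_cpu() before each OCR job and skip or requeue the job when it returns False. Capping the number of parallel workers at or below the core count is a simpler alternative that avoids polling.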

See also

partitionImage
straightenImage
pdfToImage

Parameters

pathImage

The path of the image to read.

pathImageMagick

This argument is mandatory on Windows but not on Linux. Set it to the path of ImageMagick's magick.exe. Ex: C:\Program Files\ImageMagick-7.1.0-Q16-HDRI\magick.exe.

_pathTesseract (optional)

The path to the Tesseract folder (not the executable). Ex: C:\Program Files\Tesseract-OCR.

_debug (optional)

Displays debug messages.