partitionImage

partitionImage ( string pathImageOrigine , int _blocHeight , int _blocWidth , boolean[default:true] _crop , string _pathFrameImage ) : list

Locates and cuts out the text blocks of an image, dividing it into elements that are easier to process. It relies on an in-house artificial intelligence algorithm, which is currently considered stable and efficient on scanned images.

Parameters

pathImageOrigine

Path of the image to analyze. Avoid spaces in the file path.

_blocHeight (optional)

Minimum height of the text blocks to extract. Default: 3. The right value depends on the image, so run successive tests to find it. From experience, at a typical A4 image resolution, we recommend 3 for line extraction and 30 for text block extraction.

_blocWidth (optional)

Minimum width of the text blocks to extract. Default: 15. The right value depends on the image, so run successive tests to find it. From experience, at a typical A4 image resolution, we recommend 15 for line extraction and 100 for text block extraction.

_crop (optional)

Default: true. Whether to crop the original image into sub-images, one per text block. If true, each entry of the returned list contains a "croppedImage" key; otherwise only the coordinates of the blocks are returned. The sub-images are created in the Temp directory.

_pathFrameImage (optional)

For testing, you can give here the path of an output image: a copy of the original image on which the blocks to extract are framed. This lets you try different parameter values and preview the final result. The image is written in PNG format.
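
To make the role of _blocHeight and _blocWidth more concrete, here is a minimal Python sketch of a minimum-size filter applied to already-detected blocks. It is only an illustration: the actual detection algorithm is not described on this page, the function name passes_minimum_size is hypothetical, and treating the thresholds as inclusive is an assumption. The first two blocks come from the example output on this page; the third is made up to show a block that is rejected.

```python
# Illustrative only: this is NOT the algorithm used by partitionImage.
def passes_minimum_size(block, min_height=3, min_width=15):
    """Return True if the block is at least min_height tall and min_width wide.

    Inclusive comparison is an assumption, not documented behavior.
    """
    return block["height"] >= min_height and block["width"] >= min_width


blocks = [
    {"x": 448, "y": 2196, "width": 758, "height": 137},  # from the example output
    {"x": 1464, "y": 1470, "width": 174, "height": 53},  # from the example output
    {"x": 10, "y": 10, "width": 40, "height": 12},       # made-up small block
]

# Line-level thresholds (3, 15) versus block-level thresholds (30, 100).
line_level = [b for b in blocks if passes_minimum_size(b, 3, 15)]
block_level = [b for b in blocks if passes_minimum_size(b, 30, 100)]

print(len(line_level), len(block_level))
```

With the line-level thresholds all three blocks are kept; with the block-level thresholds the made-up small block is dropped.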

Return value

Returns a list of associative arrays, one per text block, each holding the coordinates of the block (x, y, width, height) and, if _crop is true, the path of the cropped block image (croppedImage).
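
The example output on this page lists a block at y=2196 before a block at y=1470, so the returned list should not be assumed to be in reading order. As a sketch of one way to post-process such a list, the Python snippet below sorts blocks top to bottom, then left to right. The helper name reading_order is hypothetical, and the sort key is a simple heuristic that ignores multi-column layouts.

```python
def reading_order(blocks):
    """Sort blocks top to bottom, then left to right (simple heuristic)."""
    return sorted(blocks, key=lambda b: (b["y"], b["x"]))


# The two entries shown in the example output, without the croppedImage key.
blocks = [
    {"x": 448, "y": 2196, "width": 758, "height": 137},
    {"x": 1464, "y": 1470, "width": 174, "height": 53},
]

ordered = reading_order(blocks)
for b in ordered:
    print(b["y"], b["x"])
```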

Note: Artificial intelligence image processing functions can take a long time on very large images. Protect your scripts against this by reducing such images with resizeImage first.
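
The signature of resizeImage is not given on this page, so the sketch below only shows, in Python, how target dimensions could be computed before resizing: the image is scaled down so that its longest side does not exceed a limit, and the aspect ratio is preserved. The name fit_within and the limit of 2000 pixels are assumptions chosen for illustration, not recommended values.

```python
def fit_within(width, height, max_side=2000):
    """Return (width, height) scaled down so the longest side is <= max_side.

    Images that are already small enough are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Keep at least 1 pixel per side for extremely elongated images.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 2000))  # large image: scaled down
print(fit_within(1000, 500))   # small image: unchanged
```

If the image is resized before partitioning, the returned coordinates refer to the resized image, not the original one.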

Example

console( partitionImage(path("desktop")+"origine.png",30,100,true,path("desktop")+"frameImage.png") )

/*-> [{croppedImage=C:\Users\XXX\AppData\Local\Temp\1385.png, x=448, width=758, y=2196, height=137},
{croppedImage=C:\Users\XXX\AppData\Local\Temp\7008.png, x=1464, width=174, y=1470, height=53},
....
]*/


Example of produced file with _pathFrameImage



See also

ocr
pdfToImage