Image::OCR::Tesseract(3) User Contributed Perl Documentation Image::OCR::Tesseract(3)

Image::OCR::Tesseract - read an image with tesseract ocr and get output

        use Image::OCR::Tesseract 'get_ocr';

        my $image = './hi.jpg';

        my $text = get_ocr($image);

This is a wrapper for tesseract. Tesseract expects a tiff file. If your file is not a tiff, get_ocr() converts it to a temporary tiff, so you don't have to worry about your image format for OCR.

Tesseract writes its result to a text file. get_ocr() removes that file and returns its contents to you.

No subs are exported by default.

The first argument is the abs path to the image file, which can be in most image formats. The optional second argument is the abs path to a temp dir. The optional third argument is the -l language type argument to tesseract.

If you don't have write access to the directory the image resides in, provide a directory you do have write access to as the second argument.

Returns text content as read by tesseract.

Does not clean up after itself if DEBUG is on.

Warns if no output.

This takes care of converting to the right image format, etc. The original image is unchanged.
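
As a rough illustration of the arguments described above, the following sketch turns a relative path into an absolute one, supplies a private writable temp dir as the second argument, and passes a language code as the third. The helper names ocr_args() and ocr_file(), the use of File::Temp, and the 'eng' language code are assumptions made for this example; they are not part of the module. The module is loaded on first use, so get_ocr() is called by its full package name.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: build the argument list for get_ocr().
sub ocr_args {
    my ($image, $lang) = @_;
    my @args = (
        File::Spec->rel2abs($image),    # abs path to the image file
        tempdir( CLEANUP => 1 ),        # writable temp dir, removed at exit
    );
    push @args, $lang if defined $lang; # handed to tesseract as -l
    return @args;
}

# Hypothetical wrapper: load the module on first use and run the OCR.
sub ocr_file {
    my ($image, $lang) = @_;
    require Image::OCR::Tesseract;
    my $text = Image::OCR::Tesseract::get_ocr( ocr_args($image, $lang) );
    return defined $text ? $text : '';
}

# Usage:
#   my $text = ocr_file('./hi.jpg', 'eng');
#   print $text eq '' ? "no text found\n" : $text;
```

Keeping the argument building separate from the call makes it easy to check the paths before any external program is run.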

The first argument is the abs path to a tif file. The optional second argument is the -l language type argument to tesseract.

Returns the text output. If the image contains no text, or if tesseract fails, returns an empty string. If tesseract fails, it also warns.

The first argument is the abs path to the image file. The optional second argument is the abs path of the image to write. Returns the abs path of the image created. Uses 'convert', from ImageMagick.

   my $img_non_tif = './img.jpg';
   my $img_out     = './img.tif';

   # let the module choose the output path
   my $out = convert_8bpp_tif( $img_non_tif );

   # or name the output file yourself
   $out = convert_8bpp_tif( $img_non_tif, $img_out );

Tesseract is an open source OCR engine. For an image to be read properly by tesseract, it must be an 8 bit per pixel tif format image file. This module creates a temporary 8 bit per pixel tif from your target image, runs tesseract on it, then reads the output and returns it to you as a string.
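
The steps in the paragraph above can be sketched roughly as follows. This is not the module's own code: the sub names ocr_commands() and sketch_ocr(), the temporary file name ocr_tmp, and the particular 'convert' options (-colorspace Gray -depth 8 -compress none) are assumptions chosen for illustration. The only thing taken as given is that 'tesseract image outputbase' writes its text to outputbase.txt.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Build the two external command lines and the path tesseract writes to.
sub ocr_commands {
    my ($image, $tmpdir, $lang) = @_;
    my $base = File::Spec->catfile($tmpdir, 'ocr_tmp');
    my $tif  = "$base.tif";
    my @convert = ( 'convert', $image,
                    '-colorspace', 'Gray', '-depth', '8',
                    '-compress', 'none', $tif );
    my @tesseract = ( 'tesseract', $tif, $base );
    push @tesseract, '-l', $lang if defined $lang;
    return ( \@convert, \@tesseract, "$base.txt" );
}

# Convert, run tesseract, read the text file, delete it, return the text.
sub sketch_ocr {
    my ($image, $lang) = @_;
    my $tmpdir = tempdir( CLEANUP => 1 );
    my ($convert, $tesseract, $txt) = ocr_commands($image, $tmpdir, $lang);

    system(@$convert) == 0
        or do { warn "convert failed\n"; return '' };
    system(@$tesseract) == 0
        or do { warn "tesseract failed\n"; return '' };

    open my $fh, '<', $txt or return '';
    my $text = do { local $/; <$fh> };   # read the whole file at once
    close $fh;
    unlink $txt;
    return defined $text ? $text : '';
}
```

The original image is only ever read; everything that is written goes into the temporary directory.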

Included in this package is t/tesseract_install_helper.pl which will check for packages needed.
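
If you only want a quick check before installing, something like the following looks for the two external programs this module relies on. It is a minimal sketch and not the helper script itself; the sub names find_in_path() and missing_programs() are made up for this example.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of an executable found on PATH, or undef.
sub find_in_path {
    my ($name) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Names of the given programs that are not on PATH.
sub missing_programs {
    return grep { !defined find_in_path($_) } @_;
}

my @missing = missing_programs(qw(tesseract convert));
print @missing
    ? "missing: @missing\n"
    : "tesseract and convert found\n";
```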

Installing tesseract can be tricky. You will need gcc-c++ and automake installed on your system. Once you have both, you should be able to install.

SVN

You may be able to simply install the SVN version of Tesseract by using:

 svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
 cd tesseract-ocr
 ./runautoconf
 mkdir build-directory
 cd build-directory
 ../configure
 make
 make install

For more, see the Google project on OCR, which uses tesseract.

Another great OCR engine is gocr, but it is not suited for the purpose of reading text from images. gocr is great if you need to tweak what you are reading, and for other specialized purposes.

An example using gocr as engine is Finance::MICR::GOCR::Check.

tesseract on Google Code, gocr, convert, ImageMagick.

ocr

This module is for POSIX systems. It is not intended to run on other "systems" and no support for such will be added in the future. Attempting to install on an unsupported OS will throw an exception.

Set the debug flag on: $Image::OCR::Tesseract::DEBUG = 1;

A temporary file is created. If DEBUG is on, the file is not deleted and its path is printed to STDERR.

Leo Charre leocharre at cpan dot org

Daniel Beuchler - patches.

Copyright (c) 2009 Leo Charre. All rights reserved.

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e., under the terms of the "Artistic License" or the "GNU General Public License".

This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the "GNU General Public License" for more details.

2010-02-16 perl v5.32.1
