Manual Reference Pages  -  UNICHARAMBIGS (5)

NAME

unicharambigs - Tesseract unicharset ambiguities

DESCRIPTION

The unicharambigs file (a component of traineddata, see combine_tessdata(1)) is used by Tesseract to represent possible ambiguities between characters, or groups of characters.

The file contains a number of lines, laid out as follows:

[num] <TAB> [char(s)] <TAB> [num] <TAB> [char(s)] <TAB> [num]

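Field one is the number of unichars in field two, and field two is the character sequence to be replaced. Field three is the number of unichars in field four, and field four is the replacement sequence. Field five is either 1 or 0: 1 denotes a mandatory replacement, 0 an optional one.

Characters appearing in fields two and four should appear in unicharset. The numbers in fields one and three refer to the number of unichars (not bytes).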
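The line layout above can be illustrated with a small parser. This is only a sketch, not Tesseract's own reader: the names Ambig, parse_ambig_line and missing_unichars are hypothetical, unichars inside a field are assumed to be separated by single spaces (as in "r n" in the example below), and field five is read as 1 = mandatory, 0 = optional, which is how the EXAMPLE and HISTORY sections describe it.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Ambig(NamedTuple):
    """One parsed line (hypothetical container, not a Tesseract type)."""
    source: Tuple[str, ...]   # field two, split into unichars
    target: Tuple[str, ...]   # field four, split into unichars
    mandatory: bool           # field five: "1" -> True, "0" -> False


def parse_ambig_line(line: str) -> Ambig:
    """Parse one tab-separated line and check the declared unichar counts."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    n_src, src, n_dst, dst, kind = fields
    # Assumption: unichars within a field are separated by single spaces.
    source = tuple(src.split(" "))
    target = tuple(dst.split(" "))
    if len(source) != int(n_src):
        raise ValueError("field one does not match the unichars in field two")
    if len(target) != int(n_dst):
        raise ValueError("field three does not match the unichars in field four")
    if kind not in ("0", "1"):
        raise ValueError("field five must be 0 or 1")
    return Ambig(source, target, kind == "1")


def missing_unichars(ambig: Ambig, unicharset: Iterable[str]) -> List[str]:
    """Return unichars used by the line that are absent from the given set."""
    known = set(unicharset)
    return [u for u in ambig.source + ambig.target if u not in known]
```

For instance, parse_ambig_line("1\tm\t2\tr n\t0") yields a source of ("m",), a target of ("r", "n") and mandatory set to False. The unicharset passed to missing_unichars is a plain collection of strings here; loading a real unicharset file is outside this sketch.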

EXAMPLE

2       ' '     1       "     1
1       m       2       r n   0
3       i i i   1       m     0

In this example, all instances of the 2 character sequence '' will always be replaced by the 1 character sequence "; a 1 character sequence m may be replaced by the 2 character sequence rn; and the 3 character sequence iii may be replaced by the 1 character sequence m.
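The difference between "will always be replaced" and "may be replaced" can be sketched as follows. This is a toy illustration and not how Tesseract applies ambiguities internally: the rules are transcribed by hand from the example, reading the first rule's source as two apostrophes, and the function names apply_mandatory and optional_alternatives are hypothetical.

```python
from typing import List, Sequence, Tuple

# (source unichars, target unichars, mandatory?) -- hand-copied from the example.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...], bool]

EXAMPLE_RULES: List[Rule] = [
    (("'", "'"), ('"',), True),        # always replace '' with "
    (("m",), ("r", "n"), False),       # m may be read as rn
    (("i", "i", "i"), ("m",), False),  # iii may be read as m
]


def apply_mandatory(unichars: Sequence[str], rules: Sequence[Rule]) -> List[str]:
    """Rewrite every match of a mandatory rule, scanning left to right."""
    out: List[str] = []
    i = 0
    while i < len(unichars):
        for src, dst, mandatory in rules:
            if mandatory and tuple(unichars[i:i + len(src)]) == src:
                out.extend(dst)
                i += len(src)
                break
        else:
            # No mandatory rule matched at this position; keep the unichar.
            out.append(unichars[i])
            i += 1
    return out


def optional_alternatives(unichars: Sequence[str],
                          rules: Sequence[Rule]) -> List[List[str]]:
    """List alternative readings, each produced by one optional rule match."""
    alternatives: List[List[str]] = []
    for i in range(len(unichars)):
        for src, dst, mandatory in rules:
            if not mandatory and tuple(unichars[i:i + len(src)]) == src:
                alternatives.append(
                    list(unichars[:i]) + list(dst) + list(unichars[i + len(src):])
                )
    return alternatives
```

With these rules, apply_mandatory rewrites the unichars of ''hi unconditionally, whereas optional_alternatives only proposes extra candidate readings and leaves the input as it is. How a recognizer would rank such candidates is not covered by this page and is not modelled here.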

HISTORY

The unicharambigs file first appeared in Tesseract 3.00. Before that, a similar format called DangAmbigs (dangerous ambiguities) was used; it was almost identical, except that only mandatory replacements could be specified and field 5 was absent.

BUGS

This is a documentation "bug": it's not currently clear what should be done in the case of ligatures (such as fi) which may also appear as regular letters in the unicharset.

SEE ALSO

tesseract(1), unicharset(5)

AUTHOR

The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).

UNICHARAMBIGS (5)  02/09/2012
