WORDLIST2DAWG(1)   WORDLIST2DAWG(1)

wordlist2dawg - convert a wordlist to a DAWG for Tesseract

wordlist2dawg WORDLIST DAWG lang.unicharset

wordlist2dawg -t WORDLIST DAWG lang.unicharset

wordlist2dawg -r 1 WORDLIST DAWG lang.unicharset

wordlist2dawg -r 2 WORDLIST DAWG lang.unicharset

wordlist2dawg -l <short> <long> WORDLIST DAWG lang.unicharset

wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph (DAWG) for use with Tesseract. A DAWG is a compressed, space- and time-efficient representation of a word list.
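To make the compression idea concrete, here is a minimal Python sketch of a DAWG built by merging identical subtrees of a trie. It illustrates the data structure only and is not Tesseract's implementation or file format; the names Node, build_trie, minimize, count_nodes and contains are hypothetical.

```python
class Node:
    """One state of the graph: outgoing edges plus an end-of-word flag."""
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}      # character -> Node
        self.final = False   # True if a word ends at this node


def build_trie(words):
    """Build a plain trie (prefixes shared, suffixes not shared)."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimize(node, registry):
    """Merge equivalent subtrees bottom-up and return the canonical node.

    Two nodes are equivalent when they have the same end-of-word flag and
    the same labelled edges to the same (already canonical) children.
    """
    for ch, child in list(node.edges.items()):
        node.edges[ch] = minimize(child, registry)
    signature = (
        node.final,
        tuple(sorted((ch, id(child)) for ch, child in node.edges.items())),
    )
    return registry.setdefault(signature, node)


def count_nodes(root):
    """Count distinct nodes reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)


def contains(root, word):
    """Look up a word by following one edge per character."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final
```

For the words tap, taps, top and tops, the trie has 8 nodes, while the minimized graph has 5, because the shared suffixes "p" and "ps" are stored once. Lookup cost stays proportional to the word length in both cases.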

-t Verify that a given DAWG file is equivalent to a given wordlist.

-r 1 Reverse a word if it contains an RTL character.

-r 2 Reverse all words.

-l <short> <long> Produce a file with several DAWGs in it, one each for words of length <short>, <short+1>, ..., <long>.
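The effect of the -r and -l options on a word list can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the tool's actual logic: treating Unicode bidirectional classes R and AL as "RTL", reversing by code point, and discarding words outside the <short>..<long> range are all assumptions, and the function names are hypothetical.

```python
import unicodedata

# Assumption: a character counts as RTL if its Unicode bidi class is
# R (e.g. Hebrew) or AL (e.g. Arabic). The tool may decide differently.
RTL_CLASSES = {"R", "AL"}


def has_rtl(word):
    """Return True if any character of the word is in an RTL bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in word)


def apply_reverse_policy(word, policy=None):
    """Mimic the documented -r modes on a single word.

    policy None: no -r option given, word is left unchanged
    policy 1:    reverse the word only if it contains an RTL character
    policy 2:    reverse every word
    Reversal here is by code point, which is a simplification.
    """
    if policy == 2 or (policy == 1 and has_rtl(word)):
        return word[::-1]
    return word


def bucket_by_length(words, short, long):
    """Group words by length, one bucket per length in short..long.

    Mirrors the idea of -l <short> <long>: one DAWG per word length.
    Words outside the range are dropped here (an assumption).
    """
    buckets = {n: [] for n in range(short, long + 1)}
    for word in words:
        if len(word) in buckets:
            buckets[len(word)].append(word)
    return buckets
```

For instance, apply_reverse_policy("abc", 1) leaves the word unchanged because it has no RTL character, while policy 2 returns "cba".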

WORDLIST A plain text file in UTF-8, one word per line.

DAWG The output DAWG to write.

lang.unicharset The unicharset of the language. This is the unicharset generated by mftraining(1).
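As a sketch of how the three arguments fit together, the following Python helpers write a WORDLIST file in the expected form (UTF-8, one word per line) and assemble a command line matching the synopsis. The helper names and the file names in the usage note are hypothetical. Dropping blank and duplicate entries is a choice made here, not a documented requirement, and refusing to combine -t with -r is a conservative assumption based on the synopsis showing one option at a time.

```python
from pathlib import Path


def write_wordlist(words, path):
    """Write words as UTF-8 text, one word per line.

    Surrounding whitespace is stripped; blank and duplicate entries are
    dropped (our choice). Returns the list of words actually written.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    text = "".join(word + "\n" for word in cleaned)
    Path(path).write_text(text, encoding="utf-8")
    return cleaned


def build_command(wordlist, dawg, unicharset, reverse=None, verify=False):
    """Assemble an argument list following the documented synopsis.

    reverse: None, 1 or 2, mapped to "-r 1" / "-r 2"
    verify:  True adds "-t"
    """
    if verify and reverse is not None:
        # Assumption: the synopsis lists -t and -r separately.
        raise ValueError("use either verify or reverse, not both")
    cmd = ["wordlist2dawg"]
    if verify:
        cmd.append("-t")
    if reverse is not None:
        if reverse not in (1, 2):
            raise ValueError("reverse must be 1 or 2")
        cmd += ["-r", str(reverse)]
    cmd += [str(wordlist), str(dawg), str(unicharset)]
    return cmd
```

With the tool installed, the result can be passed to subprocess.run(..., check=True), for example build_command("words.txt", "eng.word-dawg", "eng.unicharset"), where all three file names are placeholders.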

See also: tesseract(1), combine_tessdata(1), dawg2wordlist(1)

https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html

Copyright (C) 2006 Google, Inc. Licensed under the Apache License, Version 2.0

The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).
06/07/2022  
