|When initex digests hyphenation patterns, T\h-0.1667m\v0.20vE\v-0.20v\h-0.125mX first expands macros and the result must entirely consist of digits (hyphenation levels), dots (`., edge of a word), and letters. In pattern files for non-English languages letters are often represented by macros or other expandable constructs. For the purpose of patgen these are just character sequences, subject to the condition that no such sequence is a prefix of another one.|
A dictionary file contains a weighted list of hyphenated words, one word
per line starting in column 1. A digit in column 1 indicates a global
word weight (initially =1) applicable to all following words up to the
next global word weight. A digit at some intercharacter position
indicates a weight for that position only.
The hyphens in a word are indicated by `-, `*, or `. (or their replacements as defined in the translate file) for hyphens yet to be found, `good hyphens (correctly found by the patterns), and `bad hyphens (erroneously found by the patterns) respectively; when reading a dictionary file `* is treated like `- and `. is ignored.
A pattern file contains only patterns in the format above, e.g., from a
previous run of patgen. It may not contain any T\h-0.1667m\v0.20vE\v-0.20v\h-0.125mX comments or
control sequences. For instance, this is not a valid pattern file:
It can only contain the actual patterns, i.e., the ....
A translate file starts with a line containing the values of
left_hyphen_min in columns 1-2,
right_hyphen_min in columns 3-4, and either a blank or the replacement for one of the
"hyphen" characters `-, `*, and `. in columns 5, 6, and 7. (Input
lines are padded with blanks as for many T\h-0.1667m\v0.20vE\v-0.20v\h-0.125mX related programs.)
Each following line defines one `letter: an arbitrary delimiter character in column 1, followed by one or more external representations of that character (first the `lower case one used for output), each one terminated by the delimiter and the whole sequence terminated by another delimiter.
If the translate file is empty, the values left_hyphen_min=2, right_hyphen_min=3, and the 26 lower case letters a...z with their upper case representations A...Z are assumed.
After reading the
translate_file and any previously-generated patterns from
patgen requests input from the users terminal.
First the integer values of hyph_start and hyph_finish, the lowest and highest hyphenation level for which patterns are to be generated. The value of hyph_start should be larger than any hyphenation level already present in pattern_file.
Then, for each hyphenation level, the integer values of pat_start and pat_finish, the smallest and largest pattern length to be analyzed, as well as good weight, bad weight, and threshold, the weights for good and bad hyphens and a weight threshold for useful patterns.
Finally the decision (`y or `Y vs. anything else) whether or not to produce a hyphenated word list.
$TEXMFMAIN/tex/generic/hyphen/hyphen.tex The original hyphenation patterns for English, by Donald Knuth and Frank Liang. http://www.ctan.org/pkg/ushyph Additional hyphenation patterns for English, extended by Gerard Kuiken. http://www.ctan.org/pkg/hyph-utf8 Collected hyphenation patterns for many languages in many formats. http://www.ctan.org/tex-archive/language/ General CTAN directory for patterns and support for many other languages.
Frank Liang and Peter Breitenlohner, patgen.web.
Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford University Ph.D. thesis, 1983, http://tug.org/docs/liang.
Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0, Appendix H.
Frank Liang wrote the first version of this program. Peter Breitenlohner made a substantial revision in 1991 for T\h-0.1667m\v0.20vE\v-0.20v\h-0.125mX 3. The first version was published as the appendix to the TeXware technical report. Howard Trickey originally ported it to Unix.
|Web2C 2015||PATGEN (1)||2 December 2014|