Manual Reference Pages - LT-PROC (1)
lt-proc - This application is part of the lexical processing modules
and tools (
This tool is part of the apertium machine translation
-w ] fst_file [input_file [output_file]]
--help ] fst_file [input_file [output_file]]
lt-proc is the application responsible for providing the four lexical
processing functions:
o morphological analyser ( option -a )
o lexical transfer ( option -b )
o morphological generator ( option -g )
o post-generator ( option -p )
It accomplishes these tasks by reading binary files containing a
compact and efficient representation of dictionaries (a class of
finite-state transducers called augmented letter transducers). These
files are generated by lt-comp(1).
Note that some characters ([, ], $, ^, /, +) are special: they are
used for format and encapsulation, and must be escaped if they are to
be used literally. For instance, text enclosed in [...] is ignored,
and each lexical unit is written as ^...$.
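
The escaping rule above can be sketched as follows. This is only an
illustration, not code from lt-proc: the function name escape_reserved
is hypothetical, and the sketch assumes that backslash is the escape
character (and therefore escapes the backslash itself as well).

```python
# Characters the text above lists as reserved by the stream format.
RESERVED = "[]$^/+"


def escape_reserved(text: str, escape: str = "\\") -> str:
    """Prefix every reserved character with the escape character.

    ASSUMPTION: backslash is the escape character. It is escaped too,
    so that the result can be unescaped unambiguously.
    """
    out = []
    for ch in text:
        if ch in RESERVED or ch == escape:
            out.append(escape)
        out.append(ch)
    return "".join(out)


print(escape_reserved("price: 3+4 [approx.]"))  # price: 3\+4 \[approx.\]
```

Text that has to pass through lt-proc literally would be escaped this
way before being written to the input stream.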
-a, --analysis
Tokenizes the text in surface forms (lexical units as they appear in
texts) and delivers, for each surface form, one or more lexical forms
consisting of lemma, lexical category and morphological inflection
information. Tokenization is not straightforward due to the existence,
on the one hand, of contractions, and, on the other hand, of
multi-word lexical units. For contractions, the system reads in a
single surface form and delivers the corresponding sequence of lexical
forms. Multi-word surface forms are analysed in a left-to-right,
longest-match fashion. Multi-word surface forms may be invariable
(such as a multi-word preposition or conjunction) or inflected (for
example, in Spanish, "echaban de menos", they missed, is a
form of the imperfect indicative tense of the verb "echar de
menos", to miss). Limited support for some kinds of
discontinuous multi-word units is also available. Analysis of
single-word surface forms produces output like these examples:
"cantar" -> ^cantar/cantar<vblex><inf>$ or
-b, --bilingual
Performs lexical transfer, carrying over any trailing morphological
symbols not specified in the dictionaries. Like the analysis mode, it
supports multiple lexical forms in the target language for a given
lexical form in the source language. Typically works with the output of
-o, --surf-bilingual
Like -b, but takes its input from apertium-tagger -p, which includes
surface forms; if the lexical form is not found in the bilingual
dictionary, it outputs the surface form of the word.
-c, --case-sensitive
Use the literal case of the incoming characters.
-d, --debugged-gen
Morphological generation (like -g) that keeps all marks in the output.
-e, --decompose-compounds
Try to treat unknown words as compounds, and decompose them.
-w, --dictionary-case
Use the case information contained in the lexicon, instead of the surface
case (only applied in analysis mode).
-g, --generation
Delivers a target-language surface form for each target-language
lexical form, by suitably inflecting it.
-n, --non-marked-gen
Morphological generation (like -g) but without unknown word
marks (asterisk *).
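
The difference between -g and -n can be pictured with a toy generator.
This is only a sketch under stated assumptions: the dictionary, the
function name generate and the mark_unknown parameter are all
hypothetical, and the sketch assumes that unknown words reach the
generator with a leading asterisk.

```python
# Toy generation dictionary: lexical form -> surface form (illustrative).
GENERATOR = {"cantar<vblex><inf>": "cantar"}


def generate(lexical_form: str, mark_unknown: bool = True) -> str:
    """Return a surface form; mark_unknown=False mimics the -n behaviour."""
    # ASSUMPTION: unknown words arrive prefixed with "*".
    if lexical_form.startswith("*"):
        word = lexical_form[1:]
        return "*" + word if mark_unknown else word
    # Forms missing from the toy dictionary are returned unchanged;
    # what the real tool does in that case is not modelled here.
    return GENERATOR.get(lexical_form, lexical_form)


print(generate("cantar<vblex><inf>"))        # cantar
print(generate("*xyz"))                      # *xyz
print(generate("*xyz", mark_unknown=False))  # xyz
```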
-l, --tagged-gen
Morphological generation (like -g) but retaining part-of-speech
-p, --post-generation
Performs orthographical operations such as contractions and
apostrophations. The post-generator is usually dormant (just
copies the input to the output) until a special alarm symbol
contained in some target-language surface forms wakes it up to
perform a particular string transformation if necessary; then it goes
back to sleep.
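
The dormant/awake behaviour of the post-generator can be sketched as
follows. This is an illustration, not the lt-proc implementation: the
alarm symbol "~", the two rules and the function name postgenerate are
assumptions made for the example, and dropping an alarm symbol that no
rule matches is a simplification.

```python
ALARM = "~"  # ASSUMPTION: the alarm symbol that wakes the post-generator

# Hypothetical contraction rules, keyed by the text that follows the alarm.
RULES = {"~de el": "del", "~a el": "al"}


def postgenerate(text: str) -> str:
    """Copy input unchanged until an alarm symbol triggers a rewrite."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != ALARM:
            out.append(text[i])  # dormant: copy input to output
            i += 1
            continue
        # Awake: pick the longest rule that matches at this position.
        match = max(
            (key for key in RULES if text.startswith(key, i)),
            key=len,
            default=None,
        )
        if match is None:
            i += 1  # simplification: no rule applies, drop the alarm
        else:
            out.append(RULES[match])
            i += len(match)
        # Either way, go back to sleep and keep copying.
    return "".join(out)


print(postgenerate("vengo ~de el mercado"))  # vengo del mercado
```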
-s, --sao
Processes input in the orthoepikon (previously sao) annotation
system format: http://orthoepikon.sf.net.
-t, --transliteration
Apply a transliteration dictionary.
-z, --null-flush
Flush output on the null character.
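
One way to picture null flushing is a loop that treats each null
character as the end of a chunk: the chunk is processed, written out
and flushed at once, so that a program on the other end of a pipe
gets its answer without waiting for end of file. The sketch below is
hypothetical (the function name and the transform callback are not
part of lt-proc) and assumes the null character is echoed to the
output after each chunk.

```python
import io


def process_with_null_flush(src, dst, transform):
    """Apply transform to each NUL-terminated chunk and flush after it."""
    buf = []
    while True:
        ch = src.read(1)
        if ch == "":  # end of input
            break
        if ch == "\0":
            dst.write(transform("".join(buf)))
            dst.write("\0")  # ASSUMPTION: the NUL is echoed to the output
            dst.flush()      # make the chunk visible immediately
            buf.clear()
        else:
            buf.append(ch)
    if buf:  # trailing text without a final NUL
        dst.write(transform("".join(buf)))
        dst.flush()


src = io.StringIO("ab\0cd\0")
dst = io.StringIO()
process_with_null_flush(src, dst, str.upper)
print(repr(dst.getvalue()))  # 'AB\x00CD\x00'
```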
-v, --version
Display the version number.
-h, --help
Display this help.
fst_file The input compiled dictionary.
Lots of...lurking in the dark and waiting for you!
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.