Manual Reference Pages  -  LT-PROC (1)

NAME

lt-proc - This application is part of the lexical processing modules and tools (lttoolbox).

This tool is part of the apertium machine translation architecture: http://www.apertium.org.

CONTENTS

Synopsis
Description
Options
Files
See Also
Bugs
Author

SYNOPSIS

lt-proc [ -a | -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h -z -w ] fst_file [input_file [output_file]]

lt-proc [ --analysis | --bilingual | --surf-bilingual | --case-sensitive | --debugged-gen | --decompose-nouns | --generation | --non-marked-gen | --tagged-gen | --post-generation | --sao | --transliteration | --null-flush --dictionary-case --decompose-compounds | --version | --help ] fst_file [input_file [output_file]]

DESCRIPTION

lt-proc is the application responsible for providing the four lexical processing functionalities:

o morphological analyser (option -a)

o lexical transfer (option -b)

o morphological generator (option -g)

o post-generator (option -p)

It accomplishes these tasks by reading binary files containing a compact and efficient representation of dictionaries (a class of finite-state transducers called augmented letter transducers). These files are generated by lt-comp(1).
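The dictionaries are compiled transducers that lt-proc traverses letter by letter. As a rough model only (a character trie in Python, not the lttoolbox binary format or its augmented letter transducer code), the following sketch shows dictionary lookup combined with the left-to-right, longest-match segmentation described under the -a option. The entries, analyses and function names are hypothetical.

```python
# Simplified, hypothetical model of dictionary lookup. This is NOT the
# lttoolbox binary format or its augmented letter transducer implementation.
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Non-empty only at "final states": the analyses of the surface
        # form spelled by the path from the root to this node.
        self.analyses: List[str] = []


def build(entries: Dict[str, List[str]]) -> TrieNode:
    """Insert every surface form letter by letter."""
    root = TrieNode()
    for surface, analyses in entries.items():
        node = root
        for ch in surface:
            node = node.children.setdefault(ch, TrieNode())
        node.analyses.extend(analyses)
    return root


def longest_match(root: TrieNode, text: str,
                  start: int) -> Optional[Tuple[int, List[str]]]:
    """Walk from text[start]; return (end index, analyses) of the longest
    dictionary entry found, or None if no entry starts here."""
    node: Optional[TrieNode] = root
    best: Optional[Tuple[int, List[str]]] = None
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break
        if node.analyses:
            best = (i + 1, node.analyses)
    return best


def tokenize(root: TrieNode, text: str) -> List[Tuple[str, List[str]]]:
    """Left-to-right, longest-match segmentation. Characters not covered by
    any entry are skipped; a real analyser also respects word boundaries
    and reports unknown words."""
    out: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(text):
        match = longest_match(root, text, i)
        if match is None:
            i += 1
            continue
        end, analyses = match
        out.append((text[i:end], analyses))
        i = end
    return out
```

With entries for both "echaban" and the multi-word "echaban de menos", the longer entry wins whenever the whole multi-word form is present in the input.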

Note that some characters (‘[’, ‘]’, ‘$’, ‘^’, ‘/’, ‘+’) are special characters used for formatting and encapsulation, and must be escaped if they are to be used literally. For instance, text between ‘[’ and ‘]’ is treated as format information and ignored, and each lexical unit is encapsulated as ‘^...$’.
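A minimal sketch of escaping text before passing it to lt-proc follows. It assumes the common Apertium convention of prefixing each special character with a backslash (‘\’), which this page does not spell out, so the backslash itself is escaped too; the function names are hypothetical.

```python
# Special characters listed in this manual page.
SPECIAL_CHARS = "[]$^/+"
# Assumption: backslash is the escape character (Apertium convention).
ESCAPE = "\\"


def escape(text: str) -> str:
    """Prefix every special character (and the escape character itself)
    with a backslash so it is read literally."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch == ESCAPE:
            out.append(ESCAPE)
        out.append(ch)
    return "".join(out)


def unescape(text: str) -> str:
    """Drop the backslash in front of each escaped character."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == ESCAPE:
            # Keep the character that follows the backslash, if any.
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)
```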

OPTIONS

-a, --analysis
  Tokenizes the text into surface forms (lexical units as they appear in texts) and delivers, for each surface form, one or more lexical forms consisting of lemma, lexical category and morphological inflection information. Tokenization is not straightforward because of contractions on the one hand and multi-word lexical units on the other. For contractions, the system reads a single surface form and delivers the corresponding sequence of lexical forms. Multi-word surface forms are analysed in a left-to-right, longest-match fashion. Multi-word surface forms may be invariable (such as a multi-word preposition or conjunction) or inflected (for example, in Spanish (es), "echaban de menos", "they missed", is a form of the imperfect indicative tense of the verb "echar de menos", "to miss"). Limited support for some kinds of discontinuous multi-word units is also available. Analysis of single-word surface forms produces output like that in these examples: "cantar" -> ‘^cantar/cantar<vblex><inf>$’ or "daba" -> ‘^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$’.
-b, --bilingual
  Does lexical transfer, copying onto the target form the trailing morphological symbols (tags) that are not specified in the dictionaries. As in analysis mode, it supports multiple lexical forms in the target language for a given lexical form in the source language. It typically works on the output of apertium-pretransfer.
-o, --surf-bilingual
  As with -b, but takes input from apertium-tagger -p, which includes surface forms; if the lexical form is not found in the bilingual dictionary, it outputs the surface form of the word.
-c, --case-sensitive
  Use the literal case of the incoming characters.
-d, --debugged-gen
  Morphological generation (like -g) that keeps all debugging information in the output.
-e, --decompose-compounds
  Try to treat unknown words as compounds and decompose them.
-w, --dictionary-case
  Use the case information contained in the lexicon, instead of the surface case (only applied in analysis mode).
-g, --generation
  Delivers a target-language surface form for each target-language lexical form, by suitably inflecting it.
-n, --non-marked-gen
  Morphological generation (like -g) but without unknown word marks (asterisk ‘*’).
-l, --tagged-gen
  Morphological generation (like -g) but retaining part-of-speech tags.
-p, --post-generation
  Performs orthographical operations such as contractions and apostrophations. The post-generator is usually dormant (just copies the input to the output) until a special alarm symbol contained in some target-language surface forms wakes it up to perform a particular string transformation if necessary; then it goes back to sleep.
-s, --sao
  Process input in the orthoepikon (formerly ‘sao’) annotation system format: http://orthoepikon.sf.net.
-t, --transliteration
  Apply a transliteration dictionary.
-z, --null-flush
  Flush output whenever a null character is read.
-v, --version
  Display the version number.
-h, --help
  Display this help.
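To consume the analysis output of -a from a script, a minimal Python parser might look like the following. It assumes the ‘^surface/reading1/reading2$’ layout shown in the -a examples, treats a single reading that starts with ‘*’ as the unknown-word mark (an assumption based on typical lt-proc output), and ignores escaped special characters. The type and function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class Reading(NamedTuple):
    lemma: str
    tags: List[str]


class LexicalUnit(NamedTuple):
    surface: str
    readings: List[Reading]
    unknown: bool


# One ^...$ lexical unit. Simplification: no escaped special characters.
UNIT_RE = re.compile(r"\^([^$]*)\$")
TAG_RE = re.compile(r"<([^>]*)>")


def parse_reading(text: str) -> Reading:
    """Split 'dar<vblex><pii><p1><sg>' into lemma and tag list."""
    first = text.find("<")
    lemma = text if first == -1 else text[:first]
    return Reading(lemma, TAG_RE.findall(text))


def parse_units(stream: str) -> List[LexicalUnit]:
    units = []
    for body in UNIT_RE.findall(stream):
        # The surface form comes first; readings follow, separated by '/'.
        surface, *rest = body.split("/")
        # Assumption: an unknown word has the single reading '*surface'.
        unknown = len(rest) == 1 and rest[0].startswith("*")
        readings = [] if unknown else [parse_reading(r) for r in rest]
        units.append(LexicalUnit(surface, readings, unknown))
    return units
```

For the "daba" example above, this yields one lexical unit with two readings of the lemma "dar" that differ only in the person tag.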

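The -z option makes it possible to keep a single lt-proc process running and exchange NUL-terminated blocks with it. A minimal Python sketch follows, assuming lt-proc is on the PATH and writes a NUL after each flushed block; LtProcSession and read_block are hypothetical names. It writes a whole block before reading the reply, so it suits short inputs; large inputs would need a separate reader thread so that neither pipe fills up.

```python
import io
import subprocess
from typing import BinaryIO


def read_block(stream: BinaryIO) -> bytes:
    """Return the bytes up to (not including) the next NUL, or b'' at EOF."""
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte or byte == b"\0":
            return bytes(buf)
        buf += byte


class LtProcSession:
    """Hypothetical helper around one long-running `lt-proc -z` process."""

    def __init__(self, fst_file: str, mode: str = "-a") -> None:
        self.proc = subprocess.Popen(
            ["lt-proc", mode, "-z", fst_file],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def process(self, text: str) -> str:
        assert self.proc.stdin is not None and self.proc.stdout is not None
        # Terminate the block with NUL so lt-proc flushes its output for it.
        self.proc.stdin.write(text.encode("utf-8") + b"\0")
        self.proc.stdin.flush()
        return read_block(self.proc.stdout).decode("utf-8")

    def close(self) -> None:
        if self.proc.stdin is not None:
            self.proc.stdin.close()
        self.proc.wait()


if __name__ == "__main__":
    # Offline demo of the NUL framing only; no lt-proc process is started.
    demo = io.BytesIO(b"^cantar/cantar<vblex><inf>$\0")
    print(read_block(demo))
```

For example, LtProcSession("es.automorf.bin").process("cantar") would return the analysis of that single block; the dictionary file name is hypothetical.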
FILES

fst_file
  The input compiled dictionary.

SEE ALSO

lt-expand(1), lt-comp(1), apertium-tagger(1), apertium(1).

BUGS

Lots of...lurking in the dark and waiting for you!

AUTHOR

(c) 2005,2006 Universitat d’Alacant / Universidad de Alicante.


LT-PROC (1) 2006-03-23
