apertium — machine
    translation application platform
  
    apertium [-au] [-d datadir] [-f format] language-pair
      [infile [outfile]]
apertium is the application that most
    people will use, since it simplifies the use of the apertium and
    lt-toolbox tools for machine translation.
This tool eases the use of
    lt-toolbox
    (which contains all the lexical processing modules and tools) and
    apertium
    (which contains the rest of the engine) by providing a single front end
    for the end user.
The modules behind the apertium machine translation
    architecture are, in order:
  - de-formatter
- Separates the text to be translated from the format information.
- morphological-analyser
- Tokenizes the text into surface forms.
- part-of-speech tagger
- Chooses one surface form among homographs.
- lexical transfer module
- Reads each source-language lexical form and delivers a corresponding
      target-language lexical form.
- structural transfer module
- Detects fixed-length patterns of lexical forms (chunks or phrases) needing
      special processing due to grammatical divergences between the two
      languages and performs the corresponding transformations.
- morphological generator
- Delivers a target-language surface form for each target-language lexical
      form, by suitably inflecting it.
- post-generator
- Performs orthographical operations such as contractions and
      apostrophations.
- re-formatter
- Restores the format information encapsulated by the de-formatter into the
      translated text and removes the encapsulation sequences used to protect
      certain characters in the source text.
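The ordering above can be illustrated with a toy shell pipeline. This is only a sketch: `stage` and `toy_pipeline` are hypothetical names, not part of apertium, and each stage merely appends its own name to the text instead of doing any linguistic processing.

```shell
#!/bin/sh
# Toy stand-in for one module: copies stdin to stdout and appends the
# module name to every line, so the processing order becomes visible.
stage() {
    sed "s/\$/ > $1/"
}

# Hypothetical helper: chains the eight modules in the documented order.
toy_pipeline() {
    stage de-formatter |
    stage morphological-analyser |
    stage part-of-speech-tagger |
    stage lexical-transfer |
    stage structural-transfer |
    stage morphological-generator |
    stage post-generator |
    stage re-formatter
}

echo "Un text" | toy_pipeline
```

The printed line lists the stages left to right in the order the text passed through them, starting with the de-formatter and ending with the re-formatter.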
  - -d datadir
- The directory holding the linguistic data. Defaults to the
      expected installation path.
- language-pair
- The language pair, written
      LANG1-LANG2 (for
      instance “es-ca” or “ca-es”).
- -f format
- Specifies the format of the input and output files. Accepted
      values are:
    
      - txt
- (default
          value) Input and output files are in text format.
- html
- Input and output files are in “html” format, meaning
          the “html” accepted by the vast majority of web
          browsers.
- html-noent
- Input and output files are in “html” format, but
          native-encoding characters are preserved rather than replaced
          with HTML text entities.
- rtf
- Input and output files are in “rtf” format. The accepted
          “rtf” is the one generated by Microsoft WordPad and
          Microsoft Office up to and including Office 97.
 
- -u
- Disable marking of unknown words with the
      ‘*’ character.
- -H
- Enable header-detection (only used in some language pairs; will lead to
      stray ‘❡’ characters in pairs
      that don't support it).
- -a
- Enable marking of disambiguated words with the
      ‘=’ character.
- -m memory.tmx
- Uses a translation memory to recycle translations.
- -o direction
- Translation direction to use with the translation memory; by default,
      the direction given as language-pair is used instead.
- -l
- Lists the available translation directions and exits.
- direction
- Typically LANG1-LANG2, but see
      modes.xml in the language data.
These are the two files that can be used with this command:
- infile
- Input file (stdin by
      default).
- outfile
- Output file (stdout by
      default).
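As a rough illustration of how the format option, language pair, and optional files combine on the command line, the sketch below assembles invocations without running them. The helper name `apertium_cmd` and the file names are hypothetical; only the `-f` values and argument order documented above are assumed.

```shell
#!/bin/sh
# Hypothetical helper (not part of apertium): checks that the format is one
# of the four values documented for -f, then prints the command line.
apertium_cmd() {
    format=$1
    pair=$2
    shift 2
    case $format in
        txt|html|html-noent|rtf) ;;
        *) echo "unsupported format: $format" >&2; return 1 ;;
    esac
    # Any remaining arguments are the optional infile and outfile.
    set -- apertium -f "$format" "$pair" "$@"
    echo "$*"
}

# Translate an HTML file from Spanish to Catalan (placeholder file names).
apertium_cmd html es-ca page.html page.ca.html

# With no files given, apertium reads stdin and writes stdout.
apertium_cmd txt ca-es
```

The printed commands can be pasted into a shell on a system where apertium and the corresponding language data are installed.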
Copyright © 2005, 2006 Universitat d'Alacant / Universidad
    de Alicante. This is free software. You may redistribute copies of it under
    the terms of the
    GNU General Public License.
Many... lurking in the dark and waiting for you!