lt-trim - compiled dictionary trimmer for Apertium
lt-trim analyser_binary bidix_binary trimmed_analyser_binary
lt-trim is the application responsible for trimming compiled dictionaries.
The analyses (right-side when compiling lr) of analyser_binary are trimmed
to the input side of bidix_binary (left-side when compiling lr, right-side
when compiling rl), such that only analyses which would pass through
‘lt-proc(1) -b bidix_binary’ are kept.
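The effect of trimming can be pictured with a small toy model. The sketch below is not how lt-trim is implemented (it operates on compiled finite-state transducers); it assumes that a bidix entry matches an analysis when its input side is a prefix of that analysis (lemma plus leading tags), and all function names and dictionary entries are made up for illustration.

```python
# Toy model only: real lt-trim works on compiled transducers, not strings.
# Assumption: a bidix entry matches when its lemma and tags form a prefix
# of the analysis; any trailing tags pass through unchanged.

def split_analysis(analysis):
    """Split 'house<n><pl>' into ('house', ['n', 'pl'])."""
    lemma, _, rest = analysis.partition("<")
    tags = [t.rstrip(">") for t in rest.split("<")] if rest else []
    return lemma, tags


def passes_bidix(analysis, bidix_inputs):
    """True if some bidix input side is a prefix of the analysis."""
    lemma, tags = split_analysis(analysis)
    for entry in bidix_inputs:
        b_lemma, b_tags = split_analysis(entry)
        if lemma == b_lemma and tags[:len(b_tags)] == b_tags:
            return True
    return False


def trim(analyses, bidix_inputs):
    """Keep only the analyses that the (toy) bidix would let through."""
    return [a for a in analyses if passes_bidix(a, bidix_inputs)]


# Hypothetical analyser output and bidix input side.
analyses = ["house<n><sg>", "house<n><pl>", "house<vblex><inf>", "mouse<n><sg>"]
bidix_inputs = ["house<n>"]

trimmed = trim(analyses, bidix_inputs)
print(trimmed)
```

In this model the verb reading of "house" and the whole entry for "mouse" are dropped, because nothing on the bidix input side would match them.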
Compound tags (“<compound-only-L>”, “<compound-R>”), join elements
(“<j/>” in XML, “+” in the stream) and the group element (“<g/>” in XML,
“#” in the stream) should all be handled correctly; even combinations of
+ followed by # in the monodix are handled.
Some minor caveats: if you have the capitalised lemma “Foo” in the monodix
but “foo” in the bidix, an analysis “^Foo<tag>$” would pass through the
bidix when running lt-proc(1) -b, but will not make it through trimming.
Make sure your lemmas have the same capitalisation in the different
dictionaries.
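One way to catch the capitalisation problem before trimming is to compare lemma lists from the two dictionaries. The following is a minimal sketch, assuming the lemmas have already been extracted into plain Python lists; the function name and the sample lemmas are hypothetical, and extracting lemmas from the .dix files is left out.

```python
def case_mismatches(monodix_lemmas, bidix_lemmas):
    """Return (monodix, bidix) lemma pairs that differ only in capitalisation."""
    # Index bidix lemmas by their lower-cased form.
    by_lower = {}
    for lemma in bidix_lemmas:
        by_lower.setdefault(lemma.lower(), set()).add(lemma)

    exact = set(bidix_lemmas)
    pairs = []
    for lemma in monodix_lemmas:
        if lemma in exact:
            continue  # identical spelling exists, nothing to report
        for other in sorted(by_lower.get(lemma.lower(), ())):
            pairs.append((lemma, other))
    return pairs


# Hypothetical lemma lists.
mismatches = case_mismatches(["Foo", "bar", "Baz"], ["foo", "bar", "qux"])
print(mismatches)
```

Each reported pair is a lemma whose analyses would be lost in trimming even though the untrimmed pipeline would translate them.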
Also, you should not have a literal ‘+’ or ‘#’ in your lemmas. Since
lt-comp(1) doesn't escape these, lt-trim cannot know that they are
different from “<j/>” or “<g/>”, and you may get @-marked output this way.
You can analyse ‘+’ or ‘#’ by having the literal symbol in the “<l>” part
and some other string (e.g., “plus”) in the “<r>”.
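The check for literal ‘+’ and ‘#’ can be automated roughly as follows. This is a sketch under assumptions: the dictionary snippet is invented, the tag names “sym” and “np” are illustrative only, and the scan looks just at the text inside “<r>” elements rather than at every place a lemma can appear in a real .dix file. The first entry shows the workaround from the paragraph above; the second shows the problem case.

```python
import xml.etree.ElementTree as ET

# Hypothetical monodix fragment; tag names are illustrative.
SNIPPET = """
<section id="main" type="standard">
  <e><p><l>+</l><r>plus<s n="sym"/></r></p></e>
  <e><p><l>C#</l><r>C#<s n="np"/></r></p></e>
</section>
"""


def risky_lemmas(xml_text):
    """Return the text of every <r> that contains a literal '+' or '#'."""
    root = ET.fromstring(xml_text)
    flagged = []
    for r in root.iter("r"):
        # itertext() collects character data only, so <j/> and <g/>
        # elements (which carry no text) are never flagged.
        text = "".join(r.itertext())
        if "+" in text or "#" in text:
            flagged.append(text)
    return flagged


flagged = risky_lemmas(SNIPPET)
print(flagged)
```

Only the second entry is reported, since its “<r>” side still carries the literal symbol.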
You should not trim a generator unless you have a very simple translator
pipeline, since the output of the bidix seldom goes unchanged through
transfer.
-s, --match-section
- A section with this name (id@type) in the analyser will only be trimmed
against a section with the same id in the bidix. (The default is to trim
all sections of the analyser against all sections of the bidix.) Using
this option can sometimes speed up trimming considerably. For example, if
you have some complicated regular expressions, try putting them in a
<section id="regex" type="standard"> in both .dix files and passing
“regex@standard” to --match-section.
This option may be given multiple times to specify multiple sections that
must match by name.
- analyser_binary
- The untrimmed analyser dictionary (a finite state transducer).
- bidix_binary
- The dictionary to use as trimmer (a finite state transducer).
- trimmed_analyser_binary
- The trimmed analyser dictionary (a finite state transducer).
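An invocation that uses --match-section as described above could be assembled like this. It is only a sketch: the file names are placeholders, the helper function is hypothetical, and it assumes that options are given before the three positional arguments, as is usual for command-line tools.

```python
def lt_trim_command(analyser, bidix, output, match_sections=()):
    """Build an argument list for lt-trim (nothing is executed here)."""
    cmd = ["lt-trim"]
    # The option may be repeated, once per section name (id@type).
    for name in match_sections:
        cmd += ["--match-section", name]
    cmd += [analyser, bidix, output]
    return cmd


# Placeholder file names; "regex@standard" is the example from the text.
cmd = lt_trim_command(
    "analyser.bin", "bidix.bin", "trimmed.bin",
    match_sections=["regex@standard"],
)
print(" ".join(cmd))
```

On a system where lt-trim is installed, the resulting list can be handed to `subprocess.run(cmd, check=True)`.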
Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.
This is free software. You may redistribute copies of it under the terms
of the GNU General Public License.
Many... lurking in the dark and waiting for you!