lt-trim is the application responsible for trimming compiled dictionaries. The
analyses (right-side when compiling lr) of analyser_binary are trimmed
to the input side of bidix_binary (left-side when compiling lr,
right-side when compiling rl), such that only analyses which would
pass through lt-proc -b bidix_binary are kept.
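
The idea of trimming can be illustrated with a small string-level
sketch. This is only a toy model, not how lt-trim is implemented:
the real program works on compiled transducers, while the
hypothetical function trim() below works on plain strings. It also
assumes, as a simplification, that an analysis passes the bidix when
it equals a bidix input-side entry followed by zero or more further
tags.

```python
import re

# Zero or more tags such as <n> or <sg>, and nothing else.
TAGS_ONLY = re.compile(r"(<[^<>]+>)*")


def trim(analyses, bidix_inputs):
    """Toy model: keep analyses that a bidix input-side entry covers.

    An analysis is kept when it consists of some bidix entry followed
    only by further tags (a simplifying assumption of this sketch).
    """
    kept = []
    for analysis in analyses:
        for entry in bidix_inputs:
            if analysis.startswith(entry) and TAGS_ONLY.fullmatch(
                analysis[len(entry):]
            ):
                kept.append(analysis)
                break
    return kept


analyses = ["house<n><sg>", "house<n><pl>", "housing<n><sg>", "walk<vblex><inf>"]
bidix_inputs = ["house<n>", "walk<vblex>"]
print(trim(analyses, bidix_inputs))
```

Here "housing<n><sg>" is dropped because no entry on the bidix input
side covers it, while the other three analyses are kept.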
Warning: this program is experimental! It has been tested, but
not deployed extensively yet.
Compound tags (<compound-only-L>, <compound-R>), join elements
(<j/> in XML, + in the stream) and the group element (<g/> in XML,
# in the stream) should all be handled correctly. Combinations of
+ followed by # in the monodix are handled too.
Some minor caveats: If you have the capitalised lemma "Foo" in the
monodix, but "foo" in the bidix, an analysis "^Foo<tag>$" would pass
through bidix when doing lt-proc -b, but will not make it through
trimming. Make sure your lemmas have the same capitalisation in the
different dictionaries. Also, you should not have literal + or #
in your lemmas. Since lt-comp doesn't escape these, lt-trim cannot
know that they are different from <j/> or <g/>, and you may get
@-marked output this way. You can analyse + or # by having the
literal symbol in the <l> part and some other string (e.g. "plus")
in the <r>.
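
Both caveats can be checked before compiling. The following is a
minimal sketch, not part of lt-trim: the function names
literal_join_symbols() and case_mismatches() are hypothetical, and
the sketch assumes the usual .dix layout of <e lm="..."> entries
containing <l> and <r> elements.

```python
import xml.etree.ElementTree as ET


def literal_join_symbols(dix_xml, side="r"):
    """Return (lemma, text) pairs whose `side` element has a literal + or #."""
    hits = []
    for entry in ET.fromstring(dix_xml).iter("e"):
        for elem in entry.iter(side):
            # itertext() yields character data only; empty elements such
            # as <j/> or <s n="..."/> contribute no text.
            text = "".join(elem.itertext())
            if "+" in text or "#" in text:
                hits.append((entry.get("lm"), text))
    return hits


def case_mismatches(monodix_lemmas, bidix_lemmas):
    """Return monodix lemmas that match a bidix lemma only if case is ignored."""
    bidix = set(bidix_lemmas)
    folded = {lemma.casefold() for lemma in bidix}
    return sorted(
        lemma
        for lemma in set(monodix_lemmas)
        if lemma not in bidix and lemma.casefold() in folded
    )


sample = (
    '<dictionary><section id="main" type="standard">'
    '<e lm="plus"><p><l>+</l><r>plus<s n="sym"/></r></p></e>'
    '<e lm="a+b"><p><l>a+b</l><r>a+b<s n="n"/></r></p></e>'
    "</section></dictionary>"
)
print(literal_join_symbols(sample))
print(case_mismatches(["Foo", "bar"], ["foo", "bar"]))
```

The first entry in the sample follows the advice above (literal + in
the <l>, "plus" in the <r>) and is not reported. The second entry has
a literal + in its <r> and is reported.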
You should not trim a generator unless you have a very simple
translator pipeline, since the output of bidix seldom goes unchanged