Token is the fundamental unit used by KinoSearch1's Analyzer subclasses. Each
Token has 4 attributes: text, start_offset, end_offset, and pos_inc (for
position increment).
The text of a token is a string.
A Token's start_offset and end_offset locate it within a larger text, even if
the Token's text attribute gets modified by stemming, for instance. The
Token for "beating" in the text "beating a dead horse" begins life with a
start_offset of 0 and an end_offset of 7; after stemming, the text is "beat",
but the end_offset is still 7.
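The following is a minimal sketch of the offset behavior described above. It is written in Python, not in KinoSearch1's Perl API. The `Token` dataclass, the regex tokenizer, and the `toy_stem` function (which simply strips a trailing "ing") are hypothetical stand-ins. They show only that stemming rewrites the text while the offsets keep pointing at the original span.

```python
import re
from dataclasses import dataclass

@dataclass
class Token:
    # Hypothetical stand-in for a Token (not KinoSearch1's real class).
    text: str
    start_offset: int
    end_offset: int
    pos_inc: int = 1

def tokenize(source: str) -> list[Token]:
    # Split on runs of word characters, recording where each run sits in source.
    return [Token(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", source)]

def toy_stem(token: Token) -> Token:
    # Hypothetical toy stemmer: strip a trailing "ing".
    # Only the text changes; the offsets are left untouched.
    if token.text.endswith("ing"):
        token.text = token.text[:-3]
    return token

source = "beating a dead horse"
tokens = [toy_stem(t) for t in tokenize(source)]
first = tokens[0]
print(first.text, first.start_offset, first.end_offset)  # beat 0 7
print(source[first.start_offset:first.end_offset])       # beating
```

Because the offsets survive stemming, slicing the source text with them still recovers the original word, which is what you would need for highlighting or excerpting.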
The position increment, which defaults to 1, is an advanced tool for
manipulating phrase matching. Ordinarily, Tokens are assigned consecutive
position numbers: 0, 1, and 2 for "three blind mice". However, if you set the
position increment for "blind" to, say, 1000, then the three tokens will end
up assigned to positions 0, 1, and 1001 and will no longer produce a phrase
match for the query "three blind mice".
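The following is a minimal Python sketch of the position arithmetic in the example above. It is an illustration, not KinoSearch1's implementation. It assumes, to reproduce the numbers given in the text, that each token takes the current position and that its pos_inc then advances the counter for the token that follows. The `Token` class, `assign_positions`, and the toy `phrase_matches` (which requires the phrase terms at consecutive positions) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Token:
    # Hypothetical stand-in: only the fields this sketch needs.
    text: str
    pos_inc: int = 1

def assign_positions(tokens: list[Token]) -> list[int]:
    # Assumption chosen to match the example in the text: a token receives
    # the current position, then its pos_inc advances the counter.
    positions, pos = [], 0
    for tok in tokens:
        positions.append(pos)
        pos += tok.pos_inc
    return positions

def phrase_matches(tokens: list[Token], phrase: list[str]) -> bool:
    # Toy phrase matcher: map each position to the terms found there, then
    # look for a start position where phrase[k] sits at start + k for every k.
    index: dict[int, set[str]] = defaultdict(set)
    for tok, pos in zip(tokens, assign_positions(tokens)):
        index[pos].add(tok.text)
    return any(
        all(phrase[k] in index.get(start + k, set()) for k in range(len(phrase)))
        for start in list(index)
    )

query = ["three", "blind", "mice"]
default = [Token("three"), Token("blind"), Token("mice")]
spread = [Token("three"), Token("blind", pos_inc=1000), Token("mice")]
print(assign_positions(default), phrase_matches(default, query))  # [0, 1, 2] True
print(assign_positions(spread), phrase_matches(spread, query))    # [0, 1, 1001] False
```

The large increment leaves a gap of 1000 between "blind" and "mice", so the three terms are no longer adjacent and the phrase check fails.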