Manual Reference Pages  -  KINOSEARCH1::ANALYSIS::TOKEN (3)

NAME

KinoSearch1::Analysis::Token - unit of text

SYNOPSIS



    # private class - no public API



PRIVATE CLASS

You can’t actually instantiate a Token object at the Perl level. However, you can affect individual Tokens within a TokenBatch by way of TokenBatch’s (experimental) API.

DESCRIPTION

Token is the fundamental unit used by KinoSearch1’s Analyzer subclasses. Each Token has 4 attributes: text, start_offset, end_offset, and pos_inc (for position increment).

The text of a token is a string.

A Token’s start_offset and end_offset locate it within a larger text, even if the Token’s text attribute gets modified (by stemming, for instance). The Token for "beating" in the text "beating a dead horse" begins life with a start_offset of 0 and an end_offset of 7; after stemming, the text is "beat", but the end_offset is still 7.
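
The offset behaviour can be modelled outside KinoSearch1. The following is a minimal sketch in plain Python, not the KinoSearch1 API (Token cannot be instantiated from Perl): the `Token` dataclass and the toy suffix-stripping "stemmer" are hypothetical stand-ins, used only to show that changing the text leaves the offsets pointing at the original word.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in holding the four attributes described above."""
    text: str
    start_offset: int
    end_offset: int
    pos_inc: int = 1  # position increment defaults to 1


source = "beating a dead horse"
token = Token(text="beating", start_offset=0, end_offset=7)

# Toy "stemmer" (an assumption for illustration): strip a trailing "ing".
if token.text.endswith("ing"):
    token.text = token.text[:-3]

# The offsets are untouched, so they still locate the original word.
original = source[token.start_offset:token.end_offset]
print(token.text, token.start_offset, token.end_offset, original)
```

Keeping the offsets fixed is what allows a caller to map a modified token back to the exact span of the source text it came from.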

The position increment, which defaults to 1, is an advanced tool for manipulating phrase matching. Ordinarily, Tokens are assigned consecutive position numbers: 0, 1, and 2 for "three blind mice". However, if you set the position increment for "blind" to, say, 1000, then the three tokens will end up assigned to positions 0, 1, and 1001, and will no longer produce a phrase match for the query "three blind mice".
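
The arithmetic in this example can be reproduced with a short sketch. It is plain Python, not KinoSearch1 code, and it rests on an assumption inferred from the numbers quoted above: each token takes the current position, and its own increment then advances the position for the token that follows. The helper names `assign_positions` and `is_phrase_match` are hypothetical.

```python
def assign_positions(pos_incs):
    """Assign a position to each token, then advance by that token's increment.

    Assumption: the increment is applied after the token is placed, which is
    the reading that yields the positions quoted in the text above.
    """
    positions = []
    position = 0
    for inc in pos_incs:
        positions.append(position)
        position += inc
    return positions


def is_phrase_match(positions):
    """Simplified phrase test: every token sits one position after the previous."""
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


# "three blind mice" with default increments, then with 1000 set on "blind".
default_positions = assign_positions([1, 1, 1])
gapped_positions = assign_positions([1, 1000, 1])

print(default_positions, is_phrase_match(default_positions))
print(gapped_positions, is_phrase_match(gapped_positions))
```

Under this model the default increments give consecutive positions, while the large increment opens a gap before "mice" that an exact phrase query cannot bridge.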

COPYRIGHT

Copyright 2006-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.01.


perl v5.20.3 KINOSEARCH1::ANALYSIS::TOKEN (3) 2016-03-17