Manual Reference Pages  -  WORDTYPE (3)

NAME

WordType - defines a word in terms of allowed characters, length, etc.

CONTENTS

Synopsis
Description
Configuration
Methods
Authors
See Also

SYNOPSIS


Only called through WordContext::Initialize()

DESCRIPTION

WordType defines an indexed word and the operations that validate a word to be indexed. All words inserted into the mifluz index are Normalized before insertion. The configuration options give some control over the definition of a word.

CONFIGURATION

For more information on the configuration attributes and a complete list of attributes, see the mifluz(3) manual page.
wordlist_locale <locale> (default C)
  Set the locale of the program to locale. See setlocale(3) for more information.
wordlist_allow_numbers {true|false} (default false)
  A digit is considered a valid character within a word if this configuration parameter is set to true; otherwise it is an error to insert a word containing digits. See the Normalize method for more information.
wordlist_minimum_word_length <number> (default 3)
  The minimum length of a word. See the Normalize method for more information.
wordlist_maximum_word_length <number> (default 25)
  The maximum length of a word. See the Normalize method for more information.
wordlist_truncate {true|false} (default true)
  If a word is longer than wordlist_maximum_word_length, it is truncated when this configuration parameter is true; otherwise it is considered an invalid word.
wordlist_lowercase {true|false} (default true)
  If a word contains uppercase letters, it is converted to lowercase when this configuration parameter is true; otherwise it is left untouched.
wordlist_valid_punctuation [characters] (default none)
  A list of punctuation characters that may appear in a word. These characters will be removed from the word before insertion in the index.
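
The combined effect of these options can be sketched in standalone C++. This is a minimal illustration only, not the mifluz implementation: the WordOptions struct and the apply_options function are hypothetical names, the defaults are copied from the list above, and the order in which the rules are applied is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for the options above; this is not a mifluz type.
struct WordOptions {
    bool allow_numbers = false;            // wordlist_allow_numbers
    std::size_t minimum_word_length = 3;   // wordlist_minimum_word_length
    std::size_t maximum_word_length = 25;  // wordlist_maximum_word_length
    bool truncate = true;                  // wordlist_truncate
    bool lowercase = true;                 // wordlist_lowercase
    std::string valid_punctuation;         // wordlist_valid_punctuation (default none)
};

// Returns the word as it would be indexed, or std::nullopt if it is rejected.
// The order of the steps is an assumption made for this sketch.
std::optional<std::string> apply_options(const std::string& word,
                                         const WordOptions& opts) {
    assert(opts.minimum_word_length <= opts.maximum_word_length);

    std::string out;
    for (unsigned char c : word) {
        // Characters listed in valid_punctuation are removed from the word.
        if (opts.valid_punctuation.find(static_cast<char>(c)) != std::string::npos)
            continue;
        // A digit rejects the word unless allow_numbers is true.
        if (std::isdigit(c) && !opts.allow_numbers)
            return std::nullopt;
        out += static_cast<char>(opts.lowercase ? std::tolower(c) : c);
    }

    if (out.size() > opts.maximum_word_length) {
        if (!opts.truncate)
            return std::nullopt;  // too long and truncation is disabled
        out.resize(opts.maximum_word_length);
    }
    if (out.size() < opts.minimum_word_length)
        return std::nullopt;      // too short
    return out;
}
```

With the defaults, a word such as "Hello" is kept in lowercase form, while a two-letter word or a word containing digits is rejected. Setting valid_punctuation to "-" makes "e-mail" come out as "email".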

METHODS

int Normalize(String &s) const
  Normalize a word according to configuration specifications and built-in transformations. Every word inserted in the inverted index goes through this function. If a word is rejected (return value has WORD_NORMALIZE_NOTOK bit set) it will not be inserted in the index. If a word is accepted (return value has WORD_NORMALIZE_OK bit set) it will be inserted in the index. In addition to these two bits, informational bits describe the processing done on the word. The bit field values and their meanings are as follows:
WORD_NORMALIZE_TOOLONG
  the word length exceeds the value of the wordlist_maximum_word_length configuration parameter.
WORD_NORMALIZE_TOOSHORT
  the word length is smaller than the value of the wordlist_minimum_word_length configuration parameter.
WORD_NORMALIZE_CAPITAL
  the word contained capital letters and has been converted to lowercase. This bit is only set if the wordlist_lowercase configuration parameter is true.
WORD_NORMALIZE_NUMBER
  the word contains digits and the configuration parameter wordlist_allow_numbers is set to false.
WORD_NORMALIZE_CONTROL
  the word contains control characters.
WORD_NORMALIZE_BAD
  the word is listed in the file pointed to by the wordlist_bad_word_list configuration parameter.
WORD_NORMALIZE_NULL
  the word is a zero length string.
WORD_NORMALIZE_PUNCTUATION
  at least one character listed in the wordlist_valid_punctuation attribute was removed from the word.
WORD_NORMALIZE_NOALPHA
  the word does not contain any alphanumerical character.
static String NormalizeStatus(int flags)
  Returns a string explaining the return flags of the Normalize method.
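
Interpreting the return value of Normalize amounts to testing bits in an integer. The following standalone C++ sketch shows one way to do that. The bit values are placeholders invented for this example (the real WORD_NORMALIZE_* constants come from the mifluz headers and may differ), and accepted and describe are hypothetical helpers, with describe only loosely modeled on what NormalizeStatus is documented to do.

```cpp
#include <string>

namespace sketch {

// Placeholder bit values for illustration only; the real WORD_NORMALIZE_*
// constants are defined by the mifluz headers and may differ.
enum : int {
    kTooLong     = 1 << 0,
    kTooShort    = 1 << 1,
    kCapital     = 1 << 2,
    kNumber      = 1 << 3,
    kControl     = 1 << 4,
    kBad         = 1 << 5,
    kNull        = 1 << 6,
    kPunctuation = 1 << 7,
    kNoAlpha     = 1 << 8,
    kOk          = 1 << 9,   // stands in for WORD_NORMALIZE_OK
    kNotOk       = 1 << 10,  // stands in for WORD_NORMALIZE_NOTOK
};

// True if the flags say the word would be inserted in the index.
inline bool accepted(int flags) {
    return (flags & kNotOk) == 0 && (flags & kOk) != 0;
}

// Builds a space-separated list of the informational bits that are set.
inline std::string describe(int flags) {
    static const struct { int bit; const char* name; } names[] = {
        {kTooLong, "TOOLONG"},   {kTooShort, "TOOSHORT"},
        {kCapital, "CAPITAL"},   {kNumber, "NUMBER"},
        {kControl, "CONTROL"},   {kBad, "BAD"},
        {kNull, "NULL"},         {kPunctuation, "PUNCTUATION"},
        {kNoAlpha, "NOALPHA"},
    };
    std::string out;
    for (const auto& n : names) {
        if (flags & n.bit) {
            if (!out.empty()) out += ' ';
            out += n.name;
        }
    }
    return out;
}

}  // namespace sketch
```

A caller would typically check accepted(flags) first and use describe(flags) only for logging or diagnostics, since informational bits such as CAPITAL or PUNCTUATION can be set on words that are still indexed.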

AUTHORS

Loic Dachary loic@gnu.org

The Ht://Dig group http://dev.htdig.org/

SEE ALSO

htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1), mifluzload(1), mifluzsearch(1), mifluzdict(1), WordContext(3), WordList(3), WordDict(3), WordListOne(3), WordKey(3), WordKeyInfo(3), WordDBInfo(3), WordRecordInfo(3), WordRecord(3), WordReference(3), WordCursor(3), WordCursorOne(3), WordMonitor(3), Configuration(3), mifluz(3)
