WordList(3)          FreeBSD Library Functions Manual          WordList(3)
WordList - abstract class to manage and use an inverted index file.
#include <mifluz.h>

WordContext context;
// List() returns an instance of the appropriate derived class
WordList* words = context.List();
delete words;
WordList is the mifluz equivalent of a database handler. Each WordList
object is bound to an inverted index file and implements the operations to
create it, fill it with word occurrences and search for an entry matching a
given criterion.
WordList is an abstract class and cannot be instantiated. The
List method of the class WordContext creates an instance using
the appropriate derived class, either WordListOne or WordListMulti. Refer to
the corresponding manual pages for more information on their specific
semantics.
When doing bulk insertions, mifluz creates temporary files that
contain the entries to be inserted in the index. Those files are typically
named indexC00000000. The maximum size of a temporary file is
wordlist_cache_size / 2. When the maximum size of the temporary file is
reached, mifluz creates another temporary file named indexC00000001, and
continues in this way until it has created 50 temporary files. At this point
it merges all temporary files into one that replaces the first
indexC00000000. It then starts creating temporary files again and keeps
following this algorithm until the bulk insertion is finished. When the bulk
insertion is finished, mifluz has one big file named indexC00000000
that contains all the entries to be inserted in the index. mifluz inserts
all the entries from indexC00000000 into the index and deletes the
temporary file when done. The insertion is fast because all the entries
in indexC00000000 are already sorted.
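The following C++ sketch models this scheme in memory to make the run-and-merge logic concrete. It is not mifluz code. The class BulkInsertModel and its members are hypothetical, and each temporary file is represented by a sorted std::vector. The run capacity is counted in entries rather than bytes and stands in for wordlist_cache_size / 2. Only the limit of 50 temporary files is taken from the description above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the bulk-insertion scheme described above; it is
// not mifluz code. Each "temporary file" (indexC00000000, indexC00000001,
// ...) is a sorted std::vector<std::string> held in memory.
class BulkInsertModel {
public:
    // run_capacity stands in for wordlist_cache_size / 2, counted in
    // entries instead of bytes to keep the sketch small.
    explicit BulkInsertModel(std::size_t run_capacity)
        : run_capacity_(run_capacity) {}

    void Insert(const std::string& entry) {
        buffer_.push_back(entry);
        if (buffer_.size() >= run_capacity_) FlushRun();
    }

    // Ends the batch: returns all entries as one sorted run, ready to be
    // inserted sequentially into the main index, and drops the temp files.
    std::vector<std::string> Finish() {
        FlushRun();
        MergeAll();
        std::vector<std::string> result;
        if (!temp_files_.empty()) result.swap(temp_files_.front());
        temp_files_.clear();
        return result;
    }

    std::size_t TempFileCount() const { return temp_files_.size(); }

private:
    static constexpr std::size_t kMaxTempFiles = 50;

    // Sort the buffered entries and "write" them as a new temporary file.
    void FlushRun() {
        if (buffer_.empty()) return;
        std::sort(buffer_.begin(), buffer_.end());
        temp_files_.push_back(std::move(buffer_));
        buffer_.clear();
        if (temp_files_.size() >= kMaxTempFiles) MergeAll();
    }

    // Merge every temporary file into a single one that replaces the first.
    void MergeAll() {
        if (temp_files_.size() < 2) return;
        std::vector<std::string> merged = std::move(temp_files_.front());
        for (std::size_t i = 1; i < temp_files_.size(); ++i) {
            std::vector<std::string> out;
            out.reserve(merged.size() + temp_files_[i].size());
            std::merge(merged.begin(), merged.end(),
                       temp_files_[i].begin(), temp_files_[i].end(),
                       std::back_inserter(out));
            merged.swap(out);
        }
        temp_files_.clear();
        temp_files_.push_back(std::move(merged));
    }

    std::size_t run_capacity_;
    std::vector<std::string> buffer_;
    std::vector<std::vector<std::string>> temp_files_;
};
```

The sketch merges the runs pairwise with std::merge to stay short. A k-way merge driven by a priority queue reads each entry only once per merge and is the more usual choice when there are many runs.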
The parameter wordlist_cache_max can be used to prevent the
temporary files from growing indefinitely. If the total cumulative size of the
indexC* files grows beyond this parameter, they are merged into the
main index and deleted. For instance, setting this parameter to 500MB
guarantees that the total size of the indexC* files will not grow
above 500MB.
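Here is a minimal sketch of how such a cap could be enforced. It assumes the check is made before each new temporary file is written, so the total never exceeds the limit unless a single file is already larger than it. The TempFileBudget class is hypothetical and not part of mifluz. As documented for wordlist_cache_max, the value 0 means no limit.

```cpp
#include <cstdint>

// Hypothetical helper (not mifluz API) illustrating the wordlist_cache_max
// rule: when the cumulative size of the indexC* files would exceed the
// limit, they are merged into the main index and deleted first.
class TempFileBudget {
public:
    explicit TempFileBudget(std::uint64_t cache_max) : cache_max_(cache_max) {}

    // Record a temporary file of `bytes` bytes about to be written.
    // Returns true if the existing temporary files must be merged into
    // the main index (and deleted) before writing it.
    bool RecordTempFile(std::uint64_t bytes) {
        bool merge_first = cache_max_ != 0 && total_ + bytes > cache_max_;
        if (merge_first) total_ = 0;  // merged files no longer count
        total_ += bytes;
        return merge_first;
    }

    std::uint64_t Total() const { return total_; }

private:
    std::uint64_t cache_max_;  // 0 means unlimited
    std::uint64_t total_ = 0;  // cumulative size of current temp files
};
```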
For more information on the configuration attributes and a complete list of
attributes, see the mifluz(3) manual page.
- wordlist_extend {true|false} (default false)
- If true, maintain a reference count of unique words. The
Noccurrence method gives access to this count.
- wordlist_verbose <number> (default 0)
- Set the verbosity level of the WordList class.
1 walk logic
2 walk logic details
3 walk logic lots of details
- wordlist_page_size <bytes> (default 8192)
- Berkeley DB page size (see Berkeley DB documentation)
- wordlist_cache_size <bytes> (default 500K)
- Berkeley DB cache size (see Berkeley DB documentation). Cache makes a huge
difference in performance. It must be at least 2% of the expected total
data size. Note that if compression is activated, the data size is eight
times larger than the actual file size. In this case the cache must be
scaled to 2% of the data size, not 2% of the file size. See Cache
tuning in the mifluz guide for more hints. See WordList(3) for the
rationale behind cache file handling.
- wordlist_cache_max <bytes> (default 0)
- Maximum size of the cumulative cache files generated when doing bulk
insertion with the BatchStart() function. When this limit is
reached, the cache files are all merged into the inverted index. The value
0 means infinite size allowed. See WordList(3) for the rationale behind
cache file handling.
- wordlist_cache_inserts {true|false} (default false)
- If true, all Insert calls are cached in memory. When the WordList
object is closed or a different access method is called, the cached entries
are flushed to the inverted index.
- wordlist_compress {true|false} (default false)
- Activate compression of the index. The resulting index is eight times
smaller than the uncompressed index.
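The 2% rule for wordlist_cache_size can be turned into a small calculation. The helper below is hypothetical, not a mifluz function. It applies the rule after first multiplying the file size by eight when wordlist_compress is enabled. For example, a 100 MiB compressed index holds about 800 MiB of data, so the rule suggests a cache of at least 16 MiB. The same file without compression would need about 2 MiB.

```cpp
#include <cstdint>

// Hypothetical helper: minimum wordlist_cache_size suggested by the 2%
// rule. With compression, the data size is taken to be eight times the
// on-disk file size before the 2% is applied.
std::uint64_t MinCacheBytes(std::uint64_t file_bytes, bool compressed) {
    std::uint64_t data_bytes = compressed ? file_bytes * 8 : file_bytes;
    // 2% of the data size, rounded up so the result never falls below it.
    return (data_bytes * 2 + 99) / 100;
}
```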
- inline WordContext* GetContext()
- Return a pointer to the WordContext object used to create this
instance.
- inline const WordContext* GetContext() const
- Return a pointer to the WordContext object used to create this instance as
a const.
- virtual inline int Override(const WordReference& wordRef)
- Insert wordRef in index. If the Key() part of the
wordRef exists in the index, override it. Returns OK on success,
NOTOK on error.
- virtual int Exists(const WordReference& wordRef)
- Returns OK if wordRef exists in the index, NOTOK otherwise.
- inline int Exists(const String& word)
- Returns OK if word exists in the index, NOTOK otherwise.
- virtual int WalkDelete(const WordReference& wordRef)
- Delete all entries in the index whose key matches the Key() part of
wordRef, using the Walk method. Returns the number of
entries successfully deleted.
- virtual int Delete(const WordReference& wordRef)
- Delete the entry in the index that exactly matches the Key() part
of wordRef. Returns OK if deletion is successful, NOTOK
otherwise.
- virtual int Open(const String& filename, int mode)
- Open inverted index filename. mode may be O_RDONLY or
O_RDWR. If mode is O_RDWR it can be or'ed with
O_TRUNC to reset the content of an existing inverted index. Return
OK on success, NOTOK otherwise.
- virtual int Close()
- Close inverted index. Return OK on success, NOTOK otherwise.
- virtual unsigned int Size() const
- Return the size of the index in pages.
- virtual int Pagesize() const
- Return the page size.
- virtual WordDict *Dict()
- Return a pointer to the inverted index dictionary.
- const String& Filename() const
- Return the filename given to the last call to Open.
- int Flags() const
- Return the mode given to the last call to Open.
- inline List *Find(const WordReference& wordRef)
- Returns the list of word occurrences exactly matching the Key()
part of wordRef. The List returned contains pointers to
WordReference objects. It is the responsibility of the caller to
free the list. See List.h header for usage.
- inline List *FindWord(const String& word)
- Returns the list of word occurrences exactly matching the word. The
List returned contains pointers to WordReference objects. It
is the responsibility of the caller to free the list. See List.h header
for usage.
- virtual List *operator [] (const WordReference& wordRef)
- Alias to the Find method.
- inline List *operator [] (const String& word)
- Alias to the FindWord method.
- virtual List *Prefix (const WordReference& prefix)
- Returns the list of word occurrences matching the Key() part of
prefix. In the Key(), the string (accessed with
GetWord()) matches any string that begins with it. The List
returned contains pointers to WordReference objects. It is the
responsibility of the caller to free the list.
- inline List *Prefix (const String& prefix)
- Returns the list of word occurrences whose word begins with
prefix. The List returned contains pointers to
WordReference objects. It is the responsibility of the caller to
free the list.
- virtual List *Words()
- Returns a list of all unique words contained in the inverted index. The
List returned contains pointers to String objects. It is the
responsibility of the caller to free the list. See List.h header for
usage.
- virtual List *WordRefs()
- Returns a list of all entries contained in the inverted index. The
List returned contains pointers to WordReference objects. It
is the responsibility of the caller to free the list. See List.h header
for usage.
- virtual WordCursor *Cursor(wordlist_walk_callback_t callback, Object
*callback_data)
- Create a cursor that searches all the occurrences in the inverted index
and calls callback with callback_data for every match.
- virtual WordCursor *Cursor(const WordKey &searchKey, int action =
HTDIG_WORDLIST_WALKER)
- Create a cursor that searches all the occurrences in the inverted index
that match searchKey. If action is set to
HTDIG_WORDLIST_WALKER, the cursor calls searchKey.callback with
searchKey.callback_data for every match. If action is set
to HTDIG_WORDLIST_COLLECT, each match is pushed into the searchKey.collectRes
data member as a WordReference object. It is the responsibility of
the caller to free the searchKey.collectRes list.
- virtual WordCursor *Cursor(const WordKey &searchKey,
wordlist_walk_callback_t callback, Object * callback_data)
- Create a cursor that searches all the occurrences in the inverted index
that match searchKey and calls callback with
callback_data for every match.
- virtual WordKey Key(const String& bufferin)
- Create a WordKey object and return it. The bufferin argument is
used to initialize the key, as in the WordKey::Set method. The first
component of bufferin must be a word that is translated to the
corresponding numerical id using the WordDict::Serial method.
- virtual WordReference Word(const String& bufferin, int exists =
0)
- Create a WordReference object and return it. The bufferin argument
is used to initialize the structure, as in the WordReference::Set method.
The first component of bufferin must be a word that is translated
to the corresponding numerical id using the WordDict::Serial method. If
the exists argument is set to 1, the method WordDict::SerialExists
is used instead; that is, no serial is assigned to the word if it does not
already have one. Before translation the word is normalized using the
WordType::Normalize method. The word is saved using the
WordReference::SetWord method.
- virtual WordReference WordExists(const String& bufferin)
- Alias for Word(bufferin, 1).
- virtual void BatchStart()
- Accelerate bulk insertions in the inverted index. All insertions done with
the Override method are batched instead of updating the
inverted index immediately. No update of the inverted index file is done
before the BatchEnd method is called.
- virtual void BatchEnd()
- Terminate a bulk insertion started with a call to the BatchStart
method. When all insertions are done the AllRef method is called to
restore statistics.
- virtual int Noccurrence(const String& key, unsigned int&
noccurrence) const
- Return in noccurrence the number of occurrences of the string
contained in the GetWord() part of key. Returns OK on
success, NOTOK otherwise.
- virtual int Write(FILE* f)
- Write to the stream f an ASCII description of the index. Each
line of the output contains a WordReference ASCII description. Return
OK on success, NOTOK otherwise.
- virtual int WriteDict(FILE* f)
- Write to the stream f the complete dictionary with
statistics. Return OK on success, NOTOK otherwise.
- virtual int Read(FILE* f)
- Read WordReference ASCII descriptions from f. Returns the
number of inserted WordReference objects, or < 0 if an error occurs.
Invalid descriptions and empty lines are ignored.
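The Prefix methods match any word that begins with the given string. A generic technique over sorted keys illustrates these semantics: all matching words are contiguous in sorted order, starting at the first word not less than the prefix. The sketch below uses a std::set as a stand-in for the index. PrefixMatches is hypothetical and is not how mifluz implements Prefix.

```cpp
#include <set>
#include <string>
#include <vector>

// Generic sketch of prefix matching over sorted keys (hypothetical, not
// mifluz code): returns every word in `words` that begins with `prefix`.
std::vector<std::string> PrefixMatches(const std::set<std::string>& words,
                                       const std::string& prefix) {
    std::vector<std::string> result;
    // Matching words form a contiguous range starting at the first word
    // that is not less than the prefix.
    for (auto it = words.lower_bound(prefix); it != words.end(); ++it) {
        // Stop at the first word whose leading characters differ.
        if (it->compare(0, prefix.size(), prefix) != 0) break;
        result.push_back(*it);
    }
    return result;
}
```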
Loic Dachary loic@gnu.org
The Ht://Dig group http://dev.htdig.org/
htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1), mifluzload(1),
mifluzsearch(1), mifluzdict(1), WordContext(3), WordDict(3), WordListOne(3),
WordKey(3), WordKeyInfo(3), WordType(3), WordDBInfo(3), WordRecordInfo(3),
WordRecord(3), WordReference(3), WordCursor(3), WordCursorOne(3),
WordMonitor(3), Configuration(3), mifluz(3)