NAME
dtsrindex - Load

SYNOPSIS
dtsrindex -ddbname [-tetxstr] [-h0 | -hhashsz] [-rrecdots] [-bbatchsz] [-ccachesz] [-iinbufsz] file

DESCRIPTION
dtsrindex is the second of a pair of programs that load a database with document data from an input fzk file. dtsrload loads document header information and, optionally, the documents themselves. dtsrindex parses words from document text and loads them into the inverted index files. Word parsing is performed in the language and linguistic codeset specified for the database. The inverted index contains the search terms used for subsequent online queries.

An fzk file can be generated by dtsrhan, manually with a text editor, or by a special application program created for the purpose. Typically the same fzk file is used for dtsrload and dtsrindex. However, this is not required, and there are situations where it may not be desirable. If the same fzk file is not used by both programs, the one used for dtsrindex must represent the same objects in the same order. Only the unique key line and the text portions of the file are used by this program. (See dtsrfzkfiles(4) for information about DtSearch fzk files.)

A document's unique key in the fzk file must already exist in the database (that is, dtsrload must be executed before dtsrindex). If any words are already indexed for the unique document key, indicating that dtsrload "updated" the document, then the newly parsed words from the current fzk file completely replace the previously indexed words.

When duplicate record ids are encountered in a single fzk file, only the first occurrence of the document is indexed into the database; the second one is discarded. Since this is exactly the same discard order as dtsrload, the same fzk file can be used for both programs. Duplicate record ids are tracked during execution with a hash table.

dtsrindex performs two passes. In the first pass, dtsrindex constructs an in-memory inverted index of all the words it parses from the fzk file.
Since the index is built in memory, it is possible to run out of memory for very large fzk files. For this reason, very large fzk files are processed in batches. Execution time in the first pass depends on the size of the fzk file.

In the second pass, dtsrindex merges the information in the memory index into the database's disk inverted index. Execution time in the second pass depends on both the size of the incoming fzk file and the overall size of the database.

If dtsrindex is interrupted in the first pass, it can be reexecuted without database damage. However, if it is interrupted in the second pass, the database will be corrupted. Database backups are always recommended.

Caution:
To prevent database corruption, execute dtsrindex only after all users of a preexisting database have exited their search programs. For a single fzk file, dtsrload must be executed immediately before dtsrindex so that dtsrindex can map the words it indexes to the correct internal database addresses. Only after both programs successfully complete execution may users again be allowed to perform online searches of the database.

OPTIONS
The following options are available:

Note:
If an option takes a value, the value must be directly appended to the option name without white space.
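
The note above can be illustrated with a small shell sketch. The helper names `opt` and `dtsrindex_cmdline` are hypothetical and exist only for this illustration; the database name, end-of-text string, and file name are the placeholder values used in EXAMPLES. The sketch only prints a command line and does not run dtsrindex.

```shell
# Hypothetical helper: join an option letter and its value with no
# white space in between, for example "-d" + "mydb" gives "-dmydb".
opt() {
    printf '%s%s\n' "$1" "$2"
}

# Hypothetical helper: print (not execute) a dtsrindex command line
# built from a database name, an end-of-text string, and an fzk file.
dtsrindex_cmdline() {
    db=$1
    etx=$2
    file=$3
    echo "dtsrindex $(opt -d "$db") $(opt -t "$etx") $file"
}

dtsrindex_cmdline mydb +++ batch1
```

Writing the value as a separate word (for example `-d mydb`) would not follow the rule stated in the note.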

OPERANDS
The required input file name (file) identifies the file to be processed by dtsrindex. It can optionally include a path prefix, either from root or relative to the current working directory. If a file name extension is not specified, dtsrindex assumes a default extension of .fzk.

ENVIRONMENT VARIABLES
None.

RESOURCES
None.

ACTIONS/MESSAGES
None.

RETURN VALUES
The return values are as follows:

FILES
dtsrindex reads the specified fzk file and opens all the database and related language files for the specified database name. dtsrindex updates the following database files:

EXAMPLES
Index all words in the fzk file named batch1.fzk in the current working directory into database mydb.

dtsrindex -dmydb batch1

Load database mydb with the documents specified in the fzk file /u/dtsearch/jpndocs.1. Three ASCII plus signs at the bottom of each document signal the end of document text and the beginning of the next fzk file record.

dtsrindex -dmydb -t+++ /u/dtsearch/jpndocs.1
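
Because dtsrload must run immediately before dtsrindex on the same fzk file, the two steps are often wrapped in one script. The following is a minimal sketch, not a supplied tool. It assumes that dtsrload accepts the database name as `-ddbname` in the same way as dtsrindex, and that any nonzero exit status means failure (check RETURN VALUES for both programs before relying on this). The names `run`, `load_and_index`, and `DRY_RUN` are hypothetical.

```shell
#!/bin/sh
# Sketch: run dtsrload, then dtsrindex, on one fzk file.
# Assumption: dtsrload takes -d<dbname> like dtsrindex does.
# Assumption: a nonzero exit status from either program is a failure.

# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

load_and_index() {
    db=$1
    fzk=$2

    # dtsrload goes first so that every unique key exists in the database.
    if ! run dtsrload "-d$db" "$fzk"; then
        echo "dtsrload failed; dtsrindex not started" >&2
        return 1
    fi

    # Only then parse and index the words of the same documents.
    if ! run dtsrindex "-d$db" "$fzk"; then
        echo "dtsrindex failed; check the database before reopening it" >&2
        return 2
    fi
}

# Demonstration only: print the two commands without running them.
DRY_RUN=1
load_and_index mydb batch1
```

With `DRY_RUN=1` the sketch prints `dtsrload -dmydb batch1` followed by `dtsrindex -dmydb batch1`. Stopping after a failed dtsrload keeps dtsrindex from running against keys that are not yet in the database. Making sure that no users are searching the database, and taking a backup beforehand, remain the operator's responsibility.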

SEE ALSO
dtsrload(1), dtsrhan(1), dtsrfzkfiles(4), DtSearch(5)