dtsrhanfile(special file)
dtsrhanfile — Describes the format and syntax of DtSearch
han files
Han files are the user-generated profile files for dtsrhan.
They identify fields in incoming text from which output fzk file fields can
be constructed. The data from han files are loaded into memory by dtsrhan at
initialization time. dtsrhan and han files have not been
internationalized; han files may only contain ASCII characters.
All identifiers must begin with a letter, and must be composed
entirely of alphanumerics and/or the underscore.
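The identifier rule can be expressed as a single pattern. The following is a minimal Python sketch, not code from dtsrhan; the function name is_valid_identifier is hypothetical, and restricting letters to ASCII is an assumption drawn from the statement above that han files may only contain ASCII characters.

```python
import re

# A letter first, then any mix of ASCII letters, digits, and underscores.
_IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if name satisfies the identifier rule stated above."""
    return _IDENTIFIER_RE.fullmatch(name) is not None
```

Under this reading, command1 and MMMmymonth are accepted, while 1line, _x, and my-line are rejected.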
Observe the following points when using
"strings":
- If an identifying string contains quotes, use a backslash to create each
quote. Example:
"this string \"contains\" quotes"
would find the string this string "contains" quotes.
- Because the backslash acts as an escape character, a double backslash is
needed to create a single backslash. Example:
"this string has a \\ backslash"
would find the string this string has a \ backslash.
- In fact, a backslash in any string causes the next character
to be included without exception. Thus, the string "this is \a test"
ends up being this is a test. The backslash is dropped, and the
next character is embedded in the string. Escaping is only needed in the two
cases described above, but can be used with any character.
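The backslash rule above amounts to "drop the backslash, keep the next character". The following is a minimal Python sketch of that rule, not the dtsrhan implementation; the function name unescape_han_string is hypothetical, and dropping a trailing lone backslash is an assumption, since that case is not described here.

```python
def unescape_han_string(body: str) -> str:
    """Resolve backslash escapes in the text between the enclosing quotes."""
    out = []
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            # Drop the backslash and keep the following character verbatim.
            # A trailing lone backslash has nothing to escape and is dropped
            # (an assumption; the text above does not cover that case).
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)
```

Applied to the body of a string written as this is \a test, the function returns this is a test.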
- # ... | blank line
- Han file comment. Any line beginning with a pound sign in the first
column, or any blank line, is discarded.
- line identifier =
physical_line_number
- Defines a line with a physical line number in the record.
physical_line_number must be a number.
- line identifier =
column_number, "string"
- Defines a line using a column number and a 'signature' string that
should appear at that column. column_number can be a number, or
* for 'any column'. "string" should be a string
that occurs on the line in question. It is possible to define complex
signatures using multiple clauses.
- field identifier =
line_identifier, "string", offset, length
- Defines a field based on a declared line, a string found on that
line, the offset from the first letter of the string, and the length of
field.
-
- line_identifier is an identifier declared with the line
directive (see above).
-
- "string" is a string for relative positioning, where a
field will follow a string that may not always occur in the same position
on a line. If it is known that the field will always be in the same
position, an empty string ("") may be used. "string" must be
enclosed in double quotes. offset must be a number, identifying the
offset from the first character in the string. It starts at position 1,
not 0, and may be negative.
-
- length represents the length of the field. It may be a number, or
it may be one of two special tokens:
- eow
- End of word. The field will begin at offset and continue until the
next white-space character.
- eoln
- End of line. The field will begin at offset and continue to the end
of the line.
-
- An identifier string beginning with 3 uppercase M's
("MMM...") will be considered an English month name string. At
run time, if the first 3 chars of the field's value equal the first three
chars of an English month name, the value string will be translated to a
two character string of digits in the range "01" to
"12". For example, if field MMMmymonth had an original
value of "April ", it will be translated to "04"
before use.
-
- In the case where a line identifier is associated with multiple
lines in a single document, the field value will be determined from the
last occurrence of the line within the record.
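The positioning rules for the field directive can be illustrated with a short Python sketch. This is not the dtsrhan implementation; the names extract_field and field_value are hypothetical. Several details are assumptions: offset 1 maps to the first character of the matched string, so a negative offset simply counts further left; an empty string matches at the start of the line; month names are compared case-insensitively; and values that do not start with a month name are passed through unchanged.

```python
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def extract_field(line, string, offset, length):
    """Return the field text, or None if it cannot be located on the line."""
    anchor = line.find(string)  # an empty string is found at index 0
    if anchor < 0:
        return None
    start = anchor + offset - 1  # offsets are 1-based, not 0-based
    if start < 0 or start > len(line):
        return None
    rest = line[start:]
    if length == "eoln":
        return rest  # everything up to the end of the line
    if length == "eow":
        end = 0
        # Stop at the next white-space character.
        while end < len(rest) and not rest[end].isspace():
            end += 1
        return rest[:end]
    return rest[:length]  # numeric length


def field_value(identifier, raw):
    """Apply the "MMM" month translation when the identifier asks for it."""
    if identifier.startswith("MMM"):
        prefix = raw[:3].lower()
        if prefix in MONTHS:
            return "%02d" % (MONTHS.index(prefix) + 1)
    return raw
```

For example, extract_field("  ls    list directory", "", 3, "eow") yields "ls", and field_value("MMMmymonth", "April ") yields "04".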
- constant
identifier = "string"
- Defines a constant field that can be used in abstracts and keys.
The identifier is defined exactly the same as a field
identifier. The value must be enclosed in double quotes.
- date = null | field_id
[+ field_id] ...
- Defines the document date for each document. It will be converted into a
correctly formatted fzk file date line.
-
- null specifies undated documents. Undated documents always qualify
for searches irrespective of date qualifiers in DtSearchQuery.
-
- field_id is an identifier declared using the field or
constant directives (see above). "MMM" fields are often
useful for date assemblies.
-
- Multiple fields may be concatenated into a date.
-
- After concatenation, the assembled date must be of the following format:
YYYYMMDDhhmm (exactly 12 digits). For example, 199404171701
is April 17, 1994 at 5:01 pm. 200405031000 is May 3, 2004, at 10:00
am (10 o'clock).
-
- Dates before 1900 or after 5995 are invalid.
-
- If date is not specified or is invalid, a generated date based on
the current date and time will be used, but an invalid date will
also generate an error message.
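The format rules for an assembled date can be checked mechanically. The following Python sketch is illustrative only and is not taken from dtsrhan; the name assemble_date is hypothetical. It checks only what is stated above (exactly 12 digits, year from 1900 through 5995, treating 5995 itself as valid, which is an assumption). It does not validate month, day, hour, or minute ranges, because those limits are not stated here.

```python
def assemble_date(parts):
    """Concatenate field values; return the date string, or None if invalid."""
    date = "".join(parts)
    # Must be exactly 12 ASCII digits: YYYYMMDDhhmm.
    if len(date) != 12 or not (date.isascii() and date.isdigit()):
        return None
    year = int(date[:4])
    if year < 1900 or year > 5995:
        return None
    return date
```

With this sketch, assemble_date(["1994", "04", "17", "1701"]) returns "199404171701", while a 1899 date or an 8-digit date returns None.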
- key = field_id [+
field_id] ... | time | count
- Defines the unique database key for each record in an fzk file.
-
- field_id is a field identifier declared using the field or
constant directives.
-
- Multiple fields may be concatenated into a key.
-
- time is a special keyword used to generate keys based on the
current run date and time, plus a sequential count suffix.
-
- count is a special keyword used to generate keys based on a
sequential count of records.
- upper
- Specifies that keys written by dtsrhan are to be entirely converted to
upper case. Without using this directive, mixed-case keys are
allowed.
- keychar = A | B |
...Z
- Defines the character used to categorize keys for DtSearch. It must be an
uppercase ASCII alphabetic character.
- delimiter =
line_identifier, bottom
- Defines the end of text (ETX) delimiter that will separate records.
-
- line_identifier is an identifier declared with the line
directive.
-
- bottom is required. It specifies that the ETX will occur at the
bottom of each record. Top of record delimiters are not supported.
- image = all | none
- Defines whether the document image retrieved by DtSearchRetrieve is
to contain all or none of the record, prior to application of
imageinclude or imageexclude directives later in the han
file. It defaults to all.
- imageinclude =
line_identifier [- line_identifier]
- Defines a line (or range of lines) to be included in the image.
line_identifier is an identifier declared with the line
directive.
- imageexclude =
line_identifier [- line_identifier]
- Defines a line (or range of lines) to be excluded from the image.
line_identifier is an identifier declared with the line
directive.
- abstract = field(s)
field_identifier [+ field_identifier]...
- Defines the abstract to be placed into the fzk file. It is created from
the concatenations of fields. field_identifier is an identifier
declared with the field directive.
- delblanklines
= true | false
- Determines if blank lines are to be removed from the record image or not.
It defaults to false.
The sample han file shown here describes a text file containing a
concatenated set of man pages documents.
# All records in the incoming text file are delimited by the same
# end of text convention as the default for an fzk file, namely
# a formfeed (control-L) on a line by itself ("\f\n").
# Define a line named "etx" with that description,
# and declare it to be the delimiter.
# Note that there must be a real ASCII control-L character between
# the quotes in the line below.
line etx = *,"^L"
delimiter = etx, bottom
# The command name that the man page is describing is on the first line.
# To access it we need to define a line directive for line number 1.
line line1 = 1
# The name of the man page command begins in column 3 of line 1,
# and the length is variable. So we define a field identifier
# named "command1" from column 3 to the end of the word.
field command1 = line1,"",3,eow
# We want each document abstract to have a constant prefix
# followed by the name of the command.
constant preabs = "Man Pages for "
abstract = fields preabs + command1
# We want all keys to be the name of the command, prefixed with
# the same identifying character, an uppercase M.
keychar = M
key = command1
# We want each document date to be equivalent to the release
# date of the original man pages, which we choose here to hard code
# as November 1, 1994, at 1 o'clock in the afternoon.
constant datecons = "199411011300"
date = datecons
dtsrhan(1),
dtsrindex(1), dtsrfzkfiles(4),
dtsrlangfiles(4), DtSearch(5)