af - Amberfish text retrieval software
af [-i] [options] [file ...]
af [-s] [options]
af [-L] [options]
af [-l] [options]
af [--fetch] [file] [begin] [end]
af [--version]
The program af is a text-based interface to Amberfish
functions for indexing and searching documents. A simple indexing example
would look something like:
af -iCv -d mydb *.txt
This creates a new database, mydb, containing an index to the set
of files, *.txt.
Then to search:
af -s -d mydb -q 'Beethoven piano sonatas'
Or for a Boolean query:
af -s -d mydb -Q '(Robert or Clara) and Schumann'
Only one of the following command options can be used at a time.
- -i, --index
- Index documents, given either as file ... on the command line or, if
-F is used, as a list read from standard input.
- -s, --search
- Search an indexed database.
- -l, --list
- List the documents contained in a database.
- --fetch
- Output a portion of a file. This command takes no other options. The file
name file, starting offset begin, and ending offset
end are specified at the end of the line.
- --version
- Print the af version number.
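As an illustration of the --fetch argument order described above, here is a minimal sketch of a shell wrapper. The function name fetch_span is hypothetical and not part of Amberfish; it only checks the argument count and passes file, begin, and end through unchanged.

```shell
# Hypothetical helper (not part of Amberfish): output a portion of a file.
# Usage: fetch_span FILE BEGIN END
fetch_span() {
    if [ "$#" -ne 3 ]; then
        echo "usage: fetch_span FILE BEGIN END" >&2
        return 2
    fi
    # --fetch takes no other options; file, begin, and end follow it
    # at the end of the line, in that order.
    af --fetch "$1" "$2" "$3"
}
```

For example, fetch_span notes.txt 0 120 would run af --fetch notes.txt 0 120, where notes.txt and the two offsets are placeholder values.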
The following options are generally available with all command
options.
- -d, --db dbname
- Use dbname as the database name. With some command options such as
-s, this option can be supplied multiple times to specify multiple
databases.
- -v, --verbose
- Show verbose output. This option can be supplied multiple times to
increase verbosity.
- -D, --debug
- Show extremely verbose (debugging) output. Using this option once is
equivalent to -vvvvv, and it can be supplied multiple times to
increase verbosity further.
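Because -d can be supplied multiple times with -s, one query can be run against several databases in a single call. The sketch below shows one way to build the repeated -d options in POSIX sh. The function name search_all is hypothetical, and the database names in the usage note are placeholders.

```shell
# Hypothetical helper (not part of Amberfish): run one free-text query
# against several databases by repeating -d.
# Usage: search_all QUERY DB [DB ...]
search_all() {
    query=$1
    shift
    # Turn the list "DB1 DB2 ..." into "-d DB1 -d DB2 ..." in place:
    # each pass appends one "-d DB" pair and drops the oldest bare name.
    for db in "$@"; do
        set -- "$@" -d "$db"
        shift
    done
    af -s "$@" -q "$query"
}
```

For example, search_all 'Beethoven piano sonatas' db1 db2 would run af -s -d db1 -d db2 -q 'Beethoven piano sonatas'.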
The following options can only be used together with the indexing
(-i) command.
- -C, --create
- Create a new database, overwriting any existing one with the same name.
- -m, --memory maximum
- Set the maximum amount of memory in megabytes to use for indexing. More
memory speeds up indexing.
- --phrase
- Enable phrase searching. This can only be used together with -C.
- --split delimiter
- Parse input files into multiple documents at points where the specified
delimiter string is found.
- -t, --doctype=text, --doctype=xml, --doctype=erc
- Set the document type. The default is text. Specifying xml
enables functions related to searching and retrieving within nested tags
in XML documents. The erc doctype is for kernel metadata in
Electronic Resource Citation (ERC) format.
- --dlevel level
- The maximum resolution (levels of descent) for retrieval of nested
documents. The default value is 1; increasing it lengthens indexing time
significantly. Use this for XML instead of --split to subdivide
documents. Note that this affects only the resolution of elements returned
from searches; it is unrelated to nested queries, which have a much higher
(fixed) resolution.
- --no-stem
- Do not perform stemming. This can only be used together with -C.
Normally, stemming is automatically enabled if Amberfish was compiled with
the stemming function. This option disables stemming even if it is
available. Note that the stemming function is not distributed with this
package and must be installed manually.
- -F
- Read list of documents to be indexed from standard input, rather than from
the end of the command line.
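The indexing options above can be combined in small wrappers such as the following sketch. Both function names are hypothetical. The first assumes that -F expects one file name per line on standard input, which this page does not state explicitly. Note that both use -C and therefore overwrite any existing database with the same name.

```shell
# Hypothetical helpers (not part of Amberfish).

# Build a new database from every *.txt file under a directory.
# Assumption: -F reads one file name per line from standard input.
# Usage: index_tree DBNAME DIR
index_tree() {
    find "$2" -type f -name '*.txt' | af -i -C -F -d "$1"
}

# Build a new database from XML files, with the given retrieval depth
# for nested documents (--dlevel).
# Usage: index_xml DBNAME LEVEL FILE [FILE ...]
index_xml() {
    db=$1
    level=$2
    shift 2
    af -i -C -d "$db" --doctype=xml --dlevel "$level" "$@"
}
```

For example, index_tree mydb ./docs feeds the file list produced by find to af, and index_xml xmldb 2 a.xml b.xml indexes two XML files with --dlevel 2. The database and file names are placeholders.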
The following options can only be used together with the searching
(-s) command.
- -q query_string
- Search for the specified free text query string.
- -Q, --query-boolean query_string
- Search for the specified Boolean query string.
- -n, --numhits x
- Output a maximum of x results.
- --skiphits x
- Do not output the first x results.
- --totalhits
- Output the total number of results.
- --style=list, --style=lineage, --style=trec
- Set style of printed result sets. The default is list. Use the
lineage style with XML to see hierarchical results. For the
trec style, it is assumed that the indexed file names are the
document numbers and that --skiphits is not used (because rank
always starts at 1).
- --trec-tag run_tag
- Output TREC results with the specified run tag. (This is to be used with
--style=trec.)
- --trec-topic topic_number
- Output TREC results with the specified topic number. (This is to be used
with --style=trec.)
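The -n and --skiphits options together allow paging through results, and the TREC options combine into a single call. The following is a minimal sketch. The function names are hypothetical, and the default page size of 10 is an arbitrary choice for illustration, not an af default.

```shell
# Hypothetical helpers (not part of Amberfish).

# Show one page of results for a free-text query.
# Usage: search_page DBNAME QUERY PAGE [PER_PAGE]
search_page() {
    per_page=${4:-10}                    # page size; 10 is an arbitrary default
    skip=$(( ($3 - 1) * per_page ))      # results to skip before this page
    af -s -d "$1" -q "$2" -n "$per_page" --skiphits "$skip"
}

# Print results in TREC style. --skiphits is deliberately not used here,
# because the trec style assumes that rank always starts at 1.
# Usage: search_trec DBNAME QUERY TOPIC_NUMBER RUN_TAG
search_trec() {
    af -s -d "$1" -q "$2" --style=trec --trec-topic "$3" --trec-tag "$4"
}
```

For example, search_page mydb 'Beethoven piano sonatas' 3 requests results 21 through 30 by passing -n 10 --skiphits 20.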
The following options can only be used together with the linearize
(-L) command.
- -m, --memory maximum
- Set the maximum amount of memory in megabytes to use for linearizing. More
memory speeds up linearizing.
- --no-linear-buffer
- Do not use a memory buffer to speed up linearizing. This option will be
removed once the linearization buffer code proves to be reliable.
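A linearize call might be wrapped as in the sketch below. The function name is hypothetical, and the use of -d with -L is an assumption based on the statement that the general options are available with all command options.

```shell
# Hypothetical helper (not part of Amberfish): linearize a database
# with a given memory limit in megabytes.
# Assumption: -d selects the database for -L as it does for other commands.
# Usage: linearize_db DBNAME MEGABYTES
linearize_db() {
    af -L -d "$1" -m "$2"
}
```

For example, linearize_db mydb 256 would run af -L -d mydb -m 256.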
Nassib Nassar; see http://www.etymon.com/ for updates.
Copyright (C) 1999-2004 Etymon Systems, Inc.