Manual Reference Pages  -  ESTWAVER (1)

NAME

estwaver - command line interface of web crawler

CONTENTS

Synopsis
Description
See Also

SYNOPSIS

estwaver init [-apn|-acc] [-xs|-xl|-xh] [-sv|-si|-sa] rootdir

estwaver crawl [-restart|-revisit|-revcont] rootdir

estwaver unittest rootdir

estwaver fetch [-proxy host port] [-tout num] [-il lang] url

DESCRIPTION

estwaver is a collection of sub commands. The first argument names the sub command, and the remaining arguments are parsed according to that sub command. The rootdir argument specifies the crawler root directory, which contains the configuration file and related files.
estwaver init [-apn|-acc] [-xs|-xl|-xh] [-sv|-si|-sa] rootdir
  Create the crawler root directory.
  If -apn is specified, N-gram analysis is also performed on European text.
  If -acc is specified, character category analysis is performed instead of N-gram analysis.
  If -xs is specified, the index is tuned to register fewer than 50000 documents.
  If -xl is specified, the index is tuned to register more than 300000 documents.
  If -xh is specified, the index is tuned to register more than 1000000 documents.
  If -sv is specified, scores are stored as void.
  If -si is specified, scores are stored as 32-bit integers.
  If -sa is specified, scores are stored as-is and marked so that they are not tuned when searching.
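The index-size options above can be chosen from an expected document count. The following is a minimal sketch, not part of estwaver: tuning_flag and init_command are hypothetical helpers, ./crawl is a placeholder root directory, and the sketch assumes that passing no size option is appropriate for counts between the stated thresholds. It only prints the command line.

```shell
#!/bin/sh
# tuning_flag (hypothetical helper): print the index-size option that
# matches an expected document count, using the thresholds listed above.
tuning_flag() {
    count=$1
    if [ "$count" -gt 1000000 ]; then
        echo "-xh"
    elif [ "$count" -gt 300000 ]; then
        echo "-xl"
    elif [ "$count" -lt 50000 ]; then
        echo "-xs"
    else
        echo ""    # assumption: no size option between the thresholds
    fi
}

# init_command (hypothetical helper): print the init command line for a
# root directory and an expected document count.
init_command() {
    rootdir=$1
    flag=$(tuning_flag "$2")
    # ${flag:+$flag } expands to "flag " only when a flag was chosen.
    echo "estwaver init ${flag:+$flag }$rootdir"
}

init_command ./crawl 20000
init_command ./crawl 500000
```

To run the command instead of printing it, call estwaver directly with the chosen flag.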
estwaver crawl [-restart|-revisit|-revcont] rootdir
  Start crawling.
  If -restart is specified, crawling is restarted from the seed documents.
  If -revisit is specified, collected documents are revisited.
  If -revcont is specified, collected documents are revisited and then crawling is continued.
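As an illustration of how the three crawl options are mutually exclusive, the sketch below maps a descriptive mode name to one option. The helper crawl_command and the mode names (default, restart, revisit, revisit-continue) are hypothetical labels chosen for this example, and ./crawl is a placeholder root directory. The helper prints the command line rather than running it.

```shell
#!/bin/sh
# crawl_command (hypothetical helper): print the crawl command line for a
# mode name. At most one of -restart, -revisit, -revcont is ever emitted.
crawl_command() {
    mode=$1
    rootdir=$2
    case "$mode" in
        default)          echo "estwaver crawl $rootdir" ;;
        restart)          echo "estwaver crawl -restart $rootdir" ;;
        revisit)          echo "estwaver crawl -revisit $rootdir" ;;
        revisit-continue) echo "estwaver crawl -revcont $rootdir" ;;
        *)                echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

crawl_command revisit ./crawl
```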
estwaver unittest rootdir
  Perform unit tests.
estwaver fetch [-proxy host port] [-tout num] [-il lang] url
  Fetch a document.
  url specifies the URL of the document.
  -proxy specifies the host name and the port number of the proxy server.
  -tout specifies the timeout in seconds.
  -il specifies the preferred language. The default is English.
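The fetch options can be assembled conditionally, as in the sketch below. fetch_doc is a hypothetical wrapper, not part of estwaver, and the ESTWAVER variable is an assumption of this example that lets the command be replaced with echo for a dry run. The URL, timeout, and language code are placeholders.

```shell
#!/bin/sh
# fetch_doc (hypothetical wrapper) around "estwaver fetch".
# Usage: fetch_doc url [timeout_seconds] [language] [proxy_host proxy_port]
# Empty strings may be passed to skip an optional argument.
fetch_doc() {
    url=$1
    tout=${2:-}
    lang=${3:-}
    phost=${4:-}
    pport=${5:-}
    # Rebuild the positional parameters as the estwaver argument list.
    set -- fetch
    if [ -n "$phost" ] && [ -n "$pport" ]; then
        set -- "$@" -proxy "$phost" "$pport"
    fi
    if [ -n "$tout" ]; then
        set -- "$@" -tout "$tout"
    fi
    if [ -n "$lang" ]; then
        set -- "$@" -il "$lang"
    fi
    # Options come before the URL, as in the synopsis.
    "${ESTWAVER:-estwaver}" "$@" "$url"
}

# Dry run: print the argument list instead of calling estwaver.
ESTWAVER=echo
fetch_doc "http://example.com/" 10 ja
```

Unsetting ESTWAVER makes the wrapper call the real estwaver binary.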
All sub commands return 0 if the operation succeeds and 1 otherwise. A running crawler closes its database and exits when it catches signal 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), or 15 (SIGTERM).
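Because the crawler closes its database on SIGTERM, a long crawl can be bounded by a time limit without resorting to SIGKILL. The sketch below shows one way to do that. run_with_limit is a hypothetical helper, and the one-hour limit and ./crawl are placeholders. The exit status estwaver reports after being signalled is not stated here, so the script only prints whatever status it receives.

```shell
#!/bin/sh
# run_with_limit (hypothetical helper): start a command in the background
# and send it SIGTERM (signal 15) if it is still running after LIMIT seconds.
# Usage: run_with_limit LIMIT command [args...]
run_with_limit() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null && [ "$elapsed" -lt "$limit" ]; do
        sleep 1
        elapsed=$((elapsed + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -TERM "$pid" 2>/dev/null
    fi
    # Return the exit status of the command itself.
    wait "$pid"
}

if command -v estwaver >/dev/null 2>&1; then
    status=0
    run_with_limit 3600 estwaver crawl ./crawl || status=$?
    echo "estwaver exited with status $status"
else
    echo "estwaver is not installed; nothing to run"
fi
```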

When crawling finishes, the crawler root directory contains a directory named _index. It is an index that can be used by estcmd and other tools.
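A script can check for the index before searching it. The sketch below assumes the search sub command of estcmd, called with the index directory followed by a phrase; consult estcmd(1) for the exact options. index_path is a hypothetical helper, and ./crawl and the search phrase are placeholders.

```shell
#!/bin/sh
# index_path (hypothetical helper): print the index location inside a
# crawler root directory.
index_path() {
    # Strip one trailing slash so the result has a single separator.
    echo "${1%/}/_index"
}

rootdir=./crawl
index=$(index_path "$rootdir")

if [ -d "$index" ] && command -v estcmd >/dev/null 2>&1; then
    # Assumption: "estcmd search" takes the index directory, then a phrase.
    estcmd search "$index" "web crawler"
else
    echo "no index at $index yet, or estcmd is not installed"
fi
```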

SEE ALSO

estconfig(1), estcmd(1), estmaster(1), estcall(1), estraier(3), estnode(3)

Man Page ESTWAVER (3) 2007-03-06
