Search::Estraier(3) User Contributed Perl Documentation Search::Estraier(3)

Search::Estraier - pure perl module to use Hyper Estraier search engine

        use Search::Estraier;

        # create and configure node
        my $node = new Search::Estraier::Node(
                url => 'http://localhost:1978/node/test',
                user => 'admin',
                passwd => 'admin',
                create => 1,
                label => 'Label for node',
                croak_on_error => 1,
        );

        # create document
        my $doc = new Search::Estraier::Document;

        # add attributes
        $doc->add_attr('@uri', "http://estraier.gov/example.txt");
        $doc->add_attr('@title', "Over the Rainbow");

        # add body text to document
        $doc->add_text("Somewhere over the rainbow.  Way up high.");
        $doc->add_text("There's a land that I heard of once in a lullaby.");

        die "error: ", $node->status,"\n" unless (eval { $node->put_doc($doc) });

        use Search::Estraier;

        # create and configure node
        my $node = new Search::Estraier::Node(
                url => 'http://localhost:1978/node/test',
                user => 'admin',
                passwd => 'admin',
                croak_on_error => 1,
        );

        # create condition
        my $cond = new Search::Estraier::Condition;

        # set search phrase
        $cond->set_phrase("rainbow AND lullaby");

        my $nres = $node->search($cond, 0);

        if (defined($nres)) {
                print "Got ", $nres->hits, " results\n";

                # for each document in results
                for my $i ( 0 .. $nres->doc_num - 1 ) {
                        # get result document
                        my $rdoc = $nres->get_doc($i);
                        # display attributes
                        print "URI: ", $rdoc->attr('@uri'),"\n";
                        print "Title: ", $rdoc->attr('@title'),"\n";
                        print $rdoc->snippet,"\n";
                }
        } else {
                die "error: ", $node->status,"\n";
        }

This module is an implementation of the node API of Hyper Estraier. Since it is a pure Perl module whose only dependencies are standard Perl modules, it will run on every platform on which Perl runs. It doesn't require compilation or Hyper Estraier development files on the target machine.

It is implemented as multiple packages which closely resemble the Ruby implementation. It also includes methods to manage nodes.

There are a few examples in the "scripts" directory of this distribution.

These methods should really move somewhere else.

Remove multiple whitespace characters from a string, as well as whitespace at the beginning or end.

 my $text = $self->_s(" this  is a text  ");
 # $text is now 'this is a text'
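As a rough illustration of what "_s" does, here is a standalone sketch. It is not the module's own code: the free function "_s" below is hypothetical and only reproduces the behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the _s helper described above.
sub _s {
    my $text = shift;
    return $text unless defined $text;
    $text =~ s/\s+/ /g;    # collapse every run of whitespace into one space
    $text =~ s/^ //;       # drop the leading space left over, if any
    $text =~ s/ $//;       # drop the trailing space left over, if any
    return $text;
}

print _s(" this  is a text  "), "\n";    # prints "this is a text"
```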

This class implements Document, which is a single item in Hyper Estraier.

It is a collection of:

attributes
'key' => 'value' pairs which can later be used for filtering of results

You can add common filters to "attrindex" in estmaster's "_conf" file for better performance. See "attrindex" in Hyper Estraier P2P Guide <http://hyperestraier.sourceforge.net/nguide-en.html>.

vectors
also 'key' => 'value' pairs
display text
Text which will be used to create the searchable corpus of your index and will be included in snippet output.
hidden text
Text which will be searchable, but will not be included in snippets.

Create a new document, either empty or from a draft.

  my $doc = new Search::Estraier::Document;
  my $doc2 = new Search::Estraier::Document( $draft );

Add an attribute.

  $doc->add_attr( name => 'value' );

Delete attribute using

  $doc->add_attr( name => undef );

Add a sentence of text.

  $doc->add_text('this is example text to display');

Add a hidden sentence.

  $doc->add_hidden_text('this is example text just for search');

Add vectors

  $doc->add_vector(
        'vector_name' => 42,
        'another' => 12345,
  );

Set the substitute score

  $doc->set_score(12345);

Get the substitute score

Get the ID number of the document. If the object has never been registered, "-1" is returned.

  print $doc->id;

Returns array with attribute names from document object.

  my @attrs = $doc->attr_names;

Returns value of an attribute.

  my $value = $doc->attr( 'attribute' );

Returns array with text sentences.

  my @texts = $doc->texts;

Return whole text as single scalar.

 my $text = $doc->cat_texts;

Dump draft data from document object.

  print $doc->dump_draft;
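The draft produced by "dump_draft" is a plain-text serialization of the document. The sketch below builds a similar string by hand. The helper "make_draft" is hypothetical, and the layout it assumes ("name=value" attribute lines, a blank separator line, one sentence per line, hidden sentences prefixed with a tab) is our understanding of Hyper Estraier's draft format rather than something stated here; vectors and the substitute score are left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Estraier) that assembles a
# draft-like string. Assumed layout: "name=value" attribute lines, a blank
# line, then one sentence per line; hidden sentences start with a tab.
# Vectors and the substitute score are deliberately left out.
sub make_draft {
    my (%args) = @_;
    my $attrs  = $args{attrs}  || {};
    my $texts  = $args{texts}  || [];
    my $hidden = $args{hidden} || [];

    my $draft = '';
    # sort attribute names so the output is deterministic
    foreach my $name ( sort keys %$attrs ) {
        $draft .= "$name=$attrs->{$name}\n";
    }
    $draft .= "\n";                        # separates attributes from text
    $draft .= "$_\n"   foreach @$texts;    # display sentences
    $draft .= "\t$_\n" foreach @$hidden;   # hidden sentences
    return $draft;
}

print make_draft(
    attrs  => { '@uri' => 'http://estraier.gov/example.txt', '@title' => 'Over the Rainbow' },
    texts  => [ 'Somewhere over the rainbow.  Way up high.' ],
    hidden => [ 'this is example text just for search' ],
);
```

Compare the result with the output of "dump_draft" on a real document before relying on the assumed layout.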

Empty document object

  $doc->delete;

This function is an addition to the original Ruby API; since it was included in the C wrappers, it's provided here as a convenience. Document objects which go out of scope will be destroyed automatically.

  my $cond = new Search::Estraier::Condition;

  $cond->set_phrase('search phrase');

  $cond->add_attr('@URI STRINC /~dpavlin/');

  $cond->set_order('@mdate NUMD');

  $cond->set_max(42);

  $cond->set_options( 'SURE' );

  $cond->set_options( qw/AGITO NOIDF SIMPLE/ );

Possible options are:

SURE
check every N-gram
USUAL
check every second N-gram
FAST
check every third N-gram
AGITO
check every fourth N-gram
NOIDF
don't perform TF-IDF tuning
SIMPLE
use simplified query phrase

Skipping N-grams will speed up search, but reduce accuracy. Every call to "set_options" will reset previous options.

This option changed in version 0.04 of this module. It's backwards compatible.

Return search phrase.

  print $cond->phrase;

Return search result order.

  print $cond->order;

Return search result attrs.

  my @cond_attrs = $cond->attrs;

Return maximum number of results.

  print $cond->max;

"-1" is returned for an uninitialized value, and 0 means unlimited.

Return options for this condition.

  print $cond->options;

Options are returned in numerical form.
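Because options are stored as a number, the named options are presumably combined as bit flags. The sketch below shows that idea; the helper names and the exact bit values (modelled on Hyper Estraier's ESTCOND* constants) are assumptions, so check the module source before relying on them.

```perl
use strict;
use warnings;

# Assumed option values, modelled on Hyper Estraier's ESTCOND* flags.
my %OPTION_BITS = (
    SURE   => 1 << 0,
    USUAL  => 1 << 1,
    FAST   => 1 << 2,
    AGITO  => 1 << 3,
    NOIDF  => 1 << 4,
    SIMPLE => 1 << 10,
);

# Hypothetical: combine option names into one number with bitwise OR.
sub options_to_number {
    my $n = 0;
    foreach my $name (@_) {
        die "unknown option $name" unless exists $OPTION_BITS{$name};
        $n |= $OPTION_BITS{$name};
    }
    return $n;
}

# Hypothetical: decode a number back into option names, lowest bit first.
sub number_to_options {
    my $n = shift;
    return grep { $n & $OPTION_BITS{$_} }
        sort { $OPTION_BITS{$a} <=> $OPTION_BITS{$b} } keys %OPTION_BITS;
}

my $n = options_to_number(qw/AGITO NOIDF SIMPLE/);
print "$n\n";                                    # 1048 (8 | 16 | 1024)
print join( ' ', number_to_options($n) ), "\n";  # AGITO NOIDF SIMPLE
```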

Set the number of documents to skip from the beginning of the results

  $cond->set_skip(42);

Similar to "offset" in RDBMS.

Return skip for this condition.

  print $cond->skip;
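Together, "set_skip" and "set_max" are enough to page through results. A minimal sketch, assuming 1-based page numbers; "page_window" is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a 1-based page number and page size
# into the values you would pass to set_skip and set_max.
sub page_window {
    my ( $page, $per_page ) = @_;
    die "page must be >= 1"     unless $page >= 1;
    die "per_page must be >= 1" unless $per_page >= 1;
    return ( ( $page - 1 ) * $per_page, $per_page );    # (skip, max)
}

my ( $skip, $max ) = page_window( 3, 10 );
print "skip=$skip max=$max\n";    # skip=20 max=10

# With a real condition object this would become:
#   $cond->set_skip($skip);
#   $cond->set_max($max);
```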

  $cond->set_distinct('@author');

Return distinct attribute

  print $cond->distinct;

Filter out some links when searching.

Takes an array of link numbers as its argument, starting with 0 (the current node).

  $cond->set_mask(qw/0 1 4/);
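The list of link numbers has to end up as a single value in the request. One plausible encoding, shown here only as an assumption, is a bitmask with bit N set for link N; "links_to_mask" is a hypothetical helper, not the module's "set_mask".

```perl
use strict;
use warnings;

# Assumption: link numbers are folded into one integer, bit N for link N.
sub links_to_mask {
    my $mask = 0;
    $mask |= 1 << $_ foreach @_;    # set the bit for each link number
    return $mask;
}

my $mask = links_to_mask(qw/0 1 4/);
printf "%d (binary %b)\n", $mask, $mask;    # 19 (binary 10011)
```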

  my $rdoc = new Search::Estraier::ResultDocument(
        uri => 'http://localhost/document/uri/42',
        attrs => {
                foo => 1,
                bar => 2,
        },
        snippet => 'this is a text of snippet',
        keywords => "this\tare\tkeywords",
  );

Return URI of result document

  print $rdoc->uri;

Returns array with attribute names from result document object.

  my @attrs = $rdoc->attr_names;

Returns value of an attribute.

  my $value = $rdoc->attr( 'attribute' );

Return snippet from result document

  print $rdoc->snippet;

Return keywords from result document

  print $rdoc->keywords;

  my $res = new Search::Estraier::NodeResult(
        docs => \@array_of_rdocs,
        hints => \%hash_with_hints,
  );

Return number of documents

  print $res->doc_num;

This will return the actual number of documents in the result (limited by "max"). If you want the total number of hits, see "hits".

Return single document

  my $doc = $res->get_doc( 42 );

Returns undef if document doesn't exist.

Return specific hint from results.

  print $res->hint( 'VERSION' );

Possible hints are: "VERSION", "NODE", "HIT", "HINT#n", "DOCNUM", "WORDNUM", "TIME", "LINK#n", "VIEW".

More perlish version of "hint". This one returns hash.

  my %hints = $res->hints;

Syntactic sugar for the total number of hits for this query

  print $res->hits;

It's the same as

  print $res->hint('HIT');

but shorter.
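The relationship between "hint", "hints" and "hits" can be shown with a tiny stand-alone class. "My::Hints" is hypothetical and is not the module's NodeResult; it only mirrors the behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical minimal class mirroring hint / hints / hits.
package My::Hints;

sub new {
    my ( $class, %hints ) = @_;
    return bless { hints => {%hints} }, $class;
}

sub hint  { my ( $self, $key ) = @_; return $self->{hints}->{$key}; }
sub hints { my $self = shift; return %{ $self->{hints} }; }
sub hits  { my $self = shift; return $self->hint('HIT'); }    # sugar for hint('HIT')

package main;

my $res = My::Hints->new( VERSION => '1.0', HIT => 42, DOCNUM => 1000 );
print $res->hits, "\n";                     # 42
my %h = $res->hints;
print join( ',', sort keys %h ), "\n";      # DOCNUM,HIT,VERSION
```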

  my $node = new Search::Estraier::Node;

or optionally with "url" as parameter

  my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );

or in more verbose form

  my $node = new Search::Estraier::Node(
        url => 'http://localhost:1978/node/test',
        user => 'admin',
        passwd => 'admin',
        create => 1,
        label => 'optional node label',
        debug => 1,
        croak_on_error => 1
  );

with following arguments:

url
URL to node
user
specify username for node server authentication
passwd
password for authentication
create
create node if it doesn't exist
label
optional label for new node if "create" is used
debug
dumps a lot of debugging output
croak_on_error
very helpful during development. It will croak on all errors instead of silently returning "-1" (which is the convention of the Hyper Estraier API in other languages).

Specify URL to node server

  $node->set_url('http://localhost:1978');

Specify proxy server to connect to node server

  $node->set_proxy('proxy.example.com', 8080);

Specify timeout of connection in seconds

  $node->set_timeout( 15 );

Specify name and password for authentication to node server.

  $node->set_auth('clint','eastwood');

Return status code of last request.

  print $node->status;

"-1" means connection failure.

Add a document

  $node->put_doc( $document_draft ) or die "can't add document";

Return true on success or false on failure.

Remove a document

  $node->out_doc( document_id ) or die "can't remove document";

Return true on success or false on failure.

Remove a registered document using its URI

  $node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";

Return true on success or false on failure.

Edit attributes of a document

  $node->edit_doc( $document_draft ) or die "can't edit document";

Return true on success or false on failure.

Retrieve document

  my $doc = $node->get_doc( document_id ) or die "can't get document";

Return a Search::Estraier::Document object on success or false on failure.

Retrieve document by URI

  my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";

Return a Search::Estraier::Document object on success or false on failure.

Retrieve the value of an attribute from object

  my $val = $node->get_doc_attr( document_id, 'attribute_name' ) or
        die "can't get document attribute";

Retrieve the value of an attribute from object specified by URI

  my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or
        die "can't get document attribute";

Extract document keywords

  my $keywords = $node->etch_doc( document_id ) or die "can't etch document";

Extract document keywords by URI

  my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";

Return true on success or false on failure.

Get ID of document specified by URI

  my $id = $node->uri_to_id( 'file:///document/uri/42' );

This method won't croak, even if using "croak_on_error".

Private function used to implement "get_doc", "get_doc_by_uri", "etch_doc" and "etch_doc_by_uri".

 # this will decode received draft into Search::Estraier::Document object
 my $doc = $node->_fetch_doc( id => 42 );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );

 # to extract keywords, add etch
 my $doc = $node->_fetch_doc( id => 42, etch => 1 );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );

 # to get document attribute add attr
 my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );

 # more general form which allows implementation of
 # uri_to_id
 my $id = $node->_fetch_doc(
        uri => 'file:///document/uri/42',
        path => '/uri_to_id',
        chomp_resbody => 1
 );

  my $node_name = $node->name;

  my $node_label = $node->label;

  my $documents_in_node = $node->doc_num;

  my $words_in_node = $node->word_num;

  my $node_size = $node->size;

Search for documents which match a condition

  my $nres = $node->search( $cond, $depth );

$cond is a "Search::Estraier::Condition" object, while $depth specifies the depth for meta search.

Returns a "Search::Estraier::NodeResult" object.

Return a URI-encoded query string generated from a Search::Estraier::Condition object

  my $args = $node->cond_to_query( $cond, $depth );
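To give an idea of what such a query string looks like, the sketch below percent-encodes a few condition fields by hand. The parameter names ("phrase", "attr1", "order", "max", "skip", "depth") are assumptions based on the Hyper Estraier P2P protocol guide, and "build_query" and "uri_escape_simple" are hypothetical helpers; real code would more likely use a module such as URI::Escape.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes byte strings; encode wide characters to UTF-8 first.
sub uri_escape_simple {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical: the parameter names below are assumptions, not taken
# from this module's source.
sub build_query {
    my (%c) = @_;
    my @pairs;
    push @pairs, [ phrase => $c{phrase} ] if defined $c{phrase};
    my $i = 1;
    foreach my $attr ( @{ $c{attrs} || [] } ) {
        push @pairs, [ 'attr' . $i++, $attr ];    # attr1, attr2, ...
    }
    foreach my $key (qw/order max skip depth/) {
        push @pairs, [ $key => $c{$key} ] if defined $c{$key};
    }
    return join '&',
        map { uri_escape_simple( $_->[0] ) . '=' . uri_escape_simple( $_->[1] ) } @pairs;
}

print build_query( phrase => 'rainbow AND lullaby', max => 10, depth => 0 ), "\n";
# phrase=rainbow%20AND%20lullaby&max=10&depth=0
```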

This method uses "LWP::UserAgent" to communicate with the Hyper Estraier node master.

  my $rv = $node->shuttle_url( $url, $content_type, $req_body, \$resbody );

$resheads and $resbody booleans control whether response headers and/or the response body will be saved within the object.

Set width of snippets in results

  $node->set_snippet_width( $wwidth, $hwidth, $awidth );

$wwidth specifies the whole width of the snippet. It's 480 by default. If it's 0, no snippet is sent with the results. If it is negative, the whole document text is sent instead of a snippet.

$hwidth specifies the width of strings from the beginning of the string. The default value is 96. A negative or zero value keeps the previous value.

$awidth specifies the width of strings around each highlighted word. It's 96 by default. If a negative or zero value is provided, the previous value is kept unchanged.
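The update rules for the three widths can be summarized as a small pure function. "apply_snippet_width" is hypothetical and only models the rules described above; it does not talk to a node.

```perl
use strict;
use warnings;

# Hypothetical pure function modelling the rules above. $prev holds the
# current widths; a new hash reference with the updated widths is returned.
sub apply_snippet_width {
    my ( $prev, $wwidth, $hwidth, $awidth ) = @_;
    my %new = %$prev;
    $new{wwidth} = $wwidth;                   # always taken as given (0 or negative are meaningful)
    $new{hwidth} = $hwidth if $hwidth > 0;    # <= 0 keeps the previous value
    $new{awidth} = $awidth if $awidth > 0;    # <= 0 keeps the previous value
    return \%new;
}

my $defaults = { wwidth => 480, hwidth => 96, awidth => 96 };
my $w = apply_snippet_width( $defaults, 200, 0, 32 );
print join( ' ', map { $_ . '=' . $w->{$_} } qw/wwidth hwidth awidth/ ), "\n";
# wwidth=200 hwidth=96 awidth=32
```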

Manage users of node

  $node->set_user( 'name', $mode );

$mode can be one of:

0
delete account
1
set administrative rights for user
2
set user account as guest

Return true on success, otherwise false.

Manage node links

  $node->set_link('http://localhost:1978/node/another', 'another node label', $credit);

If $credit is negative, link is removed.

 my @admins = @{ $node->admins };

Return array of users with admin rights on node

 my @guests = @{ $node->guests };

Return array of users with guest rights on node

 my @links = @{ $node->links };

Return array of links for this node

Return cache usage for a node

  my $cache = $node->cacheusage;

Perform actions on the Hyper Estraier node master ("estmaster" process)

  $node->master(
        action => 'sync'
  );

All available actions are documented in <http://hyperestraier.sourceforge.net/nguide-en.html#protocol>

You could call those directly, but you don't have to. I hope.

Set information for node

  $node->_set_info;

Clear information for node

  $node->_clear_info;

On the next call to "name", "label", "doc_num", "word_num" or "size", node info will be fetched again from Hyper Estraier.
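The fetch-once, clear-and-refetch behaviour of "_set_info" and "_clear_info" follows a common lazy caching pattern. The sketch below illustrates it with a hypothetical "My::NodeInfo" class, where a callback stands in for the real request to the node server; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical class illustrating the lazy caching described above.
package My::NodeInfo;

sub new {
    my ( $class, $fetch ) = @_;
    return bless { fetch => $fetch, info => undef }, $class;
}

# fill the cache from the (simulated) server
sub _set_info { my $self = shift; $self->{info} = $self->{fetch}->(); return; }

# empty the cache so the next accessor call fetches again
sub _clear_info { my $self = shift; $self->{info} = undef; return; }

# generate accessors that fetch only when nothing is cached
foreach my $field (qw/name label doc_num word_num size/) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub {
        my $self = shift;
        $self->_set_info unless defined $self->{info};
        return $self->{info}->{$field};
    };
}

package main;

my $fetches = 0;
my $node = My::NodeInfo->new( sub { $fetches++; return +{ name => 'test', doc_num => 3 } } );

print $node->name, ' ', $node->doc_num, "\n";    # test 3
print "fetches: $fetches\n";                     # fetches: 1
$node->_clear_info;
my $count = $node->doc_num;                      # triggers a second fetch
print "doc_num: $count, fetches: $fetches\n";    # doc_num: 3, fetches: 2
```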

This module exports nothing.

<http://hyperestraier.sourceforge.net/>

Hyper Estraier Ruby interface on which this module is based.

Hyper Estraier now also has a pure Perl binding included in its distribution. It offers faster, direct access to databases if you are not running the "estmaster" P2P server.

Dobrica Pavlinusic, <dpavlin@rot13.org>

Robert Klep <robert@klep.name> contributed refactored search code

Copyright (C) 2005-2006 by Dobrica Pavlinusic

This library is free software; you can redistribute it and/or modify it under the GPL v2 or later.

2008-01-20 perl v5.32.1
