Apache::Solr(3)    User Contributed Perl Documentation    Apache::Solr(3)
Apache::Solr - Apache Solr (Lucene) extension
Apache::Solr is extended by
Apache::Solr::JSON
Apache::Solr::XML
# use Log::Report mode => "DEBUG";
my $solr = Apache::Solr->new(server => $url);
my $lwp = $solr->agent; # internal LWP::UserAgent
my $doc = Apache::Solr::Document->new(...);
my $results = $solr->addDocument($doc);
$results or die $results->errors;
my $results = $solr->select(q => 'author:mark');
my $doc = $results->selected(3);
print $doc->_author;
my $results = $solr->select(q => "really", hl => {fl=>'content'});
while(my $doc = $results->nextSelected)
{ my $hldoc = $results->highlighted($doc);
print $hldoc->_content;
...
}
# based on Log::Report, hence (for communication errors and such)
use Log::Report;
dispatcher SYSLOG => 'default'; # now all warnings/errors go to syslog
try { $solr->select(...) }; print $@->wasFatal;
# [1.11] Information about communication errors
use Data::Dumper;
my $result = try { $solr->select(...) };
if(my $ex = $@->wasFatal)
{   $result = $ex->message->valueOf('result');
    if(defined $result)    #!! defined !!
    {   warn Dumper $result->decoded;
    }
}
Solr is a stand-alone full-text search engine (based on Lucene),
with loads of features. This module tries to provide a high-level interface
to the Solr server.
See http://wiki.apache.org/solr/ and
http://lucene.apache.org/solr/
- Apache::Solr->new(%options)
- Create a client to connect to one "core" (collection) of the
Solr server.
-Option --Default
agent <created internally>
autocommit true
core undef
format 'XML'
retry_max 60
retry_wait 5
server <required>
server_version <latest>
- agent => LWP::UserAgent
object
- Agent which implements the communication between this client and the Solr
server.
When you have multiple
"Apache::Solr" objects in your
program, you may want to share this agent, to share the connection.
Since [0.94], this will happen automagically: the parameter defaults to
the agent created for the previous object.
Do not forget to install LWP::Protocol::https if you need to
connect via https.
- autocommit =>
BOOLEAN
- Commit all changes immediately unless specified differently.
- core => NAME
- Set the core name to be addressed by this client. When there is no core
name specified, the core is selected by the server or already part of the
URL.
You probably want to set up one core dedicated to testing and
one for the live environment.
- format =>
'XML'|'JSON'
- Communication format between client and server. You may also instantiate
Apache::Solr::XML or Apache::Solr::JSON directly.
- retry_max =>
COUNT
- [1.09] The maximum number of retries. When the server (or the connection
to it) keeps producing errors, it may never recover, and the main code
should not be blocked forever. Each failing attempt may itself take
considerable time to report its error, so the communication failure can
take much, much longer than "retry_wait"
times "retry_max" seconds.
You can disable retries with '0'.
- retry_wait =>
SECONDS
- [1.09] When the connection to the Solr server fails, or when the server
does not respond correctly, a retry is attempted after waiting a few
seconds. You may use '0' to avoid waiting.
- server => URL
- The location of the Solr server depends on the way the Java environment
is set up. The URL is either a URI object or a string which can be
instantiated as such.
- server_version
=> VERSION
- By default the latest version of the server software, currently 4.5. Try
to get this setting right, because it helps a lot with correct
parameter use and support for the right features.
We know now that the version can be requested via
"/admin/info/system", but we do not spend
more development time on this module until it gets more users.
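The interplay of "retry_max" and "retry_wait" can be pictured as the loop
below. This is a minimal sketch, assuming "retry_max" counts the retries
after the first attempt (so '0' means a single try). The helper
call_with_retry() is hypothetical and not part of Apache::Solr.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Apache::Solr: call $attempt up to
# 1 + retry_max times, sleeping retry_wait seconds between failures.
# $attempt returns a true value on success and a false value on failure.
sub call_with_retry {
    my ($attempt, %args) = @_;
    my $retry_max  = $args{retry_max}  // 60;   # same defaults as new()
    my $retry_wait = $args{retry_wait} // 5;

    for my $try (0 .. $retry_max) {
        my $result = $attempt->($try);
        return $result if $result;
        # Do not sleep after the final failed attempt.
        sleep $retry_wait if $retry_wait > 0 && $try < $retry_max;
    }
    return undef;   # gave up: every attempt failed
}

# Usage: a fake request which only succeeds on its third attempt.
my $calls  = 0;
my $answer = call_with_retry(sub { $calls++; $calls >= 3 ? 'ok' : 0 },
    retry_max => 5, retry_wait => 0);
print "answer=$answer after $calls calls\n";   # answer=ok after 3 calls
```

Because each attempt may itself block until a timeout, the worst case is
bounded by the sum of all attempt durations plus the waits, not by the
waits alone.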
- $obj->agent()
- Returns the LWP::UserAgent object which maintains the connection to the
server.
- $obj->autocommit( [BOOLEAN] )
- $obj->core( [$core] )
- Returns the $core or, when not defined, the default
core as set by new(core). May return
"undef".
- $obj->server( [$uri|STRING] )
- Returns the URI object which refers to the server base address. You need
to clone() it before modifying. You may set a new value as STRING
or $uri object.
- $obj->serverVersion()
- Returns the specified version of the Solr server software (by default the
latest). Treat this version as string, to avoid rounding errors.
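Why the version should be treated as a string: compared as a number,
'4.10' is 4.1 and looks older than '4.5'. The sketch below compares dotted
versions part by part. version_cmp() is a hypothetical helper, not a method
of this module, and assumes every part is numeric.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Apache::Solr): compare dotted version
# strings part by part, as integers, so that '4.10' is newer than '4.5'.
# Missing parts count as 0, so '4.5' equals '4.5.0'.
sub version_cmp {
    my ($x, $y) = @_;
    my @x = split /\./, $x;
    my @y = split /\./, $y;
    while (@x || @y) {
        my $cmp = (shift(@x) // 0) <=> (shift(@y) // 0);
        return $cmp if $cmp;
    }
    return 0;
}

my $v = '4.10';
print "numeric: ", ($v > 4.5 ? "newer" : "older"), "\n";                  # numeric: older (wrong)
print "by part: ", (version_cmp($v, '4.5') > 0 ? "newer" : "older"), "\n"; # by part: newer
```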
Search
- $obj->queryTerms($terms)
- Search for often used terms. See
http://wiki.apache.org/solr/TermsComponent
$terms are passed to
expandTerms() before being used.
Be warned: The result is not sorted when XML
communication is used, even when you explicitly request it.
example:
my $r = $solr->queryTerms(fl => 'subject', limit => 100);
if($r->success)
{ foreach my $hit ($r->terms('subject'))
{ my ($term, $count) = @$hit;
print "term=$term, count=$count\n";
}
}
if(my $r = $solr->queryTerms(fl => 'subject', limit => 100))
...
- $obj->select( [\%options], @parameters )
- Find information in the document collection.
This method has a HUGE number of parameters. These values are
passed in the URI of the HTTP query to the Solr server. See
expandSelect() for all the simplifications offered here. Sets of
these parameters may need configuration help in the server as well.
[1.06] You may pass some options to process the selected
results (the Apache::Solr::Result object initiation). For instance,
"sequential". For backwards
compatibility reasons, they have to be passed in a HASH as optional
first parameter.
Updates
See http://wiki.apache.org/solr/UpdateXmlMessages. Atomic updates
are not supported yet.
- $obj->addDocument( <$doc|ARRAY>, %options )
- Add one or more documents (Apache::Solr::Document objects) to the Solr
database on the server.
-Option --Default
allowDups <false>
commit <autocommit>
commitWithin undef
overwrite <true>
overwriteCommitted <not allowDups>
overwritePending <not allowDups>
- $obj->commit(%options)
-
-Option --Default
expungeDeletes <false>
softCommit <false>
waitFlush <true>
waitSearcher <true>
- $obj->delete(%options)
- Remove one or more documents, based on id or query.
-Option --Default
commit <autocommit>
fromCommitted true
fromPending true
id undef
query undef
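commit() and delete() correspond to the update messages described at
http://wiki.apache.org/solr/UpdateXmlMessages. As a rough sketch of what
such messages look like, the hypothetical helpers below build a <delete>
body for ids and/or a query, and a <commit/> body with boolean attributes.
This is not this module's implementation, and it leaves out the
fromCommitted and fromPending attributes.

```perl
use strict;
use warnings;

# Escape the five XML special characters.
sub xml_escape {
    my $s = shift;
    $s =~ s/&/&amp;/g;    # must run first
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Hypothetical builder: a <delete> message for one id, an ARRAY of ids,
# and/or a query.
sub delete_message {
    my %args = @_;
    my @ids = ref $args{id} eq 'ARRAY' ? @{ $args{id} }
            : defined $args{id}        ? ($args{id})
            :                            ();
    my $xml = join '', map { '<id>' . xml_escape($_) . '</id>' } @ids;
    $xml .= '<query>' . xml_escape($args{query}) . '</query>'
        if defined $args{query};
    return "<delete>$xml</delete>";
}

# Hypothetical builder: a <commit/> message. Perl booleans become the
# strings 'true' and 'false'; attributes are sorted by name.
sub commit_message {
    my %attr = @_;
    my $xml = join '',
        map { sprintf ' %s="%s"', $_, $attr{$_} ? 'true' : 'false' }
        sort keys %attr;
    return "<commit$xml/>";
}

print delete_message(id => [ 'a1', 'b&2' ]), "\n";
print delete_message(query => 'price:[0 TO 10]'), "\n";
print commit_message(waitSearcher => 0, softCommit => 1), "\n";
```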
- $obj->extractDocument(%options)
- Call the Solr Tika built-in to have the server translate various kinds of
structured documents into Solr searchable documents. This component is
also called "Solr Cell".
The %options are mostly passed on as
attributes to the server call, but there are a few more. You need to
pass either a "file" or
"string" with data.
See
http://wiki.apache.org/solr/ExtractingRequestHandler
-Option --Default
commit new(autocommit)
content_type <from filename>
file undef
string undef
example:
my $r = $solr->extractDocument(file => 'design.pdf',
literal_id => 'host');
- $obj->optimize(%options)
-
-Option --Default
maxSegments 1
softCommit <false>
waitFlush <true>
waitSearcher <true>
- $obj->rollback()
- [solr 1.4]
Core management
See https://solr.apache.org/guide/6_6/coreadmin-api.html
The CREATE, SWAP, ALIAS, and RENAME actions are not yet supported, because
they do not seem very useful.
- $obj->coreReload( [$core] )
- [0.94] Load a new core (on the server) from the configuration of this
core. While the new core is initializing, the existing one will continue
to handle requests. When the new Solr core is ready, it takes over and the
old core is unloaded.
-Option--Default
core <this core>
example:
my $result = $solr->coreReload;
$result or die $result->errors;
- $obj->coreStatus()
- [0.94] Returns a HASH with information about this core. There is no
description about the exact structure and interpretation of this data.
-Option--Default
core <this core>
example:
my $result = $solr->coreStatus;
$result or die $result->errors;
use Data::Dumper;
print Dumper $result->decoded->{status};
- $obj->coreUnload(%options)
- Removes a core from Solr. Active requests will continue to be processed,
but no new requests will be sent to the named core. If a core is
registered under more than one name, only the given name is removed.
-Option--Default
core <this core>
Parameter pre-processing
Many parameters are passed to the server. The syntax of the
communication protocol is not optimal for the end-user: it is too verbose
and depends on the Solr server version.
General rules:
- you can group them on prefix
- use underscore as alternative to dots: less quoting needed
- boolean values in Perl will get translated into 'true' and 'false'
- when an ARRAY (or LIST) is passed, the order of the parameters is preserved
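A minimal sketch of these general rules follows, assuming that a HASH value
groups parameters under its key, that an ARRAY value repeats a parameter,
and that a hypothetical list of name components (%boolean_leaf) decides
which values become 'true' or 'false'. expand_params() is illustrative
only: the module's own expandSelect() also adds enabling flags such as
"facet=true" (see its example), and URL-escaping is skipped here.

```perl
use strict;
use warnings;

# Hypothetical: final name components whose values are Perl booleans.
my %boolean_leaf = map +($_ => 1), qw/hl mlt missing incl/;

# Flatten PAIRS into an ordered list of [name, value] pairs.
sub expand_params {
    my ($prefix, @pairs) = @_;
    my @out;
    while (@pairs) {
        my ($key, $value) = splice @pairs, 0, 2;
        (my $name = $key) =~ tr/_/./;               # underscore becomes dot
        $name = "$prefix.$name" if length $prefix;  # group on prefix

        if (ref $value eq 'HASH') {
            # HASH order is unpredictable; sort keys for reproducible output
            push @out, expand_params($name,
                map +($_ => $value->{$_}), sort keys %$value);
        }
        elsif (ref $value eq 'ARRAY') {
            push @out, map [ $name => $_ ], @$value;  # repeated, in order
        }
        else {
            my ($leaf) = $name =~ /([^.]+)\z/;
            $value = $value ? 'true' : 'false' if $boolean_leaf{$leaf};
            push @out, [ $name => $value ];
        }
    }
    return @out;
}

my @params = expand_params('',
    q           => 'inStock:true',
    f_cat_facet => { missing => 1 },
    facet       => { field => [qw/cat inStock/], limit => -1 },
);
print join('&', map "$_->[0]=$_->[1]", @params), "\n";
# q=inStock:true&f.cat.facet.missing=true&facet.field=cat&facet.field=inStock&facet.limit=-1
```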
- $obj->deprecated($message)
- Produce a warning $message about deprecated
parameters with the indicated server version.
- $obj->expandExtract(PAIRS|ARRAY)
- Used by extractDocument().
[0.93] If the key is
"literal" or
"literals", then the keys in the value
HASH (or ARRAY of PAIRS) get 'literal.' prepended. "Literals"
are fields you add yourself to the Solr Cell output. Unless
"extractOnly" is set, you need to specify the
'id' literal.
[0.94] You can also use
"fmap",
"boost", and
"resource" with a HASH (or
ARRAY-of-PAIRS). [0.97] The value in each PAIR may be a SCALAR (a reference
to a string), which avoids some copying.
example:
my $result = $solr->extractDocument(string => $document,
resource_name => $fn, extractOnly => 1,
literals => { id => 5, b => 'tic' }, literal_xyz => 42,
fmap => { id => 'doc_id' }, fmap_subject => 'mysubject',
boost => { abc => 3.5 }, boost_xyz => 2.0
);
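The prefixing rule for extractDocument() parameters can be sketched as
below, assuming the keys "literal", "literals", "fmap", "boost", and
"resource" each map to a dotted prefix, while any other key only has its
underscores turned into dots. expand_extract() is hypothetical; for
simplicity it dereferences SCALAR values, which gives up the copy-avoidance
the [0.97] string references are meant for.

```perl
use strict;
use warnings;

# Hypothetical mapping from grouping key to parameter prefix.
my %group = (literal => 'literal', literals => 'literal',
             fmap => 'fmap', boost => 'boost', resource => 'resource');

sub expand_extract {
    my @pairs = @_;
    my @out;
    while (@pairs) {
        my ($key, $value) = splice @pairs, 0, 2;
        if (my $prefix = $group{$key}) {
            # An ARRAY of PAIRS keeps its order; HASH keys are sorted here.
            my @sub = ref $value eq 'ARRAY' ? @$value
                    : map +($_ => $value->{$_}), sort keys %$value;
            while (@sub) {
                my ($k, $v) = splice @sub, 0, 2;
                $v = $$v if ref $v eq 'SCALAR';   # string reference: copy it
                push @out, "$prefix.$k" => $v;
            }
        }
        else {
            (my $name = $key) =~ tr/_/./;         # literal_xyz -> literal.xyz
            push @out, $name => $value;
        }
    }
    return @out;
}

my @p = expand_extract(
    literals => { id => 5, b => 'tic' }, literal_xyz => 42,
    fmap => [ id => 'doc_id' ], boost => { abc => \'3.5' });
while (my ($k, $v) = splice @p, 0, 2) { print "$k=$v\n" }
```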
- $obj->expandSelect(PAIRS)
- The select() method accepts many, many parameters. These are passed
to modules in the server, which need configuration before being usable.
Besides the common parameters, like 'q' (query) and 'rows',
there are parameters for various (pluggable) backends, usually prefixed
by the backend abbreviation.
- expand
- facet -> http://wiki.apache.org/solr/SimpleFacetParameters
- hl (highlight) ->
http://wiki.apache.org/solr/HighlightingParameters
- mlt -> https://solr.apache.org/guide/8_11/morelikethis.html
- stats -> http://wiki.apache.org/solr/StatsComponent
- suggest ->
https://solr.apache.org/guide/8_11/suggester.html
- group -> http://wiki.apache.org/solr/FieldCollapsing
You may use WebService::Solr::Query to construct the query
('q').
example:
my @r = $solr->expandSelect(
q => 'inStock:true', rows => 10,
facet => {limit => -1, field => [qw/cat inStock/], mincount => 1},
f_cat_facet => {missing => 1},
hl => {},
mlt => { fl => 'manu,cat', mindf => 1, mintf => 1 },
stats => { field => [ 'price', 'popularity' ] },
group => { query => 'price:[0 TO 99.99]', limit => 3 },
);
# becomes (one line)
...?rows=10&q=inStock:true
&facet=true&facet.limit=-1&facet.field=cat
&f.cat.facet.missing=true&facet.mincount=1&facet.field=inStock
&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1
&stats=true&stats.field=price&stats.field=popularity
&group=true&group.query=price:[0+TO+99.99]&group.limit=3
- $obj->expandTerms(PAIRS|ARRAY)
- Used by queryTerms() only.
example:
my @t = $solr->expandTerms('terms.lower.incl' => 'true');
my @t = $solr->expandTerms([lower_incl => 1]); # same
my $r = $solr->queryTerms(fl => 'subject', limit => 100);
- $obj->ignored($message)
- Produce a warning $message about parameters which
will get ignored because they were not yet supported by the indicated
server version.
- $obj->removed($message)
- Produce a warning $message about parameters which
will not be passed on, because they were removed from the indicated server
version.
Other helpers
- $obj->endpoint($action, %options)
- Compute the address to be called (for HTTP)
-Option--Default
core new(core)
params []
- core => NAME
- If no core is specified, the default of the server is addressed.
- params =>
HASH|ARRAY-of-pairs
- The order of the parameters is preserved when an ARRAY of parameters
is passed; the order of a HASH is unpredictable.
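The effect of passing "params" as an ARRAY can be shown with a small URL
builder. This is a sketch only: endpoint_sketch() and uri_escape_simple()
are hypothetical, the layout <server>/<core>/<action>?<query> is an
assumption, and non-ASCII values would need UTF-8 encoding before
escaping.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
sub uri_escape_simple {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# Hypothetical sketch of endpoint($action, core => ..., params => ...).
sub endpoint_sketch {
    my ($server, $action, %args) = @_;
    my $params = $args{params} // [];
    # An ARRAY keeps its order; flattening a HASH gives unpredictable order.
    my @pairs  = ref $params eq 'HASH' ? %$params : @$params;

    my $url = $server;
    $url =~ s!/+\z!!;                          # drop trailing slashes
    $url .= '/' . $args{core} if defined $args{core};
    $url .= '/' . $action;

    my @query;
    while (@pairs) {
        my ($k, $v) = splice @pairs, 0, 2;
        push @query, uri_escape_simple($k) . '=' . uri_escape_simple($v);
    }
    $url .= '?' . join('&', @query) if @query;
    return $url;
}

print endpoint_sketch('http://localhost:8983/solr/', 'select',
    core => 'test', params => [ q => 'author:mark', rows => 10 ]), "\n";
# http://localhost:8983/solr/test/select?q=author%3Amark&rows=10
```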
- $obj->request($url, $result, $body, $ct)
- Send a request to the server $url and return the
response (an HTTP::Response object). A trace of the activity is added to
the $result object. The
$body of the request can be provided as bytes or
reference to bytes (SCALAR). The content-type $ct
must match the body bytes.
Compared to WebService::Solr
WebService::Solr is a good module, with a lot of miles. The main
difference is that "Apache::Solr" has
much more abstraction.
- simplified parameter syntax, improving readability
- real Perl-level boolean parameters, not 'true' and 'false'
- warnings for deprecated and ignored parameters
- smart result object with built-in trace and timing
- hidden paging of results
- flexible logging framework (Log::Report)
- both-way XML or both-way JSON, not requests in XML and answers in
JSON
- access to plugins like terms and tika
- no Moose
This module is part of Apache-Solr distribution version 1.11,
built on May 08, 2025. Website: http://perl.overmeer.net/CPAN/
Copyrights 2012-2025 by [Mark Overmeer]. For other contributors
see ChangeLog.
This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself. See
http://dev.perl.org/licenses/