GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages


Manual Reference Pages  -  TEXT::MATCH::FASTALTERNATIVES (3)

.ds Aq ’

NAME

Text::Match::FastAlternatives - efficient search for many strings

CONTENTS

SYNOPSIS



    use Text::Match::FastAlternatives;

    my $expletives = Text::Match::FastAlternatives->new(@naughty);
    while (my $line = <>) {
        print "Do you email your mother with that keyboard?\n"
            if $expletives->match($line);
    }



DESCRIPTION

This module allows you to search for any of a list of substrings (keys) in a larger string. It is particularly efficient when the set of keys is large.

This efficiency comes at the cost of some flexibility: if you want case-insensitive matching, you have to fold case yourself:



    my $expletives = Text::Match::FastAlternatives->new(
        map { lc } @naughty);
    while (my $line = <>) {
        print "Do you email your mother with that keyboard?\n"
            if $expletives->match(lc $line);
    }



This module is designed as a drop-in replacement for Perl code of the following form:



    my $expletives_regex = join |, map { quotemeta } @naughty;
    $expletives_regex = qr/$expletives_regex/;
    while (my $line = <>) {
        print "Do you email your mother with that keyboard?\n"
            if $line =~ $expletives_regex;
    }



Text::Match::FastAlternatives can easily perform this test a hundred times faster than the equivalent regex, if you have enough keys. The more keys it searches for, the faster it gets compared to the regex.

Modules like Regexp::Trie can build an optimised version of such a regex, designed to take advantage of the niceties of perl’s regex engine. With a large number of keys, this module will substantially outperform even an optimised regex like that. In one real-world situation with 339 keys, running on Perl 5.8, Regexp::Trie produced a regex that ran 857% faster than the naive regex (according to Benchmark), but using Text::Match::FastAlternatives ran 18275% faster than the naive regex, or twenty times faster than Regexp::Trie’s optimised regex.

The enhancements to the regex engine in Perl 5.10 include algorithms similar to those in Text::Match::FastAlternatives. However, even with very small sets of keys, Perl has to do extra work to be fully general, so Text::Match::FastAlternatives is still faster. The difference is greater for larger sets of keys. For one test with only 5 keys, Text::Match::FastAlternatives was 21% faster than perl-5.10.0; with 339 keys (as before), the difference was 111% (that is, slightly over twice as fast).

METHODS

Text::Match::FastAlternatives->new(@keys) Constructs a matcher that can efficiently search for all of the @keys in parallel. Throws an exception if any of the keys are undefined.
$matcher->match($target) Returns a boolean value indicating whether the $target string contains any of the keys in $matcher.
$matcher->match_at($target, $pos) Returns a boolean value indicating whether the $target string contains any of the keys in $matcher at position $pos. Returns false (without emitting any warning) if $pos is larger than the length of $string.
$matcher->exact_match($target) Returns a boolean value indicating whether the $target string is exactly equal to any of the keys in $matcher.

CAVEATS

    Subclassing

Text::Match::FastAlternatives has a DESTROY method implemented in XS. If you write a subclass with its own destructor, you will need to invoke the base destructor, or you will leak memory.

    Interaction with Perl internals

Text::Match::FastAlternatives may change the Perl-internal encoding of strings passed to new or to its match methods. This is not considered a bug, as the Perl-internal encoding of a string is not normally of interest to Perl code (as opposed to Perl internals). However, you may encounter situations where preserving a string’s existing encoding is important (perhaps to work around a bug in some other module). If so, you may need to copy scalar variables before matching them:



    $matches++ if $tmfa->match(my $temporary_copy = $original);



IMPLEMENTATION

Text::Match::FastAlternatives manages to be so fast by using a trie internally. The time to find a match at a given position in the string (or determine that there is no match) is independent of the number of keys being sought; worst-case match time is linear in the length of the longest key. Since a match must be attempted at each position in the target string, total worst-case search time is O(mn) where m is the length of the target string and n is the length of the longest key.

The match_at and exact_match methods only need to find a match at one position, so they have worst-case running time of O(min(n, m)).

SEE ALSO

<http://en.wikipedia.org/wiki/Trie>, Regexp::Trie, Regexp::Optimizer, Regexp::Assemble, perl5100delta, perlunitut, perlunifaq.

AUTHOR

Aaron Crane <arc@cpan.org>

COPYRIGHT

Copyright 2006, 2007, 2008 Aaron Crane.

This library is free software; you can redistribute it and/or modify it under the terms of the Artistic License, or (at your option) under the terms of the GNU General Public License version 2.

Search for    or go to Top of page |  Section 3 |  Main Index


perl v5.20.3 TEXT::MATCH::FASTALTERNATIVES (3) 2008-09-28

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with manServer 1.07.