GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages
Text::Match::FastAlternatives(3) User Contributed Perl Documentation Text::Match::FastAlternatives(3)

Text::Match::FastAlternatives - efficient search for many strings

    use Text::Match::FastAlternatives;

    my $expletives = Text::Match::FastAlternatives->new(@naughty);
    while (my $line = <>) {
        print "Do you email your mother with that keyboard?\n"
            if $expletives->match($line);
    }

This module allows you to search for any of a list of substrings ("keys") in a larger string. It is particularly efficient when the set of keys is large.

This efficiency comes at the cost of some flexibility: if you want case-insensitive matching, you have to fold case yourself:

    my $expletives = Text::Match::FastAlternatives->new(
        map { lc } @naughty);
    while (my $line = <>) {
        print "Do you email your mother with that keyboard?\n"
            if $expletives->match(lc $line);
    }

(The same applies if you want matching that is insensitive to Unicode normalization forms; see Unicode::Normalize.)

This module is designed as a drop-in replacement for Perl code of the following form:

    my $expletives_regex = join '|', map { quotemeta } @naughty;
    $expletives_regex = qr/$expletives_regex/;
    while (my $line = <>) {
        print "Do you email your mother with that keyboard?\n"
            if $line =~ $expletives_regex;
    }

Text::Match::FastAlternatives can easily perform this test a hundred times faster than the equivalent regex, if you have enough keys. The more keys it searches for, the faster it gets compared to the regex.

Modules like Regexp::Trie can build an optimised version of such a regex, designed to take advantage of the niceties of perl's regex engine. With a large number of keys, this module will substantially outperform even an optimised regex like that. In one real-world situation with 339 keys, running on Perl 5.8, Regexp::Trie produced a regex that ran 857% faster than the naive regex (according to Benchmark), but using Text::Match::FastAlternatives ran 18275% faster than the naive regex, or twenty times faster than Regexp::Trie's optimised regex.

The enhancements to the regex engine in Perl 5.10 include algorithms similar to those in Text::Match::FastAlternatives. However, even with very small sets of keys, Perl has to do extra work to be fully general, so Text::Match::FastAlternatives is still faster. The difference is greater for larger sets of keys. For one test with only 5 keys, Text::Match::FastAlternatives was 21% faster than perl-5.10.0; with 339 keys (as before), the difference was 111% (that is, slightly over twice as fast).

Text::Match::FastAlternatives->new(@keys)
Constructs a matcher that can efficiently search for all of the @keys in parallel. Throws an exception if any of the keys are undefined, or any of them contain malformed UTF-8.
$matcher->match($target)
Returns a boolean value indicating whether the $target string contains any of the keys in $matcher.
$matcher->match_at($target, $pos)
Returns a boolean value indicating whether the $target string contains any of the keys in $matcher at position $pos. Returns false (without emitting any warning) if $pos is larger than the length of $string.
$matcher->exact_match($target)
Returns a boolean value indicating whether the $target string is exactly equal to any of the keys in $matcher.

Text::Match::FastAlternatives has a "DESTROY" method implemented in XS. If you write a subclass with its own destructor, you will need to invoke the base destructor, or you will leak memory.

Text::Match::FastAlternatives may change the Perl-internal encoding of strings passed to "new" or to its "match" methods. This is not considered a bug, as the Perl-internal encoding of a string is not normally of interest to Perl code (as opposed to "perl" internals). However, you may encounter situations where preserving a string's existing encoding is important (perhaps to work around a bug in some other module). If so, you may need to copy scalar variables before matching them:

    $matches++ if $tmfa->match(my $temporary_copy = $original);

Text::Match::FastAlternatives manages to be so fast by using an Aho-Corasick automaton internally. The time to search for a match (or determine that there is no match) is independent of the number of keys being sought; total worst-case search time is O(n) where n is the length of the target string.

The "match_at" and "exact_match" methods only need to find a match at one position, so they have worst-case running time of O(min(n, m)) where m is the length of the longest key.

Text::Match::FastAlternatives uses an adaptive technique to minimise memory usage; in the best case, each character in a key requires only 4 bytes of storage in the automaton. This low memory usage has the additional advantage of reducing contention for CPU cache lines, further improving performance.

<http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm>, Regexp::Trie, Regexp::Optimizer, Regexp::Assemble, Unicode::Normalize, perl5100delta, perlunitut, perlunifaq.

Aaron Crane <arc@cpan.org>

Copyright 2006, 2007, 2008, 2010, 2012 Aaron Crane.

This library is free software; you can redistribute it and/or modify it under the terms of the Artistic License, or (at your option) under the terms of the GNU General Public License version 2.

2012-12-28 perl v5.32.1

Search for    or go to Top of page |  Section 3 |  Main Index

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with ManDoc.