Manual Reference Pages  -  LINGUA::STOPWORDS (3)

NAME

Lingua::StopWords - Stop words for several languages.

SYNOPSIS



    use Lingua::StopWords qw( getStopWords );
    my $stopwords = getStopWords('en');

    my @words = qw( i am the walrus goo goo gjoob );

    # prints "walrus goo goo gjoob"
    print join ' ', grep { !$stopwords->{$_} } @words;



DESCRIPTION

In keyword search, it is common practice to suppress a collection of "stopwords": words such as "the", "and", "maybe", etc. which occur in a large number of documents and tell you nothing important about any document that contains them. This module provides such "stoplists" in several languages.

    Supported Languages



    |-----------------------------------------------------------|
    | Language   | ISO code | default encoding | also available |
    |-----------------------------------------------------------|
    | Danish     | da       | ISO-8859-1       | UTF-8          |
    | Dutch      | nl       | ISO-8859-1       | UTF-8          |
    | English    | en       | ISO-8859-1       | UTF-8          |
    | Finnish    | fi       | ISO-8859-1       | UTF-8          |
    | French     | fr       | ISO-8859-1       | UTF-8          |
    | German     | de       | ISO-8859-1       | UTF-8          |
    | Hungarian  | hu       | ISO-8859-1       | UTF-8          |
    | Italian    | it       | ISO-8859-1       | UTF-8          |
    | Norwegian  | no       | ISO-8859-1       | UTF-8          |
    | Portuguese | pt       | ISO-8859-1       | UTF-8          |
    | Spanish    | es       | ISO-8859-1       | UTF-8          |
    | Swedish    | sv       | ISO-8859-1       | UTF-8          |
    | Russian    | ru       | KOI8-R           | UTF-8          |
    |-----------------------------------------------------------|
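As a rough illustration of the table, the loop below requests the default-encoding stoplist for each listed ISO code and reports how many entries it holds. This is only a sketch: the @codes array is copied by hand from the table and is not something the module exports.

    use strict;
    use warnings;
    use Lingua::StopWords qw( getStopWords );

    # ISO codes copied by hand from the table above (an assumption of
    # this sketch, not a list provided by the module).
    my @codes = qw( da nl en fi fr de hu it no pt es sv ru );

    for my $code (@codes) {
        my $stoplist = getStopWords($code);

        # Skip any code the installed version does not recognise.
        next unless defined $stoplist;

        printf "%s: %d stopwords\n", $code, scalar keys %$stoplist;
    }
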



FUNCTIONS

    getStopWords



    my $stoplist      = getStopWords('en');
    my $utf8_stoplist = getStopWords('en', 'UTF-8');



Retrieve a stoplist in the form of a hashref where the keys are all stopwords and the values are all 1.



    $stoplist = {
        and => 1,
        if  => 1,
        # ...
    };



getStopWords() expects 1-2 arguments. The first, which is required, is an ISO code representing a supported language. If the ISO code cannot be found, getStopWords returns undef.
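Since an unrecognised code yields undef rather than an exception, it may be worth checking the return value before dereferencing it. A minimal sketch follows; the helper name stoplist_or_die and the code 'xx' are hypothetical and exist only for this illustration.

    use strict;
    use warnings;
    use Lingua::StopWords qw( getStopWords );

    # Hypothetical helper: fetch a stoplist or die with a clear message.
    sub stoplist_or_die {
        my ($lang) = @_;
        my $stoplist = getStopWords($lang);
        die "No stoplist available for language code '$lang'\n"
            unless defined $stoplist;
        return $stoplist;
    }

    my $english = stoplist_or_die('en');
    print "'the' is an English stopword\n" if $english->{the};

    # 'xx' is assumed not to be a supported code, so this call should die.
    my $ok = eval { stoplist_or_die('xx'); 1 };
    print "Lookup failed: $@" unless $ok;
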

The second argument should be 'UTF-8' if you want the stopwords encoded in UTF-8. The UTF-8 flag will be turned on for the returned strings, so make sure you understand all the implications of that.
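One practical implication is that input should be decoded into character strings before it is looked up in a UTF-8 stoplist. The sketch below assumes the input arrives as raw UTF-8 bytes and that the Russian word "и" (U+0438) is on the Russian list. It uses decode() from the core Encode module, and the variable names are illustrative only.

    use strict;
    use warnings;
    use Encode qw( decode );
    use Lingua::StopWords qw( getStopWords );

    my $stoplist = getStopWords('ru', 'UTF-8');

    # Raw UTF-8 bytes as they might arrive from a file or socket:
    # U+0438 is encoded as the two bytes 0xD0 0xB8.
    my $bytes = "\xD0\xB8";

    # Decode to a one-character string so it can compare equal to the
    # UTF-8-flagged keys of the stoplist.
    my $word = decode('UTF-8', $bytes);

    print "decoded form matches\n" if $stoplist->{$word};
    print "raw bytes match\n"      if $stoplist->{$bytes};

Under those assumptions one would expect only the decoded form to match, since the undecoded value is a different two-character string as far as Perl is concerned.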

SEE ALSO

The stoplists supplied by this module were created as part of the Snowball project (see <http://snowball.tartarus.org>, Lingua::Stem::Snowball).

Lingua::EN::StopWords provides a different stoplist for English.

AUTHOR

Maintained by Marvin Humphrey <marvin at rectangular dot com>. Original author Fabien Potencier, <fabpot at cpan dot org>.

COPYRIGHT AND LICENSE

Copyright 2004-2008 Fabien Potencier, Marvin Humphrey

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.3 or, at your option, any later version of Perl 5 you may have available.

perl v5.20.3 LINGUA::STOPWORDS (3) 2008-08-22