Lingua::Stem::EnBroken(3) User Contributed Perl Documentation Lingua::Stem::EnBroken(3)

Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic' English

    use Lingua::Stem::EnBroken;
    my $stems = Lingua::Stem::EnBroken::stem({
        -words      => $word_list_reference,
        -locale     => 'en',
        -exceptions => $exceptions_hash,
    });

This routine MIS-applies the Porter Stemming Algorithm to its parameters, returning the stemmed words. It is an intentionally broken version of Lingua::Stem::En for people needing backwards compatibility with Lingua::Stem 0.30 and Lingua::Stem 0.40. Do not use it if you aren't one of those people.

It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:

   Purpose:    Implementation of the Porter stemming algorithm documented
               in: Porter, M.F., "An Algorithm For Suffix Stripping,"
               Program 14 (3), July 1980, pp. 130-137.
   Provenance: Written by B. Frakes and C. Cox, 1986.

I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.

The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've not seen). Porter's algorithm still has rough spots (e.g., current/currency and words ending in -ings), which I've not attempted to cure, although I have added support for the British -ise suffix.

 2003.09.28 -  Documentation fix

 2000.09.14 -  Forked from the Lingua::Stem::En.pm module to provide
               a backward compatibly broken version for people needing
               consistent behavior with 0.30 and 0.40 more than accurate
               stemming.

stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
Stems a list of passed words using the rules of US English. Returns an anonymous array reference to the stemmed words.

Example:

  my $stemmed_words = Lingua::Stem::EnBroken::stem({
      -words      => \@words,
      -locale     => 'en',
      -exceptions => \%exceptions,
  });
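
The call above can be fleshed out into a complete script. This is a minimal sketch, not a reference usage: the word list and the exceptions mapping are made up for illustration, and it assumes that an entry in the exceptions hash maps a (lowercase) word to the stem that should be returned for it instead of running the algorithm.

```perl
use strict;
use warnings;
use Lingua::Stem::EnBroken;

# Illustrative input only; none of these words come from the module's docs.
my @words = qw(running jumps mice);

# Assumption: each key is a word to override, each value the stem to return.
my %exceptions = ( mice => 'mouse' );

my $stems = Lingua::Stem::EnBroken::stem({
    -words      => \@words,
    -locale     => 'en',
    -exceptions => \%exceptions,
});

# stem() returns an array reference; print each word next to its stem.
for my $i (0 .. $#words) {
    print "$words[$i] => $stems->[$i]\n";
}
```

The returned list is in the same order as the input, so results can be paired with their source words by index.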
    
stem_caching({ -level => 0|1|2 });
Sets the level of stem caching.

'0' means 'no caching'. This is the default level.

'1' means 'cache per run'. This caches stemming results during a single call to 'stem'.

'2' means 'cache indefinitely'. This caches stemming results until either the process exits or the 'clear_stem_cache' method is called.

clear_stem_cache;
Clears the cache of stemmed words.
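
The caching functions described above might be combined as follows. This is a sketch under the assumption that, like 'stem', they are called as plain functions with the fully qualified package name; the word list is illustrative.

```perl
use strict;
use warnings;
use Lingua::Stem::EnBroken;

# Level 2: keep cached stems across calls until explicitly cleared.
Lingua::Stem::EnBroken::stem_caching({ -level => 2 });

my @batch = qw(connect connected connecting connection);

# The second call sees the same words, so it can be answered from the cache.
my $first  = Lingua::Stem::EnBroken::stem({ -words => \@batch, -locale => 'en' });
my $second = Lingua::Stem::EnBroken::stem({ -words => \@batch, -locale => 'en' });

print join(' ', @$second), "\n";

# Drop the cached stems, then return to the default of no caching.
Lingua::Stem::EnBroken::clear_stem_cache();
Lingua::Stem::EnBroken::stem_caching({ -level => 0 });
```

Level 2 trades memory for speed in long-running processes that see the same words repeatedly; calling 'clear_stem_cache' between unrelated batches keeps the cache from growing without bound.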

This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.

 See also: Lingua::Stem

  Jim Richardson, University of Sydney
  jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

  Integration in Lingua::Stem by
  Jerilyn Franz, FreeRun Technologies,
  <cpan@jerilyn.info>

Copyright: Jim Richardson, University of Sydney; Jerilyn Franz, FreeRun Technologies

This code is freely available under the same terms as Perl.

2020-09-26 perl v5.32.1
