NAME
Text::Language::Guess - Trained module to guess a document's language

SYNOPSIS
use Text::Language::Guess;
my $guesser = Text::Language::Guess->new();
my $lang = $guesser->language_guess("bill.txt");
# prints 'en'
print "Best fit: $lang\n";
DESCRIPTION
Text::Language::Guess guesses a document's language. Its implementation is simple: using "Text::ExtractWords" and "Lingua::StopWords" from CPAN, it counts, for each language supported by "Lingua::StopWords", how many of that language's known stopwords the document contains. Each word in the document that is recognized as a stopword of a particular language scores one point for that language.

The language_guess() method takes the name of a document file as a parameter and returns the abbreviation of the language the document is most likely written in.

Supported Languages:
Methods
EXAMPLES
use Text::Language::Guess;
# Guess language in a string instead of a file
my $guesser = Text::Language::Guess->new();
my $lang = $guesser->language_guess_string("Make love not war");
# 'en'
# Limit number of languages to choose from
my $guesser = Text::Language::Guess->new(languages => ['da', 'nl']);
my $lang = $guesser->language_guess_string(
"Which is closer to English, danish or dutch?");
# 'nl'
# Show different scores
my $guesser = Text::Language::Guess->new();
my $scores = $guesser->scores_string(
"This text is English, but other languages are scoring as well");
use Data::Dumper;
print Dumper($scores);
# $VAR1 = {
# 'pt' => 1,
# 'en' => 6,
# 'fr' => 1,
# 'nl' => 1
# };
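The scoring described under DESCRIPTION can be illustrated with a minimal, self-contained sketch. It is not the module's actual code: the %STOPWORDS table is a tiny hypothetical stand-in for the lists that Lingua::StopWords provides, the word splitting uses a plain regular expression rather than Text::ExtractWords, and the sketch_scores() and sketch_guess() names are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical, heavily truncated stopword lists for illustration only.
# The real module obtains its lists from Lingua::StopWords.
my %STOPWORDS = (
    en => { map { $_ => 1 } qw(the is are but as this and of to) },
    de => { map { $_ => 1 } qw(der die das ist und nicht aber als) },
    fr => { map { $_ => 1 } qw(le la les est et mais que de) },
);

# Return a hash ref mapping language code => number of stopword hits.
# Every word that is a stopword of a language scores one point for it.
sub sketch_scores {
    my ($text) = @_;
    my %scores;
    my @words = map { lc $_ } ( $text =~ /(\w+)/g );
    for my $word (@words) {
        for my $lang ( keys %STOPWORDS ) {
            $scores{$lang}++ if $STOPWORDS{$lang}{$word};
        }
    }
    return \%scores;
}

# Return the highest-scoring language code, or undef if no word matched.
# Ties are broken alphabetically so the result is deterministic.
sub sketch_guess {
    my ($text) = @_;
    my $scores = sketch_scores($text);
    my ($best) =
      sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } keys %$scores;
    return $best;
}

my $text = "This text is English, but other languages are scoring as well";
print "Best fit: ", sketch_guess($text), "\n";    # prints "Best fit: en"
```

With these toy lists only 'en' scores (five hits: this, is, but, are, as), so the numbers differ from the Dumper output above, which reflects the full stopword lists.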
LEGALESE
Copyright 2005 by Mike Schilli, all rights reserved. This program is free software, you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR
2005, Mike Schilli <cpan@perlmeister.com>