Manual Reference Pages  -  LINGUA::ZH::WORDSEGMENTER (3)

NAME

Lingua::ZH::WordSegmenter - Simplified Chinese Word Segmentation

VERSION

Version 0.01

SYNOPSIS



    use Encode;
    use Lingua::ZH::WordSegmenter;

    my $segmenter = Lingua::ZH::WordSegmenter->new();
    print encode('gbk', $segmenter->seg($_) );



DESCRIPTION

This is a Perl implementation of Simplified Chinese word segmentation.

The segmenter searches for the longest dictionary word at each position, scanning from both the left and the right, and chooses the result with the higher frequency product.
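The idea can be illustrated with a minimal, self-contained sketch. It is not the module's actual code: the toy dictionary, its frequency values, the fallback frequency for unknown characters, and all subroutine names are assumptions made for illustration. It builds one candidate segmentation by greedy longest match from the left, another from the right, and keeps the candidate whose word frequencies multiply to the larger value.

```perl
use strict;
use warnings;
use utf8;

# Toy dictionary: word => relative frequency (made-up values for illustration).
my %freq = (
    '研究'   => 0.005,
    '研究生' => 0.001,
    '生命'   => 0.004,
    '起源'   => 0.002,
);
my $UNKNOWN_FREQ = 1e-6;   # assumed frequency for out-of-dictionary characters
my $MAX_LEN      = 3;      # longest dictionary entry, in characters

# Greedy longest match, scanning left to right.
sub forward_match {
    my ($text) = @_;
    my @words;
    my $pos = 0;
    my $n   = length $text;
    while ($pos < $n) {
        my $len = $MAX_LEN < $n - $pos ? $MAX_LEN : $n - $pos;
        # Shrink the window until it is a dictionary word or a single character.
        $len-- while $len > 1 && !exists $freq{ substr($text, $pos, $len) };
        push @words, substr($text, $pos, $len);
        $pos += $len;
    }
    return @words;
}

# Greedy longest match, scanning right to left.
sub backward_match {
    my ($text) = @_;
    my @words;
    my $end = length $text;
    while ($end > 0) {
        my $len = $MAX_LEN < $end ? $MAX_LEN : $end;
        $len-- while $len > 1 && !exists $freq{ substr($text, $end - $len, $len) };
        unshift @words, substr($text, $end - $len, $len);
        $end -= $len;
    }
    return @words;
}

# Product of word frequencies for one candidate segmentation.
sub freq_product {
    my $p = 1;
    $p *= ($freq{$_} // $UNKNOWN_FREQ) for @_;
    return $p;
}

# Keep whichever direction yields the higher frequency product.
sub segment {
    my ($text) = @_;
    my @fwd = forward_match($text);
    my @bwd = backward_match($text);
    return freq_product(@fwd) >= freq_product(@bwd) ? @fwd : @bwd;
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' ', segment('研究生命起源')), "\n";
```

With this toy dictionary the left-to-right pass yields 研究生 / 命 / 起源 and the right-to-left pass yields 研究 / 生命 / 起源. The second candidate has the higher product, so the script prints "研究 生命 起源".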

The original program is from the CPAN module Lingua::ZH::WordSegment (http://search.cpan.org/~chenyr/). I made the following changes: 1) made the interface object-oriented; 2) converted the internal strings to UTF-8; 3) used Sogou’s dictionary (http://www.sogou.com/labs/dl/w.html) as the default dictionary.

METHODS

$segmenter = Lingua::ZH::WordSegmenter->new(%options)

This method constructs a new Lingua::ZH::WordSegmenter object and returns it. Key/value pair arguments may be provided to set up the initial state. The following options are recognized:



   KEY            PURPOSE                       DEFAULT
   -----------    -------------                 --------------------
   dic            filename of the dic           sogou dic
   dic_encoding   encoding of the dic           "gbk"
   seperator      string to separate words      " "
   verbose        show the segment process      0



$segmenter->seg($input,[$encoding])

Segments an input string. The optional second parameter specifies the encoding of the input.

The returned result is encoded in UTF-8.
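As a hedged usage sketch, the constructor options and the optional encoding argument might be combined as follows. The option values and the sample text are illustrative, and the sketch assumes that the second argument to seg() names the encoding of the input bytes and that the result can be passed to encode() as in the SYNOPSIS. The require is wrapped in eval so the script still runs where the module is not installed.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Options from the table above; the values are illustrative.
my %options = (
    dic_encoding => 'gbk',   # documented default
    seperator    => '/',     # key spelled as documented
    verbose      => 0,
);

# Sample input as UTF-8 bytes.
my $input = encode('utf8', '研究生命起源');

# Guarded so the script still runs where the module is not installed.
if (eval { require Lingua::ZH::WordSegmenter; 1 }) {
    my $segmenter = Lingua::ZH::WordSegmenter->new(%options);
    # Assumption: the second argument names the encoding of $input.
    my $result = $segmenter->seg($input, 'utf8');
    print encode('utf8', $result), "\n";
}
else {
    print "Lingua::ZH::WordSegmenter is not installed\n";
}
```

Passing the encoding explicitly avoids relying on whatever default seg() applies when the argument is omitted.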

SEE ALSO

Lingua::ZH::WordSegment

AUTHOR

Zhang Jun, <jzhang533 at gmail.com>

COPYRIGHT & LICENSE

Copyright 2007 Zhang Jun, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


perl v5.20.3 LINGUA::ZH::WORDSEGMENTER (3) 2007-03-29
