Manual Reference Pages  -  LINGUA::HAN::UTILS (3)


NAME

Lingua::Han::Utils - utility tools for Chinese characters (HanZi)

SYNOPSIS



    use Lingua::Han::Utils qw/Unihan_value csplit cdecode csubstr clength/;

    # cdecode
    # the same as decode('cp936', $word) in ASCII editing mode
    #         and decode('utf8', $word) in Unicode editing mode
    my $word = cdecode($word);

    # Unihan_value
    # return the first field of Unihan.txt on unicode.org
    my $word = "我";
    my $unihan = Unihan_value($word); # return 6211
    my $words = "爱你";
    my @unihan = Unihan_value($words); # return (7231, 4F60)
    $unihan = Unihan_value($words); # return 72314F60

    # csplit
    # split the Chinese characters into an array
    my $words = "XXX";
    my @words = csplit($words); # return ("X", "X", "X")

    # csubstr
    # treat each Chinese character as one unit,
    # so it is the same as splice(csplit($words), $offset, $length)
    my $words = "XXXX";
    my @words = csubstr($words, 1, 2); # return ("X", "X")
    @words = csubstr($words, 1); # return ("X", "X", "X")
    $words = csubstr($words, 1, 2); # return "XX"

    # clength
    # treat the Chinese character as one
    my $words = "XXX";
    print clength($words); # 3



EXPORT

Nothing is exported by default.

EXPORT_OK

cdecode uses Encode::Guess to decode the string. It behaves like decode('cp936', $word) under ASCII editing mode and decode('utf8', $word) under Unicode editing mode.
Unihan_value returns the hex digits of each character's code point. The first field of Unihan.txt is the Unicode scalar value written as U+[x]xxxx, and the function returns the [x]xxxx part.
csplit splits a string of Chinese characters into an array; English words can be mixed in.
csubstr(WORD, OFFSET, LENGTH) treats each Chinese character as one unit and takes a substring, like substr.

(BE CAREFUL! It is NOT an lvalue, so csubstr($word, 2, 3) = $REPLACEMENT does not work.)

If no LENGTH is specified, the substring runs from OFFSET to the end.

clength treats each Chinese character as one unit (length 1).
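The character-wise behaviour described above can be approximated with core Perl once the bytes have been decoded. The following is a minimal sketch, not the module's implementation: it assumes the input is already known to be UTF-8 (so no encoding guess is made), and the variable names are purely illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for U+6211 U+7231 U+4F60 followed by ASCII "ok",
# written as escapes so the example does not depend on the file's encoding.
my $bytes = "\xE6\x88\x91\xE7\x88\xB1\xE4\xBD\xA0ok";

# Decode once; afterwards Perl's built-ins count characters, not bytes.
my $text = decode('UTF-8', $bytes);

# Comparable to clength: each Han character counts as length 1.
my $len = length $text;

# Comparable to csplit: one array element per character.
my @chars = split //, $text;

# Comparable to Unihan_value: uppercase hex code point of each Han character.
my @hex = map { sprintf('%X', ord($_)) } @chars[0 .. 2];

# Comparable to csubstr($text, 1, 2): two characters starting at offset 1.
my $middle = substr($text, 1, 2);

# Replacement on a decoded copy via 4-argument substr,
# since csubstr itself cannot be assigned to.
my $copy = $text;
substr($copy, 1, 2, 'AB');

binmode STDOUT, ':encoding(UTF-8)';
print "$len ", join('|', @hex), " $middle $copy\n";
```

The replacement step shows one way around the lvalue restriction: decode the string first, then use the four-argument form of the built-in substr on the decoded copy.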

DOCUMENT

A Chinese version of this document can be found at <http://www.fayland.org/journal/Lingua-Han-Utils.html>

AUTHOR

Fayland Lam, <fayland at gmail.com>

BUGS

Please report any bugs or feature requests to bug-lingua-han-utils at rt.cpan.org, or through the web interface at <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-Han-Utils>. I will be notified, and then you’ll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.



    perldoc Lingua::Han::Utils



You can also look for information at:
o AnnoCPAN: Annotated CPAN documentation

<http://annocpan.org/dist/Lingua-Han-Utils>

o CPAN Ratings

<http://cpanratings.perl.org/d/Lingua-Han-Utils>

o RT: CPAN’s request tracker

<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Lingua-Han-Utils>

o Search CPAN

<http://search.cpan.org/dist/Lingua-Han-Utils>

ACKNOWLEDGEMENTS

the wonderful Encode::Guess

COPYRIGHT & LICENSE

Copyright 2005 Fayland Lam, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.20.3 LINGUA::HAN::UTILS (3) 2014-09-16
