Manual Reference Pages  -  CYRILLIC (3)

NAME

cyrillic - Library for fast and easy Cyrillic text manipulation

SYNOPSIS



 use cyrillic qw/866 win2dos convert locase upcase detect/;

 print convert( 866, 1251, $str );
 print convert( dos,win, \$str );
 print win2dos $str;



DESCRIPTION

This module provides functions for converting Cyrillic strings from one charset to another, and for converting them to upper and lower case without switching the locale. It also includes a detection routine for single-byte charsets. New code pages are easy to add: only the appropriate code page string needs to be added.

Supported charsets:
ibm866, koi8-r, cp855, windows-1251, MacWindows, iso_8859-5, unicode, utf8;

If the first imported parameter is a code page number, the locale is switched to that code page.

FUNCTIONS

o cset_factory - generator of charset-to-charset conversion functions
o case_factory - generator of case conversion functions
o convert - converts between charsets
o upcase - convert to upper case
o locase - convert to lower case
o upfirst - convert first char to upper case
o lofirst - convert first char to lower case
o detect - detect codepage number
o charset - returns charset name for codepage number

Named converters may also be listed in the import list. For example:



 use cyrillic qw/dos2win win2koi mac2dos ibm2dos/;



NOTE! Specialisations (like win2dos, utf2win) are faster to call than convert.

NOTE! Only the convert function and its specialisations work with Unicode and UTF-8 strings. All other functions work only with single-byte charsets.

Names for use in named charset converters:



 dos ibm866       866
 koi koi8-r       20866
 ibm cp855        855
 win windows-1251 1251
 mac ms-cyrillic  10007
 iso iso-8859-5   28585
 uni Unicode
 utf UTF-8
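
The names of the specialised converters in this page (dos2win, win2koi, utf2dos and so on) appear to be built from two short names in the table above joined by "2". The sketch below only illustrates that naming scheme in plain Perl. The helper parse_converter_name is hypothetical and is not part of the cyrillic module.

```perl
use strict;
use warnings;

# Short names and charset names copied from the table above.
my %CHARSET = (
    dos => 'ibm866',
    koi => 'koi8-r',
    ibm => 'cp855',
    win => 'windows-1251',
    mac => 'ms-cyrillic',
    iso => 'iso-8859-5',
    uni => 'Unicode',
    utf => 'UTF-8',
);

# Hypothetical helper (not part of the module): splits a converter name
# such as "koi2win" into its source and destination charset names.
# Returns an empty list when the name does not fit the scheme.
sub parse_converter_name {
    my ($name) = @_;
    my ( $src, $dst ) = $name =~ /^([a-z]{3})2([a-z]{3})$/ or return;
    return unless exists $CHARSET{$src} && exists $CHARSET{$dst};
    return ( $CHARSET{$src}, $CHARSET{$dst} );
}

print join( ' -> ', parse_converter_name('koi2win') ), "\n";
# prints "koi8-r -> windows-1251"
```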



The following rules apply to the converting functions:



 VAR may be a SCALAR or a REF to a SCALAR.
 If VAR is a REF to a SCALAR then that SCALAR is converted in place.
 If VAR is omitted then $_ is operated on.
 If the function is called in void context and VAR is not a REF
 then the result is placed in $_.
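
To make the four rules concrete, here is a minimal sketch in plain Perl of a wrapper that gives any string transform the same calling convention. It is an illustration only and makes no claim about how the module implements these rules. The name make_var_style is hypothetical, and uc stands in for a real charset conversion.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the cyrillic module: wraps a plain
# string-to-string transform so that it follows the VAR rules above.
sub make_var_style {
    my ($transform) = @_;
    return sub {
        my $want = wantarray;    # undef means the call is in void context
        if ( @_ && ref( $_[0] ) eq 'SCALAR' ) {
            # VAR is a REF to a SCALAR: convert the referenced scalar in place.
            ${ $_[0] } = $transform->( ${ $_[0] } );
            return ${ $_[0] };
        }
        # VAR omitted: operate on $_ instead.
        my $result = $transform->( @_ ? $_[0] : $_ );
        # VAR omitted, or void context with a plain scalar: store into $_.
        $_ = $result if !@_ || !defined $want;
        return $result;
    };
}

my $shout = make_var_style( sub { uc $_[0] } );

my $str  = 'abc';
my $copy = $shout->($str);    # $copy is 'ABC', $str is unchanged
$shout->( \$str );            # $str is now 'ABC'
$_ = 'xyz';
$shout->();                   # $_ is now 'XYZ'
$shout->('def');              # void context: $_ is now 'DEF'
```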



CONVERSION METHODS

cset_factory SRC_CP, DST_CP
    Generates a converter function from codepage SRC_CP to codepage DST_CP and returns a reference to it.

Converting Unicode or UTF-8 data requires the Unicode::String and Unicode::Map modules to be installed.
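
The factory idea can be sketched with the core Encode module: build the converter once, then call it many times through its reference. This is not the cyrillic module's implementation. The name my_cset_factory and the mapping of codepage numbers to Encode charset names are assumptions made for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed mapping from codepage numbers to Encode charset names.
my %ENCODE_NAME = (
    866   => 'cp866',
    20866 => 'koi8-r',
    1251  => 'cp1251',
);

# Hypothetical stand-in for cset_factory: returns a closure that
# converts a byte string from one single-byte codepage to another.
sub my_cset_factory {
    my ( $src_cp, $dst_cp ) = @_;
    my $src = $ENCODE_NAME{$src_cp} or die "Unknown codepage $src_cp\n";
    my $dst = $ENCODE_NAME{$dst_cp} or die "Unknown codepage $dst_cp\n";
    return sub {
        my ($bytes) = @_;
        return encode( $dst, decode( $src, $bytes ) );
    };
}

my $dos2win = my_cset_factory( 866, 1251 );
my $dos     = "\x8F\xE0\xA8\xA2\xA5\xE2";    # first word of the EXAMPLES string, in cp866
my $win     = $dos2win->($dos);
print unpack( 'H*', $win ), "\n";            # prints "cff0e8e2e5f2"
```

Looking up the charsets once in the factory, rather than on every call, is the same reason the page gives for calling a generated converter directly instead of going through convert.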

case_factory CODEPAGE, [TO_UP], [ONLY_FIRST_LETTER]
    Generates a case converter function for the single-byte CODEPAGE and returns a reference to it.
convert SRC_CP, DST_CP, [VAR]
    Converts VAR from the SRC_CP codepage to the DST_CP codepage and returns the converted string. Internally calls cset_factory.
upcase CODEPAGE, [VAR]
    Converts VAR to uppercase using the CODEPAGE table and returns the converted string. Internally calls case_factory.
locase CODEPAGE, [VAR]
    Converts VAR to lowercase using the CODEPAGE table and returns the converted string. Internally calls case_factory.
upfirst CODEPAGE, [VAR]
    Converts the first char of VAR to uppercase using the CODEPAGE table and returns the converted string. Internally calls case_factory.
lofirst CODEPAGE, [VAR]
    Converts the first char of VAR to lowercase using the CODEPAGE table and returns the converted string. Internally calls case_factory.
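
Case conversion on a single-byte codepage needs no locale, because it is a fixed byte-to-byte mapping. The sketch below shows the idea for cp866 only, using tr/// on raw bytes. It is an illustration under that assumption, not the module's code. The name cp866_case_factory is hypothetical, and its two flags mirror TO_UP and ONLY_FIRST_LETTER.

```perl
use strict;
use warnings;

# Hypothetical factory (not the module's code): builds a case converter
# for cp866 byte strings. In cp866 the uppercase letters occupy
# 0x80-0x9F, the lowercase letters 0xA0-0xAF and 0xE0-0xEF, and the
# letter "yo" is 0xF0 (upper) and 0xF1 (lower).
sub cp866_case_factory {
    my ( $to_up, $only_first ) = @_;
    my $conv = $to_up
        ? sub { ( my $s = $_[0] ) =~ tr/\xA0-\xAF\xE0-\xEF\xF1/\x80-\x9F\xF0/; $s }
        : sub { ( my $s = $_[0] ) =~ tr/\x80-\x9F\xF0/\xA0-\xAF\xE0-\xEF\xF1/; $s };
    return $conv unless $only_first;
    # Convert only the first byte and leave the rest untouched.
    return sub {
        my ($s) = @_;
        return $s eq '' ? $s : $conv->( substr $s, 0, 1 ) . substr( $s, 1 );
    };
}

my $upcase  = cp866_case_factory(1);
my $lofirst = cp866_case_factory( 0, 1 );

my $dos = "\x8F\xE0\xA8\xA2\xA5\xE2";              # capitalised word in cp866
print unpack( 'H*', $upcase->($dos) ),  "\n";      # prints "8f9088828592"
print unpack( 'H*', $lofirst->($dos) ), "\n";      # prints "afe0a8a2a5e2"
```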

MAINTENANCE METHODS

charset CODEPAGE
    Returns the charset name for CODEPAGE.
detect ARRAY
    Detects the single-byte codepage of the data in ARRAY and returns the codepage number. If the first element of ARRAY is a REF to an array of codepage numbers, detection is restricted to those codepages; otherwise all single-byte codepages are considered. If no codepage is detected, the undefined value is returned.
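
This page does not say how detect decides between codepages. Purely as an illustration of one possible approach, the sketch below decodes the data under each candidate codepage with the core Encode module and picks the candidate that yields the most lowercase Russian letters. The heuristic, the name guess_codepage and the codepage-to-Encode mapping are all assumptions, not the module's algorithm.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed mapping from codepage numbers to Encode charset names.
my %ENCODE_NAME = (
    866   => 'cp866',
    20866 => 'koi8-r',
    1251  => 'cp1251',
);

# Hypothetical heuristic: score each candidate by the number of
# lowercase Russian letters (U+0430..U+044F and U+0451) it produces.
# Returns undef when no candidate produces any such letter.
sub guess_codepage {
    my ( $candidates, @data ) = @_;
    my $bytes = join '', @data;
    my ( $best, $best_score ) = ( undef, 0 );
    for my $cp (@$candidates) {
        my $copy  = $bytes;    # work on a copy so the input stays untouched
        my $text  = decode( $ENCODE_NAME{$cp}, $copy );
        my $score = () = $text =~ /[\x{430}-\x{44F}\x{451}]/g;
        ( $best, $best_score ) = ( $cp, $score ) if $score > $best_score;
    }
    return $best;
}

my $dos = "\x8F\xE0\xA8\xA2\xA5\xE2";    # a Russian word encoded in cp866
print guess_codepage( [ 866, 20866, 1251 ], $dos ), "\n";    # prints "866"
```

A heuristic of this kind can easily misjudge short or mixed-case input, so its result is best treated as a guess.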

EXAMPLES



 use cyrillic qw/convert locase upcase detect dos2win win2dos/;

 $_ = "\x8F\xE0\xA8\xA2\xA5\xE2 \xF0\xA6\x88\xAA\x88!";

 printf "    dos: %s\n", $_;
 upcase 866;
 printf " upcase: %s\n", $_;
 dos2win;
 printf "dos2win: %s\n", $_;
 win2dos;
 printf "win2dos: %s\n", $_;
 locase 866;
 printf " locase: %s\n", $_;
 printf " detect: %s\n", detect $_;

 # detect between 866 and 20866 codepages
 printf " detect: %s\n", detect [866, 20866], $_;


 # CONVERTING TEST:

 use cyrillic qw/utf2dos mac2utf dos2mac win2dos utf2win/;

 $_ = "XXXXX XXXXXX!\n";

 print "UTF-8: $_";
 print "  DOS: ", utf2dos mac2utf dos2mac win2dos utf2win $_;


 # EQUIVALENT CALLS:

 dos2win( $str );        # called in void context -> result placed in $_
 $_ = dos2win( $str );

 dos2win( \$str );       # called with REF to string -> converted in place
 $str = dos2win( $str );

 dos2win();              # called with omitted param -> $_ converted
 dos2win( \$_ );
 $_ = dos2win( $_ );

 my $convert = cset_factory 866, 1251;
 &$convert( $str );            # faster: calls the converter via its reference
   convert( 866, 1251, $str ); # slower: calls the converter through convert


 # EASY SWITCHING OF THE LOCALE CODEPAGE

 use cyrillic qw/866/;   # locale switched to Russian_Russia.866

 use locale;
 print $str =~ /(\w+)/;

 no locale;
 print $str =~ /(\w+)/;



FAQ



 * Q: Why does the module say: Cant create Unicode::Map for koi8-r charset!
   A: Your Unicode::Map module can't find the map file for the koi8-r charset.
      Copy the file koi8-r.map to site/lib/Unicode/Map and add the
      following three lines to the file site/lib/Unicode/Map/registry:

      name:    KOI8-R
      map:     $UnicodeMappings/koi8-r.map
      alias:   csKOI8R

 * Q: Why does perl say: "Undefined subroutine koi2win called"?
   A: The function koi2win is a specialisation of the function convert.
      It is created only when its name is included in the import list.



AUTHOR

Albert MICHEEV <Albert@f80.n5049.z2.fidonet.org>

COPYRIGHT

Copyright (C) 2000, Albert MICHEEV

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.

AVAILABILITY

The latest version of this library is likely to be available from:

http://www.perl.com/CPAN

SEE ALSO

Unicode::String, Unicode::Map.

perl v5.20.3 CYRILLIC (3) 2001-08-17