Manual Reference Pages  -  CYRILLIC (3)



NAME

cyrillic - Library for fast and easy Cyrillic text manipulation



SYNOPSIS

 use cyrillic qw/866 win2dos convert locase upcase detect/;

 print convert( 866, 1251, $str );
 print convert( 'dos', 'win', \$str );
 print win2dos $str;


DESCRIPTION

This module provides functions that convert Cyrillic strings from one charset to another, and to upper or lower case, without switching the locale. It also includes a routine for detecting single-byte charsets. New code pages are easy to add: all that is needed is the corresponding code page string.
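A minimal sketch of the "code page string" idea, assuming that each single-byte code page is described by one string listing the same 66 Cyrillic letters (А-Я, а-я, then Ё and ё) in the same order. The module's real tables may be laid out differently, and the names %CODEPAGE and convert_bytes are hypothetical. With tables like these, conversion simply pairs letters by position, and supporting another code page (here ISO-8859-5, keyed by the number used on this page) takes one more string.

```perl
use strict;
use warnings;

# Hypothetical code page table: each string lists the same 66 letters in the
# same order (А-Я without Ё, а-я without ё, then Ё, ё), encoded in that code page.
my %CODEPAGE = (
    866   => join('', map { chr($_) } 0x80 .. 0xAF, 0xE0 .. 0xEF, 0xF0, 0xF1),
    1251  => join('', map { chr($_) } 0xC0 .. 0xFF, 0xA8, 0xB8),
    28585 => join('', map { chr($_) } 0xB0 .. 0xEF, 0xA1, 0xF1),   # iso-8859-5
);

# Convert a byte string from code page $src to $dst by pairing letters
# position by position. Bytes that are not letters in $src (ASCII,
# punctuation, box drawing) pass through unchanged.
sub convert_bytes {
    my ($src, $dst, $str) = @_;
    die "unknown code page\n" unless $CODEPAGE{$src} && $CODEPAGE{$dst};
    my %map;
    @map{ split //, $CODEPAGE{$src} } = split //, $CODEPAGE{$dst};
    $str =~ s/([\x80-\xFF])/exists $map{$1} ? $map{$1} : $1/ge;
    return $str;
}
```

For example, `convert_bytes(866, 1251, "\x8F\xE0\xA8\xA2\xA5\xE2")` turns the CP866 bytes for "Привет" into the windows-1251 bytes for the same word.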

Supported charsets:
ibm866, koi8-r, cp855, windows-1251, MacWindows, iso_8859-5, unicode, utf8.

If the first item in the import list is a code page number, the locale is switched to that code page.


o cset_factory - generator of converter functions between charsets
o case_factory - generator of case conversion functions
o convert - converts between charsets
o upcase - converts to upper case
o locase - converts to lower case
o upfirst - converts the first character to upper case
o lofirst - converts the first character to lower case
o detect - detects the code page number
o charset - returns the charset name for a code page number
The import list may also include named converters, for example:

 use cyrillic qw/dos2win win2koi mac2dos ibm2dos/;

NOTE! Specializations (such as win2dos and utf2win) are faster to call than convert.

NOTE! Only convert and its specializations work with Unicode and UTF-8 strings. All other functions work only with single-byte charsets.
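One plausible reason a specialization is faster than convert is that its translation table is built once rather than on every call. The sketch below assumes that approach and is not the module's actual code: make_converter (a hypothetical name, loosely modeled on cset_factory) compiles a tr/// once with a string eval and returns a code reference that can then be called repeatedly. The 66-letter tables are the same assumption as in a plain positional mapping (А-Я, а-я, Ё, ё in one fixed order).

```perl
use strict;
use warnings;

# Hypothetical 66-letter tables (А-Я, а-я, Ё, ё in the same order).
my %CODEPAGE = (
    866  => join('', map { chr($_) } 0x80 .. 0xAF, 0xE0 .. 0xEF, 0xF0, 0xF1),
    1251 => join('', map { chr($_) } 0xC0 .. 0xFF, 0xA8, 0xB8),
);

# Spell a byte string as a tr/// list of \xHH escapes.
sub tr_list {
    my ($bytes) = @_;
    return join '', map { sprintf('\\x%02X', ord($_)) } split //, $bytes;
}

# Compile the tr/// once, here, and hand back a code reference.
sub make_converter {
    my ($src, $dst) = @_;
    die "unknown code page\n"
        unless defined $CODEPAGE{$src} && defined $CODEPAGE{$dst};
    my $from = tr_list($CODEPAGE{$src});
    my $to   = tr_list($CODEPAGE{$dst});
    my $code = eval "sub { my \$s = shift; \$s =~ tr/$from/$to/; return \$s }"
        or die "cannot build converter $src to $dst: $@";
    return $code;
}

my $dos2win = make_converter(866, 1251);                # table compiled once
my $text    = $dos2win->("\x8F\xE0\xA8\xA2\xA5\xE2");   # then reused per call
```

The string eval is needed because tr/// builds its table at compile time and does not interpolate variables; generating the code once keeps that cost out of the per-call path.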

Names used in named charset converters (short name, charset, code page number):

 dos ibm866       866
 koi koi8-r       20866
 ibm cp855        855
 win windows-1251 1251
 mac ms-cyrillic  10007
 iso iso-8859-5   28585
 uni Unicode
 utf UTF-8
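For comparison, most of these charsets are also supported by Perl's core Encode module (bundled since Perl 5.8). The sketch below swaps in Encode in place of Unicode::String and Unicode::Map, so it illustrates the naming scheme rather than this module's implementation. The %ENCODING table and convert_named are hypothetical, and uni is left out because this page does not say which Unicode form it denotes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical mapping from the short names above to Encode's encoding names.
my %ENCODING = (
    dos => 'cp866',
    koi => 'koi8-r',
    ibm => 'cp855',
    win => 'cp1251',
    mac => 'MacCyrillic',
    iso => 'iso-8859-5',
    utf => 'UTF-8',
);

# Convert $bytes according to a converter name such as "dos2win":
# decode from the source charset to characters, then encode to the target.
sub convert_named {
    my ($name, $bytes) = @_;
    my ($src, $dst) = $name =~ /^([a-z]+)2([a-z]+)$/
        or die "bad converter name: $name\n";
    for ($src, $dst) { die "unknown charset: $_\n" unless $ENCODING{$_} }
    my $chars = Encode::decode($ENCODING{$src}, $bytes);
    return Encode::encode($ENCODING{$dst}, $chars);
}

my $win = convert_named('dos2win', "\x8F\xE0\xA8\xA2\xA5\xE2");
```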

The following rules apply to the conversion functions:

 VAR may be a SCALAR or a REF to a SCALAR.
 If VAR is a REF to a SCALAR, the SCALAR is converted in place.
 If VAR is omitted, $_ is used.
 If the function is called in void context and VAR is not a REF,
 the result is placed in $_.
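These rules can be expressed as a small wrapper around any string transform. In this sketch, make_var_function is a hypothetical helper, not part of the module: it takes a plain string-to-string function and returns one that follows the VAR rules above (a REF is converted in place, an omitted VAR means $_, and a void-context call stores the result in $_).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a plain string -> string function so that the
# wrapper follows the VAR rules listed above.
sub make_var_function {
    my ($transform) = @_;
    return sub {
        if (!@_) {                              # VAR omitted: work on $_
            $_ = $transform->($_);
            return $_;
        }
        my $var = shift;
        if (ref($var) eq 'SCALAR') {            # REF to SCALAR: convert in place
            $$var = $transform->($$var);
            return $$var;
        }
        my $result = $transform->($var);
        $_ = $result unless defined wantarray;  # void context: store in $_
        return $result;
    };
}

# Usage with a trivial transform (ASCII upper-casing) standing in for a converter.
my $upper = make_var_function(sub { uc $_[0] });
my $str = "abc";
$upper->(\$str);        # $str is now "ABC"
```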


cset_factory SRC_CP, DST_CP
    Generates a converter function from code page SRC_CP to code page DST_CP and returns a reference to it.

    Converting Unicode or UTF-8 data requires the Unicode::String and Unicode::Map modules to be installed.

case_factory CODEPAGE, [TO_UP], [ONLY_FIRST_LETTER]
    Generates a case converter function for the single-byte CODEPAGE and returns a reference to it.
convert SRC_CP, DST_CP, [VAR]
    Converts VAR from code page SRC_CP to code page DST_CP and returns the converted string. Internally calls cset_factory.
upcase CODEPAGE, [VAR]
    Converts VAR to upper case using the CODEPAGE table and returns the converted string. Internally calls case_factory.
locase CODEPAGE, [VAR]
    Converts VAR to lower case using the CODEPAGE table and returns the converted string. Internally calls case_factory.
upfirst CODEPAGE, [VAR]
    Converts the first character of VAR to upper case using the CODEPAGE table and returns the converted string. Internally calls case_factory.
lofirst CODEPAGE, [VAR]
    Converts the first character of VAR to lower case using the CODEPAGE table and returns the converted string. Internally calls case_factory.
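A case converter can be built the same way as a charset converter: a tr/// between the lowercase and uppercase letters of one code page. This is a sketch under assumptions: case_factory_sketch is a hypothetical stand-in for case_factory that mirrors its CODEPAGE, [TO_UP], [ONLY_FIRST_LETTER] arguments, only code pages 866 and 1251 are tabulated, and ASCII letters are included, which the real module may or may not do.

```perl
use strict;
use warnings;

# Hypothetical case tables: lowercase and uppercase letter lists for each
# code page, position by position (ASCII letters included as an assumption).
my %CASE = (
    1251 => [ 'a-z\xE0-\xFF\xB8',          'A-Z\xC0-\xDF\xA8' ],
    866  => [ 'a-z\xA0-\xAF\xE0-\xEF\xF1', 'A-Z\x80-\x9F\xF0' ],
);

# case_factory_sketch(CODEPAGE, [TO_UP], [ONLY_FIRST_LETTER]) returns a code ref.
sub case_factory_sketch {
    my ($cp, $to_up, $only_first) = @_;
    my $pair = $CASE{$cp} or die "unsupported code page: $cp\n";
    my ($from, $to) = $to_up ? @$pair : reverse @$pair;
    my $tr = eval "sub { my \$s = shift; \$s =~ tr/$from/$to/; return \$s }"
        or die $@;
    return $tr unless $only_first;
    # Only the first character is translated; the rest is left untouched.
    return sub {
        my $s = shift;
        return $s if $s eq '';
        return $tr->(substr($s, 0, 1)) . substr($s, 1);
    };
}

my $upcase  = case_factory_sketch(1251, 1);     # like upcase 1251
my $lofirst = case_factory_sketch(866, 0, 1);   # like lofirst 866
```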


charset CODEPAGE
    Returns the charset name for CODEPAGE.
detect ARRAY
    Detects the single-byte code page of the data in ARRAY and returns the code page number. If the first element of ARRAY is a REF to an array of code page numbers, detection is limited to those code pages; otherwise all single-byte code pages are considered. If no code page is detected, returns an undefined value.
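This page does not say how detect makes its decision. One common approach, shown here as an assumption rather than as the module's method, scores each candidate code page by how many bytes of the data correspond to very frequent Russian letters in that code page, and picks the highest score. detect_sketch and %FREQUENT are hypothetical; like detect, the sketch accepts an optional leading REF to a list of code pages and returns undef when no candidate scores at all.

```perl
use strict;
use warnings;

# Hypothetical scoring table: byte values of the six most frequent Russian
# lowercase letters (о, е, а, и, н, т) in each candidate code page.
my %FREQUENT = (
    866   => [0xAE, 0xA5, 0xA0, 0xA8, 0xAD, 0xE2],
    1251  => [0xEE, 0xE5, 0xE0, 0xE8, 0xED, 0xF2],
    20866 => [0xCF, 0xC5, 0xC1, 0xC9, 0xCE, 0xD4],   # koi8-r
);

# detect_sketch([CODEPAGES], DATA...) or detect_sketch(DATA...)
sub detect_sketch {
    my @candidates = ref($_[0]) eq 'ARRAY'
        ? @{ shift() }
        : sort { $a <=> $b } keys %FREQUENT;
    my $data = join '', @_;
    my ($best, $best_score) = (undef, 0);
    for my $cp (@candidates) {
        my %hit   = map { (chr($_) => 1) } @{ $FREQUENT{$cp} || [] };
        my $score = grep { $hit{$_} } split //, $data;
        ($best, $best_score) = ($cp, $score) if $score > $best_score;
    }
    return $best;   # undef when no candidate scored at all
}
```

A heuristic like this needs a reasonable amount of text; on a single short word, several code pages can score close to each other.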


EXAMPLES

 use cyrillic qw/convert locase upcase detect dos2win win2dos/;

 $_ = "\x8F\xE0\xA8\xA2\xA5\xE2 \xF0\xA6\x88\xAA\x88!";

 printf "    dos: %s\n", $_;
 upcase 866;
 printf " upcase: %s\n", $_;
 dos2win;
 printf "dos2win: %s\n", $_;
 win2dos;
 printf "win2dos: %s\n", $_;
 locase 866;
 printf " locase: %s\n", $_;
 printf " detect: %s\n", detect $_;

 # detect between 866 and 20866 codepages
 printf " detect: %s\n", detect [866, 20866], $_;


 use cyrillic qw/utf2dos mac2utf dos2mac win2dos utf2win/;

 $_ = "XXXXX XXXXXX!\n";

 print "UTF-8: $_";
 print "  DOS: ", utf2dos mac2utf dos2mac win2dos utf2win $_;


 dos2win( $str );        # called in void context: result is placed in $_
 $_ = dos2win( $str );   # equivalent

 dos2win( \$str );       # called with a REF to the string: converted in place
 $str = dos2win( $str ); # equivalent

 dos2win();              # parameter omitted: $_ is converted
 dos2win( \$_ );         # equivalent
 $_ = dos2win( $_ );     # equivalent

 my $convert = cset_factory 866, 1251;
 &$convert( $str );           # faster: call the converter through its reference
 convert( 866, 1251, $str );  # slower: convert calls cset_factory internally


 use cyrillic qw/866/;   # locale switched to Russian_Russia.866

 use locale;
 print $str =~ /(\w+)/;

 no locale;
 print $str =~ /(\w+)/;


 * Q: Why does the module say "Can't create Unicode::Map for koi8-r charset!"?
   A: Your Unicode::Map module can't find the map file for the koi8-r charset.
      Copy the map file to site/lib/Unicode/Map and add the following three
      lines to site/lib/Unicode/Map/registry:

      name:    KOI8-R
      map:     $UnicodeMappings/
      alias:   csKOI8R

 * Q: Why does perl say "Undefined subroutine koi2win called"?
   A: The function koi2win is a specialization of the function convert;
      it is created only when its name is included in the import list.
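The answer above implies that named converters are generated at import time. A minimal sketch of that mechanism, assuming a hypothetical package Cyr::Sketch that knows only dos and win: its import method parses each xxx2yyy name, builds a converter, and installs it into the caller's namespace through a glob, so a name that was never imported (such as koi2win) stays undefined. The real module's import logic is not shown on this page.

```perl
use strict;
use warnings;

package Cyr::Sketch;    # hypothetical package, not the real cyrillic module

# Minimal 66-letter tables (А-Я, а-я, Ё, ё) for two charsets only.
my %CP = (
    dos => join('', map { chr($_) } 0x80 .. 0xAF, 0xE0 .. 0xEF, 0xF0, 0xF1),
    win => join('', map { chr($_) } 0xC0 .. 0xFF, 0xA8, 0xB8),
);

# For every requested "xxx2yyy" name, build a converter and install it
# into the calling package under that name.
sub import {
    my ($class, @names) = @_;
    my $caller = caller;
    for my $name (@names) {
        my ($src, $dst) = $name =~ /^(\w+?)2(\w+)$/;
        die "unknown converter: $name\n"
            unless defined $src && $CP{$src} && $CP{$dst};
        my %map;
        @map{ split //, $CP{$src} } = split //, $CP{$dst};
        no strict 'refs';
        *{"${caller}::$name"} = sub {
            (my $s = shift) =~ s/([\x80-\xFF])/exists $map{$1} ? $map{$1} : $1/ge;
            return $s;
        };
    }
}

package main;

# Roughly what "use ... qw/dos2win/" triggers; koi2win is never created.
Cyr::Sketch->import('dos2win');
my $win = dos2win("\x8F\xE0\xA8\xA2\xA5\xE2");
```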


AUTHOR

Albert MICHEEV <>


Copyright (C) 2000, Albert MICHEEV

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.


The latest version of this library is likely to be available from:


SEE ALSO

Unicode::String, Unicode::Map.



perl v5.20.3 CYRILLIC (3) 2001-08-17
