Manual Reference Pages  -  PERLIO::VIA::UNIDECODE (3)

NAME

PerlIO::via::Unidecode - a perlio layer for Unidecode

SYNOPSIS



  # An example program using the perlio layer:

  % cat utf8translit
  #!/usr/bin/perl
  use strict;
  use PerlIO::via::Unidecode;
  foreach my $fs (@ARGV) {
    open( my $IN,
      "<:encoding(utf8):via(Unidecode)", # the layers
      $fs
     ) or die "$fs -> $!\n";
    print while <$IN>;
    close($IN);
  }
  __END__

  # We're feeding it this file, which contains the Chinese
  # characters for Beijing (in UTF-8)

  % od -x home_city.txt
  000000:  E5 8C 97 E4 BA B0 0D 0A

  So:

  % utf8translit home_city.txt
  Bei Jing



DESCRIPTION

PerlIO::via::Unidecode implements a PerlIO::via layer that applies Unidecode (Text::Unidecode) to data passed through it.

You can use PerlIO::via::Unidecode on already-Unicode data, as in the example in the SYNOPSIS; or you can combine it with other layers, as in this little program that converts KOI8-R text into Unicode and then feeds it to Unidecode, which then outputs an ASCII transliteration:



  % cat transkoi8r
  #!/usr/bin/perl
  use strict;
  use PerlIO::via::Unidecode;
  foreach my $filespec (@ARGV) {
    open(          # Three-argument open is always great
      my $IN,
      "<:encoding(koi8-r):via(Unidecode)",  # the layers
      $filespec ) or die $!;

    print while <$IN>;
    close($IN);
  }
  __END__

  % cat fet_koi8r.txt
 
  ëÏÇÄÁ ÞÉÔÁÌÁ ÔÙ ÍÕÞÉÔÅÌØÎÙÅ ÓÔÒÏËÉ,
  çÄÅ ÓÅÒÄÃÁ Ú×ÕÞÎÙÊ ÐÙÌ ÓÉÑÎØÅ ÌØÅÔ ËÒÕÇÏÍ
  é ÓÔÒÁÓÔÉ ÒÏËÏ×ÏÊ ×ÚÄÙÍÁÀÔÓÑ ÐÏÔÏËÉ,-
      îÅ ×ÓÐÏÍÎÉÌÁ ÌØ Ï ÞÅÍ?

  % transkoi8r fet_koi8r.txt

  Koghda chitala ty muchitielnyie stroki,
  Gdie sierdtsa zvuchnyi pyl siianie liet krughom
  I strasti rokovoi vzdymaiutsia potoki,-
      Nie vspomnila l o chiem?



Of course, you could do all this by manually calling Text::Unidecode’s unidecode(...) function on every line you fetch, but that’s just what :via(...) layers do for you automatically.
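For comparison, here is a minimal sketch of the manual approach: decode the file with an :encoding(...) layer and call unidecode() on each line yourself. The helper name transliterate_file is made up for this illustration; only unidecode(), which Text::Unidecode exports by default, comes from the real module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Unidecode;    # exports unidecode() by default

# Hypothetical helper (not part of any module): read a file in the
# given encoding and return its contents transliterated to ASCII.
sub transliterate_file {
    my ($path, $encoding) = @_;
    open(my $in, "<:encoding($encoding)", $path)
        or die "$path -> $!\n";
    my $out = '';
    while (my $line = <$in>) {
        $out .= unidecode($line);   # the step :via(Unidecode) would do for us
    }
    close($in);
    return $out;
}

# Transliterate every file named on the command line.
print transliterate_file($_, 'utf8') for @ARGV;
```

With the layer, the loop body shrinks to a plain print, which is the whole point of the module.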

You can also use :via(Unidecode) as an output layer. In that case, add a dummy :utf8 after it, as below, just to silence the "Wide character in print" warnings that you might otherwise see.



  % cat writebei.pl
  use PerlIO::via::Unidecode;
  open(
    my $OUT,
    ">:via(Unidecode):utf8",  # the layers
    "roman_bei.txt"
   ) or die $!;
  print $OUT "\x{5317}\x{4EB0}\n";
    # those are the Chinese characters for Beijing
  close($OUT);

  % perl writebei.pl
 
  % cat roman_bei.txt
  Bei Jing



FUNCTIONS AND METHODS

This module provides no public functions or methods; everything is done through the via interface. If you want a function, see Text::Unidecode.
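To give an idea of what "the via interface" involves, here is a rough sketch of a PerlIO::via layer. It is not this module's source: the package name UpcaseSketch is invented, and uc() stands in for the real transformation so that the example runs with core Perl only. PUSHED, FILL and WRITE are the callbacks described in the PerlIO::via documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layer package, defined inline for illustration only.
package PerlIO::via::UpcaseSketch;

# Called when the layer is pushed onto a handle; returns the layer object.
sub PUSHED {
    my ($class, $mode, $fh) = @_;
    my $state = '';                 # this sketch keeps no real state
    return bless \$state, $class;
}

# Called on reads: fetch a line from the layer below and transform it.
sub FILL {
    my ($self, $fh) = @_;
    my $line = <$fh>;
    return defined $line ? uc($line) : undef;
}

# Called on writes: transform the buffer, pass it down, and report
# how much was consumed (or -1 on failure).
sub WRITE {
    my ($self, $buf, $fh) = @_;
    return (print {$fh} uc($buf)) ? length($buf) : -1;
}

package main;

# Read a whole file through the sketch layer and return the result.
sub slurp_via_sketch {
    my ($path) = @_;
    open(my $in, "<:via(UpcaseSketch)", $path) or die "$path -> $!\n";
    my $text = join '', <$in>;
    close($in);
    return $text;
}

print slurp_via_sketch($_) for @ARGV;
```

The calling code never touches the layer object directly; naming the layer in the open mode is the only interface, just as with :via(Unidecode).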

TIPS

Don’t forget the use PerlIO::via::Unidecode; line, and be sure to get the case right.

Don’t type Unicode when you mean Unidecode, nor vice versa.

Handy layer-modes to remember:



  <:encoding(utf8):via(Unidecode)
  <:encoding(some-other-encoding):via(Unidecode)
  >:via(Unidecode):utf8



SEE ALSO

Text::Unidecode

PerlIO::via

Encode and Encode::Supported (even though the modes they implement are called as ":encoding(...)").

PerlIO::via::PinyinConvert

perlunitut and perlunicode

<https://en.wikipedia.org/wiki/Afanasy_Fet>

NOTES

Note that if Unidecode’s transliteration of something changes, so will the output of :via(Unidecode). For example, the first word of the text above comes out as Koghda under one version of Unidecode and as Kogda under another.

Thanks to Jarkko Hietaniemi for help with this module and many other things besides.

THE POEM

In the first release of this module, I forgot to give the source of the above Russian text! So here it is:

The Russian text is from a poem by Afanasy Afanasevich Fet (1820-1892). Above I have shown only its first stanza (Koghda chitala...), first in raw KOI8-R, then passed through Unidecode. Here it is in its entirety:



  Когда читала ты мучительные строки,
  Где сердца звучный пыл сиянье льет кругом
  И страсти роковой вздымаются потоки,-
    Не вспомнила ль о чем?

  Я верить не хочу! Когда в степи, как диво,
  В полночной темноте безвременно горя,
  Вдали перед тобой прозрачно и красиво
    Вставала вдруг заря.

  И в эту красоту невольно взор тянуло,
  В тот величавый блеск за темный весь предел,-
  Ужель ничто тебе в то время не шепнуло:
    «Там человек сгорел!»


     - Афанасий Афанасьевич Фет, 15 февраля 1887



Its conventional English title is a translation of the first line, "When you were reading those tormented lines", which I found rather apt for a poem about mangled encodings.

COPYRIGHT AND DISCLAIMER

With the exception of the text of the poem, this is copyright 2003, 2014, Sean M. Burke sburke@cpan.org, all rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The programs and documentation in this dist are distributed in the hope that they will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

AUTHOR

Sean M. Burke sburke@cpan.org


perl v5.20.3 PERLIO::VIA::UNIDECODE (3) 2014-07-27
