Manual Reference Pages  -  MARC::CHARSET (3)

NAME

MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS



    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for utf8
    binmode(STDOUT, ':utf8');

    # print out some marc8 as utf8
    print marc8_to_utf8($marc8_string);



DESCRIPTION

MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single-byte character encoding that predates Unicode and allows you to put non-Roman scripts in MARC bibliographic records. The MARC specifications are available at:



    http://www.loc.gov/marc/specifications/spechome.html



EXPORTS

ignore_errors()

Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.



    my $ignore = MARC::Charset->ignore_errors();
   
    MARC::Charset->ignore_errors(1); # ignore errors
    MARC::Charset->ignore_errors(0); # DO NOT ignore errors



assume_unicode()

Tells MARC::Charset whether or not to assume Unicode when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.



    my $setting = MARC::Charset->assume_unicode();
   
    MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
    MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode



assume_encoding()

Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 characters and characters in another encoding.



    my $setting = MARC::Charset->assume_encoding();
   
    MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
    MARC::Charset->assume_encoding('');      # DO NOT assume any encoding



marc8_to_utf8()

Converts a MARC-8 encoded string to UTF-8.



    my $utf8 = marc8_to_utf8($marc8);



If you’d like to ignore errors, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:



    my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

  or
 
    MARC::Charset->ignore_errors(1);
    my $utf8 = marc8_to_utf8($marc8);



utf8_to_marc8()

Attempts to translate UTF-8 into MARC-8.



    my $marc8 = utf8_to_marc8($utf8);



If you’d like to ignore errors, including characters that can’t be converted to MARC-8, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:



    my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

  or

    MARC::Charset->ignore_errors(1);
    my $marc8 = utf8_to_marc8($utf8);
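
The two conversion functions can be combined into a round trip. The following is a minimal sketch, assuming MARC::Charset is installed and that plain ASCII text maps to itself in both directions. The variable names are illustrative only and do not come from the module.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);

# Plain ASCII is used so the sample does not depend on any
# particular diacritic or escape-sequence mapping (an assumption
# made to keep the sketch simple).
my $original = 'Hello, MARC';

# UTF-8 -> MARC-8 -> UTF-8
my $marc8     = utf8_to_marc8($original);
my $roundtrip = marc8_to_utf8($marc8);

print $roundtrip eq $original
    ? "round trip ok\n"
    : "round trip differs\n";
```

Text containing diacritics or non-Roman scripts may not compare equal byte for byte after a round trip if normalization differs, so a check like this is best treated as a smoke test.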



DEFAULT CHARACTER SETS

If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:



    use MARC::Charset::Constants qw(:all);
    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
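
If the change should only apply to part of a program, Perl's local can confine it to a block. This is a sketch that assumes the two defaults are ordinary package variables, as the $MARC::Charset:: prefix suggests. The $before and $after variables are hypothetical names used only to show that the value is restored.

```perl
use strict;
use warnings;
use MARC::Charset;
use MARC::Charset::Constants qw(:all);

# Remember the default so the restore can be checked afterwards.
my $before = $MARC::Charset::DEFAULT_G0;

{
    # local puts the previous values back when this block exits
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    # ... convert records that default to Arabic here ...
}

my $after = $MARC::Charset::DEFAULT_G0;
print "default G0 restored\n" if $after eq $before;
```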



SEE ALSO

o MARC::Charset::Constants
o MARC::Charset::Table
o MARC::Charset::Code
o MARC::Charset::Compiler
o MARC::Record
o MARC::XML

AUTHOR

Ed Summers (ehs@pobox.com)
perl v5.20.3 MARC::CHARSET (3) 2013-08-14
