MARC::Charset(3) User Contributed Perl Documentation MARC::Charset(3)

MARC::Charset - convert MARC-8 encoded strings to UTF-8

    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';
   
    # prepare STDOUT for utf8
    binmode(STDOUT, ':utf8');

    # print out some marc8 as utf8
    print marc8_to_utf8($marc8_string);

MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single byte character encoding that predates Unicode, and allows you to put non-Roman scripts in MARC bibliographic records. The specification is published by the Library of Congress:

    http://www.loc.gov/marc/specifications/spechome.html

ignore_errors() tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.

    my $ignore = MARC::Charset->ignore_errors();
    
    MARC::Charset->ignore_errors(1); # ignore errors
    MARC::Charset->ignore_errors(0); # DO NOT ignore errors
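
Because the setting is made on the class rather than on an object, it presumably stays in effect for later conversions too. A minimal sketch of one way to limit it to a single call follows; the helper name lenient_marc8_to_utf8 is hypothetical and not part of MARC::Charset.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper: convert one string with errors ignored, then
# put the class-wide setting back the way it was found.
sub lenient_marc8_to_utf8 {
    my ($marc8) = @_;

    my $previous = MARC::Charset->ignore_errors();   # read the current setting
    MARC::Charset->ignore_errors(1);                 # ignore errors for this call

    # Trap any exception so the setting is restored before it propagates.
    my $utf8 = eval { marc8_to_utf8($marc8) };
    my $error = $@;

    MARC::Charset->ignore_errors($previous ? 1 : 0); # restore the old setting
    die $error if $error;

    return $utf8;
}

my $title = lenient_marc8_to_utf8('Walden');
```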

assume_unicode() tells MARC::Charset whether or not to assume Unicode when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.

    my $setting = MARC::Charset->assume_unicode();
    
    MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
    MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode

assume_encoding() tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and characters in another encoding.

    my $setting = MARC::Charset->assume_encoding();
    
    MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
    MARC::Charset->assume_encoding(''); # DO NOT assume any encoding
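
Since the fallback only applies in ignore_errors mode, the two settings are usually made together. The sketch below assumes the encoding name is one that Perl's core Encode module recognizes, which the 'cp850' example suggests but this page does not state; the helper name enable_fallback_encoding is hypothetical.

```perl
use strict;
use warnings;
use Encode ();
use MARC::Charset;

# Hypothetical helper: switch on the fallback encoding, but only if
# Encode knows the name (an assumption about how the name is used).
sub enable_fallback_encoding {
    my ($name) = @_;

    defined Encode::find_encoding($name)
        or die "unknown encoding '$name'\n";

    MARC::Charset->ignore_errors(1);         # the fallback applies only in this mode
    MARC::Charset->assume_encoding($name);   # encoding to assume after an error

    return MARC::Charset->assume_encoding(); # report the current setting
}

enable_fallback_encoding('cp850');
```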

marc8_to_utf8() converts a MARC-8 encoded string to UTF-8.

    my $utf8 = marc8_to_utf8($marc8);

If you'd like to ignore errors pass in a true value as the 2nd parameter or call MARC::Charset->ignore_errors() with a true value:

    my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

  or
  
    MARC::Charset->ignore_errors(1);
    my $utf8 = marc8_to_utf8($marc8);
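
As a rough illustration, the loop below converts a list of strings and prints each result. The sample data is made up, and the sketch assumes that a failed conversion yields an undefined value when errors are not ignored; check how your installed version behaves before relying on that.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Encode output as UTF-8 when printing the converted strings.
binmode(STDOUT, ':encoding(UTF-8)');

# Made-up sample input: plain ASCII is valid MARC-8 under the
# default character sets.
my @fields = ('Moby Dick', 'Walden');

my @converted;
for my $marc8 (@fields) {
    my $utf8 = marc8_to_utf8($marc8);
    if (defined $utf8) {
        push @converted, $utf8;
        print "$utf8\n";
    }
    else {
        # Assumed failure signal: an undefined return value.
        warn "skipping a field that could not be converted\n";
    }
}
```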

utf8_to_marc8() attempts to translate UTF-8 into MARC-8.

    my $marc8 = utf8_to_marc8($utf8);

If you'd like to ignore errors, or characters that can't be converted to MARC-8, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:

    my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

  or
  
    MARC::Charset->ignore_errors(1);
    my $marc8 = utf8_to_marc8($utf8);
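
A quick way to check that both directions agree is a round trip. This sketch uses a made-up ASCII string, which should need no character set switching; it makes no claim about how accented or non-Roman text is normalized along the way.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);

# Made-up sample text: ASCII letters and spaces only.
my $original = 'Leaves of grass';

my $marc8 = utf8_to_marc8($original);   # UTF-8 to MARC-8
my $back  = marc8_to_utf8($marc8);      # and back again

print defined $back && $back eq $original
    ? "round trip preserved the text\n"
    : "round trip changed the text\n";
```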

If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:

    use MARC::Charset::Constants qw(:all);
    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
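
Assigning to these variables changes the defaults for the rest of the program. If only some records need different defaults, Perl's local can confine the change to one block. The sketch below relies on both being package variables, as the fully qualified names above imply; the helper name arabic_marc8_to_utf8 is hypothetical.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper: convert one string with Arabic defaults.
# local restores the previous values when the sub returns,
# even if the conversion dies.
sub arabic_marc8_to_utf8 {
    my ($marc8) = @_;

    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    return marc8_to_utf8($marc8);
}

# Usage, where $field_data holds MARC-8 bytes from an Arabic record:
#   my $utf8 = arabic_marc8_to_utf8($field_data);
```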

  • MARC::Charset::Constants
  • MARC::Charset::Table
  • MARC::Charset::Code
  • MARC::Charset::Compiler
  • MARC::Record
  • MARC::XML

Ed Summers (ehs@pobox.com)
2013-08-14 perl v5.32.1
