utf8::all(3) User Contributed Perl Documentation utf8::all(3)

utf8::all - turn on Unicode - all of it

version 0.024

    use utf8::all;                      # Turn on UTF-8, all of it.
    open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
    print length 'føø bār';             # 7 UTF-8 characters
    my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)

The "use utf8" pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. This also means that you can now use literal Unicode characters as part of strings, variable names, and regular expressions.

"utf8::all" goes further:

  • "charnames" are imported so "\N{...}" sequences can be used to compile Unicode characters based on names.
  • On Perl "v5.11.0" or higher, "use feature 'unicode_strings'" is enabled.
  • "use feature 'fc'" and "use feature 'unicode_eval'" are enabled on Perl "v5.16.0" or higher.
  • Filehandles are opened with UTF-8 encoding turned on by default (including "STDIN", "STDOUT", and "STDERR" when "utf8::all" is used from the "main" package), meaning that they automatically convert UTF-8 octets to characters and vice versa. If you don't want UTF-8 for a particular filehandle, you'll have to call "binmode $filehandle" on it.
  • @ARGV gets converted from UTF-8 octets to Unicode characters (when "utf8::all" is used from the "main" package). This is similar to the behaviour of the "-CA" perl command-line switch (see perlrun).
  • "readdir", "readlink", "readpipe" (including the "qx//" and backtick operators), and "glob" (including the "<>" operator) now all work with and return Unicode characters instead of (UTF-8) octets (again only when "utf8::all" is used from the "main" package).
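To make the octet-versus-character distinction in the list above concrete, here is a minimal sketch that uses only the core Encode module rather than "utf8::all" itself. The variable names and the sample bytes are illustrative assumptions; the sketch only shows what "converted from UTF-8 octets to Unicode characters" means for a value such as an element of @ARGV.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative stand-in for one raw command-line argument: the UTF-8
# octets of "føø bār" as the operating system would hand them to Perl.
my $octets = "f\xC3\xB8\xC3\xB8 b\xC4\x81r";

# Decode the octets into Unicode characters. LEAVE_SRC keeps $octets intact.
my $chars = Encode::decode('UTF-8', $octets,
                           Encode::FB_CROAK | Encode::LEAVE_SRC);

my $octet_len = length $octets;   # counts bytes
my $char_len  = length $chars;    # counts characters

print "octets: $octet_len, characters: $char_len\n";
```

The undecoded value has length 10, while the decoded one has length 7, which is the count a program normally wants when it treats the argument as text.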

The pragma is lexically-scoped, so you can do the following if you had some reason to:

    {
        use utf8::all;
        open my $out, '>', 'outfile';
        my $utf8_str = 'føø bār';
        print length $utf8_str, "\n"; # 7
        print $out $utf8_str;         # out as utf8
    }
    open my $in, '<', 'outfile';      # in as raw
    my $text = do { local $/; <$in>};
    print length $text, "\n";         # 10, not 7!

Instead of lexical scoping, you can also use "no utf8::all" to turn off the effects.

Note that the effect on @ARGV and the "STDIN", "STDOUT", and "STDERR" file handles is always global and cannot be undone!
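The count of 10 in the scoped example above arises because the file is read back without a decoding layer. The following sketch reproduces that with explicit PerlIO layers and core modules only, so it runs without "utf8::all" installed; the temporary file and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Seven characters, written with escapes so this file needs no "use utf8".
my $str = "f\x{F8}\x{F8} b\x{101}r";

# Write through an explicit UTF-8 layer (illustrative temporary file).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} $str;
close $out or die "close failed: $!";

# Read back as raw bytes: length counts octets.
open my $raw, '<:raw', $path or die "open failed: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read back through a UTF-8 layer: length counts characters.
open my $in, '<:encoding(UTF-8)', $path or die "open failed: $!";
my $text = do { local $/; <$in> };
close $in;

print length($bytes), " octets, ", length($text), " characters\n";
```

Inside the scope of "utf8::all" the decoding layer is applied by default, which is why only the read outside the block sees octets.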

As described above, the default behaviour of "utf8::all" is to convert @ARGV and to open the "STDIN", "STDOUT", and "STDERR" file handles with UTF-8 encoding, and override the "readlink" and "readdir" functions and "glob" operators when "utf8::all" is used from the "main" package.

If you want to disable these features even when "utf8::all" is used from the "main" package, add the option "NO-GLOBAL" (or "LEXICAL-ONLY") to the use line. E.g.:

    use utf8::all 'NO-GLOBAL';

If, on the other hand, you want to enable these global effects even when "utf8::all" is used from a package other than "main", use the option "GLOBAL" on the use line:

    use utf8::all 'GLOBAL';

"utf8::all" treats invalid code points (i.e., UTF-8 that does not map to a valid Unicode "character") as a fatal error.

For "glob", "readdir", and "readlink", one can change this behaviour by setting the variable "$utf8::all::UTF8_CHECK".

By default "utf8::all" marks decoding errors as fatal (the default value for this setting is "Encode::FB_CROAK"). If you want, you can change this by setting $utf8::all::UTF8_CHECK. The value "Encode::FB_WARN" reports the encoding errors as warnings, and "Encode::FB_DEFAULT" does not report them at all (Encode silently substitutes a replacement character). Please see Encode for details. Note: "Encode::LEAVE_SRC" is always enforced.

Important: This setting only controls the handling of decoding errors in "glob", "readdir", and "readlink".
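The check values named above belong to the core Encode module, so their effect can be observed directly. This is a minimal sketch that calls Encode itself rather than going through "utf8::all"; the helper "try_decode" and the sample input are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;
use Encode ();

# "\xFF" can never appear in well-formed UTF-8, so this input is malformed.
my $bad = "abc\xFF";

# Hypothetical helper: decode with a given check value, always adding
# LEAVE_SRC so the input is not modified; returns (result, error).
sub try_decode {
    my ($octets, $check) = @_;
    my $result = eval {
        Encode::decode('UTF-8', $octets, $check | Encode::LEAVE_SRC);
    };
    return ($result, $@);
}

my ($croak_result, $croak_error) = try_decode($bad, Encode::FB_CROAK);
my ($default_result)             = try_decode($bad, Encode::FB_DEFAULT);

print "FB_CROAK died\n" if !defined $croak_result;
printf "FB_DEFAULT gave %d characters, last is U+%04X\n",
    length($default_result), ord(substr($default_result, -1));
```

With "utf8::all" loaded, the same constants would be assigned to $utf8::all::UTF8_CHECK, for example "$utf8::all::UTF8_CHECK = Encode::FB_WARN;", and would then apply to "glob", "readdir", and "readlink" as described above.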

If you use autodie, which is a great idea, you need to use at least version 2.12, released on June 26, 2012 <https://metacpan.org/source/PJF/autodie-2.12/Changes#L3>. Otherwise, autodie obliterates the IO layers set by the open pragma. See RT #54777 <https://rt.cpan.org/Ticket/Display.html?id=54777> and GH #7 <https://github.com/doherty/utf8-all/issues/7>.

Please report any bugs or feature requests on the bugtracker website <https://github.com/doherty/utf8-all/issues>.

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8. The "readlink" and "readdir" functions and "glob" operators will therefore not be replaced on these systems.

See also:

  • File::Find::utf8 for fully UTF-8 aware File::Find functions.
  • Cwd::utf8 for fully UTF-8 aware Cwd functions.

Authors:

  • Michael Schwern <mschwern@cpan.org>
  • Mike Doherty <doherty@cpan.org>
  • Hayo Baan <info@hayobaan.com>

This software is copyright (c) 2009 by Michael Schwern <mschwern@cpan.org>; he originated it.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

2018-01-05 perl v5.40.2
