Manual Reference Pages  -  LINGUA::JA::MOJI (3)



Lingua::JA::Moji - Handle many kinds of Japanese characters



Convert various types of Japanese characters into one another.

    use Lingua::JA::Moji qw/kana2romaji romaji2kana/;
    use utf8;
    my $romaji = kana2romaji ('あいうえお');
    # $romaji is now aiueo.
    my $kana = romaji2kana ($romaji);
    # $kana is now アイウエオ.


This module provides methods to convert different written forms of Japanese into one another. It enables conversion between romanized Japanese, hiragana, and katakana. It also includes a number of unusual encodings such as Japanese braille and morse code, as well as conversions between Japanese and Cyrillic and Hangul. It also handles conversion between the Chinese characters (kanji) used before and after the character reforms of 1949, as well as the various bracketed and circled forms of kana and kanji.

All the functions in this module assume the use of Unicode. All input and output strings must be Perl character strings (Perl’s internal UTF-8 format) rather than undecoded bytes; string literals in source code need use utf8.
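Because the functions expect character strings, data read as raw bytes has to be decoded first. The following is a minimal sketch using the core Encode module; the byte string is a hypothetical stand-in for input read from a file or socket opened without an :encoding layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: raw UTF-8 bytes for U+3042 and U+3044 (hiragana a, i),
# as they might arrive from a handle opened without an :encoding layer.
my $bytes = "\xE3\x81\x82\xE3\x81\x84";

# Decode to a Perl character string before passing it to the module.
my $chars = decode('UTF-8', $bytes);

# Six bytes, but only two characters once decoded.
printf "bytes: %d, characters: %d\n", length $bytes, length $chars;

# Encode the result again before writing it to a byte-oriented output.
my $out = encode('UTF-8', $chars);
```

In a real program, opening the input with open (my $fh, '<:encoding(UTF-8)', $path) performs the same decoding as the data is read.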

The module loads its conversion data files on demand, so the more obscure conversions should not add to memory use unless they are actually called.

This module does not handle the conversion of kanji words into kana, or kana into kanji.


These functions convert Japanese letters to and from romanized forms.


Convert kana to romaji.

    use Lingua::JA::Moji 'kana2romaji';

    $romaji = kana2romaji ("うれしいこども");
    # $romaji = uresîkodomo

Convert kana to a romanized form.

An optional second argument, a hash reference, controls the style of conversion.

    use utf8;
    $romaji = kana2romaji ("しんぶん", {style => "hepburn"});
    # $romaji = "shimbun"

The options are
style The style of romanization. The default style of romanization is Nippon-shiki. The user can set the conversion style to hepburn, passport, kunrei, or common. If Hepburn is selected, the use_m option below is set to true and ve_type is set to macron. The common style is the same as the Hepburn style, but it also romanizes combinations of a large kana and a small vowel kana, for example turning ジェット into jetto rather than ignoring the small vowel.

Possible styles are as follows:
none/empty Without a style, the Nippon-shiki romanization <> is used.
hepburn This gives Hepburn romanization <>.
kunrei This is the form of romanization used in children’s education <>.
common This is a modification of the Hepburn system which also converts combinations of a large kana followed by a small vowel kana into the commonest romanized form. For example, ジェット becomes jetto and ウェ becomes we.

use_m If this is true, a syllabic n (ん) which comes before a b or p sound, such as the first n in shinbun (しんぶん, newspaper), is converted into m rather than n.
ve_type The ve_type option controls how long vowels are written. The default is to use circumflexes to represent long vowels.
undef A circumflex is used.
macron A macron is used.
passport Oh is used to write long o vowels, and other long vowels are ignored.
none Long vowels are not indicated.
wapuro Chouon marks become hyphens, and おう becomes ou.

     kana2romaji ("XXXXX", { wo => 1 });

If wo is set to a true value, を becomes wo, otherwise it becomes o.
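The use_m rule described above can be expressed as a single substitution on text that is already romanized. This is a hypothetical helper written for illustration, not a function exported by Lingua::JA::Moji, and it covers only the b and p cases the option describes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::JA::Moji: rewrites a syllabic n
# as m when it comes directly before b or p, as the use_m option describes.
# In romaji, n followed by b or p can only be a syllabic n, because n + b/p
# never starts a kana syllable.
sub apply_use_m {
    my ($romaji) = @_;
    (my $out = $romaji) =~ s/n(?=[bp])/m/g;
    return $out;
}

print apply_use_m('shinbun'), "\n";   # shimbun
print apply_use_m('sanpo'), "\n";     # sampo
print apply_use_m('kanji'), "\n";     # kanji (unchanged)
```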


Convert romaji to kana.

    use Lingua::JA::Moji 'romaji2kana';

    $kana = romaji2kana ('yamaguti');
    # $kana = ヤマグチ

Convert romanized Japanese to katakana. The conversion is very permissive and will attempt to convert any romanization it sees into katakana. It is based on the behaviour of the Microsoft IME (input method editor). To convert romanized Japanese into hiragana, use romaji2hiragana.

An optional second argument to the function contains options in the form of a hash reference:

     $kana = romaji2kana ($romaji, {wapuro => 1});

Use the option wapuro => 1 to write long vowels with the equivalent kana rather than with chouon.

     $kana = romaji2kana ($romaji, {ime => 1});

Use the ime => 1 option to approximate the behaviour of an IME. For example, the input gumma becomes グンマ and the input onnna becomes オンナ. Passport romaji (Ohshimizu) is disallowed if this option is switched on.


Convert romaji to hiragana.

    use Lingua::JA::Moji 'romaji2hiragana';

    $hiragana = romaji2hiragana ('babubo');
    # $hiragana = ばぶぼ

Convert romanized Japanese into hiragana. This takes the same options as romaji2kana. It also switches on the wapuro option, so long vowels are written with kana rather than with chouon.


    use Lingua::JA::Moji 'romaji_styles';

    my @styles = romaji_styles ();
    # Returns a true value
    romaji_styles ("hepburn");
    # Returns the undefined value
    romaji_styles ("frogs");

Given an argument, this returns a true value if it is a known style of romanization.

Without an argument, it returns a list of possible styles, as an array of hash references, with each hash reference containing the short name under the key abbrev and the full name under the key full_name.


    use Lingua::JA::Moji 'is_voiced';

    if (is_voiced (X)) {
         print "X is voiced.\n";
    }

Given a kana or romaji input, is_voiced returns a true value if the sound is a voiced sound like a, za, ga, etc. and the undefined value if not.


    use Lingua::JA::Moji 'is_romaji';

    # The following line returns "undef"
    is_romaji ("abcdefg");
    # The following line returns a defined value
    is_romaji ('loyehye');
    # The following line returns a defined value
    is_romaji ("atarimae");

This detects whether a string of alphabetical characters, which may also include characters with macrons or circumflexes, looks like romanized Japanese. If the test is successful, it returns a true value, and if the test is unsuccessful, it returns a false value. If the string is empty, it returns a false value.

This works by converting the string to kana via romaji2kana and seeing if it converts cleanly or not.
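The idea of testing whether a string converts cleanly can be shown with a toy segmenter. The syllable list here is a small hypothetical subset, not the module's conversion table, so this sketch rejects many strings that is_romaji accepts (loyehye, for example).

```perl
use strict;
use warnings;

# Toy version of the "does it convert cleanly?" test. The syllable list is a
# small hypothetical subset, not the module's table.
my @syllables = qw(
    a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to
    na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo
    ra ri ru re ro wa n
);

# Alternation of all syllables, longest first so longer matches are tried first.
my $syllable = join '|', sort { length $b <=> length $a } @syllables;

# True only if the whole word splits into known syllables; empty input is false.
sub looks_like_romaji {
    my ($word) = @_;
    return 0 unless defined $word && length $word;
    return $word =~ /^(?:$syllable)+$/ ? 1 : 0;
}

printf "%s: %d\n", $_, looks_like_romaji($_) for qw(atarimae abcdefg);
```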


    use Lingua::JA::Moji 'is_romaji_strict';

    # The following line returns "undef"
    is_romaji_strict ("abcdefg");
    # The following line returns "undef"
    is_romaji_strict ('loyehye');
    # The following line returns a defined value
    is_romaji_strict ("atarimae");

Like is_romaji, this returns a true value if a string of alphabetical characters, which may also include characters with macrons or circumflexes, looks like romanized Japanese, and a false value otherwise, including for the empty string.

The test is much stricter than is_romaji: it rejects constructions which may be valid as inputs to an IME but which do not look like Japanese words.


    use Lingua::JA::Moji 'normalize_romaji';

    $normalized = normalize_romaji ('tsumuji');

normalize_romaji converts romanized Japanese to a canonical form, which is based on the Nippon-shiki romanization but does not represent long vowels using a circumflex. In the canonical form, sokuon (っ) characters are converted into the string xtu. If there is kana in the input string, it is also converted to romaji.

normalize_romaji is for comparing two Japanese words which may be represented in different ways, for example in different romanization systems, to see if they refer to the same word despite the difference in writing. It does not provide a standardized or officially-sanctioned form of romanization.


These functions convert one form of kana into another.


Convert hiragana to katakana.

    use Lingua::JA::Moji 'hira2kata';

    $katakana = hira2kata (XXXX);
    # $katakana = XXXX

hira2kata converts hiragana into katakana. The input may be a single string or a list of strings. If the input is a list, it converts each element of the list, and in list context it returns a list of the converted inputs. In scalar context it returns a concatenation of the strings.

    my @katakana = hira2kata (@hiragana);

This does not convert chouon signs.
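Hiragana and katakana occupy parallel Unicode blocks, which is why a hiragana-to-katakana conversion can be a simple transliteration. The sketch below is a hypothetical re-implementation, not the module's code; it mirrors the list and scalar context behaviour described above and, like hira2kata, leaves chouon alone.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical re-implementation for illustration only.
# Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 are parallel ranges
# 0x60 code points apart, so one transliteration converts every kana.
# The chouon mark U+30FC is outside the source range and is left unchanged.
sub hira_to_kata {
    my @out = map {
        my $s = $_;
        $s =~ tr/\x{3041}-\x{3096}/\x{30A1}-\x{30F6}/;
        $s;
    } @_;
    # List context: one result per argument; scalar context: concatenation.
    return wantarray ? @out : join '', @out;
}

# "hiragana" written in hiragana; prints ヒラガナ.
print scalar hira_to_kata("\x{3072}\x{3089}\x{304C}\x{306A}"), "\n";
```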


Convert katakana to hiragana.

    use Lingua::JA::Moji 'kata2hira';

    $hiragana = kata2hira (XXXXX);
    # $hiragana = XXXXX

kata2hira converts full-width katakana into hiragana. If the input is a list, it converts each element of the list, and in list context, returns a list of the converted inputs, otherwise it returns a concatenation of the strings.

    my @hiragana = kata2hira (@katakana);

This function does not convert chouon signs into long vowels. It also does not convert half-width katakana into hiragana.


Convert kana to katakana.

    use Lingua::JA::Moji 'kana2katakana';

This converts any of katakana, halfwidth katakana, circled katakana and hiragana to full width katakana.


    use Lingua::JA::Moji 'kana_to_large';

    $large = kana_to_large (XXXX);
    # $large = XXXX

Convert small-sized kana such as XXX into full-sized kana such as XXX.


    use Lingua::JA::Moji 'nigori_first';

    my @list = (qw/XX XX XX XX/);
    nigori_first (\@list);
    # Now @list = (qw/XX XX XX XX XX XX XX XX/);

Given a reference to a list of kana words, add to the list every version of each word in which the first kana has a dakuten or a handakuten added.
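A minimal sketch of the idea behind nigori_first, assuming it is enough to append a combining dakuten (U+3099) or handakuten (U+309A) to the first kana and let NFC normalization compose it. The helper add_first_nigori is hypothetical and may differ from the module in exactly which words it generates.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper illustrating the idea; not the module's code.
# For each word, try the first kana plus a combining dakuten (U+3099) or
# handakuten (U+309A); keep the result only if NFC composes the pair into a
# single precomposed kana, i.e. the voiced form really exists.
sub add_first_nigori {
    my ($words) = @_;
    my @extra;
    for my $word (@$words) {
        next unless length $word;
        my $first = substr $word, 0, 1;
        my $rest  = substr $word, 1;
        for my $mark ("\x{3099}", "\x{309A}") {
            my $composed = NFC($first . $mark);
            push @extra, $composed . $rest if length $composed == 1;
        }
    }
    # Modify the caller's list in place, as nigori_first does with \@list.
    push @$words, @extra;
    return;
}

my @words = ("\x{304B}\x{3044}", "\x{306F}\x{3044}");   # kai, hai
add_first_nigori(\@words);
# @words now also contains gai, bai and pai.
```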


    use Lingua::JA::Moji 'InHankakuKatakana';

    use utf8;
    if (X =~ /\p{InHankakuKatakana}/) {
        print "X is half-width katakana\n";
    }

InHankakuKatakana is a character class for use in regular expressions with \p which can validate halfwidth katakana.
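Perl lets a module define its own \p{...} classes as subroutines whose names begin with In or Is. This sketch shows that mechanism with a hypothetical property, InHalfwidthKanaSketch; its range U+FF61 to U+FF9F is an assumption and not necessarily the range InHankakuKatakana uses.

```perl
use strict;
use warnings;

# Hypothetical user-defined property, not the module's definition. Perl treats
# a sub whose name starts with "In" or "Is" and returns "start\tend" hex ranges
# as a character class usable with \p{...}. The range U+FF61..U+FF9F
# (halfwidth punctuation and katakana) is an assumption for illustration.
sub InHalfwidthKanaSketch {
    return "FF61\tFF9F\n";
}

my $text = "abc\x{FF76}\x{FF9E}";    # "abc", halfwidth KA, halfwidth dakuten
my @hw = $text =~ /(\p{InHalfwidthKanaSketch})/g;
printf "%d halfwidth characters\n", scalar @hw;   # 2 halfwidth characters
```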


Convert kana to halfwidth katakana.

    use Lingua::JA::Moji 'kana2hw';

    $half_width = kana2hw (XXXXXXXXX);
    # $half_width = XXXXXXXXXX

kana2hw converts hiragana, katakana, and fullwidth Japanese punctuation to halfwidth katakana and halfwidth punctuation. Its function is similar to the Emacs command japanese-hankaku-region. For the opposite function, see hw2katakana. See also katakana2hw for a function which only converts katakana.


Convert halfwidth katakana to katakana.

    use Lingua::JA::Moji 'hw2katakana';

    $full_width = hw2katakana (XXXXXXXXXX);
    # $full_width = XXXXXXXXX

hw2katakana converts halfwidth katakana and halfwidth Japanese punctuation to fullwidth katakana and fullwidth punctuation. Its function is similar to the Emacs command japanese-zenkaku-region. For the opposite function, see kana2hw.


Convert katakana to halfwidth katakana.

    use Lingua::JA::Moji 'katakana2hw';

    $hw = katakana2hw ("XXXXXXXXXX");
    # $hw = XXXXXXXXXX

This converts katakana to halfwidth katakana, leaving hiragana unchanged. See also kana2hw.


    use Lingua::JA::Moji 'is_kana';

This function returns a true value if its argument is a string of kana, or an undefined value if not. The input cannot contain punctuation or chouon.


    use Lingua::JA::Moji 'is_hiragana';

This function returns a true value if its argument is a string of hiragana, and an undefined value if not. The entire string from beginning to end must be hiragana for this to return true. The hiragana cannot include punctuation marks or chouon.


    use Lingua::JA::Moji 'kana_order';

    $kana_order = kana_order ();

Returns an array reference containing an ordering of the kana. This is useful for looping over the kana or sorting.
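Sorting by a fixed kana ordering amounts to looking up each kana's position. In this sketch the array reference is a hypothetical five-element stand-in for the one kana_order returns.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for the array reference returned by kana_order;
# it holds only five katakana (a, i, u, e, o).
my $order = ["\x{30A2}", "\x{30A4}", "\x{30A6}", "\x{30A8}", "\x{30AA}"];

# Record each kana's position once, then sort by position.
my %rank;
@rank{@$order} = (0 .. $#$order);

my @kana   = ("\x{30AA}", "\x{30A2}", "\x{30A8}");     # o, a, e
my @sorted = sort { $rank{$a} <=> $rank{$b} } @kana;    # a, e, o
print "@sorted\n";
```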


    use Lingua::JA::Moji 'katakana2syllable';

    $syllables = katakana2syllable (XXXXXXXXXXXXXXX);

This breaks the given string into syllables. If the string is broken up character by character, it becomes ’X’, ’X’, ’X’, ’X’, ’X’. This breaks the string up into meaningful syllables, so that $syllables becomes ’XX’, ’XX’, ’X’.
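A much simplified sketch of syllable splitting, assuming only two rules: small ya/yu/yo and small vowel kana, and the chouon, attach to the preceding katakana. katakana2syllable may apply further rules (for example to sokuon or syllabic n), and split_syllables is a hypothetical name.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Characters that attach to the preceding katakana in this sketch:
# small a/i/u/e/o (U+30A1, A3, A5, A7, A9), small ya/yu/yo (U+30E3, E5, E7)
# and the chouon (U+30FC).
my $attach = qr/[\x{30A1}\x{30A3}\x{30A5}\x{30A7}\x{30A9}\x{30E3}\x{30E5}\x{30E7}\x{30FC}]/;

# Hypothetical splitter: each syllable is one character plus any attached ones.
sub split_syllables {
    my ($kata) = @_;
    return [ $kata =~ /(.$attach*)/gs ];
}

my $syllables = split_syllables("\x{30AD}\x{30E3}\x{30FC}\x{30C8}");   # キャート
print join(', ', @$syllables), "\n";                                  # キャー, ト
```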


    use Lingua::JA::Moji 'InKana';

    $is_kana = (XXXXX =~ /^\p{InKana}+$/);
    # $is_kana = 1

A character class for use in regular expressions which matches all kana characters. This class catches meaningful combinations of hiragana, katakana, halfwidth katakana, circled katakana, and katakana combined words.

This is a combination of the existing Perl character classes Katakana, InKatakana, and InHiragana, minus unassigned characters, plus the halfwidth katakana prolonged sound mark (U+FF70) <ｰ> (chouon), the halfwidth katakana voiced sound mark (U+FF9E) <ﾞ> (dakuten), and the halfwidth katakana semivoiced sound mark (U+FF9F) <ﾟ> (handakuten), minus ’・’, U+30FB, KATAKANA MIDDLE DOT. It is roughly equivalent to matching any of \p{Katakana}, \p{InKatakana}, \p{InHiragana}, ｰ, ﾞ, or ﾟ, except that unassigned code points in those ranges are not matched and KATAKANA MIDDLE DOT is not matched.
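The description of InKana can be approximated directly with Perl's built-in properties. This is a rough reconstruction for illustration, not the module's definition of InKana, so edge cases may differ.

```perl
use strict;
use warnings;

# Rough approximation of the InKana description, built from Perl's own
# properties. (?=\p{Assigned}) drops unassigned code points and
# (?!\x{30FB}) drops KATAKANA MIDDLE DOT.
my $rough_kana = qr/
    (?=\p{Assigned})
    (?!\x{30FB})
    [\p{Katakana}\p{InKatakana}\p{InHiragana}\x{FF70}\x{FF9E}\x{FF9F}]
/x;

# True if every character of a non-empty string matches the class.
sub all_kana {
    my ($s) = @_;
    return $s =~ /^$rough_kana+$/ ? 1 : 0;
}

print all_kana("\x{3042}\x{30A2}\x{FF71}\x{FF9E}"), "\n";   # 1
print all_kana("\x{30FB}"), "\n";                           # 0
```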


    use Lingua::JA::Moji 'square2katakana';

    $kata = square2katakana (X);
    # $kata = XXX

Convert a square katakana box into its components.


    use Lingua::JA::Moji 'katakana2square';

    $sq = katakana2square (XXXXXXXX);
    # $sq = XXXXXX

Convert katakana into the equivalent square katakana character, if one exists.


Functions for handling wide ASCII.


    use Lingua::JA::Moji 'InWideAscii';

    use utf8;
    if (X =~ /\p{InWideAscii}/) {
        print "X is wide ascii\n";
    }

This is a character class for use with \p which matches wide ASCII characters.


Convert wide ASCII characters to printable ASCII characters.

    use Lingua::JA::Moji 'wide2ascii';

    $ascii = wide2ascii ('ａｂＣＥ０１９');
    # $ascii = abCE019

Convert wide ASCII into ASCII.


Convert printable ASCII characters to wide ASCII characters.

    use Lingua::JA::Moji 'ascii2wide';

    $wide = ascii2wide ('abCE019');
    # $wide = ａｂＣＥ０１９

Convert ASCII into wide ASCII.
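The fullwidth forms of the printable ASCII characters lie at a fixed offset (0xFEE0) from ASCII, so both directions can be written as a transliteration. This sketch is not the module's implementation, and it deliberately leaves out the space character, whose fullwidth counterpart (U+3000) is outside that range.

```perl
use strict;
use warnings;

# Sketch of the fixed-offset mapping, not the module's code:
# U+FF01..U+FF5E correspond one-to-one to ASCII U+0021..U+007E.
# Space (U+0020 / U+3000) is not handled here.
sub wide_to_ascii {
    my ($s) = @_;
    $s =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $s;
}

sub ascii_to_wide {
    my ($s) = @_;
    $s =~ tr/\x{21}-\x{7E}/\x{FF01}-\x{FF5E}/;
    return $s;
}

print wide_to_ascii("\x{FF41}\x{FF42}\x{FF23}\x{FF25}\x{FF10}\x{FF11}\x{FF19}"), "\n";   # abCE019
```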



Convert kana to Japanese morse code (wabun code).

    use Lingua::JA::Moji 'kana2morse';

    $morse = kana2morse ('しょっちゅう');
    # $morse = --.-. -- .--. ..-. -..-- ..-

Convert Japanese kana into Morse code. Japanese Morse code has no way of representing small kana characters, so converting to and then from Morse code turns しょっちゅう into シヨツチユウ.


Convert Japanese morse code (wabun code) to kana.

    use Lingua::JA::Moji 'morse2kana';

    $kana = morse2kana ('--.-. -- .--. ..-. -..-- ..-');
    # $kana = シヨツチユウ

Convert Japanese Morse code into kana. Each Morse code element must be separated by whitespace from the next one.
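Decoding wabun code is a table lookup on whitespace-separated elements. This sketch uses a hand-written table holding only the six kana from the morse2kana example, so it is a hypothetical illustration rather than the module's data; unknown codes become ?.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Partial wabun table: only the six katakana from the example.
my %kana2code = (
    "\x{30B7}" => '--.-.',   # SHI
    "\x{30E8}" => '--',      # YO
    "\x{30C4}" => '.--.',    # TSU
    "\x{30C1}" => '..-.',    # CHI
    "\x{30E6}" => '-..--',   # YU
    "\x{30A6}" => '..-',     # U
);
my %code2kana = reverse %kana2code;

# Elements must be separated by whitespace, as morse2kana requires.
sub morse_to_kana {
    my ($morse) = @_;
    return join '', map { $code2kana{$_} // '?' } split ' ', $morse;
}

print morse_to_kana('--.-. -- .--. ..-. -..-- ..-'), "\n";   # シヨツチユウ
```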


This has not been extensively tested.


Convert kana to Japanese braille.

    use Lingua::JA::Moji 'kana2braille';

This converts kana into the equivalent Japanese braille (tenji) forms.


This has not been extensively tested. This is not an adequate Japanese braille convertor. Creating Japanese braille requires breaking Japanese sentences up into individual words, but this does not attempt to do that. People who are interested in building a Perl braille convertor could start here.


Convert Japanese braille to kana.

    use Lingua::JA::Moji 'braille2kana';

Converts Japanese braille (tenji) into the equivalent katakana.


Convert kana to circled katakana.

    use Lingua::JA::Moji 'kana2circled';

    $circled = kana2circled (XXXXX);
    # $circled = XXXXX

This function converts kana into the circled katakana of Unicode, which have code points from 32D0 to 32FE. See also circled2kana.

There is no circled form of the ン kana, so this is left untouched.


Convert circled katakana to kana.

    use Lingua::JA::Moji 'circled2kana';

    $kana = circled2kana (XXXXX);
    # $kana = XXXXX

This function converts the circled katakana of Unicode into full-width katakana. See also kana2circled.



Convert Modern kanji to Pre-1949 kanji.

    use Lingua::JA::Moji 'new2old_kanji';

    $old = new2old_kanji (XX XXX);
    # $old = XX XXX

Convert new-style (post-1949) kanji (Chinese characters) into old-style (pre-1949) kanji.


The list of characters in this convertor may not contain every pair of old/new kanji.

It will not correctly convert 弁, since this has three different equivalents in the old system (辨, 辯, and 瓣).


Convert Pre-1949 kanji to Modern kanji.

    use Lingua::JA::Moji 'old2new_kanji';

    $new = old2new_kanji (XX);
    # $new = XX

Convert old-style (pre-1949) kanji (Chinese characters) into new-style (post-1949) kanji.


    use Lingua::JA::Moji 'circled2kanji';

    $kanji = circled2kanji (X);
    # $kanji = X

Convert the circled forms of kanji into their uncircled equivalents.


    use Lingua::JA::Moji 'kanji2circled';

    $kanji = kanji2circled (XX);
    # $kanji = XX

Convert the usual forms of kanji into circled equivalents, if they exist. Note that only a limited number of kanji have circled forms.


    use Lingua::JA::Moji 'bracketed2kanji';

    $kanji = bracketed2kanji (X);
    # $kanji = X

Convert bracketed form of kanji into unbracketed form.


    use Lingua::JA::Moji 'kanji2bracketed';

    $kanji = kanji2bracketed (X);
    # $kanji = X

Convert unbracketed form of kanji into bracketed form, if it exists.


This is an experimental cyrillization of kana based on the information in a Wikipedia article, <>. The module author does not know anything about cyrillization of kana, so any assistance in correcting this is very welcome.


Convert kana to the Cyrillic (Russian) alphabet.

    use Lingua::JA::Moji 'kana2cyrillic';

    $cyril = kana2cyrillic (XXXX);
    # $cyril = XXXXXX


Convert the Cyrillic (Russian) alphabet to katakana.

    use Lingua::JA::Moji 'cyrillic2katakana';

    $kana = cyrillic2katakana (XXXXXX);
    # $kana = XXXX



    use Lingua::JA::Moji 'kana2hangul';

    $hangul = kana2hangul (XXXX);
    # $hangul = XXXX

Doesn’t deal with syllabic n
May be incorrect This is based on a list found on the internet at <>. There is currently no proof of correctness.


Other Perl modules on CPAN include

    Japanese kana/romanization

Data::Validate::Japanese This contains four validators for kanji and kana: is_hiragana, which corresponds to is_hiragana in this module, and is_kanji, is_katakana, and is_h_katakana, the last of which is for half-width katakana.
Lingua::JA::Kana This contains convertors for hiragana, half width and full width katakana, and romaji. As of version 0.07 [Aug 06, 2012], the romaji conversion is less complete than this module.
Lingua::JA::Romanize::Japanese Romanization of Japanese. The module also includes romanization of kanji via the kakasi kanji to romaji convertor, and other functions.
Lingua::JA::Romaji::Valid Validate romanized Japanese. This module does the same thing as is_romaji in Lingua::JA::Moji.
Lingua::JA::Hepburn::Passport Passport romanization, which means converting long vowels into OH. This corresponds to kana2romaji in the current module using the passport => 1 option, for example

    $romaji = kana2romaji ("XXX", {style => 'hepburn', passport => 1});

Lingua::JA::Fold Full/half width conversion, collation of Japanese text.
Lingua::JA::Romaji Romaji to kana/kana to romaji conversion.
Lingua::JA::Regular::Unicode This includes hiragana to katakana, full width / half width, and wide ascii conversion. The strange name is due to its being an extension of Lingua::JA::Regular using Unicode-encoded strings.
Lingua::JA::NormalizeText A huge collection of normalization functions for Japanese text. If Lingua::JA::Moji does not have it, Lingua::JA::NormalizeText may do.
Lingua::KO::Munja This is similar to the present module for Korean.

    Kana/kanji conversion

Lingua::JA::Romanize::MeCab Romanization of Japanese language with MeCab
Lingua::JA::Romanize::Japanese Romanization of Japanese language via kakasi.


Parts of this module are covered in the book Perl CPAN Module Guide by Naoki Tomita (in Japanese), ISBN 978-4862671080, published by WEB+DB PRESS plus, April 2011.



    chouon

The long vowel marker, ー, or chōon, is used in Japanese katakana to indicate a lengthened vowel.

    wide ASCII

Wide ASCII, fullwidth ASCII, or zenkaku eisūji (全角英数字) is a legacy of bitmapped fonts which has survived into the present day. Wide ASCII characters were originally special bitmapped font characters created to be the same size as one kanji or kana character. The name for normal ASCII characters in Japanese is hankaku eisūji (半角英数字), literally half width English letters and numerals.

    Halfwidth katakana

Halfwidth katakana, hankaku katakana (半角カタカナ), is a legacy encoding of katakana based on an eight-bit encoding. See <> for full details.


This module exports its functions only on request. To export all the functions in the module,

    use Lingua::JA::Moji ':all';


Ben Bullock, <>


Copyright 2008-2014 Ben Bullock, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


Thanks to Naoki Tomita, David Steinbrunner, and Neil Bowers for fixes.
perl v5.20.3 LINGUA::JA::MOJI (3) 2015-02-12
