Lingua::KO::Hangul::Util - utility functions for Hangul in Unicode
use Lingua::KO::Hangul::Util qw(:all);
decomposeSyllable("\x{AC00}"); # "\x{1100}\x{1161}"
composeSyllable("\x{1100}\x{1161}"); # "\x{AC00}"
decomposeJamo("\x{1101}"); # "\x{1100}\x{1100}"
composeJamo("\x{1100}\x{1100}"); # "\x{1101}"
getHangulName(0xAC00); # "HANGUL SYLLABLE GA"
parseHangulName("HANGUL SYLLABLE GA"); # 0xAC00
A Hangul syllable consists of Hangul jamo (Hangul letters).
Hangul letters are classified into three classes:
CHOSEONG (the initial sound) as a leading consonant (L),
JUNGSEONG (the medial sound) as a vowel (V),
JONGSEONG (the final sound) as a trailing consonant (T).
Any Hangul syllable is a composition of (i) L + V, or (ii) L + V + T.
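The L/V/T structure maps onto code points by simple arithmetic. The following is a minimal sketch in Python, not this module's code, assuming the standard Unicode constants (first syllable U+AC00, 19 leading consonants, 21 vowels, 27 trailing consonants plus "none"). The name syllable_code_point is hypothetical.

```python
# Standard Unicode Hangul constants; the function name is illustrative only.
S_BASE = 0xAC00   # first precomposed syllable, HANGUL SYLLABLE GA
L_COUNT = 19      # leading consonants usable in precomposed syllables
V_COUNT = 21      # vowels
T_COUNT = 28      # 27 trailing consonants plus "no trailing consonant"


def syllable_code_point(l_index, v_index, t_index=0):
    """Return the code point of the syllable built from 0-based jamo indices.

    t_index == 0 means an LV syllable (no trailing consonant).
    """
    if not (0 <= l_index < L_COUNT
            and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


print(hex(syllable_code_point(0, 0)))        # 0xac00
print(hex(syllable_code_point(18, 20, 27)))  # 0xd7a3
```

The product 19 * 21 * 28 = 11172 is the number of precomposed syllables in U+AC00..U+D7A3.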
- "$resultant_string = decomposeSyllable($string)"
- It decomposes a precomposed syllable ("LV" or "LVT") to a sequence of
conjoining jamo ("L + V" or "L + V + T") and returns the result as a
string. Any characters other than Hangul syllables are not affected.
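As an illustration of the decomposition described here, the following Python sketch applies the standard Unicode index arithmetic to every character of a string. It is an assumption about how such a function can be written, not a copy of this module's implementation, and decompose_syllable is a hypothetical name.

```python
# Standard Unicode Hangul constants.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT, S_COUNT = 21, 28, 11172


def decompose_syllable(text):
    """Replace each precomposed syllable with its L + V (+ T) jamo."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
            v_index, t_index = divmod(rest, T_COUNT)
            out.append(chr(L_BASE + l_index))
            out.append(chr(V_BASE + v_index))
            if t_index:  # index 0 means "no trailing consonant"
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)  # anything else passes through unchanged
    return "".join(out)


print(decompose_syllable("\uAE00.") == "\u1100\u1173\u11AF.")  # True
```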
- "$resultant_string = composeSyllable($string)"
- It composes a sequence of conjoining jamo ("L + V" or "L + V + T") to a
precomposed syllable ("LV" or "LVT") if possible, and returns the result
as a string. A syllable "LV" followed by a final jamo "T" is also composed.
Any characters other than Hangul jamo and syllables are not affected.
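The two composition steps (L + V to LV, then LV + T to LVT) can be sketched in Python as follows. This is a minimal illustration under the standard Unicode constants, not this module's code. The name compose_syllable is hypothetical, and only the 19 modern leading consonants, 21 vowels, and 27 trailing consonants are treated as composable.

```python
# Standard Unicode Hangul constants.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT, S_COUNT = 19, 21, 28, 11172


def compose_syllable(text):
    """Compose L + V into LV, and LV + T into LVT, wherever possible."""
    out = []
    for ch in text:
        cp = ord(ch)
        if out:
            last = ord(out[-1])
            # Step 1: leading consonant followed by a vowel.
            l_index, v_index = last - L_BASE, cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # Step 2: LV syllable followed by a trailing consonant.
            s_index, t_index = last - S_BASE, cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)


print(compose_syllable("\u1100\u1173\u11AF.") == "\uAE00.")  # True
```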
- "$resultant_string = decomposeJamo($string)"
- It decomposes a complex jamo to a sequence of simple jamo if possible, and
returns the result as a string. Any characters other than complex jamo are
not affected.
e.g.
CHOSEONG SIOS-PIEUP to CHOSEONG SIOS + PIEUP
JUNGSEONG AE to JUNGSEONG A + I
JUNGSEONG WE to JUNGSEONG U + EO + I
JONGSEONG SSANGSIOS to JONGSEONG SIOS + SIOS
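A table lookup is one plausible way to realize this mapping. The Python sketch below covers only the pairs named in this document (plus CHOSEONG SSANGKIYEOK from the synopsis) and resolves code points by character name. The table, the jamo helper, and decompose_jamo are hypothetical names, and the table is deliberately incomplete.

```python
import unicodedata


def jamo(name):
    """Look up a jamo by the part of its name after 'HANGUL '."""
    return unicodedata.lookup("HANGUL " + name)


# Partial, illustrative table: complex jamo -> sequence of simple jamo.
DECOMPOSITION = {
    jamo("CHOSEONG SSANGKIYEOK"): jamo("CHOSEONG KIYEOK") * 2,
    jamo("CHOSEONG SIOS-PIEUP"): jamo("CHOSEONG SIOS") + jamo("CHOSEONG PIEUP"),
    jamo("JUNGSEONG AE"): jamo("JUNGSEONG A") + jamo("JUNGSEONG I"),
    jamo("JUNGSEONG WE"): (jamo("JUNGSEONG U") + jamo("JUNGSEONG EO")
                           + jamo("JUNGSEONG I")),
    jamo("JONGSEONG SSANGSIOS"): jamo("JONGSEONG SIOS") * 2,
}


def decompose_jamo(text):
    """Replace each complex jamo found in the table; keep everything else."""
    return "".join(DECOMPOSITION.get(ch, ch) for ch in text)


print(decompose_jamo("\u1101") == "\u1100\u1100")  # True
```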
- "$resultant_string = composeJamo($string)"
- It composes a sequence of simple jamo ("L1 + L2", "V1 + V2 + V3", etc.)
to a complex jamo if possible, and returns the result as a string. Any
characters other than simple jamo are not affected.
e.g.
CHOSEONG SIOS + PIEUP to CHOSEONG SIOS-PIEUP
JUNGSEONG A + I to JUNGSEONG AE
JUNGSEONG U + EO + I to JUNGSEONG WE
JONGSEONG SIOS + SIOS to JONGSEONG SSANGSIOS
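Composition of simple jamo can be sketched as a longest-match scan over a table. The Python example below uses only the sequences named in this document (plus KIYEOK + KIYEOK from the synopsis). The greedy longest-match strategy is an assumption, not a statement about this module's internals, and jamo and compose_jamo are hypothetical names.

```python
import unicodedata


def jamo(name):
    """Look up a jamo by the part of its name after 'HANGUL '."""
    return unicodedata.lookup("HANGUL " + name)


# Partial, illustrative table: sequence of simple jamo -> complex jamo.
COMPOSITION = {
    jamo("CHOSEONG KIYEOK") * 2: jamo("CHOSEONG SSANGKIYEOK"),
    jamo("CHOSEONG SIOS") + jamo("CHOSEONG PIEUP"): jamo("CHOSEONG SIOS-PIEUP"),
    jamo("JUNGSEONG A") + jamo("JUNGSEONG I"): jamo("JUNGSEONG AE"),
    jamo("JUNGSEONG U") + jamo("JUNGSEONG EO") + jamo("JUNGSEONG I"):
        jamo("JUNGSEONG WE"),
    jamo("JONGSEONG SIOS") * 2: jamo("JONGSEONG SSANGSIOS"),
}
MAX_LEN = max(len(seq) for seq in COMPOSITION)


def compose_jamo(text):
    """Scan left to right, replacing the longest table match at each position."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_LEN, 1, -1):
            chunk = text[i:i + size]
            if chunk in COMPOSITION:
                out.append(COMPOSITION[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no sequence starts here
            i += 1
    return "".join(out)


print(compose_jamo("\u1100\u1100") == "\u1101")  # True
```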
- "$resultant_string = decomposeFull($string)"
- It decomposes a syllable/complex jamo to a sequence of simple jamo.
Equivalent to
"decomposeJamo(decomposeSyllable($string))".
- "$string_decomposed = decomposeHangul($code_point)"
- "@codepoints = decomposeHangul($code_point)"
- If the specified code point is of a Hangul syllable, it returns a list of
code points (in a list context) or a string (in a scalar context) of its
decomposition.
decomposeHangul(0xAC00) # U+AC00 is HANGUL SYLLABLE GA.
# returns "\x{1100}\x{1161}" or (0x1100, 0x1161);
decomposeHangul(0xAE00) # U+AE00 is HANGUL SYLLABLE GEUL.
# returns "\x{1100}\x{1173}\x{11AF}" or (0x1100, 0x1173, 0x11AF);
Otherwise, returns false (empty string or empty list).
decomposeHangul(0x0041) # outside Hangul syllables
# returns empty string or empty list.
- "$string_composed = composeHangul($src_string)"
- "@code_points_composed = composeHangul($src_string)"
- Any sequence of an initial jamo "L" and a medial jamo "V" is composed to
a syllable "LV"; then any sequence of a syllable "LV" and a final jamo "T"
is composed to a syllable "LVT". Any characters other than Hangul jamo and
syllables are not affected.
composeHangul("\x{1100}\x{1173}\x{11AF}.")
# returns "\x{AE00}." or (0xAE00,0x2E);
- "$code_point_composite = getHangulComposite($code_point_here, $code_point_next)"
- It returns the code point of the composite if both code points,
$code_point_here and $code_point_next, are Hangul and composable.
Otherwise, it returns "undef".
The following functions handle only a precomposed Hangul syllable (from
"U+AC00" to "U+D7A3"), but not a Hangul jamo or other Hangul-related
character.
Names of Hangul syllables have a format of "HANGUL SYLLABLE %s".
- "$name = getHangulName($code_point)"
- If the specified code point is of a Hangul syllable, it returns its name;
otherwise it returns undef.
getHangulName(0xAC00) returns "HANGUL SYLLABLE GA";
getHangulName(0x0041) returns undef.
- "$codepoint = parseHangulName($name)"
- If the specified name is of a Hangul syllable, it returns its code point;
otherwise it returns undef.
parseHangulName("HANGUL SYLLABLE GEUL") returns 0xAE00;
parseHangulName("LATIN SMALL LETTER A") returns undef;
parseHangulName("HANGUL SYLLABLE PERL") returns undef;
# Regrettably, HANGUL SYLLABLE PERL does not exist :-)
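For comparison, the same name lookups can be sketched in Python with the standard unicodedata module, which also derives Hangul syllable names algorithmically. The functions get_hangul_name and parse_hangul_name below are hypothetical stand-ins that mirror the behavior described above. They are not part of this module.

```python
import unicodedata

PREFIX = "HANGUL SYLLABLE "


def get_hangul_name(code_point):
    """Return the name of a precomposed syllable, or None for anything else."""
    if 0xAC00 <= code_point <= 0xD7A3:
        return unicodedata.name(chr(code_point))
    return None


def parse_hangul_name(name):
    """Return the code point for a Hangul syllable name, or None."""
    if not name.startswith(PREFIX):
        return None
    try:
        return ord(unicodedata.lookup(name))
    except KeyError:  # no such character name
        return None


print(get_hangul_name(0xAC00))                         # HANGUL SYLLABLE GA
print(parse_hangul_name("HANGUL SYLLABLE PERL"))       # None
```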
A standard Korean syllable block consists of "L+ V+ T*" (a sequence of one
or more L, one or more V, and zero or more T) according to the conjoining
jamo behavior revised in Unicode 3.2 (cf. UAX #28).
A sequence of "L" followed by "T" does not form a single syllable block
lacking "V"; instead it consists of two nonstandard syllable blocks: one
without "V", and another without "L" and "V".
- "$bool = isStandardForm($string)"
- It returns a boolean indicating whether the string is encoded in the
standard form: true only if the string contains no nonstandard sequence.
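The "L+ V+ T*" rule can be checked with a regular expression. The Python sketch below is one reading of the rule, not this module's implementation. It assumes the jamo ranges listed later in this document (with the fillers U+115F counted as L and U+1160 as V) and uses NFD normalization to split precomposed syllables into jamo first. The name is_standard_form is hypothetical.

```python
import re
import unicodedata

# Jamo classes, assumed from the ranges this document lists as supported.
L = "\u1100-\u1159\u115F"   # leading consonants and choseong filler
V = "\u1160-\u11A2"         # jungseong filler and vowels
T = "\u11A8-\u11F9"         # trailing consonants

JAMO_RUN = re.compile(f"[{L}{V}{T}]+")
STANDARD_RUN = re.compile(f"(?:[{L}]+[{V}]+[{T}]*)+")


def is_standard_form(text):
    """True if every run of jamo is a sequence of L+ V+ T* blocks."""
    # NFD turns precomposed syllables into conjoining jamo, so a single
    # pattern covers both representations.
    decomposed = unicodedata.normalize("NFD", text)
    return all(STANDARD_RUN.fullmatch(run.group())
               for run in JAMO_RUN.finditer(decomposed))


print(is_standard_form("\u1100\u1161\u11A8"))  # True
print(is_standard_form("\u1100\u11A8"))        # False
```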
- "$resultant_string = insertFiller($string)"
- It transforms the string into standard form by inserting fillers into
each nonstandard syllable block and returns the result as a string. A
choseong filler ("Lf", "U+115F") is inserted into a syllable block without
"L". A jungseong filler ("Vf", "U+1160") is inserted into a syllable block
without "V".
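The filler rules can be sketched as a single left-to-right pass. The Python example below is an interpretation of the description, not this module's code. It assumes the jamo ranges listed later in this document, treats a precomposed syllable as starting with L and ending with V or T, and splits "L" followed by "T" into two blocks as described earlier. The names edge_types and insert_filler are hypothetical.

```python
CHOSEONG_FILLER = "\u115F"   # Lf
JUNGSEONG_FILLER = "\u1160"  # Vf


def edge_types(ch):
    """Return (type at the start, type at the end) of one character."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x1159 or cp == 0x115F:
        return ("L", "L")
    if 0x1160 <= cp <= 0x11A2:
        return ("V", "V")
    if 0x11A8 <= cp <= 0x11F9:
        return ("T", "T")
    if 0xAC00 <= cp <= 0xD7A3:
        # A precomposed syllable begins like L and ends like V or T.
        return ("L", "V" if (cp - 0xAC00) % 28 == 0 else "T")
    return (None, None)


def insert_filler(text):
    """Insert Lf/Vf so that every syllable block has the form L+ V+ T*."""
    out = []
    prev = None  # end type of the previous original character
    for ch in text:
        start, end = edge_types(ch)
        if prev == "L" and start not in ("L", "V"):
            out.append(JUNGSEONG_FILLER)        # close a block lacking V
        if start == "V" and prev not in ("L", "V"):
            out.append(CHOSEONG_FILLER)         # block lacking L
        elif start == "T" and prev not in ("V", "T"):
            out.append(CHOSEONG_FILLER + JUNGSEONG_FILLER)  # lacking L and V
        out.append(ch)
        prev = end
    if prev == "L":
        out.append(JUNGSEONG_FILLER)            # string ends on a bare L
    return "".join(out)


print(insert_filler("\u1100\u11A8") == "\u1100\u1160\u115F\u1160\u11A8")  # True
```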
- "$type = getSyllableType($code_point)"
- It returns the Hangul syllable type (cf. HangulSyllableType.txt) for the
specified code point as a string: "L" for leading jamo, "V" for vowel jamo,
"T" for trailing jamo, "LV" for LV syllables, "LVT" for LVT syllables, and
"NA" for other code points (as Not Applicable).
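The classification can be sketched with plain range checks. The Python example below assumes the code point ranges listed as supported in this document (fillers U+115F as "L" and U+1160 as "V") and the standard 28-slot layout of precomposed syllables. The name get_syllable_type is a hypothetical stand-in, not this module's function.

```python
def get_syllable_type(code_point):
    """Classify a code point as L, V, T, LV, LVT, or NA."""
    if 0x1100 <= code_point <= 0x1159 or code_point == 0x115F:
        return "L"
    if 0x1160 <= code_point <= 0x11A2:
        return "V"
    if 0x11A8 <= code_point <= 0x11F9:
        return "T"
    if 0xAC00 <= code_point <= 0xD7A3:
        # Every 28th syllable has no trailing consonant.
        return "LV" if (code_point - 0xAC00) % 28 == 0 else "LVT"
    return "NA"


for cp in (0x1100, 0x1161, 0x11A8, 0xAC00, 0xAC01, 0x0041):
    print(f"U+{cp:04X}", get_syllable_type(cp))
```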
By default:
decomposeHangul
composeHangul
getHangulName
parseHangulName
getHangulComposite
On request:
decomposeSyllable
composeSyllable
decomposeJamo
composeJamo
decomposeFull
isStandardForm
insertFiller
getSyllableType
This module does not support Hangul jamo assigned in Unicode 5.2.0 (2009).
A list of Hangul characters this module supports:
1100..1159 ; 1.1 # [90] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEORINHIEUH
115F..11A2 ; 1.1 # [68] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSANGARAEA
11A8..11F9 ; 1.1 # [82] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
AC00..D7A3 ; 2.0 # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2001, 2003, 2005, SADAHIRO Tomoyuki. Japan. All
rights reserved.
This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
- Unicode Normalization Forms (UAX #15)
- <http://www.unicode.org/reports/tr15/>
- Conjoining Jamo Behavior (revision) in UAX #28
- <http://www.unicode.org/reports/tr28/#3_11_conjoining_jamo_behavior>
- Hangul Syllable Type
- <http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt>
- Jamo Decomposition in Old Unicode
- <http://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt>
- ISO/IEC JTC1/SC22/WG20 N954
- Paper by K. KIM: New canonical decomposition and composition processes for
Hangeul <http://std.dkuug.dk/JTC1/SC22/WG20/docs/N954.PDF>
(summary: <http://std.dkuug.dk/JTC1/SC22/WG20/docs/N953.PDF>)
(cf. <http://std.dkuug.dk/JTC1/SC22/WG20/docs/documents.html>)