mdbGeneral - General Format
The mdatabase_load() function returns the data specified by tags in the form of
a plist if the first tag is neither Mchartable nor Mcharset. The keys of the
returned plist are limited to Minteger, Msymbol, Mtext, and Mplist. The type
of each value is unambiguously determined by its key: if the key is Minteger,
the value is an integer; if the key is Msymbol, the value is a symbol; and so
on.
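To make the key/value relationship concrete, here is a minimal C sketch of a plist as a tagged linked list. It is not the m17n MPlist type, whose internals are not described here: `Key`, `Prop`, and the `prop_*` functions are hypothetical, and an M-text is stood in for by a UTF-8 C string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the keys Minteger, Msymbol, Mtext and Mplist. */
typedef enum { KEY_INTEGER, KEY_SYMBOL, KEY_MTEXT, KEY_PLIST } Key;

/* One property of a plist; properties are chained through NEXT. */
typedef struct Prop {
  Key key;                 /* decides which member of V is valid */
  union {
    long integer;          /* KEY_INTEGER */
    char *name;            /* KEY_SYMBOL: the symbol's name */
    char *text;            /* KEY_MTEXT: UTF-8 bytes standing in for an M-text */
    struct Prop *plist;    /* KEY_PLIST: first property of the nested plist */
  } v;
  struct Prop *next;
} Prop;

static Prop *prop_new(Key key, Prop *next) {
  Prop *p = calloc(1, sizeof *p);
  if (p == NULL)
    abort();               /* a sketch: no graceful out-of-memory handling */
  p->key = key;
  p->next = next;
  return p;
}

static char *copy_string(const char *s) {
  size_t n = strlen(s) + 1;
  char *c = malloc(n);
  if (c == NULL)
    abort();
  return memcpy(c, s, n);
}

/* Each constructor prepends one property to NEXT. */
static Prop *prop_integer(long i, Prop *next) {
  Prop *p = prop_new(KEY_INTEGER, next);
  p->v.integer = i;
  return p;
}

static Prop *prop_symbol(const char *name, Prop *next) {
  Prop *p = prop_new(KEY_SYMBOL, next);
  p->v.name = copy_string(name);
  return p;
}

static Prop *prop_text(const char *utf8, Prop *next) {
  Prop *p = prop_new(KEY_MTEXT, next);
  p->v.text = copy_string(utf8);
  return p;
}

static Prop *prop_plist(Prop *inner, Prop *next) {
  Prop *p = prop_new(KEY_PLIST, next);
  p->v.plist = inner;
  return p;
}

/* Free a plist; the key alone tells what the value owns. */
static void plist_free(Prop *p) {
  while (p != NULL) {
    Prop *next = p->next;
    switch (p->key) {
    case KEY_SYMBOL: free(p->v.name); break;
    case KEY_MTEXT:  free(p->v.text); break;
    case KEY_PLIST:  plist_free(p->v.plist); break;
    case KEY_INTEGER: break;
    }
    free(p);
    p = next;
  }
}
```

Because the constructors prepend, a plist is written inside out: `prop_symbol("abc", prop_integer(123, NULL))` is the two-property plist whose first key is the symbol key. Code that reads a value switches on `key` first, which is exactly the guarantee described above.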
A plist can be written down in several ways. For instance, we can use the
form (K1:V1, K2:V2, ..., Kn:Vn) to represent a plist whose first property key
and value are K1 and V1, whose second key and value are K2 and V2, and so on.
However, we can use a simpler expression here because the types of plists used
in the m17n database are fairly restricted.
Hereafter, we represent a plist with an expression similar to an S-expression.
(In fact, the default database loader of the m17n library is designed to read
data files written in this expression.)
The expression consists of one or more elements. Each element represents a
property, i.e. a single element of a plist. Elements are separated by one or
more whitespace characters, i.e. a space (code 32), a tab (code 9), or a
newline (code 10). Comments begin with a semicolon (;) and extend to the end of
the line.
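A loader has to skip these separators before reading each element. The following C sketch shows one way to do so. `skip_blank` is a hypothetical helper, not part of the m17n API, and it is meant to be called only between elements, since a semicolon inside a quoted M-text is not a comment.

```c
/* Hypothetical helper (not an m17n function): skip the separators that may
   appear between elements, namely spaces (32), tabs (9), newlines (10), and
   comments running from ';' to the end of the line.  Returns a pointer to
   the first character of the next element, or to the terminating '\0'. */
static const char *skip_blank(const char *s) {
  for (;;) {
    if (*s == ' ' || *s == '\t' || *s == '\n') {
      s++;
    } else if (*s == ';') {
      while (*s != '\0' && *s != '\n')
        s++;                    /* the newline itself is skipped next round */
    } else {
      return s;
    }
  }
}
```

For example, `skip_blank("  ; a comment\n\tabc")` returns a pointer to `abc`.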
The key and the value of each property are determined based on the type of the
element as explained below.
An element that matches the regular expression -?[0-9]+ or 0[xX][0-9A-Fa-f]+
represents a property whose key is Minteger. An element matching the former
expression is interpreted as an integer in decimal notation, and one matching
the latter is interpreted as an integer in hexadecimal notation. The value of
the property is the result of interpretation.
For instance, the element 0xA0 represents a property whose value is 160 in
decimal.
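The two integer patterns can be checked with a few lines of C. This is a hedged sketch rather than the m17n loader's code: `parse_integer` is a hypothetical name, it expects one whole element as a NUL-terminated string, and it relies on `strtol`, which clamps out-of-range values to LONG_MIN or LONG_MAX instead of reporting an error.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: if TOK (one whole element) matches -?[0-9]+ or
   0[xX][0-9A-Fa-f]+, store its value in *OUT and return 1; else return 0. */
static int parse_integer(const char *tok, long *out) {
  const char *p;

  if (tok[0] == '0' && (tok[1] == 'x' || tok[1] == 'X')) {
    /* 0[xX][0-9A-Fa-f]+ : hexadecimal notation */
    for (p = tok + 2; isxdigit((unsigned char)*p); p++)
      ;
    if (p == tok + 2 || *p != '\0')
      return 0;                 /* no digits, or trailing garbage */
    *out = strtol(tok + 2, NULL, 16);
    return 1;
  }

  /* -?[0-9]+ : decimal notation */
  p = (tok[0] == '-') ? tok + 1 : tok;
  const char *digits = p;
  while (isdigit((unsigned char)*p))
    p++;
  if (p == digits || *p != '\0')
    return 0;
  *out = strtol(tok, NULL, 10);
  return 1;
}
```

With this sketch, `parse_integer("0xA0", &v)` sets `v` to 160 and `parse_integer("-456", &v)` sets it to -456, while `abc` and a bare `0x` are rejected.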
An element that matches the regular expression [^-0-9(]([^\()]|\.)+ represents a
property whose key is Msymbol. In the element, \t, \n, \r, and \e are replaced
with tab (code 9), newline (code 10), carriage return (code 13), and escape
(code 27), respectively. Any other character following a backslash is taken
as is. The value of the property is the symbol having the resulting string as
its name.
For instance, the element abc\ def represents a property whose value is the
symbol having the name 'abc def'.
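The escape rule for symbols amounts to a single pass over the element. In this C sketch, `unescape_symbol` is a hypothetical name; it writes the decoded name to a caller-supplied buffer, and how a trailing lone backslash should be treated is an assumption (here it is dropped).

```c
/* Hypothetical helper: decode the backslash escapes of a symbol element
   into DST, which needs room for strlen(SRC) + 1 bytes (decoding never
   makes the string longer). */
static void unescape_symbol(const char *src, char *dst) {
  while (*src != '\0') {
    if (*src != '\\') {
      *dst++ = *src++;
      continue;
    }
    src++;                            /* drop the backslash */
    switch (*src) {
    case 't':  *dst++ = '\t'; break;  /* code 9 */
    case 'n':  *dst++ = '\n'; break;  /* code 10 */
    case 'r':  *dst++ = '\r'; break;  /* code 13 */
    case 'e':  *dst++ = 27;   break;  /* code 27, escape */
    case '\0': *dst = '\0'; return;   /* trailing lone backslash: dropped */
    default:   *dst++ = *src; break;  /* any other character is kept as is */
    }
    src++;
  }
  *dst = '\0';
}
```

Applied to the element abc\ def, it yields the name 'abc def'.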
An element that matches the regular expression "([^"]|\")*" represents a
property whose key is Mtext. The backslash escapes explained above also apply
here. Moreover, each part of the element matching the regular expression
\\[xX][0-9A-Fa-f][0-9A-Fa-f], i.e. a backslash followed by x or X and two
hexadecimal digits, is replaced with the byte whose value those two digits
give. After the backslash escapes have been resolved, the byte sequence between
the double quotes is interpreted as a UTF-8 sequence and decoded into an
M-text. This M-text is the value of the property.
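Decoding an Mtext element adds the \xHH rule to the symbol escapes. This is a minimal C sketch, assuming the surrounding double quotes have already been stripped; `unescape_text` and `hex_value` are hypothetical names, and the final UTF-8 to M-text conversion is left out. Because \x00 can produce a NUL byte, the function returns a length instead of NUL-terminating its output.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit character. */
static int hex_value(int c) {
  return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decode the body of an Mtext element (the bytes
   between the double quotes) into DST and return the number of bytes
   written.  DST needs room for strlen(SRC) bytes.  The result is the byte
   sequence that is then decoded as UTF-8 (not shown). */
static size_t unescape_text(const char *src, unsigned char *dst) {
  size_t n = 0;
  while (*src != '\0') {
    if (*src != '\\') {
      dst[n++] = (unsigned char)*src++;
      continue;
    }
    src++;                              /* drop the backslash */
    if ((*src == 'x' || *src == 'X') && isxdigit((unsigned char)src[1])
        && isxdigit((unsigned char)src[2])) {
      /* \xHH: one byte given by two hexadecimal digits */
      dst[n++] = (unsigned char)(hex_value((unsigned char)src[1]) * 16
                                 + hex_value((unsigned char)src[2]));
      src += 3;
      continue;
    }
    switch (*src) {                     /* the same escapes as for symbols */
    case 't':  dst[n++] = '\t'; break;
    case 'n':  dst[n++] = '\n'; break;
    case 'r':  dst[n++] = '\r'; break;
    case 'e':  dst[n++] = 27;   break;
    case '\0': return n;                /* trailing lone backslash: dropped */
    default:   dst[n++] = (unsigned char)*src; break;
    }
    src++;
  }
  return n;
}
```

For example, the body m\"text decodes to the six bytes of m"text, and \xC3\xA9 decodes to the two bytes 0xC3 0xA9, which form the UTF-8 encoding of U+00E9.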
Zero or more elements surrounded by a pair of parentheses represent a property
whose key is Mplist. Whitespace before and after a parenthesis can be omitted.
The value of the property is a plist, which is the result of recursive
interpretation of the elements between the parentheses.
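Because an Mplist element contains elements of its own, a reader naturally handles it by recursion. The sketch below is a simplified recursive-descent reader, not the m17n loader: all function names are hypothetical, and instead of building a plist it appends a one-letter tag per key (I, S, T, P) to a caller-supplied buffer so that the nesting is easy to inspect. It treats an atom starting with a digit or '-' as an integer (the symbol pattern excludes those first characters) and does not decode values or escapes.

```c
#include <ctype.h>
#include <string.h>

/* Skip whitespace (space, tab, newline) and ;-comments between elements. */
static const char *skip_ws(const char *s) {
  for (;;) {
    if (*s == ' ' || *s == '\t' || *s == '\n') {
      s++;
    } else if (*s == ';') {
      while (*s != '\0' && *s != '\n')
        s++;
    } else {
      return s;
    }
  }
}

static const char *read_list(const char *s, char *out, int nested);

/* Read one element at S, append its tag to OUT, and return the position
   just after it, or NULL on malformed input. */
static const char *read_element(const char *s, char *out) {
  if (*s == '(') {                      /* Mplist: recurse into the contents */
    strcat(out, "P(");
    s = read_list(s + 1, out, 1);
    if (s == NULL || *s != ')')
      return NULL;                      /* missing ')' */
    strcat(out, ")");
    return s + 1;
  }
  if (*s == '"') {                      /* Mtext: find the closing quote */
    for (s++; *s != '"'; s++) {
      if (*s == '\0')
        return NULL;                    /* unterminated M-text */
      if (*s == '\\' && s[1] != '\0')
        s++;                            /* skip the escaped character */
    }
    strcat(out, "T");
    return s + 1;
  }
  /* Integer or symbol: runs up to whitespace, a parenthesis, or the end. */
  int is_int = isdigit((unsigned char)*s) || *s == '-';
  while (*s != '\0' && strchr(" \t\n()", *s) == NULL)
    s += (*s == '\\' && s[1] != '\0') ? 2 : 1;
  strcat(out, is_int ? "I" : "S");
  return s;
}

/* Read elements until the end of input (top level) or ')' (NESTED). */
static const char *read_list(const char *s, char *out, int nested) {
  int first = 1;
  for (;;) {
    s = skip_ws(s);
    if (*s == '\0' || *s == ')')
      return (*s == ')' && !nested) ? NULL : s;   /* stray ')' is an error */
    if (!first)
      strcat(out, " ");
    first = 0;
    s = read_element(s, out);
    if (s == NULL)
      return NULL;
  }
}

/* Return 1 on success; OUT must be large enough for the tag string. */
static int describe(const char *src, char *out) {
  out[0] = '\0';
  return read_list(src, out, 0) != NULL;
}
```

For instance, `describe("abc 123 (pqr 0xff)", out)` leaves `S I P(S I)` in `out`, and unbalanced input such as `(abc` is rejected.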
Plist data formats are described in a BNF-like notation. In this notation,
non-terminals are represented by strings of uppercase letters (possibly with
'-' in the middle), and terminals are represented by strings surrounded by
single quotes ('). The special non-terminals INTEGER, SYMBOL, MTEXT, and PLIST
represent an integer, symbol, M-text, and plist property, respectively.
Here is an example of database data that is read into a plist of this simple
format:

    [ INTEGER | SYMBOL | MTEXT | FUNC ] *
    FUNC ::= '(' FUNC-NAME FUNC-ARG * ')'
    FUNC-ARG ::= INTEGER | SYMBOL | MTEXT | '(' FUNC-ARG ')'
For instance, a data file that contains this text matches the above syntax:
    abc 123 (pqr 0xff) "m\"text" (__ ("string" xyz -456))
and is read into this plist:
1st element: key: Msymbol, value: abc
2nd element: key: Minteger, value: 123
3rd element: key: Mplist, value: a plist of these elements:
    1st element: key: Msymbol, value: pqr
    2nd element: key: Minteger, value: 255
4th element: key: Mtext, value: m"text
5th element: key: Mplist, value: a plist of these elements:
    1st element: key: Msymbol, value: __
    2nd element: key: Mplist, value: a plist of these elements:
        1st element: key: Mtext, value: string
        2nd element: key: Msymbol, value: xyz
        3rd element: key: Minteger, value: -456
Copyright (C) 2001 Information-technology Promotion Agency (IPA)
Copyright (C) 2001-2009 National Institute of Advanced Industrial Science and
Technology (AIST)
Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License.