GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages
MROONGA(1) Mroonga MROONGA(1)

mroonga - Mroonga Documentation

For the release history: News

Mroonga is a MySQL storage engine based on Groonga, the full text search engine.

In MySQL 5.1 or later, Pluggable Storage Engine interface is introduced, and we can use custom storage engines easily. So we implement Mroonga, so that we can use Groonga through MySQL.

By using Mroonga, you can use Groonga with SQL.

To support Japanese full text search, Tritonn was developed by embedding Senna, the predecessor of Groonga, in MySQL. Mroogna is its successor.

Since Tritonn was the modified version of MySQL, we need to build it by ourselves or use binary files provided by Tritonn project, thus we cannot use the official binary files provided by MySQL.

On the other hand, Mroonga is an independent program (shared library) using Pluggable Storage Engine interface, and we can dynamically load it on MySQL's official binary. So we can use it more easily than Tritonn.

Comparing to Senna, Groonga has much better throughput in adding or updating index.

Mroonga also has the benefit of this performance improvement.

In Tritonn, we use MyISAM storage engine, thus we have a exclusive table lock by updating data (and index), and it prevents the performance of search.

But in Mroonga, we no longer have this issue, and the performance of search is better especially in frequent data update cases.

Groonga supports not only the full text search, but also the fast geolocation search using index. And MySQL also has the syntax for geolocation search. With Mroonga, you can use Groonga's fast geolocation search by using MySQL's geolocation SQL syntax.

Mroonga stores the data by using Groonga's DB API. And its storage file's format is same as that of the file that is managed by Groonga itself only. Therefore you can share the same Groonga storage like below.
  • Store data through Mroonga (MySQL) and search from Groonga server.
  • Store data through Groonga server and search from Mroonga (MySQL).

And Groonga's storage file can be shared with multi-processes and multi-threads, so that we can invoke several search queries to the same storage file simultaneously.

Mroonga has two running modes.

One is "storage mode", that is the default mode, and we use Groonga for both storing data and searching. With this mode, you can have full benefits of Groonga described above, like fast data update, lock-free full text search and geolocation search. But it does not support transactions.

Another one is "wrapper mode", that adds full text search function on other storage engines like MyISAM or InnoDB. With this mode, you can use Groonga's fast full text search with having the benefits of the storage engine, ex. transaction in InnoDB. But you cannot have benefits from Groonga's read-lock free characteristic. And you might have the performance bottle neck in the storage engine in updating data.

Mroonga supports many of the major platforms.

Refer to Install for supported platforms.

This section describes how to install Mroonga on each environment. There are packages for major platforms. It's recommended that you use package instead of building Mroonga by yourself. But don't worry. There is a document about building Mroonga from source.

We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data.

This section describes how to install Mroonga on Windows. You can install Mroogna by extracting a zip package.

Mroonga binary for Windows is provided with MariaDB binary because some changes are needed for building Mroonga for Windows.

We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data.

CAUTION:
The following MSI files don't work yet. Please use zip files below or help us to creating MSI files that work well.


Download MSI file and execute it. You need to choose a MSI for your environment.

Choose win32 version for 32-bit environment, winx64 version for 64-bit environment:

  • mariadb-10.0.10-with-mroonga-4.02-win32.msi
  • mariadb-10.0.10-with-mroonga-4.02-winx64.msi



Download zip file and extract it. You need to choose a zip for your environment:
MariaDB with Mroonga-12.02



Zip packages are pre-configured for easy to use, so no need to execute INSTALL PLUGIN and CREATE FUNCTION.

Just start mysqld by following command:

> mysqld.exe --defaults-file=.\MY-PREFERRED-INI.ini --console


Each zip package contains ini files (my-small.ini, my-medium.ini, my-large.ini and so on), choose preferred ini file which meets on your demand.

Next connect to MariaDB by following command:

> mysql.exe
MariaDB [(none)]> SHOW ENGINES;
+--------------------+---------+------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                    | Transactions | XA   | Savepoints |
+--------------------+---------+------------------------------------------------------------+--------------+------+------------+
| CSV                | YES     | CSV storage engine                                         | NO           | NO   | NO         |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                         | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables  | NO           | NO   | NO         |
| MyISAM             | YES     | MyISAM storage engine                                      | NO           | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                      | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, and foreign keys | YES          | YES  | YES        |
| Mroonga            | YES     | CJK-ready fulltext search, column store                    | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage                     | NO           | NO   | NO         |
+--------------------+---------+------------------------------------------------------------+--------------+------+------------+
8 rows in set (0.00 sec)


You need to use Mroonga bundled MariaDB source provided by the Mroonga project.

You can find it in https://packages.groonga.org/source/mroonga/. Mroonga bundled MariaDB source has mariadb-${MARIADB_VERSION}-with-mroonga-${MROONGA_VERSION}.zip file name.

You can build the source code with the standard MariaDB build process.

You need to register Mroonga after building MariaDB. Use SQL at ${MARIADB_BUILD_DIR}\storage\mroonga\data\install.sql to register Mroonga.

This section describes how to install Mroonga on macOS. You can install Mroonga by Homebrew.

See mroonga/homebrew-mroonga on GitHub for details.

This section describes how to install Mroonga related deb packages on Debian GNU/Linux. You can install them by apt.

We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data.

Install:

% sudo apt update
% sudo apt install -y -V apt-transport-https
% sudo apt install -y -V wget
% wget https://packages.groonga.org/debian/groonga-apt-source-latest-bullseye.deb
% sudo apt install -y -V ./groonga-apt-source-latest-bullseye.deb
% sudo apt update
% sudo apt install -y -V mariadb-server-10.5-mroonga


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt install -y -V groonga-tokenizer-mecab


Install:

% sudo apt update
% sudo apt install -y -V apt-transport-https
% sudo apt install -y -V wget
% wget https://packages.groonga.org/debian/groonga-apt-source-latest-bullseye.deb
% wget https://repo.mysql.com/mysql-apt-config_0.8.17-1_all.deb
% sudo apt install -y -V ./groonga-apt-source-latest-bullseye.deb
% sudo env DEBIAN_FRONTEND=noninteractive MYSQL_SERVER_VERSION=mysql-8.0 apt install -y ./mysql-apt-config_0.8.17-1_all.deb
% sudo apt update
% sudo apt install -y -V mysql-community-8.0-mroonga


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt install -y -V groonga-tokenizer-mecab


Install:

% sudo apt update
% sudo apt install -y -V apt-transport-https
% sudo apt install -y -V wget
% wget https://packages.groonga.org/debian/groonga-apt-source-latest-buster.deb
% sudo apt install -y -V ./groonga-apt-source-latest-buster.deb
% sudo apt update
% sudo apt install -y -V mariadb-server-10.3-mroonga


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt install -y -V groonga-tokenizer-mecab


Install:

% sudo apt update
% sudo apt install -y -V apt-transport-https
% sudo apt install -y -V wget
% wget https://packages.groonga.org/debian/groonga-apt-source-latest-buster.deb
% wget https://repo.mysql.com/mysql-apt-config_0.8.17-1_all.deb
% sudo apt install -y -V ./groonga-apt-source-latest-buster.deb
% sudo env DEBIAN_FRONTEND=noninteractive MYSQL_SERVER_VERSION=mysql-5.7 apt install -y ./mysql-apt-config_0.8.17-1_all.deb
% sudo apt update
% sudo apt install -y -V mysql-community-5.7-mroonga


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt install -y -V groonga-tokenizer-mecab


Install:

% sudo apt update
% sudo apt install -y -V apt-transport-https
% sudo apt install -y -V wget
% wget https://packages.groonga.org/debian/groonga-apt-source-latest-buster.deb
% wget https://repo.mysql.com/mysql-apt-config_0.8.17-1_all.deb
% sudo apt install -y -V ./groonga-apt-source-latest-buster.deb
% sudo env DEBIAN_FRONTEND=noninteractive MYSQL_SERVER_VERSION=mysql-8.0 apt install -y ./mysql-apt-config_0.8.17-1_all.deb
% sudo apt update
% sudo apt install -y -V mysql-community-8.0-mroonga


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt install -y -V groonga-tokenizer-mecab


This section describes how to install Mroonga related deb packages on Ubuntu. You can install them by apt.

We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data.

The Mroonga APT repository for Ubuntu uses PPA (Personal Package Archive) on Launchpad. You can install Mroonga by APT from the PPA.

Here are supported Ubuntu versions:

  • 18.04 Bionic Beaver
  • 20.04 Focal Fossa
  • 21.10 Impish Indri



Here are Ubuntu versions that supports MySQL:

  • 18.04 Bionic Beaver
  • 20.04 Focal Fossa
  • 21.10 Impish Indri



Here are Ubuntu versions that supports MariaDB:

20.04 Focal Fossa



Enable the universe repository and the security update repository to install Mroonga:

% sudo apt-get install -y -V software-properties-common lsb-release
% sudo add-apt-repository -y universe
% sudo add-apt-repository "deb http://security.ubuntu.com/ubuntu $(lsb_release --short --codename)-security main restricted"


Add the ppa:groonga/ppa PPA to your system:

% sudo add-apt-repository -y ppa:groonga/ppa
% sudo apt-get update


Install Mroonga for MySQL:

% sudo apt-get install -y -V mysql-server-mroonga


Install Mroonga for MariaDB:

% sudo apt-get install -y -V mariadb-server-mroonga


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt-get install -y -V groonga-tokenizer-mecab


This section describes how to install Mroonga related RPM packages on AlmaLinux. You can install them by dnf.

You can use Oracle's MySQL packages version 8.0 on AlmaLinux 8 since Mroonga 11.10 release.

NOTE:

There are already known issues about MySQL 8.0.
  • Wrapper mode Wrapper mode is not supported yet
  • Storage mode Storage mode does not support the following feature.
The feature of relevant to the optimization.




Install:

% sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm
% sudo dnf install -y https://repo.mysql.com/mysql80-community-release-el8.rpm
% sudo dnf install -y groonga-libs
% sudo dnf install --disablerepo=AppStream -y --enablerepo=epel mysql-community-8.0-mroonga
(% sudo systemctl start mysqld)
(% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //'))
(% sudo mysqladmin -u root --password="${tmp_password}" password)


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab


You can use Percona Server packages version 8.0 on AlmaLinux 8 since Mroonga 11.10 release.

Install:

% sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm
% sudo dnf install -y https://repo.percona.com/release/percona-release-latest.noarch.rpm
% sudo dnf install -y --enablerepo=epel percona-server-8.0-mroonga
(% sudo systemctl start mysqld)
(% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //'))
(% sudo mysqladmin -u root --password="${tmp_password}" password)


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.3 on AlmaLinux 8 since Mroonga 11.10 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.3/centos8-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm
% sudo dnf install -y boost-program-options
% sudo dnf install --disablerepo=AppStream -y MariaDB-server
% sudo systemctl start mariadb
% sudo dnf install -y --enablerepo=epel mariadb-10.3-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.4 on AlmaLinux 8 since Mroonga 11.10 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.4/centos8-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm
% sudo dnf install -y boost-program-options
% sudo dnf install --disablerepo=AppStream -y MariaDB-server
% sudo systemctl start mariadb
% sudo dnf install -y --enablerepo=epel mariadb-10.4-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.5 on AlmaLinux 8 since Mroonga 11.10 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.5/centos8-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm
% sudo dnf install -y boost-program-options
% sudo dnf install --disablerepo=AppStream -y MariaDB-server
% sudo systemctl start mariadb
% sudo dnf install -y --enablerepo=epel mariadb-10.5-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.6 on AlmaLinux 8 since Mroonga 11.10 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.6/centos8-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo dnf install -y https://packages.groonga.org/almalinux/8/groonga-release-latest.noarch.rpm
% sudo dnf install -y boost-program-options
% sudo dnf install --disablerepo=AppStream -y MariaDB-server
% sudo systemctl start mariadb
% sudo dnf install -y --enablerepo=epel mariadb-10.6-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo dnf install -y --enablerepo=epel groonga-tokenizer-mecab


This section describes how to install Mroonga related RPM packages on CentOS. You can install them by yum.

We distribute both 32-bit and 64-bit packages but we strongly recommend a 64-bit package for server. You should use a 32-bit package just only for tests or development. You will encounter an out of memory error with a 32-bit package even if you just process medium size data.

You can use Oracle's MySQL packages version 5.7 on CentOS 7 since Mroonga 5.09 release.

Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y https://repo.mysql.com/mysql-community-release-el7.rpm
% sudo yum install -y yum-utils
% sudo yum-config-manager --disable mysql80-community
% sudo yum-config-manager --enable mysql57-community
% sudo yum install -y --enablerepo=epel mysql-community-5.7-mroonga
(% sudo systemctl start mysqld)
(% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //'))
(% sudo mysqladmin -u root --password="${tmp_password}" password)


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use Oracle's MySQL packages version 8.0 on CentOS 7 since Mroonga 9.04 release.

NOTE:

There are already known issues about MySQL 8.0.
  • Wrapper mode Wrapper mode is not supported yet
  • Storage mode JSON data type is not supported yet



Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y https://repo.mysql.com/mysql-community-release-el7.rpm
% sudo yum install -y --enablerepo=epel mysql-community-8.0-mroonga
(% sudo systemctl start mysqld)
(% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //'))
(% sudo mysqladmin -u root --password="${tmp_password}" password)


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use Percona Server packages version 5.7 on CentOS 7 since Mroonga 6.02 release.

Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
% sudo yum install -y --enablerepo=epel percona-server-5.7-mroonga
(% sudo systemctl start mysqld)
(% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //'))
(% sudo mysqladmin -u root --password="${tmp_password}" password)


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use Percona Server packages version 8.0 on CentOS 7 since Mroonga 10.06 release.

Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
% sudo yum install -y --enablerepo=epel percona-server-8.0-mroonga
(% sudo systemctl start mysqld)
(% tmp_password=$(sudo grep 'A temporary password' /var/log/mysqld.log | sed -e 's/^.*: //'))
(% sudo mysqladmin -u root --password="${tmp_password}" password)


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.2 on CentOS 7 since Mroonga 7.06 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.2/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y MariaDB-server
% sudo systemctl start mariadb
% sudo yum install -y --enablerepo=epel mariadb-10.2-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.3 on CentOS 7 since Mroonga 7.11 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.3/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y MariaDB-server
% sudo systemctl start mariadb
% sudo yum install -y --enablerepo=epel mariadb-10.3-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.4 on CentOS 7 since Mroonga 9.07 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.4/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y MariaDB-server
% sudo systemctl start mariadb
% sudo yum install -y --enablerepo=epel mariadb-10.4-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.5 on CentOS 7 since Mroonga 10.06 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.5/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y MariaDB-server
% sudo systemctl start mariadb
% sudo yum install -y --enablerepo=epel mariadb-10.5-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


You can use MariaDB's MariaDB packages version 10.6 on CentOS 7 since Mroonga 11.09 release.

Create /etc/yum.repos.d/MariaDB.repo with the following content:

[mariadb]
name = MariaDB
baseurl = https://yum.mariadb.org/10.6/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1


Install:

% sudo yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm
% sudo yum install -y MariaDB-server
% sudo systemctl start mariadb
% sudo yum install -y --enablerepo=epel mariadb-10.6-mroonga
(% sudo mysqladmin -u root password 'new-password')


If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y --enablerepo=epel groonga-tokenizer-mecab


MariaDB 10.0.x is adopted on Fedora 21. As Mroonga is now bundled with MariaDB since 10.0.15, you can use bundled version of Mroonga.

Note that MariaDB bundled Mroonga version is a bit old.

Mroonga is also available via Docker Hub.

See Quick Start guide to start.

This section describes how to install Mroonga from source code. If there is no package no your environment, you need to build Mroonga from source code.

Mroonga needs some tools, libraries and MySQL for build. You can use MariaDB instead of MySQL.

Here are required tools:
  • wget, curl or Web browser for downloading source archive
  • tar and gzip for extracting source archive
  • shell (many shells such as dash, bash and zsh will work)
  • C compiler and C++ compiler (gcc and g++ are supported but other compilers may work)
  • make (GNU make is supported but other make like BSD make will work)
  • pkg-config for detecting libraries



You must get them ready.

You can use CMake instead of shell but this document doesn't describe about building with CMake.

Here are optional tools:

sudo for installing built Mroonga.



Here are required libraries:
  • Groonga. (If you use package, install development package such as libgroonga-dev for deb or groonga-devel for RPM.)
  • groonga-normalizer-mysql.



Here are optional libraries:

MeCab: Japanese morphological analysis system

NOTE:

If you want to use indexes of tokenizing of each morpheme for full text search, install MeCab before installing Groonga.





Mroonga needs not only installed MySQL but also MySQL source and build directory. You can't use MySQL package. It doesn't provide MySQL source and build directory. You need MySQL source and build directory!

If you use MariaDB instead of MySQL, you need MariaDB source.

Download the latest MySQL 5.6 source code, then build and install it.

SEE ALSO:

Download MySQL Community Server


Here we assume that you use mysql-5.6.21 and its source code is extracted in the following directory:

/usr/local/src/mysql-5.6.21


Then build in the following directory:

/usr/local/build/mysql-5.6.21


Here are command lines to build and install MySQL:

% cd /usr/local/build/mysql-5.6.21
% cmake /usr/local/src/mysql-5.6.21
% make
% sudo make install


And we assume that MySQL is installed in the following directory:

/usr/local/mysql


Mroonga uses GNU build system. So the following is the simplest build steps:

% wget https://packages.groonga.org/source/mroonga/mroonga-6.12.tar.gz
% tar xvzf mroonga-6.12.tar.gz
% cd mroonga-6.12
% ./configure \
    --with-mysql-source=/usr/local/src/mysql-5.6.21 \
    --with-mysql-build=/usr/local/build/mysql-5.6.21 \
    --with-mysql-config=/usr/local/mysql/bin/mysql_config
% make
% sudo make install
% /usr/local/mysql/bin/mysql -u root < /usr/local/share/mroonga/install.sql


You need to specify the following on configure:

  • The location of MySQL source code with --with-mysql-source.
  • The location of MySQL build directory with --with-mysql-build.
  • The path of mysql_config command with --with-mysql-config.



You can confirm Mroonga is installed successfully by SHOW ENGINES SQL. If you can find Mroonga row, Mroonga is installed successfully:

mysql> SHOW ENGINES;
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                    | Transactions | XA   | Savepoints |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| Mroonga    | YES     | Fulltext search, column base                               | NO           | NO   | NO         |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                      | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                         | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance     | NO           | NO   | NO         |
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys | YES          | YES  | YES        |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables  | NO           | NO   | NO         |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
6 rows in set (0.00 sec)


The following describes details about each step.

First, you need to run configure. Here are important configure parameters:

Specifies the location of MySQL source code.

This is required parameter:

% ./configure \
    --with-mysql-source=/usr/local/src/mysql-5.6.21 \
    --with-mysql-config=/usr/local/mysql/bin/mysql_config


Specifies the location where you build MySQL source code.

If you build MySQL in MySQL source code directory, you don't need to specify this parameter. If you build MySQL in other directory, you need to specify this parameter.

Here is an example when you build MySQL in /usr/local/build/mysql-5.6.21:

% ./configure \
    --with-mysql-source=/usr/local/src/mysql-5.6.21 \
    --with-mysql-build=/usr/local/build/mysql-5.6.21 \
    --with-mysql-config=/usr/local/mysql/bin/mysql_config


Specifies the path of mysql_config command.

If mysql_config command can be found by PATH, you don't need to specify this parameter. For example, if mysql_config command exists at /usr/bin/mysql_config, you don't need to specify this parameter:

% ./configure \
    --with-mysql-source=/usr/local/src/mysql-5.6.21


Specifies the default tokenizer for full text. You can custom it in my.cnf.

The default is TokenBigram.

Here is an example to use TokenMecab as the default tokenizer:

% ./configure \
    --with-mysql-source=/usr/local/src/mysql-5.6.21 \
    --with-mysql-config=/usr/local/mysql/bin/mysql_config \
    --with-default-tokenizer=TokenMecab


Specifies the install base directory. Mroonga related files are installed under ${PATH}/ directory except ha_mroonga.so. ha_mroonga.so is a MySQL plugin file. It is installed the plugin directory of MySQL.

The default is /usr/local. In this case, install.sql that is used for installing Mroonga is installed to /usr/local/share/mroonga/install.sql.

Here is an example that installs Mroonga into ~/local for an user use instead of system wide use:

% ./configure \
    --prefix=$HOME/local \
    --with-mysql-source=$HOME/local/src/mysql-5.6.21 \
    --with-mysql-config=$HOME/local/mysql/bin/mysql_config


This is not a configure parameter but we describe it for users who doesn't install Groonga into the standard location.

If Groonga is not installed in the standard location like /usr/lib, you need to specify its location by PKG_CONFIG_PATH. For example, if Groonga is installed with --prefix=$HOME/local, use the following command line:

./configure \
  PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig \
  --with-mysql-source=/usr/local/src/mysql-5.6.21 \
  --with-mysql-config=/usr/local/mysql/bin/mysql_config


configure is succeeded, you can build Mroonga by make:

% make


If you have multi cores CPU, you can make faster by using -j option. If you have 4 cores CPU, it's good for using -j4 option:

% make -j4


If you get some errors by make, please report them to us: How to report a bug

Now, you can install built Mroonga!:

% sudo make install


If you have write permission for ${PREFIX} and the plugin directory of MySQL, you don't need to use sudo. e.g. --prefix=$HOME/local case. In this case, use make install:

% make install


You need to run some SQLs to register Mroonga to MySQL such as INSTALL PLUGIN and CREATE FUNCTION. They are written in ${PREFIX}/share/mroonga/install.sql.

Here is an example when you specify --prefix=$HOME/local to configure:

% mysql -u root < $HOME/local/share/mroonga/install.sql


If you want to remove Mroonga, type below commands:

% mysql < ${PREFIX}/share/mroonga/uninstall.sql
% cd ${MROONGA_BUILD_DIR}
% sudo make uninstall


There is a case that incompatible change is introduced at new release. It is announced by release announce if new release contains such a incompatible change.

Here is the list of recommended way of upgrading Mroonga from old release.

See following URL about upgrade sequence if you use previous version.

If you upgrade prior to 1.20, refer to Release 1.20 - 2012/01/29

If you upgrade from 1.20, refer to Release 2.00 - 2012/02/29

If you upgrade from 2.00 or 2.01, refer to Release 2.02 - 2012/04/29

If you upgrade from 2.00 or later and using multiple column indexes on storage mode, refer to Release 2.03 - 2012/05/29

If you upgrade from 2.04 or later and using SET column or ENUM that has the number of elements < 256 in Storage mode, refer to Release 2.05 - 2012/07/29

If you upgrade from 2.05 or later and using multiple column indexes against VARCHAR or CHAR, refer to Release 2.08 - 2012/10/29

If you upgrade from 2.08 or later and using TIMESTAMP column, please recreate database. If you upgrade from 2.08 or later and using CHAR(N) as primary key, please recreate index. Refer to Release 2.09 - 2012/11/29 for each case.

If you upgrade prior to 5.03 and satisfies following the conditions, refer to Release 5.04 - 2015/06/29 and upgrade schema.

  • Using custom tokenizer in index comment
  • Using mroonga_default_parser as server variable
  • Using index_flags parameter for index column
  • Using type parameter for Groonga's column type in storage mode

If you don't install Mroonga yet, see Install. It describes how to install Mroonga.

Try the following tutorials after you install Mroonga. You will understand how to use Mroonga.

It is better that you check Mroonga installation before you use Mroonga. If Mroonga installation is failed, the SQLs in this tutorial will fail.

The way to start or stop MySQL server is just same as the normal MySQL.

After invoking the MySQL server, connect to it by mysql command. If you set password, you need to add '-p' option.

% mysql -uroot test


By using SHOW ENGINES command, you can check if Mroonga is installed:

SHOW ENGINES;
-- +------------+---------+------------------------------------------------------------+--------------+------+------------+
-- | Engine     | Support | Comment                                                    | Transactions | XA   | Savepoints |
-- +------------+---------+------------------------------------------------------------+--------------+------+------------+
-- | Mroonga    | YES     | Fulltext search, column base                               | NO           | NO   | NO         |
-- | MRG_MYISAM | YES     | Collection of identical MyISAM tables                      | NO           | NO   | NO         |
-- | CSV        | YES     | CSV storage engine                                         | NO           | NO   | NO         |
-- | MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance     | NO           | NO   | NO         |
-- | InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys | YES          | YES  | YES        |
-- | MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables  | NO           | NO   | NO         |
-- +------------+---------+------------------------------------------------------------+--------------+------+------------+
-- 6 rows in set (0.00 sec)


If you see Mroonga storage engine like the above, the installation is well done.

If Mroonga isn't shown, run the following command. If you don't install Mroonga by package, the path of install.sql may be different:

% mysql -uroot test < /usr/share/mroonga/install.sql


If you can't find install.sql, run INSTALL PLUGIN command like the following manually:

INSTALL PLUGIN Mroonga SONAME 'ha_mroonga.so';


Then, check server variable whether correct Mroonga version is installed by SHOW VARIABLES command.

SHOW VARIABLES LIKE 'mroonga_version';


Mroonga has the following two modes.
  • storage mode
  • wrapper mode



With the storage mode, we use Groonga for both of the full text search function and the data storage. Since all functions of storage engine are realised with Groonga, aggregations are fast, that is one of Groonga's advantages, and you can manage the database directly by groonga command.

The structure of the storage mode is the following. You use it instead of existing storage engines like MyISAM or InnoDB

[image: storage mode] [image]


With the wrapper mode, Groonga is used for full text search function only, and another existing storage engine like InnoDB is used for storing data. By using wrapper mode, you combine InnoDB that is well-use as the storage engine and Mroonga that is a proven full text search engine, and you can use it as the stable database having the fast full text search function.

The structure of the wrapper mode is the following. Full text search related operations are done by Mroonga, and other operations are done by existing storage engines like MyISAM, InnoDB etc. Mroonga is located between SQL Handler that processes SQL and an existing storage engine, thus all data goes through Mroonga With this way, full text search indexing etc. are done transparently.

[image: wrapper mode] [image]


Here we explain how to use storage mode of Mroonga
After confirming the installation, let's create a table. The important point is to specify Mroonga by ENGINE = Mroonga:

CREATE TABLE diaries (
  id INT PRIMARY KEY AUTO_INCREMENT,
  content VARCHAR(255),
  FULLTEXT INDEX (content)
) ENGINE = Mroonga DEFAULT CHARSET utf8;
-- Query OK, 0 rows affected (0.10 sec)


We put data by INSERT:

INSERT INTO diaries (content) VALUES ("It'll be fine tomorrow.");
-- Query OK, 1 row affected (0.01 sec)
INSERT INTO diaries (content) VALUES ("It'll rain tomorrow");
-- Query OK, 1 row affected (0.00 sec)


Try full text search:

SELECT * FROM diaries WHERE MATCH(content) AGAINST("+fine" IN BOOLEAN MODE);
-- +----+-----------------------------------------+
-- | id | content                                 |
-- +----+-----------------------------------------+
-- |  1 | It'll be fine tomorrow. |
-- +----+-----------------------------------------+
-- 1 row in set (0.00 sec)


Yes, full text search works.

NOTE:
In version 1.0.0 or before, Mroonga used a special column named _score to get search score. From version 1.0.0, it follows MySQL's standard way to get search score.


We often want to display more relevant results first in full text search. We use search score in such case.

We can get search score by MySQL's standard way [1], i.e. we use MATCH...AGAINST in one of columns in SELECT or ORDER BY.

Let's try:

INSERT INTO diaries (content) VALUES ("It's fine today. It'll be fine tomorrow as well.");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO diaries (content) VALUES ("It's fine today. But it'll rain tomorrow.");
-- Query OK, 1 row affected (0.00 sec)
SELECT *,
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE)
  FROM diaries
 WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE)
 ORDER BY
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC;
-- +----+--------------------------------------------------+---------------------------------------------------+
-- | id | content                                          | MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) |
-- +----+--------------------------------------------------+---------------------------------------------------+
-- |  3 | It's fine today. It'll be fine tomorrow as well. |                                                 2 |
-- |  1 | It'll be fine tomorrow.                          |                                                 1 |
-- |  4 | It's fine today. But it'll rain tomorrow.        |                                                 1 |
-- +----+--------------------------------------------------+---------------------------------------------------+
-- 3 rows in set (0.00 sec)


The result having the search word fine more, i.e. id = 3 message having the higher search score, is displayed first. And you also get search score by using MATCH AGAINST in SELECT phrase.

You can use AS to change the attribute name:

SELECT *,
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) AS score
  FROM diaries
 WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE)
 ORDER BY
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC;
-- +----+--------------------------------------------------+-------+
-- | id | content                                          | score |
-- +----+--------------------------------------------------+-------+
-- |  3 | It's fine today. It'll be fine tomorrow as well. |     2 |
-- |  1 | It'll be fine tomorrow.                          |     1 |
-- |  4 | It's fine today. But it'll rain tomorrow.        |     1 |
-- +----+--------------------------------------------------+-------+
-- 3 rows in set (0.00 sec)


MySQL has the following syntax to specify the parser [2] for full text search:

FULLTEXT INDEX (content) WITH PARSER parser_name


To use this syntax, you need to register all parsers in MySQL beforehand. On the other hand, Groonga can dynamically add a tokenizer, that is a parser in MySQL. So if use this syntax in Mroonga, tokenizers that are added in Groonga dynamically cannot be supported. We think that this limitation decreases the convenience, and we choose our own syntax using COMMENT like the following:

FULLTEXT INDEX (content) COMMENT 'tokenizer "TokenMecab"'


NOTE:

COMMENT in FULLTEXT INDEX is only supported MySQL 5.5 or later. If you use MySQL 5.1, use mroonga_default_tokenizer variable described below.


You can specify one of the following values as the tokenizer. "tokenizer" in Mroonga equals to "parser" in MySQL.

Tokenizer Description
off (Deprecated) It does not tokenize at all. Use off if you want to treat content as is. For example, this value is used for prefix search.
none New in version 8.07: It does not tokenize at all. Use none if you want to treat content as is. For example, this value is used for prefix search.
TokenBigram It tokenises in bigram. But continuous alphabets, numbers or symbols are treated as a token. So there can exist tokes with 3 letters or more. It is to reduce noises. This is the default value.
TokenMecab It tokenises using MeCab. Groonga should be built with MeCab support.
TokenBigramSplitSymbol It tokenises in bigram. Unlike TokenBigram, continuous symbols are not treated as a token, but tokenised in bigram. When you use TokenBigramSplitSymbol instead of TokenBigram, "!?" can match "!?!?!?" in "Is it really!?!?!?". But when you use TokenBigram, only "!?!?!?" can match as well.
TokenBigramSplitSymbolAlpha It tokenise in bigram. In addition to TokenBigramSplitSymbol, continuous alphabets are not treated as a token either, but tokenised in bigram. When you use TokenBigramSplitSymbolAlpha instead of TokenBigram, "real" can match "Is it really?". But when you use TokenBigram, only "really" can match as well.
TokenBigramSplitSymbolAlphaDigit It tokenise in bigram. In addition to TokenBigramSplitSymbolAlpha, continuous numbers are not treated as a token either, but tokenised in bigram. So any kind of characters are treated equally in bigram. When you use TokenBigramSplitSymbolAlphaDigit instead of TokenBigram, "567" can match "090-0123-4567". But when you use TokenBigram, only "4567" can match as well.
TokenBigramIgnoreBlank It tokenise in bigram. Unlike TokenBigram, it ignores white spaces. When you use TokenBigramIgnoreBlank instead of TokenBigram, "みなさん" can match "み な さ ん 注 目". But when you use TokenBigram, only "み な さ ん" can match as well.
TokenBigramIgnoreBlankSplitSymbol It tokenise in bigram. Unlike TokenBigramSplitSymbol, it ignores white spaces. When you use TokenBigramIgnoreBlankSplitSymbol instead of TokenBigramSplitSymbol, "???" can match "! ? ???". But when you use TokenBigramSplitSymbol, only "? ??" can match as well.
TokenBigramIgnoreBlankSplitSymbolAlpha It tokenise in bigram. Unlike TokenBigramSplitSymbolAlpha, it ignores white spaces. When you use TokenBigramIgnoreBlankSplitSymbolAlpha instead of TokenBigramSplitSymbolAlpha, "ama" can match "I am a pen.". But when you use TokenBigramSplitSymbolAlpha, only "am a" can match as well.
TokenBigramIgnoreBlankSplitSymbolAlphaDigit It tokenise in bigram. Unlike TokenBigramSplitSymbolAlphaDigit, it ignores white spaces. When you use TokenBigramIgnoreBlankSplitSymbolAlphaDigit instead of TokenBigramSplitSymbolAlphaDigit, "9001" can match "090 0123 4567". But when you use TokenBigramSplitSymbolAlphaDigit, only "90 01" can match as well.
TokenDelimit It tokenise by splitting with a white space. "movie horror topic" will be tokenised as "movie", "horror", "topic".
TokenDelimitNull It tokenise by splitting with a null character (\\0). "movie\\0horror\\0topic" will be tokenised as "movie", "horror", "topic".
TokenUnigram It tokenises in unigram. But continuous alphabets, numbers or symbols are treated as a token. So there can exist tokes with 2 letters or more. It is to reduce noises.
TokenTrigram It tokenises in trigram. But continuous alphabets, numbers or symbols are treated as a token. So there can exist tokes with 4 letters or more. It is to reduce noises.

You can specify the default parser by passing --with-default-tokenizer option in configure when you build Mroonga:

./configure --with-default-tokenizer TokenMecab ...


Or you can set mroonga_default_tokenizer variable in my.cnf or by SQL. If you specify it in my.cnf, the change will not be lost after restarting MySQL, but you need to restart MySQL to make it effective. On the other hand, if you set it in SQL, the change is effective immediately, but it will be lost when you restart MySQL.

my.cnf:

[mysqld]
mroonga_default_tokenizer=TokenMecab


SQL:

SET GLOBAL mroonga_default_tokenizer = TokenMecab;
-- Query OK, 0 rows affected (0.00 sec)


Mroonga uses normalizer corresponding to the encoding of document. It is used when tokenizing text and storing table key.

It is used NormalizerMySQLGeneralCI normalizer when the encoding is utf8_general_ci or utf8mb4_general_ci.

It is used NormalizerMySQLUnicodeCI normalizer when the encoding is utf8_unicode_ci or utf8mb4_unicode_ci.

It isn't used normalizer when the encoding is utf8_bin.

Here is an example that uses NormalizerMySQLUnicodeCI normalizer by specifying utf8_unicode_ci:

SET NAMES utf8;
-- Query OK, 0 rows affected (0.00 sec)
CREATE TABLE diaries (
  day DATE PRIMARY KEY,
  content VARCHAR(64) NOT NULL,
  FULLTEXT INDEX (content)
) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
-- Query OK, 0 rows affected (0.18 sec)
INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。");
-- Query OK, 1 row affected (0.00 sec)
SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE);
-- +------------+-----------------------------------------+
-- | day        | content                                 |
-- +------------+-----------------------------------------+
-- | 2013-04-23 | ブラックコーヒーを飲んだ。 |
-- +------------+-----------------------------------------+
-- 1 row in set (0.00 sec)
SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE);
-- +------------+-----------------------------------------+
-- | day        | content                                 |
-- +------------+-----------------------------------------+
-- | 2013-04-23 | ブラックコーヒーを飲んだ。 |
-- +------------+-----------------------------------------+
-- 1 row in set (0.00 sec)


Mroonga has the following syntax to specify Groonga's normalizer:

FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"'


See Groonga's documentation about normalizer for more details.

Here is an example that uses NormalizerAuto normalizer:

SET NAMES utf8;
-- Query OK, 0 rows affected (0.00 sec)
CREATE TABLE diaries (
  day DATE PRIMARY KEY,
  content VARCHAR(64) NOT NULL,
  FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"'
) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
-- Query OK, 0 rows affected (0.19 sec)
INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。");
-- Query OK, 1 row affected (0.00 sec)
SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE);
-- Empty set (0.00 sec)
SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE);
-- +------------+-----------------------------------------+
-- | day        | content                                 |
-- +------------+-----------------------------------------+
-- | 2013-04-23 | ブラックコーヒーを飲んだ。 |
-- +------------+-----------------------------------------+
-- 1 row in set (0.00 sec)


Mroonga has the following syntax to specify Groonga's token filters.:

FULLTEXT INDEX (content) COMMENT 'token_filters "TokenFilterStem"'


Here is an example that uses TokenFilterStem token filter:

SELECT mroonga_command('register token_filters/stem');
-- +------------------------------------------------+
-- | mroonga_command('register token_filters/stem') |
-- +------------------------------------------------+
-- | true                                           |
-- +------------------------------------------------+
-- 1 row in set (0.00 sec)
CREATE TABLE memos (
  id INT NOT NULL PRIMARY KEY,
  content TEXT NOT NULL,
  FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto", token_filters "TokenFilterStem"'
) Engine=Mroonga DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.18 sec)
INSERT INTO memos VALUES (1, "I develop Groonga");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (2, "I'm developing Groonga");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (3, "I developed Groonga");
-- Query OK, 1 row affected (0.00 sec)
SELECT *
  FROM memos
 WHERE MATCH (content) AGAINST ("+develops" IN BOOLEAN MODE);
-- +----+------------------------+
-- | id | content                |
-- +----+------------------------+
-- |  1 | I develop Groonga      |
-- |  2 | I'm developing Groonga |
-- |  3 | I developed Groonga    |
-- +----+------------------------+
-- 3 rows in set (0.01 sec)


See Groonga's documentation about token filter documentation for more details.

Here is an example that uses TokenFilterStopWord token filter:

SELECT mroonga_command("register token_filters/stop_word");
-- +-----------------------------------------------------+
-- | mroonga_command("register token_filters/stop_word") |
-- +-----------------------------------------------------+
-- | true                                                |
-- +-----------------------------------------------------+
-- 1 row in set (0.00 sec)
CREATE TABLE terms (
  term VARCHAR(64) NOT NULL PRIMARY KEY,
  is_stop_word BOOL NOT NULL
) Engine=Mroonga COMMENT='tokenizer "TokenBigram", token_filters "TokenFilterStopWord"' DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.12 sec)
CREATE TABLE memos (
  id INT NOT NULL PRIMARY KEY,
  content TEXT NOT NULL,
  FULLTEXT INDEX (content) COMMENT 'table "terms"'
) Engine=Mroonga DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.17 sec)
INSERT INTO terms VALUES ("and", true);
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (1, "Hello");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (2, "Hello and Good-bye");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (3, "Good-bye");
-- Query OK, 1 row affected (0.00 sec)
SELECT *
  FROM memos
 WHERE MATCH (content) AGAINST ('+"Hello and"' IN BOOLEAN MODE);
-- +----+--------------------+
-- | id | content            |
-- +----+--------------------+
-- |  1 | Hello              |
-- |  2 | Hello and Good-bye |
-- +----+--------------------+
-- 2 rows in set (0.01 sec)


It's used that specifying the lexicon table for fulltext search.

Mroonga has the following syntax to specify Groonga's column flags:

content TEXT COMMENT 'flags "COLUMN_SCALAR|COMPRESS_ZLIB"'


Here is an example that uses COMPRESS_ZLIB flag:

CREATE TABLE entries (
  id INT UNSIGNED PRIMARY KEY,
  content TEXT COMMENT 'flags "COLUMN_SCALAR|COMPRESS_ZLIB"'
) Engine=Mroonga DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.12 sec)


See Groonga's documentation about column flags for more details.

In storage mode, you can use fast geolocation search in addition to full text search. But unlike MyISAM, you can only store POINT type data. You cannot store other types data like LINE. And fast search using index only supports MBRContains. It does not support MBRDisjoint.

For the table definition for geolocation search, you need to define a POINT type column like in MyISAM and define SPATIAL INDEX for it.:

CREATE TABLE shops (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(255),
  location POINT NOT NULL,
  SPATIAL INDEX (location)
) ENGINE = Mroonga;
-- Query OK, 0 rows affected (0.06 sec)


To store data, you create POINT type data by using geomFromText() function like in MyISAM:

INSERT INTO shops VALUES (null, 'Nezu''s Taiyaki', GeomFromText('POINT(139.762573 35.720253)'));
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO shops VALUES (null, 'Naniwaya', GeomFromText('POINT(139.796234 35.730061)'));
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO shops VALUES (null, 'Yanagiya Taiyaki', GeomFromText('POINT(139.783981 35.685341)'));
-- Query OK, 1 row affected (0.00 sec)


If you want to find shops within the rectangle where Ikebukuro station (139.7101 35.7292) is the top-left point and Tokyo Station (139.7662 35.6815) is the bottom-right point, SELECT phrase is like the following:

SELECT id, name, AsText(location)
  FROM shops
 WHERE MBRContains(GeomFromText('LineString(139.7101 35.7292, 139.7662 35.6815)'), location);
-- +----+-----------------------+------------------------------------------+
-- | id | name                  | AsText(location)                         |
-- +----+-----------------------+------------------------------------------+
-- |  1 | Nezu's Taiyaki        | POINT(139.762572777778 35.7202527777778) |
-- +----+-----------------------+------------------------------------------+
-- 1 row in set (0.00 sec)


Here you can search by geolocation!

Groonga assigns a unique number to identify the record when a record is added in the table.

To make the development of applications easier, you can get this record ID by SQL in Mroonga

To get the record ID, you need to create a column named _id when you create a table:

CREATE TABLE memos (
  _id INT,
  content VARCHAR(255),
  UNIQUE KEY (_id) USING HASH
) ENGINE = Mroonga;
-- Query OK, 0 rows affected (0.04 sec)


Tye typo of _id column should be integer one (TINYINT, SMALLINT, MEDIUMINT, INT or BIGINT).

You can create an index for _id column, but it should be HASH type.

Let's add records in the table by INSERT. Since _id column is implemented as a virtual column and its value is assigned by Groonga, you cannot specify the value when updating. So you need to exclude it from setting columns, or you need to use null as its value:

INSERT INTO memos VALUES (null, "Saury for today's dinner.");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (null, "Update mroonga tomorrow.");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (null, "Buy some dumpling on the way home.");
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO memos VALUES (null, "Thank God It's meat day.");
-- Query OK, 1 row affected (0.00 sec)


To get the record ID, you invoke SELECT with _id column:

SELECT * FROM memos;
-- +------+------------------------------------------+
-- | _id  | content                                  |
-- +------+------------------------------------------+
-- |    1 | Saury for today's dinner.                |
-- |    2 | Update mroonga tomorrow.                 |
-- |    3 | Buy some dumpling on the way home.       |
-- |    4 | Thank God It's meat day.                 |
-- +------+------------------------------------------+
-- 4 rows in set (0.00 sec)


By using last_insert_grn_id function, you can also get the record ID that is assigned by the last INSERT:

INSERT INTO memos VALUES (null, "Just one bottle of milk in the fridge.");
-- Query OK, 1 row affected (0.00 sec)
SELECT last_insert_grn_id();
-- +----------------------+
-- | last_insert_grn_id() |
-- +----------------------+
-- |                    5 |
-- +----------------------+
-- 1 row in set (0.00 sec)


last_insert_grn_id function is included in Mroonga as a User-Defined Function (UDF), but if you have not yet register it in MySQL by CREATE FUNCTION, you need to invoke the following SQL for defining a function:

CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so';


As you can see in the example above, you can get the record ID by _id column or last_insert_grn_id function. It will be useful to use this value in the ensuing SQL queries like UPDATE:

UPDATE memos SET content = "So much milk in the fridge." WHERE _id = last_insert_grn_id();
-- Query OK, 1 row affected (0.00 sec)
-- Rows matched: 1  Changed: 1  Warnings: 0


Mroonga provides functionality to get keyword in context. It is implemented as mroonga_snippet() UDF.
Similar search is supported by specifying document itself instead of specifying keywords in query.

It is useful if you want to find documents which is related to specific document.

Here is the schema definition for execution examples:

CREATE TABLE similarities (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(32),
  content VARCHAR(255),
  FULLTEXT INDEX (content)
) ENGINE = Mroonga DEFAULT CHARSET utf8;


Here is the sample data for execution examples:

INSERT INTO similarities (title, content)
       VALUES ('Groonga similar search', 'Groonga is an open-source fulltext search engine and column store.');
INSERT INTO similarities (title, content)
       VALUES ('Mroonga similar search', 'Mroonga is an open-source storage engine for fast fulltext search with MySQL.');
INSERT INTO similarities (title, content)
       VALUES ('Rroonga library', 'A library to use Groonga features from Ruby.');


Here is the example of similar search by content itself.

SELECT title FROM similarities WHERE MATCH(content) AGAINST ('There are many open-source fulltext search engine.' IN NATURAL LANGUAGE MODE);


Note that you need to specify an content of document in AGAINST('...' IN NATURAL LANGUAGE MODE).

SELECT title FROM similarities WHERE MATCH(content) AGAINST ('There are many open-source fulltext search engine.' IN NATURAL LANGUAGE MODE);
-- +------------------------+
-- | title                  |
-- +------------------------+
-- | Groonga similar search |
-- | Mroonga similar search |
-- +------------------------+
-- 2 rows in set (0.00 sec)


To improve degree of similarity, you may need to use language specific tokenizer or use labeled data such as tag to get rid of some undesirable search results.

For example, if you want to execute similar search against Japanese text, it is recommended to use language specific tokenizer - TokenMecab. Here is the schema definition to use TokenMecab tokenizer:

CREATE TABLE similarities (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(32),
  content VARCHAR(255),
  FULLTEXT INDEX (content) COMMENT 'tokenizer "TokenMecab"'
) ENGINE = Mroonga DEFAULT CHARSET utf8;


In storage mode, Mroonga stores all your data into Groonga database. You can access Groonga database by SQL with Mroonga. SQL is very powerful but it is not good for some operations such as faceted search.

Faceted search is popular recently. Many online shopping sites such as amazon.com and ebay.com support faceted search. Faceted search refines the current search by available search parameters before users refine their search. And faceted search shows refined searches. Users just select a refined search. Users benefit from faceted search:

  • Users don't need to think about how to refine their search. Users just select a showed refined search.
  • Users don't get boared "not match" page. Faceted search showes only refined searches that has one or more matched items.

Faceted search needs multiple GROUP BY operations against searched result set. To do faceted search by SQL, multiple SELECT requests are needed. It is not effective.

Groonga can do faceted search by only one groonga command. It is effective. Groonga has the select command that can search records with faceted search. Faceted search is called as "drilldown" in Groonga. See Groonga's documentation about select command for more details.

Mroonga provides mroonga_command() function. You can run Groonga command in SQL by the function. But you should use only select command. Other commands that change schema or data may break consistency.

Here is the schema definition for execution examples:

CREATE TABLE diaries (
  id INT PRIMARY KEY AUTO_INCREMENT,
  content VARCHAR(255),
  date DATE,
  year YEAR,
  `year_month` VARCHAR(9),
  tag VARCHAR(32),
  FULLTEXT INDEX (content)
) ENGINE = Mroonga DEFAULT CHARSET utf8;


Here is the sample data for execution examples:

INSERT INTO diaries (content, date, year, `year_month`, tag)
       VALUES ('Groonga is an open-source fulltext search engine and column store.',
               '2013-04-08',
               '2013',
               '2013-04',
               'groonga');
INSERT INTO diaries (content, date, year, `year_month`, tag)
       VALUES ('Mroonga is an open-source storage engine for fast fulltext search with MySQL.',
               '2013-04-09',
               '2013',
               '2013-04',
               'MySQL');
INSERT INTO diaries (content, date, year, `year_month`, tag)
       VALUES ('Tritonn is a patched version of MySQL that supports better fulltext search function with Senna.',
               '2013-03-29',
               '2013',
               '2013-03',
               'MySQL');


Each record has groonga or MySQL as tag. Each record also has year and year_month. You can use tag, year and year_month as faceted search keys.

Groonga calls faceted search as drilldown. So parameter key in Groonga is --drilldown. Groonga returns search result as JSON. So mroonga_command() also returns search result as JSON. It is not SQL friendly. You need to parse search result JSON by yourself.

Here is the example of faceted search by all available faceted search keys (result JSON is pretty printed):

SELECT mroonga_command("select diaries --output_columns _id --limit 0 --drilldown tag,year,year_month") AS faceted_result;
-- +-----------------------------+
-- | faceted_result              |
-- +-----------------------------+
-- | [[[3],                      |
-- |   [["_id","UInt32"]]],      |
-- |  [[2],                      |
-- |   [["_key","ShortText"],    |
-- |    ["_nsubrecs","Int32"]],  |
-- |   ["groonga",1],            |
-- |   ["MySQL",2]],             |
-- |  [[1],                      |
-- |   [["_key","Time"],         |
-- |    ["_nsubrecs","Int32"]],  |
-- |   [1356998400.0,3]],        |
-- |  [[2],                      |
-- |   [["_key","ShortText"],    |
-- |    ["_nsubrecs","Int32"]],  |
-- |   ["2013-04",2],            |
-- |   ["2013-03",1]]]           |
-- +-----------------------------+
-- 1 row in set (0.00 sec)


The first element [[3], [["_id","UInt32"]]] is normal search result. It's not faceted search result.

The second, third and forth elements are faceted search results:

[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["groonga",1],
 ["MySQL",2]]


[[1],
 [["_key","Time"],
  ["_nsubrecs","Int32"]],
 [1356998400.0,3]]


[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["2013-04",2],
 ["2013-03",1]]


The order of faceted search results is corresponding to the value of --drilldown. In this example, we specified tag, year and year_month as --drilldown value. So the first faceted search result is for tag, the second one is for year and the third one is for year_month.

Each faceted search result uses the following format. This is the same as normal search result:

[[${THE_NUMBER_OF_RECORDS}],
 [[${OUTPUT_COLUMN_NAME_0}, ${OUTPUT_COLUMN_TYPE_0}],
  [${OUTPUT_COLUMN_NAME_1}, ${OUTPUT_COLUMN_TYPE_1}],
  ...,
  [${OUTPUT_COLUMN_NAME_N}, ${OUTPUT_COLUMN_TYPE_N}]]
 [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_0},
  ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_0},
  ...,
  ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_0}],
 [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_1},
  ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_1},
  ...,
  ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_1}],
 ...
 [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_M},
  ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_M},
  ...,
  ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_M}]]


The _key column value in faceted search result shows faceted search key.

For example, the first faceted search result (the faceted search result for tag) shows that matched records (all records in this case) have groonga and MySQL as tag value:

[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["groonga",1],
 ["MySQL",2]]


The _nsubrecs column value in faceted search result shows the number of records that have the corresponding faceted search key.

For example, the first faceted search result (the faceted search result for tag) shows that there are 1 record that has groonga as tag value and 2 records that have MySQL as tag value:

[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["groonga",1],
 ["MySQL",2]]


See Groonga's documentation about select command for more details.

In storage mode, you can use a Groonga's functionality from Mroonga as described above. Thus, you can search records by using a regular expression via Groonga's functionality.

There are some conditions to use regular expression in Mroonga.

  • Create an index with TokenRegexp tokenizer which is used in COMMENT
  • Use *SS pragma and @~ in WHERE MATCH ... AGAINST clause

Here is the example of search by regular expression.

CREATE TABLE paths (
  content text,
  FULLTEXT INDEX content_index (content) COMMENT 'tokenizer "TokenRegexp", normalizer "NormalizerAuto"'
) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4;
INSERT INTO paths VALUES ('/usr/bin/groonga');
INSERT INTO paths VALUES ('/var/log/auth.log');
INSERT INTO paths VALUES ('/var/log/messages');
INSERT INTO paths VALUES ('/tmp/local/var/log/auth.log');
SELECT * FROM paths WHERE MATCH(content) AGAINST ('*SS content @~ "\\\\A/var/log/auth"' IN BOOLEAN MODE);
-- +-------------------+
-- | content           |
-- +-------------------+
-- | /var/log/auth.log |
-- +-------------------+
-- 1 row in set (0.024 sec)


By using *SS pragma, you can search the records which matches /var/log/auth.log with content @~ "\\\\A/var/log/auth". @~ is a Groonga's operator which executes a regular expression search, and "\\\\A/var/log/auth" executes prefix search, so it matches to only /var/log/auth.log. /tmp/local/var/log/auth.log doesn't match because it doesn't begin with "/var/log/auth".

See Groonga's regular expression document for more syntax details.

Mroonga outputs the logs by default.

Log files are located in MySQL's data directory with the filename groonga.log.

Here is the example of the log.

2010-10-07 17:32:39.209379|n|b1858f80|mroonga 1.10 started.
2010-10-07 17:32:44.934048|d|46953940|hash get not found (key=test)
2010-10-07 17:32:44.936113|d|46953940|hash put (key=test)


The default log level is NOTICE, i.e. we have important information only and we don't have debug information etc.).

You can get the log level by mroonga_log_level system variable, that is a global variable. You can also modify it dynamically by using SET phrase:

SHOW VARIABLES LIKE 'mroonga_log_level';
-- +-------------------+--------+
-- | Variable_name     | Value  |
-- +-------------------+--------+
-- | mroonga_log_level | NOTICE |
-- +-------------------+--------+
-- 1 row in set (0.00 sec)
SET GLOBAL mroonga_log_level=DUMP;
-- Query OK, 0 rows affected (0.00 sec)
SHOW VARIABLES LIKE 'mroonga_log_level';
-- +-------------------+-------+
-- | Variable_name     | Value |
-- +-------------------+-------+
-- | mroonga_log_level | DUMP  |
-- +-------------------+-------+
-- 1 row in set (0.00 sec)


Available log levels are the followings.

  • NONE
  • EMERG
  • ALERT
  • CRIT
  • ERROR
  • WARNING
  • NOTICE
  • INFO
  • DEBUG
  • DUMP

See mroonga_log_level about details.

You can reopen the log file by FLUSH LOGS. If you want to rotate the log file without stopping MySQL server, you can do in the following procedure.

1.
change the file name of groonga.log (by using OS's mv command etc.).
2.
invoke "FLUSH LOGS" in MySQL server (by mysql command or mysqladmin command).

Now, you can use Mroonga as storage mode! If you want Mroonga to be faster, see also Optimizations.

[1]
MySQL 5.1 Reference Manual :: 11 Functions and Operations :: 11.7 Full-Text Search Functions
[2]
In Groonga, we call it a 'tokenizer'.

Here we explain how to use wrapper mode of Mroonga

In wrapper mode, Mroonga works in wrapping an existing storage engine. To specify the wrapped storage engine, we use SQL comment like COMMENT = 'engine "InnoDB"' for now.

NOTE:

For now, a primary key is mandatory in wrapper mode. That is not the case with storage mode.


NOTE:

Wrapper mode supports the followings, that are not supported in storage mode for now.
  • null value
  • transaction (if storage engine supports. Note that rollback causes mismatch of indexes, it may affects search results, so recreate index of Mroonga in such a case.)



After confirming the installation, let's create a table. The important point is to specify Mroonga by ENGINE = Mroonga.

mysql> CREATE TABLE diaries (
    ->   id INT PRIMARY KEY AUTO_INCREMENT,
    ->   content VARCHAR(255),
    ->   FULLTEXT INDEX (content)
    -> ) ENGINE = Mroonga COMMENT = 'engine "InnoDB"' DEFAULT CHARSET utf8;
Query OK, 0 rows affected (0.52 sec)


We put data by INSERT.

mysql> INSERT INTO diaries (content) VALUES ("It'll be fine tomorrow.");
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO diaries (content) VALUES ("It'll rain tomorrow");
Query OK, 1 row affected (0.00 sec)


Try full text search.

mysql> SELECT * FROM diaries WHERE MATCH(content) AGAINST("+fine" IN BOOLEAN MODE);
+----+-----------------------------------------+
| id | content                                 |
+----+-----------------------------------------+
|  1 | It'll be fine tomorrow. |
+----+-----------------------------------------+
1 row in set (0.00 sec)


Yes, full text search works.

We often want to display more relevant results first in full text search. We use search score in such case.

We can get search score by MySQL's standard way [1], i.e. we use MATCH...AGAINST in one of columns in SELECT or ORDER BY.

Let's try.

mysql> INSERT INTO diaries (content) VALUES ("It's fine today. It'll be fine tomorrow as well.");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO diaries (content) VALUES ("It's fine today. But it'll rain tomorrow.");
Query OK, 1 row affected (0.00 sec)
mysql> SELECT *, MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) FROM diaries WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) ORDER BY MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC;
+----+--------------------------------------------------+---------------------------------------------------+
| id | content                                          | MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) |
+----+--------------------------------------------------+---------------------------------------------------+
|  3 | It's fine today. It'll be fine tomorrow as well. |                                                 2 |
|  1 | It'll be fine tomorrow.                          |                                                 1 |
|  4 | It's fine today. But it'll rain tomorrow.        |                                                 1 |
+----+--------------------------------------------------+---------------------------------------------------+
3 rows in set (0.00 sec)


The result having the search word fine more, i.e. id = 3 message having the higher search score, is displayed first. And you also get search score by using MATCH AGAINST in SELECT phrase.

You can use AS to change the attribute name.

mysql> SELECT *, MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) AS score FROM diaries WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) ORDER BY MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC;
+----+--------------------------------------------------+-------+
| id | content                                          | score |
+----+--------------------------------------------------+-------+
|  3 | It's fine today. It'll be fine tomorrow as well. |     2 |
|  1 | It'll be fine tomorrow.                          |     1 |
|  4 | It's fine today. But it'll rain tomorrow.        |     1 |
+----+--------------------------------------------------+-------+
3 rows in set (0.00 sec)


MySQL has the following syntax to specify the parser [2] for full text search.

FULLTEXT INDEX (content) WITH PARSER parser_name


To use this syntax, you need to register all parsers in MySQL beforehand. On the other hand, Groonga can dynamically add a tokenizer, that is a parser in MySQL. So if use this syntax in Mroonga, tokenizers that are added in Groonga dynamically cannot be supported. We think that this limitation decreases the convenience, and we choose our own syntax using COMMENT like the following.

FULLTEXT INDEX (content) COMMENT 'tokenizer "TokenMecab"'


NOTE:

COMMENT in FULLTEXT INDEX is only supported MySQL 5.5 or later. If you use MySQL 5.1, use mroonga_default_parser variable described below.


You can specify one of following values as the tokenizer.

Tokenizer Description
off (Deprecated) It does not tokenize at all. Use off if you want to treat content as is. For example, this value is used for prefix search.
none New in version 8.07: It does not tokenize at all. Use none if you want to treat content as is. For example, this value is used for prefix search.
TokenBigram It tokenises in bigram. But continuous alphabets, numbers or symbols are treated as a token. So there can exist tokes with 3 letters or more. It is to reduce noises. This is the default value.
TokenMecab It tokenises using MeCab. Groonga should be built with MeCab support.
TokenBigramSplitSymbol It tokenises in bigram. Unlike TokenBigram, continuous symbols are not treated as a token, but tokenised in bigram. When you use TokenBigramSplitSymbol instead of TokenBigram, "!?" can match "!?!?!?" in "Is it really!?!?!?". But when you use TokenBigram, only "!?!?!?" can match as well.
TokenBigramSplitSymbolAlpha It tokenise in bigram. In addition to TokenBigramSplitSymbol, continuous alphabets are not treated as a token either, but tokenised in bigram. When you use TokenBigramSplitSymbolAlpha instead of TokenBigram, "real" can match "Is it really?". But when you use TokenBigram, only "really" can match as well.
TokenBigramSplitSymbolAlphaDigit It tokenise in bigram. In addition to TokenBigramSplitSymbolAlpha, continuous numbers are not treated as a token either, but tokenised in bigram. So any kind of characters are treated equally in bigram. When you use TokenBigramSplitSymbolAlphaDigit instead of TokenBigram, "567" can match "090-0123-4567". But when you use TokenBigram, only "4567" can match as well.
TokenBigramIgnoreBlank It tokenise in bigram. Unlike TokenBigram, it ignores white spaces. When you use TokenBigramIgnoreBlank instead of TokenBigram, "みなさん" can match "み な さ ん 注 目". But when you use TokenBigram, only "み な さ ん" can match as well.
TokenBigramIgnoreBlankSplitSymbol It tokenise in bigram. Unlike TokenBigramSplitSymbol, it ignores white spaces. When you use TokenBigramIgnoreBlankSplitSymbol instead of TokenBigramSplitSymbol, "???" can match "! ? ???". But when you use TokenBigramSplitSymbol, only "? ??" can match as well.
TokenBigramIgnoreBlankSplitSymbolAlpha It tokenise in bigram. Unlike TokenBigramSplitSymbolAlpha, it ignores white spaces. When you use TokenBigramIgnoreBlankSplitSymbolAlpha instead of TokenBigramSplitSymbolAlpha, "ama" can match "I am a pen.". But when you use TokenBigramSplitSymbolAlpha, only "am a" can match as well.
TokenBigramIgnoreBlankSplitSymbolAlphaDigit It tokenise in bigram. Unlike TokenBigramSplitSymbolAlphaDigit, it ignores white spaces. When you use TokenBigramIgnoreBlankSplitSymbolAlphaDigit instead of TokenBigramSplitSymbolAlphaDigit, "9001" can match "090 0123 4567". But when you use TokenBigramSplitSymbolAlphaDigit, only "90 01" can match as well.
TokenDelimit It tokenise by splitting with a white space. "movie horror topic" will be tokenised as "movie", "horror", "topic".
TokenDelimitNull It tokenise by splitting with a null character (\\0). "movie\\0horror\\0topic" will be tokenised as "movie", "horror", "topic".
TokenUnigram It tokenises in unigram. But continuous alphabets, numbers or symbols are treated as a token. So there can exist tokes with 2 letters or more. It is to reduce noises.
TokenTrigram It tokenises in trigram. But continuous alphabets, numbers or symbols are treated as a token. So there can exist tokes with 4 letters or more. It is to reduce noises.

You can specify the default tokenizer by passing --with-default-tokenizer option in configure when you build Mroonga.

./configure --with-default-tokenizer TokenMecab ...


Or you can set mroonga_default_tokenizer variable in my.cnf or by SQL. If you specify it in my.cnf, the change will not be lost after restarting MySQL, but you need to restart MySQL to make it effective. On the other hand, if you set it in SQL, the change is effective immediately, but it will be lost when you restart MySQL.

my.cnf:

[mysqld]
mroonga_default_tokenizer=TokenMecab


SQL:

mysql> SET GLOBAL mroonga_default_tokenizer = TokenMecab;
Query OK, 0 rows affected (0.00 sec)


Mroonga uses normalizer corresponding to the encoding of document. It is used when tokenizing text and storing table key.

It is used NormalizerMySQLGeneralCI normalizer when the encoding is utf8_general_ci or utf8mb4_general_ci.

It is used NormalizerMySQLUnicodeCI normalizer when the encoding is utf8_unicode_ci or utf8mb4_unicode_ci.

It isn't used normalizer when the encoding is utf8_bin.

Here is an example that uses NormalizerMySQLUnicodeCI normalizer by specifying utf8_unicode_ci.:

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE diaries (
    ->   day DATE PRIMARY KEY,
    ->   content VARCHAR(64) NOT NULL,
    ->   FULLTEXT INDEX (content)
    -> ) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.18 sec)
mysql> INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。");
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM diaries
    ->        WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE);
+------------+-----------------------------------------+
| day        | content                                 |
+------------+-----------------------------------------+
| 2013-04-23 | ブラックコーヒーを飲んだ。 |
+------------+-----------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM diaries
    ->        WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE);
+------------+-----------------------------------------+
| day        | content                                 |
+------------+-----------------------------------------+
| 2013-04-23 | ブラックコーヒーを飲んだ。 |
+------------+-----------------------------------------+
1 row in set (0.00 sec)


Mroonga has the following syntax to specify Groonga's normalizer:

FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"'


See Groonga's documentation about normalizer for more details.

Here is an example that uses NormalizerAuto normalizer:

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE diaries (
    ->   day DATE PRIMARY KEY,
    ->   content VARCHAR(64) NOT NULL,
    ->   FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"'
    -> ) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.19 sec)
mysql> INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。");
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM diaries
    ->        WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE);
Empty set (0.00 sec)
mysql> SELECT * FROM diaries
    ->        WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE);
+------------+-----------------------------------------+
| day        | content                                 |
+------------+-----------------------------------------+
| 2013-04-23 | ブラックコーヒーを飲んだ。 |
+------------+-----------------------------------------+
1 row in set (0.00 sec)


Mroonga provides functionality to get keyword in context. It is implemented as mroonga_snippet() UDF.

Mroonga outputs the logs by default.

Log files are located in MySQL's data directory with the filename groonga.log.

Here is the example of the log.

2010-10-07 17:32:39.209379|n|b1858f80|mroonga 1.10 started.
2010-10-07 17:32:44.934048|d|46953940|hash get not found (key=test)
2010-10-07 17:32:44.936113|d|46953940|hash put (key=test)


The default log level is NOTICE, i.e. we have important information only and we don't have debug information etc.).

You can get the log level by mroonga_log_level system variable, that is a global variable. You can also modify it dynamically by using SET phrase.

mysql> SHOW VARIABLES LIKE 'mroonga_log_level';
+-------------------+--------+
| Variable_name     | Value  |
+-------------------+--------+
| mroonga_log_level | NOTICE |
+-------------------+--------+
1 row in set (0.00 sec)
mysql> SET GLOBAL mroonga_log_level=DUMP;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'mroonga_log_level';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| mroonga_log_level | DUMP  |
+-------------------+-------+
1 row in set (0.00 sec)


Available log levels are the followings.

  • NONE
  • EMERG
  • ALERT
  • CRIT
  • ERROR
  • WARNING
  • NOTICE
  • INFO
  • DEBUG
  • DUMP

See mroonga_log_level about details.

You can reopen the log file by FLUSH LOGS. If you want to rotate the log file without stopping MySQL server, you can do in the following procedure.

1.
change the file name of groonga.log (by using OS's mv command etc.).
2.
invoke "FLUSH LOGS" in MySQL server (by mysql command or mysqladmin command).

Now, you can use Mroonga as wrapper mode! If you want Mroonga to be faster, see also Optimizations.

[1]
MySQL 5.1 Reference Manual :: 11 Functions and Operations :: 11.7 Full-Text Search Functions
[2]
In Groonga, we call it a 'tokenizer'.

Mroonga implemented some optimizations to return response faster.

Some optimizations can work only on Storage mode.

This optimization can work only on Storage mode.

Groonga uses column store architecture. It means that Groonga doesn't need to fetch values of all columns for fetching a column value in a row. Groonga can fetch values of only needed columns.

InnoDB and MyISAM use row store architecture. They need to fetch values of all columns for fetching a column value in a row.

If you specify only columns you needed in SELECT, Mroonga just fetches only values of these columns. Mroonga doesn't fetch values of other columns.

Mroonga can work faster by reducing operations and I/O.

It's this optimization.

Here is a sample table definition to describe this optimization:

CREATE TABLE t1 (
  c1 INT PRIMARY KEY AUTO_INCREMENT,
  c2 INT,
  c3 INT,
  ...
  c11 VARCHAR(20),
  c12 VARCHAR(20),
  ...
  c20 DATETIME
) ENGINE=Mroonga DEFAULT CHARSET=utf8;


Here is a SELECT to describe this optimization:

SELECT c1, c2, c11 FROM t1 WHERE c2 = XX AND c12 = "XXX";


In this case, Mroonga fetches values from only c1, c2, c11 and c12. Mroonga doesn't fetch values from c3, c4, ..., c10, c13, ..., c19 and c20.

This optimization can work only on Storage mode.

MySQL requires all column values from storage engine for processing COUNT(*) even if COUNT(*) doesn't need them.

Mroonga doesn't fetch any column values for the case.

Mroonga can work faster by reducing operations and I/O.

It's this optimization.

Here is a SELECT to describe this optimization:

SELECT COUNT(*) FROM t1 WHERE MATCH(c2) AGAINST("+keyword" IN BOOLEAN MODE);


The SELECT fetches only COUNT(*) and condition in WHERE can be processed only by index. In this case, Mroonga uses this optimization.

You can confirm whether this optimization is used or not by looking Mroonga_count_skip status variable:

mysql> SHOW STATUS LIKE 'Mroonga_count_skip';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Mroonga_count_skip | 1     |
+--------------------+-------+
1 row in set (0.00 sec)


Mroonga_count_skip status variable is incremented when Mroonga uses this optimization.

You can disable this optimization by setting mroonga_enable_optimization to false.

This optimization can work on both Storage mode and Wrapper mode.

MySQL can process ORDER BY and LIMIT with low cost if you can get sorted records by index even if the number of matched records is very big.

MySQL can do the process for MATCH() AGAINST(IN NATURAL LANGUAGE MODE). But MySQL can't do the process for MATCH() AGAINST(IN BOOLEAN MODE).

It means that MySQL might take long time for MATCH() AGAINST(IN BOOLEAN MODE) that matches with many records.

Mroonga processes ORDER BY and LIMIT by Groonga and returns only target records to MySQL. It's very faster for query that matches with many records.

It's this optimization.

Here is a SELECT to describe this optimization:

SELECT *
  FROM t1
 WHERE MATCH(c2) AGAINST("+keyword" IN BOOLEAN MODE)
 ORDER BY c1 LIMIT 1;


The SELECT runs full text search and sorts by Groonga and returns only one record to MySQL.

You can confirm whether this optimization is used or not by looking Mroonga_fast_order_limit status variable:

mysql> SHOW STATUS LIKE 'Mroonga_fast_order_limit';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Mroonga_fast_order_limit | 1     |
+--------------------------+-------+
1 row in set (0.00 sec)


Mroonga_fast_order_limit status variable is incremented when Mroonga uses this optimization.

This optimization is used only when all the following conditions are true:

  • Storage mode: WHERE clause has one MATCH AGAINST and zero or more arithmetic operations such as column < 100.
  • Wrapper mode: WHERE clause has only MATCH AGAINST.
  • No JOIN
  • No GROUP BY
  • No SQL_CALC_FOUND_ROWS
  • With LIMIT
  • Storage mode: ORDER BY clause has only columns or MATCH AGAINST that is used in WHERE clause.
  • Wrapper mode: ORDER BY clause has only primary column or MATCH AGAINST that is used in WHERE clause.

This section describes avaiable SQL commands and unavailable SQL commands.

  • ALTER TABLE table_name ADD COLUMN column_name TEXT
  • ALTER TABLE table_name ADD FULLTEXT INDEX index_name (column_name)
  • ALTER TABLE table_name ADD SPATIAL KEY index_name (geometry_column_name)
  • ALTER TABLE table_name ENGINE = mroonga
  • ALTER TABLE table_name RENAME new_table_name
  • COMMIT
  • CREATE FULLTEXT INDEX index_name ON table_name(column_name)
  • CREATE FULLTEXT INDEX index_name ON table_name(column_name, ...)
  • CREATE TABLE IF NOT EXISTS table_name (...)
  • CREATE TABLE \`table-name-with-hyphen\` (...)
  • CREATE TABLE table_name (... FULLTEXT INDEX index_name (column_name) COMMENT 'tokenizer "TokenMecab"')
  • CREATE TABLE table_name (... FULLTEXT INDEX index_name (column_name) COMMENT 'normalizer "NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark"')
  • CREATE TABLE table_name (... FULLTEXT INDEX index_name (column_name) COMMENT 'tokenizer "TokenDelimit", normalizer "NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark"')
  • CREATE TABLE table_name (... FULLTEXT INDEX index_name (column_name))
  • CREATE TABLE table_name (... FULLTEXT INDEX index_name (column_name, ...))
  • CREATE TABLE table_name (... INDEX (column_name))
  • CREATE TABLE table_name (... INDEX (column_name, ...))
  • CREATE TABLE table_name (... INDEX USING BTREE (column_name))
  • CREATE TABLE table_name (... PRIMARY KEY (column_name))
  • CREATE TABLE table_name (...) CHARSET ASCII
  • CREATE TABLE table_name (...) CHARSET BINARY
  • CREATE TABLE table_name (...) CHARSET CP932
  • CREATE TABLE table_name (...) CHARSET EUCJPMS
  • CREATE TABLE table_name (...) CHARSET KOI8R
  • CREATE TABLE table_name (...) CHARSET LATIN1
  • CREATE TABLE table_name (...) CHARSET SJIS
  • CREATE TABLE table_name (...) CHARSET UJIS
  • CREATE TABLE table_name (...) CHARSET UTF8
  • CREATE TABLE table_name (...) CHARSET UTF8MB4
  • CREATE TABLE table_name (...) ENGINE=mroonga COMMENT = 'ENGINE "InnoDB"'
  • CREATE TABLE table_name (...) ENGINE=mroonga
  • CREATE TABLE table_name (\`_id\` INT)
  • CREATE TABLE table_name (\`_id\` INT, KEY(_id) USING HASH)
  • CREATE TABLE table_name (column_name BIGINT UNSIGNED)
  • CREATE TABLE table_name (column_name BIGINT)
  • CREATE TABLE table_name (column_name BINARY(...))
  • CREATE TABLE table_name (column_name BIT)
  • CREATE TABLE table_name (column_name BLOB)
  • CREATE TABLE table_name (column_name CHAR(...))
  • CREATE TABLE table_name (column_name DATE)
  • CREATE TABLE table_name (column_name DATETIME)
  • CREATE TABLE table_name (column_name DECIMAL)
  • CREATE TABLE table_name (column_name DOUBLE)
  • CREATE TABLE table_name (column_name ENUM(...))
  • CREATE TABLE table_name (column_name FLOAT)
  • CREATE TABLE table_name (column_name GEOMETRY NOT NULL)
  • CREATE TABLE table_name (column_name INT UNSIGNED)
  • CREATE TABLE table_name (column_name INT)
  • CREATE TABLE table_name (column_name LONGBLOB)
  • CREATE TABLE table_name (column_name LONGTEXT)
  • CREATE TABLE table_name (column_name MEDIUMBLOB)
  • CREATE TABLE table_name (column_name MEDIUMINT UNSIGNED)
  • CREATE TABLE table_name (column_name MEDIUMINT)
  • CREATE TABLE table_name (column_name MEDIUMTEXT)
  • CREATE TABLE table_name (column_name SET(...))
  • CREATE TABLE table_name (column_name SMALLINT UNSIGNED)
  • CREATE TABLE table_name (column_name SMALLINT)
  • CREATE TABLE table_name (column_name TEXT)
  • CREATE TABLE table_name (column_name TIME)
  • CREATE TABLE table_name (column_name TIMESTAMP)
  • CREATE TABLE table_name (column_name TINYBLOB)
  • CREATE TABLE table_name (column_name TINYINT UNSIGNED)
  • CREATE TABLE table_name (column_name TINYINT)
  • CREATE TABLE table_name (column_name TINYTEXT)
  • CREATE TABLE table_name (column_name VARBINARY(...))
  • CREATE TABLE table_name (column_name VARCHAR(...))
  • CREATE TABLE table_name (column_name YEAR)
  • CREATE TABLE table_name (column_name INT AUTO_INCREMENT)
  • CREATE TABLE table_name (column_name INT PRIMARY KEY)
  • CREATE TABLE table_name (column_name DATE PRIMARY KEY)
  • CREATE TABLE table_name (column_name DATETIME PRIMARY KEY)
  • CREATE TABLE table_name (column_name DATETIME(fractional_seconds_precision) PRIMARY KEY)
  • CREATE TABLE table_name (column_name DECIMAL PRIMARY KEY)
  • CREATE TABLE table_name (column_name DECIMAL(maximum_number_digits, fractional_seconds_precision) PRIMARY KEY)
  • CREATE TABLE table_name (column_name TIME PRIMARY KEY)
  • CREATE TABLE table_name (column_name TIME(fractional_seconds_precision) PRIMARY KEY)
  • CREATE TABLE table_name (column_name TIMESTAMP PRIMARY KEY)
  • CREATE TABLE table_name (column_name TIMESTAMP(fractional_seconds_precision) PRIMARY KEY)
  • CREATE TABLE table_name (column_name YEAR PRIMARY KEY)
  • CREATE TABLE table_name (column_name1 INT, column_name2 ..., KEY (column_name1, column_name2))
  • CREATE TABLE table_name (column_name1 DOUBLE, column_name2 ..., KEY (column_name1, column_name2))
  • CREATE TABLE table_name (column_name1 FLOAT, column_name2 ..., KEY (column_name1, column_name2))
  • CREATE TABLE table_name (column_name1 CHAR(...), column_name2 ..., KEY (column_name1, column_name2))
  • CREATE TABLE table_name (column_name1 VARCHAR(...), column_name2 ..., KEY (column_name1, column_name2))
  • CREATE TEMPORARY TABLE table_name (...)
  • DELETE FROM table_name WHERE ...
  • DROP INDEX column_name ON table_name
  • DROP TABLE IF EXISTS table_name, ...
  • DROP TABLE \`table-name-with-hyphen\`, ...
  • DROP TABLE table_name, ...
  • DROP TEMPORARY TABLE table_name
  • FLUSH LOGS
  • FLUSH TABLES
  • INSERT INTO (geometry_column_name) VALUES (GeomFromText('POINT(...)'))
  • INSERT INTO ... VALUES ...
  • INSERT INTO ... VALUES ... ON DUPLICATE KEY UPDATE ... (for PRIMARY KEY)
  • INSERT INTO ... VALUES ... ON DUPLICATE KEY UPDATE ... (for UNIQUE KEY)
  • INSERT INTO table_name (column_name, ...) SELECT ... FROM other_table_name
  • REPLACE INTO table_name SELECT ... FROM other_table_name ...
  • SELECT * FROM information_schema.plugins WHERE plugin_name = "mroonga"
  • SELECT * FROM table_name
  • SELECT * FROM table_name FORCE INDEX(index_name) WHERE ...
  • SELECT * FROM table_name ORDER BY column_name ASC LIMIT ...
  • SELECT * FROM table_name ORDER BY column_name ASC
  • SELECT * FROM table_name ORDER BY column_name DESC LIMIT ...
  • SELECT * FROM table_name ORDER BY column_name DESC
  • SELECT * FROM table_name ORDER BY column_name
  • SELECT * FROM table_name WHERE MATCH(column_name) AGAINST("*D+ word ..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE MATCH(column_name) AGAINST("*D- word ..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE MATCH(column_name) AGAINST("*DOR word ..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE MATCH(column_name) AGAINST("*W word ..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE MATCH(column_name) AGAINST("+word ..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE MATCH(column_name) AGAINST("..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE MATCH(column_name, ...) AGAINST("..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE MBRContains(GeomFromText('LineString(...)'), geometry_column_name)
  • SELECT * FROM table_name WHERE NOT MATCH(column_name) AGAINST("..." IN BOOLEAN MODE)
  • SELECT * FROM table_name WHERE column_name < ...
  • SELECT * FROM table_name WHERE column_name <= ...
  • SELECT * FROM table_name WHERE column_name = ...
  • SELECT * FROM table_name WHERE column_name > ...
  • SELECT * FROM table_name WHERE column_name >= ...
  • SELECT * FROM table_name WHERE column_name BETWEEN ... AND ...
  • SELECT * FROM table_name WHERE column_name IN (SELECT sub_column_name FROM sub_table_name WHERE MATCH(sub_text_column_name) AGAINST("..."))
  • SELECT *, MATCH(column_name) AGAINST("..." IN BOOLEAN MODE) FROM table_name WHERE MATCH(column_name) AGAINST("..." IN BOOLEAN MODE)
  • SELECT FOUND_ROWS()
  • SELECT SQL_CALC_FOUND_ROWS * FROM table_name WHERE MATCH(...) AGAINST("..." IN BOOLEAN MODE) ORDER BY column_name LIMIT start,n_records
  • SELECT last_insert_grn_id()
  • SET binlog_format="MIXED"
  • SET binlog_format="ROW"
  • SET binlog_format="STATEMENT"
  • TRUNCATE TABLE table_name
  • UPDATE table_name SET column_name = ...

  • CREATE TABLE (...) CHARSET not_listed_charset_above
  • INSERT INTO (geometry_column_name) VALUES (GeomFromText('LineString(...)'))
  • INSERT INTO (...) VALUES (null)
  • START TRANSACTION

Basically, The character set which Groonga supports (EUC-JP/UTF-8/SJIS/LATIN1/KOI8R) is also supported in Mroonga.

If you find unlisted SQL command and confirm whether the SQL command works well or not, we want to list it. Please mail it to us or send a patch against the source file by pull request system on GitHub.

Here are the explanations of server variables that are introduced by Mroonga.

The default behavior of fulltext query error.

The default value of mroonga_action_on_fulltext_query_error is ERROR_AND_LOG. This is the conventional behavior which is equal to old version of mroonga.

Here is the list of mroonga_action_on_fulltext_query_error which you can use.

Value Description
ERROR Report an error. Logging is disabled.
ERROR_AND_LOG Report an error. Logging is enabled. (This is the default)
IGNORE Just ignore an error. Logging is disabled.
IGNORE_AND_LOG Ignore an error, but logging is enabled. (Similar to InnoDB behavior)

Here is an example SQL to confirm the value of mroonga_action_on_fulltext_query_error:

mysql> SHOW VARIABLES LIKE 'mroonga_action_on_fulltext_query_error';
+----------------------------------------+---------------+
| Variable_name                          | Value         |
+----------------------------------------+---------------+
| mroonga_action_on_fulltext_query_error | ERROR_AND_LOG |
+----------------------------------------+---------------+
1 row in set (0.00 sec)


The flags to custom syntax in MATCH () AGAINST ('...' IN BOOLEAN MODE).

This variable is system and session variable.

Here are available flags:

Flag Description
DEFAULT Equals to SYNTAX_QUERY,ALLOW_LEADING_NOT.
SYNTAX_QUERY Uses query syntax in Groonga. Query syntax in Groonga is a compatible syntax with MySQL's BOOLEAM MODE syntax. If neither SYNTAX_QUERY nor SYNTAX_SCRIPT aren't specified, SYNTAX_QUERY is used.
SYNTAX_SCRIPT Uses script syntax in Groonga. It's JavaScript like syntax. You can use full Groonga features with this syntax. If both SYNTAX_QUERY and SYNTAX_SCRIPT are specified, SYNTAX_SCRIPT is used.
ALLOW_COLUMN Allows COLUMN:... syntax in query syntax. It's not compatible with MySQL's BOOLEAM MODE syntax. You can use multiple indexes in one MATCH () AGAINST () with this syntax. MySQL can use only one index in a query. You can avoid the restriction by this syntax. You can use not only full-text search operation but also other more operations such as equal operation and prefix search operation with this syntax. See query syntax in Groonga for details.
ALLOW_UPDATE Allows updating value by COLUMN:=NEW_VALUE syntax in query syntax.
ALLOW_LEADING_NOT Allows -NOT_INCLUDED_KEYWORD ... syntax in query syntax.

The default flags is DEFAULT. It is MySQL's BOOLEAN MODE compatible syntax.

You can combine flags by separated by comma such as SYNTAX_QUERY,ALLOW_LEADING_NOT.

Here is an example SQL to use script syntax in Groonga:

mysql> SET mroonga_boolean_mode_syntax_flags = "SYNTAX_SCRIPT";


TODO:

Deprecated since version 5.04: Use mroonga_default_tokenizer instead.

The default parser of the full text search. The default value can be specified by --with-default-parser=PARSER configure argument, whose default value is TokenBigram.

Here is an example to use TokenBigramSplitSymbolAlphaDigit as a fulltext search parser. It is used by body_index fulltext index.

SET GLOBAL mroonga_default_parser=TokenBigramSplitSymbolAlphaDigit;
CREATE TABLE diaries (
  id INT PRIMARY KEY AUTO_INCREMENT,
  body TEXT,
  FULLTEXT INDEX body_index (body)
) DEFAULT CHARSET UTF8;


New in version 5.04.

The default tokenizer of the full text search. The default value can be specified by --with-default-tokenizer=TOKENIZER configure argument, whose default value is TokenBigram.

Here is an example to use TokenBigramSplitSymbolAlphaDigit as a fulltext index tokenizer. It is used by body_index fulltext index.

SET GLOBAL mroonga_default_tokenizer=TokenBigramSplitSymbolAlphaDigit;
CREATE TABLE diaries (
  id INT PRIMARY KEY AUTO_INCREMENT,
  body TEXT,
  FULLTEXT INDEX body_index (body)
) DEFAULT CHARSET UTF8;


TODO:

Whether really write data to Groonga database or not. The default value is OFF that means data are really written to Groonga database. Usually we don't need to change the value of this variable. This variable is useful for benchmark because we can measure processing time MySQL and Mroonga. It doesn't include Groonga's processing time.

Here is an example SQL to disable writing data to Groonga database:

mysql> SHOW VARIABLES LIKE 'mroonga_dry_write';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| mroonga_dry_write | OFF   |
+-------------------+-------+
1 row in set (0.00 sec)
mysql> SET mroonga_dry_write = true;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'mroonga_dry_write';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| mroonga_dry_write | ON    |
+-------------------+-------+
1 row in set (0.00 sec)


Whether enable optimization or not. The default value is ON that means optimization is enabled. Usually we don't need to change the value of this variable. This variable is useful for benchmark.

Here is an example SQL to disable optimization:

mysql> SHOW VARIABLES LIKE 'mroonga_enable_optimization';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| mroonga_enable_optimization | ON    |
+-----------------------------+-------+
1 row in set (0.00 sec)
mysql> SET mroonga_enable_optimization = false;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'mroonga_enable_optimization';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| mroonga_enable_optimization | OFF   |
+-----------------------------+-------+
1 row in set (0.00 sec)


The status of libgroonga supports LZ4.

Here is an example SQL to confirm the status of libgroonga supports LZ4:

mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_libgroonga_support_lz4';
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| mroonga_libgroonga_support_lz4 | ON    |
+--------------------------------+-------+


The status of libgroonga supports zlib.

Here is an example SQL to confirm the status of libgroonga supports zlib:

mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_libgroonga_support_zlib';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| mroonga_libgroonga_support_zlib | ON    |
+---------------------------------+-------+


The status of libgroonga supports Zstandard.

Here is an example SQL to confirm the status of libgroonga supports Zstandard:

mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_libgroonga_support_zstd';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| mroonga_libgroonga_support_zstd | ON    |
+---------------------------------+-------+


The version string of the groonga library.

Here is an example SQL to confirm the using groonga library version:

mysql> SHOW VARIABLES LIKE 'mroonga_libgroonga_version';
+----------------------------+------------------+
| Variable_name              | Value            |
+----------------------------+------------------+
| mroonga_libgroonga_version | 1.2.8-9-gbf05b82 |
+----------------------------+------------------+
1 row in set (0.00 sec)


TODO:

The path of the log file of Mroonga. The default value is groonga.log.

Here is an example transcript to change log file to /tmp/mroonga.log:

mysql> SHOW VARIABLES LIKE 'mroonga_log_file';
+------------------+-------------+
| Variable_name    | Value       |
+------------------+-------------+
| mroonga_log_file | groonga.log |
+------------------+-------------+
1 row in set (0.00 sec)
mysql> SET GLOBAL mroonga_log_file = "/tmp/mroonga.log";
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'mroonga_log_file';
+------------------+------------------+
| Variable_name    | Value            |
+------------------+------------------+
| mroonga_log_file | /tmp/mroonga.log |
+------------------+------------------+
1 row in set (0.00 sec)


The output level of Mroonga log file. The default value is NOTICE.

Here is the list of mroonga_log_level which you can use.

Log level Description
NONE No logging output.
EMERG Logging emergency messages such as database corruption.
ALERT Logging alert messages such as internal error.
CRIT Logging critical message such as deadlock.
ERROR Logging error messages such as API error which Mroonga use.
WARNING Logging warning messages such as invalid argument.
NOTICE Logging notice messages such as configuration or status changed.
INFO Logging informative messages such as file system operation.
DEBUG Logging debug messages. Recommend to use for Mroonga developer or bug reporter.
DUMP Logging dump messages.

Here is an example transcript to change log level to DEBUG that logs many messages for debugging:

mysql> SHOW VARIABLES LIKE 'mroonga_log_level';
+-------------------+--------+
| Variable_name     | Value  |
+-------------------+--------+
| mroonga_log_level | NOTICE |
+-------------------+--------+
1 row in set (0.00 sec)
mysql> SET GLOBAL mroonga_log_level = "debug";
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'mroonga_log_level';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| mroonga_log_level | DEBUG |
+-------------------+-------+
1 row in set (0.00 sec)


The path of the query log file of Mroonga. The default value is empty.

If this value is empty, the query log is not stored to file. If this value is not empty, query log is stored to the specified file.

Here is an example transcript to change query log file to /tmp/mroonga_query.log:

mysql> SHOW VARIABLES LIKE 'mroonga_query_log_file';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| mroonga_query_log_file |       |
+------------------------+-------+
1 row in set (0.00 sec)
mysql> SET GLOBAL mroonga_query_log_file = "/tmp/mroonga.log";
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'mroonga_query_log_file';
+------------------------+------------------+
| Variable_name          | Value            |
+------------------------+------------------+
| mroonga_query_log_file | /tmp/mroonga.log |
+------------------------+------------------+
1 row in set (0.00 sec)


The threshold to determin whether match method is escalated. See search specification for Groonga about match method escalation.

The default value is the same as Groonga's default value. It's 0 for the default installation. The dafault value can be configured in my.cnf or by SET GLOBAL mroonga_match_escalation_threshold = THRESHOLD;. Because this variable's scope is both global and session.

Here is an example to use -1 as a threshold to determin whether match method is escalated. -1 means that never escalated.

SET GLOBAL mroonga_match_escalation_threshold = -1;


Here is an another example to show behavior change by the variable value.

CREATE TABLE diaries (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title TEXT,
  tags TEXT,
  FULLTEXT INDEX tags_index (tags) COMMENT 'tokenizer "TokenDelimit"'
) ENGINE=mroonga DEFAULT CHARSET=UTF8;
-- Test data
INSERT INTO diaries (title, tags) VALUES ("Hello groonga!", "groonga install");
INSERT INTO diaries (title, tags) VALUES ("Hello mroonga!", "mroonga install");
-- Matches all records that have "install" tag.
SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("install" IN BOOLEAN MODE);
-- id        title   tags
-- 1 Hello groonga!  groonga install
-- 2 Hello mroonga!  mroonga install
-- Matches no records by "gr" tag search because no "gr" tag is used.
-- But matches a record that has "groonga" tag because search
-- method is escalated and prefix search with "gr" is used.
-- The default threshold is 0. It means that no records are matched then
-- search method is escalated.
SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("gr" IN BOOLEAN MODE);
-- id        title   tags
-- 1 Hello groonga!  groonga install
-- Disables escalation.
SET mroonga_match_escalation_threshold = -1;
-- No records are matched.
SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("gr" IN BOOLEAN MODE);
-- id        title   tags
-- Enables escalation again.
SET mroonga_match_escalation_threshold = 0;
-- Matches a record by prefix search with "gr".
SELECT * FROM diaries WHERE MATCH (tags) AGAINST ("gr" IN BOOLEAN MODE);
-- id        title   tags
-- 1 Hello groonga!  groonga install


New in version 5.02.

TODO:

Whether recording operations for recover is enabled or not. The default value is OFF that means operations are not recorded to Groonga database. It needs to reopen the database with FLUSH TABLES in order to reflect the variable is changed.

Here is an example SQL to disable operations recording:

mysql> SET GLOBAL mroonga_enable_operations_recording = false;
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH TABLES;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_enable_operations_recording';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| mroonga_enable_operations_recording | OFF   |
+-------------------------------------+-------+


The delimiter when outputting a vector column. The default value is a white space.

Here is an example SQL to change the delimiter to a semicolon from a white space:

mysql> SHOW VARIABLES LIKE 'mroonga_vector_column_delimiter';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| mroonga_vector_column_delimiter |       |
+---------------------------------+-------+
1 row in set (0.00 sec)
mysql> SET GLOBAL mroonga_vector_column_delimiter = ';';
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW GLOBAL VARIABLES LIKE 'mroonga_vector_column_delimiter';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| mroonga_vector_column_delimiter | ;     |
+---------------------------------+-------+


The version string of Mroonga.

Here is an example SQL to confirm the running mroonga version:

mysql> SHOW VARIABLES LIKE 'mroonga_version';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| mroonga_version | 1.10  |
+-----------------+-------+
1 row in set (0.00 sec)


New in version 7.10.

It controls how to enable condition push down support.

The default value is ONE_FULL_TEXT_SEARCH. It means that condition push down is enabled only when WHERE clause has one MATCH AGAINST condition.

Here are available values:

Value Description
NONE Never use condition push down.
ALL Always use condition push down. It's experimental for now.
ONE_FULL_TEXT_SEARCH Use condition push down only when one MATCH AGAINST condition exists in WHERE. It's the default.

Here is an example SQL to confirm the current value:

mysql> SHOW VARIABLES LIKE 'mroonga_condition_push_down_type';
+----------------------------------+----------------------+
| Variable_name                    | Value                |
+----------------------------------+----------------------+
| mroonga_condition_push_down_type | ONE_FULL_TEXT_SEARCH |
+----------------------------------+----------------------+
1 row in set (0.00 sec)


Here are the explanations of status variables that are introduced by Mroonga.

This value is increased when 'fast line count feature' is used. You can use this value to check if the feature is working when you enable it.

Here is an example how to check it:

mysql> SHOW STATUS LIKE 'Mroonga_count_skip';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Mroonga_count_skip | 0     |
+--------------------+-------+
1 row in set (0.00 sec)


This value is increased when 'fast ORDER BY LIMIT feature' is used. You can use this value to check if the feature is working when you enable it.

Here is an example how to check it:

mysql> SHOW STATUS LIKE 'Mroonga_fast_order_limit';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Mroonga_fast_order_limit | 0     |
+--------------------------+-------+
1 row in set (0.00 sec)


This value is increased when condition push down is used. You can use this value to check whether condition push down is used or not.

Here is an example how to check it:

mysql> SHOW STATUS LIKE 'Mroonga_condition_push_down';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Mroonga_condition_push_down | 1     |
+-----------------------------+-------+
1 row in set (0.00 sec)


This value shows the number of pooling contexts for mroonga_command(). These contexts are reused from multiple mroonga_command() calls. So, it's not increased continually.

Here is an example how to check it:

mysql> SHOW STATUS LIKE 'Mroonga_n_pooling_contexts';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Mroonga_n_pooling_contexts | 0     |
+----------------------------+-------+
1 row in set (0.00 sec)


Mroonga provides some useful functionality as UDF (User Defined Functions).

New in version 3.02.

mroonga_command() UDF executes the given string as a Groonga command and returns result of the Groonga command. Groonga command will be faster than MySQL query.

mroonga_command() is an UDF for advanced users. Normally, you don't need to use this UDF.

mroonga_command() has only one required parameter:

mroonga_command(command)


command is a string value. It's a Groonga command to be executed.

TODO

There is one required parameter, command.

It specifies a Groonga command to be executed.

It returns an evaluated result of the given Groonga command as a string.

Command in Groonga document.

New in version 3.08.

mroonga_escape UDF provides functionality to escape given string. It also accepts parameter what character should be escaped.

mroonga_escape() has required parameter and optional parameter:

mroonga_escape(string)
mroonga_escape(string, special_characters)


Here is the example query which use special characters to be escaped:

SELECT * FROM `symbols` WHERE MATCH(`content`) AGAINST(mroonga_escape("+hello_world()", "()") IN BOOLEAN MODE);


Here is the example about special characters which is escaped:

SELECT mroonga_escape("+-<>~*()\"\\\:");
-- \+\-\<\>\~\*\(\)\"\\\:


There is one required parameter, string.

It specifies text which you want to escape.

There is one optional parameter, special_characters.

It specifies characters to escape.

The default value is +-<>~*()"\\:.

It returns escaped string.

New in version 7.05.

mroonga_highlight_html() highlights the specified keywords in target text. It surrounds each keyword with <span class="keyword">...</span> and special characters in HTML such as < and > are escaped. You can use the result as is safely in HTML.

mroonga_highlight_html() has required parameter and optional parameter:

mroonga_highlight_html(text, query AS query)
mroonga_highlight_html(text, keyword1, ..., keywordN)


AS query is very important. You must specify it to extract keywords from query.

Here is a sample to highlight keywords "mroonga" and "groonga" in target text by query "mroonga OR groonga". You must specify AS query:

SELECT mroonga_highlight_html('Mroonga is the Groonga based storage engine.',
                              'mroonga OR groonga' AS query) AS highlighted;


Here is the result of the execution example:

+--------------------------------------------------------------------------------------------------------+
| highlighted                                                                                            |
+--------------------------------------------------------------------------------------------------------+
| <span class="keyword">Mroonga</span> is the <span class="keyword">Groonga</span> based storage engine. |
+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


Here is a sample to highlight keywords "mroonga" and "groonga" in target text by keywords "mroonga" and "groonga":

SELECT mroonga_highlight_html('Mroonga is the Groonga based storage engine.',
                              'mroonga', 'groonga') AS highlighted;


Here is the result of the execution example:

+--------------------------------------------------------------------------------------------------------+
| highlighted                                                                                            |
+--------------------------------------------------------------------------------------------------------+
| <span class="keyword">Mroonga</span> is the <span class="keyword">Groonga</span> based storage engine. |
+--------------------------------------------------------------------------------------------------------+


There is one required parameter.

The column name of string or string value to be highlighted.

There are some optional parameters.

Specify query in Groonga's query syntax.

You must specify AS query to extract keywords from query like the following:

SELECT mroonga_highlight_html('...', 'mroonga OR groonga' AS query);


Specify 0 or more keywords to be highlighted.

It returns highlighted HTML. If optional parameter is not given, it only escapes special characters in HTML such as <, > in text.

mroonga_last_insert_grn_id() UDF provides functionality to get id of last inserted record.

This id is unique in Groonga.

It returns id of last inserted record.

New in version 5.11.

mroonga_normalize() UDF normalizes text by Groonga's normalizer.

mroonga_normalize() has required parameter and optional parameter:

mroonga_normalize(string)
mroonga_normalize(string, normalizer_name)


Here is the example query which use Groonga's NormalizerAuto normalizer to be normalized:

SELECT mroonga_normalize("ABCDあぃうぇ㍑");
abcdあぃうぇリットル


Here is the example query which use Groonga's NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark normalizer to be normalized:

SELECT mroonga_normalize("aBcDあぃウェ㍑", "NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark");
ABCDあぃうぇ㍑


There is one required parameter, string.

It specifies text which you want to normalize.

There is one optional parameter, normalizer_name.

It specifies Groonga's normalizer name to normalize.

The default value is NormalizerAuto.

It returns normalized string.

New in version 2.07.

mroonga_snippet() UDF provides functionality to get keyword in context.

NOTE:

This feature is in experimental stage. So, the required arguments or value is changed without notice in the future.


mroonga_snippet() has required parameter and optional parameter:

mroonga_snippet(document,
                max_length,
                max_count,
                encoding,
                skip_leading_spaces,
                html_escape,
                snippet_prefix,
                snippet_suffix,
                word1, word1_prefix, word1_suffix,
                ...,
                [wordN, wordN_prefix, wordN_suffix])


There is a case that you want to extract keyword and surrounding text as a search results.

Snippet means 'keyword and surrounding text'. It is called 'Keyword in context'.

mroonga_snippet() function provides the way to get snippet from search results.

You can use mroonga_snippet() UDF in storage mode and wrapper mode.

mroonga_snippet() function is included in Mroonga as a User-Defined Function (UDF), but if you have not yet register it in MySQL by CREATE FUNCTION, you need to invoke the following SQL for defining a function.

mysql> CREATE FUNCTION mroonga_snippet RETURNS STRING SONAME 'ha_mroonga.so';


mroonga_snippet() function is useful for searching the text which contains keyword and associated one by using MATCH .. AGAINST syntax.

Imagine searching the document which contains 'fulltext' as a keyword. Assume that some keyword such as 'MySQL' and 'search' are associated with 'fulltext'.

mroonga_snippet() function meets above.

Here is the schema definition for execution examples(storage mode):

CREATE TABLE `snippet_test` (
  `id` int(11) NOT NULL,
  `text` text,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `text` (`text`)
) ENGINE=mroonga DEFAULT CHARSET=utf8


Here is the schema definition for execution examples(wrapper mode):

CREATE TABLE `snippet_test` (
  `id` int(11) NOT NULL,
  `text` text,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `text` (`text`)
) ENGINE=mroonga COMMENT 'engine = "innodb"' DEFAULT CHARSET=utf8


Here is the sample data for execution examples:

INSERt INTO snippet_test (id, text) VALUES (1, 'An open-source fulltext search engine and column store.');
INSERT INTO snippet_test (id, text) VALUES (2, 'An open-source storage engine for fast fulltext search with MySQL.');
INSERT INTO snippet_test (id, text) VALUES (3, 'Tritonn is a patched version of MySQL that supports better fulltext search function with Senna.');


Here is the results of execution examples:

mysql> SELECT * FROM snippet_test;
+----+-------------------------------------------------------------------------------------------------+
| id | text                                                                                            |
+----+-------------------------------------------------------------------------------------------------+
|  1 | An open-source fulltext search engine and column store.                                         |
|  2 | An open-source storage engine for fast fulltext search with MySQL.                              |
|  3 | Tritonn is a patched version of MySQL that supports better fulltext search function with Senna. |
+----+-------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
mysql> SELECT id, text, mroonga_snippet(text, 8, 2, 'ascii_general_ci', 1, 1, '...', '...<br>', 'fulltext', '<span class="w1">', '</span>', 'MySQL', '<span class="w2">', '</span>', 'search', '<span calss="w3">', '</span>') FROM snippet_test WHERE MATCH(text) AGAINST ('+fulltext' IN BOOLEAN MODE);
+----+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | text                                                                                            | mroonga_snippet(text, 8, 2, 'ascii_general_ci', 1, 1, '...', '...<br>', 'fulltext', '<span class="w1">', '</span>', 'MySQL', '<span class="w2">', '</span>', 'search', '<span calss="w3">', '</span>') |
+----+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | An open-source fulltext search engine and column store.                                         | ...<span class="w1">fulltext</span>...<br>... <span calss="w3">search</span> ...<br>                                                                                                                   |
|  2 | An open-source storage engine for fast fulltext search with MySQL.                              | ...<span class="w1">fulltext</span>...<br>... <span calss="w3">search</span> ...<br>                                                                                                                   |
|  3 | Tritonn is a patched version of MySQL that supports better fulltext search function with Senna. | ...f <span class="w2">MySQL</span> ...<br>...<span class="w1">fulltext</span>...<br>                                                                                                                   |
+----+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)


The keyword 'fulltext' and associated keyword 'MySQL' and 'search' has been extracted.

There are many required parameters.

The column name or string value is required.

The max length of snippet (bytes) is required.

The max elements of snippets (N word) is required.

The encoding of document is required. You can specify the value of encoding such as 'ascii_general_ci', 'cp932_japanese_ci', 'eucjpms_japanese_ci' and so on.

Specify whether skip leading spaces or not. Specify the value 1 for skipping leading spaces, 0 for not.

HTML escape is enabled or not. Specify the value 1 for enabling HTML escape, 0 for not.

The start text of snippet.

The end text of snippet.

There is one optional group parameter, wordN related prefix and suffix.

Specify any word.

It is the start text of wordN.

It is the end text of wordN.

It returns snippet string.

New in version 5.09.

mroonga_snippet_html() UDF provides functionality to get highlighted keyword in context.

NOTE:

This feature is in experimental stage. So, the required arguments or value is changed without notice in the future.


mroonga_snippet_html() has required parameter and optional parameter:

mroonga_snippet_html(document, key_word)


There are many required parameters.

It returns snippet string.

There are some limitations in Mroonga storage engine.

A table has the following limitations. This limitation is derived from Groonga.
  • The maximum one key size: 4KiB
  • The maximum total size of keys: 4GiB
  • The maximum number of records:
  • No primary key table: 1,073,741,815 (2 30 - 9)
  • PRIMARY KEY or PRIMARY KEY USING BTREE table: 1,073,741,823 (2 30 - 1)
  • PRIMARY KEY USING HASH table: 536,870,912 (2 29)


Keep in mind that these limitations may vary depending on conditions.

A full-text index has the following limitations. This limitation is derived from groonga and applied to each table.
  • The maximum number of distinct terms: 268,435,455 (more than 268 million)
  • The maximum index size: 256GiB

Keep in mind that these limitations may vary depending on conditions.

Confirm by SHOW TABLE STATUS or SHOW INDEX FROM ... whether your table data matches to this limitations.

There is a limitation about the value of column in storage mode.

Mroonga storage engine executes automatic conversion against the value NULL.

For example, if the value NULL is used in DATE or DATETIME columns, Mroonga storage engine automatically converts 0 into 1 as the value of month or date.

Thus, the value 0 is treated as the 1st month (January) of the year or the 1st date of the month.

And more, the value NULL is treated as the value of UNIX time 0 (1970-01-01 00:00:00).

This kind of automatic conversion is not restricted to only DATE or DATETIME types.

The value NULL is converted into the default value of columns. In most cases, it will be converted into empty string for column which belongs to type of string, 0 for column which belongs to type of numeric.

Here is an example to show behavior described above.

CREATE TABLE date_limitation (
  id INT PRIMARY KEY AUTO_INCREMENT,
  input varchar(32) DEFAULT NULL,
  date DATE DEFAULT NULL
) ENGINE=mroonga DEFAULT CHARSET=UTF8;
CREATE TABLE datetime_limitation (
  id INT PRIMARY KEY AUTO_INCREMENT,
  input varchar(32) DEFAULT NULL,
  datetime DATETIME DEFAULT NULL
) ENGINE=mroonga DEFAULT CHARSET=UTF8;
-- Test data for date_limitation
INSERT INTO date_limitation (input) VALUES ("NULL");
INSERT INTO date_limitation (input, date) VALUES ("1970-00-00", "1970-00-00");
-- Test data for datetime_limitation
INSERT INTO datetime_limitation (input) VALUES ("NULL");
INSERT INTO datetime_limitation (input, datetime) VALUES ("1970-00-00 00:00:00", "1970-00-00 00:00:00");


Here is the results of execution example:

mysql> select * from date_limitation;
+----+------------+------------+
| id | input      | date       |
+----+------------+------------+
|  1 | NULL       | 1970-01-01 |
|  2 | 1970-00-00 | 1970-01-01 |
+----+------------+------------+
2 rows in set (0.00 sec)
mysql> select * from datetime_limitation;
+----+---------------------+---------------------+
| id | input               | datetime            |
+----+---------------------+---------------------+
|  1 | NULL                | 1970-01-01 00:00:00 |
|  2 | 1970-00-00 00:00:00 | 1970-01-01 00:00:00 |
+----+---------------------+---------------------+
2 rows in set (0.00 sec)


A column has the following limitation.
The maximum stored data size of a column: 256GiB

Mroonga can perform boolean full text searches using the IN BOOLEAN MODE modifier for MATCH AGAINST:

SELECT ... WHERE MATCH(column) AGAINST ('...' IN BOOLEAN MODE);


Normally, IN BOOLEAN MODE is suitable rather than the default IN NATURAL LANGUAGE MODE. Because IN BOOLEAN MODE is similar to query in Web search engine. Most people familiar with query in Web search engine.

You can use qualifiers which MySQL support and Mroonga original pragmas in boolean full text search query.

These qualifiers and pragmas can change the relative rank of search results.

In the case of a search string not using neither a qualifier nor a pragma, the search results that contain the search string will be rated higher.

Here are schema and data to show examples:

CREATE TABLE books (
  `id` INTEGER AUTO_INCREMENT,
  `title` text,
  PRIMARY KEY(`id`),
  FULLTEXT INDEX title_index (title)
) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4;
INSERT INTO books (title) VALUES ('Professional MySQL');
INSERT INTO books (title) VALUES ('MySQL for Professional');
INSERT INTO books (title) VALUES ('Mroonga = MySQL + Groonga');


Here are supported qualifiers.

No operator between keywords such as KEYWORD1 KEYWORD2 indicates that one of keywords must be present in each row that is returned.

Mroonga for query means that Mroonga or for must be present:

SELECT title
  FROM books
 WHERE MATCH(title) AGAINST('Mroonga for' IN BOOLEAN MODE);
-- +---------------------------+
-- | title                     |
-- +---------------------------+
-- | Mroonga = MySQL + Groonga |
-- | MySQL for Professional    |
-- +---------------------------+


OR (must be uppercase) indicates that left hand side keyword or right hand side keyword must be present in each row that is returned.

Mroonga OR for query means that Mroonga or for must be present:

SELECT title
  FROM books
 WHERE MATCH(title) AGAINST('Mroonga OR for' IN BOOLEAN MODE);
-- +---------------------------+
-- | title                     |
-- +---------------------------+
-- | Mroonga = MySQL + Groonga |
-- | MySQL for Professional    |
-- +---------------------------+


OR is the default operator. You can omit it. Both Mroonga OR for and Mroonga for return the same result.

A leading plus sign indicates that this word must be present in each row that is returned.

+MySQL +Mroonga query means that both MySQL and Mroonga must be present:

SELECT title
  FROM books
 WHERE MATCH(title) AGAINST('+MySQL +Groonga' IN BOOLEAN MODE);
-- +---------------------------+
-- | title                     |
-- +---------------------------+
-- | Mroonga = MySQL + Groonga |
-- +---------------------------+


A leading minus sign indicates that this word must not be present in any of the rows that are returned.

+MySQL -Mroonga query means that MySQL must be present but Mroonga must not be present:

SELECT title
  FROM books
 WHERE MATCH(title) AGAINST('+MySQL -Mroonga' IN BOOLEAN MODE);
-- +------------------------+
-- | title                  |
-- +------------------------+
-- | Professional MySQL     |
-- | MySQL for Professional |
-- +------------------------+


A following asterisk indicates that all words starting with this word must be present in any of the rows that are returned.

+M* query means that words starting M (MySQL and Mroonga in this case) must be present:

SELECT title
  FROM books
 WHERE MATCH(title) AGAINST('+M*' IN BOOLEAN MODE);
-- +---------------------------+
-- | title                     |
-- +---------------------------+
-- | Mroonga = MySQL + Groonga |
-- | Professional MySQL        |
-- | MySQL for Professional    |
-- +---------------------------+


NOTE:

To be precise, "word" may not be "word" you think. "word" in this context is "token". "token" may not be word. For example, tokens in "It's" are "It", "'" and "s".

You can confirm token by mroonga_command() and tokenize:

SELECT mroonga_command('tokenize TokenBigram "It''s" NormalizerMySQLGeneralCI');
-- +--------------------------------------------------------------------------+
-- | mroonga_command('tokenize TokenBigram "It''s" NormalizerMySQLGeneralCI') |
-- +--------------------------------------------------------------------------+
-- | [                                                                        |
-- |   {                                                                      |
-- |     "value":"IT",                                                        |
-- |     "position":0,                                                        |
-- |     "force_prefix":false                                                 |
-- |   },                                                                     |
-- |   {                                                                      |
-- |     "value":"'",                                                         |
-- |     "position":1,                                                        |
-- |     "force_prefix":false                                                 |
-- |   },                                                                     |
-- |   {                                                                      |
-- |     "value":"S",                                                         |
-- |     "position":2,                                                        |
-- |     "force_prefix":false                                                 |
-- |   }                                                                      |
-- | ]                                                                        |
-- +--------------------------------------------------------------------------+


JSON value in the above result is formatted by hand.



Quoting phrase by double quote (") indicates that the phrase must be present in any of the rows that are returned.

+"Professional MySQL" query means that Professional MySQL phrase must be present. The query doesn't match to MySQL for Profession. MySQL for Profession includes both MySQL and Professional words but doesn't include Professional MySQL phrase:

SELECT title
  FROM books
 WHERE MATCH(title) AGAINST('+"Professional MySQL"' IN BOOLEAN MODE);
-- +--------------------+
-- | title              |
-- +--------------------+
-- | Professional MySQL |
-- +--------------------+


Parentheses groups expressions.

+(Groonga OR Mroonga) +MySQL query means the following:

  • Groonga or Mroonga must be present.
  • MySQL must be present.



Here is the result of the query:

SELECT title
  FROM books
 WHERE MATCH(title) AGAINST('+(Groonga OR Mroonga) +MySQL' IN BOOLEAN MODE);
-- +---------------------------+
-- | title                     |
-- +---------------------------+
-- | Mroonga = MySQL + Groonga |
-- +---------------------------+


Pragma is metadata for query. You can change how to parse query by specifying pragma.

You can embed pragma at the head of query for specifying how to execute.

Pragma must exist in the beginning of a query. Don't put a blank into a head of the query. Pragma starts with *:

SELECT MATCH AGAINST('*PRAGMA ...' IN BOOLEAN MODE);


You can specify multiple pragmas:

SELECT MATCH AGAINST('*PRAGMA1PRAGMA2 ...' IN BOOLEAN MODE);


Here are available pragmas.

D pragma indicates the default operator. It's used when an individual operator is omitted.

Here is the D pragma syntax. You can choose one of OR, + or - as ${OPERATOR}:

*D${OPERATOR}


DOR means that "or" is used as the default operator.

This is the default.

Here is an example to use DOR. '*DOR for Mroonga' IN BOOLEAN MODE returns records that includes for or Mroonga:

SELECT title
 FROM books
WHERE MATCH (title) AGAINST('*DOR for Mroonga' IN BOOLEAN MODE);
-- +---------------------------+
-- | title                     |
-- +---------------------------+
-- | MySQL for Professional    |
-- | Mroonga = MySQL + Groonga |
-- +---------------------------+


D+ means that "and" is used as the default operator. It's similar to query in Web search engine.

Here is an example to use D+. '*D+ MySQL Mroonga' IN BOOLEAN MODE returns records that includes MySQL and Mroonga:

SELECT title
 FROM books
WHERE MATCH (title) AGAINST('*D+ MySQL Mroonga' IN BOOLEAN MODE);
-- +---------------------------+
-- | title                     |
-- +---------------------------+
-- | Mroonga = MySQL + Groonga |
-- +---------------------------+


D- means that "not" is used as the default operator.

Here is an example to use D-. '*D- MySQL Mroonga' IN BOOLEAN MODE returns records that includes MySQL but doesn't include Mroonga:

SELECT title
 FROM books
WHERE MATCH (title) AGAINST('*D- MySQL Mroonga' IN BOOLEAN MODE);
-- +------------------------+
-- | title                  |
-- +------------------------+
-- | Professional MySQL     |
-- | MySQL for Professional |
-- +---------------------------+


W pragma indicates target section and its weight for multiple column index.

You can specify different weight for each section. The default weight is 1. 1 means that no weight.

Here is the W pragma syntax. ${SECTION} is a number that is begun not from 0 but from 1. ${WEIGHT} is omitable:

*W[${SECTION1}[:${WEIGHT1}]][,${SECTION2}[:${WEIGHT2}]][,...]


Here are schema and data to show examples. You need to create a multiple column index to use W pragma:

CREATE TABLE memos (
  `id` INTEGER AUTO_INCREMENT,
  `title` text,
  `content` text,
  PRIMARY KEY(`id`),
  FULLTEXT INDEX text_index (title, content)
) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4;
INSERT INTO memos (title, content) VALUES (
  'MySQL', 'MySQL is a RDBMS.'
);
INSERT INTO memos (title, content) VALUES (
  'Groonga', 'Groonga is a full text search engine.'
);
INSERT INTO memos (title, content) VALUES (
  'Mroonga', 'Mroonga is a storage engine for MySQL based on Groonga.'
);


Here is an example to show how to use weight. title column has 10 weight and content columns has 1 weight. It means that keyword in title column is 10 times important than keyword in content column:

SELECT title,
       content,
       MATCH (title, content) AGAINST('*W1:10,2:1 +Groonga' IN BOOLEAN MODE) AS score
  FROM memos;
-- +---------+--------------------------------------------------------+-------+
-- | title   | content                                                | score |
-- +---------+--------------------------------------------------------+-------+
-- | MySQL   | MySQL is a RDBMS.                                      |     0 |
-- | Groonga | Groonga is a full text search engine.                  |    11 |
-- | Mroonga | Mroonga is a storage engine for MySQL based on Groonga |     1 |
-- +---------+--------------------------------------------------------+-------+


The score of the first record is 0. Because it doesn't have any Groonga in both title column and content column.

The score of the second record is 11. Because it has Groonga in both title column and content column. Groonga in title column has score 10. Groonga in content column has score 1. 11 is sum of them.

The score of the third record is 1. Because it has Groonga in only content column. `Groonga in content column has score 1. So the score of the record is 1.

S pragma indicates syntax of the query.

Here is a syntax of S pragma:

*S${SYNTAX}


Here is a list of available syntax:

S: Script syntax



You can use script syntax by *SS pragma. You can use full Groonga search features in script syntax.

Here are schema and data to show example of script syntax usage:

CREATE TABLE comments (
  `content` text,
  FULLTEXT INDEX content_index (content)
) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4;
INSERT INTO comments VALUES (
  'A student started to use Mroonga storage engine. It is very fast!'
);
INSERT INTO comments VALUES (
  'Another student also started to use Mroonga storage engine. It is very fast!'
);


Here is an example to use near search by script syntax:

SELECT content,
       MATCH (content) AGAINST('*SS content *N "student fast"' IN BOOLEAN MODE) AS score
  FROM comments;
-- +------------------------------------------------------------------------------+-------+
-- | content                                                                      | score |
-- +------------------------------------------------------------------------------+-------+
-- | A student started to use Mroonga storage engine. It is very fast!            |     1 |
-- | Another student also started to use Mroonga storage engine. It is very fast! |     0 |
-- +------------------------------------------------------------------------------+-------+


Near search matches only when there are 10 or less words between specified words (student and fast in this case). So student started ...(8 words)... very fast is matched but student also started ...(8 words)... very fast isn't matched.

You can also use other advanced features.

The search score of Mroonga in natural language mode is a similarity score between query and document. Mroonga's scoring algorithm is as follows:
1.
Splitting a query into tokens.
2.
Removing not matched tokens.
3.
Calculating weight per token.
4.
Getting the top N weight tokens.
5.
Summing up the weight per token which exists in a document, and is one of the top N weight tokens. The total weight is the similarity score for the document and the query.



We are showing by example. The first thing we create a table and insert data as follows:

SET NAMES UTF8;
CREATE TABLE diaries (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  content TEXT,
  FULLTEXT INDEX(content)
) ENGINE mroonga DEFAULT CHARSET UTF8;
INSERT INTO diaries (content) VALUES("It'll be fine tomorrow as well.");
INSERT INTO diaries (content) VALUES("It'll rain tomorrow.");
INSERT INTO diaries (content) VALUES("It's fine today. It'll be fine tomorrow as well.");
INSERT INTO diaries (content) VALUES("It's fine today. But it'll rain tomorrow.");


We use a query "fine today" to search. The search result is as follows:

mysql> SELECT *, MATCH (content) AGAINST ("fine today") AS score
    ->        FROM diaries
    ->        WHERE MATCH (content) AGAINST ("fine today")
    ->        ORDER BY MATCH (content) AGAINST ("fine today") DESC;
+----+--------------------------------------------------+--------+
| id | content                                          | score  |
+----+--------------------------------------------------+--------+
|  3 | It's fine today. It'll be fine tomorrow as well. | 131073 |
|  4 | It's fine today. But it'll rain tomorrow.        | 131073 |
+----+--------------------------------------------------+--------+
2 rows in set (0.01 sec)


Now, let us explain how to make the result score "131073".

The default tokenizer splits up the query "fine today" into two tokens as follows:
  • fine
  • today

We have nothing to do in this case because the tokens exist in documents.
  • fine: include in document id=1,3,4
  • today: include in document id=3,4

  • fine: 116508(= 1048576 / 9)
  • today: 131072(= 1048576 / 8)

1048576(= 2 ** 20) denotes the number of tokens in all documents. We should get the number from the table, but as a matter of fact, the fixed value is used for simplicity.

In the fraction 1048576 / 8, the denominator "8" indicates that the number of document which contains token "today".

In the fraction 1048576 / 9, the denominator "9" means the number of document which contains token "fine".

The denominator "8" and "9" are approximate number, strictly speaking, "2" is the correct answer for the number of document that contains "today", "3" is the correct answer for the number of document that contains "fine".

You can check the approximate number of token by using the following query.

SELECT mroonga_command("select diaries-content --query '_key:fine OR _key:today' --output_columns _key, index --limit -1") AS groonga_response;


The retrieval result of the above query is as follows:

[[[2],[["_key","ShortText"],["index","diaries"]],["FINE",9],["TODAY",8]]]


The formula for the N is "the number of occurrence of token in document / 8 + 1"

In this case, N = 2 / 8 + 1 ≒ 1

The sorted tokens are as follows:

  • today: 131072(= 1048576 / 8)
  • fine: 116508(= 1048576 / 9)

N = 1, then we get "today: 131072(= 1048576 / 8)".

today: include in document id=3,4

Finally document id 3,4 are hit, the similarity score between query and document(id=3) is 131072 + 1 = 131073 ("1" is the number of occurrence of token "today" in document).

The similarity score between query and document(id=4) is the same as the score between query and document(id=3).

There are some frequently asked questions for Mroonga. This section describes about them.

Increase vm.max_map_count Linux kernel parameter value.

There is a case following mmap error in log file:

2013-06-04 08:19:34.835218|A|4e86e700|mmap(4194304,551,432017408)=Cannot allocate memory <13036498944>


Note that <13036498944> means total size of mmap (almost 12GB) in this case.

So you need to confirm following point of views.

  • Are there enough free memory?
  • Are maximum number of mappings exceeded?

To check there are enough free memory, you can use vmstat command.

To check whether maximum number of mappings are exceeded, you can investigate the value of vm.max_map_count.

If this issue is fixed by modifying the value of vm.max_map_count, it's exactly the reason.

As Groonga allocates memory chunks each 256KB, you can estimate the size of database you can handle by following formula:

(database size) = vm.max_map_count * (memory chunks)


If you want to handle over 16GB Groonga database, you must specify at least 65536 as the value of vm.max_map_count:

database size (16GB) = vm.max_map_count (65536) * memory chunks (256KB)


You can modify vm.max_map_count temporary by sudo sysctl -w vm.max_map_count=65536.

Then save the configuration value to /etc/sysctl.conf or /etc/sysctl.d/*.conf.

There are some places for sharing Groonga and Mroonga information. We welcome you to join our community.

There are mailing lists for discussion about Groonga and Mroonga.
For English speakers
groonga-talk@lists.sourceforge.net
For Japanese speakers
groonga-dev@lists.osdn.me

There are chat rooms for discussion about Groonga and Mroonga.
For English speakers
groonga/en chat room on Gitter
For Japanese speakers
groonga/ja chat room on Gitter

@groonga tweets Groonga and Mroonga related information.

Please follow the account to get the latest Groonga and Mroonga related information!

Mroonga page on Facebook shares Mroonga related information.

Please like the page to get the latest Mroonga related information!

We welcome your contributions to the Mroonga Project. There are many ways to contribute, such as using Mroonga, introduction to others, etc. For example, if you find a bug when using Mroonga, you are welcome to report the bug. Coding and documentation are also welcome for Mroonga and its related projects.
As a user:
If you are interested in Mroonga, please read this document and try it.
As a spokesman:
Please introduce Mroonga to your friends and colleagues.
As a developer: Bug report, development and documentation
This section describes the details.

There are two ways to report a bug:
  • Submit a bug to the issue tracker
  • Report a bug to the mailing list



You can use either way It makes no difference to us.

Mroonga project uses GitHub issue tracker.

You can use English or Japanese to report a bug.

Mroonga project has Community for discussing about Mroonga. Please send an Email that describes a bug.

There may be a case that Mroonga doesn't work as expected. It may be a crash bug or SQL returns unexpected search results. To solve such a issues with you, We need a reproducible bug report.

For a reproducible bug report, it is better to report with the following information as much as you can because it is useful to investigate the reason which causes the bug.

Name Description
Environment information Installed packages information to detect what version is used. rpm: 0.0 3.5 rpm -qa > INSTALLED.txt 168u 168u deb: 0.0 3.5 dpkg -l > INSTALLED.txt 168u 168u On Windows, show us what zip package is used.


Executing query which triggers the issue It shows what you did.
Sample schema and data It is recommended to provide a minimum schema and sample data to reproduce the issue. It helps to solve the issue sooner.
MySQL error log It may include backtrace information why crashed unexpectedly. It may exists on /var/log/mysqld.log for example. (It depends on log-error configuration)
groonga.log It may include low level error information. It may exists on /var/lib/mysql/groonga.log for example. (It depends on mroonga_log_file configuration)

There is a case that it is not enough to reproduce a bug with above information. For such a case, additional information is required.

Name Description
MySQL dump It may be required to construct same database from scratch. It may be generated by mysqldump -u USER -pPASSWORD --opt --flush-logs --single-transaction --master-data=2 --default-character-set=utf8 --hex-blob --databases DATABASE > dump.sql and so on.
MySQL binlog If dump data is enough to reproduce the issue, it is not required. It may exists on /var/lib/mysql/mysql-bin.* for example. (It depends on log-bin configuration)
MySQL data directory (Offline backup) If dump data and binlog pair is lost, It may be required. It may exists on /var/lib/mysql/ for example. (It depends on datadir configuration)

We use Sphinx for documentation tool.

具体的にどんなことをやればいいのかを説明します。基本的にはドキュメントのソースファイルごとにpull requestを送ってもらうと進めやすいです。

対象となるファイルはdoc/sourceディレクトリ以下の拡張子が「.rst」となっているファイルです。

あまりGitHubでの作業に慣れていなくてもできるように、「最初にやること」と「作業ごとにやること」、「ファイルごとにやること」に分けて順に説明します。

  • The things you must do at first
  • The things you need to do every tasks
  • The things you need to do every files

以下では、最初に一度だけ実施しておけば良いことを説明します。

TODO

まずは、gitの設定をしましょう。すでにある程度gitを使っている場合には初期設定はすでに完了しているかも知れません。その場合には飛ばして構いません。:

% git config --global user.name "Your Name"
% git config --global user.email "Email address"


上記はコミットログに使われます。公開しても差し支えないユーザ名もしくはメールアドレスを設定します。

First, create GitHub account. If your GitHub account is ready, login to GitHub and access following URL.
Fork the Mroonga repository

Fork リポジトリ選択画面でご自分のリポジトリへとforkしてください。

Clone Mroonga repository to working directory. Don't forget to do "Git configuration".:

% git clone git@github.com:(YOUR_GITHUB_ACCOUNT)/mroonga.git
% cd mroonga
% git remote add upstream git@github.com:mroonga/mroonga


Execute following commands to prepare for generating Mroonga documentation:

% ./autogen.sh
% ./configure --enable-document --with-mysql-source=(SOURCE_DIRECTORY_OF_MySQL)


Next step is "The things you need to do every tasks".

以下では作業ごとにやることを説明します。

Mroonga本家の最新状態に追従して、作業がかぶらないようにします。:

% git fetch --all
% git checkout master
% git rebase upstream/master


最新の状態に追従できたら、「ファイルごとにやること」へと進みます。

以下では、例えば http://mroonga.org/docs/characteristic.html を更新する場合で説明します。作業対象となるファイルは、リポジトリのdoc/source/ディレクトリ以下にあり拡張子が.rstなファイルです。今回は、doc/source/characteristic.rstを変更する例で説明します。

Create a working branch. Use meaningful branch name.
% git checkout -b use-capitalized-notation-characteristic


Fix typos, styles or write a new document for Mroonga.

マークアップに問題がないか、HTMLを確認します。HTMLを生成するには以下のコマンドを実行します。:

% cd doc/locale/en
% make html


いつも使っているブラウザで該当ファイルを確認して、変更した内容が反映されていればOKです。:

% firefox html/characteristic.html


HTMLに問題がないことを確認できたら、コミットします。:

% cd ${cloneしたディレクトリーのトップディレクトリー}
% git add doc/source/characteristic.rst
% git commit


コミットするときのメッセージについては、例えば以下のようにします。:

doc: use "Mroonga" notation


Publish your changes to your own GitHub repository:

% git push -u origin use-capitalized-notation-characteristic


Note that use-capitalized-notation-characteristic is already created branch in advance.

ブラウザで https://github.com/(GitHubのアカウント)/mroonga を開くと「 @use-capitalized-notation-characteristic@ 」ブランチをpull requestする!みたいなUIができているので、そこのボタンを押してpull requestしてください。入力フォームがでてきますが、コミットしたときメッセージで十分なのでそのままpull requestしてOKです!

これで、ひととおりの作業は完了しました。

When you build software for debugging, you can get more information like symbol resolutions in gdb. So we build both MySQL and Mroonga for debugging in development.

NOTE:

If you build one of them for debugging, the size of structures etc. might be different, and you might not be able to load Mroonga, or assertions don't work in running.


As you can see in MySQL :: MySQL 5.5 Reference Manual :: 2.9.2 Installing MySQL from a Standard Source Distribution, you can build MySQL for debugging by passing -DWITH_DEBUG=yes option in CMAKE options.

The procedure from download to build is the following.

% mkdir -p ~/work/
% cd ~/work/
% wget http://ftp.jaist.ac.jp/pub/mysql/Downloads/MySQL-5.5/mysql-5.5.13.tar.gz
% tar xvzf mysql-5.5.13.tar.gz
% cd mysql-5.5.13
% cmake . -DCMAKE_INSTALL_PREFIX=/tmp/local -DWITH_DEBUG=yes
% make


You can build Mroonga for debugging by passing --with-debug in configure options.

The procedure from cloning repository to build is the following.

% cd ~/work/
% git clone git@github.com:mroonga/mroonga.git
% cd mroonga
% ./autogen.sh
% ./configure CFLAGS="-g3 -O0" CXXFLAGS="-g3 -O0" --with-debug --prefix=/tmp/local --with-mysql-source=$HOME/work/mysql-5.5.13 --with-mysql-config=$HOME/work/mysql-5.5.13/scripts/mysql_config
% make


When you successfully build both, please invoke tests like the following. If you get [pass] for all tests, you succeeded to build for debugging.

% test/run-sql-test.sh


run-sql-test.sh is our friend for debugging. Here we show some examples of its usage.

When you invoke run-sql-test.sh without any option, all tests under mysql-test/mroonga will be invoked.

So if you want to run certain tests only, you can specify the test name in --do-test option.

./test/run-sql-test.sh --do-test=foobar


When you run tests by adding --debug option like the following, function calls information is recorded.

./test/run-sql-test.sh --debug --do-test=foobar


This information is stored in ${MySQL's working directory}/${MySQL version}/mysql-test/var/log/mysqld.1.trace.

When you add a new function, it would be a good idea to put it in the beginning of MRN_DBUG_ENTER_FUNCTION function and record its calls.

By adding --gdb option, you can debug with GDB when you run tests.

./test/run-sql-test.sh --gdb


Here is the requirements about release procedure.
  • Debian GNU/Linux (sid)
  • Use zsh as user shell

Use the following working directories.

  • MROONGA_DIR=$HOME/work/mroonga
  • MROONGA_CLONE_DIR=$HOME/work/mroonga/mroonga.clean
  • MROONGA_GITHUB_COM_PATH=$HOME/work/mroonga/mroonga.github.com
  • CUTTER_DIR=$HOME/work/cutter
  • CUTTER_SOURCE_PATH=$HOME/work/cutter/cutter
  • GROONGA_SOURCE_PATH=$HOME/work/groonga/groonga.clean
  • PACKAGES_GROONGA_ORG_REPOSITORY=$HOME/work/groonga/packages.groonga.org

Install the following packages:

% sudo apt-get install -V ruby mecab libmecab-dev gnupg2 dh-autoreconf python-sphinx bison


Summarize recent changes since the previous release into doc/source/news.txt. This summary is also used in release note.

Execute the following command to collect change logs since the previous version:

% git log -p --reverse $(git tag | tail -1)..


Search the logs by ^commit, and pick up collect remarkable changes.

Should be included

  • The changes which affect to users
  • The changes which broke compatibility

Shoud not be included

The changes which doesn't affect to users (Internal source code changes or refactoring)

Because of master branch doesn't include configure script, then it is required to generate configure script for building.

Execute autogen.sh with the following command:

% sh autogen.sh


It generates configure script.

For generating Makefile, execute the configure script.

Execute with the following options for the release:

% ./configure \
    --enable-document \
    --prefix=/tmp/local \
    --with-launchpad-uploader-pgp-key=(The key ID that is registered Launchpad) \
    --with-mroonga-github-com-path=$MROONGA_GITHUB_COM_PATH \
    --with-cutter-source-path=$CUTTER_SOURCE_PATH \
    --with-groonga-source-path=$GROONGA_SOURCE_PATH \
    --with-mysql-source=(The directory of MySQL source code) \
    --with-mysql-build=(The build directory of MySQL) \
    --with-mysql-config=(The path to mysql_config command)


See Others for the details of --with-mysql-source option.

NOTE:

We can not upload and override the same name packages to PPA repository. Therefore, we upload packages to a stable repository after confirming the successful build of they in the nightly repository.


Check whether you can login to packages.groonga.org as packages user.

You can check with the following command whether you can login:

% ssh packages@packages.groonga.org


If you can't login to packages.groonga.org, you must be registered ssh public key.

Execute make update-latest-release command with OLD_RELEASE_DATE, NEW_RELEASE_DATE.

When 9.09 release, we executed the following command:

% make update-latest-release OLD_RELEASE=9.09 OLD_RELEASE_DATE=2019-09-27 NEW_RELEASE_DATE=2019-10-30


This command updates some html files (which is used for web sites of Mroonga - index.html,ja/index.html) and the version of spec file or debian/changelog entry.

We confirm the results of all the below tests before setting the tag to Mroonga. Because if we will find problems in Mroonga after setting the tag to it, we must release it again.
GitHub Actions

Execute the following command for tagging:

% make tag
% git push --tags origin


NOTE:

After tagging for the release, execute configure script. This tag information is reflected when generating the documents.


Then, create archive file (tar.gz) for distribution:

% make dist


Change working directory to packages/source:

% cd packages/source


Execute make download for syncing with the upstream:

% make download


Execute make archive for generating source archive:

% make archive


Execute make upload for uploading archive file:

% make upload


As a result, tar.gz archive file is available from https://packages.groonga.org/source/mroonga/.

Create Linux and Windows packages.

Change working directory to packages:

% cd packages


Execute the following command:

% rake apt


Now we finish build and upload packages to https://packages.groonga.org/. However, these packages are unsigned. We sign packages by executing the below commands:

% cd $PACKAGES_GROONGA_ORG_REPOSITORY
% rake apt


For Ubuntu, packages are provided by PPA on launchpad.net.

Change working directory to packages and execute rake ubuntu:upload command:

% cd packages
% rake ubuntu


When upload packages was succeeded, package build process is executed on launchpad.net. Then build result is notified via E-mail. You can install packages via Groonga PPA on launchpad.net:

https://launchpad.net/~groonga/+archive/ubuntu/ppa


Change working directory to packages

% cd packages


Execute the following command:

% rake yum


Now we finish build and upload packages to https://packages.groonga.org/. However, these packages are unsigned. We sign packages by executing the below commands:

% cd $PACKAGES_GROONGA_ORG_REPOSITORY
% rake yum


For windows packages, we use artifacts of GitHub release page .

1.
Clone mroonga.github.com repository
2.
Execute make update-document as below in $MROONGA_CLONE_DIR:

% make update-document DOCUMENT_VERSION=xx.xx_VERSION DOCUMENT_VERSION_FULL=xx.xx


3.
Commit changes in mroonga.github.com repository && push them

We update the below files.
  • $MROONGA_GITHUB_COM_PATH/ja/_posts/(the date of release)-mroonga-(version).md
  • $MROONGA_GITHUB_COM_PATH/en/_posts/(the date of release)-mroonga-(version).md

We can confirm contents of blog on Web browser by using Jekyll.:

% jekyll serve --watch


We access http://localhost:4000 on our web browser for confirming contents.

NOTE:

If we want private to blog contents, we set false on published: in .md file.:

---
layout: post.en
title: Mroonga 10.01 has been released!
description: Mroonga 10.01 has been released!
published: false
---




Send release announce for each mailing list
  • ml@mysql.gr.jp for Japanese
  • groonga-dev@lists.osdn.me for Japanese
  • groonga-talk@lists.sourceforge.net for English

Click Tweet link in Mrooga blog entry. You can share tweet about latest release. If you use tweet link, title of release announce and URL is embedded into your tweet.

Execute sharing tweet in Japanese and English version of blog entry. Note that this tweet should be done when logged in by groonga account.

We announce release from Mroonga group in Facebook.

https://www.facebook.com/mroonga/

Bump version to the latest release:

% make update-version NEW_VERSION_MAJOR=9 NEW_VERSION_MINOR=1 NEW_VERSION_MICRO=0


TODO: Translate.

一般的に1つのコードベースに複数のスタイルがまざっているとソースコードが読みづらくなります。たとえ、それぞれのスタイル単独では読みやすいスタイルあっても、まざると読みづらくなります。そのため、Mroongaプロジェクトでもスタイルを統一して、読みやすいソースコードになるようにします。

読みやすいソースコードにする理由は以下の通りです。

  • 機能を拡張するときのコストを下げる。
  • 問題を修正するときのコストを下げる。



どちらの場合も周辺のソースコードを読んで、それをベースにコードを追加・変更します。このとき、ソースコードが読みやすい状態だと周辺のソースコードの把握をスムーズに行うことができ、スムーズにその後の作業に移れます。

TODO: 読みやすさの他にデバッグのしやすさ(gdbでの追いやすさ)も考慮に入れたほうがよさそうだがどうしよう。

基本的にすべてC++で記述します。よほどのことがない限りCは使いません。

よい例:

ha_mroonga.cpp


悪い例(C言語を使っている):

mrn_sys.c


ソースコードのファイル名は全て小文字にします。また、単語ごとに"_"で区切ります。

よい例:

ha_mroonga.cpp


悪い例(大文字を使っている):

HA_MROONGA.cpp


悪い例(単語を"_"で区切らずにくっつけている):

hamroonga.cpp


悪い例(単語を"-"で区切っている):

ha-mroonga.cpp


ソースコードの拡張子 .cpp にします。

よい例:

ha_mroonga.cpp


悪い例( .cc を使っている):

ha_mroonga.cc


ヘッダーファイルの拡張子は .hpp にします。

よい例:

ha_mroonga.hpp


悪い例( .h を使っている):

ha_mroonga.h


ヘッダーファイルでは using namespace を使わない。ソースコードでは using namespace std であれば使ってもよい。他の名前空間は使ってはいけない。

よい例:

ha_mroonga.cpp:
  using namespace std;


悪い例(ヘッダーファイル内で using namespace を使っている):

ha_mroonga.hpp:
  using namespace std;


悪い例( std 以外の名前空間に対して using namespace を使っている):

ha_mroonga.cpp:
  using namespace zmq;


2重 include を防ぐためのマクロは、ヘッダーファイルの名前をすべて大文字にし、単語の区切りをアンダースコアにしたものにし、最後にアンダースコアをつけて HEADER_FILE_NAME_HPP_ という名前にする。

よい例:

mrn_db_path.hpp:
  #ifndef MRN_DB_PATH_HPP_
  #define MRN_DB_PATH_HPP_
  ...
  #endif // MRN_DB_PATH_HPP_


悪い例(小文字になっている):

mrn_db_path.hpp:
  #ifndef mrn_db_path_hpp_
  #define mrn_db_path_hpp_
  ...
  #endif // mrn_db_path_hpp_


悪い例(アンダースコアが最後ではなく先頭についている):

mrn_db_path.hpp:
  #ifndef _MRN_DB_PATH_HPP
  #define _MRN_DB_PATH_HPP
  ...
  #endif // _MRN_DB_PATH_HPP


= の前後に1つスペースを入れる。

よい例:

int i = 0;


悪い例(スペースが入っていない):

for (i=0; i<10; ++i) {...}


悪い例(2つスペースが入っている):

int i  = 0;


文字列はポインタと長さで表現する。 \0 での終端を仮定しない。

よい例(本当はもっとすっきりした例がよいけど。。。):

char *raw_data = "table_name column_name column_value"
char *column_name;
size_t column_name_size;
column_name = raw_data + strlen("table_name ");
column_name_size = strlen("column_name");


悪い例(無理やり \0 終端にしている):

char *raw_data = "table_name column_name column_value"
char *column_name;
column_name = strndup(raw_data + strlen("table_name "), strlen("column_name"));


ただし、ファイル名など \0 が前提であるものに関しては \0 終端を仮定してよい。

よい例:

char *database_path = "db/test.mrn";


悪い例( \0 終端を仮定せず、長さも管理している):

char *database_path = "db/test.mrn";
size_t database_path_size = strlen("db/test.mrn");


std::string は内部でメモリ確保などの処理が発生するので多用しない。

よい例:

char database_path[MAX_PATH];


悪い例(最大サイズがわかっているのに std::string を使っている):

std::string database_path;


TODO: ちゃんと考える。

何度も繰り返し使う領域(バッファ)が必要な場合は GRN_BULK を使う。例えば、カラムの値を取得する領域などである。

クラスの名前は UpperCamelCase とする。

よい例:

class MyClass
{
}


悪い例( snail_case である):

class my_class
{
}


ただし、 ha_mroonga などMySQLとのインターフェイスとなるクラスでかつ他の類似のモジュールに命名規則がある場合はそれに従う。

よい例:

class ha_mroonga: public handler
{
}


悪い例( UpperCamelCase になっている):

class HaMroonga: public handler
{
}


メンバー変数名は snail_case とし、末尾にアンダースコア( _ )を付ける。

よい例:

class MyClass
{
  char *my_name_;
}


悪い例( UpperCamelCase である):

class MyClass
{
  char *MyName_;
}


悪い例(末尾にアンダースコアがない):

class MyClass
{
  char *my_name;
}


メンバー変数の値を読み込むメソッドの名前はメンバー変数名の末尾のアンダースコアを除いたものにする。

よい例:

class MyClass
{
  char *my_name_;
  const char *my_name() {return my_name_;};
}


悪い例(末尾にアンダースコアが残っている):

class MyClass
{
  char *my_name_;
  const char *my_name_() {return my_name_;};
}


悪い例(先頭に get_ を付けている):

class MyClass
{
  char *my_name_;
  const char *_my_name() {return my_name_;};
}


メンバー変数の値を設定するメソッドの名前は、メンバー変数名の末尾のアンダースコアを除き、先頭に set_ を加えたものにする。

よい例:

class MyClass
{
  unsigned int age_;
  void set_age(unsigned int age)
  {
    age_ = age;
  };
}


悪い例(末尾にアンダースコアが残っている):

class MyClass
{
  unsigned int age_;
  void set_age_(unsigned int age)
  {
    age_ = age;
  };
}


悪い例(先頭に set_ ではなく update_ を付けている):

class MyClass
{
  unsigned int age_;
  void update_age(unsigned int age)
  {
    age_ = age;
  };
}


基本的にコピーコンストラクターの使用を禁止する。よほどのことがなければ使用しないこと。

コピーコンストラクターは暗黙的に無駄なコピーが発生する可能性があるためパフォーマンス上の問題がある。コピーではなくポインターやリファレンスを用いること。

また、デフォルトのコピーコンストラクター実装はメンバー変数のポインターの値をそのままコピーするため、デコンストラクターで二重に解放してしまう危険性がある。そのため、明示的にコピーコンストラクターを定義しない場合は無効にする。

よい例:

class MyClass
{
private:
  MyClass(const MyClass &);
}


悪い例(コピーコンストラクターを禁止していない):

class MyClass
{
}


悪い例(カスタムコピーコンストラクターを使っている):

class MyClass
{
  unsigned int age_;
  MyClass(const MyClass &object)
  {
    age_ = object.age_;
  }
}


基本的に定義したクラスの代入を禁止する。よほどのことがなければ使用しないこと。

代入演算子は暗黙的に無駄なコピーが発生する可能性があるためパフォーマンス上の問題がある。コピーではなくポインターやリファレンスを用いること。

また、デフォルトの代入演算子の実装はメンバー変数のポインターの値をそのままコピーするため、デコンストラクターで二重に解放してしまう危険性がある。そのため、明示的に代入演算子を定義しない場合は無効にする。

よい例:

class MyClass
{
private:
  MyClass &operator=(const MyClass &);
}


悪い例(代入を禁止していない):

class MyClass
{
}


悪い例(代入を使っている):

class MyClass
{
  unsigned int age_;
  MyClass &operator=(const MyClass &object)
  {
    age_ = object.age_;
    return *this;
  }
}


引数がない場合は void を省略する。

よい例:

class MyClass
{
  unsigned int age_;
  unsigned int age()
  {
     return age_;
  };
}


悪い例( void を省略していない):

class MyClass
{
  unsigned int age_;
  unsigned int age(void)
  {
     return age_;
  };
}


入力のみに用いる引数には const を付ける。これは、入力のみに用いる引数である事を明示するためと、間違って引数を変更してしまわないためである。

よい例:

class Table
{
  void insert(unsigned int id, const char *column_name, const char *value)
  {
     Record *record = records[i];
     Column *column = columns[column_name];
     column.set_value(value);
  }
}


悪い例(入力のみに用いているのに const が付いていない):

class Table
{
  void insert(unsigned int id, char *column_name, char *value)
  {
     Record *record = records[i];
     Column *column = columns[column_name];
     column.set_value(value);
  }
}


フラグやサイズなどを示す定数には const オブジェクトを用いる。これはデバッガー上でプログラムを走らせているときに名前で値を参照できるようにするためである。

よい例:

const char *MRN_LOG_FILE_PATH = "groonga.log";


悪い例( #define を用いている):

#define MRN_LOG_FILE_PATH "groonga.log"


真偽値には bool 型を用いる。

よい例:

bool is_searching;


悪い例( int 型を用いている):

int is_searching;


真偽値の値として truefalse はより完結で説明的だからである。

よい例:

bool is_searching = true;


悪い例( 0 以外の値を真の値として用いている):

bool is_searching = 1;


真偽値の値は boolean_value == true などとせず、 boolean_value として条件式に使用する。すでに真偽値の値を真偽値のリテラルと比較することは重複したコードだからである。

よい例:

boolean is_searching = true;
if (!is_searching) { ... }


悪い例(真偽値のリテラルと比較している):

boolean is_searching = true;
if (is_searching == false) { ... }


NULL かどうかを条件式に使う場合は value == NULL ではなく !value というように書く。多くの言語で NULL に相当する値(たとえばLispの nil )は偽を表すため、明示的に NULL と比較しなくても意図は伝わるからである。

よい例:

char *name = NULL;
if (!name) { ... }


悪い例( NULL と比較している):

char *name = NULL;
if (name == NULL) { ... }


CやC++では 0 は偽、 0 以外は真の値となるが、条件式に数値を使う場合は strcmp(...) == 0 などというように明示的に比較する。

C++では真偽値に bool を使うためこのような状況は発生しないが、C言語由来のAPIでは int で真偽値を表現している場合が多い。しかし、 int だけでは真偽値として使っているか本当に数値として使っているかがわかりにくいため、 int のときはすべて数値として扱う。

よい例:

if (memcmp(value1, value2, value_size) == 0) {
  printf("same value!\n");
}


悪い例( 0 を偽の値として扱っている):

if (!memcmp(value1, value2, value_size)) {
  printf("same value!\n");
}


無駄な処理を省くためにコンストラクターでのメンバー変数の初期化には初期化リストを用いる。初期化リストを用いないとコンストラクターの処理とコピーコンストラクター・代入処理が行われたりなど非効率である。(後述)

よい例:

class Table
{
  Table(const char *name);
  std::string name_;
}
Table::Table(const char *name) :
  name_(name)
{
}


悪い例( std::string(name) のところでコンストラクターが動き、 name_ = ... のところで代入演算子が動いて2回初期化している):

class Table
{
  Table(const char *name);
  std::string name_;
}
Table::Table(const char *name)
{
  name_ = std::string(name);
}


変数を宣言したときに同時に初期化する。宣言時に初期化せずに代入して初期化すると、無駄な処理が発生する可能性があるため非効率である。(後述)

よい例:

std::string name("users");


悪い例( std::string() のところでコンストラクターが動き、 name = ... のところで代入演算子が動いて2回初期化している):

std::string name;
name = std::string("users");


後置形式ではオブジェクトのコピーをしなければいけないため非効率である。そのため、できるだけ前置形式を用いる。

よい例( int だと効率は変わらないので本当はあんまりよい例ではない):

for (int i = 0; i < 10; ++i) {
}


悪い例(後置形式を用いている):

for (int i = 0; i < 10; ++i) {
}


Cスタイルのキャストはなんでもキャストできてしまうため、意図しないキャストにも気付かない可能性がある。例えば、単に const を外したいだけなのに、間違って違う型に変換していても気付けない。C++のキャストでは const を外したいときは const_cast を使用し、型を変換するときは static_cast を指定する。こうすれば、 static_cast で間違って const を外してしまっている場合も気付ける。 reinterpret_cast はどうしても必要なときのみ注意して使う。

よい例( const_cast を使っている):

uchar *to_key;
const ucahr *from_key;
KEY *key_info;
uint key_length;
key_copy(to_key, const_cast<uchar *>from_key, key_info, key_length);


よい例( static_cast を使っている):

int n_hits = 1;
int n_documents = 10;
float hit_ratio = (float)(n_hits) / n_documents;


よい例( static_cast では無理なので reinterpret_cast を使っている):

THD *thread = current_thd;
my_hash_delete(&mrn_allocated_thds, reinterpret_cast<uchar *>(thread));


悪い例(Cスタイルのキャストを使っている):

int n_hits = 1;
int n_documents = 10;
float hit_ratio = (float)(n_hits) / n_documents;


悪い例( static_cast で十分なのに reinterpret_cast を使っている):

void *value = get_value(key);
char *name;
name = reinterpret_cast<char *>(value);


Cと同様にポインタ型を示す * は型名ではなく変数名に寄せる。これは、以下のように複数の変数を一度に宣言したときに一貫性がなくなるためである。2つめ以降の変数は近くに型名がないため * を寄せる場所がない。

例:

char* key, *value;


同様に、リファレンス型を示す & も変数名に寄せる。

なお、 *& と型名の間にはスペースを入れない。

よい例:

char *key;


よい例:

bool is_exist(const std::string &file_name);


悪い例(型名に寄せている):

char* key;


ここに書いていないものについては特にスタイルを定めないが、 プロジェクト内で常に一貫性のあるスタイルを使用すること。 同じような事をするときは同じような書き方にすること。 複数の書き方で同じようなことを実現している場合は1つの方法に合わせること。



以下、具体例が必要。

  • ビルド時にできることを実行時に延ばさない(静的チェックを活用)
  • なるべく局所的に変数を定義し、同時に初期化する
  • 長い関数や深いブロックのネストを避ける
2つに分けた方がよさそう。あと目安があるといいかも。 100行以上は長いよねーとか3段以上はデンジャーとか。

  • 必要以上にオブジェクトを複製しない
  • 暗黙の型変換はなるべく避ける
  • assertを積極的に使う



  • 動的テンプレートを使わない。(要追加情報。implicit template instantiationのことであれば、これはふつうに使わ れているものなので特に禁止しないんでいいんじゃない説?)
  • typeidを使わない。
  • 例外はMySQLで問題がないようであればOK。Mroongaから外の世 界(MySQLの世界)にはださないようにする。



The Mroonga project uses Redmine for Ticket Driven Development.

The location of our Redmine site is the following.

http://redmine.groonga.org/projects/mroonga

Developments are done per ticket.

We would like to ask developers to register on the site above.

We develop based on the roadmap, and you can see our roadmap in the following page.

http://redmine.groonga.org/projects/mroonga/roadmap

We draw up our roadmap by discussing in off-line or on-line meetings.

The roadmap specifies the list of adding features in each version.

We manage the source code using Git on Github.

http://github.com/mroonga/mroonga

If you want to contribute, please make clone in read/write mode, and then commit and push.

If you still do not have your account on github.com, please register.

We develop Mroonga using the following procedures.
1.
Create a new ticket and add a description
2.
Discuss in the development meeting and update roadmap
3.
Assign the ticket, and the assignee starts development
4.
Implement features or fix bugs, then push to the repository if needed
5.
Add and run test codes, then push to the repository
6.
Review the implementation and tests, and go back to 4. if needed
7.
Add or update documents, then push to the repository
8.
Close the ticket

We welcome your ideas about new features or changes of specifications. Please create a ticket first and describe your idea there.

For adding and running test codes, please refer the description below.

We want to ask the assignee to handle whole the process, like design, implementation, test and documentations.

Then you ask other developers to review (especially in case of adding new features), and when it passes you add or update documents and the ticket can be closed.

For adding and updating documents, please refer the description below.

Currently we are assuming the following development environments.
  • Linux x86_64
  • glibc
  • MySQL
  • Groonga
  • Cutter (for C/C++ unit tests)
  • Sphinx (for documents)

There are just a few source files for now, and we would like to keep it simple as possible.
ha_mroonga.hpp
The header file of Mroonga.
ha_mroonga.cpp
The implementation of Mroonga.
mrnsys.hpp
The header file of utility functions
mrnsys.cpp
The implementation of utility functions
mysql-test/
The directory for tests by SQL

SQL test scripts are included in each test case 't/' directory, that are also the definition of currently available SQL statements.

The expected results of SQL tests are included in each test case 'r/' directory , that are also the definition of the current specification of supported SQL.

test/unit/
The directory for per-function unit tests by C/C++
doc/source/
English documents in Sphinx format

Since we are still in the early stage of the development of Mroonga, we will not make the documents of the specification of SQL queries for now.

Alternatively we consider our SQL tests and its expected results as the list of features and the definition of their specifications.

We use two kinds of regression tests to manage the quality of Mroonga.
SQL tests
When you add features or fix bugs that can be confirmed by SQL queries, please always add SQL tests. You might think that performance improvements cannot be confirmed with SQL, but some can be still well tested by using status variables or information_schema plugin etc.
C/C++ unit tests
They are function level regression tests using Cutter. When you add features that cannot differ SQL queries' results, like utility functions, please add tests here.

Before pushing to the repository, please always run the regression tests and confirm that you don't introduce any degradation.

You can invoke these two kinds of tests by "make check".

SQL tests are implemented as "sub test suite" for "mysql-test" in MySQL's regression tests. For the detail about how to add test cases or how to modify expected result files, please refer the following MySQL document.

http://dev.mysql.com/doc/mysqltest/2.0/en/index.html

For the detail about C/C++ unit tests, please refer the following Cutter document.

http://cutter.sourceforge.net/

We use Sphinx for the documentation of Mroonga.

Sphinx is a documentation generator which uses reStructuredText as its markup language, and converts reStructuredText files into HTML files.

reStructuredText uses .rst filename extensions.

We write documents in the reStructuredText format and convert them to HTML.

The source files of documents have .rst extension in the "doc/source" directory of the Mroonga repository.

When you add or update them, please try "make html" to confirm that there are no syntax errors.

Documents are published in http://mroonga.org. Since we are using GitHub Project Pages, the web site is updated when HTML files are pushed to the http://github.com/mroonga/mroonga.github.com repository.

We push added or updated HTML files to the repository after confirming the consistency between the documents and the current release version.

So you can just push to Mroonga repository to push documents for each ticket.

The details are as follows.

You don't need to install Sphinx by yourself because Mroonga clones the latest Sphinx from Sphinx repository automatically. You just need to install Mercurial.

In order to enables documentation generation, you should run configure script with "--enable-document" option as follows.

% ./configure --enable-document --with-mysql-source=(your mysql source directory)
% make
% make install


Write a document in reStructuredText format, and save the document with .rst extension in "doc/source" directory.

See reStructuredText (reST) concepts and syntax about how to write a document in reStructuredText format.

You can generate HTML files by the following command.

% cd doc/locale/en
% make html


You can confirm generated HTML document in web browser, after generating HTML files.

Here we show an example of using Firefox.

% firefox html/characteristic.html


  • Index
  • Module Index
  • Search Page

Mroonga Project

2009-2022, Mroonga Project
March 29, 2022 mrn_version

Search for    or go to Top of page |  Section 1 |  Main Index

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with ManDoc.