Hailo - A pluggable Markov engine analogous to MegaHAL
This is the synopsis for using Hailo as a module. See hailo for command-line
invocation.
# Hailo requires Perl 5.10
use v5.10.0;
use Moose;
use Hailo;
# Construct a new in-memory Hailo using the SQLite backend. See
# backend documentation for other options.
my $hailo = Hailo->new;
# Various ways to learn
my @train_this = ("I like big butts", "and I can not lie");
$hailo->learn(\@train_this);
$hailo->learn($_) for @train_this;
# Heavy-duty training interface. Backends may drop some safety
# features like journals or synchronous IO to train faster using
# this mode.
$hailo->train("megahal.trn");
open my $filehandle, '<:encoding(UTF-8)', "megahal.trn"
    or die "Cannot open megahal.trn: $!";
$hailo->train($filehandle);
# Make the brain babble
say $hailo->reply("hello good sir.");
# Just say something at random
say $hailo->reply();
Hailo is a fast and lightweight Markov engine intended to replace AI::MegaHAL.
It has pluggable storage, tokenizer and engine backends.
It is similar to MegaHAL in functionality. The main differences (with the
default backends) are better scalability, drastically lower memory usage, an
improved tokenizer, and tidier output.
With this distribution, you can create, modify, and query Hailo brains. To use
Hailo in event-driven POE applications, you can use the POE::Component::Hailo
wrapper. One example is POE::Component::IRC::Plugin::Hailo, which implements
an IRC chat bot.
Hailo is a portmanteau of HAL (as in MegaHAL) and failo
<http://identi.ca/failo>.
Hailo supports pluggable storage and tokenizer backends. It also supports a
pluggable UI backend, which is used by the hailo command-line utility.
Hailo can currently store its data in a SQLite, PostgreSQL or MySQL
database. Some NoSQL backends were supported in earlier versions, but they
were removed as they had no redeeming quality.
SQLite is the primary target for Hailo. It's much faster and uses fewer
resources than the other two. It's highly recommended that you use it.
See "Comparison of backends" in Hailo::Storage for benchmarks showing
how the various backends compare under different workloads, and how you can
create your own.
By default Hailo will use the word tokenizer to split up input by whitespace,
taking into account things like quotes, sentence terminators and more.
There's also the character tokenizer. It's not generally useful for a
conversation bot but can be used to e.g. generate new words given a list of
existing words.
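To make the two granularities concrete, here is a minimal sketch. It is not Hailo's tokenizer code: word_tokens and char_tokens are hypothetical helpers, and the real word tokenizer also handles quotes, sentence terminators and more.

```perl
use strict;
use warnings;

# Illustrative only: these are NOT Hailo's tokenizers, just the two
# granularities described above reduced to their simplest form.

# Word-level: split on runs of whitespace, dropping empty fields.
sub word_tokens {
    my ($line) = @_;
    return grep { length } split /\s+/, $line;
}

# Character-level: every character becomes its own token.
sub char_tokens {
    my ($word) = @_;
    return split //, $word;
}

print join('|', word_tokens("hello good sir")), "\n";
print join('|', char_tokens("hello")), "\n";
```

With character-level tokens, a chain trained on a word list learns which letters tend to follow which, which is why that granularity suits generating new words rather than conversation.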
Hailo makes no promises about brains generated with earlier versions being
compatible with future versions, and due to the way Hailo works there's no
practical way to make that promise. Learning in Hailo is lossy, so an accurate
conversion is impossible.
If you're maintaining a Hailo brain that you want to keep using, you should save
the input you trained it on and retrain when you upgrade.
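One possible way to follow this advice is to keep an append-only log of everything the brain learns and replay it into a fresh brain after an upgrade. The sketch below is an assumption about workflow, not part of Hailo: log_input and retrain_from_log are hypothetical helpers, and $hailo stands for any object that provides the train method shown in the synopsis.

```perl
use strict;
use warnings;

# Append one line of raw input to a UTF-8 log before the brain learns it.
sub log_input {
    my ($log_path, $line) = @_;
    open my $fh, '>>:encoding(UTF-8)', $log_path
        or die "Cannot append to $log_path: $!";
    print {$fh} $line, "\n";
    close $fh or die "Cannot close $log_path: $!";
    return;
}

# After upgrading, feed the whole log to a fresh, empty brain.
# $hailo is any object that provides Hailo's train() method.
sub retrain_from_log {
    my ($hailo, $log_path) = @_;
    $hailo->train($log_path);
    return $hailo;
}
```

The log is the only lossless record of the input, so it should be kept for as long as the brain is in use.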
Hailo is always going to lose information present in the input you give it. How
input tokens get split up and saved to the storage backend depends on the
version of the tokenizer being used and how that input gets saved to the
database.
For instance if an earlier version of Hailo tokenized "foo+bar" simply
as "foo+bar" but a later version split that up into "foo",
"+", "bar", then an input of "foo+bar are my favorite
metasyntactic variables" wouldn't take into account the existing
"foo+bar" string in the database.
Tokenizer changes like this would cause the brains to accumulate garbage and
would leave other parts in a state they wouldn't otherwise have gotten into.
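The effect can be reproduced with two toy tokenizers. Both are hypothetical stand-ins written for this illustration, not code from any Hailo release.

```perl
use strict;
use warnings;

# Two hypothetical tokenizer versions; neither is Hailo's real code.
# "Old" keeps any run of non-whitespace characters together.
sub tokenize_old {
    my ($s) = @_;
    return split ' ', $s;
}

# "New" additionally splits punctuation off into separate tokens.
sub tokenize_new {
    my ($s) = @_;
    my @tokens = $s =~ /\w+|[^\w\s]/g;
    return @tokens;
}

# Tokens stored by the old version, then input seen by the new one.
my %stored = map { $_ => 1 } tokenize_old("foo+bar");
my @fresh  = tokenize_new("foo+bar are my favorite metasyntactic variables");
my @known  = grep { $stored{$_} } @fresh;

printf "stored tokens: %s\n", join ' ', sort keys %stored;
printf "new input tokens: %s\n", join ' ', @fresh;
printf "tokens already in the brain: %d\n", scalar @known;
```

None of the newly produced tokens matches the stored "foo+bar" entry, so the old entry is never reinforced and simply lingers in the database.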
There have been more drastic changes to the database format itself in the past.
Having said all that, the database format and the tokenizer are relatively
stable. At the time of writing, 0.33 is the latest release and it's compatible
with brains down to at least 0.17. If you're upgrading and there isn't a big
notice about the storage format being incompatible in the Changes file,
your old brains will probably work just fine.
The name of the brain (file name, database name) to use as storage. There is no
default. Whether this gets used at all depends on the storage backend;
currently only SQLite uses it.
A boolean value indicating whether Hailo should save its state before its object
gets destroyed. This defaults to true and will simply call save at
"DEMOLISH" time.
See "in_memory" in Hailo::Storage::SQLite for how the SQLite backend
uses this option.
The Markov order (chain length) you want to use for an empty brain. The default
is 2.
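As a rough illustration of what the order means, the sketch below lists every context of that many consecutive tokens together with the token that follows it. It is a simplification written for this document, not Hailo's engine, and the contexts helper is hypothetical.

```perl
use strict;
use warnings;

# Illustrative only, not Hailo's engine: list each context of $order
# consecutive tokens together with the token that follows it.
sub contexts {
    my ($order, @tokens) = @_;
    my @pairs;
    for my $i (0 .. $#tokens - $order) {
        my @context = @tokens[$i .. $i + $order - 1];
        push @pairs, [ join(' ', @context), $tokens[$i + $order] ];
    }
    return @pairs;
}

# With order 2, each pair of adjacent tokens predicts the next token.
for my $pair (contexts(2, qw(the cat sat on the mat))) {
    printf "%-8s -> %s\n", @$pair;
}
```

A higher order makes each context longer and more specific, so a brain needs more training data before its contexts repeat often enough to be useful.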
A short name of the class we use for the engine, storage, tokenizer or ui
backends.
By default this is Default for the engine, SQLite for storage, Words for the
tokenizer and ReadLine for the UI. The UI backend is only used by the hailo
command-line interface.
You can only specify the short name of one of the packages Hailo itself ships
with. If you need another class then just prefix the package with a plus
(Catalyst style), e.g. "+My::Foreign::Tokenizer".
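The naming rule can be sketched as a small function. This is not how Hailo implements it internally: resolve_backend is a hypothetical helper, and the "Hailo::<Type>::<Name>" layout is an assumption based on names such as Hailo::Storage::SQLite.

```perl
use strict;
use warnings;

# Illustrative only: resolve a backend name the way the text describes.
# The "Hailo::<Type>::<Name>" layout is an assumption based on names
# such as Hailo::Storage::SQLite.
sub resolve_backend {
    my ($type, $name) = @_;

    # A leading plus means "use this package name exactly as given".
    if ($name =~ /^\+(.+)$/) {
        return $1;
    }

    # Otherwise treat it as the short name of a bundled backend.
    return "Hailo::${type}::${name}";
}

print resolve_backend('Tokenizer', 'Words'), "\n";
print resolve_backend('Tokenizer', '+My::Foreign::Tokenizer'), "\n";
```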
A "HashRef" of arguments for engine/storage/tokenizer/ui backends. See
the documentation for the backends for what sort of arguments they accept.
This is the constructor. It accepts the attributes specified in
"ATTRIBUTES".
Takes a string or an array reference of strings and learns from them.
Takes a filename, filehandle or array reference and learns from all its lines.
If a filename is passed, the file is assumed to be UTF-8 encoded. Unlike
"learn", this method sacrifices some safety (disables the database
journal, fsyncs, etc) for speed while learning.
You can provide a second parameter which, if true, will use aggressive caching
while training, which will speed things up considerably for large inputs, but
will take up quite a bit of memory.
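A minimal sketch of training from a filehandle with the caching flag switched on follows. The train_cached helper is hypothetical, and $hailo stands for any object that provides the train method; whether the extra memory is worth the speed depends on the size of your input.

```perl
use strict;
use warnings;

# $hailo is any object that provides Hailo's train() method.
sub train_cached {
    my ($hailo, $path) = @_;

    # Open the file ourselves and decode it as UTF-8, since only
    # filenames are documented as being assumed UTF-8.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot read $path: $!";

    # A true second argument requests the aggressive-caching mode.
    $hailo->train($fh, 1);

    close $fh or die "Cannot close $path: $!";
    return;
}
```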
Takes an optional line of text and generates a reply that might be relevant.
Takes a string argument, learns from it, and generates a reply that might be
relevant. This is equivalent to calling learn followed by reply.
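The stated equivalence can be spelled out as follows. The learn_then_reply helper is hypothetical and only uses the learn and reply methods shown in the synopsis.

```perl
use strict;
use warnings;

# Spelled-out version of the combined call described above.
# $hailo is any object that provides Hailo's learn() and reply() methods.
sub learn_then_reply {
    my ($hailo, $line) = @_;
    $hailo->learn($line);           # first add the line to the brain
    return $hailo->reply($line);    # then answer using the updated brain
}
```

Because the line is learned first, the reply may draw on the very input it answers.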
Tells the underlying storage backend to save its state. Any arguments to this
method will be passed as-is to the backend.
Takes no arguments. Returns the number of tokens, expressions, previous token
links and next token links.
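The four counts could be turned into a one-line summary like this. The sketch assumes the method is called stats and returns the counts as a list in the order given above; describe_brain is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumes $hailo->stats returns four counts as a list, in the order
# tokens, expressions, previous token links, next token links.
sub describe_brain {
    my ($hailo) = @_;
    my ($tokens, $expressions, $prev_links, $next_links) = $hailo->stats;
    return sprintf "%d tokens, %d expressions, %d previous links, %d next links",
        $tokens, $expressions, $prev_links, $next_links;
}
```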
You can join the IRC channel #hailo on FreeNode if you have questions.
Bugs, feature requests and other issues are tracked in Hailo's RT on rt.cpan.org
<https://rt.cpan.org/Dist/Display.html?Name=Hailo>
- POE::Component::Hailo - A non-blocking POE wrapper around Hailo
- POE::Component::IRC::Plugin::Hailo - A Hailo IRC bot plugin
- <http://github.com/hinrik/failo> - Failo, an IRC bot that uses
  Hailo
- <http://github.com/bingos/gumbybrain> - GumbyBRAIN, a more famous
  IRC bot that uses Hailo
- Hailo::UI::Web - A Catalyst and jQuery powered web interface to Hailo
  available at hailo.nix.is <http://hailo.nix.is> and as hailo-ui-web
  <http://github.com/avar/hailo-ui-web> on GitHub
  <http://github.com>
- tweetmix <http://www.tweetmix.me/>, a random tweet generator powered
  by Hailo
- <http://github.com/pteichman/cobe> - cobe, a Python port of MegaHAL
  "inspired by the success of Hailo"
- hailo.org <http://hailo.org> - Hailo's website
- <http://bit.ly/hailo_rewrite_of_megahal> - Hailo: A Perl rewrite of
  MegaHAL, A blog posting about the motivation behind Hailo
- <http://blogs.perl.org/users/aevar_arnfjor_bjarmason/hailo/> - More
  blog posts about Hailo on Ævar Arnfjörð Bjarmason's
  blogs.perl.org <http://blogs.perl.org> blog
- Hailo on freshmeat <http://freshmeat.net/projects/hailo> and ohloh
  <https://www.ohloh.net/p/hailo>
Hinrik Örn Sigurðsson, hinrik.sig@gmail.com
Ævar Arnfjörð Bjarmason <avar@cpan.org>
Copyright 2010 Hinrik Örn Sigurðsson and Ævar
Arnfjörð Bjarmason <avar@cpan.org>
This program is free software, you can redistribute it and/or modify it under
the same terms as Perl itself.