dissociate($input, $group_size, $max)
The function dissociate takes three parameters:
$input is the input string, hopefully containing a stretch of (plaintext) text in a human language, encoded either in just plain US-ASCII, or in a character-encoding your locale settings know about. The returned string will be dissociated text (charmingly generated gibberish) based on that input text. (Note that the output will contain no line-breaks or tabs. You may wish, as dissociate_filter does, to pass the output thru Text::Wrap's wrap.)
You'll get strange output if $input contains markup (HTML, LaTeX, etc.), or is very short, or is not in a human language.
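As a minimal sketch of the wrapping step mentioned above, the following uses the core Text::Wrap module on a stand-in string. The stand-in text and the column width of 72 are assumptions chosen for illustration; in real use the string would be whatever dissociate returned.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Stand-in for the single-line string that dissociate() returns;
# in real use this would be something like:  my $text = dissociate($input);
my $text = join ' ', ('charmingly generated gibberish') x 12;

# Text::Wrap breaks lines so that each one is shorter than $columns.
$Text::Wrap::columns = 72;

# The first two arguments are the indent for the first line and the
# indent for all later lines; empty strings mean no indentation.
my $wrapped = wrap('', '', $text);
print $wrapped, "\n";
```

Since the dissociated text contains no line-breaks of its own, wrapping is purely cosmetic and can be skipped if the output is going to be processed further.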
$group_size is the number of tokens (words or characters) that must be in common between bits of text the dissociation algorithm skips between. A positive value means you want to dissociate by character, with a group-size of that many characters (4 = 4 characters); a negative value means you want to dissociate by word, with a group size of that many words (-2 = 2 words). I suggest values between -3 and 5; I'm a fan of -2. A $group_size value of 0 or 1 is invalid, and currently causes dissociate to use the default value of 2 (2 characters) instead. A value of -1 is invalid, and currently causes dissociate to use the value of -2 (2 words) instead. The behavior/validity of $group_size values of 0, 1, or -1 may change in future versions.
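The sign convention and the fallbacks for invalid values can be summarized in a few lines. This is only a sketch of the documented behavior, not the module's own code; describe_group_size is a hypothetical helper name introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Games::Dissociate): apply the documented
# fallbacks to a requested $group_size and report the resulting mode.
sub describe_group_size {
    my ($group_size) = @_;
    $group_size = 2  if $group_size == 0 || $group_size == 1;  # invalid: 2 characters instead
    $group_size = -2 if $group_size == -1;                     # invalid: 2 words instead
    return $group_size > 0
        ? { mode => 'character', size => $group_size }         # positive: by character
        : { mode => 'word',      size => -$group_size };       # negative: by word
}

for my $requested (4, -2, 0, -1) {
    my $d = describe_group_size($requested);
    printf "%2d => by-%s, group size %d\n", $requested, $d->{mode}, $d->{size};
}
```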
$max is a parameter used to control the maximum number of iterations of dissociate's central loop; it corresponds roughly to the number of chunks of text you get back, where a chunk is N * -$group_size words for negative values of $group_size, and N * $group_size characters for positive values of $group_size. $max must be greater than 1.
If you need (!) more precise control over the size of the output text, try setting $max high and trimming the output to size, and/or try calling dissociate multiple times until you get the amount of output you want. (But be sure to give up if dissociate keeps returning nullstring, as it will in some strange cases.)
dissociate can also be called with fewer arguments: as dissociate($input, $group_size), in which case $max takes its default value, or as dissociate($input), in which case both $group_size and $max take their default values.
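The repeat-and-trim approach described above might look like the following sketch. collect_text is a hypothetical helper, not part of the module, and the generator here is a stub so that the example runs without Games::Dissociate installed; the give-up threshold of 5 empty results in a row is an arbitrary assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Games::Dissociate): call $generate
# repeatedly until at least $want characters are collected, then trim to
# $want. Gives up after $max_empty empty results in a row.
sub collect_text {
    my ($generate, $want, $max_empty) = @_;
    my ($text, $empty_runs) = ('', 0);
    while (length($text) < $want) {
        my $chunk = $generate->();
        if (!defined $chunk || $chunk eq '') {
            last if ++$empty_runs >= $max_empty;  # generator keeps returning nullstring
            next;
        }
        $empty_runs = 0;
        $text .= ' ' if length $text;             # separate successive chunks
        $text .= $chunk;
    }
    return substr($text, 0, $want);               # may cut the last word short
}

# Stand-in generator. In real use, pass something like:
#   sub { dissociate($input, -2, 200) }
my $stub = sub { 'model of a modern major general' };
my $out  = collect_text($stub, 50, 5);
print length($out), " characters: $out\n";
```

Trimming by character count can split the final word; trimming at the last whitespace before the limit is an easy refinement if that matters.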
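This library also provides the procedure dissociate_filter, which pulls input from <> (files specified on the command line, or STDIN), and sends dissociated output to STDOUT. It can be called with these syntaxes: dissociate_filter($group_size, $max), dissociate_filter($group_size), or dissociate_filter() with no arguments; omitted arguments take default values.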
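The default values mentioned above can come from command-line switches if you make a script consisting of just a use Games::Dissociate; statement followed by a call to dissociate_filter, name that script, say, dissociate, and then call it with switches such as -c2, or -w3 -m500, and so on, feeding it input on STDIN or as file arguments.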
To explain the switches:
-w[number] specifies a by-word dissociation with that number of words as the group size; -c[number] specifies a by-character dissociation with that number of characters as the group size; -m[number] specifies a default for $max.
If you don't specify a default for $group_size or $max, $max defaults to 100 and $group_size defaults to 2 (characters).
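To make the mapping from switches to parameters concrete, here is a small sketch. parse_switches is a hypothetical function written for this illustration and is not how the module itself reads its switches; it assumes defaults of 2 (characters) for $group_size and 100 for $max, and it only handles switches that carry an explicit number.

```perl
use strict;
use warnings;

# Hypothetical parser (not the module's own code) mapping the documented
# switches onto ($group_size, $max).
sub parse_switches {
    my @args = @_;
    my ($group_size, $max) = (2, 100);                      # assumed defaults
    for my $arg (@args) {
        if    ($arg =~ /^-w(\d+)$/) { $group_size = 0 - $1 }  # by word: negative value
        elsif ($arg =~ /^-c(\d+)$/) { $group_size = 0 + $1 }  # by character: positive value
        elsif ($arg =~ /^-m(\d+)$/) { $max        = 0 + $1 }  # iteration limit
    }
    return ($group_size, $max);
}

for my $case ([], ['-c2'], ['-w3', '-m500']) {
    my ($g, $m) = parse_switches(@$case);
    print "(@$case) => group_size=$g max=$m\n";
}
```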
This module has to search the input string by performing regexp searches on it. In the current version of this module, control over compilation of regular expressions may not be optimally efficient. Perl 5.005 provides options to better control regexp compilation; once Perl 5.005 is in wider use, I may come out with a new version of Games::Dissociate requiring Perl 5.005 or later, using these new regexp compilation control features.
If you feed this module a lot of text (over 50K, say), it will indeed get very slow (notably with by-word dissociation), since that whole chunk of text has to be searched over and over and over.
If you have an idea for making this module more efficient, feel free to email it to me.
When dealing with text in heavily inflected languages (like Finnish, with lots of unique word endings, frequently used), this module will require longer input text to produce interesting results for by-word dissociation, compared to relatively inflection-poor languages like English.
For text written with no inter-word spacing (often the case with Thai, for example), there's no way for this module to tell where the word breaks are; in such cases, use only the by-character mode.
The current version of this library assumes /./ matches a single character, for by-character dissociation; and, for by-word dissociation, that /\w+/ matches whole words and /\W+/ matches non-word strings. These are locale-dependent functions, and Games::Dissociate has a use locale in it, hopefully triggering correct behavior for your favorite locale, language, and character-encoding. Consult perllocale and locale for more information on locales.
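The tokenization assumptions above can be seen in a short sketch. This is not the module's code, only an illustration of what the three patterns match on a plain ASCII sample; it omits use locale so that the result does not depend on the environment.

```perl
use strict;
use warnings;

my $text = "I am the very model of a modern Major-General.";

# By-word view: alternating runs of word characters (\w+) and
# non-word characters (\W+), kept in their original order.
my @word_tokens = $text =~ /(\w+|\W+)/g;

# By-character view: /./ matches one character at a time
# (it does not match a newline unless the /s modifier is used).
my @char_tokens = $text =~ /(.)/g;

print scalar(@word_tokens), " word/non-word tokens, ",
      scalar(@char_tokens), " characters\n";
print join('|', @word_tokens), "\n";
```

Note that the hyphen in "Major-General" counts as a non-word string, so the two halves are treated as separate words.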
I have found use locale to do unwelcome things (like unceremoniously dumping core) on a few very strange, very old (and otherwise barely-working) machines. If this is a problem for you, or if you don't plan to use locales, comment out the use locale in the Games::Dissociate source code.
The treatment of locales and support for them may change in future versions of this module, depending on how future Perl versions shape up, particularly in their support of Unicode.
This library uses rand extensively, but never calls srand. If you're getting the same dissociated output all the time, then you're using an old (pre-5.004) version of Perl that doesn't do implicit randomness seeding; just call srand();, maybe right after you say use Games::Dissociate;
* Emacs's dissociate.el (written circa 1985?).
Copyright (c) 1998-2001, Sean M. Burke. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
It's just a toy.
Current maintainer Avi Finkel <firstname.lastname@example.org>; Original author Sean M. Burke <email@example.com>
|perl v5.20.3||GAMES::DISSOCIATE (3)||2007-08-21|