Mail::SpamAssassin::Plugin::Bayes - determine spammishness using a
Bayesian classifier
This is a Bayesian-style probabilistic classifier, using an
algorithm based on the one detailed in Paul Graham's "A Plan for Spam"
paper at:
http://www.paulgraham.com/spam.html
It also incorporates some other aspects taken from Gary
Robinson's webpage on the subject at:
http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
It also uses the chi-square probability combiner described here:
http://www.linuxjournal.com/print.php?sid=6467
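The per-token smoothing described on Robinson's page and the chi-square combiner can be sketched together in a few lines of Python. This is a minimal illustration, not SpamAssassin's Perl implementation. The function names are hypothetical, and the smoothing constants s and x are illustrative defaults rather than SpamAssassin's tuned values. It is also an assumption that SpamAssassin applies exactly this f(w) form. The combiner applies Fisher's method twice: once to the (1 - p) values as evidence for spam, and once to the p values as evidence for ham.

```python
import math


def robinson_fw(spam_count: int, ham_count: int, n_spam: int, n_ham: int,
                s: float = 1.0, x: float = 0.5) -> float:
    """Smoothed spam probability f(w) = (s*x + n*p(w)) / (s + n) for one token.

    s (strength of the prior) and x (assumed probability of an unseen
    token) are illustrative values, not SpamAssassin's tuned constants.
    """
    if n_spam <= 0 or n_ham <= 0:
        raise ValueError("need at least one trained spam and one trained ham")
    n = spam_count + ham_count          # messages containing the token
    if n == 0:
        return x                        # unseen token: fall back to x
    b = spam_count / n_spam             # fraction of spam containing the token
    g = ham_count / n_ham               # fraction of ham containing the token
    p = b / (b + g)                     # raw per-token spam probability
    return (s * x + n * p) / (s + n)    # shrink p toward x when n is small


def chi2q(x2: float, v: int) -> float:
    """P(chi-square >= x2) with v degrees of freedom; v must be even."""
    if v % 2:
        raise ValueError("closed form requires even degrees of freedom")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):          # e^-m * sum of m^i / i! for i < v/2
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_square_combine(probs: list[float]) -> float:
    """Combine per-token probabilities (each strictly in (0, 1)) into one score."""
    n = len(probs)
    if n == 0:
        return 0.5                      # no evidence either way
    # s is close to 1 when the (1 - p) values are jointly small: spam evidence.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # h is close to 1 when the p values are jointly small: ham evidence.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0          # 0 = ham, 0.5 = unsure, 1 = spam
```

With 0 < x < 1 and s > 0, robinson_fw never returns exactly 0 or 1. Its output can therefore be fed straight into chi_square_combine without hitting log(0). For example, `chi_square_combine([robinson_fw(40, 1, 100, 100), robinson_fw(3, 30, 100, 100)])` scores a message containing one spammy token and one hammy token.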
The results are incorporated into SpamAssassin as the BAYES_*
rules.
- bayes_stopword_languages lang (default: en)
- Languages enabled in Bayes stopword processing. Every language has a
default stopword regexp; tokens matching this regular expression will not
be considered in Bayes processing.
Custom regular expressions for additional languages can be
defined in "local.cf" using the "bayes_stopword_lang" keyword,
where lang is the language code, as in the following example:
bayes_stopword_languages en se
bayes_stopword_en (?:you|me)
bayes_stopword_se (?:du|mig)
Regexps are case-insensitive and will be anchored automatically at
the beginning and end.
To disable stopword usage, specify
"bayes_stopword_languages disable".
Only one bayes_stopword_languages or bayes_stopword_xx
configuration line can be used: a new configuration line overrides the
old one, for example one from the SpamAssassin default ruleset
(60_bayes_stopwords.cf).
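The documented matching rules can be illustrated with a short Python sketch. Those rules are: one regexp per enabled language, case-insensitive matching, anchoring at both ends, and "disable" turning the feature off. The function names and the dict holding the per-language patterns are hypothetical, as is the choice to silently skip a language that has no pattern. SpamAssassin implements this in Perl and may tokenize and match differently.

```python
import re


def compile_stopwords(patterns: dict[str, str], languages: list[str]) -> list[re.Pattern]:
    """Compile the stopword regexp of every enabled language.

    "disable" turns stopwords off; languages without a pattern are skipped
    (an assumption of this sketch).
    """
    if "disable" in languages:
        return []
    return [re.compile(patterns[lang], re.IGNORECASE)   # case-insensitive
            for lang in languages if lang in patterns]


def filter_tokens(tokens: list[str], stopwords: list[re.Pattern]) -> list[str]:
    """Drop tokens that some stopword regexp matches as a whole token."""
    # fullmatch() anchors the pattern at the beginning and end of the token.
    return [t for t in tokens if not any(rx.fullmatch(t) for rx in stopwords)]
```

With the patterns from the example configuration (en: `(?:you|me)`, se: `(?:du|mig)`), the tokens "You" and "mig" are dropped. The token "meeting" survives because anchoring prevents a partial match on "me".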
- bayes_max_token_length (default: 15)
- Configure the maximum number of characters a token can contain