Diablo is an Internet news backbone storage and transit server. Diablo sits
on the NNTP port of the machine and accepts inbound news articles from
Diablo-based servers, or really anything that can run innxmit or newslink.
Diablo stores the articles and handles the queueing for outbound feeds.
Queue files are in a dnewslink-compatible format, and dnewslink
is supplied with the distribution. Diablo is about 10-20 times as efficient
as innd when dealing with inbound traffic, mainly because it
is a forking server. Diablo's memory footprint of less than a megabyte is
tiny compared to innd. Diablo was initially written by Matt Dillon over
a weekend and has grown from there.
Many of the options below can be configured in diablo.config.
The news administrator email address that is reported in the banner
message for new connections. This defaults to news@hostname.
Specify the IP address or hostname for an interface that diablo should
sit on. The port can be specified after a :. The default is all interfaces.
-c commonpathname sets the common path name. The path is prepended to the Path: header only
if it does not already exist in the Path: header. Usually both -p and -c
options are used. The newspathname is placed in front of the commonname
in this case (assuming it does not exist elsewhere in the path). Multiple -c
options may be used and will be added (along with -p options) in the order
specified on the command line.
-d[n] turns on debugging. Specifying a number increases the debug level.
-e pctimeout sets the precommit cache timeout. The default is 30 seconds. Setting it to 0
disables the precommit cache. The precommit cache is a check/ihave message-id
lockout used to prevent simultaneous reception of the same article.
The first client to send a check for a message-id wins. Other clients will
get a dup or reject return code for that message-id for the duration of the
timeout (30 seconds by default).
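The lockout described above can be illustrated with a small sketch. This is not Diablo's implementation; the class name, method name, and injectable clock are assumptions made for illustration only.

```python
import time


class PrecommitCache:
    """Message-id lockout: the first offer of an id wins for `timeout` seconds."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout  # 0 disables the cache, as with a pctimeout of 0
        self.clock = clock      # injectable clock (hypothetical, eases testing)
        self._seen = {}         # message-id -> time the lockout started

    def offer(self, message_id):
        """Return True if the caller may send the article, False if locked out."""
        if self.timeout <= 0:
            return True
        now = self.clock()
        started = self._seen.get(message_id)
        if started is not None and now - started < self.timeout:
            return False        # another client offered this id first
        self._seen[message_id] = now
        return True
```

A second client offering the same message-id inside the timeout window gets False here, which corresponds to the dup or reject return code.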
Specifies the path to the external spamfilter. The path must be fully
Diablo calls gethostbyname() to set the hostname it reports on connect. On
some systems this will not necessarily be what you want so you can override
it with the -h option.
Set the maximum number of simultaneous connections from any given remote
host. For example, if you set this to 10, each of your feeds will be allowed
to make up to 10 simultaneous connections to you. The default is 0 (unlimited).
Specify the port that diablo should sit on. The default is 119. This
is commonly used to run a server on a different port (say, 434) so you can
run a reader on the main port.
-p newspathname sets the domain name to prepend to the Path: header. This option
is required. If you specify -p0, diablo will not insert anything into
the Path: header, which is meant for using Diablo as a bridge rather than a full
router. The use of -p0 is NOT RECOMMENDED. Also note that ipaddress.MISMATCH
Path: elements will still be added in either case if the first element of the
Path: on the incoming article does not match an alias in the appropriate
dnewsfeeds file entry. Multiple -p options may be used and will be added
(along with -c options) in the order specified on the command line. The
last -p option is used for Xref: generation.
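As a rough illustration of how the -p and -c values combine, the sketch below handles one newspathname and one commonpathname. The function name and the hostnames are hypothetical, the MISMATCH handling is left out, and passing None for the newspathname stands in for -p0.

```python
def build_path(incoming_path, newspathname, commonpathname=None):
    """Return a new Path: value with the local elements prepended."""
    elements = incoming_path.split("!") if incoming_path else []
    # The common path name is added only if it is not already in the Path:.
    if commonpathname and commonpathname not in elements:
        elements.insert(0, commonpathname)
    # The newspathname goes in front of the common name; None models -p0.
    if newspathname:
        elements.insert(0, newspathname)
    return "!".join(elements)


print(build_path("a.example!b.example!not-for-mail",
                 "news.example.net", "common.example.net"))
```

With these inputs the common name is inserted once and the newspathname ends up as the leftmost element.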
Set the TCP receive buffer size.
This option enables the internal spamfilter and will override the
an ISPAM entry in dnewsfeeds before any articles will reach the
filter even if it is enabled here. The ISPAM entry determines which
articles are sent to the internal filter. There are two internal
filter types:
Duplicate body detection
NNTP-Posting-Host rate detection
Each type is enabled with a different option, which also sets
the trip value for that type.
Bn - enables duplicate body detection and sets the number
of allowed duplicates before further articles are rejected.
Nn - enables the NNTP-Posting-Host rate detection and the
number specifies how many duplicate hosts are allowed
in an hour before extra articles from that host are
rejected.
en - set the expire time (in seconds) for the previous
B or N option.
sn - set the number of entries in the filter hash table to
n for the previous B or N option. The size must be
a power of 2. Default is 65536 entries.
Both types of filters also make a note of the number of lines in
the body of the article to reduce the possibility of false positives.
Use of this option causes the creation of 2 files in path_db
that are used to store the filter hash tables.
For example, B6s32768 N16 would set the body filter trip to 6, with
a hash table size of 32768 entries, and the NNTP-Posting-Host filter trip
to 16 with the default hash table size of 65536.
The default is disabled (B0 N0).
The spam filter uses a fixed-size hash table cache and rate-limits
postings with the same number of lines from the same NNTP-Posting-Host:
source or with the same body hash. If the rate exceeds n articles
over a period of e seconds, all further matching articles will be
rejected.
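The rate limiting described above can be sketched as a fixed-size, power-of-2 table keyed on a hash of the source (or body hash) plus the line count. This is only an illustration under assumptions: the class, its eviction-on-collision policy, and the use of SHA-256 are not taken from Diablo.

```python
import hashlib


class RateFilter:
    """Fixed-size hash table that trips after `trip` matches within `expire` seconds."""

    def __init__(self, trip, expire, size=65536):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of 2")
        self.trip = trip
        self.expire = expire
        self.mask = size - 1
        self.slots = [None] * size  # each slot: (key_hash, count, first_seen)

    def is_spam(self, key, lines, now):
        """key is a posting host or body hash; lines is the body line count."""
        digest = hashlib.sha256(f"{key}:{lines}".encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        i = h & self.mask
        slot = self.slots[i]
        # Empty, colliding, or expired slot: start a fresh count (assumed policy).
        if slot is None or slot[0] != h or now - slot[2] >= self.expire:
            self.slots[i] = (h, 1, now)
            return False
        count = slot[1] + 1
        self.slots[i] = (h, count, slot[2])
        return count > self.trip
```

Because the table never grows, memory use stays constant at the cost of occasionally losing a counter when two keys land in the same slot.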
-s argvbufferspace is generally used to reserve buffer space so diablo can generate a real-time
status in its argv that the system's ps command can read. This does not
work with all operating systems.
Set the TCP transmit buffer size. A minimum size of 4K is imposed to
guarantee lockup-free operation in streaming mode.
-X xrefhost sets the Xref: hostname used when generating Xref: lines. The default
is to use the newspathname if Xref: generation is enabled.
If active file is enabled and we are not an Xref: slave, then use the
Xref: line to update the NX field in the active file. This is useful
for a backup to the Xref: generator in large installations.
Diablo understands a subset of the NNTP protocol. The basic commands it
supports are ihave, check, and takethis. Diablo also understands stat, head,
mode stream, and quit. Diablo also implements a number of commands to
support remotely configured newsfeeds files on a site-by-site basis. These
are feedrset, feedadd, feeddel, and
feedcommit. Remote sites may query the state of outgoing feeds directed to them with
Diablo is strictly a news holding and transit server. It does not maintain
a newsgroups or active file, and it does not store articles in a hierarchy
based on the group name.
Diablo stores files in a hierarchy based on the time received and a randomly
generated iteration number. A new directory is created every 10 minutes
and each incoming connection creates its own file. Multiple articles may
be stored in each file. Connections that last more than 10 minutes will
close their current file and reopen a new one in the new directory.
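The time-based layout can be pictured with a short sketch. The function names and the idea of rolling the file when the 10-minute slot changes are assumptions for illustration; the real directory and file naming is not shown here.

```python
SPAN_SECONDS = 600  # one spool directory per 10-minute span


def spool_bucket(received_at, span=SPAN_SECONDS):
    """Map a Unix timestamp to the index of its 10-minute directory."""
    return int(received_at) // span


def needs_new_file(opened_at, now, span=SPAN_SECONDS):
    """True once a connection's open file belongs to an older directory."""
    return spool_bucket(now, span) != spool_bucket(opened_at, span)


# A connection that opened its file at t=100 keeps writing to it at t=599,
# but must switch to a file in the next directory by t=700.
print(needs_new_file(100, 599), needs_new_file(100, 700))
```

Grouping by reception time is what keeps each directory small and makes expiring old articles a matter of removing old directories.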
Diablo also maintains a history database, called
dhistory, which references articles based on their hash code and stores reception
date and expiration information. The history database is headed by a
four-million-entry hash table, followed by linked lists of
History structures in a machine-readable (but not human-readable) format. It
should be noted that the Message-ID is not stored anywhere but in the
article and in the outbound feed queue files. If two different Message-IDs
wind up with the same hash code, one of the articles will be lost.
Given a (as of this writing) full feed of 250,000 articles a day, a
maximum lifetime of 16 days, and 62 significant bits in the hash code,
collisions will statistically occur only once every 4 billion articles or
so. This is the price for using Diablo, and I consider it a minor one.
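As a rough order-of-magnitude check on the numbers quoted above, the sketch below applies the standard birthday approximation to a 62-bit hash. It assumes the hash is uniformly distributed and ignores expiration, which removes old entries and can only make collisions rarer; neither assumption comes from the Diablo sources.

```python
import math

ARTICLES_PER_DAY = 250_000
LIFETIME_DAYS = 16
HASH_BITS = 62

# Entries held in the history database when the spool is full.
live_entries = ARTICLES_PER_DAY * LIFETIME_DAYS

# Birthday approximation: expected number of uniformly hashed articles
# seen before the first collision is about sqrt(pi * N / 2).
hash_space = 2 ** HASH_BITS
birthday_bound = math.sqrt(math.pi * hash_space / 2)

print(f"live history entries: {live_entries:,}")
print(f"articles before a collision is expected: {birthday_bound:.2e}")
```

The result is on the order of a few billion articles, which is the same order of magnitude as the figure given in the text.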
The file names also have an iteration tagged onto the end. The iteration
is used to group files within a 10 minute-span directory. If an article
collision on input occurs, whichever diablo process missed the history
commit will remove the data associated with the article from its spool file.
Critical-path operations in diablo are extremely efficient due to the
time-locality for most of its operations. From a time-local point of
view, files are created in the same reasonably-sized directory. The
diablo expiration program,
dexpire, does not rewrite the history file (see
diload for that). Instead, it simply scans it, removes
expired files, and updates the history file in place to indicate the fact.
There are no softlinks, because the spool is not based on the group name(s).
Cleaning the spool directory is trivial because, frankly, there aren't
many files in it. In a very heavily loaded system, approximately 80 files
are created every 10 minutes. The only real random access is the
history file itself. Due to the fixed-length records,
dhistory is around 1/2 the size of a typical INN history file. Since there is no
active or newsgroups file to maintain, no renumbering mechanism is required.
Diablo forks for each inbound connection allowing history file lookups and
file creates to run in parallel. Diablo uses true record locking for
history database updates and none at all for lookups.
Finally, being strictly transit in nature, Diablo does not attempt to act on
the contents of the message. For example, control messages are ignored,
and Diablo makes no header modifications except to the
Path: header and to remove the
Xref: header, if it exists. The source of the feed is expected to generate a
properly formatted article, and very little article checking is done until
after the article has propagated to a newsreader site (beyond Diablo).
Any content-specific action which you wish to support must be dealt with
through an external medium using the outbound feed mechanism.
The dexpire program uses a dynamic expiration mechanism whereby you give it a
free-space goal and it scales the
dexpire.ctl expirations accordingly to reach that goal. It should be noted that
the expiration is stored in the history file at the time of article
reception and NOT calculated when
dexpire is run. The
dexpire.ctl file has a number of features that allow you to scale the expiration based
on the number of cross posts and message-size, and to reject messages for
certain groups that are too large.
Diablo maintains a pipe between each forked child and the master
acceptor server and has a mechanism which may be used to issue commands
to the running system. The master acceptor server handles all outbound
feed file queueing which makes feed file flushing a very simple command
to issue. You may also request the master server to exit, which propagates
to the forked slaves and guarantees that all outbound feed files have
been flushed. The program used to issue commands to Diablo is called
dicmd, and is generally run with
exit as an argument. Queue file flushing works in a manner similar to innd in
that you are supposed to rename the server-created queue file and then
flush it with
dicmd, but unlike innd, the file is not wiped out if you do not rename it. It
is instead reopened for append. Diablo includes two separate programs,
dnewsfeed which do queue file sequencing, management, outbound feeds, and trimming.
The dnewsfeed program can run the non-streaming
ihave protocol, or can run the streaming
check/takethis protocol. It dynamically figures out what the remote end can handle. It
should be noted that Diablo can run all of its commands fully streamed,
not just the
Diablo syslogs to NEWS. It typically generates both per-connection statistics
and global statistics.
The per-connection statistics are made up of two lines. Each line contains
key=value pairs as described below.
secs - elapsed time of connection
ihave - number of IHAVE NNTP commands received
chk - number of CHECK NNTP commands received
rec - number of articles received from remote
rej - of the received articles, the number rejected
predup - number of articles sent via takethis that were determined to be duplicates
prior to the first byte of the article being received.
posdup - (meaningless)
pcoll - pre-commit cache collision. Typically indicates that either a history
collision occurred against some other article simultaneously in transit or that
a history collision occurred with recently received message-ids.
spam - number of articles determined to be spam by the spam filter
err - number of errors that occurred. Typically protocol errors
added - of the received articles, the number committed to the spool
bytes - number of bytes committed to the spool
The second statistics line contains key-value pairs as shown below.
acc - number of articles accepted
ctl - of the accepted articles, how many were control messages
failsafe - rejected due to failsafe, typically means that the spool directory
structure got messed up.
misshdrs - rejected due to missing required headers. Can also occur when the
feeder sends an empty article (typically because the feeder cannot find
the article in its spool).
tooold - rejected for being too old.
grpfilt - rejected due to the incoming group filter for this feed in dnewsfeeds.
spamfilt - rejected due to the spam filter
earlyexp - of the articles received, the number that have been accepted but
will be expired early, usually due to dexpire.ctl.
instantexp - rejected because dexpire.ctl indicated that the article would
expire instantly.
notinactv - rejected because none of the newsgroups are in the active file
(if you have activedrop set in diablo.config).
ioerr - rejected due to an I/O or other abnormal error
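Since both per-connection lines are made of key=value pairs, they are easy to post-process. The sketch below is one way to do so; the sample line is invented for illustration and is not a verbatim Diablo log entry, and the function name is hypothetical.

```python
def parse_stats(line):
    """Turn 'key=value key=value ...' into a dict, skipping other tokens."""
    stats = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # not a key=value token (e.g. a hostname or prefix)
        stats[key] = int(value) if value.isdigit() else value
    return stats


# Hypothetical sample using keys from the first statistics line.
sample = "secs=120 ihave=0 chk=500 rec=480 rej=12 added=468 bytes=1234567"
stats = parse_stats(sample)

# Fraction of received articles that were committed to the spool.
accept_ratio = stats["added"] / stats["rec"]
print(f"accepted {accept_ratio:.1%} of received articles")
```

Ratios such as added/rec or predup/chk give a quick view of how much of a feed is new to the server.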
The global statistics are logged by the master diablo process and include
the key-value pairs shown below.
uptime - total uptime in hours and minutes.
arts - total number of articles accepted
bytes - total number of bytes accepted
fed - aggregate number of articles queued to outgoing feeds