Manual Reference Pages  -  DIABLO (8)

NAME

diablo - NetNews daemon for backbone article transit

CONTENTS

Synopsis
Description
Logging
Concepts

SYNOPSIS

diablo [ -A newsadminname ] [ -B ip/hostname[:port] ] [ -c commonpathname ] [ -d[n] ] [ -e pctimeout ] [ -F filterpath ] [ -h reportedhostname ] [ -M maxforkper ] [ -P port ] -p newspathname/0 [ -R rxbufsize ] [ -S[Bn[sn]][Nn[sn]] ] [ -s argv-buffer-space-for-ps-status ] [ -T txbufsize ] [ -X xrefhost ] [ -x ] server

DESCRIPTION

Diablo is an internet news backbone storage and transit server. Diablo sits on the NNTP port of the machine and accepts inbound news articles from Innd or Diablo based servers, or really anything that can run innxmit or newslink. Diablo stores the articles and handles the queueing for outbound feeds. Queue files are in a dnewslink compatible format and dnewslink is supplied with the distribution. Diablo is about 10-20 times as efficient as Innd when dealing with inbound traffic, mainly due to the fact that it is a forking server. Diablo’s memory footprint of less than a megabyte is tiny compared to innd. Diablo was initially written by Matt Dillon over a weekend and has grown from there.

Many of the options below can be configured in diablo.config.

-A newsadminname

The news administrator email address that is reported in the banner message for new connections. This defaults to ‘‘news@hostname’’.

-B ip/hostname[:port]

Specify the IP address or hostname for an interface that diablo should sit on. The port can be specified after a ’:’. The default is all interfaces.

-c commonpathname

Sets the common path name. The path is prepended to the Path: header only if it does not already exist in the Path: header. Usually both -p and -c options are used. The newspathname is placed in front of the commonname in this case (assuming it does not exist elsewhere in the path). Multiple -c options may be used and will be added (along with -p options) in the order specified on the command line.

-d[n]

Turns on debugging. Specifying a number increases the debug level.

-e pctimeout

Sets the precommit cache timeout. The default is 30 seconds. Setting it to 0 disables the precommit cache. The precommit cache is a check/ihave message-id lockout used to prevent simultaneous reception of the same article from more than one client. The first client to send a check for a message-id wins. Other clients will get a dup or reject return code for that message-id until the timeout (30 seconds by default) has passed.
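The lockout behaviour of the precommit cache can be illustrated with a small model. This is only a sketch under stated assumptions, not Diablo's implementation: the class name PrecommitCache, the offer() method, and the injectable clock are all hypothetical and exist purely for illustration.

```python
import time


class PrecommitCache:
    """Toy message-id lockout: the first offer of an id wins for `timeout` seconds.

    Hypothetical model of the behaviour described for -e; not Diablo's code.
    """

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout      # 0 disables the cache, as with -e 0
        self.clock = clock          # injectable so the model can be tested
        self._first_offer = {}      # message-id -> time of the winning offer

    def offer(self, message_id):
        """Return True if the caller may send the article, False if locked out."""
        if self.timeout <= 0:
            return True
        now = self.clock()
        first = self._first_offer.get(message_id)
        if first is not None and now - first < self.timeout:
            return False            # another client offered this id recently
        self._first_offer[message_id] = now
        return True
```

With the default timeout, a second offer of the same message-id within 30 seconds is refused, and an offer made after the timeout has passed is accepted again.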

-F path

Specifies the path to the external spamfilter. The path must be fully qualified.

-h reportedhostname

Diablo calls gethostbyname() to set the hostname it reports on connect. On some systems this will not necessarily be what you want so you can override it with the -h option.

-M maxforkper

Set the maximum number of simultaneous connections from any given remote host. For example, if you set this to 10, each of your feeds will be allowed to make up to 10 simultaneous connections to you. The default is 0 (unlimited).

-P port

Specify the port that diablo should sit on. The default is 119. This is commonly used to run a server on a different port (say, 434) so you can run a reader on the main port.

-p newspathname

Sets the domain name to prepend to the Path: header. This option is required. If you specify -p0, diablo will not insert anything into the Path: header, e.g. when you use Diablo as a bridge rather than a full router. The use of -p0 is NOT RECOMMENDED. Also note that ipaddress.MISMATCH Path: elements will still be added in either case if the first element of the Path: on the incoming article does not match an alias in the appropriate dnewsfeeds file entry. Multiple -p options may be used and will be added (along with -c options) in the order specified on the command line. The last -p option is used for Xref: generation.
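The interaction of -p and -c on the Path: header can be sketched as follows. This is a minimal illustration, assuming that the -p element is always prepended, that the -c element is skipped when it already appears anywhere in the path, and that -p0 leaves the header untouched; the function name prepend_path and the host names in the usage note are hypothetical, and the ipaddress.MISMATCH handling is not modelled.

```python
def prepend_path(path_header, newspathname, commonpathname=None):
    """Return the Path: value after a single -p/-c pair has been applied.

    Assumptions (not confirmed by the manual page): -p is prepended
    unconditionally, -c only if absent from the path, -p0 changes nothing.
    """
    if newspathname == "0":
        return path_header                  # -p0: insert nothing
    elements = path_header.split("!") if path_header else []
    if commonpathname and commonpathname not in elements:
        elements.insert(0, commonpathname)  # common name goes in first...
    elements.insert(0, newspathname)        # ...so newspathname ends up in front
    return "!".join(elements)
```

For example, prepend_path("a.example!not-for-mail", "news1.example.net", "example.net") yields "news1.example.net!example.net!a.example!not-for-mail", while a path that already contains example.net only gains the news1.example.net element.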

-R rxbufsize

Set the TCP receive buffer size.

-S[Bn[sn]][Nn[sn]]

This option enables the internal spamfilter. Note that an ISPAM entry is required in dnewsfeeds before any articles will reach the filter, even if it is enabled here. The ISPAM entry determines which articles are sent to the internal filter. There are two internal filters available:

Duplicate body detection

NNTP-Posting-Host rate detection

Each type is enabled with a different option, which also sets the trip value for that type.

Bn - enables duplicate body detection and sets the number of allowed duplicates before further articles are rejected.

Nn - enables the NNTP-Posting-Host rate detection and the number specifies how many duplicate hosts are allowed in an hour before extra articles from that host are rejected.

en - set the expire time (in seconds) for the previous ’B’ or ’N’ option.

sn - set the number of entries in the filter hash table to n for the previous ’B’ or ’N’ option. The size must be a power of 2. Default is 65536 entries.

Both types of filters also make a note of the number of lines in the body of the article to reduce the possibility of false duplicates.

Use of this option causes the creation of 2 files in path_db that are used to store the filter hash tables.

e.g: B6s32768 N16 would set the body filter trip to 6, with a hash table size of 32768 entries and the nph filter trip to 16 with the default hash table size of 65536.

The default is disabled (B0 N0).

The spam filter utilizes a fixed-size hash table cache and rate-limits postings with the same number of lines from the same NNTP-Posting-Host: source or with the same body hash. If the rate exceeds n articles over a period of e seconds, all further matching articles will be rejected.
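The rate-limiting idea described above (a fixed-size, power-of-2 hash table, a trip value n and an expire time e) can be sketched as follows. This is a rough model under stated assumptions, not Diablo's filter: the class name RateFilter, the use of CRC32 as the hash, the overwrite-on-collision policy, and the one-hour default expire time are all assumptions made for illustration.

```python
import zlib


class RateFilter:
    """Toy fixed-size hash table rate limiter (hypothetical, for illustration)."""

    def __init__(self, trip, expire=3600, size=65536):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of 2")
        self.trip = trip                # 0 disables the filter (B0 / N0)
        self.expire = expire            # window length in seconds (assumed default)
        self.mask = size - 1
        self.slots = [None] * size      # each slot: [key_hash, count, window_start]

    def should_reject(self, source, lines, now):
        """Count one article for (source, line count); True once the trip is exceeded."""
        if self.trip == 0:
            return False
        key_hash = zlib.crc32(("%s\0%d" % (source, lines)).encode())
        index = key_hash & self.mask
        slot = self.slots[index]
        if slot is None or slot[0] != key_hash or now - slot[2] >= self.expire:
            # Empty slot, a different key, or an expired window: start counting afresh.
            self.slots[index] = [key_hash, 1, now]
            return False
        slot[1] += 1
        return slot[1] > self.trip
```

Here source stands for either a body hash or an NNTP-Posting-Host value. With trip=2, the first two matching articles in a window pass and later ones are rejected until the window expires.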

-s argv-buffer-space-for-ps-status

Generally used to reserve buffer space so diablo can generate a real-time status in its argv that the system’s ps command can read. This does not work with all operating systems.

-T txbufsize

Set the TCP transmit buffer size. A minimum size of 4K is imposed to guarantee lockup-free operation in streaming mode.

-X xrefhost

Sets the hostname used when generating Xref: lines. The default is to use the newspathname if Xref: generation is enabled.

-x

If active file is enabled and we are not an Xref: slave, then use the Xref: line to update the NX field in the active file. This is useful for a backup to the Xref: generator in large installations.

Diablo understands a subset of the NNTP protocol. The basic commands it understands are ihave, check, and takethis. Diablo also understands stat, head, mode stream, and quit. Diablo also implements a number of commands to support remotely configured newsfeeds files on a site-by-site basis. These are feedrset, feedadd, feeddel, and feedcommit. Remote sites may query the state of outgoing feeds directed to them with the outq command.

Diablo is strictly a news holding and transit server. It does not maintain a newsgroups or active file, and it does not store articles in a hierarchy based on the group name. Diablo stores files in a hierarchy based on the time received and a randomly generated iteration number. A new directory is created every 10 minutes and each incoming connection creates its own file. Multiple articles may be stored in each file. Connections that last more than 10 minutes will close their current file and open a new one in the new directory.

Diablo also maintains a history database, called dhistory, which references articles based on their hash code and stores reception date and expiration information. The history database is headed by a four-million-entry hash table, followed by linked lists of History structures in a machine-readable (but not human-readable) format. It should be noted that the Message-ID is not stored anywhere but in the article and in the outbound feed queue files. If two different Message-IDs wind up with the same hash code, one of the articles will be lost. Given a (as of this writing) full feed of 250,000 articles a day, a maximum lifetime of 16 days, and 62 significant bits in the hash code, collisions will statistically occur only once every 4 billion articles or so. This is the price for using Diablo, and I consider it a minor one.
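The time-based spool layout described above amounts to rounding the reception time down to a 10-minute boundary. The sketch below shows that arithmetic only; the function names and the "D.%08x" directory naming are hypothetical and are not taken from this manual page.

```python
def spool_slot(received_epoch, span_minutes=10):
    """Return the start of the 10-minute slot, in minutes since the epoch."""
    minutes = int(received_epoch) // 60
    return minutes - minutes % span_minutes


def spool_dir(received_epoch):
    """Hypothetical directory name for the slot (the naming scheme is an assumption)."""
    return "D.%08x" % spool_slot(received_epoch)
```

Every article received within the same 10-minute span maps to the same directory, which is what keeps file creation local to one reasonably sized directory at a time.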

The file names also have an iteration tagged onto the end. The iteration is used to group files within a 10 minute-span directory. If an article collision on input occurs, whichever diablo process missed the history commit will remove the data associated with the article from its spool file.

Critical-path operations in diablo are extremely efficient due to the time-locality of most of its operations. From a time-local point of view, files are created in the same reasonably-sized directory. The diablo expiration program, dexpire, does not rewrite the history file (see didump and diload for that). Instead it simply scans it, removes expired files, and updates the history file in-place to indicate the fact. There are no softlinks, because the spool is not based on the group name(s). Cleaning the spool directory is trivial because, frankly, there aren’t many files in it. In a very heavily loaded system, approximately 80 files are created every 10 minutes. The only real random access is the history file itself. Due to the fixed-length records, dhistory is around 1/2 the size of a typical INN history file. Since there is no active or newsgroups file to maintain, no renumbering mechanism is required. Diablo forks for each inbound connection, allowing history file lookups and file creates to run in parallel. Diablo uses true record locking for history database updates and none at all for lookups.

Finally, being strictly transit in nature, Diablo does not attempt to act on the contents of the message. For example, control messages are ignored, and Diablo makes no header modifications except to the Path: header and to remove the Xref: header, if it exists. The source of the feed is expected to generate a properly formatted article, and very little article checking is done until after the article has propagated to a newsreader site (beyond Diablo). Any content-specific action which you wish to support must be dealt with through an external medium using the outbound feed mechanism.

Diablo’s dexpire program uses a dynamic expiration mechanism whereby you give it a free-space goal and it scales the dexpire.ctl expirations accordingly to reach that goal. It should be noted that the expiration is stored in the history file at the time of article reception and NOT calculated when dexpire is run. The dexpire.ctl file has a number of features that allow you to scale the expiration based on the number of cross posts and message-size, and to reject messages for certain groups that are too large.

Diablo maintains a pipe between each forked child and the master acceptor server and has a mechanism which may be used to issue commands to the running system. The master acceptor server handles all outbound feed file queueing, which makes feed file flushing a very simple command to issue. You may also request the master server to exit, which propagates to the forked slaves and guarantees that all outbound feed files have been flushed. The program used to issue commands to Diablo is called dicmd, and is generally run with flush or exit as an argument. Queue file flushing works in a manner similar to Innd in that you are supposed to rename the server-created queue file and then flush it with dicmd, but unlike Innd, the file is not wiped out if you do not rename it. It is instead reopened for append. Diablo includes two separate programs, dspoolout and dnewslink, which do queue file sequencing, management, outbound feeds, and trimming. The dnewslink program can run the non-streaming ihave protocol, or can run the streaming check/takethis protocol. It dynamically figures out what the remote end can handle. It should be noted that Diablo can run all of its commands fully streamed, not just the check/takethis protocol.

CRON JOBS

Typically you set up a number of cron jobs to support the running Diablo server.

dspoolout -s 9 should generally be run every 5 minutes. The -s argument should generally be twice the cron interval minus one (2x-1); see the manual page for dspoolout for more information.

dexpire -r2000 should generally be run every 4 hours. -rFREESPACE tells dexpire to remove files until the free-space target, in megabytes, is reached. In this example, we have a 2GB free space target. Once your system has stabilized, you can safely reduce this to 1GB, and less if you are not taking a full feed. It should be roughly equivalent to 5% of your available news spool space. You may have to run dexpire more often with tighter free-space margins.

The adm/biweekly.atrim script should generally be run twice a week. The script shuts down the diablo server, then renames and rewrites the dhistory file using a combination of didump and diload to remove expired entries over 16 days old. The dhistory file is typically about 1/2 the size of an INN history file for a full feed, so it is not necessary to run this script more than once a week. Diablo must be shut down during this procedure to prevent appends to the older version of the history file from occurring.

The adm/daily.atrim script rotates the log files in the log/ directory. If you are using syslog to generate /var/log/news or other log files, you need to have appropriate crontab entries to rotate them as well.
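The two rules of thumb above (the 2x-1 value for dspoolout -s and the 5% free-space target for dexpire -r) are simple arithmetic. The helper names below are hypothetical and only restate those rules; they are not part of the Diablo distribution.

```python
def dspoolout_s_arg(cron_interval_minutes):
    """Suggested -s value: twice the cron interval minus one."""
    return 2 * cron_interval_minutes - 1


def free_space_target_mb(spool_size_mb, percent=5):
    """Suggested -r value in megabytes: roughly 5% of the spool space."""
    return spool_size_mb * percent // 100
```

A 5-minute cron interval gives -s 9, and a 40000 MB spool gives a target of 2000 MB, matching the dspoolout -s 9 and dexpire -r2000 examples.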

LOGGING

Diablo logs through syslog using the NEWS facility. It typically generates both per-connection statistics and global statistics.

The per-connection statistics are made up of two lines. Each line contains key=value pairs as described below.

secs - elapsed time of connection

ihave - number of IHAVE nntp commands received

chk - number of CHECK nntp commands received

rec - number of articles received from remote

rej - of the received articles, the number rejected

predup - number of articles sent via takethis that were determined to be duplicates before the first byte of the article was received.

posdup - (meaningless)

pcoll - pre-commit cache collisions. Typically indicates that either a history collision occurred against some other article simultaneously in transit or that a history collision occurred with recently received message-ids.

spam - number of articles determined to be spam by the spam filter

err - number of errors that occurred, typically protocol errors

added - of the received articles, the number committed to the spool

bytes - number of bytes committed to the spool

The second statistics line contains key-value pairs as shown below.

acc - number of articles accepted

ctl - of the accepted articles, how many were control messages

failsafe - rejected due to failsafe, typically means that the spool directory structure got messed up.

misshdrs - rejected due to missing required headers. Can also occur when the feeder sends an empty article (typically when the feeder cannot find the article in its spool).

tooold - rejected for being too old.

grpfilt - rejected due to the incoming group filter for this feed in dnewsfeeds.

spamfilt - rejected due to the spam filter

earlyexp - of the articles received, the number that have been accepted but will be expired early, usually due to dexpire.ctl.

instantexp - rejected because dexpire.ctl indicated that the article would expire instantly.

notinactv - rejected because none of the newsgroups are in the active file (if you have ’activedrop’ set in diablo.config).

ioerr - rejected due to an I/O or other abnormal error
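Because the per-connection statistics are key=value pairs, they are easy to post-process. The sketch below is a generic parser under stated assumptions: the function names are hypothetical, and the sample line in the usage note is made up for illustration and is not real Diablo output. Only the key names come from the list above.

```python
def parse_stats(line):
    """Collect integer key=value tokens from one log line into a dict.

    Tokens without '=' or with a non-numeric value (host names, syslog
    prefixes and the like) are ignored.
    """
    stats = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            stats[key] = int(value)
    return stats


def acceptance_ratio(stats):
    """Fraction of received articles that were committed to the spool."""
    if not stats.get("rec"):
        return 0.0
    return stats.get("added", 0) / stats["rec"]
```

For the invented line "secs=600 ihave=0 chk=1000 rec=400 rej=40 added=360 bytes=1200000", parse_stats returns a dict in which rec is 400 and added is 360, giving an acceptance ratio of 0.9.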

The global statistics are logged by the master diablo process and include the key-value pairs shown below.

uptime - total uptime in hours and minutes.

arts - total number of articles accepted

bytes - total number of bytes accepted

fed - aggregate number of articles queued to outgoing feeds

CONCEPTS

The Diablo system employs a number of concepts to attain high throughput and efficiency. Some, like the fork()ing server, are obvious. Others are not so obvious.

The history file consists of a chained hash table with a four-million-entry base array. History entries form a linked list relative to their base index, which is itself calculated through a hashing function. When new history entries are added, they are physically appended to the file but logically inserted at the base of the appropriate linked list, NOT at the end. What this means is that certain programs such as dexpire, which scan the history file linearly rather than following the chains, generally wind up accessing files grouped by directory. This is very efficient. Searches, however, run through the chains and thus scan the chain in reverse-time order, with the most recent entries scanned first. While this hops through the history file (you hop through it anyway), it is well optimized by the fact that (a) the hash table array is so large, and (b) it is likely to be looking up more recently received articles and thus likely to hit them first. Searches for which failures are expected only have the advantage of (a), but I had to compromise somewhere.
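The "physically appended, logically inserted at the head" structure can be modelled in memory. This is a sketch only: the class name History, the record layout, the modulo bucket function, and the exact table size are assumptions made for illustration and say nothing about the real dhistory file format.

```python
class History:
    """Toy chained hash table: records are appended, chains are newest-first."""

    def __init__(self, table_size=4 * 1024 * 1024):
        self.table_size = table_size
        self.heads = {}     # sparse stand-in for the base array: bucket -> record index
        self.records = []   # stand-in for the file: (hash_code, received, next_index)

    def add(self, hash_code, received):
        bucket = hash_code % self.table_size
        # Append at the end of the "file", then make the new record the chain head.
        self.records.append((hash_code, received, self.heads.get(bucket, -1)))
        self.heads[bucket] = len(self.records) - 1

    def chain(self, bucket):
        """Yield the hash codes in one chain, most recently added first."""
        index = self.heads.get(bucket, -1)
        while index != -1:
            hash_code, _received, index = self.records[index]
            yield hash_code

    def lookup(self, hash_code):
        """Return the reception time for hash_code, or None if it is unknown."""
        index = self.heads.get(hash_code % self.table_size, -1)
        while index != -1:
            code, received, index = self.records[index]
            if code == hash_code:
                return received
        return None
```

A linear scan of records visits entries in arrival order, as dexpire does, while lookup walks a chain from the newest entry backwards, so recently received articles are found first.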

The spool directory itself is organized by time-received. It is explicitly NOT organized by the Date: field or by group. A new directory is created every 10 minutes, and in a heavily loaded system does not generally contain more than 80 or so spool files, each containing multiple articles. Inbound articles have the advantage of being appended to open descriptors as well as being readily cacheable and in time-proximity localized directories, and outbound articles have the same advantage. Even when some of your feeds get behind, per-process accesses are readily cacheable and the kernel can generally survive the partitioning effect. This is quite unlike standard INN spool management, which bounces files all over the group hierarchy and makes article adds and accesses almost random.

Direct access to the articles is supported by looking the article up in the history file. The history file contains the time-received, which, combined with the iteration id, a byte offset, and a byte count, allows you to access the physical article.
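Once the directory, file, byte offset, and byte count are known, fetching an article is a seek and a read. The sketch below assumes the caller has already derived the directory and file names from the time-received and iteration id; the function name read_article and its parameters are hypothetical.

```python
import os


def read_article(spool_root, dir_name, file_name, offset, length):
    """Return `length` bytes starting at `offset` in one multi-article spool file."""
    path = os.path.join(spool_root, dir_name, file_name)
    with open(path, "rb") as spool_file:
        spool_file.seek(offset)
        data = spool_file.read(length)
    if len(data) != length:
        raise EOFError("spool file is shorter than the history entry claims")
    return data
```

Because several articles share one spool file, the offset and count are what separate one article from its neighbours.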

SIGNALS

Sending a USR1 signal to Diablo will enable debugging. Diablo will output debug info for each received article and will indicate the reason for any rejection. Additional USR1 signals bump up the debug level. A single USR2 signal will set the debug level back to 0. It is suggested that signals only be sent to child processes and never to the parent Diablo.
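The debug-level behaviour described above follows a common daemon pattern, sketched here in Python. This models the pattern only and is not Diablo's code: the state dictionary and handler names are hypothetical, and the example assumes a POSIX system where SIGUSR1 and SIGUSR2 exist.

```python
import signal

# Shared state for the sketch; a real daemon would keep this in its own structures.
state = {"debug_level": 0}


def on_usr1(signum, frame):
    """Each USR1 raises the debug level by one."""
    state["debug_level"] += 1


def on_usr2(signum, frame):
    """A single USR2 sets the debug level back to 0."""
    state["debug_level"] = 0


def install_handlers():
    """Register both handlers; must be called from the main thread."""
    signal.signal(signal.SIGUSR1, on_usr1)
    signal.signal(signal.SIGUSR2, on_usr2)
```

After install_handlers() has run, sending USR1 twice to the process leaves the level at 2, and one USR2 returns it to 0.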

TYPICAL PERFORMANCE, TUNING SUGGESTIONS

See the KERNEL_NOTES file for tuning suggestions and machine-specific configurations.

SEE ALSO

diablo(8), dicmd(8), didump(8), diload(8), dnewslink(8), doutq(8), dexpire(8), dexpireover(8), diconvhist(8), dilookup(8), dspoolout(8), dkp(8), dpath(8), diablo-kp(5), diablo-files(5)
