Manual Reference Pages  -  RWSPLIT (1)

NAME

rwsplit - Divide a SiLK file into a (sampled) collection of subfiles

SYNOPSIS



  rwsplit --basename=BASENAME
        { --ip-limit=LIMIT | --flow-limit=LIMIT
          | --packet-limit=LIMIT | --byte-limit=LIMIT }
        [--seed=NUMBER] [--sample-ratio=SAMPLE_RATIO]
        [--file-ratio=FILE_RATIO] [--max-outputs=MAX_OUTPUTS]
        [--note-add=TEXT] [--note-file-add=FILE]
        [--compression-method=COMP_METHOD]
        [--print-filenames] [--site-config-file=FILENAME]
        [--xargs[=FILE] | FILE [FILES...]]

  rwsplit --help

  rwsplit --version



DESCRIPTION

rwsplit reads SiLK Flow records from the standard input or from files named on the command line and writes the flows into a set of subfiles based on the splitting criterion. In its simplest form, rwsplit partitions the file, meaning that each input flow will appear in one (and only one) of the subfiles.

In addition to splitting the file, rwsplit can generate files containing sample flows. Sampling is specified by using the --sample-ratio and --file-ratio switches.

rwsplit reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified and --xargs is not present. To read the standard input in addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as it is read. When the --xargs switch is provided, rwsplit will read the names of the files to process from the named text file, or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line.

If you wish to use the size of the output files as the splitting criterion, use the --flow-limit switch. The parameter to this switch should be the desired output file size divided by the record size. The record size can be determined with rwfileinfo(1). When the output files are compressed (see the description of --compression-method below), assume a compression ratio of about 50%.
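The calculation above can be sketched with shell arithmetic. This is only an illustration: estimate_flow_limit is a hypothetical helper, the 52-byte record size is a made-up value (use the one rwfileinfo(1) reports for your files), the file header is ignored, and "50% compression" is taken to mean that the compressed file is about half the uncompressed size.

```shell
# Hypothetical helper (not part of SiLK): estimate a --flow-limit value.
#   $1 = desired output file size on disk, in bytes
#   $2 = record size in bytes (52 below is an assumed example value)
#   $3 = compressed size as a percentage of uncompressed size (100 = none)
estimate_flow_limit() {
    # records = size / (record_size * percent / 100), in integer arithmetic
    echo $(( $1 * 100 / ($2 * $3) ))
}

estimate_flow_limit 10000000 52 100   # uncompressed output: prints 192307
estimate_flow_limit 10000000 52 50    # about 50% compression: prints 384615
```

The printed number would then be passed as the LIMIT of --flow-limit.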

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

The splitting criterion is defined using one of the limit specifiers; one and only one must be specified. They are:
--ip-limit=LIMIT Close the current subfile and begin a new subfile when the count of unique source and destination IPs in the current subfile meets or exceeds LIMIT. The next-hop-IP does not count toward LIMIT.
--flow-limit=LIMIT Close the current subfile and begin a new subfile when the number of SiLK Flow records in the current subfile meets LIMIT.
--packet-limit=LIMIT Close the current subfile and begin a new subfile when the sum of the packet counts across all SiLK Flow records in the current subfile meets or exceeds LIMIT.
--byte-limit=LIMIT Close the current subfile and begin a new subfile when the sum of the byte counts across all SiLK Flow records in the current subfile meets or exceeds LIMIT. This switch does not specify the size of the subfiles.
The other switches are:
--basename=BASENAME Specifies the basename of the output files; this switch is required. The flows are written sequentially to a set of subfiles whose names follow the format BASENAME.ORDER.rwf, where ORDER is an 8-digit zero-padded sequence number (i.e., 00000000, 00000001, and so on). The sequence number begins at zero and increases by one for every file name generated. When --file-ratio is specified, only the randomly selected names are written to disk, so the sequence numbers of the files on disk will have gaps.
--seed=NUMBER Use NUMBER to seed the pseudo-random number generator for the --sample-ratio or --file-ratio switch. This can be used to put the random number generator into a known state, which is useful for testing.
--sample-ratio=SAMPLE_RATIO Writes one flow record, chosen at random, from every SAMPLE_RATIO flows that are read.
--file-ratio=FILE_RATIO Picks one subfile, chosen at random, out of every FILE_RATIO names generated, for writing to disk.
--max-outputs=MAX_OUTPUTS Limits the number of files that are written to disk to MAX_OUTPUTS.
--note-add=TEXT Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
--compression-method=COMP_METHOD Specify how to compress the output. When this switch is not given, the output files are compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
none Do not compress the output using an external library.
zlib Use the zlib(3) library for compressing the output. Using zlib produces the smallest output files at the cost of speed.
lzo1x Use the lzo1x algorithm from the LZO real time compression library for compression. This compression provides good compression with less memory and CPU overhead.
best Use lzo1x if available, otherwise use zlib.
--print-filenames Print to the standard error the names of input files as they are opened.
--site-config-file=FILENAME Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwsplit searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME Causes rwsplit to read file names from FILENAME or from the standard input if FILENAME is not provided. The input should have one file name per line. rwsplit will open each file in turn and read records from it, as if the files had been listed on the command line.
--help Print the available options and exit.
--version Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLES

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is used to indicate a wrapped line.

Assume a source file source.rwf; to split that file into files that each contain about 100 unique IP addresses:



 $ rwsplit --basename=result --ip-limit=100 source.rwf



To split source.rwf into files that each contain 100 flows:



 $ rwsplit --basename=result --flow-limit=100 source.rwf



The following causes rwsplit to sample 1 out of every 10 records from source.rwf; i.e., rwsplit will read 1000 flow records to produce each subfile:



 $ rwsplit --basename=result --flow-limit=100 --sample-ratio=10 source.rwf



When --file-ratio is specified, the file names are generated as usual (e.g., base-00000000, base-00000001, ...); however, one of these names will be chosen randomly from each set of --file-ratio candidates, and only that file will be written to disk.



 $ rwsplit --basename=result --flow-limit=100 --file-ratio=5 source.rwf
 $ ls
 result-00000002.rwf
 result-00000008.rwf
 result-00000013.rwf
 result-00000016.rwf



LIMITATIONS

rwsplit can take exactly 1 partitioning switch per invocation.

Partitioning is not exact: rwsplit keeps appending flow records to a file until it meets or exceeds the specified LIMIT. For example, if you specify --ip-limit=100, then rwsplit will fill up the file until it has 100 IP addresses in it; if the file has 99 addresses and a new record with 2 previously unseen addresses is received, rwsplit will put this in the current file, resulting in a 101-address file. Similarly, if you specify --byte-limit=2000, and rwsplit receives a 10kb flow record, that flow record will be placed in the current subfile.
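The "meets or exceeds" rule can be modeled in a few lines of shell. This is a sketch of the behavior described above, not rwsplit's actual code: simulate_byte_limit is a hypothetical function, the byte counts are invented, and it assumes that a partially filled subfile is still written when the input ends.

```shell
# Model of the documented rule; simulate_byte_limit is hypothetical.
#   $1 = LIMIT; remaining arguments = per-record byte counts (made up)
# Prints one line per subfile with its index and total byte count.
simulate_byte_limit() {
    limit=$1
    shift
    index=0
    total=0
    for bytes in "$@"; do
        # The record always joins the currently open subfile...
        total=$(( total + bytes ))
        # ...and the subfile is closed only once the sum meets or exceeds LIMIT.
        if [ "$total" -ge "$limit" ]; then
            echo "subfile $index: $total bytes"
            index=$(( index + 1 ))
            total=0
        fi
    done
    # Assumption: leftover records form a final, smaller subfile.
    if [ "$total" -gt 0 ]; then
        echo "subfile $index: $total bytes"
    fi
}

simulate_byte_limit 2000 500 1000 10000 300
# prints:
#   subfile 0: 11500 bytes
#   subfile 1: 300 bytes
```

The 10000-byte record lands in subfile 0 even though it pushes the total far past the limit of 2000, which is why the limit should be read as a threshold rather than a maximum.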

The switches --sample-ratio, --file-ratio, and --max-outputs are processed in that order. So, when you specify



 $ rwsplit --sample-ratio=10 --ip-limit=100    \
        --file-ratio=10 --max-outputs=20



rwsplit will pick 1 out of every 10 flow records, write those records to subfiles of 100 IPs each, pick 1 out of every 10 subfiles to write, and write at most 20 files. If there are 1000 records, each with 2 unique IPs, then rwsplit will write at most 1 file: the 100 sampled records contain 200 unique IP addresses, enough for 2 candidate subfiles, but the name chosen from the set of 10 candidates may not be one of those 2.
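The arithmetic behind this example can be written out as follows. The function estimate_outputs is a hypothetical helper that only computes an upper bound; it assumes every record contributes the same number of previously unseen IPs and it does not model the random choices rwsplit makes.

```shell
# Hypothetical helper: upper bound on the number of files written.
#   $1 = records read            $2 = unique IPs per record (assumed constant)
#   $3 = --sample-ratio          $4 = --ip-limit
#   $5 = --file-ratio            $6 = --max-outputs
estimate_outputs() {
    sampled=$(( $1 / $3 ))                  # records kept by sampling
    ips=$(( sampled * $2 ))                 # unique IPs among those records
    names=$(( (ips + $4 - 1) / $4 ))        # candidate subfile names (rounded up)
    files=$(( (names + $5 - 1) / $5 ))      # at most one file per set of names
    if [ "$files" -gt "$6" ]; then
        files=$6                            # --max-outputs is applied last
    fi
    echo "sampled=$sampled ips=$ips names=$names max_files=$files"
}

estimate_outputs 1000 2 10 100 10 20
# prints: sampled=100 ips=200 names=2 max_files=1
```

Because only 2 of the 10 names in the first set correspond to real data, the actual number of files written may also be zero.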

ENVIRONMENT

SILK_CLOBBER The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
SILK_CONFIG_FILE This environment variable is used as the value for the --site-config-file when that switch is not provided.
SILK_DATA_ROOTDIR This environment variable specifies the root directory of the data repository. As described in the FILES section, rwsplit may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH This environment variable gives the root of the install tree. When searching for configuration files, rwsplit may use this environment variable. See the FILES section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file, which are checked when the --site-config-file switch is not provided.
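To see which of these files exists on a given system, a small shell function can walk the list. This is a rough sketch, not rwsplit's lookup code: find_silk_conf is a hypothetical name, and it assumes that the locations are tried in the order listed above and that entries based on an unset or empty environment variable are skipped.

```shell
# Hypothetical helper: print the first existing candidate from the FILES list.
# Assumes the listed order is the search order; rwsplit itself may differ.
find_silk_conf() {
    for candidate in \
        "${SILK_CONFIG_FILE:-}" \
        "${SILK_DATA_ROOTDIR:+$SILK_DATA_ROOTDIR/silk.conf}" \
        /data/silk.conf \
        "${SILK_PATH:+$SILK_PATH/share/silk/silk.conf}" \
        "${SILK_PATH:+$SILK_PATH/share/silk.conf}" \
        /usr/local/share/silk/silk.conf \
        /usr/local/share/silk.conf
    do
        # Empty entries come from unset variables and are ignored.
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

find_silk_conf || echo "no silk.conf found in the listed locations" >&2
```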

SEE ALSO

rwfileinfo(1), silk(7), zlib(3)


SiLK 3.11.0.1 RWSPLIT (1) 2016-04-05
