bgzip - Block compression/decompression utility
bgzip [-b virtualOffset] [-l compression_level] [-@ threads]
Bgzip compresses files in a similar manner to, and compatible with, gzip(1). The
file is compressed into a series of small (less than 64K) 'BGZF' blocks. This
allows indexes to be built against the compressed file and used to retrieve
portions of the data without having to decompress the entire file.
If no files are specified on the command line, bgzip will compress (or
decompress if the -d option is used) standard input to standard output. If a
file is specified, it will be compressed (or decompressed with -d). If the -c
option is used, the result is written to standard output. Otherwise, when
compressing, bgzip writes to a new file with a .gz suffix and removes the
original. When decompressing, the input file must have a .gz suffix, which is
removed to form the output name; once decompression completes, the input
file is removed as well.
- -b, --offset INT
- Decompress to standard output from virtual file position (0-based
uncompressed offset). Implies -c and -d.
- -c, --stdout
- Write to standard output, keep original files unchanged.
- -d, --decompress
- Decompress.
- -f, --force
- Overwrite files without asking.
- -h, --help
- Displays a help message.
- -i, --index
- Create a BGZF index while compressing. Unless the -I option is used, this
will have the name of the compressed file with .gzi appended to it.
- -I, --index-name FILE
- Index file name.
- -l, --compress-level INT
- Compression level to use when compressing. From 0 to 9, or -1 for the
default level set by the compression library. [-1]
- -r, --reindex
- Rebuild the index on an existing compressed file.
- -g, --rebgzip
- Try to use an existing index to create a compressed file with matching
block offsets. Note that this assumes that the same compression library
and level are in use as when making the original file. Don't use it unless
you know what you're doing.
- -s, --size INT
- Decompress INT bytes (uncompressed size) to standard output. Implies -c.
- -@, --threads INT
- Number of threads to use.
The BGZF format written by bgzip is described in the SAM format specification
available from http://samtools.github.io/hts-specs/SAMv1.pdf.
It makes use of a gzip feature which allows compressed files to be concatenated.
The input data is divided into blocks which are no larger than 64 kilobytes
both before and after compression (including compression headers). Each block
is compressed into a gzip file. The gzip header includes an extra sub-field
with identifier 'BC' and the length of the compressed block, including all
headers.
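As a rough illustration of this layout, the Python sketch below builds BGZF-style blocks by hand and reads the block length back from the 'BC' sub-field. The names make_bgzf_block and bgzf_block_size are hypothetical and are not part of bgzip or htslib. The field layout, and the detail that the stored value is the total block size minus one, follow the BGZF section of the SAM specification; the compression level is an arbitrary choice.

```python
import gzip
import struct
import zlib

MAX_BLOCK = 65536  # limit on both the input and the finished block


def make_bgzf_block(data: bytes, level: int = 6) -> bytes:
    """Wrap `data` in a single gzip member carrying a 'BC' extra sub-field."""
    if len(data) > MAX_BLOCK:
        raise ValueError("input too large for one block")
    # wbits=-15 produces a raw deflate stream with no zlib/gzip wrapper.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    cdata = comp.compress(data) + comp.flush()
    # 12-byte gzip header + 6-byte extra field + payload + 8-byte trailer.
    bsize = 12 + 6 + len(cdata) + 8
    if bsize > MAX_BLOCK:
        raise ValueError("compressed block too large")
    # ID1, ID2, CM=deflate, FLG=FEXTRA, MTIME, XFL, OS=unknown, XLEN
    header = struct.pack("<4BI2BH", 31, 139, 8, 4, 0, 0, 255, 6)
    # SI1='B', SI2='C', SLEN=2, BSIZE = total block size minus one
    extra = struct.pack("<2BHH", 66, 67, 2, bsize - 1)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


def bgzf_block_size(buf: bytes, offset: int = 0) -> int:
    """Return the total size of the block starting at `offset`."""
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
        "<4BI2BH", buf, offset)
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("not a gzip member with an extra field")
    pos = offset + 12
    end = pos + xlen
    while pos + 4 <= end:  # scan the extra sub-fields for 'BC'
        si1, si2, slen = struct.unpack_from("<2BH", buf, pos)
        if (si1, si2, slen) == (66, 67, 2):
            return struct.unpack_from("<H", buf, pos + 4)[0] + 1
        pos += 4 + slen
    raise ValueError("no 'BC' sub-field found")


# Two blocks concatenated form one valid gzip stream.
stream = make_bgzf_block(b"hello ") + make_bgzf_block(b"world\n")
print(gzip.decompress(stream))  # b'hello world\n'
```

Because each block records its own length, a reader can hop from block to block with bgzf_block_size without inflating any payload, which is what makes indexing by block start practical.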
The index format is a binary file listing pairs of compressed and uncompressed
offsets in a BGZF file. Each compressed offset points to the start of a BGZF
block. The uncompressed offset is the corresponding location in the
uncompressed data stream.
All values are stored as little-endian 64-bit unsigned integers.
The file contents are:
    uint64_t number_entries
followed by number_entries pairs of:
    uint64_t compressed_offset
    uint64_t uncompressed_offset
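The index layout described above can be read with a few lines of Python. This is a minimal sketch based only on the description here (a count followed by pairs of little-endian 64-bit unsigned integers); the function name parse_gzi is hypothetical, and the sample bytes are made up for illustration rather than taken from a real index.

```python
import struct


def parse_gzi(raw: bytes) -> list:
    """Return a list of (compressed_offset, uncompressed_offset) tuples."""
    if len(raw) < 8:
        raise ValueError("index too short to hold the entry count")
    (number_entries,) = struct.unpack_from("<Q", raw, 0)
    expected = 8 + 16 * number_entries  # count + 16 bytes per pair
    if len(raw) < expected:
        raise ValueError("index is truncated")
    return [struct.unpack_from("<QQ", raw, 8 + 16 * i)
            for i in range(number_entries)]


# Made-up index with two entries, for illustration only.
sample = struct.pack("<Q", 2)
sample += struct.pack("<QQ", 100, 65280)
sample += struct.pack("<QQ", 200, 130560)
print(parse_gzi(sample))  # [(100, 65280), (200, 130560)]
```

For a real file, the bytes would come from something like open("/tmp/words.gz.gzi", "rb").read().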
# Compress stdin to stdout
bgzip < /usr/share/dict/words > /tmp/words.gz
# Make a .gzi index
bgzip -r /tmp/words.gz
# Extract part of the data using the index
bgzip -b 367635 -s 4 /tmp/words.gz
# Uncompress the whole file, removing the compressed copy
bgzip -d /tmp/words.gz
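To show how an index makes the -b/-s extraction above cheap, here is a hedged Python sketch of the lookup: find the last indexed block that starts at or before the requested uncompressed offset, begin inflating at that block's compressed offset, and stop once enough bytes are available. The function read_range is hypothetical and is not how bgzip itself is implemented. Two assumptions are made: the first block at (0, 0) may be left out of the index, and ordinary gzip members stand in for BGZF blocks in the demonstration, since only the member boundaries matter here.

```python
import bisect
import gzip
import zlib


def read_range(compressed: bytes, entries, uoffset: int, size: int) -> bytes:
    """Return `size` uncompressed bytes starting at uncompressed `uoffset`."""
    # Assumption: the block at (0, 0) is implicit, so add it back.
    blocks = [(0, 0)] + [e for e in entries if e != (0, 0)]
    starts = [u for _, u in blocks]
    i = bisect.bisect_right(starts, uoffset) - 1
    pos, ustart = blocks[i]
    skip = uoffset - ustart  # bytes to discard inside the first block read
    out = bytearray()
    while pos < len(compressed) and len(out) < skip + size:
        d = zlib.decompressobj(wbits=31)  # 31 selects the gzip wrapper
        out += d.decompress(compressed[pos:])
        # Whatever follows this member is left in unused_data.
        pos = len(compressed) - len(d.unused_data)
    return bytes(out[skip:skip + size])


# Three gzip members standing in for three BGZF blocks.
chunks = [b"a" * 10, b"b" * 10, b"c" * 10]
members = [gzip.compress(c) for c in chunks]
data = b"".join(members)
index = [(len(members[0]), 10), (len(members[0]) + len(members[1]), 20)]
print(read_range(data, index, 18, 4))  # b'bbcc'
```

The request at offset 18 never touches the first member: decompression starts at the second one and spills into the third only because the range crosses the block boundary.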
The BGZF library was originally implemented by Bob Handsaker and modified by
Heng Li for remote file access and in-memory caching.