samtools-view(1)
Bioinformatics tools
samtools view - views and converts SAM/BAM/CRAM files
samtools view [options] in.sam|in.bam|in.cram [region...]
With no options or regions specified, prints all alignments in the
specified input alignment file (in SAM, BAM, or CRAM format) to standard
output in SAM format (with no header).
You may specify one or more space-separated region specifications
after the input filename to restrict output to only those alignments which
overlap the specified region(s). Use of region specifications requires a
coordinate-sorted and indexed input file (in BAM or CRAM format).
The -b, -C, -1, -u, -h,
-H, and -c options change the output format from the default
of headerless SAM, and the -o and -U options set the output
file name(s).
The -t and -T options provide additional reference
data. One of these two options is required when SAM input does not contain
@SQ headers, and the -T option is required whenever writing CRAM
output.
The -L, -M, -N, -r, -R,
-d, -D, -s, -q, -l, -m, -f,
-F, -G, and --rf options filter the alignments that
will be included in the output to only those alignments that match certain
criteria.
The -p option sets the UNMAP flag on alignments that fail the filters
and then writes them to the output file.
The -x, -B, --add-flags, and
--remove-flags options modify the data which is contained in each
alignment.
The -X option allows the user to specify customized index file
location(s) if the data folder does not contain any index file. See the
EXAMPLES section for sample usage.
Finally, the -@ option can be used to allocate additional
threads to be used for compression, and the -? option requests a long
help message.
- REGIONS:
Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]]
and all position coordinates are 1-based.
Important note: when multiple regions are given, some alignments
may be output multiple times if they overlap more than one of the specified
regions.
Examples of region specifications:
- chr1
- Output all alignments mapped to the reference sequence named `chr1' (i.e.
@SQ SN:chr1).
- chr2:1000000
- The region on chr2 beginning at base position 1,000,000 and ending at the
end of the chromosome.
- chr3:1000-2000
- The 1001bp region on chr3 beginning at base position 1,000 and ending at
base position 2,000 (including both end positions).
- '*'
- Output the unmapped reads at the end of the file. (This does not include
any unmapped reads placed on a reference sequence alongside their mapped
mates.)
- .
- Output all alignments. (Mostly unnecessary as not specifying a region at
all has the same effect.)
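The 1-based, inclusive coordinate convention can be checked with a little shell arithmetic. This is only an illustrative sketch: region_length is a hypothetical helper, not part of samtools, and it assumes a region of the full form RNAME:STARTPOS-ENDPOS whose reference name contains no colon.

```shell
# region_length REGION: print the number of bases covered by a region
# written as RNAME:STARTPOS-ENDPOS (hypothetical helper, for illustration).
region_length() {
    range=${1#*:}        # drop "RNAME:", leaving "STARTPOS-ENDPOS"
    start=${range%-*}    # text before the '-'
    end=${range#*-}      # text after the '-'
    # Both end positions are included, hence the +1.
    echo $(( end - start + 1 ))
}

region_length chr3:1000-2000   # prints 1001
```

The +1 is what makes chr3:1000-2000 a 1001bp region rather than a 1000bp one.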
- -b, --bam
- Output in the BAM format.
- -C, --cram
- Output in the CRAM format (requires -T).
- -1, --fast
- Enable fast compression. This also changes the default output format to
BAM, but this can be overridden by the explicit format options or using a
filename with a known suffix.
- -u,
--uncompressed
- Output uncompressed data. This also changes the default output format to
BAM, but this can be overridden by the explicit format options or using a
filename with a known suffix.
This option saves time spent on compression/decompression and
is thus preferred when the output is piped to another samtools
command.
- -h,
--with-header
- Include the header in the output.
- -H,
--header-only
- Output the header only.
- --no-header
- When producing SAM format, output alignment records but not headers. This
is the default; the option can be used to reset the effect of
-h/-H.
- -c, --count
- Instead of printing the alignments, only count them and print the total
number. All filter options, such as -f, -F, and -q,
are taken into account. The -p option is ignored in this mode.
- --save-counts
FILE
- Save data on the number of records processed, accepted and rejected by any
filter options to FILE. The data is stored in JSON format. The
counts only include records that are processed through the filtering
options. Any records skipped while iterating over regions will not be
included, so the number processed may be less than the total number of
records in the file. If used with the --fetch-pairs option, counts
will be given for records processed during the second pass over the
data.
- -?, --help
- Output long help and exit immediately.
- -o FILE, --output FILE
- Output to FILE [stdout].
- -U FILE, --unoutput
FILE, --output-unselected FILE
- Write alignments that are not selected by the various filter
options to FILE. When this option is used, all alignments (or all
alignments intersecting the regions specified) are written to
either the output file or this file, but never both.
- -p, --unmap
- Set the UNMAP flag on alignments that are not selected by the filter
options. These alignments are then written to the normal output. This is
not compatible with -U.
- -t FILE,
--fai-reference FILE
- A tab-delimited FILE. Each line must contain the reference name in
the first column and the length of the reference in the second column,
with one line for each distinct reference. Any additional fields beyond
the second column are ignored. This file also defines the order of the
reference sequences in sorting. If you run: `samtools faidx
<ref.fa>', the resulting index file <ref.fa>.fai can be
used as this FILE.
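As a rough illustration of the layout described above, the sketch below writes a small two-column file and checks its shape. The helper check_ref_list and the reference names and lengths are made up for this example; samtools performs its own parsing and this is not a substitute for it.

```shell
# Hypothetical helper: succeed only if every line of the file has at least
# two tab-separated columns and the second one is a non-negative integer.
check_ref_list() {
    awk -F '\t' 'NF < 2 || $2 !~ /^[0-9]+$/ { bad = 1 } END { exit bad + 0 }' "$1"
}

# Placeholder reference names and lengths, written to a temporary file.
ref_list=$(mktemp)
printf 'chr1\t1000\nchr2\t500\n' > "$ref_list"

check_ref_list "$ref_list" && echo "layout matches the -t FILE description"
```

Extra columns are allowed by the check, in line with the statement that fields beyond the second are ignored.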
- -T FILE,
--reference FILE
- A FASTA format reference FILE, optionally compressed by
bgzip and ideally indexed by samtools faidx. If an
index is not present one will be generated for you, if the reference file
is local.
If the reference file is not local, but is accessed instead
via an https://, s3:// or other URL, the index file will need to be
supplied by the server alongside the reference. It is possible to have
the reference and index files in different locations by supplying both
to this option separated by the string "##idx##", for
example:
-T
ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai
However, note that only the location of the reference will be
stored in the output file header. If this method is used to make CRAM
files, the cram reader may not be able to find the index, and may not be
able to decode the file unless it can get the references it needs using
a different method.
- -L FILE,
--target-file FILE, --targets-file FILE
- Only output alignments overlapping the input BED FILE [null].
- -M,
--use-index
- Use the multi-region iterator on the union of a BED file and command-line
region arguments. This avoids re-reading the same regions of files so can
sometimes be much faster. Note this also removes duplicate sequences.
Without this a sequence that overlaps multiple regions specified on the
command line will be reported multiple times. The usage of a BED file is
optional and its path has to be preceded by -L option.
- --region-file
FILE, --regions-file FILE
- Use an index and multi-region iterator to only output alignments
overlapping the input BED FILE. Equivalent to -M -L
FILE or --use-index --target-file FILE.
- -N FILE,
--qname-file FILE
- Output only alignments with read names listed in FILE. If
FILE starts with ^ then the operation is negated and only
outputs alignments with read names not listed in FILE. It is not
permissible to mix both the filter-in and filter-out style syntax in the
same command.
- -r STR,
--read-group STR
- Output alignments in read group STR [null]. Note that records with
no RG tag will also be output when using this option. This
behaviour may change in a future release.
- -R FILE,
--read-group-file FILE
- Output alignments in read groups listed in FILE [null]. If
FILE starts with ^ then the operation is negated and only
outputs alignments with read groups not listed in FILE. It is not
permissible to mix both the filter-in and filter-out style syntax in the
same command. Note that records with no RG tag will also be output
when using this option. This behaviour may change in a future
release.
- -d STR1[:STR2],
--tag STR1[:STR2]
- Only output alignments with tag STR1 and associated value
STR2, which can be a string or an integer [null]. The value can be
omitted, in which case only the tag is considered.
Note that this option does not specify a tag type. For
example, use -d XX:42 to select alignments with an XX:i:42
field, not -d XX:i:42.
- -D STR:FILE,
--tag-file STR:FILE
- Only output alignments with tag STR and associated values listed in
FILE [null].
- -q INT, --min-MQ
INT
- Skip alignments with MAPQ smaller than INT [0].
- -l STR, --library
STR
- Only output alignments in library STR [null].
- -m INT, --min-qlen
INT
- Only output alignments with number of CIGAR bases consuming query sequence
≥ INT [0]
- -e STR, --expr
STR
- Only include alignments that match the filter expression STR. The
syntax for these expressions is described in the main samtools(1) man page
under the FILTER EXPRESSIONS heading.
- -f FLAG,
--require-flags FLAG
- Only output alignments with all bits set in FLAG present in the
FLAG field. FLAG can be specified in hex by beginning with `0x'
(i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/),
as a decimal number not beginning with '0' or as a comma-separated list of
flag names.
For a list of flag names see samtools-flags(1).
- -F FLAG,
--excl-flags FLAG, --exclude-flags FLAG
- Do not output alignments with any bits set in FLAG present in the
FLAG field. FLAG can be specified in hex by beginning with `0x'
(i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/),
as a decimal number not beginning with '0' or as a comma-separated list of
flag names.
- --rf FLAG,
--incl-flags FLAG, --include-flags FLAG
- Only output alignments with any bit set in FLAG present in the FLAG
field. FLAG can be specified in hex by beginning with `0x' (i.e.
/^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as a
decimal number not beginning with '0' or as a comma-separated list of flag
names.
- -G FLAG
- Do not output alignments with all bits set in FLAG present in the
FLAG field. This is the opposite of -f such that -f12 -G12
is the same as no filtering at all. FLAG can be specified in hex by
beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0'
(i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a
comma-separated list of flag names.
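The four flag tests (-f, -F, --rf and -G) differ only in how the alignment's FLAG field is compared with the FLAG argument. The sketch below restates them as shell bit arithmetic, following the descriptions above. The function names are hypothetical, the FLAG argument is assumed to be numeric and non-zero, and this is not the samtools implementation.

```shell
# $1 is the alignment's FLAG field, $2 is the FLAG argument of the option.
# Each function succeeds (exit status 0) when the alignment would be output.
# Shell arithmetic accepts decimal, 0x-prefixed hex and 0-prefixed octal.

require_flags() { [ $(( $1 & $2 )) -eq $(( $2 )) ]; }  # -f:   all bits of $2 set
exclude_flags() { [ $(( $1 & $2 )) -eq 0 ]; }          # -F:   no bit of $2 set
include_flags() { [ $(( $1 & $2 )) -ne 0 ]; }          # --rf: any bit of $2 set
exclude_all()   { [ $(( $1 & $2 )) -ne $(( $2 )) ]; }  # -G:   not all bits of $2 set

# FLAG 99 = 0x1 + 0x2 + 0x20 + 0x40.
require_flags 99 0x3  && echo "-f 0x3 keeps FLAG 99"
exclude_flags 99 0x40 || echo "-F 0x40 drops FLAG 99"
```

For a given non-zero FLAG argument, exactly one of require_flags and exclude_all succeeds for any alignment, which is the sense in which -G is the opposite of -f.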
- -x STR,
--remove-tag STR
- Read tag(s) to exclude from output (repeatable) [null]. This can be a
single tag or a comma separated list. Alternatively the option itself can
be repeated multiple times.
If the list starts with a `^' then it is negated and treated
as a request to remove all tags except those in STR. The list may
be empty, so -x ^ will remove all tags.
Note that tags will only be removed from reads that pass
filtering.
- --keep-tag
STR
- This keeps only tags listed in STR and is directly
equivalent to --remove-tag ^STR. Specifying an empty list
will remove all tags. If both --keep-tag and --remove-tag
are specified then --keep-tag has precedence.
Note that tags will only be removed from reads that pass
filtering.
- -B,
--remove-B
- Collapse the backward CIGAR operation.
- --add-flags
FLAG
- Adds flag(s) to read. FLAG can be specified in hex by beginning
with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e.
/^0[0-7]+/), as a decimal number not beginning with '0' or as a
comma-separated list of flag names.
- --remove-flags
FLAG
- Remove flag(s) from read. FLAG is specified in the same way as with
the --add-flags option.
- --subsample
FLOAT
- Output only a proportion of the input alignments, as specified by 0.0
≤ FLOAT ≤ 1.0, which gives the fraction of
templates/pairs to be kept. This subsampling acts in the same way on all
of the alignment records in the same template or read pair, so it never
keeps a read but not its mate.
- --subsample-seed
INT
- Subsampling seed used to influence which subset of reads is kept.
When subsampling data that has previously been subsampled, be sure to use
a different seed value from those used previously; otherwise more reads
will be retained than expected. [0]
- -s FLOAT
- Subsampling shorthand option: -s INT.FRAC is
equivalent to --subsample-seed INT --subsample
0.FRAC.
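The INT.FRAC shorthand can be unpacked mechanically. The sketch below does so with shell parameter expansion; split_subsample is a hypothetical helper used only for illustration, and it assumes the argument contains both an integer part and a fractional part.

```shell
# split_subsample ARG: print the long options equivalent to "-s ARG".
split_subsample() {
    seed=${1%%.*}    # integer part before the '.'
    frac=${1#*.}     # digits after the '.'
    echo "--subsample-seed $seed --subsample 0.$frac"
}

split_subsample 42.25   # prints: --subsample-seed 42 --subsample 0.25
```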
- -@ INT, --threads INT
- Number of BAM compression threads to use in addition to main thread
[0].
- -P,
--fetch-pairs
- Retrieve pairs even when the mate is outside of the requested region.
Enabling this option also turns on the multi-region iterator (-M).
A region to search must be specified, either on the command-line, or using
the -L option. The input file must be an indexed regular file.
This option first scans the requested region, using the
RNEXT and PNEXT fields of the records that have the PAIRED
flag set and pass other filtering options to find where paired reads are
located. These locations are used to build an expanded region list, and
a set of QNAMEs to allow from the new regions. It will then make
a second pass, collecting all reads from the originally-specified region
list together with reads from additional locations that match the
allowed set of QNAMEs. Any other filtering options used will be
applied to all reads found during this second pass.
As this option links reads using RNEXT and
PNEXT, it is important that these fields are set accurately. Use
'samtools fixmate' to correct them if necessary.
Note that this option does not work with the -c,
--count; -U, --output-unselected; or -p, --unmap
options.
- -S
- Ignored for compatibility with previous samtools versions. Previously this
option was required if input was in SAM format, but now the correct format
is automatically detected by examining the first few characters of
input.
- -X,
--customized-index
- Supply the location of a customized index file as part of the
arguments, following the input file name. See the EXAMPLES section for
sample usage.
- -z FLAGs,
--sanitize FLAGs
- Perform some sanity checks on the state of SAM record fields, fixing up
common mistakes made by aligners. These include soft-clipping alignments
when they extend beyond the end of the reference, marking records as
unmapped when they have reference * or position 0, and ensuring unmapped
alignments have no CIGAR, no mapping quality, and no MD, NM, CG or SM
tags.
FLAGs is a comma-separated list of keywords chosen from
the following list.
- unmap
- The UNMAPPED BAM flag. This is set for reads with position <= 0,
reference name "*" or reads starting beyond the end of the
reference. Note CIGAR "*" is permitted for mapped data so does
not trigger this.
- pos
- Position and reference name fields. These may be cleared when a sequence
is unmapped due to the coordinates being beyond the end of the reference.
Selecting this may change the sort order of the file, so it is not a part
of the `on' compound keyword.
- mqual
- Mapping quality. This is set to zero for unmapped reads.
- cigar
- Modifies CIGAR fields, either by adding soft-clips for reads that overlap
the end of the reference or by clearing it for unmapped reads.
- cigdup
- Canonicalises CIGAR by collapsing neighbouring elements with identical
opcodes (provided the length field does not extend beyond 28 bits, which is
problematic for BAM). For example, 2M 3M becomes 5M (spaces added for
clarity only).
- cigarx
- Replaces CIGAR "=" and "X" codes with "M".
While "=" and "X" are valid codes, they are not
supported by CRAM so this can aid validation and also improve support by
some third party tools that do not cope with "=" and
"X". Note this implicitly also enables cigdup so 10=1X9=
becomes 10M1M9M which then becomes 20M.
- aux
- For unmapped data, some auxiliary fields are meaningless and will be
removed. These include NM, MD, CG and SM.
- off
- Perform no sanity fixing. This is the default.
- on
- Sanitize data in a way that guarantees the same sort order. This is
everything except for pos as it cannot be checked and cigarx
as it is not erroneous data.
- all
- All sanitizing options except cigarx, including pos. Use
all,cigarx to perform the "=" and "X"
replacement too.
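The combined effect of cigarx and cigdup on a single CIGAR string, as described above, can be sketched with sed and awk. cigar_canon is a hypothetical helper written for this illustration. It ignores the 28-bit length limit and is not the code samtools uses.

```shell
# cigar_canon CIGAR: replace "=" and "X" with "M", then merge neighbouring
# elements that share an opcode (illustrative sketch only).
cigar_canon() {
    printf '%s\n' "$1" | sed 's/[=X]/M/g' | awk '{
        s = $0; out = ""; prev_op = ""; prev_len = 0
        # Consume one <length><opcode> element at a time from the front.
        while (match(s, /^[0-9]+[A-Z=]/)) {
            n  = substr(s, 1, RLENGTH - 1) + 0
            op = substr(s, RLENGTH, 1)
            s  = substr(s, RLENGTH + 1)
            if (op == prev_op) {
                prev_len += n               # same opcode: extend the run
            } else {
                if (prev_op != "") out = out prev_len prev_op
                prev_op = op; prev_len = n  # start a new run
            }
        }
        if (prev_op != "") out = out prev_len prev_op
        print out
    }'
}

cigar_canon '10=1X9='   # prints 20M
```

The intermediate string after the sed step is 10M1M9M, matching the two-stage description given for cigarx.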
- --no-PG
- Do not add a @PG line to the header of the output file.
- o
- Import SAM to BAM when @SQ lines are present in the header:
samtools view -bo aln.bam aln.sam
If @SQ lines are absent:
samtools faidx ref.fa
samtools view -bt ref.fa.fai -o aln.bam aln.sam
where ref.fa.fai is generated automatically by the
faidx command.
- o
- Convert a BAM file to a CRAM file using a local reference sequence.
samtools view -C -T ref.fa -o aln.cram aln.bam
- o
- Convert a BAM file to a CRAM with NM and MD tags stored verbatim rather
than calculating on the fly during CRAM decode, so that mixed data sets
with MD/NM only on some records, or NM calculated using different
definitions of mismatch, can be decoded without change. The second command
demonstrates how to decode such a file. The request to not decode MD here
is turning off auto-generation of both MD and NM; it will still emit the
MD/NM tags on records that had these stored verbatim.
samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram
- o
- An alternative way of achieving the above is listing multiple options
after the --output-fmt or -O option. The commands below are
equivalent to the two above.
samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram
- o
- Supply a customized index file location as part of the arguments.
samtools view [options] -X /data_folder/data.bam /index_folder/data.bai chrM:1-10
- o
- Output alignments in read group grp2 (records with no RG tag
will also be in the output).
samtools view -r grp2 -o /data_folder/data.rg2.bam /data_folder/data.bam
- o
- Only keep reads that have tag BC and whose barcode matches one of
the barcodes listed in the barcode file.
samtools view -D BC:barcodes.txt -o /data_folder/data.barcodes.bam /data_folder/data.bam
- o
- Only keep reads with tag RG and read group grp2. This does
almost the same as -r grp2 but will not keep records without the
RG tag.
samtools view -d RG:grp2 -o /data_folder/data.rg2_only.bam /data_folder/data.bam
- o
- Remove the actions of samtools markdup. Clear the duplicate flag and
remove the dt tag, keep the header.
samtools view -h --remove-flags DUP -x dt -o /data_folder/dat.no_dup_markings.bam /data_folder/data.bam
Written by Heng Li from the Sanger Institute.
samtools(1), samtools-tview(1), sam(5)
Samtools website: <http://www.htslib.org/>