GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages
fastq-trim(1) FreeBSD General Commands Manual fastq-trim(1)

fastq-trim - Trim adapters and low-quality bases from FASTQ files

fastq-trim [--help]
fastq-trim [--verbose] [--exact-match]
    [--3p-adapter1 seq] [--3p-adapter2 seq]
    [--min-match N] [--max-mismatch-percent N]
    [--min-qual N] [--min-length N] [--polya-min-length N]
    [--phred-base N]
    [infile1.fastq[.gz|.bz2|.xz]] [outfile1.fastq[.gz|.bz2|.xz]]
    [infile2.fastq[.gz|.bz2|.xz] outfile2.fastq[.gz|.bz2|.xz]]

--help
Print a summary of usage and exit.

--verbose
Print some intermediate results during trimming for debugging purposes.

--exact-match
Use exact matching to find adapters. Adapters will be matched if a segment matches the entire adapter sequence, or a portion of it at the end of the read matches to a minimum of min_match (default=3, controlled by --min-match) characters. The default matching algorithm used when --exact-match is not specified is described below.

--3p-adapter1 seq
Specify the 3' adapter for single-end mode and read1 in paired-end mode. Default is the Illumina universal adapter, AGATCGGAAGAG. Use fastq-scum(1) to identify other standard adapters in the input.

--3p-adapter1 seq
Specify the 3' adapter for read2 in paired-end mode. Default is the Illumina universal adapter, AGATCGGAAGAG. Use fastq-scum(1) to identify other standard adapters in the input.

--min-match N
Minimum number of bases in the read that must match the adapter in order to report a match. This applies to both exact matching and smart matching. Default is 3.

--max-mismatch-percent N
Maximum percentage of bases in the read can differ from the adapter and still consider it a match. This applies only to smart matching. Increasing the value from the default 10% slows down processing slightly, reduces the number of missed adapters, and increases the risk of removing real data resembling adapters.

--min-qual N
Minimum quality of bases to keep for end-trimming. Default is 20.

--min-length N
Minimum length of reads to keep after trimming. Default is 30.

--polya-min-length N
Minimum length of poly-A tails to be removed (after other trimming). Default is 0, meaning no poly-A trimming is done.

--phred-base N
Offset used for characters in quality string. Default is 33, which should be the correct value for virtually all modern sequence data.

File Arguments:
Fastq-trim optionally accepts 1, 2, or 4 filenames.

If no filenames are provided, single-read mode is selected with input read from the standard input and output written to the standard output.

If one filename is provided, single-read mode is selected with input read from the given filename and output is written to the standard output.

If two filenames are provided, single-read mode is selected with input read from the first filename and output written to the second.

If four filenames are provided, paired mode is selected. The first filename is the forward read input, the second the forward read output, the third the reverse read input and the fourth the reverse read output.

Fastq-trim removes adapters and low-quality bases from the ends of each read in a FASTQ file. Reads with a length less than the specified or default minimum are not output.

Note that adapter matching (A.K.A. alignment) is not an exact science. Most bioinformatics data contain errors and hence the processing must be probabilistic. It is possible that an adapter sequence occurs naturally in a given sample. The longer the adapter, the less often this will occur, which is why adapters are typically 12 or more bases long. Also, read errors can occur in adapters as well as in the insert (the real DNA/RNA sequence between the adapters). This is rare and using an exact match algorithm like memcmp(3) will generally find more than 99% of adapters.

Tolerating some slop in adapter matching will result in fewer adapters left in the data and a higher risk of false positives (removing natural sequences resembling adapters). Neither situation is catastrophic. If a fraction of a percent of reads will not align to a genome properly because of adapter contamination, the end results of the downstream analysis won't tell a different story. The same is true of reads that were shortened by removing a falsely identified adapter.

The default tolerance is 10% of the adapter length (or the remaining bases at the end of the read if that's shorter). This can be increased using --max-mismatch_percent.

The default "smart" adapter matching allows for roughly 10% of the bases to be substituted. No insertions or deletions are currently handled for adapter matching.

Exact matching can be selected using --exact-match. In either case, a minimum of min_match (default 3, controlled by --min-match) bases must be matched in order to decide that the sequence is indeed an adapter.

Low-quality 3' end removal uses the same algorithm as BWA and Cutadapt, namely scanning backward from the 3' end until the sum of (score - min-qual) becomes > 0, then trimming at the location of the minimum sum. Quality trimming is done before adapter matching so that likely misread bases do not contribute to the adapter match scoring.

Input and output files may be compressed using gzip(1), bzip2(1), or xz(1). Support for this is provided by xt_fopen(3), which automatically determines the file type from the filename extension and pipes input or output as needed. Compression level of output files and other options can be be passed to gzip(1), bzip2(1), and xz(1) via the environment variables GZIP, BZIP2, BZIP, and XZ_OPT. This is often useful for adjusting the compression level of output files in order to tune performance.

GZIP, BZIP2, BZIP, XZ_OPT: Fine-tune output compression.

fastq-scum(1), biolibc(3)

Please report bugs to the author and send patches in unified diff format. (man diff for more information)

J. Bacon

Search for    or go to Top of page |  Section 1 |  Main Index

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with ManDoc.