SYNOPSIS

blt fastq-derep.sh file.fastq|fq[.xz|.bz2|.gz]

ARGUMENTS

filename: FASTQ file, optionally compressed with xz, bzip2, or gzip

DESCRIPTION

blt fastq-derep.sh removes replicates from a FASTQ file. It uses fastq2tsv to reformat the input as tab-separated data for easier sorting, then uses Unix sort and an awk script to remove adjacent entries with the same sequence (column 2 of the TSV).

Per the latest benchmarks, seqkit rmdup --by-sequence and our own C version, blt fastq-derep, are about 3x as fast. However, blt fastq-derep.sh does not require the entire file to be in memory, because it uses the Unix sort command, which automatically breaks large files into chunks for later merging.

EXAMPLES

blt fastq-derep.sh file.fastq.xz

SEE ALSO

blt-fastx2tsv(1), blt-fastx-derep(1)

AUTHOR

J. Bacon
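
NOTES

The sort-then-filter approach described under DESCRIPTION can be illustrated with standard Unix tools alone. The following is a minimal sketch, not the actual script: the function name derep_sketch is hypothetical, paste stands in for fastq2tsv, and the input is assumed to be uncompressed FASTQ with exactly four lines per record. Which replicate of a given sequence survives depends on the sort order.

```shell
#!/bin/sh
# Hypothetical helper, not part of biolibc-tools.
# Assumes uncompressed FASTQ on stdin with exactly 4 lines per record.
derep_sketch() {
    tab=$(printf '\t')
    # 1. paste joins each 4-line record into one tab-separated row
    #    (stand-in for fastq2tsv; column 2 is the sequence).
    # 2. sort orders rows by column 2 so identical sequences become adjacent;
    #    sort spills to temporary files when the input exceeds memory.
    # 3. awk prints only the first row of each run of equal sequences.
    # 4. tr turns the surviving rows back into 4-line FASTQ records.
    paste - - - - |
    sort -t "$tab" -k 2,2 |
    awk -F '\t' 'NR == 1 || $2 != prev { print } { prev = $2 }' |
    tr '\t' '\n'
}

# Three records, two of which share the sequence ACGT.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' | derep_sketch
```

Because only adjacent rows are compared, the awk stage holds just one sequence in memory at a time; the sort stage carries the cost of grouping replicates together.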