GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages
AD2VCF(1) FreeBSD General Commands Manual AD2VCF(1)

AD2VCF - Extract allelic depth from a SAM stream and add to VCF files

ad2vcf file.vcf minimum-MAPQ < file.sam
samtools view [flags] file.{bam|cram} | ad2vcf file.vcf minimum-MAPQ

ad2vcf is designed to augment VCF files lacking allelic depth (AD in FORMAT) with allelic depth extracted from a SAM stream for the same sample used to generate the VCF.

It was specifically created for a TOPMed project to augment dbGaP BCF files lacking AD info in preparation for use with haplohseq, which requires either NGS data with AD info or micro-array data.

ad2vcf scans a single-sample VCF file one call at a time and simultaneously scans the SAM stream for the same sample, extracting the alleles from the read sequence for each call position in the VCF.

Output is compressed using xz and saved in file-ad.vcf.xz.

Both files must be sorted by chromosome and position. Otherwise, it would not be possible to perform this task in a single pass. Since a single read in the SAM stream could overlap multiple VCF calls, SAM alignments are cached until the next call in the VCF stream is beyond the end of the read. Usually this results in no more than a few thousand SAM alignments being held in memory at once, but in very rare cases, we've seen hundreds of thousands for a particular sample.

Processing a single-sample human VCF and CRAM file from SRA takes about 15 to 30 minutes on a typical processor, with CRAM decoding being a major bottleneck due to it's complex compression. For this reason, ad2vcf was intentionally designed as a filter, so that it can utilize a second core while samtools view saturates the first core decoding the CRAM file.

If is advisable to filter out questionable alignments before feeding data to ad2vcf. For example, samtools can remove unmapped, secondary, qcfail, dup, and supplementary alignments using --excl-flags 0xF0C. This will greatly reduce the number of buffered alignments in rare cases, preventing runaway memory use to buffer alignments that are probably not useful.

vcf-split, haplohseq

Please report bugs to the author and send patches in unified diff format. (man diff for more information)

J. Bacon

Search for    or go to Top of page |  Section 1 |  Main Index

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with ManDoc.