bp_mask_by_search - mask sequence(s) based on their alignment results
bp_mask_by_search.pl -f blast genomefile blastfile.bls > maskedgenome.fa
Mask a sequence based on its significant alignments to another sequence. You
need to provide the search report file and the complete sequence data that you
want to mask.
By default this assumes you have run a TBLASTN (or TFASTY) search and tries to
mask the hit sequence, assuming you've provided the sequence file for the hit
database. To do the reverse and mask the query sequence, specify the
-t/--type query flag.
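The choice between masking the hit and masking the query can be pictured as
picking which side of each alignment supplies the sequence name and the
coordinates to mask. The following is a minimal Python sketch of that idea, not
the script's actual code; the Alignment record and the region_to_mask function
are hypothetical names introduced here purely for illustration.

```python
from typing import NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical record for one aligned segment (illustrative fields only)."""
    query_id: str
    query_start: int
    query_end: int
    hit_id: str
    hit_start: int
    hit_end: int


def region_to_mask(aln: Alignment, mask_type: str = "hit") -> Tuple[str, int, int]:
    """Return (sequence id, start, end) for the side selected by mask_type."""
    if mask_type == "hit":
        return aln.hit_id, aln.hit_start, aln.hit_end
    if mask_type == "query":
        return aln.query_id, aln.query_start, aln.query_end
    raise ValueError("mask_type must be 'hit' or 'query'")
```

With the default of 'hit', the sequence file you supply must contain the hit
(database) sequences; with 'query', it must contain the query sequences.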
This reads in the whole sequence file, so for large genomes it may fall over.
I'm using DB_File to avoid keeping everything in memory. Another solution is to
split the genome into pieces (do this BEFORE you run the DB search, though: you
want to use the exact file you BLASTed with as input to this program).
Below, the double-dash (--) options take the form --format=fasta or --format
fasta, or you can just say -f fasta.
A listing such as -f/--format means either spelling is acceptable. The =s, =n
or =c suffix means the option expects a value (a string, number or character).
 -f/--format=s    Search report format (fasta,blast,axt,hmmer,etc)
 -sf/--sformat=s  Sequence format (fasta,genbank,embl,swissprot)
 --hardmask       (boolean) Hard mask the sequence with the maskchar
                  [default is lowercase mask]
 --maskchar=c     Character to mask with [default is N]; change
                  to 'X' for protein sequences
 -e/--evalue=n    E-value cutoff for HSPs and Hits; only mask sequence
                  if the alignment meets the specified E-value
 --outfile=file   Output file to save the masked sequence to
 -t/--type=s      Alignment seq type you want to mask: the 'hit' or
                  the 'query' sequence [default is 'hit']
 --minlen=n       Minimum length of an HSP for it to be used
                  in masking [default 0]
 -h/--help        See this help information
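
How the masking options interact can be shown with a small Python sketch. It is
an illustration of the behaviour described above, not the script's
implementation. The Hsp class and mask_sequence function are hypothetical names,
and the sketch assumes 1-based inclusive coordinates, that an HSP passes when
its E-value is at or below the cutoff, and that length is measured on the masked
sequence.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hsp:
    """Hypothetical stand-in for one HSP (not a BioPerl class)."""
    start: int      # 1-based, inclusive (assumed convention)
    end: int        # 1-based, inclusive (assumed convention)
    evalue: float


def mask_sequence(seq: str, hsps: Iterable[Hsp], hardmask: bool = False,
                  maskchar: str = "N", evalue: Optional[float] = None,
                  minlen: int = 0) -> str:
    """Mask the regions of seq covered by HSPs that pass the filters."""
    chars = list(seq)
    for hsp in hsps:
        # Assumed rule: skip HSPs whose E-value is worse than the cutoff.
        if evalue is not None and hsp.evalue > evalue:
            continue
        # Assumed rule: skip HSPs shorter than minlen residues.
        if hsp.end - hsp.start + 1 < minlen:
            continue
        for i in range(hsp.start - 1, hsp.end):
            # Hard mask replaces the residue; soft mask lowercases it.
            chars[i] = maskchar if hardmask else chars[i].lower()
    return "".join(chars)


hsps = [Hsp(2, 4, 1e-10), Hsp(7, 9, 0.5)]
print(mask_sequence("ACGTACGTAC", hsps))                              # AcgtACgtaC
print(mask_sequence("ACGTACGTAC", hsps, hardmask=True, evalue=1e-5))  # ANNNACGTAC
```

Soft masking keeps the original residues recoverable, while hard masking
discards them, which is why a maskchar such as 'X' is suggested for protein
sequences.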
Jason Stajich, jason-at-bioperl-dot-org.