NAME
fasta - scan a protein or DNA sequence library for similar sequences
tfasta - compare a protein sequence to a DNA sequence library, translating the DNA sequence library 'on-the-fly'
lfasta - compare two protein or DNA sequences for local similarity and show the local sequence alignments
plfasta - compare two sequences for local similarity and plot the local sequence alignments

SYNOPSIS
fasta [-a -A -b # -c # -d # -E # -f # -g # -k # -l file -L FASTLIBS -r STATFILE -m # -o -O file -p # -Q -s SMATRIX -w # -x "# #" -y # -z -1 ] query-sequence-file library-file [ ktup ]
fasta [-QaAbcdEfgHiklmnoOprswxyz] query-file @library-name-file
fasta [-QaAbcdEfgHiklmnoOprswxyz] query-file "%PRMVI"
fasta [-aAbcdEgHlmnoOprswyx] - interactive mode
fastx [-aAbcdEfghHlmnoOprswyx] DNA-query-file protein-library [ ktup ]
tfasta [-aAbcdEfgkmoOprswy3] protein-query-file DNA-library [ ktup ]
tfastx [-abcdEfghHikmoOprswy3] protein-query-file DNA-library [ ktup ]
lfasta [-afgmnpswx] sequence-file-1 sequence-file-2 [ ktup ]
plfasta [-afgkmnpsxv] sequence-file-1 sequence-file-2 [ ktup ]

DESCRIPTION
fasta is used to compare a protein or DNA sequence to all of the entries in a sequence library. For example, fasta can compare a protein sequence to all of the sequences in the NBRF PIR protein sequence database. fasta will automatically decide whether the query sequence is DNA or protein by reading the query sequence as protein and determining whether the 'amino-acid composition' is more than 85% A+C+G+T.

fasta uses an improved version of the rapid sequence comparison algorithm described by Lipman and Pearson (Science, (1985) 227:1427); the improved version is described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA, (1988) 85:2444.

The program can be invoked either with command line arguments or in interactive mode. The optional third argument, ktup, sets the sensitivity and speed of the search.
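
As a rough illustration of the automatic DNA/protein test described above, the following Python sketch applies the 85% A+C+G+T rule to a query string. It is not the fasta source code: the function name `looks_like_dna`, its `threshold` parameter, and the choice to count only alphabetic characters are assumptions made for this example.

```python
def looks_like_dna(sequence: str, threshold: float = 0.85) -> bool:
    """Guess whether a sequence is DNA from its letter composition.

    Hypothetical helper: treats the sequence as DNA when more than
    `threshold` of its letters are A, C, G or T.
    """
    # Keep only letters; blanks, tabs and digits are not counted.
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        return False  # nothing to classify
    acgt = sum(1 for ch in residues if ch in "ACGT")
    return acgt / len(residues) > threshold


if __name__ == "__main__":
    for query in ("ACGTACGTTTGACCA", "MKVLAAGIVGLLLAQPSFA"):
        kind = "DNA" if looks_like_dna(query) else "protein"
        print(query, "->", kind)
```

A short protein that happens to be rich in alanine, cysteine, glycine and threonine could be misclassified by a composition test of this kind.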
If ktup=2, similar regions in the two sequences being compared are found by looking at pairs of aligned residues; if ktup=1, single aligned amino acids are examined. ktup can be set to 2 or 1 for protein sequences, or from 1 to 6 for DNA sequences. If ktup is not specified, the default is 2 for proteins and 6 for DNA.

fasta compares a query sequence to a sequence library, which consists of sequence data interspersed with comments (see below). Normally fasta, fastx, tfasta, and tfastx search the libraries listed in the file pointed to by the environment variable FASTLIBS. The format of this file is described in the file FASTA.DOC.

tfasta compares a protein sequence to a DNA sequence database, translating the DNA sequence library in 6 frames 'on-the-fly' (3 frames with the -3 option). The search uses the standard BLOSUM50 scoring matrix and ktup=2 by default. tfasta searches a DNA sequence database in the standard text format described below.

tfastx, like tfasta, compares a protein sequence to a DNA sequence library. However, tfastx compares the protein sequence to the forward and reverse three-frame translations of the DNA library sequence, allowing for frameshifts.

fastx compares a DNA sequence to a protein sequence database, translating the DNA sequence in three frames and allowing frameshifts in the alignment.

The lfasta and plfasta programs compare two sequences, looking for local sequence similarities. While fasta, fastx, and tfasta report only the best alignment between the query sequence and the library sequence, lfasta and plfasta report all of the alignments between the two sequences with scores greater than a cut-off value. lfasta shows the actual local alignments between the two sequences and their scores, while plfasta produces a plot of the alignments that looks similar to a 'dot-matrix' homology plot. On Unix™ systems, plfasta generates PostScript output.

The fasta programs use a standard text format sequence file.
Lines beginning with '>' or ';' are considered comments and ignored; sequences can be upper or lower case; blanks, tabs and unrecognizable characters are ignored. fasta expects sequences to use the single letter amino acid codes; see protcodes(1). Library files for fasta should have the form shown below.

OPTIONS
fasta and the other programs can be directed to change the scoring matrix, search parameters, output format, and default search directories by entering options on the command line (preceded by a '-', or by a '/' for MS-DOS). All of the options should precede the file name and ktup arguments. Alternatively, these options can be changed by setting environment variables. The options and environment variables are:
EXAMPLES
Compare the amino acid sequence in the file musplfm.aa with the complete PIR protein sequence library using ktup = 2.

Each "library" sequence (there need only be one) should start with a comment line that begins with a '>'.
Compare the amino acid sequence in the file musplfm.aa with the sequences in the file lcbo.aa using ktup = 1. Show both sequences in their entirety, with 80 residues on each output line.
Run the fasta program in interactive mode. The program will prompt for the file name for the query sequence, list alternative libraries to be searched (if FASTLIBS is set), and prompt for the ktup.

FILES
This version of fasta prompts for the library file to be searched from a list of file names that are saved in the file pointed to by the environment variable FASTLIBS. If FASTLIBS = fastgb.list, then the file fastgb.list might have the entries:
    NBRF Protein$0P/u/lib/aabank.lib 0
    GB Primate$1P@/u/lib/gpri.nam
    GB Rodent$1R@/u/lib/grod.nam
    GB Mammal$1M@/u/lib/gmammal.nam

Each line in this file has 4 fields: (1) the library name, separated from the remaining fields by a '$'; (2) a 0 or a 1, indicating a protein or DNA library respectively; (3) a single letter that will be used to choose the library; (4) the location of the library file itself. The library file name can contain an optional library format specifier. fasta recognizes the following library formats: 0 - Pearson/FASTA; 1 - Genbank flat file; 2 - NBRF/PIR Codata; 3 - EMBL/SWISS-PROT; 4 - Intelligenetics; 5 - NBRF/PIR VMS. Note that this fourth field can contain an '@' character, which indicates that the library file is an indirect library file containing a list of library files, one per line. An indirect library file might have the lines:
    </usr/slib/genbank    (the directory for the library files)
    gbpri.seq 1
    gbrod.seq 1
    gbmam.seq 1
    ...
    gbvrl.seq 1
    ...

You can use your own sequence files for fasta; just be certain to put a '>' and comment as the first line before the sequence. Only one library file type, the standard NBRF library format, is supported by the VAX/VMS programs. lfasta and plfasta do not require the '>' and comment line; fasta does.

SEE ALSO
rdf2(1), protcodes(5), dnacodes(5)

AUTHOR
Bill Pearson
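
The four-field FASTLIBS line format described under FILES can be sketched in Python as follows. This is an illustrative parser, not code from the fasta distribution: the names `LibraryEntry` and `parse_fastlibs_line` are hypothetical, and the assumption that the optional format specifier is a number separated from the file name by a space is inferred only from the sample entries above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibraryEntry:
    name: str           # field 1: library name shown to the user
    is_dna: bool        # field 2: '0' = protein, '1' = DNA
    key: str            # field 3: single letter used to choose the library
    path: str           # field 4: library file (or indirect file) location
    indirect: bool      # True when field 4 starts with '@'
    fmt: Optional[int]  # optional library format number, if present


def parse_fastlibs_line(line: str) -> LibraryEntry:
    """Split one FASTLIBS entry into its four fields (hypothetical helper)."""
    name, rest = line.rstrip("\n").split("$", 1)
    type_flag, key, location = rest[0], rest[1], rest[2:].strip()
    if type_flag not in ("0", "1"):
        raise ValueError(f"expected 0 or 1 after '$', got {type_flag!r}")
    indirect = location.startswith("@")
    if indirect:
        location = location[1:]
    # Assumption: a trailing number after a space is the format specifier.
    path, _, spec = location.partition(" ")
    fmt = int(spec) if spec.strip().isdigit() else None
    return LibraryEntry(name, type_flag == "1", key, path, indirect, fmt)


if __name__ == "__main__":
    for entry_line in ("NBRF Protein$0P/u/lib/aabank.lib 0",
                       "GB Primate$1P@/u/lib/gpri.nam"):
        print(parse_fastlibs_line(entry_line))
```

Reading the files named inside an indirect ('@') library file is left out of this sketch.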