Manual Reference Pages - PRSS (1)
prss - test a protein sequence similarity for significance
prss [-Q -f # -g # -h -O file -s SMATRIX
-w # ]
- interactive mode
prss is used to evaluate the significance of a protein sequence similarity
score by comparing two sequences and calculating optimal similarity
scores, and then repeatedly shuffling the second sequence, and
calculating optimal similarity scores using the Smith-Waterman
algorithm. An extreme value distribution is then fit to the
shuffled-sequence scores. The characteristic parameters of the
extreme value distribution are then used to estimate the probability
that each of the unshuffled sequence scores would be obtained by
chance in one sequence, or in a number of sequences equal to the
number of shuffles. This program is derived from
rdf2 , which was described by Pearson and Lipman, PNAS (1988)
85:2444-2448, and Pearson (Meth. Enz. 183:63-98). Use of the extreme
value distribution for estimating the probabilities of similarity
scores was described by Karlin and Altschul, PNAS (1990) 87:2264-2268.
The z-values calculated by rdf2 are not as informative as the
P-values and expectations calculated by prdf.
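The fit-and-estimate step described above can be sketched in Python. This is only an illustration: it fits the extreme value (Gumbel) distribution by the method of moments, which is an assumption made here for brevity and not necessarily the curve-fitting procedure prss itself uses. All function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location and scale from shuffled-sequence scores.

    Method of moments (an assumption of this sketch):
    scale = sd * sqrt(6) / pi, location = mean - gamma * scale.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    scale = sd * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def tail_probability(score, location, scale):
    """P(S >= score) for a single chance comparison: 1 - exp(-exp(-z))."""
    z = (score - location) / scale
    if z < -700.0:  # exp(-z) would overflow; the probability is effectively 1
        return 1.0
    return -math.expm1(-math.exp(-z))


def probability_in_n(p_single, n):
    """Probability of at least one score this high in n chance comparisons."""
    if p_single >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p_single))
```

With location and scale fitted to the shuffled scores, tail_probability gives the chance of the unshuffled score in one sequence, and probability_in_n with n set to the number of shuffles gives the chance in that many sequences.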
prss calculates optimal scores using the same rigorous Smith-Waterman
algorithm (Smith and Waterman, J. Mol. Biol. (1981) 147:195-197) used by the
prss also allows a more sophisticated shuffling method: residues can be shuffled
within a local window, so that the order of residues 1-10, 11-20, etc,
is destroyed but a residue in the first 10 is never swapped with a residue
outside the first ten, and so on for each local window.
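A minimal Python sketch of the local-window shuffle just described, assuming the sequence is held as a plain string of residue codes. The function name and the optional rng argument are illustrative only and are not part of prss.

```python
import random


def window_shuffle(sequence, window, rng=None):
    """Shuffle residues within consecutive windows of the given size.

    Residues 1-10, 11-20, ... (for window=10) are permuted among
    themselves; no residue ever leaves its own window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if rng is None:
        rng = random.Random()
    pieces = []
    for start in range(0, len(sequence), window):
        block = list(sequence[start:start + window])
        rng.shuffle(block)  # in-place shuffle of this window only
        pieces.append("".join(block))
    return "".join(pieces)
```

A final window shorter than the window size is shuffled on its own, so the overall length and the composition of every window are preserved.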
Run prss in interactive mode. The program will prompt for the file
names of the two sequence files and the number of shuffles to be
used. 100 shuffles are calculated by default; 250-500 shuffles
should provide more accurate probability estimates.
prss -w 10 musplfm.aa lcbo.aa
Compare the amino acid sequence in the file musplfm.aa with that
in lcbo.aa, then shuffle lcbo.aa 100 times using a local shuffle with
a window of 10. Report the significance of the
unshuffled musplfm/lcbo comparison scores with respect to the shuffled
prss musplfm.aa lcbo.aa
Compare the amino acid sequence in the file musplfm.aa with the sequences
in the file lcbo.aa.
prss can be directed to change the scoring matrix, gap penalties, and
shuffle parameters by entering options on the command line (preceded
by a -). All of the options should precede the file names number of
-f #  Penalty for the first residue in a gap (-12 by default).
-g #  Penalty for additional residues in a gap (-2 by default).
-h  Do not display histogram of similarity scores.
-Q  "quiet" - do not prompt for filename.
-O file  Send a copy of the results to "file".
-s SMATRIX  The filename of an alternative scoring matrix file. For protein
sequences, BLOSUM50 is used by default; PAM250 can be used with the
command line option -s 250 (or with -s pam250.mat).
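The options above can be illustrated with a small Python sketch. It makes two assumptions: that a gap of k residues is charged the first-residue penalty once plus the additional-residue penalty k-1 times, as the two penalty descriptions suggest, and that each flag in the synopsis pairs with the option descriptions as listed. The pairing should be checked against the installed version. The helper names are hypothetical, and the second function only assembles an argument list without running prss.

```python
def gap_cost(length, first=-12, additional=-2):
    """Total penalty for a gap of `length` residues (defaults from this page)."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return first + (length - 1) * additional


def build_prss_args(seq1, seq2, *, first_gap=None, additional_gap=None,
                    matrix=None, window=None, output=None,
                    quiet=False, no_histogram=False):
    """Assemble a prss argument list with all options before the file names."""
    args = ["prss"]
    if quiet:
        args.append("-Q")
    if first_gap is not None:
        args += ["-f", str(first_gap)]
    if additional_gap is not None:
        args += ["-g", str(additional_gap)]
    if no_histogram:
        args.append("-h")
    if output is not None:
        args += ["-O", output]
    if matrix is not None:
        args += ["-s", str(matrix)]
    if window is not None:
        args += ["-w", str(window)]
    args += [seq1, seq2]  # file names come last
    return args
```

For example, build_prss_args("musplfm.aa", "lcbo.aa", window=10) reproduces the argument list of the local-shuffle example given earlier on this page.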
ssearch(1), prdf(1), fasta(1), lfasta(1), protcodes(5)
The curve fitting routines in rweibull.c were provided by Phil Green,
Washington U., St. Louis.