 |
|
| |
POOLER(1) |
FreeBSD General Commands Manual |
POOLER(1) |
This program is for geneticists who want to use Multiplex PCR to
study DNA samples, and wish to optimise their combinations of primers to
minimise the formation of dimers. It has been used and cited in oncology,
plant science, climatology, COVID-19 and other research.
Primer Pooler can:
- ○
- Check through each proposed pool for combinations that are likely to form
dimers,
- ○
- Automatically move prospective amplicons between proposed pools to reduce
dimer formation,
- ○
- Automatically search the genome sequence to find which amplicons overlap,
and place their corresponding primers in separate pools,
- ○
- Optionally keep pool sizes within a specified range,
- ○
- Handle thousands of primers without being slow (useful for high-throughput
sequencing applications),
- ○
- Do all of the above with degenerate primers too.
-
If your CPU is modern enough to have them, Primer Pooler will take
advantage of 64-bit registers and multiple cores. But it also runs on older
equipment.
Please note that Primer Pooler does not design primers by
itself. You must choose your primers first, whether by using NCBI's
Primer BLAST
https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi or any
other method of your choice. Once you have your primers,
Primer Pooler can partition them into pools.
The easiest way to run Primer Pooler for first-time users is to
run it interactively. To do this, simply launch the program file
(pooler) and it should ask you a series of questions to take you
through what you want to do.
Questions asked by Primer Pooler when running interactively:
- Would you like to run
interactively? (y/n):
- You should answer y to this question, otherwise Primer Pooler will
merely display the command-line help (see below) and exit.
- Please enter the name of
the primers file to read.
- As the program further explains, it is expecting a text file in
multiple-sequence FASTA format.
Degenerate bases are allowed using the normal letters, and both
upper and lower case is allowed. Names of amplicons' primers should end with
F or R (for Forward and Reverse), and otherwise match.
Optionally include tags to apply to all primers (also called
tailed primers or barcoding) using >tagF and
>tagR (tags can also be changed part-way through the file). If you
also have Taq probes or other primers that don't themselves make amplicons,
you can include these ending with other letters, e.g.
>toySet1-P--any set of names differing in only the last character
will be kept in the same pool, but you must use F for forward and R or B for
reverse (backward) if you also want to check primer-pairs for overlaps in
the genome. If you want to re-use the same primer in two amplicons (for
example, two amplicons that have the same forward primer but differing
reverse primers, to be found on two different genomes), then you should
input the shared primer twice, once for each amplicon, each time
naming it after the corresponding amplicon (e.g. product1-F and
product2-F)--the corresponding sets will then be kept in the same pool.
You can also manually "fix" a primer-set to a
predetermined pool number by using a primer name prefix:
>@2:myPrimer-F fixes myPrimer-F to pool 2 (in which case
Primer Pooler will allocate other primer-sets around these limitations);
this can be useful when you don't have a whole-genome file for overlap
detection.
- Do you want to use deltaG?
(y/n):
- As the program explains, it will need to be told the temperature and
concentration settings if you want it to use deltaG. Temperature is
normally the annealing temperature, about 5C below the "Tm"
melting point of the primers; 45C is typical. Alternatively you can use
the faster and simpler "score" method, but this is less
accurate.
- ○
- If you opt to use Score when your primers and/or tags are very long, you
will be asked if you are really sure you don't want to use deltaG
instead.
- ○
- If you opt for deltaG, the following questions will be asked:
-
- Temperature:
- Enter a number (decimal fractions are allowed). You can enter it in
Celsius, Kelvin, Fahrenheit or Rankine. Do not enter the suffix C or K or
F or R--Primer Pooler will determine for itself which unit was meant, and
ask you to confirm.
- Magnesium
concentration mM/L (0 for no correction):
- Enter your concentration of magnesium in millimoles per litre (decimal
fractions are allowed). Enter 0 if you don't mind the deltaG figures not
being corrected for magnesium concentration.
- Monovalent cation
(e.g. sodium) concentration mM/L:
- Enter your concentration of sodium etc in millimoles per litre (decimal
fractions are allowed). If in doubt, try 50.
- dNTP concentration mM/L (0 for
no correction):
- Enter your concentration of deoxynucleotide (dNTP) in millimoles per litre
(decimal fractions are allowed). If you have been supplied a mixture with
separately-specified concentrations of dATP, dCTP, dGTP and dTTP then sum
these. Enter 0 if you don't mind the deltaG figures not being corrected
for dNTP concentration.
(end of deltaG questions)
- Shall I count how many pairs
have what score/deltaG range? (y/n):
- Answer "y" if you want a fast summary of how many pairs of
primers (in the entire collection, before pooling) have what range of
interaction strengths. This could be used for example to check a pool that
you have already chosen manually, or if you want a rough idea of the
worst-case scenario that pooling aims to avoid.
- ○
- If you answered yes to this question, the summary will be displayed on
screen, and you will be asked if you also want to save it to a file. If
you answer yes to this, you will be asked for a filename.
- ○
- These up-front counts will include self-interactions (a primer interacting
with itself), and interactions between the pair of primers in any given
set. Self-interactions and in-set interactions are not counted when
summarizing the counts of each pool (below).
-
- Do you want to see the highest
bonds of the whole file? (y/n):
- Similar to the above question, this can be useful for checking a manual
selection or for a rough idea. If you answer Yes, you will be asked for a
deltaG or score threshold, and all interactions worse than that threshold
will be displayed on-screen with bonds diagrams.
You will then be asked if you wish to save it to a file, and, if
so, what file name. You will then be asked if you would like to try another
threshold.
- Shall I split this into
pools for you? (y/n):
- Most users will want to say y here, unless you merely wanted to
check a batch of primers that you picked some other way. If you say No,
Primer Pooler will forget about the primers at hand and ask you if you
want to start the program again or exit.
- Shall I check the
amplicons for overlaps in the genome? (y/n):
- If you answer yes to this, Primer Pooler will prompt you for a genome
file, either in .2bit format as supplied by UCSC, or in .fa (FASTA)
format.
To obtain a .2bit file from UCSC:
- 1.
- Go to http://hgdownload.cse.ucsc.edu/downloads.html
- 2.
- Choose a species (e.g. Human)
- 3.
- Choose "Full data set"
- 4.
- Scroll down to the links, and choose the one that ends .2bit (e.g.
hg38.2bit
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit)
-
Primer Pooler will then ask "Do you want me to ignore variant
chromosomes i.e. sequences with _ or - in their names?"
(you'll probably want to answer Yes if you're using hg38.2bit), and will
then ask for a maximum amplicon length (in base pairs): this is the maximum
length of the product--the number does not include the length
of any tag sequences you have added to the primers. Then it will scan
through the genome data to detect where your amplicons start and finish, and
which ones overlap.
- ○
- After the overlap scan is complete, Primer Pooler will then have enough
data to write an input file for MultiPLX if you wish to run that software
as well for comparison. If you decline this, it will ask if you want it to
write a simple text file with the locations of all amplicons, which you
may accept or decline.
- ○
- If you do not opt to check for overlaps in the genome, then Primer
Pooler will not take overlaps into account when generating its
pools. This is rarely useful unless you have already ensured there
are no overlaps in the set of amplicons under consideration. Even then, I
would recommend performing a scan anyway, just to double-check: an early
version found 11 overlaps in a supposedly overlap-free batch drawn up by
an experienced academic--we all make mistakes. But bypassing the overlap
check might be useful if you are sure there are no overlaps and you
don't want to download a very large genome file to the workstation you're
using.
-
- How many pools?
- Enter a number of pools. Before answering this question, you will be given
a "computer suggestion", which is the approximate lowest number
of pools needed to achieve no worse than a deltaG of -7 (or a score of 7)
in each. If you're not sure how many pools, just pick a number and
see. You will be allowed to come back to this question later and try a
different number if you weren't happy with the result.
- Do you want to set a maximum
size of each pool? (y/n):
- As the program explains, setting a maximum size of each pool can make the
pools more even. If you decide to set a maximum, you will be asked to set
the maximum number of primer-sets in each pool. Before answering this
question you will be given a computer suggestion and a lower limit.
You will not be allowed to set the maximum size of each pool lower
than the average size of each pool, since that would make it logically
impossible to fit all primer-sets into all pools. It is not advisable to set
it just above the average either, since being overly strict about the
evenness of the pools could hinder Primer Pooler from finding a solution
with lower dimer formation. You might want to experiment with different
maxima--you will be able to come back to this question and try again.
- Do you want to give me a time
limit? (y/n):
- If you answer y, you will be asked to set a time limit in minutes.
Normally 1 or 2 is enough, although you may wish to let it run a long time
to see if it can find better solutions. You don't have to set a
time limit: you may manually interrupt the pooling process at any time and
have it give the best solution it has found so far, whether a time limit
is in place or not. Additionally, Primer Pooler will stop automatically
when it detects better solutions are unlikely to be found.
- Do you want my
"random" choices to be 100% reproducible for demonstrations?
(y/n):
- If you answer y, Primer Pooler's random choices will be generated in a way
that merely look random but are in fact completely reproducible.
This is useful for demonstration purposes--you'll know how long it will
take to find the solution you want. Otherwise, the random choices will be
less predictable, as a different sequence will be chosen depending on the
exact time at which the pooling was started.
- Pooling display
- While pooling is in progress, Primer Pooler will periodically display a
brief summary of the best solution found so far, showing the pool sizes,
and the counts of interactions (by deltaG range or score) within each
pool. As instructed on screen, you may press Ctrl-C (i.e. hold down Ctrl
while pressing and releasing C, then release Ctrl) to cancel further
exploration and use the best solution found so far.
- Do you want to see the
statistics of each pool? (y/n):
- After the pooling is complete, or after you have interrupted it (by
pressing Ctrl-C as instructed on screen), you will be asked if you wish to
see the interaction counts of each pool (rather than a simple
summary of all pools as appeared during pooling). If you want this,
you will also be asked if you wish to save them to a file, and, if so,
what file name.
- Do you want to see the highest
bonds of these pools? (y/n):
- If you answer Yes, you will be asked for a deltaG or score threshold, and
all interactions worse than that threshold will be displayed on-screen
with bonds diagrams.
You will then be asked if you wish to save it to a file, and, if
so, what file name. You will then be asked if you would like to try another
threshold.
- Shall I write each pool
to a different result file? (y/n):
- If you answer y to this, you will be asked for a prefix, which will
be used to name the individual results files. Otherwise, you will be asked
if you wish to save all results to a single file. If you decline saving
all results to a single file, the results will not be saved at all--this
is for when you weren't happy with the solution and want to go back to try
a different number of pools or a different maximum pool size.
- Do you want to try a different
number of pools? (y/n):
- This question is self-explanatory. You can go back as many times as you
like, trying different numbers of pools. But many researchers have a
pretty good idea of how many pools they want to use, or else are happy
with the computer's initial suggestion.
- Would you like another
go? (y/n):
- If you answered No to trying a different number of pools, or if you didn't
want the program to do pooling at all, then you will be asked if you want
to start the program again. Answering No to this question will exit.
Command-line usage
Besides running interactively (see above), it is also possible to
run Primer Pooler with command-line arguments. This section assumes
familiarity with the concept of running programs from the command line.
The only mandatory argument (if not running interactively)
is a filename for the primers file. This should be a text file in
multiple-sequence FASTA format.
Degenerate bases are allowed using the normal letters, and both
upper and lower case is allowed. Names of amplicons' primers should end with
F or R, and otherwise match. Taq probes etc can end with other letters. If
you want to use the same primer sequence as part of two or more amplicons,
then you may include two or more copies in the input with different names;
they'll be kept in the same pool. Optionally include tags (tails, barcoding)
to apply to all primers: >tagF and >tagR (tags can also be changed
part-way through the file).
Processing options should be placed before this filename. Options
are as follows:
- --help or /help
or /?
- Show a brief help message and exit.
- --counts
- Show score or deltaG-range pair counts for the whole input. deltaG will be
used if the --dg option is set (see below). This option produces a
fast summary of how many primer pairs (in the entire collection, before
pooling) have what range of interaction strengths. This could be used for
example to check a pool that you have already chosen manually, or if you
want a rough idea of the worst-case scenario that pooling aims to
avoid.
- --self-omit
- Causes the --counts option to avoid counting self-interactions(a
primer interacting with itself), and interactions between the pair of
primers in any given set.
- --print-bonds=THRESHOLD
- Similar to --counts, this can be useful for checking a manual
selection or for a rough idea. All interactions worse than the given
threshold (deltaG if --dg is in use, otherwise score) will be
written to standard output, with bonds diagrams.
- --dg[=temperature[,mg[,cation[,dNTP]]]]
- Set this option to use deltaG instead of score. Optional parameters are
the temperature (normally the annealing temperature, about 5C below the
"Tm" melting point of the primers; default 45C), the
concentration of magnesium (default 0), the concentration of monovalent
cation (e.g. sodium, default 50), and the concentration of deoxynucleotide
(dNTP, default 0). Decimal fractions are allowed in all of these.
Temperature is specified in kelvin, and all concentrations are specified
in millimoles per litre.
- --suggest-pools
- Outputs a suggested number of pools. This is the approximate lowest number
of pools needed to achieve no worse than a deltaG of -7 (or a score of 7)
in each.
- --pools[=NUM[,MINS[,PREFIX]]]
- Splits the primers into pools. Optional parameters are the number of pools
(if omitted or set to ? then the suggested number will be
calculated and used), a time limit in minutes, and a prefix for the
filenames of each pool (set this to - to write all to standard
output).
- --max-count=NUM
- Set the maximum number of pairs per pool. This is optional but can make
the pools more even. A maximum lower than the average is not allowed, and
it's usually best to allow a generous margin above the average.
- --genome=PATH
- Check the amplicons for overlaps in the genome, and avoid these overlaps
during pooling. The genome file may be in .2bit format as supplied by
UCSC, or in .fa (FASTA) format.
- --scan-variants
- When searching for amplicons in a genome file, scan variant sequences in
that file too, i.e. sequences with _ and - in their names.
By default such sequences are omitted as they're not normally needed if
using hg38.
- --amp-max=LENGTH
- Sets maximum amplicon length for the overlap check. The default is
220.
- --multiplx=FILE
- Write a MultiPLX input file after the --genome stage, to assist
comparisons with MultiPLX's pooling etc.
- --seedless
- Don't seed the random number generator
- --version
- Just show the program version number and exit.
Defects fixed:
Version 1.0 had important bugs that can affect results:
- 1.
- an error in incremental-update logic sometimes had the effect of
generating suboptimal solutions (in particular, pools could be
unnecessarily empty, and/or full beyond any limit that was set);
- 2.
- an error in the user-interface loop meant that if you use tags, run
interactively, and answer "yes" to the question "Do you
want to try a different number of pools", the second run will
have been done without the tags, and its results will have been de-tagged
twice, removing some bases from the output; moreover, the resulting
truncated versions of your primers will have made it into the interaction
calculations for any third run.
-
These bugs have now been fixed. In addition, Versions 1.1 through
1.13 had a bug related to the first fix, which would cause
interaction-checking for pooling purposes to be performed without
tags when running in interactive mode (command-line mode was not affected).
I therefore recommend re-running in the latest version.
Versions prior to 1.17 also had a display bug: the concentrations
for the deltaG calculation are in millimoles per litre, not nanomoles as
stated on-screen in interactive mode (please ignore the on-screen
instruction and enter millimoles, or upgrade to the latest version which
fixes that instruction). The manual was fixed in version 1.8 (also noting
that it's per litre, not per cubic metre).
Versions prior to 1.34 would round down any decimal fraction you
type when in interactive mode (for deltaG temperature, concentration and
threshold settings). Internal calculation and command-line use was not
affected by this bug.
Versions prior to 1.37 did not ignore whitespace characters after
FASTA labels.
Version 1.8 was briefly released with a regression that could
sometimes result in pairs not being kept in the same pool; this was fixed in
version 1.81.
Version 1.83 fixes a crash that could occur on very large servers
where the number of CPU cores exceeds the number of primers, and version
1.84 fixes messages like pool sizes under unusual circumstances.
Version 1.85 changes the default annealing temperature from 37C to
45C.
Version 1.87 has an important update to maximum pool size
handling. Previous versions accepted pool sizes in primer counts (not
product counts), and incorrectly converted this to product counts in some
cases where some product groups were not of size 2. Plus the user messages
were confusing: this could cause issues for experimenters who wanted to set
the pool size at the lower limit (which is not advisable but supported).
Version 1.87 accepts pool sizes in product counts, and the associated
messages have been revised. Documentation has also been fixed to clarify
that it's the last character (not the last letter) that should be different
in labels of non-standard primer groups. Version 1.88 additionally fixes an
infinite loop that can occur should the user ignore warnings and fill pools
exactly to the maximum.
Notable additions:
Version 1.2 added the MultiPLX output option, and Version 1.33
fixed a bug when MultiPLX output was used with tags and multiple
chromosomes. Version 1.3 added genome reading from FASTA (not just 2bit),
auto-open browser, and suggest number of pools.
Version 1.36 clarified the use of Taq probes, and allowed these to
be in the input file during the overlap check. It's consequently stricter
about the requirement that reverse primers must end with R or
B: previous versions would accept any letter other than F for
these.
Version 1.4 allows tags to be changed part-way through a FASTA
file. For example, if there are two >tagF sequences, the first
>tagF will set the tags for all F primers between the
beginning of the file and the point at which the second >tagF is
given; the second >tagF will set the tags for all F primers
from that point forward. You can change tags as often as you like.
Version 1.5 allows primer sets to be "fixed" to
predetermined pools by specifying these as primer name prefixes, e.g.
>@2:myPrimer-F fixes myPrimer-F to pool 2. (In versions
1.81 through 1.88, the program would not allow you to fix two or more
identical primers to different pools; this was fixed in
1.89.)
Version 1.6 detects and warns about alternative products of
non-unique PCR. It was followed within hours by Version 1.61 which fixed a
regression in the amplicon overlap check. Reporting was improved in version
1.82.
Version 1.7 makes the ignoring of variant sequences in the genome
optional, and warns if primers not being found might be due to variant
sequences having been ignored.
Version 1.72 changes the license to Apache 2.0.
Version 1.8 allows multiple amplicons to share one primer and to
be kept together.
- Base
- The nitrogenous base part of a nucleotide in a DNA sequence, represented
by A, C, G or T. Informally, "base"
can also be used to refer to the entire nucleotide.
- Complement
- What the base binds with. T binds with A and C binds
with G. Complementing a sequence means swapping A for T and C for G
throughout.
- Degenerate
base
- A base we're not sure about because of genetic variation in a population.
We can use extra letters to specify which bases are allowable.
- Primer or
Oligo
- A short string of bases (actually nucleotides) that's used to start
copying from the strand of DNA we're testing. The primer matches up with
the start of a section of DNA we want to copy. There are also extra
structures at the two ends of the primer that set its direction: these are
written as 5' (for the phosphate start) and 3' (for the
hydroxyl end). The actual copying occurs from the complementary
strand, but we can ignore this. Primers are special cases of molecules
called oligonucleotides.
- Degenerate
primer
- A primer that has one or more degenerate bases. In practice, this means we
manufacture separate primers for each combination of allowable bases and
mix them together. So we have to make worst-case assumptions about these
when checking for dimers or overlaps.
- Amplicon
- A section of the DNA we're interested in amplifying (producing copies of).
Primers are designed to copy it.
- Primer set
- Two primers, corresponding to the start and end of an amplicon. They must
be kept in the same pool. Sometimes called a "primer pair", but
this might be confused with the two participants of a dimer (below)
so I think "set" is better. The two primers in a set are called
"forward" and "reverse" primers, but the reverse
primer is not a backward copy of the forward one--if you're reading
my code, you have to be aware of the distinction between backward, which
is just a flipped-over copy of any sequence, and reverse, which is the
second primer of a set. With assistance from an enzyme called polymerase,
the forward primer begins copying from the start of the amplicon, while
the reverse primer begins from the end of the amplicon. Although these
initial copies continue for an indeterminate number of bases (probably not
the whole chromosome, but longer than the region we want), the
second cycle will apply the forward primer to the 'end' section of
what the reverse primer produced, and conversely the reverse primer to the
'start' section of what the forward primer produced, in both cases
resulting in exactly the amplicon we want (which is then reduplicated in
subsequent cycles).
- Negative
strand
- The complement of the normal (positive) sequence in the genome. If a
primer is designed to match the negative strand then you need to
complement it and read it backwards to match the (positive) genome data.
In a set, one of the two primers will be a negative-strand primer,
but the primer file won't tell us which one (it's not necessarily
the "reverse" primer: when a chromosome has a gene on its
negative strand, primers are typically labelled in the other direction so
we'll see the "reverse" primer on the positive strand followed
by the "forward" primer on the negative). You can't put both
primers on the same strand because collisions would occur during
copying.
- Pool or Subpool
or Group or Tube or Primer set combination
(PSC)
- A bunch of primer-sets all drifting around in the same mixture. When that
mixture is added to some of our sample of DNA, the amplicons whose
primer-sets are in that pool are copied (amplified) so we can measure
them. If we can reduce the number of different pools we need, we can
finish the testing more quickly and use up less of the sample, but on the
other hand we want to avoid combinations that overlap or form dimers.
- Overlap
- Two primer-sets that access overlapping sections of the genome. If they
are placed in the same pool, an unwanted shorter amplicon is
produced.
- Dimer
- Two primers stuck to each other. This is bad news because, if they're
stuck to each other, they're not helping us test the sample. But a dimer
is not as bad as an overlap: just because two primers can form a
dimer doesn't mean they will, and the experiment might run anyway
on the fraction of primers that didn't get stuck. But it's better
if each pool can have a combination of primers that tends to produce as
few dimers as possible.
- Score
- A number that gives a rough idea of how likely it is that two primers will
make a dimer. It's just the number of bases that bond, minus the number of
bases that don't, and ignoring any bases that are left dangling off either
end. This is repeated for all positions and the worst case is taken.
- Delta G (dG)
- The change in Gibbs free energy when two primers make a dimer. The more
negative this is, the more likely dimers will form. This thermodynamics
calculation gives better results than score, while being only a
little slower (unless you have ridiculous numbers of degenerate
bases). It does need to know the temperature and amounts of various
chemicals, but if you don't know these, the defaults should still be
reasonable for comparisons.
- Genome
- All the DNA in the cell (most species have hundreds of megabytes at
the very least). We need data about the whole genome to work out which
amplicons will overlap. If some parts are still unknown, we ignore those
and hope for the best.
- Tag or index sequence
or barcode or tail
- A constant set of extra bases added to the beginning (5'--actually
the end on the complimentary strand) of every forward or reverse
primer. This is used for fishing the results out of the pool. If you tell
Primer Pooler what tags you are using, it takes them into account when
checking for dimers, while ignoring them when checking the genome for
amplicon overlaps.
- Efficiency
- The rate at which amplicons are copied, as a fraction of the ideal rate.
Particularly important in quantitative PCR (qPCR) as you need to know the
copy rate for the final counts to be meaningful. Efficiency is improved
with dimer reduction, but it can also depend on manufacturing quality and
equipment quality, so each batch needs to be checked experimentally.
- Massive(ly)
parallel sequencing or next-generation sequencing or
second-generation sequencing or high-throughput sequencing
- Base-by-base reading of thousands of short sections of a genome in
parallel. Less expensive machines in smaller labs typically need the
relevant sections of the genome to be amplified first. If a reference copy
of the genome has already been sequenced and we want to re-sequence
specific sections to check them for alterations, then we can use multiplex
PCR to pull out these sections. This may involve dealing with far more
amplicons than is the case with PCR for detecting or counting genes.
- AutoDimer
- A 2004 program to check a single pool for dimers. AutoDimer was coded in
Visual Basic 6 and its dimer search is several thousand times slower than
Primer Pooler's; re-pooling must be done manually, as must the handling of
degenerate bases.
- Thresholding
- A simple and fast way of grouping primer sets: "don't add a set to a
pool if the interaction badness would exceed some threshold" (usually
dG worse than -7 or amplicon overlap). The total number of pools required
is discovered by the computer, not chosen by the user. Primer Pooler uses
thresholding to suggest a number of pools, but allows the user to
override it for minimisation.
- Minimisation
- Method used by Primer Pooler to group primer sets into a user-specified
number of pools, seeking to minimise the interactions within each
pool.
- MPprimer
- A 2009 GPLd Perl+Python program for finding optimal PSCs by thresholding.
Slower than our C bit-patterns code and cannot cope with degenerate
primers.
- MultiPLX
- A 2004 C++ program for grouping primer-sets by thresholding. No overlap
checking: you are expected to divide the batches yourself and run them
separately. MultiPLX can score on differences between melting
temperatures, and also on unwanted extra interactions between primer and
product-amplicon (which isn't normally a concern when large numbers of
primers are involved); its interaction calculations are slower than ours
and it makes up for this by giving you the option of not checking for
every kind of interaction. Primer Pooler has an option to output
your primers and their products (after genome search) in MultiPLX's input
format if you wish to compare with MultiPLX's scoring.
- Bit patterns
- A computer programming technique that involves writing information about
different items into different binary digits of the same number, loading
that number into the computer's calculation circuitry, and getting it to
do something to all its digits in one operation, thus processing many
items together. This is even more effective on newer CPUs, because their
wider registers can take even more digits at a time. Primer Pooler uses
bit-pattern techniques for its bonding calculations.
- C compiler
- A computer program that takes something written in the C programming
language and converts it into machine code that the CPU can run quickly.
Modern C compilers can be frighteningly good at this, so a
well-written C program can easily outpace what can be done in more
"beginner-friendly" languages. This doesn't usually matter if
you just want to show things on the screen and wait for input, but you
will notice the difference when big calculations are involved.
- C++
- A computer language inspired by C but with many extra features which, if
used well, can make programs easier to manage. In theory, well-written C++
can equal the speed of well-written C. In practice there can be problems
with some C++ compilers. Since I was handling register-level bit patterns
and builtins for specific CPU opcodes, I decided not to risk it and stick
with C even though I could have done it in C++.
- Command line
- A way of interacting with the computer that involves typing commands on
the keyboard and seeing the computer's response written below. It might
not look as nice as a modern graphical desktop, but it can be quite
efficient when you get used to it; moreover, if you're writing in C then
the command line tends to be the easiest interface to write for, freeing
up the programmer to concentrate on the calculation part instead of having
to spend all their time making it look pretty. Sometimes another
programmer who specialises in pretty front-ends will come along later and
add one. (I'm more of a "back-end" than a "front-end"
programmer.)
- CRISPR
- Naturally occuring DNA fragments in unicellular immune systems that have
been repurposed for genetic engineering. Widely hailed as the "next
big thing" after PCR, but doesn't yet replace it in all cases. CRISPR
is more about editing genes like a Unix sed command (you script the
edits but don't see them happen), but it can be modified to create a
visible signal when a cut is made, thereby becoming a sequence-detection
tool for one sequence at a time.
Silas S. Brown, Yun-Wen Chen, Ming Wang, Alexandra Clipson,
Eguzkine Ochoa, and Ming-Qing Du (2017). PrimerPooler: automated primer
pooling to prepare library for targeted sequencing. Biology Methods and
Protocols. Oxford University Press. 2(1). doi:10.1093/biomethods/bpx006
http://doi.org/10.1093/biomethods/bpx006
Primer Pooler is free software, now licensed under the Apache
License, version 2.0. Prior to v1.72 it was licensed under the GNU General
Public License, version 3 or later; the new Apache 2 license is still
GPL-compatible but with added permissions to make it more acceptable in
laboratories with blanket legal policies against GPL'd code.
When developing Primer Pooler, I was aware that Stockholm
University's Professor Erik Lindahl, author of GROMACS, had written to the
Folding@Home project in 2010 to explain the presence of some 400 joke
expansions of "GROMACS" in his code: "our students and
postdocs frequently put in 12 [hour] days and occasional weekends of hard
coding and research work, and then the occasional smile in the middle of
their very serious work can be surprisingly helpful."
Since I was aware of similar circumstances in the local pathology
research group for which Primer Pooler was originally developed, I did place
a little humour into Primer Pooler. This originally included a statement in
the paper that 30,000 years is too long for a research grant, and the
download page had a mildly disparaging pathology-themed reference to
Microsoft's market dominance. The peer reviewers, while sympathetic of my
humorous intentions, requested these to be removed. However, there is still
a little humour in the program itself, revealed if you run interactively and
provide silly answers like setting deltaG to absolute zero or millions of
degrees. There's also an occasional humorous comment in the source code, and
there's something else which I'm rather afraid the biologists won't have
time to figure out although fans of a certain ex-NASA engineer's Web comic
might see it. I have reasonable confidence that these minor jokes are
concealed well enough so as not to be disruptive to the work of anyone not
deliberately looking for a little entertainment.
I've lost track of how many giants I've stood on the shoulders of
for this, but they include:
- ○
- All the scientists who figured out how DNA works and sequenced the human
genome;
- ○
- Martin Richards for his BCPL bit-pattern techniques, which influenced the
way I wrote the fast dimer check;
- ○
- The free/libre and open source software community for their legal
research, a C compiler, editor and debugger;
- ○
- my wife Yun-Wen, who needed this for her cancer-research project, provided
test data and feedback, and put up with all my silly questions.
-
-
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc.
|