Chemistry::FormulaPattern(3) User Contributed Perl Documentation Chemistry::FormulaPattern(3)

Chemistry::FormulaPattern - Match molecule by formula

    use Chemistry::FormulaPattern;

    # somehow get a bunch of molecules...
    use Chemistry::File::SDF;
    my @mols = Chemistry::Mol->read("file.sdf");

    # we want molecules with six carbons and 8 or more hydrogens
    my $patt = Chemistry::FormulaPattern->new("C6H8-");

    for my $mol (@mols) {
        if ($patt->match($mol)) {
            print $mol->name, " has a nice formula!\n";
        }
    }

    # a concise way of selecting molecules with grep
    my @matches = grep { $patt->match($_) } @mols;

This module implements a simple language for describing a range of molecular formulas and allows one to find out whether a molecule matches the formula specification. It can be used for searching for molecules by formula, in a way similar to the NIST WebBook formula search (<http://webbook.nist.gov/chemistry/form-ser.html>). Note however that the language used by this module is different from the one used by the WebBook!

Chemistry::FormulaPattern shares the same interface as Chemistry::Pattern. To perform a pattern matching operation on a molecule, follow these steps.

1) Create a pattern object, by parsing a string. Let's assume that the pattern object is stored in $patt and that the molecule is $mol.

2) Execute the pattern on the molecule by calling $patt->match($mol).

If $patt->match returns true, there was a match. Calling $patt->match repeatedly with the same molecule alternates: the first call returns true (if there is a match), the second returns false, the third returns true again, and so on. This is because the Chemistry::Pattern interface is designed to allow multiple matches for a given molecule and returns false once there are no further matches; a formula pattern has only one possible match.

    $patt->match($mol); # may return true
    $patt->match($mol); # always false
    $patt->match($mol); # may return true
    $patt->match($mol); # always false
    # ...

This allows one to use the standard while loop for all kinds of patterns without having to worry about endless loops:

    # $patt might be a Chemistry::Pattern, Chemistry::FormulaPattern,
    # or Chemistry::MidasPattern object
    while ($patt->match($mol)) {
        # do something
    }
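The alternating return value can be imitated with a few lines of plain Perl. The following is a minimal sketch, not the module's actual code: the OneShotPattern class, its fields, and the use of a hash reference as a stand-in molecule are all hypothetical and exist only to show why the while loop terminates.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a formula pattern; NOT the module's code.
package OneShotPattern;

sub new {
    my ($class, $test) = @_;
    # $test is a code ref that decides whether a molecule matches at all
    return bless { test => $test, reported => 0 }, $class;
}

sub match {
    my ($self, $mol) = @_;
    if ($self->{reported}) {
        # The single possible match was already handed out:
        # reset and report "no further matches".
        $self->{reported} = 0;
        return 0;
    }
    return 0 unless $self->{test}->($mol);
    $self->{reported} = 1;
    return 1;
}

package main;

# The "molecule" here is just a hash of element counts (an assumption).
my $patt = OneShotPattern->new(sub { $_[0]{C} == 6 });
my $mol  = { C => 6, H => 6 };

my $iterations = 0;
while ($patt->match($mol)) {
    $iterations++;    # the body runs once, then match() returns false
}
print "loop body ran $iterations time(s)\n";
```

Because the second call resets the internal flag, the same pattern object can be reused on the next molecule without any explicit reset.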

Also note that formula patterns don't really have the concept of an atom map, so $patt->atom_map and $patt->bond_map always return the empty list.

In the simplest case, a formula pattern may be just a regular formula, as used by the Chemistry::File::Formula module. For example, the pattern "C6H6" will only match molecules with six carbons, six hydrogens, and no other atoms.

The interesting thing is that one can also specify ranges for the elements, as two hyphen-separated numbers. "C6H8-10" will match molecules with six carbons and eight to ten hydrogens.

Ranges may also be open, by omitting the upper part of the range. "C6H0-" will match molecules with six carbons and any number of hydrogens (i.e., zero or more).

A formula pattern may also allow for unspecified elements by means of the asterisk special character, which can be placed anywhere in the formula pattern. For example, "C2H6*" (or "C2*H6", etc.) will match C2H6, and also C2H6O, C2H6S, C2H6SO, etc.
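The rules for plain counts, closed ranges, open ranges, and the asterisk can be illustrated without the module. The following is a rough sketch under stated assumptions: parse_pattern and formula_matches are hypothetical helpers, molecules are represented as plain hashes of element counts, and parentheses and error handling are left out. It is not how Chemistry::FormulaPattern is implemented.

```perl
use strict;
use warnings;

# Parse a flat pattern such as "C6H8-10*" into per-element [min, max]
# ranges (max is undef for an open range) plus a flag for the asterisk.
# Hypothetical helper for illustration only.
sub parse_pattern {
    my ($text) = @_;
    my %range;
    my $allow_others = ($text =~ s/\*//g) ? 1 : 0;
    while ($text =~ /([A-Z][a-z]?)(\d*)(-?)(\d*)/g) {
        my ($elem, $low, $dash, $high) = ($1, $2, $3, $4);
        $low = 1 if $low eq '';              # "C" means "C1"
        my $max = !$dash      ? $low         # plain count, e.g. "C6"
                : $high eq '' ? undef        # open range, e.g. "H8-"
                :               $high;       # closed range, e.g. "H8-10"
        $range{$elem} = [ $low, $max ];
    }
    return (\%range, $allow_others);
}

# $counts is a hash ref of element => atom count for one molecule.
sub formula_matches {
    my ($pattern, $counts) = @_;
    my ($range, $allow_others) = parse_pattern($pattern);
    for my $elem (keys %$range) {
        my $n = $counts->{$elem} // 0;
        my ($min, $max) = @{ $range->{$elem} };
        return 0 if $n < $min;
        return 0 if defined $max && $n > $max;
    }
    unless ($allow_others) {
        # Without an asterisk, elements not named in the pattern are forbidden.
        for my $elem (keys %$counts) {
            return 0 if $counts->{$elem} > 0 && !exists $range->{$elem};
        }
    }
    return 1;
}

my %benzene = (C => 6, H => 6);
my %ethanol = (C => 2, H => 6, O => 1);

for my $case (
    [ "C6H6",    \%benzene ],    # exact formula
    [ "C6H8-10", \%benzene ],    # too few hydrogens for the range
    [ "C6H0-",   \%benzene ],    # open range
    [ "C2H6",    \%ethanol ],    # oxygen not allowed
    [ "C2H6*",   \%ethanol ],    # asterisk allows other elements
) {
    my ($pattern, $counts) = @$case;
    printf "%-8s %s\n", $pattern,
        formula_matches($pattern, $counts) ? "match" : "no match";
}
```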

Ranges can also be used after a subformula in parentheses: "(CH2)1-2" will match molecules with one or two carbons and two to four hydrogens. Note, however, that the "structure" of the parenthesized part of the formula is forgotten, i.e., the multiplier applies to each element individually, and the effective multiplier for an element does not have to be an integer. That is, the above pattern will match CH2, CH3, CH4, C2H2, C2H3, and C2H4.
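The arithmetic behind "(CH2)1-2" can be worked through in a few lines of Perl. This is only a sketch: expand_group is a hypothetical helper, not part of the module, and it assumes the subformula has already been reduced to a hash of element counts.

```perl
use strict;
use warnings;

# Multiply each element count inside a parenthesized subformula by the
# lower and upper bound of the range that follows it.
# Hypothetical helper; the module's own parser is not shown here.
sub expand_group {
    my ($counts, $low, $high) = @_;    # e.g. ({ C => 1, H => 2 }, 1, 2)
    my %range;
    for my $elem (keys %$counts) {
        $range{$elem} = [ $counts->{$elem} * $low, $counts->{$elem} * $high ];
    }
    return \%range;
}

# "(CH2)1-2" becomes C: 1 to 2, H: 2 to 4
my $range = expand_group({ C => 1, H => 2 }, 1, 2);

# Enumerate every C/H combination inside the per-element ranges.
my @formulas;
for my $c ($range->{C}[0] .. $range->{C}[1]) {
    for my $h ($range->{H}[0] .. $range->{H}[1]) {
        push @formulas, ($c == 1 ? "C" : "C$c") . "H$h";
    }
}
print join(", ", @formulas), "\n";
# prints: CH2, CH3, CH4, C2H2, C2H3, C2H4
```

Since each element gets its own independent range, combinations such as CH3 or C2H3 fall inside the ranges even though they are not whole multiples of CH2.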

Version 0.10

See also: Chemistry::Pattern; the PerlMol website <http://www.perlmol.org/>

Author: Ivan Tubert-Brohman <itub@cpan.org>

Copyright (c) 2004 Ivan Tubert-Brohman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
2004-08-11 perl v5.32.1
