NAME

No concept is valid in Perl if it cannot be expressed in a one-liner. For the Bio::Phylo package, some operations can be performed using a single expression from the command line. Here are some examples.

Solution 1: downloading and converting Tree of Life data

 perl -MBio::Phylo::IO=parse -e 'print parse->to_nexus' format tolweb as_project 1 url "$URL"

Assuming that the environment variable $URL has been set to point to a node in the XML web service of the Tree of Life (<http://tolweb.org>), this command will download the output, parse it, and print the parsed output as nexus. As an example, using this url:

 http://tolweb.org/onlinecontributors/app?service=external&page=xml/TreeStructureService&node_id=133799

Something like the following output would be produced:

 #NEXUS
 BEGIN TAXA;
 [! Taxa block written by Bio::Phylo::Taxa 0.31_1520 on Thu Nov 25 20:49:54 2010 ]      
         DIMENSIONS NTAX=3;
         TAXLABELS
                        'Bembidion alaskense'
                        'Bembidion argenteolum'
                        'Bembidion semenovi'
                 ;
 END;
 BEGIN TREES;
 [! Trees block written by Bio::Phylo::Forest 0.31_1520 on Thu Nov 25 20:49:54 2010 ]
        TRANSLATE
                1 'Bembidion alaskense',
                2 'Bembidion argenteolum',
                3 'Bembidion semenovi';
        TREE Tree2 = [&R] (((2,3),1));
 END;

So what is happening here? Firstly, we provide the "-MBio::Phylo::IO" switch, to which we add "=parse", which means we import the "parse" function from Bio::Phylo::IO. This function is supplied with named arguments, which can also be provided on the command line, i.e. as part of the @ARGV array.

Secondly, we use the "-e 'print parse->to_nexus'" switch. Here we tell perl to execute the parse function, transform its return value to nexus, and print that to STDOUT.

Following that, we provide the named command line arguments. "format tolweb" specifies that the input for the parse function is in the Tree of Life XML format. "as_project 1" specifies that the parse function should return its contents as a newly created Bio::Phylo::Project object. "url $URL" specifies the data source to parse; in this case the data source lives at $URL. Other possible options for a data source are "file" with a file name, "string" with a string of phylogenetic data in some recognized format, or "handle" with an open file handle.

(This example requires the otherwise optional modules LWP::UserAgent and XML::Twig to be installed on your system.)
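Inside a script the same arguments are normally written with a leading dash (for instance "-format => 'tolweb'"). As a rough picture of how dash-less shell words could be turned into such named arguments, here is a minimal sketch. The helper "argv_to_named" is hypothetical and written only for this illustration; it is not part of Bio::Phylo and makes no claim about how Bio::Phylo::IO handles @ARGV internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo: pair up dash-less
# key/value words and return them as dashed named arguments.
sub argv_to_named {
    my @words = @_;
    die "expected key/value pairs\n" if @words % 2;
    my @named;
    while ( my ( $key, $value ) = splice @words, 0, 2 ) {
        push @named, "-$key" => $value;
    }
    return @named;
}

# In a real one-liner the words would come from @ARGV.
my %args = argv_to_named( qw(format newick string), '((A,B),C);' );
print "$_ => $args{$_}\n" for sort keys %args;
# prints:
# -format => newick
# -string => ((A,B),C);
```

The sketch dies on an odd number of words, because a key without a value cannot form a named argument.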

Solution 2: calculating tree balance

 perl -MBio::Phylo::IO=parse -e \
 'print parse(-format=>"newick",-string=>"((A,B),C);")->first->calc_imbalance'

The "-MModule" switch is the equivalent of using "use Module;" in a script. Here we use the Bio::Phylo::IO module, which is Bio::Phylo's entry point into file parsing and file writing.

The "-e" switch is used to evaluate the subsequent expression. We parse a string, "((A,B),C);", of format "newick". The parser returns a Bio::Phylo::Forest object (i.e. a set of trees, in this case a set of one). From this set we retrieve the first (and only) tree and calculate Colless' imbalance, a number that we print to standard output.

This would print "1", because the tree is a ladder, and therefore completely unbalanced. Note how this example uses the standard interface for Bio::Phylo::IO as you would normally use it in code you write in a script or a module. As the arguments to the "parse" function can also be supplied in @ARGV (useful for one-liners or other processes that launch shell commands) the example can be rewritten as:

 perl -MBio::Phylo::IO=parse -e 'print parse()->first->calc_imbalance' \
 format newick string "((A,B),C);"

In this alternative invocation, note how the arguments to the parse call are now outside of the '...command...' quotes, making them "shell words". These must not be preceded by dashes: among other reasons, perl would try to interpret a word with a leading dash as one of its own command line switches.
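To make the imbalance value printed in this section concrete, here is a small self-contained sketch of the normalised Colless index on a tree written as nested array references. The subroutine names "count_tips" and "colless_imbalance" are made up for this illustration. The normalisation by (n-1)(n-2)/2 is the textbook form and is assumed here, not taken from Bio::Phylo's source.

```perl
use strict;
use warnings;

# Returns the number of tips below $node and adds |left - right|
# to $$sum_ref for every internal (bifurcating) node it visits.
sub count_tips {
    my ( $node, $sum_ref ) = @_;
    return 1 unless ref $node;    # a tip is a plain string
    my ( $left, $right ) = map { count_tips( $_, $sum_ref ) } @{$node};
    $$sum_ref += abs( $left - $right );
    return $left + $right;
}

# Sum of the tip-count differences, divided by its maximum for n tips.
sub colless_imbalance {
    my ($tree) = @_;
    my $sum = 0;
    my $n = count_tips( $tree, \$sum );
    die "need at least 3 tips\n" if $n < 3;
    return $sum / ( ( $n - 1 ) * ( $n - 2 ) / 2 );
}

# ((A,B),C); written as nested array references
print colless_imbalance( [ [ 'A', 'B' ], 'C' ] ), "\n";    # prints 1
```

For the ladder ((A,B),C) the only non-zero difference is at the root (2 tips against 1), and the maximum for three tips is also 1, hence the result of 1. A balanced four-tip tree such as ((A,B),(C,D)) gives 0 in this sketch.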

Sets of trees

You want a one-liner to iterate over a set of trees:

 perl -MBio::Phylo::IO=parse -lne \
 'print parse(-format=>"newick",-string=>$_)->first->calc_i2' <file>

The "-n" switch wraps a "while(<>) { ... }" loop around the program, so the trees in the file (provided there is one newick tree description per line) are copied into $_ one tree at a time. The "-l" switch strips the line ending from each input line and appends a line break to the printed output.
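Written out as a script, the two switches amount to roughly the following. This is a sketch of the equivalent long form; it uses only the Bio::Phylo calls shown in the one-liner and expects the tree file name on the command line or the trees on standard input.

```perl
use strict;
use warnings;
use Bio::Phylo::IO qw(parse);

$\ = "\n";    # -l, output side: append a line break to every print

while ( my $line = <> ) {    # -n: loop over the input, one line at a time
    chomp $line;             # -l, input side: strip the line ending
    print parse( -format => 'newick', -string => $line )->first->calc_i2;
}
```

The one-liner uses $_ where this version uses the named variable $line; the behaviour is otherwise intended to be the same.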

Stringifying trees

You don't want a number printed to "STDOUT", you want a tree:

 perl -MBio::Phylo::IO=parse -e \
 'print parse(-format=>"newick",-string=>"((A,B),C);")->first->to_newick'

If you try to print a tree object, what's written is something like "Bio::Phylo::Forest::Tree=SCALAR(0x1a337dc)" (that is, the memory address of the object reference). This is probably not what you want, so the tree object has a "to_newick" method that stringifies the tree to a newick string. Likewise, matrices, taxa and tree blocks can write a NEXUS block using "to_nexus", and all of them can also be written to NeXML (<http://www.nexml.org>) using "to_xml" and to a JSON mapping thereof using "to_json".

Input and output

The Bio::Phylo::IO module is the unified front end for parsing and unparsing phylogenetic data objects. It is a non-OO module that optionally exports the "parse" and "unparse" subroutines into the caller's namespace, using the "use Bio::Phylo::IO qw(parse unparse);" directive. Alternatively, you can call the subroutines as class methods. The "parse" and "unparse" subroutines load and dispatch the appropriate sub-modules at runtime, depending on the "-format" argument.
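The two calling styles can be put side by side as follows. This is a minimal sketch that assumes Bio::Phylo is installed and uses only calls that appear elsewhere in this document.

```perl
use strict;
use warnings;
use Bio::Phylo::IO qw(parse);

my $newick = '((A,B),C);';

# style 1: call the imported subroutine directly
my $forest = parse( -format => 'newick', -string => $newick );

# style 2: call it as a class method, no import needed
my $same = Bio::Phylo::IO->parse( -format => 'newick', -string => $newick );

# both calls are expected to return a forest holding one tree
print ref $forest, "\n";
print ref $same,   "\n";
print $forest->first->to_newick, "\n";
```

The class method form avoids adding names to your own namespace, which can matter in larger modules; the imported form is shorter and suits one-liners.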

Parsing trees

You want to create a Bio::Phylo::Forest::Tree object from a newick string.

 use Bio::Phylo::IO;
 # get a newick string from some source
 my $tree_string = '(((A,B),C),D);';
 # Call class method parse from Bio::Phylo::IO
 my $tree = Bio::Phylo::IO->parse(
    -string => $tree_string,
    -format => 'newick'
 )->first;
 # note: newick parser returns 'Bio::Phylo::Forest'
 # Call ->first to retrieve the first tree of the forest.
 print ref $tree, "\n"; # prints 'Bio::Phylo::Forest::Tree'

The Bio::Phylo::IO module invokes format specific parser and unparser modules. It is Bio::Phylo's front door for data input and output from files, raw strings and file handles.

In the solution, the IO module calls the Bio::Phylo::Parsers::Newick parser, which turns a tree description into a Bio::Phylo::Forest object. (Several other parser and unparser modules live in the Bio::Phylo::Parsers::* and Bio::Phylo::Unparsers::* namespaces, respectively.)

The returned forest object subclasses Bio::Phylo::Listable, as a forest models a list of trees that you can iterate over. By calling the "->first" method, we get the first tree in the forest - a Bio::Phylo::Forest::Tree object (in the example it's a very small forest, consisting of just this single tree).

Parsing tables

You want to create a Bio::Phylo::Matrices::Matrix object from a string.

 use Bio::Phylo::IO;
 # parsing a table
 my $table_string = qq(A,1,2|B,1,2|C,2,2|D,2,1);
 my $matrix = Bio::Phylo::IO->parse(
    -string   => $table_string,
    -format   => 'table',     # See Bio::Phylo::Parsers::Table
    -type     => 'STANDARD',  # Data type
    -fieldsep => ',',         # field separator
    -linesep  => '|'          # line separator
 );
 print ref $matrix, "\n"; # prints 'Bio::Phylo::Matrices::Matrix'

Here the Bio::Phylo::Parsers::Table module parses the string "A,1,2|B,1,2|C,2,2|D,2,1", where "|" is treated as the record (line) separator and "," as the field separator. The default field and line separators are the tab character "\t" and the line break "\n".
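The role of the two separators can be shown with plain Perl. The sketch below is not the code of Bio::Phylo::Parsers::Table; "split_table" is a hypothetical helper, and it assumes, as the example data suggests, that the first field of each record is the row name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into records, then into fields.
sub split_table {
    my ( $string, $fieldsep, $linesep ) = @_;
    $fieldsep = "\t" unless defined $fieldsep;    # defaults named in the text
    $linesep  = "\n" unless defined $linesep;
    my @rows;
    # \Q...\E quotes the separator, so that "|" is not read as regex alternation
    for my $record ( split /\Q$linesep\E/, $string ) {
        my ( $name, @chars ) = split /\Q$fieldsep\E/, $record;
        push @rows, { name => $name, chars => \@chars };
    }
    return @rows;
}

for my $row ( split_table( 'A,1,2|B,1,2|C,2,2|D,2,1', ',', '|' ) ) {
    print "$row->{name}: @{ $row->{chars} }\n";
}
# prints:
# A: 1 2
# B: 1 2
# C: 2 2
# D: 2 1
```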

Parsing taxa

You want to create a Bio::Phylo::Taxa object from a string.

 use Bio::Phylo::IO;
 # parsing a list of taxa
 my $taxa_string = 'A:B:C:D';
 my $taxa = Bio::Phylo::IO->parse(
    -string   => $taxa_string,
    -format   => 'taxlist',
    -fieldsep => ':'
 );
 print ref $taxa, "\n"; # prints 'Bio::Phylo::Taxa'

Here the Bio::Phylo::Parsers::Taxlist module parses a string "A:B:C:D", where the ":" is considered a field separator. The parser returns a Bio::Phylo::Taxa object. Note that the same result can be obtained by building the taxa object from scratch (a more feasible proposition than building trees or matrices from scratch):

 use Bio::Phylo::Factory; 
 
 # first instantiate the factory...
 my $factory = Bio::Phylo::Factory->new;
 
 # ...then use it to create other objects, such as taxa blocks
 my $taxa = $factory->create_taxa( -name => 'MyTaxa' );
 
 # ...or individual taxa (named A, B, C and D), which we add to the taxa block
 $taxa->insert( $factory->create_taxon( -name => $_ ) ) for qw(A B C D);
 
 # and write out as a nexus block
 print $taxa->to_nexus( -header => 1, -links => 1 );

This example uses the Bio::Phylo::Factory, which is an object that can create other objects. Here we have it create a Bio::Phylo::Taxa block, which we populate with four Bio::Phylo::Taxa::Taxon objects. We then write out the taxa block as nexus, complete with the #NEXUS header (this is optional so that we can combine multiple blocks in the same file), and a title, using the "-links" switch. The latter is a facility that only seems to be used by Mesquite (<http://mesquiteproject.org>) and Bio::Phylo. It adds a "title" to the taxa block in the nexus output, and other blocks (character state matrices and tree blocks) refer to this using a "links" statement. This is useful if you want to have multiple taxa blocks in the same file and you want to distinguish them. Putting this all together, the output is thus:

 #NEXUS
 BEGIN TAXA;
 [! Taxa block written by Bio::Phylo::Taxa 0.31_1520 on Thu Nov 25 21:31:58 2010 ]
        TITLE MyTaxa;
         DIMENSIONS NTAX=4;
         TAXLABELS
                        A
                        B
                        C
                        D
         ;
 END;

Iterating

The Bio::Phylo::Listable module is the superclass of all container objects. Container objects are objects that contain a set of objects of the same type. For example, a Bio::Phylo::Forest::Tree object is a container for Bio::Phylo::Forest::Node objects. Hence, the Bio::Phylo::Forest::Tree inherits from the Bio::Phylo::Listable class. You can therefore iterate over the nodes in a tree using the methods defined by Bio::Phylo::Listable.
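The container idea can be illustrated with a toy class. "Toy::Listable" below is a hypothetical stand-in written for this document, not the Bio::Phylo::Listable implementation; it only borrows the method names "insert", "get_entities", "first" and "next", and the way its cursor behaves is an assumption of the sketch.

```perl
use strict;
use warnings;

package Toy::Listable;    # hypothetical class, for illustration only

sub new {
    my $class = shift;
    my $self = { entities => [], cursor => -1 };
    return bless $self, $class;
}

# add one or more objects to the container
sub insert {
    my ( $self, @items ) = @_;
    push @{ $self->{entities} }, @items;
    return $self;
}

# return a reference to a copy of the contained list
sub get_entities { return [ @{ $_[0]{entities} } ] }

# reset the cursor and return the first item
sub first {
    my $self = shift;
    $self->{cursor} = 0;
    return $self->{entities}[0];
}

# advance the cursor; return nothing (and rewind) when past the end
sub next {
    my $self = shift;
    my $i = $self->{cursor} + 1;
    if ( $i > $#{ $self->{entities} } ) {
        $self->{cursor} = -1;
        return;
    }
    $self->{cursor} = $i;
    return $self->{entities}[$i];
}

package main;

my $list = Toy::Listable->new->insert(qw(A B C));
while ( my $item = $list->next ) {
    print "$item\n";
}
# prints:
# A
# B
# C
```

Any class that inherits from such a base gets the same iteration methods, which is the design the text describes for forests, trees, taxa blocks and matrices.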

Iterating over trees and nodes.

You want to access trees and nodes contained in a Bio::Phylo::Forest object.

  use Bio::Phylo::IO qw(parse);
  my $string = '((A,B),(C,D));(((A,B),C)D);';
  my $forest = parse( -format => 'newick', -string => $string );
  print ref $forest, "\n"; # prints 'Bio::Phylo::Forest'
  # access trees in $forest
  foreach my $tree ( @{ $forest->get_entities } ) {
      print ref $tree, "\n"; # prints 'Bio::Phylo::Forest::Tree'
      # access nodes in $tree
      foreach my $node ( @{ $tree->get_entities } ) {
          print ref $node, "\n"; # prints 'Bio::Phylo::Forest::Node'
      }
  }

Bio::Phylo::Forest and Bio::Phylo::Forest::Tree both subclass the iterator class Bio::Phylo::Listable, and a forest contains trees that in turn contain nodes. Iterator calls (such as "->get_entities") can therefore be nested, as the two loops in the solution show.

Iterating over taxa.

You want to access the individual taxa in a Bio::Phylo::Taxa object.

 use Bio::Phylo::IO qw(parse);
 my $string = 'A|B|C|D|E|F|G|H';
 my $taxa = parse(
     -string   => $string,
     -format   => 'taxlist',
     -fieldsep => '|'
 );
 print ref $taxa, "\n"; # prints 'Bio::Phylo::Taxa'
 while ( my $taxon = $taxa->next ) {
     print ref $taxon, "\n"; # prints 'Bio::Phylo::Taxa::Taxon'
 }

A Bio::Phylo::Taxa object is a subclass of the Bio::Phylo::Listable class. Hence, you could also call "->get_entities" on the taxa object, which returns a reference to an array of taxon objects contained by the taxa object. Note however the shorthand:

 while ( my $taxon = $taxa->next ) { ... }

Iterating over datum objects.

You want to access the datum objects contained by a Bio::Phylo::Matrices::Matrix object.

 use Bio::Phylo::IO;
 # parsing a table
 my $table_string = qq(A,1,2|B,1,2|C,2,2|D,2,1);
 my $matrix = Bio::Phylo::IO->parse(
    -string   => $table_string,
    -format   => 'table',     # See Bio::Phylo::Parsers::Table
    -type     => 'STANDARD',  # Data type
    -fieldsep => ',',         # field separator
    -linesep  => '|'          # line separator
 );
 print ref $matrix, "\n"; # prints 'Bio::Phylo::Matrices::Matrix'
 my $datum = $matrix->get_by_index( 0, -1 );
 print ref $datum, "\n"; # NOTE: prints 'ARRAY'!
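Because this call returns an array reference and not a single object, the result has to be dereferenced before the individual objects can be used. A minimal sketch, assuming Bio::Phylo is installed and reusing the table from the solution:

```perl
use strict;
use warnings;
use Bio::Phylo::IO;

my $matrix = Bio::Phylo::IO->parse(
    -string   => 'A,1,2|B,1,2|C,2,2|D,2,1',
    -format   => 'table',
    -type     => 'STANDARD',
    -fieldsep => ',',
    -linesep  => '|',
);

# as noted above, this call hands back an array reference...
my $data = $matrix->get_by_index( 0, -1 );

# ...so dereference it to reach the objects it holds
for my $datum ( @{$data} ) {
    print ref $datum, "\n";
}
```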