Manual Reference Pages  -  BIO::ASSEMBLY::IO::TIGR (3)

NAME

Bio::Assembly::IO::tigr - Driver to read and write assembly files in the TIGR Assembler v2 default format.

SYNOPSIS



    # Building an input stream
    use Bio::Assembly::IO;

    # Assembly loading methods
    my $asmio = Bio::Assembly::IO->new( -file   => 'SGC0-424.tasm',
                                        -format => 'tigr' );
    my $scaffold = $asmio->next_assembly;

    # Do some things on contigs...

    # Assembly writing methods
    my $outasm = Bio::Assembly::IO->new( -file   => ">SGC0-modified.tasm",
                                         -format => 'tigr' );
    $outasm->write_assembly( -scaffold => $scaffold,
                             -singlets => 1 );



DESCRIPTION

This package reads and writes assembly information from and to files in the default TIGR Assembler v2 format. The files are lassie-formatted and often have the .tasm extension. This module was written to be used as a driver module for Bio::Assembly::IO input/output.

    Implementation

Assemblies are loaded into Bio::Assembly::Scaffold objects composed of Bio::Assembly::Contig and Bio::Assembly::Singlet objects. Since aligned reads and contig gapped consensus can be obtained in the tasm files, only aligned/gapped sequences are added to the different BioPerl objects.

Additional assembly information is stored as features. Contig objects have SeqFeature information associated with the primary_tag:



    _main_contig_feature:$contig_id -> misc contig information
    _quality_clipping:$read_id      -> quality clipping position



Read objects have sub_seqFeature information associated with the primary_tag:



    _main_read_feature:$read_id     -> misc read information



Singlets are considered by TIGR Assembler as contigs of one sequence. Contigs are represented here with features having these primary_tag:



    _main_contig_feature:$contig_id
    _quality_clipping:$read_primary_id
    _main_read_feature:$read_primary_id
    _aligned_coord:$read_primary_id



THE TIGR TASM LASSIE FORMAT

    Description

In the TIGR tasm lassie format, contigs are separated by a line containing a single pipe character |, whereas the reads in a contig are separated by a blank line. Singlets can be present in the file and are represented as a contig composed of a single sequence.

Other than the two above-mentioned separators, each line has an attribute name, followed by a tab and then an attribute value.

The tasm format is used by more TIGR applications than just TIGR Assembler. Some of the attributes are not used by TIGR Assembler or have constant values. They are indicated by an asterisk (*).
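The layout described above (pipe-separated contigs, blank-line-separated reads, tab-separated name/value lines) can be read with a few lines of plain Perl. The following is only a sketch that does not use BioPerl; the function name parse_tasm and the returned data structure (a hash per contig with a 'reads' array) are assumptions made for illustration and are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: parse tasm text into a list of contig hashes.
# Each contig hash holds its own attributes plus a 'reads' array reference.
sub parse_tasm {
    my ($text) = @_;
    my @contigs;

    # Contigs are separated by a line holding only a pipe character.
    for my $chunk ( split /^[ \t]*\|[ \t]*$/m, $text ) {
        next unless $chunk =~ /\S/;

        # Inside a contig, blank lines separate the contig block and each read.
        my @records;
        for my $block ( grep { /\S/ } split /\n[ \t]*\n/, $chunk ) {
            my %attr;
            for my $line ( split /\n/, $block ) {
                next unless $line =~ /\S/;
                # Attribute name, a tab, then the (possibly empty) value.
                my ( $name, $value ) = split /\t/, $line, 2;
                $name =~ s/^\s+|\s+$//g;
                $attr{$name} = defined $value ? $value : '';
            }
            push @records, \%attr;
        }

        # The first block describes the contig; the remaining blocks are reads.
        my $contig = shift @records;
        $contig->{reads} = \@records;
        push @contigs, $contig;
    }
    return @contigs;
}

# Small made-up input: one contig with two reads, then a singlet.
my $demo = join "\n",
    "asmbl_id\t93", "seq#\t2", "",
    "seq_name\tread_A", "offset\t0", "",
    "seq_name\tread_B", "offset\t5",
    "|",
    "asmbl_id\t94", "seq#\t1", "",
    "seq_name\tread_C", "offset\t0", "";

for my $contig ( parse_tasm($demo) ) {
    printf "contig %s has %d read(s)\n",
        $contig->{asmbl_id}, scalar @{ $contig->{reads} };
}
```

A contig whose 'reads' array holds a single entry corresponds to what the format treats as a singlet.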

Contigs have the following attributes:



    asmbl_id   -> contig ID
    sequence   -> contig ungapped consensus sequence (ambiguities are lowercase)
    lsequence  -> gapped consensus sequence (lowercase ambiguities)
    quality    -> gapped consensus quality score (in hexadecimal)
    seq_id     -> *
    com_name   -> *
    type       -> *
    method     -> always asmg *
    ed_status  -> *
    redundancy -> fold coverage of the contig consensus
    perc_N     -> percent of ambiguities in the contig consensus
    seq#       -> number of sequences in the contig
    full_cds   -> *
    cds_start  -> start of coding sequence *
    cds_end    -> end of coding sequence *
    ed_pn      -> name of editor (always GRA) *
    ed_date    -> date and time of edition
    comment    -> some comments *
    frameshift -> *



Each read has the following attributes:



    seq_name  -> read name
    asm_lend  -> position of first base on contig ungapped consensus sequence
    asm_rend  -> position of last base on contig ungapped consensus sequence
    seq_lend  -> start of quality-trimmed sequence (aligned read coordinates)
    seq_rend  -> end of quality-trimmed sequence (aligned read coordinates)
    best      -> always 0 *
    comment   -> some comments *
    db        -> database name associated with the sequence (e.g. >my_db|seq1234)
    offset    -> offset of the sequence (gapped consensus coordinates)
    lsequence -> aligned read sequence (ambiguities are uppercase)
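
Because offset is expressed in gapped consensus coordinates while asm_lend and asm_rend refer to the ungapped consensus, converting between the two means discounting the gaps that precede a position. The sketch below assumes that offset is the number of gapped consensus columns before the read (so an offset of 0 corresponds to position 1) and that gaps are written as "-"; the function name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 0-based gapped offset into a 1-based
# position on the ungapped consensus.
sub gapped_offset_to_ungapped_pos {
    my ( $gapped_consensus, $offset ) = @_;
    # Consensus columns that lie before the read.
    my $prefix = substr $gapped_consensus, 0, $offset;
    # Count the gap characters among those columns.
    my $gaps = ( $prefix =~ tr/-// );
    return $offset - $gaps + 1;
}

# Gapped consensus with one gap; the ungapped consensus is ACGTACGT.
print gapped_offset_to_ungapped_pos( 'ACG-TACGT', 5 ), "\n";    # prints 5
```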



When seq_rend < seq_lend (as for the third read in the example below), the sequence was on the complementary DNA strand but its reverse complement is shown in the aligned sequence of the assembly file, not the original read.

Ambiguities are reflected in the contig consensus sequence as lowercase IUPAC characters: a c g t u m r w s y k x n. In the read sequences, however, ambiguities are uppercase: M R W S Y K X N.
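
The two ambiguity conventions can be expressed as a small counting routine. This is a sketch only: the function name is hypothetical, and ignoring gap characters when computing the percentage is an assumption rather than documented behaviour of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of ambiguous symbols in a sequence.
# $is_consensus true : every lowercase letter counts as an ambiguity.
# $is_consensus false: only uppercase M R W S Y K X N count (aligned reads).
sub percent_ambiguous {
    my ( $seq, $is_consensus ) = @_;
    # Work on a copy with gaps removed (assumption: gaps are not counted).
    ( my $ungapped = $seq ) =~ tr/-//d;
    my $length = length $ungapped;
    return 0 if $length == 0;
    my $count = $is_consensus
        ? ( $ungapped =~ tr/a-z// )
        : ( $ungapped =~ tr/MRWSYKXN// );
    return 100 * $count / $length;
}

printf "%.2f\n", percent_ambiguous( 'ACGTsACGTy', 1 );    # prints 20.00
printf "%.2f\n", percent_ambiguous( 'ACGTNACGTA', 0 );    # prints 10.00
```

For a consensus of 1520 bases containing three lowercase symbols, this gives 0.20 when rounded to two decimals.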

    Example

Example of a contig containing three sequences:



    sequence    CGATGCTGTACGGCTGTTGCGACAGATTGCGCTGGGTCGATACCGCGTTGGTGATCGGCTTGTTCAGCGGGCTCTGGTTCGGCGACAGCGCGGCGATCTTGGCGGCTGCGAAGGTTGCCGGCGCAATCATGCGCTGCTGACCGTTGACCTGGTCCTGCCAGTACACCCAGTCGCCCACCATGACCTTCAGCGCGTAGCTGTCACAGCCGGCTGTGGTCAGCGCAGTGGCGACGGTGGTGTAGGAGGCGCCAGCAACACCTTGGGTGATCATGTAGCAGCCTTCTGACAGGCCGTAGGTCAGCATGGTCGGCCACTGGGTACCAGTCAGTCGGGTCAACCGAGATTCGCAsCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGGTGTTACCCGAGGTGCCAGTGGTGAAGGCGACGGTCTGGGTGCTGGCCACAGGCGCCAGAGTGGTCGCGCCAACGGTGGCGATGACCAGTTGCGATGGGCCACGGATACCTGACTGCCCGTTGTTCACGGCGCTGACGATGTTCTGCCACAGCGCCAGGCCAGAGCCGGTGATGTTGTCGAACACTTCGGGCGCAACGCCAGGGAGCGAGACGGTCAGCTTCCAGCTCGAAGCAGCGGAGCCAGTAGCCAGGGCGGCGCTGAGCGAGTTGCCGAGCGTGCCGGTGTAGAACGCGGTCAGCGTGGCGCCGGTGGCGGCGGCAGTGTCCTTCAGCGCACTGGTCGCGGCGGTGTCGGTGCCGTCAGTGACGCGCACGGCGCGGATGTTCGAGGCGCCGCCCTGGATTGATACCGCCAGCGCGGTGCACAGGTCGTACTTGCGCACGGTCyGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATAaGCGTGGCGCTGTTCACCGGCCCCCAGTCAGCAATGCCGACGATGCCGAGAATGTCAGTCGGGACGCCATTGATGTAGCGGGTCTTGGGCGCCACTATTTGTATGTACAAATCTGGCGCAGATAAAGCCGCCGTATTCAAATAACCAGCAGGATAGATAGGCATCACGCCTCCAGAATGAAAAAGGCCACCGATTAGGTGGCCTTTGTTGTGTTCGGCTGGCTGTTAGAGCAGCAGCCCGTTTTCCCGCGCAAACGCGAATGGGTCCTTGTCATGCTTCCTGCAATTGCAGGTAGGACAAAGAATTTGCAGGTTGGATTTGTCGTTCGATCCGCCCTTTGCAAGCGGGAACACGTGGTCAACGTGATACCCATCCCTTATGGATATAGTGCACATGGCGCATTTCCAGCGCTGAGCAGCCAGCAAAAATTTTATGTCGTCGCCGGTGTGTGAGCCGACAGCATTTTTCTTGCGAGCCTTGTATGTCCGCGAGAGTGAACGAACTTGCTCCTTGTTGGCTGTCTTCCAGAGCTTTTGAGTAAGCGCACAGAGATCCTTGTTTCTTGATCTCCACTCTCTGGTTGCGGAAAT
    lsequence   CGATGCTGTACGGCTGTTGCGACAGATTGCGCTGGGTCGATACCGCGTTGGTGATCGGCTTGTTCAGCGGGCTCTGGTTCGGCGACAGCGCGGCGATCTTGGCGGCTGCGAAGGTTGCCGGCGCAATCATGCGCTGCTGACCGTTGACCTGGTCCTGCCAGTACACCCAGTCGCCCACCATGACCTTCAGCGCGTAGCTGTCACAGCCGGCTGTGGTCAGCGCAGTGGCGACGGTGGTGTAGGAGGCGCCAGCAACACCTTGGGTGATCATGTAGCAGCCTTCTGACAGGCCGTAGGTCAGCATGGTCGGCCACTGGGTACCAGTCAGTCGGGTCAACCGAGATTCG-CAsCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGGTGTTACCCGAGGTGCCAGTGGTGAAGGCGACGGTCTGGGTGCTGGCCACAGGCGCCAGAGTGGTCGCGCCAACGGTGGCGATGACCAGTTGCGATGGGCCACGGATACCTGACTGCCCGTTGTTCACGGCGCTGACGATGTTCTGCCACAGCGCCAGGCCAGAGCCGGTGATGTTGTCGAACACTTCGGGCGCAACGCCAGGGAGCGAGACGGTCAGCTTCCAGCTCGAAGCAGCGGAGCCAGTAGCCAGGGCGGCGCTGAGCGAGTTGCCGAGCGTGCCGGTGTAGAACGCGGTCAGCGTGGCGCCGGTGGCGGCGGCAGTGTCCTTCAGCGCACTGGTCGCGGCGGTGTCGGTGCCGTCAGTGACGCGCACGGCGCGGATGTTCGAGGCGCCGCCCTGGATTGATACCGCCAGCGCGGTGCACAGGTCGTACTTGCGCACGGTCyGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATAaGCGTGGCGCTGTTCACCGGCCCCCAGTCAGCAATGCCGACGATGCCGAGAATGTCAGTCGGGACGCCATTGATGTAGCGGGTCTTGGGCGCCACTATTTGTATGTACAAATCTGGCGCAGATAAAGCCGCCGTATTCAAATAACCAGCAGGATAGATAGGCATCACGCCTCCAGAATGAAAAAGGCCACCGATTAGGTGGCCTTTGTTGTGTTCGGCTGGCTGTTAGAGCAGCAGCCCGTTTTCCCGCGCAAACGCGAATGGGTCCTTGTCATGCTTCCTGCAATTGCAGGTAGGACAAAGAATTTGCAGGTTGGATTTGTCGTTCGATCCGCCCTTTGCAAGCGGGAACACGTGGTCAACGTGATACCCATCCCTTATGGATATAGTGCACATGGCGCATTTCCAGCGCTGAGCAGCCAGCAAAAATTTTATGTCGTCGCCGGTGTGTGAGCCGACAGCATTTTTCTTGCGAGCCTTGTATGTCCGCGAGAGTGAACGAACTTGCTCCTTGTTGGCTGTCTTCCAGAGCTTTTGAGTAAGCGCACAGAGATCCTTGTTTCTTGATCTCCACTCTCTGGTTGCGGAAAT
    quality     0x
    asmbl_id    93
    seq_id     
    com_name   
    type       
    method      asmg
    ed_status  
    redundancy  1.11
    perc_N      0.20
    seq#        3
    full_cds   
    cds_start  
    cds_end    
    ed_pn       GRA
    ed_date     08/16/07 17:10:12
    comment    
    frameshift 

    seq_name    SDSU_RFPERU_010_C09.x01.phd.1
    asm_lend    1
    asm_rend    442
    seq_lend    1
    seq_rend    442
    best        0
    comment    
    db 
    offset      0
    lsequence   CGATGCTGTACGGCTGTTGCGACAGATTGCGCTGGGTCGATACCGCGTTGGTGATCGGCTTGTTCAGCGGGCTCTGGTTCGGCGACAGCGCGGCGATCTTGGCGGCTGCGAAGGTTGCCGGCGCAATCATGCGCTGCTGACCGTTGACCTGGTCCTGCCAGTACACCCAGTCGCCCACCATGACCTTCAGCGCGTAGCTGTCACAGCCGGCTGTGGTCAGCGCAGTGGCGACGGTGGTGTAGGAGGCGCCAGCAACACCTTGGGTGATCATGTAGCAGCCTTCTGACAGGCCGTAGGTCAGCATGGTCGGCCACTGGGTACCAGTCAGTCGGGTCAACCGAGATTCG-CAGCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGG

    seq_name    SDSU_RFPERU_002_H12.x01.phd.1
    asm_lend    339
    asm_rend    940
    seq_lend    1
    seq_rend    602
    best        0
    comment    
    db 
    offset      338
    lsequence   CGAGATTCGCCACCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGGTGTTACCCGAGGTGCCAGTGGTGAAGGCGACGGTCTGGGTGCTGGCCACAGGCGCCAGAGTGGTCGCGCCAACGGTGGCGATGACCAGTTGCGATGGGCCACGGATACCTGACTGCCCGTTGTTCACGGCGCTGACGATGTTCTGCCACAGCGCCAGGCCAGAGCCGGTGATGTTGTCGAACACTTCGGGCGCAACGCCAGGGAGCGAGACGGTCAGCTTCCAGCTCGAAGCAGCGGAGCCAGTAGCCAGGGCGGCGCTGAGCGAGTTGCCGAGCGTGCCGGTGTAGAACGCGGTCAGCGTGGCGCCGGTGGCGGCGGCAGTGTCCTTCAGCGCACTGGTCGCGGCGGTGTCGGTGCCGTCAGTGACGCGCACGGCGCGGATGTTCGAGGCGCCGCCCTGGATTGATACCGCCAGCGCGGTGCACAGGTCGTACTTGCGCACGGTCCGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATA-GCGTGGCGC

    seq_name    SDSU_RFPERU_009_E07.x01.phd.1
    asm_lend    880
    asm_rend    1520
    seq_lend    641
    seq_rend    1
    best        0
    comment    
    db 
    offset      880
    lsequence   CGCACGGTCTGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATAAGCGTGGCGCTGTTCACCGGCCCCCAGTCAGCAATGCCGACGATGCCGAGAATGTCAGTCGGGACGCCATTGATGTAGCGGGTCTTGGGCGCCACTATTTGTATGTACAAATCTGGCGCAGATAAAGCCGCCGTATTCAAATAACCAGCAGGATAGATAGGCATCACGCCTCCAGAATGAAAAAGGCCACCGATTAGGTGGCCTTTGTTGTGTTCGGCTGGCTGTTAGAGCAGCAGCCCGTTTTCCCGCGCAAACGCGAATGGGTCCTTGTCATGCTTCCTGCAATTGCAGGTAGGACAAAGAATTTGCAGGTTGGATTTGTCGTTCGATCCGCCCTTTGCAAGCGGGAACACGTGGTCAACGTGATACCCATCCCTTATGGATATAGTGCACATGGCGCATTTCCAGCGCTGAGCAGCCAGCAAAAATTTTATGTCGTCGCCGGTGTGTGAGCCGACAGCATTTTTCTTGCGAGCCTTGTATGTCCGCGAGAGTGAACGAACTTGCTCCTTGTTGGCTGTCTTCCAGAGCTTTTGAGTAAGCGCACAGAGATCCTTGTTTCTTGATCTCCACTCTCTGGTTGCGGAAAT
    |



...

FEEDBACK

    Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing lists. Your participation is much appreciated.



  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists



    Support

Please direct usage questions or support issues to the mailing list:

bioperl-l@bioperl.org

rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

    Reporting Bugs

Report bugs to the BioPerl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:



  bioperl-bugs@bio.perl.org
  https://github.com/bioperl/bioperl-live/issues



AUTHOR - Florent E Angly

Email florent dot angly at gmail dot com

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _.

    next_assembly



 Title   : next_assembly
 Usage   : my $scaffold = $asmio->next_assembly();
 Function: return the next assembly in the tasm-formatted stream
 Returns : Bio::Assembly::Scaffold object
 Args    : none



    next_contig



 Title   : next_contig
 Usage   : my $contig = $asmio->next_contig();
 Function: return the next contig or singlet in the TIGR-formatted stream
 Returns : Bio::Assembly::Contig or Bio::Assembly::Singlet
 Args    : none



    _qual_hex2dec



    Title   : _qual_hex2dec
    Usage   : my $dec_quality = $self->_qual_hex2dec($hex_quality);
    Function: convert a hexadecimal quality score into a decimal quality score
    Returns : string
    Args    : string



    _qual_dec2hex



    Title   : _qual_dec2hex
    Usage   : my $hex_quality = $self->_qual_dec2hex($dec_quality);
    Function: convert a decimal quality score into a hexadecimal quality score
    Returns : string
    Args    : string
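
The two quality conversions can be sketched in plain Perl. The exact encoding is not spelled out on this page, so the following rests on assumptions: a "0x" prefix, two hexadecimal digits per base, decimal scores separated by single spaces, and uppercase hex digits on output. The function names are illustrative stand-ins, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in: "0x0A141E" -> "10 20 30"
sub qual_hex2dec {
    my ($hex) = @_;
    $hex =~ s/^0x//i;
    # Read two hex digits per base and convert each pair to decimal.
    return join ' ', map { hex($_) } ( $hex =~ /([0-9A-Fa-f]{2})/g );
}

# Hypothetical stand-in: "10 20 30" -> "0x0A141E"
# Scores above 255 would not fit in two hex digits and are not handled here.
sub qual_dec2hex {
    my ($dec) = @_;
    return '0x' . join '', map { sprintf '%02X', $_ } split ' ', $dec;
}

print qual_hex2dec('0x0A141E'), "\n";    # prints 10 20 30
print qual_dec2hex('10 20 30'), "\n";    # prints 0x0A141E
```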



    _store_contig



    Title   : _store_contig
    Usage   : my $contigobj = $self->_store_contig(\%contiginfo, $contigobj);
    Function: store information of a contig belonging to a scaffold in the
              appropriate object
    Returns : Bio::Assembly::Contig object
    Args    : hash, Bio::Assembly::Contig



    _store_read



    Title   : _store_read
    Usage   : my $readobj = $self->_store_read(\%readinfo, $contigobj);
    Function: store information of a read belonging to a contig in a contig object
    Returns : Bio::LocatableSeq
    Args    : hash, Bio::Assembly::Contig



    _store_singlet



    Title   : _store_singlet
    Usage   : my $singletobj = $self->_store_singlet(\%readinfo, \%contiginfo);
    Function: store information of a singlet belonging to a scaffold in a singlet object
    Returns : Bio::Assembly::Singlet
    Args    : hash, hash



    write_assembly



    Title   : write_assembly
    Usage   : $asmio->write_assembly($assembly)
    Function: Write the assembly object in TIGR Assembler compatible format. The
              contig IDs are sorted naturally if the Sort::Naturally module is
              present, or lexically otherwise. Internally, write_assembly uses
              the write_contig, write_footer and write_header methods. Use these
              methods if you want more control over the writing process.
    Returns : 1 on success, 0 for error
    Args    : A Bio::Assembly::Scaffold object
              1 to write singlets in the assembly file, 0 otherwise
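
The sorting behaviour described for write_assembly (natural order when Sort::Naturally is installed, lexical order otherwise) can be approximated as follows. This is a sketch of the idea rather than the module's code; the function name sort_contig_ids is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sort contig IDs naturally when Sort::Naturally can be
# loaded, and fall back to Perl's default string sort when it cannot.
sub sort_contig_ids {
    my @ids = @_;
    if ( eval { require Sort::Naturally; 1 } ) {
        return Sort::Naturally::nsort(@ids);
    }
    return sort @ids;
}

# Natural order would be 9 10 93; lexical order would be 10 9 93.
print join( ' ', sort_contig_ids(qw(93 9 10)) ), "\n";
```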



    write_contig



    Title   : write_contig
    Usage   : $asmio->write_contig($contig)
    Function: Write a contig or singlet object in TIGR compatible format. Quality
              scores are automatically generated if the contig does not contain
              any.
    Returns : 1 on success, 0 for error
    Args    : A Bio::Assembly::Contig or Singlet object



    write_header



    Title   : write_header
    Usage   : $asmio->write_header($assembly)
    Function: In the TIGR Assembler format driver, this does nothing. The
              method is present for compatibility with other assembly drivers
              that need to write a file header.
    Returns : 1 on success, 0 for error
    Args    : A Bio::Assembly::Scaffold object



    write_footer



    Title   : write_footer
    Usage   : $asmio->write_footer($assembly)
    Function: Write TIGR footer, i.e. do nothing except making sure that the
              file does not end with a |.
    Returns : 1 on success, 0 for error
    Args    : A Bio::Assembly::Scaffold object



    _perc_N



    Title   : _perc_N
    Usage   : my $perc_N = $asmio->_perc_N($sequence_string)
    Function: Calculate the percent of ambiguities in a sequence.
              M R W S Y K X N are regarded as ambiguities in an aligned read
              sequence by TIGR Assembler. In the case of a gapped contig
              consensus sequence, all lowercase symbols are ambiguities, i.e.:
              a c g t u m r w s y k x n.
    Returns : decimal number
    Args    : string



    _redundancy



    Title   : _redundancy
    Usage   : my $ref = $asmio->_redundancy($contigobj)
    Function: Calculate the fold coverage (redundancy) of a contig consensus
              (average number of read base pairs covering the consensus)
    Returns : decimal number
    Args    : Bio::Assembly::Contig
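
Fold coverage as described here is the total number of read bases divided by the consensus length. The sketch below works on plain hashes instead of Bio::Assembly::Contig objects; the function name is hypothetical, and taking each read length from seq_lend and seq_rend is an assumption. The numbers are the three reads of the example contig, with the consensus length taken as 1520 (the largest asm_rend in that example).

```perl
use strict;
use warnings;

# Hypothetical helper: average number of read bases covering the consensus.
sub fold_coverage {
    my ( $consensus_length, @reads ) = @_;
    my $total = 0;
    for my $read (@reads) {
        # abs() handles reverse-complemented reads, where seq_lend > seq_rend.
        $total += abs( $read->{seq_rend} - $read->{seq_lend} ) + 1;
    }
    return $total / $consensus_length;
}

my @reads = (
    { seq_lend => 1,   seq_rend => 442 },
    { seq_lend => 1,   seq_rend => 602 },
    { seq_lend => 641, seq_rend => 1 },
);
printf "%.2f\n", fold_coverage( 1520, @reads );    # prints 1.11
```

The result agrees with the redundancy value of 1.11 listed for that contig.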



    _ungap



    Title   : _ungap
    Usage   : my $ungapped = $asmio->_ungap($gapped)
    Function: Remove the gaps from a sequence. Gaps are - in TIGR Assembler
    Returns : string
    Args    : string



    _date_time



    Title   : _date_time
    Usage   : my $timepoint = $asmio->_date_time
    Function: Get date and time (MM/DD/YY HH:MM:SS)
    Returns : string
    Args    : none
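
A timestamp in this layout, such as the ed_date value 08/16/07 17:10:12 in the example, can be produced with POSIX::strftime. This is a sketch, not the module's implementation; the function name and the optional epoch argument are assumptions added for illustration.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format a time as MM/DD/YY HH:MM:SS in local time.
# Uses the current time when no epoch value is given.
sub date_time {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime( '%m/%d/%y %H:%M:%S', localtime($epoch) );
}

print date_time(), "\n";
```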



    _split_seq_name_and_db



    Title   : _split_seq_name_and_db
    Usage   : my ($seqname, $db) = $asmio->_split_seq_name_and_db($id)
    Function: Extract seq_name and db from sequence id
    Returns : seq_name, db
    Args    : id



    _merge_seq_name_and_db



    Title   : _merge_seq_name_and_db
    Usage   : my $id = $asmio->_merge_seq_name_and_db($seq_name, $db)
    Function: Construct id from seq_name and db
    Returns : id
    Args    : seq_name, db
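
The splitting and merging of identifiers can be sketched as a pair of inverse functions. The "db|seq_name" layout is an assumption inferred from the my_db|seq1234 example given for the db attribute; the function names are hypothetical and the module's own rules may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: "my_db|seq1234" -> ("seq1234", "my_db").
# An id without a pipe is returned unchanged with an empty db.
sub split_seq_name_and_db {
    my ($id) = @_;
    return ( $2, $1 ) if $id =~ /^([^|]+)\|(.+)$/;
    return ( $id, '' );
}

# Hypothetical helper: ("seq1234", "my_db") -> "my_db|seq1234".
sub merge_seq_name_and_db {
    my ( $seq_name, $db ) = @_;
    return ( defined $db && length $db ) ? "$db|$seq_name" : $seq_name;
}

my ( $seq_name, $db ) = split_seq_name_and_db('my_db|seq1234');
print "$seq_name $db\n";                                   # prints seq1234 my_db
print merge_seq_name_and_db( $seq_name, $db ), "\n";       # prints my_db|seq1234
```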



    _coord



    Title   : _coord
    Usage   : my @coords = $asmio->_coord($readobj, $contigobj)
    Function: Get different coordinates for the read
    Returns : number, number, number, number, number
    Args    : Bio::Assembly::Seq, Bio::Assembly::Contig



perl v5.20.3 BIO::ASSEMBLY::IO::TIGR (3) 2016-04-05
