NAME
Bio::SeqIO::msout - input stream for output by Hudson's ms

SYNOPSIS
Do not use this module directly. Use it via the Bio::SeqIO class.

DESCRIPTION
ms (Hudson, R. R. (2002) Generating samples under a Wright-Fisher neutral model. Bioinformatics 18:337-8) can be found at http://home.uchicago.edu/~rhudson1/source/mksamples.html.

Currently, this object can be used to read output from ms into seq objects. However, because bioperl has no support for haplotypes created using an infinite sites model (where '1' identifies a derived allele and '0' identifies an ancestral allele), the sequences returned by msout are coded using A, T, C and G. To decode the bases, use the sequence conversion table (a hash) returned by get_base_conversion_table(). In the table, 4 and 5 are used when the ancestry is unclear. This should never happen when creating files with ms, but it will be used when creating msOUT files from a collection of seq objects (to be added later). Alternatively, use get_next_hap() to get a string of 1's and 0's instead of a seq object.

Mapping to Finite Sites
This object can now also be used to map haplotypes created using an infinite sites model to sequences of arbitrary finite length. See set_n_sites() for more detail. Thanks to Filipe G. Vieira <fgvieira@berkeley.edu> for the idea and code.

FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution.
Bug reports can be submitted via the web:

  https://github.com/bioperl/bioperl-live/issues

AUTHOR - Warren Kretzschmar
This module was written by Warren Kretzschmar

email: wkretzsch@gmail.com

This module grew out of a parser written by Aida Andres.

COPYRIGHT
Public Domain Notice
This software/database is "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the authors' official duties for the United States Government and thus cannot be copyrighted. This software/database is freely available to the public for use without a copyright notice. Restrictions cannot be placed on its present or future use.

Although all reasonable efforts have been taken to ensure the accuracy and reliability of the software and data, the National Human Genome Research Institute (NHGRI) and the U.S. Government does not and cannot warrant the performance or results that may be obtained by using this software or data. NHGRI and the U.S. Government disclaims all warranties as to performance, merchantability or fitness for any particular purpose.

METHODS
Methods for Internal Use
_initialize
 Title   : _initialize
 Usage   : $stream = Bio::SeqIO::msout->new($infile)
 Function: extracts basic information about the file.
 Returns : Bio::SeqIO object
 Args    : no_og, gunzip, gzip, n_sites
 Details :

_read_start
 Title   : _read_start
 Usage   : $stream->_read_start()
 Function: reads from the filehandle $stream->{_filehandle} all
           information up to the first haplotype (sequence). Closes the
           filehandle if all lines have been read.
 Returns : void
 Args    : none

Methods to Access Data
get_segsites
 Title   : get_segsites
 Usage   : $segsites = $stream->get_segsites()
 Function: returns the number of segsites in the msOUT file (according to
           the msOUT header line's -s option), or the current run's
           segsites if -s was not specified in the command line (in this
           case the number of segsites varies from run to run).
 Returns : scalar
 Args    : NONE

get_current_run_segsites
 Title   : get_current_run_segsites
 Usage   : $segsites = $stream->get_current_run_segsites()
 Function: returns the number of segsites in the run of the last read

get_n_sites
 Title   : get_n_sites
 Usage   : $n_sites = $stream->get_n_sites()
 Function: Gets the number of total sites (variable or not) to be output.
 Returns : scalar if n_sites option is defined at call time of new()
 Args    : NONE
 Note    :

set_n_sites
 Title   : set_n_sites
 Usage   : $n_sites = $stream->set_n_sites($value)
 Function: Sets the number of total sites (variable or not) to be output.
 Returns : 1 on success; throws an error if $value is not a positive
           integer or undef
 Args    : positive integer
 Note    :

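As a rough illustration of what mapping an infinite-sites haplotype onto a finite number of sites can look like, here is a minimal pure-Perl sketch. It is not the module's implementation: map_to_finite_sites() is a hypothetical helper, and both the index rule (position times n_sites, truncated) and the collision rule (a derived allele wins) are assumptions made only for this example.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper, not part of Bio::SeqIO::msout.
#   $hap       : ms haplotype string, one character per segregating site
#   $positions : array ref of ms positions, each a fraction between 0 and 1
#   $n_sites   : total length of the finite sequence to build
sub map_to_finite_sites {
    my ( $hap, $positions, $n_sites ) = @_;

    die "n_sites must be a positive integer\n"
        unless defined $n_sites && $n_sites =~ /^[1-9]\d*$/;

    my @alleles = split //, $hap;
    die "haplotype and positions differ in length\n"
        unless @alleles == @{$positions};

    # Every site starts out ancestral ('0').
    my @sites = ('0') x $n_sites;

    for my $i ( 0 .. $#alleles ) {
        # ASSUMPTION: a fractional position maps to a 0-based site index.
        my $index = int( $positions->[$i] * $n_sites );
        $index = $n_sites - 1 if $index >= $n_sites;    # guard position 1.0

        # ASSUMPTION: if two segsites collide, a derived allele wins.
        $sites[$index] = $alleles[$i] if $alleles[$i] ne '0';
    }
    return join '', @sites;
}

# Three segsites spread over ten sites.
print map_to_finite_sites( '101', [ 0.05, 0.42, 0.97 ], 10 ), "\n";
# prints 1000000001
```

The sketch only shows why n_sites has to be a positive integer and why collisions need a policy when n_sites is small relative to the number of segsites; consult the module source for the behaviour msout actually implements.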
get_runs
 Title   : get_runs
 Usage   : $runs = $stream->get_runs()
 Function: returns the number of runs in the msOUT file (according to the

get_Seeds
 Title   : get_Seeds
 Usage   : @seeds = $stream->get_Seeds()
 Function: returns an array of the seeds used in the creation of the
           msOUT file.
 Returns : array
 Args    : NONE
 Details : In older versions, ms used three seeds. Newer versions of ms
           seem to

get_Positions
 Title   : get_Positions
 Usage   : @positions = $stream->get_Positions()
 Function: returns an array of the names of each segsite of the run of
           the last

get_tot_run_haps
 Title   : get_tot_run_haps
 Usage   : $number_of_haps_per_run = $stream->get_tot_run_haps()
 Function: returns the number of haplotypes (sequences) in each run of
           the msOUT

get_ms_info_line
 Title   : get_ms_info_line
 Usage   : $ms_info_line = $stream->get_ms_info_line()
 Function: returns the header line of the msOUT file.
 Returns : scalar
 Args    : NONE

tot_haps
 Title   : tot_haps
 Usage   : $number_of_haplotypes_in_file = $stream->tot_haps()
 Function: returns the number of haplotypes (sequences) in the msOUT file.

get_Pops
 Title   : get_Pops
 Usage   : @pops = $stream->get_Pops()
 Function: returns an array of population sizes (order taken from the -I
           flag in

get_next_run_num
 Title   : get_next_run_num
 Usage   : $next_run_number = $stream->get_next_run_num()
 Function: returns the number of the ms run that the next haplotype
           (sequence)

get_last_haps_run_num
 Title   : get_last_haps_run_num
 Usage   : $last_haps_run_number = $stream->get_last_haps_run_num()
 Function: returns the number of the ms run that the last haplotype
           (sequence)

get_last_read_hap_num
 Title   : get_last_read_hap_num
 Usage   : $last_read_hap_num = $stream->get_last_read_hap_num()
 Function: returns the number (starting with 1) of the last haplotype
           read from

outgroup
 Title   : outgroup
 Usage   : $outgroup = $stream->outgroup()
 Function: returns '1' if the msOUT stream has an outgroup. Returns '0'

get_next_haps_pop_num
 Title   : get_next_haps_pop_num
 Usage   : ($next_haps_pop_num, $num_haps_left_in_pop) =
           $stream->get_next_haps_pop_num()
 Function: First return value is the population number (starting with 1)
           the

get_next_seq
 Title   : get_next_seq
 Usage   : $seq = $stream->get_next_seq()
 Function: reads and returns the next sequence (haplotype) in the stream
 Returns : Bio::Seq object or void if end of file
 Args    : NONE
 Note    : This function is included only to conform to convention. The

next_seq
 Title   : next_seq
 Usage   : $seq = $stream->next_seq()
 Function: Alias to get_next_seq()
 Returns : Bio::Seq object or void if end of file
 Args    : NONE
 Note    : This function is only included for convention. It calls
           get_next_seq().

get_next_hap
 Title   : get_next_hap
 Usage   : $hap = $stream->get_next_hap()
 Function: reads and returns the next sequence (haplotype) in the stream.

get_next_pop
 Title   : get_next_pop
 Usage   : @seqs = $stream->get_next_pop()
 Function: reads and returns all the remaining sequences (haplotypes) in
           the

next_run
 Title   : next_run
 Usage   : @seqs = $stream->next_run()
 Function: reads and returns all the remaining sequences (haplotypes) in
           the ms

Methods to Retrieve Constants
base_conversion_table
 Title   : get_base_conversion_table
 Usage   : $table_hash_ref = $stream->get_base_conversion_table()
 Function: returns a reference to a hash. The keys of the hash are the
           letters '

   # fetch the conversion table from the stream
   my $rh_base_conversion_table = $stream->get_base_conversion_table();

   # retrieve the Bio::Seq object's sequence
   my $haplotype = $seq->seq;

   # convert each letter to its corresponding number
   foreach my $base (keys %{$rh_base_conversion_table}) {
     $haplotype =~ s/$base/$rh_base_conversion_table->{$base}/g;
   }

   # $haplotype is now an ms style haplotype. (e.g. '100101101455')
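
The decoding step above can also be tried without a stream. The following sketch runs with core Perl only; %assumed_table is a stand-in with illustrative values (an assumption for this example, not taken from the module), and decode_haplotype() is a hypothetical helper. In real use, pass the hash reference returned by get_base_conversion_table() instead.

```perl
use strict;
use warnings;

# ASSUMPTION: illustrative stand-in for the hash returned by
# get_base_conversion_table(). Use the table from your stream in practice.
my %assumed_table = ( A => 0, T => 1, C => 4, G => 5 );

# HYPOTHETICAL helper: translate a base-coded sequence into an ms style
# haplotype in a single pass, dying on any character missing from the table.
sub decode_haplotype {
    my ( $sequence, $table ) = @_;
    my $decoded = '';
    for my $base ( split //, $sequence ) {
        die "no conversion for '$base'\n" unless exists $table->{$base};
        $decoded .= $table->{$base};
    }
    return $decoded;
}

print decode_haplotype( 'TAATATTATCGG', \%assumed_table ), "\n";
# prints 100101101455
```

Walking the string once, rather than running one substitution per key, makes unexpected characters fail loudly instead of passing through untranslated.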