Manual Reference Pages  -  LINGUA::TREEBANK (3)


NAME

Lingua::Treebank - Perl extension for manipulating the Penn Treebank format


SYNOPSIS



  use Lingua::Treebank;

  my @utterances = Lingua::Treebank->from_penn_file($filename);

  foreach (@utterances) {
    # $_ is a Lingua::Treebank::Const now

    foreach ($_->get_all_terminals) {
      # $_ is a Lingua::Treebank::Const that is a terminal (word)

      # print the word and its part-of-speech tag, tab-separated
      print $_->word(), "\t", $_->tag(), "\n";
    }

    print "\n\n";

  }



ABSTRACT



  Modules for abstracting out the "natural" objects in the Penn
  Treebank format.



DESCRIPTION

This class reads two treebank formats: the Penn format and the Chomsky Normal Form (CNF) format. The formats differ in how they represent terminal nodes. The Penn format places the pre-terminal part-of-speech tag in the left-hand position of a parenthesis-delimited pair, just as it does the labels of non-terminal nodes. The CNF format attaches the pre-terminal tag to the word with an underscore. For example, the sentence "I spoke" would be rendered in each format as follows:



    Penn:

    (S
        (NP
            (N I))
        (VP
            (V spoke)))

    Chomsky Normal Form:

    (S
        (NP
            I_N)
        (VP
            spoke_V))
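
The difference between the two terminal notations can be illustrated with a small standalone sketch. The helper below, cnf_to_penn_terminals, is a hypothetical name invented for this example; it is not part of Lingua::Treebank and says nothing about how the module parses either format. It only rewrites each word_TAG terminal as a (TAG word) pair, assuming tags contain no underscores, whitespace, or parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Treebank): rewrite CNF-style
# "word_TAG" terminals as Penn-style "(TAG word)" pairs.
sub cnf_to_penn_terminals {
    my ($tree) = @_;

    # The greedy first group splits each token on its LAST underscore, so a
    # word that itself contains underscores keeps them.
    $tree =~ s/([^\s()]+)_([^\s()_]+)/($2 $1)/g;

    return $tree;
}

print cnf_to_penn_terminals('(S (NP I_N) (VP spoke_V))'), "\n";
# prints: (S (NP (N I)) (VP (V spoke)))
```

Non-terminal labels such as S, NP, and VP are left untouched, because only tokens containing an underscore are rewritten.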



Almost all of the interesting tree functionality lives in the constituent class, Lingua::Treebank::Const, which is included in this distribution.

PLEASE NOTE: The format expected here is the .mrg format, not the .psd format. In other words, one POS-tag per word is required. (In response to CPAN bug 15079.)

Variables

CONST_CLASS The variable $Lingua::Treebank::CONST_CLASS names the class used to construct constituents. The default is Lingua::Treebank::Const. Setting it to a class that is not a subclass of Lingua::Treebank::Const generates an error.
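
A minimal sketch of overriding the constituent class might look like the following. My::Const and its is_noun method are hypothetical names made up for this example. The sketch assumes that loading Lingua::Treebank::Const makes the base class available and that the variable is consulted by later from_* calls; neither detail is spelled out above.

```perl
use strict;
use warnings;

# Skip gracefully when the distribution is not installed.
my $have_module = eval {
    require Lingua::Treebank;
    require Lingua::Treebank::Const;
    1;
};

if ($have_module) {
    {
        # My::Const is a hypothetical subclass used only for illustration.
        package My::Const;
        our @ISA = ('Lingua::Treebank::Const');

        # Hypothetical convenience method built on the tag() accessor
        # shown in the SYNOPSIS.
        sub is_noun {
            my ($self) = @_;
            my $tag = $self->tag();
            return defined $tag && $tag =~ /^N/ ? 1 : 0;
        }
    }

    # Assumption: constituents created after this point use My::Const.
    $Lingua::Treebank::CONST_CLASS = 'My::Const';
}
else {
    print "Lingua::Treebank is not installed; skipping the demonstration\n";
}
```

Because My::Const inherits from Lingua::Treebank::Const, it satisfies the subclass requirement described above.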

Methods

    Class methods

from_penn_file Given the name of a Penn treebank file, opens it, extracts the constituents, and returns the roots.
from_penn_fh Given a filehandle open on a Penn treebank file, extracts the constituents and returns the roots.
from_cnf_file Given the name of a Chomsky Normal Form file, opens it, extracts the constituents, and returns the roots.
from_cnf_fh Given a filehandle open on a Chomsky Normal Form file, extracts the constituents and returns the roots.
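
The file-name and filehandle variants can be compared with a short sketch. It writes the "I spoke" example to a temporary file, which is an assumption made purely to keep the example self-contained, and then reads it back both ways. Only from_penn_file, from_penn_fh, get_all_terminals, word, and tag are used, all of which appear in this page. The from_cnf_* methods are described as taking the same kinds of argument.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the "I spoke" example to a temporary file so the sketch is
# self-contained.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "(S (NP (N I)) (VP (V spoke)))\n";
close $out or die "cannot close $filename: $!";

my (@from_file, @from_fh);
my $have_module = eval { require Lingua::Treebank; 1 };

if ($have_module) {
    # from_penn_file takes a file name and opens the file itself.
    @from_file = Lingua::Treebank->from_penn_file($filename);

    # from_penn_fh takes a filehandle that the caller has already opened.
    open my $in, '<', $filename or die "cannot open $filename: $!";
    @from_fh = Lingua::Treebank->from_penn_fh($in);
    close $in;

    # Print each tree's terminals as word/tag pairs.
    for my $root (@from_file) {
        print join(' ',
            map { $_->word() . '/' . $_->tag() } $root->get_all_terminals),
            "\n";
    }
}
else {
    print "Lingua::Treebank is not installed; skipping the demonstration\n";
}
```

The filehandle form is convenient when the trees come from a pipe, a socket, or an already-open stream rather than a named file.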

    EXPORT

None by default.

HISTORY

0.01 Original version; created by h2xs 1.22 with options



  -CAX
        Lingua::Treebank



0.02 Improved documentation.
0.03 Added a VERBOSE variable that can be set.
0.09 A variety of additional features.
0.10 More features still, plus some bugfixes.
0.11 Removed references to Text::Balanced, which is slow and not uniformly available.
0.12 Corrected bug in Makefile.PL pointed out by Vassilii Khachaturov.

Added documentation clarifying that .mrg files (and not .psd files) are supported.

0.13 text() method now suppresses anything with a -NONE- tag.

$VERSION for Lingua::Treebank and Lingua::Treebank::Const now tied.

0.14 Actually include patch intended for 0.13. *sheesh*.
0.15 Include Lingua::Treebank::HeadFinder class in distro. Modify L::TB::Const to support head-child annotation.

Also supports 64-bit systems much better.

0.16 Including data for Lingua::Treebank::HeadFinder. Updating version numbers in Const.pm code. Revised test code so that it doesn’t require Devel::Cycle (but uses it if needed).

SEE ALSO

TO DO: Where is Penn Treebank documented?

AUTHOR

Jeremy Gillmor Kahn, <kahn@cpan.org>

COPYRIGHT AND LICENSE

Copyright 2003-2008 by Jeremy Gillmor Kahn, with additional support and ideas from Bill McNeill.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

POD ERRORS

Hey! The above document had some coding errors, which are explained below:
Around line 250: You forgot a ’=back’ before ’=head1’


perl v5.20.3 TREEBANK (3) 2008-08-28
