Manual Reference Pages  -  TREEBANKFREQ.PL (1)


NAME

treebankFreq.pl - Compute Information Content from Penn Treebank 2


SYNOPSIS

treebankFreq.pl [--outfile=OUTFILE [--stopfile=STOPFILE]
       [--wnpath=WNPATH] [--resnik] [--smooth=SCHEME] PATH
        | --help | --version]


DESCRIPTION

This program reads the Penn Treebank, Release 2, from the Linguistic Data Consortium and computes the frequency counts for each synset in WordNet. The Lin, Resnik, and Jiang & Conrath measures of semantic relatedness use these frequency counts to calculate the information content values of concepts. The output is written in the format required by the WordNet::Similarity modules for computing semantic relatedness.

A more detailed description of how information content is calculated can be found elsewhere in the WordNet::Similarity documentation. This program uses exactly the same techniques as described there.
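As a rough illustration of how frequency counts become information content, the Python sketch below applies the usual definition IC(c) = -log(freq(c) / freq(root)), where the frequency of a concept includes the counts of every concept beneath it. The hierarchy, concept names, and counts are invented for the example; this is not the code used by the program itself, which operates on WordNet synsets.

```python
import math

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {"entity": None, "animal": "entity", "dog": "animal", "cat": "animal"}

# Hypothetical raw corpus counts observed for each concept.
RAW_COUNTS = {"entity": 0, "animal": 2, "dog": 5, "cat": 3}


def cumulative_counts(raw, parent):
    """Add each concept's raw count to itself and to all of its ancestors."""
    totals = dict.fromkeys(parent, 0.0)
    for concept, count in raw.items():
        node = concept
        while node is not None:
            totals[node] += count
            node = parent[node]
    return totals


def information_content(raw, parent):
    """Return IC(c) = log(freq(root) / freq(c)) for concepts with freq > 0."""
    totals = cumulative_counts(raw, parent)
    root = next(c for c, p in parent.items() if p is None)
    root_total = totals[root]
    return {c: math.log(root_total / f) for c, f in totals.items() if f > 0}


IC = information_content(RAW_COUNTS, PARENT)
for concept in sorted(IC, key=IC.get):
    print(f"{concept:8s} {IC[concept]:.4f}")
```

Concepts near the root accumulate the counts of all their descendants and so carry little information, while rarer, more specific concepts receive higher values.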



OPTIONS

--outfile=OUTFILE
    The name of a file to which output should be written.

--stopfile=STOPFILE
    A file containing a list of stop-listed words that will not be
    considered in the frequency counts.  A sample file is available
    for download.

--wnpath=WNPATH
    Location of the WordNet data files.

--resnik
    Use Resnik (1995) frequency counting.

--smooth=SCHEME
    Smoothing should be used on the computed probabilities.  SCHEME
    can only be ADD1 at this time.

--help
    Show a help message.

--version
    Display version information.

PATH
    Path to the raw Wall Street Journal portion of the Treebank corpus.
    This is usually in the /raw/wsj subdirectory of the Treebank
    installation.  Thus, you might run this program as

        treebankFreq.pl [OPTIONS] /home/sid/treebank/raw/wsj
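To make the effect of the stop list, Resnik counting, and ADD1 smoothing more concrete, here is a small Python sketch. It rests on assumptions that the text above does not spell out: that Resnik counting splits each occurrence of a word equally among its senses while the default credits every sense with a full count, and that ADD1 starts every sense at a count of 1. The sense inventory, tokens, and the function count_senses are hypothetical and are not part of the program.

```python
def count_senses(tokens, senses, stop_words=(), resnik=False, add1=False):
    """Tally per-sense frequency counts for a stream of tokens.

    tokens     -- iterable of word strings
    senses     -- dict mapping a lowercase word to its list of sense labels
    stop_words -- words to ignore entirely
    resnik     -- if True, split each occurrence equally among the senses
    add1       -- if True, start every sense at 1 (assumed ADD1 behaviour)
    """
    stop = set(stop_words)
    start = 1.0 if add1 else 0.0
    counts = {s: start for sense_list in senses.values() for s in sense_list}
    for token in tokens:
        word = token.lower()
        if word in stop or word not in senses:
            continue
        word_senses = senses[word]
        # One occurrence is either shared among the senses or given to each.
        share = 1.0 / len(word_senses) if resnik else 1.0
        for sense in word_senses:
            counts[sense] += share
    return counts


# Hypothetical sense inventory and input, for illustration only.
SENSES = {"bank": ["bank#n#1", "bank#n#2"], "river": ["river#n#1"]}
TOKENS = ["The", "bank", "of", "the", "river", "bank"]
STOP = ["the", "of"]

plain = count_senses(TOKENS, SENSES, STOP)
resnik = count_senses(TOKENS, SENSES, STOP, resnik=True)
smoothed = count_senses(TOKENS, SENSES, STOP, add1=True)
```

Under these assumptions the two occurrences of "bank" give each of its senses a count of 2 by default and a count of 1 with Resnik counting, and smoothing guarantees that no sense ends up with a zero count (and hence an undefined information content).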


BUGS

Report to the WordNet::Similarity mailing list.



SEE ALSO

Penn Treebank

WordNet home page

WordNet::Similarity home page


AUTHORS

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at

 Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
 banerjee+ at

 Siddharth Patwardhan, University of Utah, Salt Lake City
 sidd at


COPYRIGHT

Copyright (c) 2005-2008, Ted Pedersen, Satanjeev Banerjee, and Siddharth Patwardhan

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.


perl v5.20.3 TREEBANKFREQ (1) 2016-04-03
