Manual Reference Pages  -  WORDNET::QUERYDATA (3)


NAME

WordNet::QueryData - direct perl interface to WordNet database

SYNOPSIS



  use WordNet::QueryData;

  my $wn = WordNet::QueryData->new( noload => 1);

  print "Synset: ", join(", ", $wn->querySense("cat#n#7", "syns")), "\n";
  print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";
  print "Parts of Speech: ", join(", ", $wn->querySense("run")), "\n";
  print "Senses: ", join(", ", $wn->querySense("run#v")), "\n";
  print "Forms: ", join(", ", $wn->validForms("lay down#v")), "\n";
  print "Noun count: ", scalar($wn->listAllWords("noun")), "\n";
  print "Antonyms: ", join(", ", $wn->queryWord("dark#n#1", "ants")), "\n";



DESCRIPTION

WordNet::QueryData provides a direct interface to the WordNet database files. It requires the WordNet package (http://www.cogsci.princeton.edu/~wn/) and gives the user direct access to the full WordNet semantic lexicon. All parts of speech are supported, and access is generally very efficient because the index and morphological exclusion tables are loaded at initialization. The module can optionally be used to load the indexes into memory for extra-fast lookups.

USAGE

    LOCATING THE WORDNET DATABASE

To use QueryData, you must tell it where your WordNet database is. There are two ways you can do this: 1) by setting the appropriate environment variables, or 2) by passing the location to QueryData when you invoke the new function.

QueryData knows about two environment variables, WNHOME and WNSEARCHDIR. If WNSEARCHDIR is set, QueryData looks for WordNet data files there. Otherwise, QueryData looks for WordNet data files in WNHOME/dict (WNHOME\dict on a PC). If WNHOME is not set, it defaults to /usr/local/WordNet-3.0 on Unix and C:\Program Files\WordNet\3.0 on a PC. Normally, all you have to do is to set the WNHOME variable to the location where you unpacked your WordNet distribution. The database files are normally unpacked to the dict subdirectory.
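The lookup order described above can be sketched as a small standalone function. This only illustrates the documented order and is not the module's own code; the name resolve_wordnet_dir and its two arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WordNet::QueryData.
# $env is a hash reference of environment variables (normally \%ENV);
# $is_windows selects the PC defaults and path separator.
sub resolve_wordnet_dir {
    my ($env, $is_windows) = @_;

    # 1. WNSEARCHDIR, when set, is used as the data directory itself.
    return $env->{WNSEARCHDIR} if defined $env->{WNSEARCHDIR};

    # 2. Otherwise start from WNHOME, falling back to the documented default.
    my $home = defined $env->{WNHOME} ? $env->{WNHOME}
             : $is_windows            ? 'C:\Program Files\WordNet\3.0'
             :                          '/usr/local/WordNet-3.0';

    # 3. The data files live in the dict subdirectory of WNHOME.
    return $is_windows ? "$home\\dict" : "$home/dict";
}

print resolve_wordnet_dir(\%ENV, $^O eq 'MSWin32'), "\n";
```

Printing the resolved directory like this can be a quick way to check which location a given environment would lead to before constructing the object.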

You can also pass the location of the database files directly to QueryData. To do this, pass the location to new:



  my $wn = WordNet::QueryData->new("/usr/local/wordnet/dict");



You can instead call the constructor with a hash of params, as in:



  my $wn = WordNet::QueryData->new(
      dir => "/usr/local/wordnet/dict",
      verbose => 0,
      noload => 1
  );



When calling new in this fashion, two additional arguments are supported: verbose prints debugging information, and noload causes the object *not* to load the indexes at startup.

    CACHING VERSUS NOLOAD

The noload option results in data being retrieved using a dictionary lookup rather than caching the indexes in RAM. This method yields an immediate startup time but a *slightly* (though less than you might think) longer lookup time. For the curious, here are some profile data for each method on a dual-core Intel Mac, in seconds, averaged over 10000 iterations:

Caching versus noload times in seconds



                                          noload => 1  noload => 0
------------------------------------------------------------------
new()                                     0.00001      2.55
queryWord("descending")                   0.0009       0.0001
querySense("sunset#n#1", "hype")          0.0007       0.0001
validForms ("lay down#2")                 0.0004       0.0001



Obviously the new() comparison is not very useful, because nothing is happening with the constructor in the case of noload => 1. Similarly, lookups with caching are basically just hash lookups, and therefore very fast. The lookup times for noload => 1 illustrate the tradeoff between caching at new() time and using dictionary lookups.

Because of the lookup speed increase when noload => 0, many users will find it useful to set noload to 1 during development cycles, and to 0 when RAM is less of a concern than speed. The bottom line is that noload => 1 saves you over 2 seconds of startup time, and costs you about 0.0005 seconds per lookup.
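To make the tradeoff concrete, the break-even point can be worked out from the table: divide the extra startup time by the per-lookup saving. The sketch below does only that arithmetic, using the figures quoted above; the function name break_even_lookups is hypothetical, and the result applies to the machine that was profiled, not to yours.

```perl
use strict;
use warnings;

# Number of lookups after which caching has repaid its startup cost.
sub break_even_lookups {
    my ($startup_seconds, $noload_lookup, $cached_lookup) = @_;
    return $startup_seconds / ($noload_lookup - $cached_lookup);
}

# Figures copied from the timing table above (seconds).
my $startup = 2.55 - 0.00001;    # extra new() time when noload => 0
my %noload  = (
    queryWord  => 0.0009,
    querySense => 0.0007,
    validForms => 0.0004,
);
my $cached = 0.0001;             # every cached lookup in the table

for my $name (sort keys %noload) {
    printf "%-10s pays off after about %.0f lookups\n",
        $name, break_even_lookups($startup, $noload{$name}, $cached);
}
```

With these numbers the crossover falls in the range of a few thousand lookups, so a short script that makes a handful of queries is better served by noload => 1, while a long-running batch job is better served by caching.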

    QUERYING THE DATABASE

There are two primary query functions, querySense and queryWord. querySense accesses semantic (sense to sense) relations; queryWord accesses lexical (word to word) relations. The majority of relations are semantic. Some relations, including also see, antonym, pertainym, participle of verb, and derived forms, are lexical. See the following WordNet documentation for additional information:



  http://wordnet.princeton.edu/man/wninput.5WN#sect3



Both functions take as their first argument a query string of one of three types:



  (1) word (e.g. "dog")
  (2) word#pos (e.g. "house#n")
  (3) word#pos#sense (e.g. "ghostly#a#1")
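
The three types differ only in how many '#'-separated fields they carry, so a query string can be classified by splitting on '#'. The helper below is a hypothetical illustration (query_type is not part of WordNet::QueryData) and does not check that the word, part of speech, or sense number actually exists in WordNet.

```perl
use strict;
use warnings;

# Hypothetical helper: split a query string into its fields and report
# which of the three types it is.
sub query_type {
    my ($query) = @_;
    my ($word, $pos, $sense) = split /#/, $query, 3;

    return { type => 3, word => $word, pos => $pos, sense => $sense }
        if defined $sense;
    return { type => 2, word => $word, pos => $pos }
        if defined $pos;
    return { type => 1, word => $word };
}

for my $query ("dog", "house#n", "ghostly#a#1") {
    my $parsed = query_type($query);
    print "$query is type ($parsed->{type})\n";
}
```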



Types (1) or (2) passed to querySense or queryWord will return a list of possible query strings at the next level of specificity. When type (3) is passed to querySense or queryWord, it requires a second argument, a relation. Relations generally only work with one function or the other, though some relations can be either semantic or lexical; hence they may work for both functions. Below is a list of known relations, grouped according to the function they’re most likely to work with:



  queryWord
  ---------
  also - also see
  ants - antonyms
  deri - derived forms (nouns and verbs only)
  part - participle of verb (adjectives only)
  pert - pertainym (pertains to noun) (adjectives only)
  vgrp - verb group (verbs only)

  querySense
  ----------
  also - also see
  glos - word definition
  syns - synset words
  hype - hypernyms
  inst - instance of
  hypes - hypernyms and "instance of"
  hypo - hyponyms
  hasi - has instance
  hypos - hyponyms and "has instance"
  mmem - member meronyms
  msub - substance meronyms
  mprt - part meronyms
  mero - all meronyms
  hmem - member holonyms
  hsub - substance holonyms
  hprt - part holonyms
  holo - all holonyms
  attr - attributes (?)
  sim  - similar to (adjectives only)
  enta - entailment (verbs only)
  caus - cause (verbs only)
  domn - domain - all
  dmnc - domain - category
  dmnu - domain - usage
  dmnr - domain - region
  domt - member of domain - all (nouns only)
  dmtc - member of domain - category (nouns only)
  dmtu - member of domain - usage (nouns only)
  dmtr - member of domain - region (nouns only)



When called in this manner, querySense and queryWord will return a list of related words/senses. Note that as of WordNet 2.1, many hypernyms have become instance of and many hyponyms have become has instance.

Note that querySense and queryWord use type (3) query strings in different ways. A type (3) string passed to querySense specifies a synset. A type (3) string passed to queryWord specifies a specific sense of a specific word.
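As a sketch of how the levels of specificity chain together, the function below walks a bare word down to its type (3) strings and then asks for each gloss. The helper name all_senses is hypothetical; it relies only on the behaviour described above (a type (1) or (2) string returns strings at the next level of specificity). The usage part is guarded so that it runs only when the module and a WordNet database are actually available.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a type (1) string into every type (3) string.
# $wn is any object with a querySense method, normally a WordNet::QueryData.
sub all_senses {
    my ($wn, $word) = @_;
    my @senses;
    for my $word_pos ($wn->querySense($word)) {      # e.g. "run#n", "run#v"
        push @senses, $wn->querySense($word_pos);    # e.g. "run#v#1", ...
    }
    return @senses;
}

# Runs only if WordNet::QueryData loads and can find its database.
if (eval { require WordNet::QueryData; 1 }) {
    my $wn = eval { WordNet::QueryData->new(noload => 1) };
    if ($wn) {
        for my $sense (all_senses($wn, "run")) {
            my ($gloss) = $wn->querySense($sense, "glos");
            print "$sense: ", $gloss // '', "\n";
        }
    }
}
```

The same loop works with queryWord, but the type (3) strings then name a specific sense of a specific word rather than a synset, so the relation passed as the second argument should be a lexical one such as ants.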

    OTHER FUNCTIONS

validForms accepts a type (1) or (2) query string. It returns a list of all alternate forms (alternate spellings, conjugations, plural/singular forms, etc.). The type (1) query returns alternates for all parts of speech (noun, verb, adjective, adverb). WARNING: Only the first element returned by validForms is certain to be valid (i.e. recognized by WordNet). Remaining elements may not be valid.
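
Given that warning, one cautious pattern is to keep only the first item of the returned list. The wrapper below is a hypothetical sketch (safe_form is not part of the module), and the usage part runs only when the module and a WordNet database are available.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the form that is certain to be valid,
# or undef when validForms returns nothing.
sub safe_form {
    my ($wn, $query) = @_;
    my @forms = $wn->validForms($query);
    return $forms[0];
}

if (eval { require WordNet::QueryData; 1 }) {
    my $wn = eval { WordNet::QueryData->new(noload => 1) };
    if ($wn) {
        my $form = safe_form($wn, "lay down#v");
        print "Form: ", $form // '(none)', "\n";
    }
}
```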

listAllWords accepts a part of speech and returns the full list of words in the WordNet database for that part of speech.

level accepts a type (3) query string and returns a distance (not necessarily the shortest or longest) to the root in the hypernym directed acyclic graph.

offset accepts a type (3) query string and returns the binary offset of that sense’s location in the corresponding data file.

tagSenseCnt accepts a type (2) query string and returns the tagsense_cnt value for that lemma: the number of senses of the lemma that are ranked according to their frequency of occurrence in semantic concordance texts.

lexname accepts a type (3) query string and returns the lexname of the sense; see WordNet lexnames man page for more information.

frequency accepts a type (3) query string and returns the frequency count of the sense from tagged text; see WordNet cntlist man page for more information.
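
The per-sense accessors above all take a type (3) query string, so they can be gathered in one call. The sketch below assumes only the method names listed in this section; describe_sense itself is a hypothetical helper, and the usage part runs only when the module and a WordNet database are available.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the per-sense values for one type (3) string.
sub describe_sense {
    my ($wn, $sense) = @_;
    return {
        offset    => scalar $wn->offset($sense),
        lexname   => scalar $wn->lexname($sense),
        frequency => scalar $wn->frequency($sense),
        level     => scalar $wn->level($sense),
    };
}

if (eval { require WordNet::QueryData; 1 }) {
    my $wn = eval { WordNet::QueryData->new(noload => 1) };
    if ($wn) {
        my $info = describe_sense($wn, "cat#n#1");
        for my $key (sort keys %$info) {
            print "$key: ", $info->{$key} // 'n/a', "\n";
        }
    }
}
```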

See test.pl for additional example usage.

NOTES

Requires access to WordNet database files (data.noun/noun.dat, index.noun/noun.idx, etc.)

COPYRIGHT

Copyright 2000-2005 Jason Rennie. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl(1)

http://wordnet.princeton.edu/

http://people.csail.mit.edu/~jrennie/WordNet/



perl v5.20.3 QUERYDATA (3) 2016-04-03
