AI::Categorizer::Collection(3) User Contributed Perl Documentation AI::Categorizer::Collection(3)

AI::Categorizer::Collection - Access stored documents

  use AI::Categorizer::Collection::Files;

  my $c = AI::Categorizer::Collection::Files->new
    (path => '/tmp/docs/training',
     category_file => '/tmp/docs/cats.txt');
  print "Total number of docs: ", $c->count_documents, "\n";
  while (my $document = $c->next) {
    ...
  }
  $c->rewind; # For further operations

This abstract class implements an iterator for accessing documents in their natively stored format. Because the Collection class is abstract, you cannot create an instance of it directly. See the documentation for the "Files", "SingleFile", or "InMemory" subclasses for a concrete interface.

new()
Creates a new Collection object and returns it. Accepts the following parameters:
category_hash
Indicates a reference to a hash which maps document names to category names. The keys of the hash are the document names; each value should be a reference to an array containing the names of the categories to which that document belongs.
category_file
Indicates a file which should be read in order to create the "category_hash". Each line of the file should list a document's name, followed by a list of category names, all separated by whitespace.
stopword_file
Specifies a file containing a list of "stopwords", which are words that should automatically be disregarded when scanning/reading documents. The file should contain one word per line. The file will be parsed and then fed as the "stopwords" parameter to the Document "new()" method.
verbose
If true, some status/debugging information will be printed to "STDOUT" during operation.
document_class
The class indicating what type of Document object should be created. This generally specifies the format that the documents are stored in. The default is "AI::Categorizer::Document::Text".
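The shapes of "category_hash", "category_file", and "stopword_file" described above can be illustrated with a small, self-contained sketch. The helpers "parse_category_lines" and "parse_stopword_lines" are hypothetical and are not part of AI::Categorizer; the module's own parsing may differ in details such as blank-line handling, and storing the stopwords as a hash is an assumption made here for easy lookup. The sketch works on in-memory lines so that it runs without any files on disk.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Categorizer): build a structure
# shaped like "category_hash" from lines in the "category_file" format,
# i.e. a document name followed by category names, whitespace-separated.
sub parse_category_lines {
    my @lines = @_;
    my %category_hash;
    for my $line (@lines) {
        # split ' ' ignores leading whitespace and splits on runs of whitespace
        my ($doc, @cats) = split ' ', $line;
        next unless defined $doc;          # assumption: blank lines are skipped
        $category_hash{$doc} = [@cats];    # value is an array reference
    }
    return \%category_hash;
}

# Hypothetical helper: turn one-word-per-line input into a lookup hash.
sub parse_stopword_lines {
    my @lines = @_;
    my %stopwords;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
        next unless length $line;          # assumption: blank lines are skipped
        $stopwords{$line} = 1;
    }
    return \%stopwords;
}

my $category_hash = parse_category_lines(
    "doc1.txt sports football\n",
    "doc2.txt politics\n",
);
my $stopwords = parse_stopword_lines("the\n", "and\n", "\n");

print join(', ', @{ $category_hash->{'doc1.txt'} }), "\n";
print join(', ', sort keys %$stopwords), "\n";
```

A hash built this way has the document-name-to-array-reference layout that the "category_hash" parameter describes, so it could in principle be passed to a concrete subclass's "new()" in place of a "category_file".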
next()
Returns the next Document object in the Collection.
rewind()
Resets the iterator for further calls to "next()".
count_documents()
Returns the total number of documents in the Collection. Note that this usually resets the iterator, because after counting it may not be possible to resume iterating from the previous position.
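The interaction between "next()", "rewind()", and "count_documents()" can be sketched with a toy stand-in class. "ToyCollection" is hypothetical and is not part of AI::Categorizer; it only mimics the contract described above, under the assumptions that "next()" returns undef when the documents are exhausted and that counting is done by walking the whole collection.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not part of AI::Categorizer) that mimics the
# next()/rewind()/count_documents() contract with an in-memory list.
package ToyCollection;

sub new {
    my ($class, @docs) = @_;
    return bless { docs => [@docs], pos => 0 }, $class;
}

# Return the next document, or undef once the list is exhausted
# (an assumption made for this sketch).
sub next {
    my $self = shift;
    return undef if $self->{pos} >= @{ $self->{docs} };
    return $self->{docs}[ $self->{pos}++ ];
}

# Reset the iterator to the first document.
sub rewind {
    my $self = shift;
    $self->{pos} = 0;
    return;
}

# Count by walking the whole collection; the previous position is lost,
# so the iterator is left at the start afterwards.
sub count_documents {
    my $self = shift;
    $self->rewind;
    my $n = 0;
    $n++ while defined $self->next;
    $self->rewind;
    return $n;
}

package main;

my $c     = ToyCollection->new(qw(a b c));
my $first = $c->next;               # consumes the first document
my $total = $c->count_documents;    # counting resets the iterator
my $again = $c->next;               # the first document is returned again
print "$first $total $again\n";
```

Because counting may reset the iterator, a cautious pattern is to call "count_documents()" before starting a "next()" loop rather than in the middle of one.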

Ken Williams, ken@mathforum.org

Copyright 2002-2003 Ken Williams. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AI::Categorizer(3), Storable(3)
2022-04-08 perl v5.32.1
