GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages


Manual Reference Pages  -  WWW (3)

NAME

WWW - World Wide Web Package

CONTENTS

Synopsis
Description
Subroutines
See Also
Author

SYNOPSIS

extract_description( FILE )
extract_meta( FILE, NAME )
hyperlink( LIST )

DESCRIPTION

This package provides a utility functions for the World Wide Web to extract descriptions of or meta information from files, and hyperlink text.

SUBROUTINES

The following Perl subroutines are defined and available:
extract_description( FILE ) Extracts a description from an HTML or plain text file given by the FILE name; FILE should be an absolute path. The first $description::chars (default: 2048) characters are read. If the file ends in one of the extensions htm, html, or shtml, it is presumed to be an HTML file; if the file ends in txt, it is presumed to be a plain text file. Other extensions are not recognized and no description is returned for them.
For HTML files, first, if a <META NAME="description" CONTENT="..."> or a <META NAME="DC.description" CONTENT="..."> (Dublin Core) element is found, then the words specified as the value of the CONTENT attribute is returned as the description.
Otherwise, all HTML comments, text between <SCRIPT>, <STYLE>, and <TITLE> tags, and all other HTML tags are stripped. If <AREA ... ALT="..."> or <IMG ... ALT="..."> elements are found, then the words specified as the value of the ALT attributes are extracted.
Finally, for either HTML or plain text files, at most $description::words (default: 50) are returned.
extract_meta( FILE, NAME ) Extracts the value of the CONTENT attribute from a META element having the given NAME attribute from an HTML file given by the FILE name; FILE should be an absolute path. The file must end in one of the extensions htm, html, or shtml to be considered an HTML file. The first $description::chars (default: 2048) characters are read. The characters are cached between consecutive calls using the same filename.
hyperlink( LIST ) Adds hyperlinks to strings: that is strings that contain substrings that are valid URLs (according to RFC 1630) have the appropriate HTML tags ‘‘wrapped’’ around them so that they will be selectable when displayed in a browser. The ftp, gopher, http, https, mailto, news, telnet, and wais URLs are recognized. Example:


       Read all about it at
     http://www.usatoday.com/

becomes:

     Read all about it at
     <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>

SEE ALSO

perl(1)

Tim Berners-Lee. ‘‘Universal Resource Identifiers in WWW,’’ Request for Comments 1630, Network Working Group of the Internet Engineering Task Force, June 1994.

Tim Berners-Lee, Larry Masinter, and Mark McCahill. ‘‘Uniform Resource Locators (URL),’’ Request for Comments 1738, Network Working Group, 1994.

Dave Raggett, Arnaud Le Hors, and Ian Jacobs. ‘‘Notes on helping search engines index your Web site,’’ HTML 4.0 Specification, Appendix B: Performance, Implementation, and Design Notes, World Wide Web Consortium, April 1998.

--. ‘‘Objects, Images, and Applets: How to specify alternate text,’’ HTML 4.0 Specification, §13.8, World Wide Web Consortium, April 1998.

Dublin Core Directorate. ‘‘The Dublin Core: A Simple Content Description Model for Electronic Resources.’’

Larry Wall, et al. Programming Perl, 3rd ed., O’Reilly & Associates, Inc., Sebastopol, CA, 2000.

AUTHOR

Paul J. Lucas <pauljlucas@mac.com>
Search for    or go to Top of page |  Section 3 |  Main Index


WWW F3WWWF1 (3) February 12, 2000

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with manServer 1.07.