Manual Reference Pages  -  XML::RSSLITE (3)


NAME

XML::RSSLite - lightweight, "relaxed" RSS (and XML-ish) parser


SYNOPSIS



  use XML::RSSLite;

  . . .

  parseRSS(\%result, \$content);

  print "=== Channel ===\n",
        "Title: $result{title}\n",
        "Desc:  $result{description}\n",
        "Link:  $result{link}\n\n";

  foreach my $item (@{$result{item}}) {
    print "  --- Item ---\n",
          "  Title: $item->{title}\n",
          "  Desc:  $item->{description}\n",
          "  Link:  $item->{link}\n\n";
  }



DESCRIPTION

This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a simple hash structure, and aliases certain tags so that when done, you can count on having the minimal data necessary for re-constructing a valid RSS file. This means you get the basic title, description, and link for a channel and its items.

This module extracts more usable links by parsing scriptingNews and weblog formats in addition to RDF & RSS. It also sanitizes the output for best results. The munging includes:
Remove html tags to leave plain text
Remove characters other than 0-9~!@#$%^&*()-+=a-zA-Z[];',.:
Remove leading whitespace from URIs
Use <url> tags when <link> is empty
Use misplaced urls in <title> when <link> is empty
Extract links from <a href=...> if required
Limit links to ftp and http(s)
Join relative item urls (beginning with / or #) to the site base
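
A few of the link heuristics listed above can be sketched in plain Perl. This is only an illustration under stated assumptions: pick_link and strip_tags are hypothetical helpers written for this example, not functions of XML::RSSLite, and the module's own regular expressions may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of XML::RSSLite): remove HTML tags,
# leaving plain text.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical helper (NOT part of XML::RSSLite): choose a usable link
# for an item hash, given the site base URL.
sub pick_link {
    my ($item, $base) = @_;
    my $link = $item->{link} // '';

    # Use <url> when <link> is empty.
    $link = $item->{url} // '' if $link eq '';

    # Use a URL misplaced in <title> when <link> is still empty.
    if ($link eq '' && ($item->{title} // '') =~ m{((?:https?|ftp)://\S+)}i) {
        $link = $1;
    }

    # Extract a link from <a href=...> in the description if required.
    if ($link eq '' && ($item->{description} // '') =~ m{<a\s+href=["']?([^"'\s>]+)}i) {
        $link = $1;
    }

    # Remove leading whitespace from the URI.
    $link =~ s/^\s+//;

    # Join relative links (beginning with / or #) to the site base.
    if ($link =~ m{^/}) {
        (my $root = $base) =~ s{/+$}{};
        $link = $root . $link;
    }
    elsif ($link =~ /^#/) {
        $link = $base . $link;
    }

    # Limit links to ftp and http(s); anything else is dropped.
    return $link =~ m{^(?:https?|ftp)://}i ? $link : '';
}

my $base = 'http://www.example.com/';   # assumed site base for the demo
print pick_link({ link => '/news/1' }, $base), "\n";
print pick_link({ link => '', url => '  http://www.example.com/a' }, $base), "\n";
print strip_tags('<b>Hello</b> <i>world</i>'), "\n";
```

The order of the fallbacks mirrors the list above; a real feed may need them applied in a different order or combined with character filtering.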

    EXPORT

parseRSS($outHashRef, $inScalarRef)
$inScalarRef is a reference to a scalar containing the document to be parsed; its contents are effectively destroyed during parsing. $outHashRef is a reference to the hash in which the parsed content is stored.
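
Because the input scalar is effectively destroyed, a caller who still needs the raw document may want to parse a copy. The sketch below assumes a small made-up feed and loads the module at run time so that it degrades gracefully where XML::RSSLite is not installed; the feed content and variable names are illustrative only.

```perl
use strict;
use warnings;

# A made-up feed, used only for illustration.
my $content = <<'RSS';
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Feed</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First item</title>
      <link>http://www.example.com/first</link>
      <description>Item text</description>
    </item>
  </channel>
</rss>
RSS

my $original = $content;   # pristine copy, kept for comparison
my %result;

if (eval { require XML::RSSLite; 1 }) {
    # Hand parseRSS a scratch copy; the copy is consumed, $content is not.
    my $scratch = $content;
    XML::RSSLite::parseRSS(\%result, \$scratch);

    print "Channel: ", ($result{title} // '(none)'), "\n";
    foreach my $item (@{ $result{item} || [] }) {
        print "  Item: ", ($item->{title} // '(none)'),
              " <", ($item->{link} // ''), ">\n";
    }
}
else {
    print "XML::RSSLite is not installed; nothing parsed\n";
}
```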

    EXPORTABLE

parseXML(\%parsedTree, \$parseThis, 'topTag', $comments);
parsedTree - required. Reference to the hash in which to store the parsed document.
parseThis - required. Reference to the scalar containing the document to parse.
topTag - optional. Tag to treat as the root node; leaving this undefined is not recommended.
comments - optional.
    false: comments are removed from parseThis
    true: comments are not removed from parseThis
    array reference: counts as true, and the comments are stored in the array
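
A minimal sketch of calling parseXML with an array reference for the comments argument follows. The sample document and the 'note' root tag are assumptions made for this example, and the shape of the resulting tree is not documented here, so the sketch simply dumps whatever comes back rather than relying on particular keys.

```perl
use strict;
use warnings;
use Data::Dumper;

# A made-up document with one comment, used only for illustration.
my $doc = <<'XML';
<!-- generated by a hypothetical tool -->
<note>
  <to>Reader</to>
  <body>Hello</body>
</note>
XML

my (%tree, @comments);

if (eval { require XML::RSSLite; 1 }) {
    # Work on a copy, since the parser may modify the scalar it is given.
    my $scratch = $doc;
    XML::RSSLite::parseXML(\%tree, \$scratch, 'note', \@comments);

    # Inspect the results without assuming their structure.
    local $Data::Dumper::Sortkeys = 1;
    print Dumper(\%tree, \@comments);
}
else {
    print "XML::RSSLite is not installed; nothing parsed\n";
}
```

Calling the function by its fully qualified name avoids depending on the import list; requesting parseXML through the module's export mechanism should work equally well.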

    CAVEATS

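This is not a conforming parser. It does not handle the following:

o A ">" inside an attribute value:

  <foo bar=">">

o Nested tags of the same name:

  <foo><bar> <bar></bar> <bar></bar> </bar></foo>

o CDATA sections:

  <![CDATA[ ]]>

o PI (processing instructions)
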
It is non-validating; without a DTD the following cannot be properly addressed:
o entities
o namespaces (support may or may not arrive in some future release)
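
To see why a quoted ">" and same-name nesting are hard for a regex-based parser, consider the sketch below. The patterns are deliberately naive and are assumptions for illustration; they are not the expressions XML::RSSLite itself uses.

```perl
use strict;
use warnings;

# Naive opening-tag pattern (illustrative only): the attribute text is
# taken to be everything up to the first ">".
my $open_tag = qr/<(\w+)([^>]*)>/;

my $quoted = '<foo bar=">">text</foo>';
my ($name, $attrs) = $quoted =~ $open_tag;
# The match stops at the ">" inside the quotes, truncating the attribute.
print "tag=$name attrs=[$attrs]\n";      # prints: tag=foo attrs=[ bar="]

# Naive element pattern: non-greedy match up to the first closing tag.
my $nested = '<foo><bar> <bar></bar> <bar></bar> </bar></foo>';
my ($inner) = $nested =~ m{<bar>(.*?)</bar>};
# The outer <bar> is paired with the first inner </bar>.
print "inner=[$inner]\n";                # prints: inner=[ <bar>]
```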

SEE ALSO

perl(1), XML::RSS, XML::SAX::PurePerl, XML::Parser::Lite, XML::Parser

AUTHOR

Jerrad Pierce <jpierce@cpan.org>.

Scott Thomason <scott@thomasons.org>

LICENSE

Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

POD ERRORS

Hey! The above document had some coding errors, which are explained below:
Around line 410: You forgot a ’=back’ before ’=head2’
Around line 446: =back without =over


perl v5.20.3 RSSLITE (3) 2009-09-06
