Manual Reference Pages  -  XML::DRIVER::HTML (3)


NAME

XML::Driver::HTML - SAX driver for HTML that is not well-formed.

SYNOPSIS



  use XML::Driver::HTML;

  $driver = new XML::Driver::HTML(
        Handler => $some_sax_filter_or_handler,
        Source => $some_PerlSAX_like_hash
        );

  $driver->parse();



or



  use XML::Driver::HTML;

  $driver = new XML::Driver::HTML();

  $driver->parse(
        Handler => $some_sax_filter_or_handler,
        Source => $some_PerlSAX_like_hash
        );

  $driver->parse(
        Handler => $some_other_sax_filter_or_handler,
        Source => $some_other_source
        );



DESCRIPTION

XML::Driver::HTML is a SAX driver for HTML. The HTML input does not need to be well-formed, because XML::Driver::HTML generates its SAX events by walking an HTML::TreeBuilder object. The simplest use is as a filter from HTML to XHTML, with XML::Handler::YAWriter as the SAX handler.



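    use IO::File;
    use XML::Handler::YAWriter;
    use XML::Driver::HTML;

    # Writer that serializes the SAX events to standard output.
    my $ya = new XML::Handler::YAWriter(
        Output => new IO::File ( ">-" ),
        Pretty => {
            NoWhiteSpace=>1,
            NoComments=>1,
            AddHiddenNewline=>1,
            AddHiddenAttrTab=>1,
            }
        );

    # Driver that reads HTML from standard input and sends events to the writer.
    my $html = new XML::Driver::HTML(
        Handler => $ya,
        Source => { ByteStream => new IO::File ( "<-" ) }
        );

    $html->parse();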
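
The Handler does not have to be a writer. As a minimal sketch, assuming the usual PerlSAX convention that each callback receives a hash reference describing the event (with the element name under the Name key), the hypothetical ElementCounter class below counts opened elements. It is driven by hand here, standing in for the events a driver would send, so it runs without XML::Driver::HTML installed.

```perl
use strict;
use warnings;

# Hypothetical handler class, not part of XML::Driver::HTML.
package ElementCounter;

sub new {
    my ($class) = @_;
    return bless { counts => {} }, $class;
}

# PerlSAX-style callbacks; each one receives a hash reference.
sub start_document { }

sub start_element {
    my ($self, $element) = @_;
    $self->{counts}{ $element->{Name} }++;    # tally by element name
}

sub characters  { }
sub end_element { }

sub end_document {
    my ($self) = @_;
    return $self->{counts};                   # hand the tally back to the caller
}

package main;

# Simulate the event stream for <html><p>a</p><p>b</p></html>.
my $handler = ElementCounter->new;
$handler->start_document({});
$handler->start_element({ Name => 'html', Attributes => {} });
for my $text (qw(a b)) {
    $handler->start_element({ Name => 'p', Attributes => {} });
    $handler->characters({ Data => $text });
    $handler->end_element({ Name => 'p' });
}
$handler->end_element({ Name => 'html' });
my $counts = $handler->end_document({});

print "$_: $counts->{$_}\n" for sort keys %$counts;
```

An object like this would be passed as the Handler option in place of the YAWriter object.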



    METHODS

new
    Creates a new XML::Driver::HTML object. Default options for parsing, described below, are passed as key-value pairs or as a single hash. Options may be changed directly in the object.
parse
    Parses a document. Options, described below, are passed as key-value pairs or as a single hash. Options passed to parse() override the default options in the parser object for the duration of the parse.
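
The override rule can be illustrated with a small stand-in class. This is only a sketch of the semantics described above, not the module's actual code: Sketch::Driver is a hypothetical name, and its parse() merely returns the options that would be in effect instead of parsing anything.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not XML::Driver::HTML itself.
package Sketch::Driver;

sub new {
    my $class = shift;
    # Accept either key-value pairs or a single hash reference.
    my %defaults = @_ == 1 ? %{ $_[0] } : @_;
    return bless {%defaults}, $class;
}

sub parse {
    my $self = shift;
    my %args = @_ == 1 ? %{ $_[0] } : @_;
    # Per-call options win over the defaults, but only in this copy;
    # the object itself is left unchanged.
    my %effective = ( %$self, %args );
    return \%effective;
}

package main;

my $driver = Sketch::Driver->new(
    Handler => 'default handler',
    Source  => { String => '<p>a</p>' },
);

my $first  = $driver->parse( Handler => 'other handler' );
my $second = $driver->parse();

print "$first->{Handler}\n";     # override applies to this call only
print "$second->{Handler}\n";    # default is back in effect
```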

    OPTIONS

The following options are supported by XML::Driver::HTML:

Handler
    Default SAX handler to receive events.
Source
    Hash containing the input source for parsing. The ‘Source’ hash may contain the following parameters:

    ByteStream
        The raw byte stream (file handle) containing the document.
    String
        A string containing the document.
    SystemId
        The system identifier (URL) of the document.
    Encoding
        A string describing the character encoding.

If more than one of ‘ByteStream’, ‘String’, or ‘SystemId’ is given, preference goes first to ‘ByteStream’, then to ‘String’, then to ‘SystemId’.
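
The precedence rule can be sketched as a short lookup. The helper name preferred_source_key is hypothetical and is not a function of the module; it only illustrates the stated order, assuming a key counts as present when its value is defined.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the name of the Source key that takes
# precedence, or nothing if none of the three is present.
sub preferred_source_key {
    my ($source) = @_;
    for my $key (qw(ByteStream String SystemId)) {
        return $key if defined $source->{$key};
    }
    return;
}

my $chosen = preferred_source_key({
    String   => '<p>hi</p>',
    SystemId => 'http://example.com/',
});

print "$chosen\n";    # String wins over SystemId
```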

NOTES

XML::Driver::HTML requires Perl 5.6 to convert from ISO-8859-1 to UTF-8.
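
For readers unfamiliar with the conversion itself, the sketch below shows what going from ISO-8859-1 to UTF-8 means at the byte level. It uses the Encode module, which is an assumption on our part: Encode only became a core module with Perl 5.8, so this is not how XML::Driver::HTML performs the conversion.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "cafe" with e-acute: a single byte 0xE9 in ISO-8859-1.
my $latin1_bytes = "caf\xE9";

# Decode the bytes into Perl characters, then encode them as UTF-8.
my $text       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $text);

# Show the resulting bytes in hex: the e-acute becomes two bytes.
print join(' ', map { sprintf '%02X', ord } split //, $utf8_bytes), "\n";
# prints "63 61 66 C3 A9"
```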

BUGS

not yet implemented:



    Interpretation of SystemId as being a URI
    XHTML document type



other bugs:



    HTML::Parser and HTML::TreeBuilder bugs concerning DOCTYPE and CSS.
    Perl handling of UTF-8 is not compatible between different versions, so
    you need exactly Perl 5.6.0, not lower, not higher.



AUTHOR



  Michael Koehne, Kraehe@Copyleft.De
  (c) 2001 GNU General Public License



SEE ALSO

XML::Parser::PerlSAX and HTML::TreeBuilder


perl v5.20.3 HTML (3) 2002-08-08
