NAME

HTML::Parser::Simple::Reporter - A sub-class of HTML::Parser::Simple

Synopsis

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use HTML::Parser::Simple::Reporter;

    # -------------------------

    # Method 1:

    my($p) = HTML::Parser::Simple::Reporter -> new(input_file => 'data/s.1.html');
    my($s) = $p -> traverse_file;

    print "$_\n" for @$s;

    # Method 2 (an alternative to method 1, so $p and $s are reused):

    $p = HTML::Parser::Simple::Reporter -> new;
    $s = $p -> traverse_file(input_file => 'data/s.1.html');

    print "$_\n" for @$s;

See scripts/traverse.file.pl.

Description

"HTML::Parser::Simple::Reporter" is a pure Perl module.

It is a sub-class of HTML::Parser::Simple. Specifically, this module overrides the method "traverse($node)" in HTML::Parser::Simple, to demonstrate a different way of formatting the output.

It parses HTML V 4 files, and generates a tree of nodes, with 1 node per HTML tag.

The data associated with each node is documented in the "FAQ" in HTML::Parser::Simple.

See also HTML::Parser::Simple and HTML::Parser::Simple::Attributes.

Distributions

This module is available as a Unix-style distro (*.tgz).

See http://savage.net.au/Perl-modules.html for details.

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing.

Constructor and initialization

new(...) returns an object of type "HTML::Parser::Simple::Reporter".

This is the class constructor.

Usage: "HTML::Parser::Simple::Reporter -> new()".

This method takes a hashref of options.

Call new() as "new({option_1 => value_1, option_2 => value_2, ...})".

Available options (each one of which is also a method):

But since this class is a sub-class of HTML::Parser::Simple, it shares all the options to new() documented in that class: "Constructor and initialization" in HTML::Parser::Simple.

Methods

This module is a sub-class of HTML::Parser::Simple, and inherits all its methods.

Further, it overrides the "traverse($node)" in HTML::Parser::Simple method.

traverse($node, $output, $depth)

Returns $output as an arrayref of strings.
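Traverses the tree built by calling "parse($html)" in HTML::Parser::Simple.

Parameters: $node is the node to start from, $output is the arrayref that receives the formatted strings, and $depth is the depth of $node in the tree. In the call made by traverse_file(), these are $self -> root, an empty arrayref, and 0.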
Lastly, note that this method ignores the root of the tree, and hence ignores the DOCTYPE, which is stored as an attribute of the root.

traverse_file($input_file_name)

Returns an arrayref of formatted text generated from the nodes in the tree built by calling "parse($html)" in HTML::Parser::Simple.

Traverses the given file, or the file named in "new(input_file => $name)", or the file named in input_file($name).

Basically it does this (recalling that this class sub-classes HTML::Parser::Simple):

    # Read file and store contents in $html.

    $self -> parse($html);

    my($output) = [];

    $self -> traverse($self -> root, $output, 0);

    return $output;

However, since this class has overridden the "traverse($node)" in HTML::Parser::Simple method, the output is not written anywhere. Instead, it is stored in an arrayref, which is returned as the result of this method.

Note: The parameter passed in to traverse_file($input_file_name) takes precedence over the input_file parameter passed in to new(), and over the internal value set with input_file($in_file_name).

Lastly, the parameter passed in to traverse_file($input_file_name) is used to update the internal value set with the input_file parameter passed in to new(), or set with a call to input_file($in_file_name).

See the "Synopsis" for sample code.

See also scripts/traverse.file.pl.

FAQ

See "FAQ" in HTML::Parser::Simple.

Author

"HTML::Parser::Simple" was written by Ron Savage <ron@savage.net.au> in 2009.

Home page: <http://savage.net.au/index.html>.

Copyright

Australian copyright (c) 2009 Ron Savage.

All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html
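Example: the shape of traverse($node, $output, $depth)

The following is a minimal sketch of the pattern described under traverse($node, $output, $depth): a recursive walk that appends one string per node to an arrayref, skips the root, and returns the arrayref. It does not use this module. The tree of plain hashrefs (with the keys name and children) and the two-space indentation are assumptions made for illustration only; the real node data is documented in the "FAQ" in HTML::Parser::Simple, and the real output format may differ.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical stand-in for the parser's tree: each node is a plain hashref
# with a 'name' and a list of 'children'. This is NOT the module's node class.
my($root) =
{
    name     => 'root',
    children =>
    [
        {
            name     => 'html',
            children =>
            [
                {name => 'head', children => [{name => 'title', children => []}]},
                {name => 'body', children => [{name => 'p', children => []}]},
            ],
        },
    ],
};

# Append one indented line per node to $output, skipping the root (depth 0),
# then recurse into each child one level deeper. Returns $output.
sub traverse
{
    my($node, $output, $depth) = @_;

    if ($depth > 0)
    {
        my($indent) = '  ' x ($depth - 1);

        push @$output, $indent . $node->{name};
    }

    traverse($_, $output, $depth + 1) for @{$node->{children} };

    return $output;
}

my($s) = traverse($root, [], 0);

print "$_\n" for @$s;
```

As in the "Synopsis", the caller receives an arrayref and decides what to do with it; here each string is simply printed on its own line.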