NAME

HTML::RobotsMETA - Parse HTML For Robots Exclusion META Markup

SYNOPSIS

    use HTML::RobotsMETA;

    my $p = HTML::RobotsMETA->new;
    my $r = $p->parse_rules($html);
    if ($r->can_follow) {
        # follow links here!
    } else {
        # can't follow...
    }

DESCRIPTION

HTML::RobotsMETA is a simple HTML::Parser subclass that extracts robots exclusion information from meta tags. There's not much more to it ;)

DIRECTIVES

Currently HTML::RobotsMETA understands the following directives:

METHODS

new

Creates a new HTML::RobotsMETA parser. Takes no arguments.

parse_rules

Parses an HTML string for META tags and returns an HTML::RobotsMETA::Rules object, which you can then use in conditionals.

parser

Returns the HTML::Parser instance to use.

get_parser_callbacks

Returns the callback specs to be used in the HTML::Parser constructor.

TODO

Tags that specify the crawler name (e.g. <META NAME="Googlebot">) are not handled yet. There also might be more obscure directives that I'm not aware of.

AUTHOR

Copyright (c) 2007 Daisuke Maki <daisuke@endeworks.jp>

SEE ALSO

HTML::RobotsMETA::Rules

HTML::Parser

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
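
To make the idea of turning robots META directives into yes/no answers more concrete, here is a minimal, self-contained sketch in core Perl. It is not the module's implementation: `interpret_robots_content` is a hypothetical helper, and the directive names it handles (`all`, `none`, `index`, `noindex`, `follow`, `nofollow`) are the commonly seen robots META values, assumed here for illustration rather than taken from the module's own list.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::RobotsMETA: interpret the content
# attribute of a <meta name="robots" content="..."> tag.
sub interpret_robots_content {
    my ($content) = @_;

    # Assumption: everything is allowed unless a directive says otherwise.
    my %rules = (index => 1, follow => 1);

    # Directives are comma-separated and treated case-insensitively here.
    for my $directive (split /\s*,\s*/, lc $content) {
        $directive =~ s/^\s+|\s+$//g;    # trim stray whitespace

        if    ($directive eq 'none')     { @rules{qw(index follow)} = (0, 0) }
        elsif ($directive eq 'all')      { @rules{qw(index follow)} = (1, 1) }
        elsif ($directive eq 'noindex')  { $rules{index}  = 0 }
        elsif ($directive eq 'index')    { $rules{index}  = 1 }
        elsif ($directive eq 'nofollow') { $rules{follow} = 0 }
        elsif ($directive eq 'follow')   { $rules{follow} = 1 }
        # Unknown directives are silently ignored in this sketch.
    }

    return \%rules;
}

my $rules = interpret_robots_content('NOINDEX, nofollow');
print "follow: $rules->{follow}, index: $rules->{index}\n";
# prints "follow: 0, index: 0"
```

In real code you would normally call `parse_rules` and query the returned HTML::RobotsMETA::Rules object, as in the synopsis, rather than interpreting the attribute by hand.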