Manual Reference Pages - WWW::ROBOTRULES::PARSER (3)
WWW::RobotRules::Parser - Just Parse robots.txt
    use WWW::RobotRules::Parser;
    my $p = WWW::RobotRules::Parser->new;
WWW::RobotRules::Parser allows you to simply parse robots.txt files as
described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules
(which is very cool), this module does not take your user agent name into
consideration when parsing. It just parses the structure of the file and
returns a hash containing the whole set of rules. You can then use this data
to do whatever you like with it.
I mainly wrote this to store away the parsed data structure elsewhere for
later use, without having to specify a user agent.
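
A minimal sketch of typical use, assuming the parse method described under
the method entries below returns a hash reference keyed by user agent name;
the sample robots.txt rules and the URI are made up for illustration:

    use strict;
    use warnings;
    use WWW::RobotRules::Parser;

    # Sample robots.txt content (invented for this example).
    my $content = join "\n",
        'User-agent: *',
        'Disallow: /private',
        'Disallow: /also_private',
        '',
        'User-agent: Another UserAgent',
        'Disallow: /dont_look',
        '';

    my $p     = WWW::RobotRules::Parser->new;
    my $rules = $p->parse('http://www.example.com/robots.txt', $content);

    # Walk the whole rule set: each key is a user agent name, each value
    # an arrayref of disallowed path prefixes.
    for my $agent (sort keys %$rules) {
        print "$agent: @{ $rules->{$agent} }\n";
    }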
new
    Creates a new instance of WWW::RobotRules::Parser.
parse($robots_txt_uri, $content)
    Given the URI of the robots.txt file and its contents, parses the
    content and returns a data structure that looks like the following:
        {
           '*' => [ '/private', '/also_private' ],
           'Another UserAgent' => [ '/dont_look' ]
        }
    Where the key is the user agent name, and the value is an arrayref of
    all paths that are prohibited for that user agent.
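
    One way to consume this structure is sketched below: checking a path
    against both the agent-specific rules and the wildcard '*' rules. The
    merging of the two lists and the simple prefix matching are this
    example's own simplifications, not behaviour provided by the module.

        # $rules is the hashref returned by parse() above; the helper is
        # illustrative only and matches by path prefix.
        sub is_disallowed {
            my ($rules, $agent, $path) = @_;
            my @deny = ( @{ $rules->{$agent} || [] }, @{ $rules->{'*'} || [] } );
            return scalar grep { index($path, $_) == 0 } @deny;
        }

        print "blocked\n"
            if is_disallowed($rules, 'Another UserAgent', '/dont_look/secret');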
parse_uri($robots_txt_uri)
    Given the URI of the robots.txt file, retrieves and parses the file.
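
    A short sketch of fetching and parsing a live file; the URL is only a
    placeholder, and the return value is assumed to be the same structure
    that parse produces.

        use WWW::RobotRules::Parser;

        my $p     = WWW::RobotRules::Parser->new;
        my $rules = $p->parse_uri('http://www.example.com/robots.txt');

        print "rules found for: $_\n" for sort keys %$rules;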
Copyright (c) 2006-2007 Daisuke Maki <email@example.com>
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.