WWW::RobotRules::Parser(3) User Contributed Perl Documentation WWW::RobotRules::Parser(3)

WWW::RobotRules::Parser - Just Parse robots.txt

  use WWW::RobotRules::Parser;
  my $p = WWW::RobotRules::Parser->new;
  $p->parse($robots_txt_uri, $text);

  $p->parse_uri($robots_txt_uri);

WWW::RobotRules::Parser parses robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take your user agent name into account when parsing. It parses the structure and returns a hash containing the whole set of rules, which you can then use however you like.

I mainly wrote this to store the parsed data structure elsewhere for later use, without having to specify a user agent.
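As a minimal sketch of the "store it for later" use case, the snippet below serializes a rules hash and reads it back. The choice of JSON::PP (a core Perl module) is an assumption made for illustration; the module itself does not prescribe any storage format. The hash is written out by hand in the shape the parse method is documented to return, so the sketch runs without fetching anything.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-written stand-in for the hashref that parse() is documented to return.
my $rules = {
    '*'                 => [ '/private', '/also_private' ],
    'Another UserAgent' => [ '/dont_look' ],
};

# canonical() sorts hash keys so the stored text is stable between runs.
my $json = JSON::PP->new->canonical;

my $stored   = $json->encode($rules);    # string to save in a file or database
my $restored = $json->decode($stored);   # later: rebuild the same structure

print "$stored\n";
print scalar(@{ $restored->{'*'} }), " wildcard rules restored\n";
```

Because the structure is plain hashes and arrays, any serializer that handles nested Perl data should work equally well here.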

new() creates a new instance of WWW::RobotRules::Parser.

parse($uri, $text) takes the URI of the robots.txt file and its contents, parses the contents, and returns a data structure that looks like the following:

  {
     '*' => [ '/private', '/also_private' ],
     'Another UserAgent' => [ '/dont_look' ]
  }

Each key is a user agent name, and each value is an arrayref of all paths that are prohibited for that user agent.
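Since the module leaves rule matching to the caller, here is one hedged sketch of how the returned structure might be consulted. The helper is_disallowed is hypothetical and not part of WWW::RobotRules::Parser. It assumes exact user agent names and simple path-prefix matching, which is stricter than what a full robots.txt implementation would do.

```perl
use strict;
use warnings;

# Rules in the shape documented above: user agent name => arrayref of paths.
# In real use this hashref would come from $p->parse(...) or $p->parse_uri(...).
my $rules = {
    '*'                 => [ '/private', '/also_private' ],
    'Another UserAgent' => [ '/dont_look' ],
};

# Hypothetical helper (not provided by the module): returns 1 if $path starts
# with any path listed for $agent or for the wildcard agent '*', else 0.
sub is_disallowed {
    my ($rules, $agent, $path) = @_;
    for my $key ($agent, '*') {
        for my $prefix (@{ $rules->{$key} || [] }) {
            return 1 if index($path, $prefix) == 0;
        }
    }
    return 0;
}

print is_disallowed($rules, 'MyBot', '/private/data.html') ? "blocked\n" : "allowed\n";
print is_disallowed($rules, 'MyBot', '/public/index.html') ? "blocked\n" : "allowed\n";
```

The first call prints "blocked" because the path falls under the wildcard rule '/private'; the second prints "allowed" because no listed path is a prefix of it.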

parse_uri($uri) takes the URI of the robots.txt file, then retrieves and parses the file.

See also: WWW::RobotRules

Copyright (c) 2006-2007 Daisuke Maki <daisuke@endeworks.jp>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html

2007-12-01 perl v5.32.1
