GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages


Manual Reference Pages  -  LWP::PARALLEL (3)

.ds Aq ’

NAME

LWP::Parallel - Extension for LWP to allow parallel HTTP and FTP access

CONTENTS

SYNOPSIS



  use LWP::Parallel;
  print "This is LWP::Parallel_$LWP::Parallel::VERSION\n";



DESCRIPTION

    Introduction

ParallelUserAgent is an extension to the existing libwww module. It allows you to take a list of URLs (it currently supports HTTP, FTP, and FILE URLs. HTTPS might work, too) and connect to all of them _in parallel_, then wait for the results to come in.

See the Parallel::UserAgent for how to create a LWP UserAgent that will access multiple Web resources in parallel. The Parallel::RobotUA module will additionally offer proper handling of robot.txt file, the de-facto exclusion protocol for Web Robots.

    Examples

The following examples might help to get you started:



  require LWP::Parallel::UserAgent;
  use HTTP::Request;

  # display tons of debugging messages. See perldoc LWP::Debug
  #use LWP::Debug qw(+);

  # shortcut for demo URLs
  my $url = "http://localhost/";

  my $reqs = [ 
     HTTP::Request->new(GET, $url),
     HTTP::Request->new(GET, $url."homes/marclang/"),
  ];

  my $pua = LWP::Parallel::UserAgent->new();
  $pua->in_order  (1);  # handle requests in order of registration
  $pua->duplicates(0);  # ignore duplicates
  $pua->timeout   (2);  # in seconds
  $pua->redirect  (1);  # follow redirects

  foreach my $req (@$reqs) {
    print "Registering ".$req->url."\n";
    if ( my $res = $pua->register ($req) ) {
        print STDERR $res->error_as_HTML;
    } 
  }
  my $entries = $pua->wait();

  foreach (keys %$entries) {
    my $res = $entries->{$_}->response;

    print "Answer for ",$res->request->url, " was \t", $res->code,": ",
          $res->message,"\n";
  }



Parallel::UserAgent (as well as the Parallel::RobotUA) offer three default methods that will be called at certain points during the connection: on_connect, on_return and on_failure.



  #
  # provide subclassed UserAgent to override on_connect, on_failure and
  # on_return methods
  #
  package myUA;

  use Exporter();
  use LWP::Parallel::UserAgent qw(:CALLBACK);
  @ISA = qw(LWP::Parallel::UserAgent Exporter);
  @EXPORT = @LWP::Parallel::UserAgent::EXPORT_OK;

  # redefine methods: on_connect gets called whenever were about to
  # make a a connection
  sub on_connect {
    my ($self, $request, $response, $entry) = @_;
    print "Connecting to ",$request->url,"\n";
  }

  # on_failure gets called whenever a connection fails right away
  # (either we timed out, or failed to connect to this address before,
  # or its a duplicate). Please note that non-connection based
  # errors, for example requests for non-existant pages, will NOT call
  # on_failure since the response from the server will be a well
  # formed HTTP response!
  sub on_failure {
    my ($self, $request, $response, $entry) = @_;
    print "Failed to connect to ",$request->url,"\n\t",
          $response->code, ", ", $response->message,"\n"
            if $response;
  }

  # on_return gets called whenever a connection (or its callback)
  # returns EOF (or any other terminating status code available for
  # callback functions). Please note that on_return gets called for
  # any successfully terminated HTTP connection! This does not imply
  # that the response sent from the server is a success!
  sub on_return {
    my ($self, $request, $response, $entry) = @_;
    if ($response->is_success) {
       print "\n\nWoa! Request to ",$request->url," returned code ", $response->code,
          ": ", $response->message, "\n";
       print $response->content;
    } else {
       print "\n\nBummer! Request to ",$request->url," returned code ", $response->code,
          ": ", $response->message, "\n";
       # print $response->error_as_HTML;
    }
    return;
  }

  package main;
  use HTTP::Request;

  # shortcut for demo URLs
  my $url = "http://localhost/";

  my $reqs = [ 
     HTTP::Request->new(GET, $url),
     HTTP::Request->new(GET, $url."homes/marclang/"),
  ];

  my $pua = myUA->new();

  foreach my $req (@$reqs) {
    print "Registering ".$req->url."\n";
    $pua->register ($req);
  }
  my $entries = $pua->wait(); # responses will be caught by on_return, etc



The final example will demonstrate a simple Web Robot that keeps a cache of the robots.txt permission files it has encountered so far. This example also uses callbacks to handle the response as it comes in.



  require LWP::Parallel::UserAgent;
  use HTTP::Request;

  # persistent robot rules support. See perldoc WWW::RobotRules::AnyDBM_File
  require WWW::RobotRules::AnyDBM_File;

  # shortcut for demo URLs
  my $url = "http://www.cs.washington.edu/";

  my $reqs = [ 
     HTTP::Request->new(GET, $url),
            # these are all redirects. depending on how you set
            # redirect_ok they either just return the status code for
            # redirect (like 302 moved), or continue to follow redirection.
     HTTP::Request->new(GET, $url."research/ahoy/"),
     HTTP::Request->new(GET, $url."research/ahoy/doc/paper.html"),
     HTTP::Request->new(GET, "http://metacrawler.cs.washington.edu:6060/"),
            # these are all non-existant server. the first one should take
            # some time, but the following ones should be rejected right
            # away
     HTTP::Request->new(GET, "http://www.foobar.foo/research/ahoy/"),
     HTTP::Request->new(GET, "http://www.foobar.foo/foobar/foo/"),
     HTTP::Request->new(GET, "http://www.foobar.foo/baz/buzz.html"),
            # although server exists, file doesnt
     HTTP::Request->new(GET, $url."foobar/bar/baz.html"),
            ];

  my ($req,$res);

  # establish persistant robot rules cache. See WWW::RobotRules for
  # non-permanent version. you should probably adjust the agentname
  # and cache filename.
  my $rules = new WWW::RobotRules::AnyDBM_File ParallelUA, cache;

  # create new UserAgent (actually, a Robot)
  my $pua = new LWP::Parallel::RobotUA ("ParallelUA",
                                        yourname@your.site.com, $rules);

  $pua->timeout   (2);  # in seconds
  $pua->delay    ( 5);  # in seconds
  $pua->max_req  ( 2);  # max parallel requests per server
  $pua->max_hosts(10);  # max parallel servers accessed

  # for our own print statements that follow below:
  local($\) = ""; # ensure standard $OUTPUT_RECORD_SEPARATOR

  # register requests
  foreach $req (@$reqs) {
    print "Registering ".$req->url."\n";
    $pua->register ($req , \&handle_answer);
    #  Each request, even if it failed to # register properly, will
    #  show up in the final list of # requests returned by $pua->wait,
    #  so you can examine it # later.
  }

  # $pua->wait returns a pointer to an associative array, containing
  # an $entry for each request made, sorted by its url. (as returned
  # by $request->url->as_string)
  my $entries = $pua->wait(); # give another timeout here, 25 seconds

  # lets see what we got back (see also callback function!!)
  foreach (keys %$entries) {
    $res = $entries->{$_}->response;

    # examine response to find cascaded requests (redirects, etc) and
    # set current response to point to the very first response of this
    # sequence. (not very exciting if you set $pua->redirect(0))
    my $r = $res; my @redirects;
    while ($r) {
        $res = $r;
        $r = $r->previous;
        push (@redirects, $res) if $r;
    }
   
    # summarize response. see "perldoc HTTP::Response"
    print "Answer for ",$res->request->url, " was \t", $res->code,": ",
          $res->message,"\n";
    # print redirection history, in case we got redirected
    foreach (@redirects) {
        print "\t",$_->request->url, "\t", $_->code,": ", $_->message,"\n";
    }
  }

  # our callback function gets called whenever some data comes in
  # (same parameter format as standard LWP::UserAgent callbacks!)
  sub handle_answer {
    my ($content, $response, $protocol, $entry) = @_;

    print "Handling answer from ",$response->request->url,": ",
          length($content), " bytes, Code ",
          $response->code, ", ", $response->message,"\n";

    if (length ($content) ) {
        # just store content if it comes in
        $response->add_content($content);
    } else {
        # Having no content doesnt mean the connection is closed!
        # Sometimes the server might return zero bytes, so unless
        # you already got the information you need, you should continue
        # processing here (see below)
       
        # Otherwise you can return a special exit code that will
        # determins how ParallelUA will continue with this connection.

        # Note: We have to import those constants via "qw(:CALLBACK)"!

        # return C_ENDCON;  # will end only this connection
                            # (silly, we already have EOF)
        # return C_LASTCON; # wait for remaining open connections,
                            # but dont issue any new ones!!
        # return C_ENDALL;  # will immediately end all connections
                            # and return from $pua->wait
    }

    # ATTENTION!! If you want to keep reading from your connection,
    # you should have a final return undef statement here. Even if
    # you think that all data has arrived, it does not hurt to return
    # undef here. The Parallel UserAgent will figure out by itself
    # when to close the connection!

    return undef;           # just keep on connecting/reading/waiting
                            # until the server closes the connection.
  }



AUTHOR

Marc Langheinrich, marclang@cpan.org

SEE ALSO

See LWP for an overview on Web communication using Perl. See LWP::Parallel::UserAgent and LWP::Parallel::RobotUA for details on how to use this library.

COPYRIGHT

Copyright 1997-2004 Marc Langheinrich <marclang@cpan.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Search for    or go to Top of page |  Section 3 |  Main Index


perl v5.20.3 LWP::PARALLEL (3) 2004-02-10

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with manServer 1.07.