Manual Reference Pages  -  WWW::GOOGLE::NEWS (3)


NAME

WWW::Google::News - Access to Google’s News Service (Not Usenet)


SYNOPSIS



        # OO search interface

        use WWW::Google::News;

        my $news = WWW::Google::News->new();
        $news->topic("Frank Zappa");
        my $results = $news->search();

        # original news functions

        use WWW::Google::News qw(get_news get_news_for_topic);

        my $sections = get_news();
        my $stories  = get_news_for_topic("impending asteroid impact");



DESCRIPTION

This module provides methods to scrape results from Google News, returning a data structure similar to the following (which happens to be suitable for feeding into XML::RSS).



  {
    'Top Stories' =>
              [
               {
                 url => 'http://www.washingtonpost.com/wp-dyn/articles/A9707-2002Nov19.html',
                 headline => 'Amendment to Homeland Security Bill Defeated'
               },
               {
                 url => 'http://www.ananova.com/news/story/sm_712444.html',
                 headline => 'US and UN at odds as Iraq promises to meet deadline'
               }
              ],
    'Entertainment' =>
             [
              {
                url => 'http://abcnews.go.com/sections/entertainment/DailyNews/Coburn021119.html',
                headline => 'James Coburn Dies'
              },
              {
                url => 'http://www.cbsnews.com/stories/2002/11/15/entertainment/main529532.shtml',
                headline => '007s On Parade At \'Die\' Premiere'
              }
             ]
   }
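
As an illustration of walking this structure, the following sketch flattens the section-keyed hash into a single list of stories, tagging each story with its section. The sample data is taken from the structure above; the helper name flatten_sections is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Sample data in the shape shown above: a hash of sections, each
# pointing to an array reference of story hashes.
my $news = {
    'Top Stories' => [
        { url      => 'http://www.ananova.com/news/story/sm_712444.html',
          headline => 'US and UN at odds as Iraq promises to meet deadline' },
    ],
    'Entertainment' => [
        { url      => 'http://abcnews.go.com/sections/entertainment/DailyNews/Coburn021119.html',
          headline => 'James Coburn Dies' },
    ],
};

# Hypothetical helper (not provided by WWW::Google::News).
# Returns a reference to one flat array; each story is copied and
# gains a "section" key naming the section it came from.
sub flatten_sections {
    my ($sections) = @_;
    my @stories;
    for my $section (sort keys %{$sections}) {
        for my $story (@{ $sections->{$section} }) {
            push @stories, { %{$story}, section => $section };
        }
    }
    return \@stories;
}

my $flat = flatten_sections($news);
print "$_->{section}: $_->{headline}\n" for @{$flat};
```

A flat list of this kind has the same shape as the array returned by the topic search functions, so the same loop can handle both.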



METHODS

search() Performs a search on Google News. Options are available for the search term (topic), sort order, date range, and maximum number of results. The scraper requests as many results per page as it can and pages through the results until it has enough stories. Internally uses get_news_for_topic().



        use WWW::Google::News;

        my $news = WWW::Google::News->new();

        # these methods will get or set their values
        $news->topic("Frank Zappa"); # search term
        $news->sort("date"); # relevance or date, relevance is default
        $news->start_date("2005-04-20"); # must provide start and end date,
        $news->end_date("2005-04-20");   # changes default sort to date
        $news->max(2); # max stories, default 20.  -1 => all stories.

        my $results = $news->search();
        foreach (@{$results}) {
          print "Source: " . $_->{source} . "\n";
          print "Date: " . $_->{date} . "\n";
          print "URL: " . $_->{url} . "\n";
          print "Summary: " . $_->{summary} . "\n";
          print "Headline: " . $_->{headline} . "\n";
          print "\n";
        }
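
The paging behaviour described for search() can be pictured with the sketch below. It is not the module's actual implementation: collect_stories and the $fetch_page callback are hypothetical names, and the stub fetcher stands in for a real page request. The only detail taken from the text above is that a maximum of -1 means all stories.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps requesting pages until $max stories are
# collected or a page comes back empty. A $max of -1 means no limit.
sub collect_stories {
    my ($fetch_page, $max) = @_;
    my @stories;
    my $page = 0;
    while ($max < 0 || @stories < $max) {
        my @batch = @{ $fetch_page->($page++) };
        last unless @batch;    # no more results available
        push @stories, @batch;
    }
    # The last page may overshoot the limit, so trim the surplus.
    splice @stories, $max if $max >= 0 && @stories > $max;
    return \@stories;
}

# Stub fetcher: three pages of two fake stories each, then nothing.
my $stub = sub {
    my ($page) = @_;
    return [] if $page >= 3;
    return [ map { +{ headline => "Story " . ($page * 2 + $_) } } 1 .. 2 ];
};

my $some = collect_stories($stub, 3);
print scalar(@{$some}), " stories collected\n";
```

Passing the fetcher as a callback keeps the loop independent of how a page is retrieved, which makes the stopping conditions easy to check without network access.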



get_news() Scrapes <http://news.google.com/news/gnmainlite.html> and returns a reference to a hash keyed on news section. Each value is a reference to an array of hashes with keys such as url and headline.



  use WWW::Google::News qw(get_news);

  my $news = get_news();
  foreach my $topic (keys %{$news}) {
    for (@{$news->{$topic}}) {
      print "Topic: $topic\n";
      print "Headline: " . $_->{headline} . "\n";
      print "URL: " . $_->{url} . "\n";
      print "Source: " . $_->{source} . "\n";
      print "When: " . $_->{date} . "\n";
      print "Summary: " . $_->{summary} . "\n";
      print "\n";
    }
  }



get_news_for_topic( $topic ) Queries <http://news.google.com/news> for results on a particular topic and returns a reference to an array of hashes containing result data, similar to the per-section arrays returned by get_news().

An RSS feed can be constructed from this very easily:



        use WWW::Google::News qw(get_news_for_topic);
        use XML::RSS;

        my $topic = "Frank Zappa"; # example search term
        my $news = get_news_for_topic( $topic );
        # also supports the same options as search()
        # $news = get_news_for_topic( $topic, $start_date, $end_date, $sort, $max );
        my $rss = XML::RSS->new;
        $rss->channel(title => "Google News -- $topic");
        for (@{$news}) {
                $rss->add_item(
                        title => $_->{headline},
                        link  => $_->{url},
                        description  => $_->{description}, # source + summary
                );
        }
        print $rss->as_string;



get_news_greg_style() Returns the same data as get_news(), but uses a hash keyed on story number in place of the array described above.
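
To picture the difference between the array form and the hash form, this sketch re-keys an array of stories into a hash keyed on story number. It is an illustration only: index_by_number is a hypothetical helper, and numbering from 1 is an assumption, since the text does not say where the module starts counting.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Google::News): turns an array
# reference of stories into a hash reference keyed on story number.
sub index_by_number {
    my ($stories) = @_;
    my %by_number;
    my $n = 1;    # assumed starting number
    $by_number{ $n++ } = $_ for @{$stories};
    return \%by_number;
}

my $stories = [
    { headline => 'James Coburn Dies' },
    { headline => 'Amendment to Homeland Security Bill Defeated' },
];

my $indexed = index_by_number($stories);

# Hash keys are unordered, so sort numerically to restore story order.
print "$_: $indexed->{$_}{headline}\n"
    for sort { $a <=> $b } keys %{$indexed};
```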

TODO

Return info on images contained in certain articles.

Parse out sub articles from featured stories.

Consolidate scraping functions.

AUTHORS

Greg McCarroll <greg@mccarroll.demon.co.uk>, Bowen Dwelle <bowen@dwelle.org>, Scott Holdren <scott@sitening.com>

KUDOS

Darren Chamberlain for rss_alternate.pl

Leon Brocard for pulling me up on my obsessive compulsion to use hashes.

SEE ALSO

<http://news.google.com/> <http://news.google.com/news/gnmainlite.html>

POD ERRORS

Hey! The above document had some coding errors, which are explained below:
Around line 413: You forgot a ’=back’ before ’=head1’


perl v5.20.3 NEWS (3) 2006-07-27
