Manual Reference Pages  -  URI::SEQUIN (3)


URI::Sequin - Extract information from the URLs of Search-Engines



        use URI::Sequin qw/se_extract key_extract log_extract %log_types/;

        $url = &log_extract($line_from_log_file, 'NCSA');

        $log_types{'MyLogType'} = '^(.+?) -> .+$';
        $url = &log_extract($line_from_log_file, 'MyLogType');

        $keyword_string = &key_extract($url);

        ($search_engine_name, $search_engine_url) = @{&se_extract($url)};


This module provides three tools to aid people trying to analyse Search-Engine URLs. It’s meant mainly for those who want to analyse referrer logs and pick out key information about site visitors, such as which Search-Engine and keywords they used to find the site.

The functions and globals provided (and exported by default) from this module are:
log_extract($log_line, ’Type’)
This picks out the referring URL from a line of a logfile. ’Type’ can be one of the built-in types or a user-created one. For more information, see %log_types below. It accepts the log line and the type name as scalars, and returns a scalar.

key_extract($url)
This tries to determine the keywords used in $url. It accepts a scalar and returns a scalar. Should nothing be found, it returns an undefined value.

se_extract($url)
This tries to determine the name of the Search-Engine used and its URL. It accepts a scalar, and returns a reference to an array containing first the Search-Engine’s name and second the Search-Engine’s URL. Should the URL appear not to be from a search query, it returns a reference to an empty array.

%log_types
There are five built-in logfile types already in this hash. They are:
o IIS1 - Microsoft IIS 3.0 and 2.0
o IIS2 - Microsoft IIS4.0 (W3SVC format)
o NCSA - For APACHE, NETSCAPE and any other NCSA format logs
o ORW - O’Reilly WebSite format
o General - A generalised one that will work with most logfiles

It’s easy to add another one. Simply add a key to the hash, with a value that is a regex. Parenthesise the part that matches the referring URL, as the module uses $1 to obtain the URL (see the example in the Synopsis section).
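As a minimal sketch of how a user-defined log type behaves, the following applies a pattern like the one in the synopsis directly with Perl’s match operator. It does not call URI::Sequin itself: the %my_log_types hash, the extract_with_pattern helper and the sample log line are all hypothetical, and serve only to show that the first parenthesised group ($1) is what ends up as the extracted URL.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %log_types: type name => regex source string.
my %my_log_types = (
    MyLogType => '^(.+?) -> .+$',
);

# Hypothetical helper that mimics the documented contract: the first
# capture group of the chosen pattern is treated as the referring URL.
sub extract_with_pattern {
    my ($line, $type) = @_;
    my $pattern = $my_log_types{$type};
    return undef unless defined $pattern;      # unknown log type
    return $line =~ /$pattern/ ? $1 : undef;   # $1 is the parenthesised part
}

# Invented log line in a "referrer -> requested page" layout.
my $line = 'http://www.example.com/search?q=perl+modules -> /index.html';
my $url  = extract_with_pattern($line, 'MyLogType');
print defined $url ? "$url\n" : "no match\n";
```

Trying a new pattern against a few real lines in this way, before adding it to %log_types, is a cheap check that the parentheses surround the referrer and nothing else.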

I have only one request for people who use this module. *Please* tell me where and how you’ve used it, and if you have any thoughts or suggestions on it, tell me!


Doesn’t like the Amnesi Search Engine. But then, neither do I. Also, the ’General’ log type needs to be used with discretion: be sure that none of the URLs contain a literal " character if you use it.
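One plausible reason for this caveat, offered as an assumption and not as a description of the module’s actual ’General’ pattern, is that a pattern which finds the referrer by its surrounding double quotes will stop at the first embedded quote. The sketch below uses a hypothetical quote-delimited pattern and invented log lines to show the truncation.

```perl
use strict;
use warnings;

# Hypothetical quote-delimited pattern; NOT taken from URI::Sequin.
my $quoted_url = qr/"(http[^"]*)"/;

# Returns the first double-quoted http URL on the line, or undef.
sub first_quoted_url {
    my ($line) = @_;
    return $line =~ $quoted_url ? $1 : undef;
}

# Invented log fragments: the second URL contains literal " characters.
my $clean  = 'GET /index.html "http://search.example/?q=perl" 200';
my $broken = 'GET /index.html "http://search.example/?q="perl"" 200';

print first_quoted_url($clean),  "\n";   # full URL is captured
print first_quoted_url($broken), "\n";   # capture stops at the embedded quote
```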


Peter Sergeant <>


Copyright 2001 Peter Sergeant.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.



perl v5.20.3 SEQUIN (3) 2003-09-01
