Regexp::Log::Common(3) User Contributed Perl Documentation Regexp::Log::Common(3)

Regexp::Log::Common - A regular expression parser for the Common Log Format

    my $foo = Regexp::Log::Common->new(
        format  => '%date %request',
        capture => [qw( ts request )],
    );

    # the format() and capture() methods can be used to set or get
    $foo->format('%date %request %status %bytes');
    $foo->capture(qw( ts req ));

    # this is necessary to know in which order
    # we will receive the captured fields from the regexp
    my @fields = $foo->capture;

    # the all-powerful capturing regexp :-)
    my $re = $foo->regexp;

    while (<>) {
        my %data;
        @data{@fields} = /$re/;    # no need for /o, it's a compiled regexp

        # now munge the fields
        ...
    }

Regexp::Log::Common uses Regexp::Log as a base class, to generate regular expressions for performing the usual data munging tasks on log files that cannot be simply split().

This specific module enables the computation of regular expressions for parsing log files created using the Common Log Format. An example of this format is the log generated by the httpd web server using the keyword 'common'.

The module also allows for the use of the Extended Common Log Format.

For more information on how to use this module, please see Regexp::Log.
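To illustrate why such log lines cannot simply be split(), the following sketch compares a whitespace split with a single capturing regexp on the sample line used later in this page. The pattern here is a hand-written stand-in for illustration only; it is not the expression that Regexp::Log::Common generates, and it uses core Perl only.

```perl
use strict;
use warnings;

my $line = '127.0.0.1 - - [19/Jan/2005:21:47:11 +0000] "GET /brum.css HTTP/1.1" 304 0';

# Naive approach: the date and the request both contain spaces,
# so a whitespace split cuts them into several pieces.
my @naive = split ' ', $line;
printf "split() gives %d pieces\n", scalar @naive;

# Hand-written stand-in pattern (an assumption for this sketch):
# one capture group per logical field of the Common Log Format.
my $re = qr/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$/;

my @fields = $line =~ $re;
printf "regexp gives %d fields\n", scalar @fields;
print "request: $fields[4]\n";
```

The split yields more pieces than there are logical fields, whereas the regexp returns one value per field with the date and request intact.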

Enables simple parsing of log files created using the Common Log Format or the Extended Common Log Format, such as the logs generated by the httpd/Apache web server using the keyword 'common'.

The Common Log Format is made up of several fields, each delimited by a single space.
  • Apache LogFormat:

        LogFormat "%h %l %u %t \"%r\" %>s %b" common
        

    Note that the name at the end, in this case 'common', serves purely to identify the format locally, so that you can create different LogFormats for different purposes. You then define a log line in your virtual host such as:

        CustomLog /var/www/logs/mysite-access.log common
        
  • Fields:

      remotehost rfc931 authuser [date] "request" status bytes
        
  • Example:

      127.0.0.1 - - [19/Jan/2005:21:47:11 +0000] "GET /brum.css HTTP/1.1" 304 0
    
      For the above example:
      remotehost: 127.0.0.1
      rfc931: -
      authuser: -
      [date]: [19/Jan/2005:21:47:11 +0000]
      "request": "GET /brum.css HTTP/1.1"
      status: 304
      bytes: 0
        
  • Available Capture Fields

      * host
      * rfc
      * authuser
      * date
      ** ts (date without the [])
      * request
      ** req (request without the quotes)
      * status
      * bytes
        
  • Method Call

        my $foo = Regexp::Log::Common->new( format  => ':common' );
        

The Extended Common Log Format is made up of several fields, each delimited by a single space.
  • Apache LogFormat:

        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" extended
        
  • Fields:

      remotehost rfc931 authuser [date] "request" status bytes "referer" "user_agent"
        
  • Example:

      127.0.0.1 - - [19/Jan/2005:21:47:11 +0000] "GET /brum.css HTTP/1.1" 304 0 "http://birmingham.pm.org/" "Mozilla/2.0GoldB1 (Win95; I)"
    
      For the above example:
      remotehost: 127.0.0.1
      rfc931: -
      authuser: -
      [date]: [19/Jan/2005:21:47:11 +0000]
      "request": "GET /brum.css HTTP/1.1"
      status: 304
      bytes: 0
      "referer": "http://birmingham.pm.org/"
      "user_agent": "Mozilla/2.0GoldB1 (Win95; I)"
        
  • Available Capture Fields

      * host
      * rfc
      * authuser
      * date
      ** ts (date without the [])
      * request
      ** req (request without the quotes)
      * status
      * bytes
      * referer
      ** ref (referer without the quotes)
      * useragent
      ** ua (useragent without the quotes)
        
  • Method Call

        my $foo = Regexp::Log::Common->new( format  => ':extended' );
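The capture lists above pair each delimited field with a shorter name that omits the surrounding brackets or quotes (date/ts, request/req, referer/ref, useragent/ua). As a rough sketch of that relationship, the core-Perl snippet below derives the short values from the long ones taken from the example line. With the module you would simply ask for the short names in capture(); the %short mapping and the stripping regexp here are illustrative assumptions, not the module's internals.

```perl
use strict;
use warnings;

# Long-form values as they appear in the example log line.
my %data = (
    date      => '[19/Jan/2005:21:47:11 +0000]',
    request   => '"GET /brum.css HTTP/1.1"',
    referer   => '"http://birmingham.pm.org/"',
    useragent => '"Mozilla/2.0GoldB1 (Win95; I)"',
);

# Long capture name => short capture name, per the list above.
my %short = (
    date      => 'ts',
    request   => 'req',
    referer   => 'ref',
    useragent => 'ua',
);

for my $long (sort keys %short) {
    # Strip one leading [ or " and one trailing ] or ".
    (my $bare = $data{$long}) =~ s/^[\["]|[\]"]$//g;
    $data{ $short{$long} } = $bare;
}

print "$_: $data{$_}\n" for qw( ts req ref ua );
```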
        

There are any number of LogFormat lines you can define, and although this module doesn't define all the formats, you can specify your own custom format to extract fields as necessary.
  • Apache LogFormat:

    Perhaps you need to extend the 'extended' format:

        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D %v" custom
        
  • Example:

    This can produce a log line such as:

        103.245.44.14 - - [23/May/2014:21:38:01 +0100] "GET /volume/201109 HTTP/1.0" 200 37748 "-" "binlar_2.6.3 test@mgmt.mic" 2259292 blog.cpantesters.org
        
  • Available Capture Fields

    Depending on how you define the capture, this can be broken down into fields in a few different ways.

      host rfc authuser [date] "request" status bytes "referer" "useragent" time servername
        

    or a shorthand variety

      h l u t "r" s b "referer" "useragent" D v
        

    Note that referer and useragent don't have single-letter counterparts, as both the %{xxx}i and %{xxx}e format fields need to be defined explicitly. Currently only referer and useragent are defined from the %{xxx}i field set, and none are defined for the %{xxx}e field set. This may be expanded in the future.

  • Method Call

    To define these you would call the constructor, or the individual methods as:

        my $foo = Regexp::Log::Common->new(
            format  => '%host %rfc %authuser %date %request %status %bytes ' .
                       '%referer %useragent %time %servername',
            capture => [qw( host rfc authuser ts request status bytes
                            referer useragent time servername)],
        );
        

    or

        my $foo = Regexp::Log::Common->new(
            format  => '%h %l %u %t %r %s %b %referer %useragent %D %v',
            capture => [qw( h l u t r s b referer useragent D v)],
        );
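As a sketch of what the custom format above has to match, the following core-Perl snippet breaks the sample custom log line into the eleven fields using a hand-written stand-in pattern (again, not the regexp the module generates) and a hash slice as in the synopsis. The field names follow the capture names listed above; the conversion assumes that Apache's %D reports the time taken to serve the request in microseconds, as described in the mod_log_config documentation.

```perl
use strict;
use warnings;

my $line = '103.245.44.14 - - [23/May/2014:21:38:01 +0100] '
         . '"GET /volume/201109 HTTP/1.0" 200 37748 "-" '
         . '"binlar_2.6.3 test@mgmt.mic" 2259292 blog.cpantesters.org';

# Hand-written stand-in for the custom format: the extended fields
# followed by %D (digits) and %v (server name).
my $re = qr/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)" (\d+) (\S+)$/;

my @names = qw( host rfc authuser ts req status bytes ref ua time servername );

my %data;
@data{@names} = $line =~ $re
    or die "line did not match\n";

# %D is in microseconds; convert to seconds for display.
printf "%s served %s in %.3f s\n",
    $data{servername}, $data{req}, $data{time} / 1_000_000;
```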
        

There are several format fields available, although this module does not support them all. The ones it does currently support are as follows:

    shorthand       => longhand (if applicable)

    '%a'            => '%remoteip'
    '%A'            => '%localip'
    '%B'            => '%bytes'
    '%b'            => '%bytes'
    '%D'            => '%time'
    '%F'            => '%filename'
    '%h'            => '%host' or '%remotehost'
    '%H'            => '%protocol'
    '%k'            => '%keepalive'
    '%l'            => '%logname' or '%rfc'
    '%m'            => '%method'
    '%p'            => '%port'
    '%P'            => '%pid'
    '%q'            => '%querystring'
    '%r'            => '%request'
    '%s'            => '%status'
    '%t'            => '%date', also '%ts' (excluding surrounding '[]')
    '%T'            => '%seconds'
    '%u'            => '%authuser'
    '%U'            => '%request' or '%req' (excluding surrounding '"')
    '%v'            => '%servername'
    '%V'            => '%servername'
    '%X'            => '%connection'
    '%I'
    '%O'
    
    %{Foobar}i fields
    
    '%referer'      => or '%ref' (excluding surrounding '"')
    '%useragent'    => or '%ua' (excluding surrounding '"')

For a more detailed explanation, please see the Apache Log Formats documentation at <http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats>.
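The shorthand and longhand names in the table are interchangeable in a format string. As an illustration only, the sketch below expands a shorthand format into its longhand equivalent using a small lookup hash copied from a subset of the table. The %longhand hash and the substitution are assumptions made for this example and do not describe how the module resolves names internally.

```perl
use strict;
use warnings;

# Subset of the shorthand => longhand table above.
my %longhand = (
    '%h' => '%host',
    '%l' => '%rfc',
    '%u' => '%authuser',
    '%t' => '%date',
    '%r' => '%request',
    '%s' => '%status',
    '%b' => '%bytes',
    '%D' => '%time',
    '%v' => '%servername',
);

my $short = '%h %l %u %t %r %s %b %referer %useragent %D %v';

# Replace each %token that has a longhand name; tokens without an
# entry (such as %referer and %useragent) are left untouched.
(my $long = $short) =~ s{(%\w+)}{ $longhand{$1} // $1 }ge;

print "$long\n";
```

The result corresponds to the longhand format string used in the custom format constructor call earlier in this page.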

There are no known bugs at the time of this release. However, if you spot a bug or are experiencing difficulties that are not explained within the POD documentation, please submit a bug to the RT system (see link below). It would help greatly if you are able to pinpoint problems or even supply a patch.

Fixes are dependent upon their severity and my availability. Should a fix not be forthcoming, please feel free to (politely) remind me by sending an email to barbie@cpan.org.

RT: <http://rt.cpan.org/Public/Dist/Display.html?Name=Regexp-Log-Common>

Regexp::Log

BooK, for initially putting the idea into my head, and the thread on a Perl message board asking for help with the problem that this module solves.

  Barbie <barbie@cpan.org>
  for Miss Barbell Productions, <http://www.missbarbell.co.uk>

  Copyright (C) 2005-2014 Barbie for Miss Barbell Productions.

  This distribution is free software; you can redistribute it and/or
  modify it under the Artistic License v2.
2014-10-05 perl v5.32.1
