Text::Filter(3) User Contributed Perl Documentation Text::Filter(3)

Text::Filter - base class for objects that can read and write text lines

A plethora of tools exist that operate as filters: they get data from a source, operate on this data, and write possibly modified data to a destination. In the Unix world, these tools can be chained using a technique called pipelining, where the output of one filter is connected to the input of another filter. Some non-Unix worlds are reported to have similar provisions.

Creating Perl modules for filter functionality seems trivial at first: just open the input file, read and process it, and write output to a destination file. But for really reusable modules this approach is too simple. A reusable module should not read and write files itself, but rely on the calling program to provide the input as well as to handle the output.

"Text::Filter" is a base class for modules that have in common that they process text lines by reading from some source (usually a file), manipulating the contents and writing something back to some destination (usually some other file).

This module can be used on its own, but it is most powerful when used to derive modules from it. See section EXAMPLES for an extensive example.

The main purpose of the "Text::Filter" class is to abstract out the details of how input and output must be done. Although in most cases input will come from a file, and output will be written to a file, advanced modules require more detailed control over the input and output. For example, the module could be called from another module; in this case the callee could be allowed to process only a part of the input. Or a program could have prepared data in an array and want to call the module to process this data as if it were read from a file. Also, the input stream provides pushback functionality to make peeking at the input easy.

"Text::Filter" can be used on its own as a convenient input/output handler. For example:

    use Text::Filter;
    my $filter = Text::Filter->new(input => *STDIN, output => *STDOUT);
    my $line;
    while ( defined($line = $filter->readline) ) {
        $filter->writeline($line);
    }

Or, even simpler:

    use Text::Filter;
    Text::Filter->run(input => *STDIN, output => *STDOUT);

Its real power shows when such a program is turned into a module for optimal reuse.

When creating a module that is to process lines of text, it can be derived from "Text::Filter", for example:

    package MyFilter;
    use base 'Text::Filter';

The constructor method must then call the new() method of the "Text::Filter" class to set up the base class. This is conveniently done by calling SUPER::new(). A hash containing attributes must be passed to this method; some of these attributes will be used by the base class setup.

    sub new {
        my $class = shift;
        # ... fetch non-attribute arguments from @_ ...
        # Create the instance, using the attribute arguments.
        my $self = $class->SUPER::new(@_);

Finally, the newly created object must be re-blessed into the desired class, and returned:

        # Rebless into the desired class.
        bless($self, $class);
    }

When creating new instances for this class, attributes "input" and "output" can be used to specify how input and output are to be handled. Several possible values can be supplied for these attributes.

For "input":

  • A scalar, containing a file name. The named file will be opened, input lines will be read using "<>".
  • A file handle (glob). Lines will be read using "<>".
  • An instance of class "IO::File". Lines will be read using "<>".
  • A reference to an array. Input lines will be shift()ed from the array.
  • A reference to a scalar. Input lines will be taken from the contents of the scalar (which will be modified). When exhausted, it will be set to undefined.
  • A reference to an anonymous subroutine. This routine will be called to get the next line of data.

The default is to read input using the "<>" operator.
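
The in-memory input variants listed above can be sketched as follows. This is only a minimal illustration that assumes the behaviour documented on this page; the sample data and the variable names (@lines, @queue, @seen_array, @seen_sub) are made up for the example.

```perl
use strict;
use warnings;
use Text::Filter;

# Input from an array reference: lines are shift()ed off the array.
my @lines = ("alpha\n", "beta\n");
my $from_array = Text::Filter->new(input => \@lines);
my @seen_array;
while ( defined(my $line = $from_array->readline) ) {
    push @seen_array, $line;
}

# Input from a subroutine: it must return the next line, or undef at end.
my @queue = ("one\n", "two\n");
my $from_sub = Text::Filter->new(input => sub { shift @queue });
my @seen_sub;
while ( defined(my $line = $from_sub->readline) ) {
    push @seen_sub, $line;
}
```

Because the caller decides where the lines come from, the same reading loop works unchanged for files, handles and prepared data.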

For "output":

  • A scalar, containing a file name. The named file will be created automatically, output lines will be written using print().
  • A file handle (glob). Lines will be written using print().
  • An instance of class "IO::File". Lines will be written using print().
  • A reference to an array. Output lines will be push()ed into the array. The array will be initialised to "()" if necessary.
  • A reference to a scalar. Output lines will be appended to the scalar. The scalar will be initialised to "" if necessary.
  • A reference to an anonymous subroutine. This routine will be called to append a line of text to the destination.

The default is to write output to STDOUT.
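
The in-memory output variants can be sketched in the same way. Again this is an illustration under the assumption that the module behaves as documented above; @collected, $buffer and $count are hypothetical names chosen for the example.

```perl
use strict;
use warnings;
use Text::Filter;

# Output to an array reference: each line is push()ed onto @collected.
my @collected;
my $to_array = Text::Filter->new(input => ["a\n", "b\n"], output => \@collected);
$to_array->run;

# Output to a scalar reference: each line is appended to $buffer.
my $buffer;
my $to_scalar = Text::Filter->new(input => ["c\n", "d\n"], output => \$buffer);
$to_scalar->run;

# Output to a subroutine: it is called once for every line written.
my $count = 0;
my $to_sub = Text::Filter->new(input => ["e\n"], output => sub { $count++ });
$to_sub->run;
```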

Additional attributes can be used to specify actions to be performed after the data is fetched, or prior to being written, for example to strip line endings upon input and add them upon output.

The constructor is called new() and takes a hash with attributes as its parameter.

The following attributes are recognized and used by the constructor; all others are ignored.

The constructor will return a blessed hash containing all the original attributes, plus some new attributes. The names of the new attributes all start with "_filter_"; these new attributes should not be touched.

input
This designates the input source. The value must be a scalar (containing a file name), a file handle (either a glob or an instance of class "IO::File"), an array reference, a scalar reference, or a reference to a subroutine, as described above.

If a subroutine is specified, it must return the next line to be processed, and "undef" at end.

input_postread
This attribute can be used to select an action to be performed after the data has been read. Its prime purpose is to handle line endings (e.g. remove a trailing newline).

The value can be 'none' or 0 (no action), 'chomp' or 1 (standard chomp() operation), or a reference to a subroutine. Default value is 0 (no chomping).

If the value is a reference to a subroutine, this will be called with the text line that was just read as its only argument, and it must return the new contents of the text line. If it returns undef, this line will be skipped.
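
As a sketch of the two main forms of input_postread, assuming the semantics described above (the sample lines and the names @plain and @trimmed are invented for the illustration):

```perl
use strict;
use warnings;
use Text::Filter;

# 'chomp' removes the trailing newline from every line that is read.
my $chomped = Text::Filter->new(
    input          => ["first\n", "second\n"],
    input_postread => 'chomp',
);
my @plain;
while ( defined(my $line = $chomped->readline) ) {
    push @plain, $line;
}

# A subroutine receives the line just read and returns its new contents.
my $stripped = Text::Filter->new(
    input          => ["left  \n", "right\t\n"],
    input_postread => sub {
        my $line = shift;
        $line =~ s/\s+\z//;    # drop trailing whitespace, newline included
        return $line;
    },
);
my @trimmed;
while ( defined(my $line = $stripped->readline) ) {
    push @trimmed, $line;
}
```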

filter
If specified, a reference to a subroutine that performs filtering. It will be called after input_postread, with the text line that was just read as its only argument, and it must return the new contents of the text line. If it returns undef, this line will be skipped.
output
This designates the output. The value must be a scalar (containing a file name), a file handle (either a glob or an instance of class "IO::File"), an array reference, a scalar reference, or a reference to a subroutine, as described above.

Note: when a file name is passed, a ">" will be prepended if necessary.

output_prewrite
This attribute can be used to select an action to be performed just before the data is added to the output. Its prime purpose is to handle line endings (e.g. add a trailing newline). The value can be 'none' or 0 (no action), 'newline' or 1 (append the value of $/ to the line), or a reference to a subroutine. Default value is 0 (no action).

If the value is 'newline' or 1, and the value of $/ is "" (paragraph mode), two newlines will be added.

If the value is a reference to a subroutine, this will be called with the text line as its only argument, and it must return the new contents of the line to be output. If it returns undef, no output occurs.
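
A minimal sketch of output_prewrite, assuming the behaviour described above and the default value of $/ (a single newline). The line-numbering subroutine and the names @out, @numbered and $n are hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;
use Text::Filter;

# Strip line endings on input and add them back on output.
my @out;
Text::Filter->new(
    input           => ["x\n", "y\n"],
    input_postread  => 'chomp',
    output          => \@out,
    output_prewrite => 'newline',    # appends the value of $/
)->run;

# A subroutine receives the line and returns the text to be written.
my $n = 0;
my @numbered;
Text::Filter->new(
    input           => ["x", "y"],
    output          => \@numbered,
    output_prewrite => sub {
        my $line = shift;
        return sprintf("%d: %s\n", ++$n, $line);
    },
)->run;
```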

Text::Filter->run([ attributes ])
This creates a temporary filter object using the attributes as in "new", and runs its "run" method.

$filter->readline
If there is anything in the pushback buffer, this is returned and the pushback buffer is marked empty.

Otherwise, returns the next line from the input stream, or "undef" if there is no more input.

$filter->pushback($line)
Pushes a line of text back to the input stream. Returns the line.
$filter->peek
Peeks at the input. Short for pushback(readline()).
$filter->writeline ($line)
Adds $line to the output stream.
$filter->set_input($input [ , $postread ])
Sets the input method to $input. If the optional argument $postread is defined, sets the input line postprocessing strategy as well.
$filter->set_output($output [ , $prewrite ])
Sets the output method to $output. If the optional argument $prewrite is defined, sets the output line preprocessing strategy as well.
$filter->run( [ filter ])
This will run the readline/writeline loop. Optionally a filter argument (see CONSTRUCTOR, above) can be passed if filtering is desired and not yet otherwise designated.
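
The set_input and set_output methods described above can be sketched as follows. This assumes that both may be called on a freshly constructed object before any line has been read; the page does not say what happens when they are called in the middle of a run. The names @source and @sink are made up for the example.

```perl
use strict;
use warnings;
use Text::Filter;

# Start with the defaults, then redirect both ends before reading anything.
my $filter = Text::Filter->new;

my @source = ("red\n", "green\n");
my @sink;
$filter->set_input(\@source, 'chomp');     # second argument: postread action
$filter->set_output(\@sink, 'newline');    # second argument: prewrite action

# Run the readline/writeline loop without an additional filter.
$filter->run;
```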

This example shows how to filter empty and whitespace lines.

    use Text::Filter;
    Text::Filter->run(filter => sub { my $line = shift;
                                      return unless $line =~ /\S/;
                                      return $line;
                                  });
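
The pushback facility can be sketched in a similar way: read leading comment lines, and hand the first line that does not belong to them back to the input stream. This is an illustration under the assumption that readline, pushback and peek behave as documented above; the sample data and the names @header, @body and $next are invented.

```perl
use strict;
use warnings;
use Text::Filter;

my $filter = Text::Filter->new(
    input => ["# header 1\n", "# header 2\n", "body 1\n", "body 2\n"],
);

# Collect leading comment lines; stop at the first non-comment line.
my @header;
while ( defined(my $line = $filter->readline) ) {
    if ( $line =~ /^#/ ) {
        push @header, $line;
        next;
    }
    $filter->pushback($line);    # not a header line: hand it back
    last;
}

# peek looks at the next line without consuming it.
my $next = $filter->peek;

# The remaining lines, starting with the pushed-back one, are still available.
my @body;
while ( defined(my $line = $filter->readline) ) {
    push @body, $line;
}
```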

This is an example of how to use "Text::Filter" as a base class.

It implements a module that provides a single instance method: grep(), that performs some kind of grep(1)-style function (how surprising!).

A class method grepper() is also provided for easy access to do 'the right thing' in the most common case.

    package Grepper;

    use strict;
    use base qw(Exporter Text::Filter);
    our @EXPORT;

    # Setup.
    BEGIN {
        @EXPORT = qw(grepper);
    }

    # Constructor. Major part of the job is done by the superclass.
    sub new {
        my $class = shift;

        # Create a new instance by calling the superclass constructor.
        my $self = $class->SUPER::new(@_);
        # The superclass constructor will take care of handling
        # the input and output attributes, and setup everything for
        # handling the IO.

        # Bless the object into the desired class.
        bless ($self, $class);

        # And return it.
        $self;
    }

    # Instance method, just an example. No magic.
    sub grep {
        my $self = shift;
        my $pat = shift;
        my $line;
        while ( defined($line = $self->readline) ) {
            $self->writeline($line) if $line =~ $pat;
        }
    }

    # Class method, for convenience.
    # Usage: grepper (<input file>, <output file>, <pattern>);
    sub grepper {
        my ($input, $output, $pat) = @_;

        # Create a Grepper object.
        my $grepper = Grepper->new(input => $input, output => $output);

        # Call its grep method.
        $grepper->grep ($pat);
    }

Johan Vromans (jvromans@squirrel.nl) wrote this module.

This program is Copyright 1998,2013 by Squirrel Consultancy. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of either: a) the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version, or b) the "Artistic License" which comes with Perl.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either the GNU General Public License or the Artistic License for more details.

2013-01-17 perl v5.32.1
