Regex(3)    User Contributed Perl Documentation    Regex(3)
YAPE::Regex - Yet Another Parser/Extractor for Regular Expressions
This document refers to YAPE::Regex version 4.00.
use YAPE::Regex;
use strict;
my $regex = qr/reg(ular\s+)?exp?(ression)?/i;
my $parser = YAPE::Regex->new($regex);
# here is the tokenizing part
while (my $chunk = $parser->next) {
# ...
}
The "YAPE" hierarchy of modules
is an attempt at a unified means of parsing and extracting content. Each
module keeps a generic interface, to promote simplicity and reusability;
the API aims to be powerful, yet simple. The modules tokenize the input
(a step that can be intercepted) and build trees, so that specific nodes
can be extracted.
This module is yet another (?) parser and tree-builder for Perl
regular expressions. It builds a tree from a regex, but for now its
extraction tool is quite limited (see "Extracting Sections").
The tree can nonetheless be useful to extension modules.
In addition to the base class,
"YAPE::Regex", there is the auxiliary
class "YAPE::Regex::Element" (common to
all "YAPE" base classes), which holds the
individual node classes. The node classes are documented in
that module's documentation.
- "use YAPE::Regex;"
- "use YAPE::Regex qw( MyExt::Mod );"
If no arguments are supplied, the module is loaded normally, and
the node classes are given the proper inheritance (from
"YAPE::Regex::Element"). If you supply
a module (or list of modules),
"import" will automatically load
them (if needed) and set up their node classes with the proper
inheritance: it appends
"YAPE::Regex" to
@MyExt::Mod::ISA, and
"YAPE::Regex::xxx" to each node
class's @ISA (where
"xxx" is the name of the specific node
class).
package MyExt::Mod;
use YAPE::Regex 'MyExt::Mod';
# does the work of:
# @MyExt::Mod::ISA = 'YAPE::Regex'
# @MyExt::Mod::text::ISA = 'YAPE::Regex::text'
# ...
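The @ISA wiring described above can be approximated in core Perl with symbolic references to package arrays. This is a minimal sketch under stated assumptions, not the module's actual "import" code: "wire_extension" is a hypothetical helper, and "Demo::Base", "Demo::Base::text" and "Demo::Base::anchor" are stub classes standing in for "YAPE::Regex" and two of its node classes.

```perl
use strict;
use warnings;

# Stub classes standing in for YAPE::Regex and two of its node classes.
package Demo::Base;
sub new { my $class = shift; return bless {}, $class }

package Demo::Base::text;
sub kind { return 'text' }

package Demo::Base::anchor;
sub kind { return 'anchor' }

package main;

# Hypothetical helper: appends the base class to the extension's @ISA and
# each base node class to the matching extension node class's @ISA.
sub wire_extension {
    my ($ext, $base, @node_types) = @_;
    no strict 'refs';    # needed to reach @{"Package::ISA"} by name
    push @{"${ext}::ISA"}, $base;
    push @{"${ext}::${_}::ISA"}, "${base}::$_" for @node_types;
    return;
}

wire_extension('MyExt::Mod', 'Demo::Base', qw(text anchor));

my $obj = MyExt::Mod->new;               # inherited from Demo::Base::new
print ref($obj), "\n";                   # prints MyExt::Mod
print 'MyExt::Mod::text'->kind, "\n";    # prints text
```

"push" is used rather than assignment so that any @ISA entries the extension already declared are kept, matching the "append" wording above.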
- "my $p = YAPE::Regex->new($REx);"
Creates a "YAPE::Regex"
object, using the contents of $REx as a regular
expression. If $REx is not already a compiled
regex, the "new" method attempts to convert it
into one (using "qr//"). If the regex contains
an error, this conversion fails, but the parser carries on as though it
had succeeded; it reports the bad token when it reaches it during
parsing.
- "my $text = $p->chunk($len);"
Returns the next $len characters in
the input string; $len defaults to 30
characters. This is useful for figuring out why a parsing error
occurred.
- "my $done = $p->done;"
Returns true if the parser is done with the input string, and
false otherwise.
- "my $errstr = $p->error;"
Returns the parser error message.
- "my $backref = $p->extract;"
Returns a code reference that, on each call, returns the next
back-reference in the regex. For planned enhancements in upcoming
versions of this module, see "Extracting Sections".
- "my $str = $p->display(...);"
Returns a string representation of the entire content. It
first calls the "parse" method, in case
some data has not yet been parsed, and then calls the
"fullstring" method on the root nodes.
See the "YAPE::Regex::Element" documentation
for the arguments to "fullstring".
- "my $node = $p->next;"
Returns the next token, or
"undef" if there is no valid token.
If there was a problem during parsing, an error message is available
through the "error" method.
- "my $node = $p->parse;"
Calls "next" until all the
data has been parsed.
- "my $node = $p->root;"
Returns the root node of the tree structure.
- "my $state = $p->state;"
Returns the current state of the parser. It is one of the
following values: "alt", "anchor", "any", "backref", capture(N),
"Cchar", "class", "close", "code", "comment", cond(TYPE), "ctrl",
"cut", "done", "error", "flags", "group", "hex", "later",
lookahead(neg|pos), lookbehind(neg|pos), "macro", "named", "oct",
"slash", "text", and "utf8hex".
For capture(N), N is the number of the capturing group (the N in $N).
For cond(TYPE), TYPE is
either a number representing the back-reference that the conditional
depends on, or the string
"assert".
For "lookahead" and
"lookbehind", the parenthesized part is either
"neg" or "pos", depending on whether the
assertion is negative or positive.
- "my $node = $p->top;"
Synonym for "root".
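The conversion that "new" is described as performing can be sketched in core Perl: a value whose "ref" is "Regexp" is already compiled and is passed through, anything else is compiled with "qr//" inside "eval", and a bad pattern produces an error message instead of a fatal exception. "compile_pattern" is a hypothetical helper, not part of "YAPE::Regex", and it does not reproduce the module's behavior of carrying on and reporting the bad token later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($regex, undef) on success or (undef, $error).
sub compile_pattern {
    my ($rex) = @_;
    return ($rex, undef) if ref($rex) eq 'Regexp';    # already compiled
    my $compiled = eval { qr/$rex/ };                  # trap syntax errors
    return defined $compiled ? ($compiled, undef) : (undef, $@);
}

for my $input (qr/reg(ular\s+)?exp?/i, 'reg(ular\s+)?exp?', 'reg(ular') {
    my ($re, $err) = compile_pattern($input);
    print defined $re ? "ok: $re\n" : "error: $err";
}
```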
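To make the contract between "next", "done", "chunk" and "error" concrete, here is a deliberately tiny toy lexer. It is a hypothetical sketch, not how "YAPE::Regex" tokenizes: the package name "Toy::Lexer", its token names and its rules are all assumptions, and it ignores character classes, flags, and the fact that a quantifier binds only to the last character of preceding text. Only the method behavior mirrors the descriptions above.

```perl
use strict;
use warnings;

# Hypothetical toy lexer, NOT YAPE::Regex's implementation. It mirrors the
# documented contract: next() returns one token or undef, done() is true
# once the input is used up, chunk($len) previews the unparsed text
# (30 characters by default), and error() explains why next() failed.
package Toy::Lexer;

# Token rules, tried in order. A lone "{" is simply rejected, and "exp?"
# becomes text "exp" plus quant "?" (the real module must split off "p").
my @RULES = (
    [ open   => qr/\((?:\?[:=!])?/ ],
    [ close  => qr/\)/ ],
    [ alt    => qr/\|/ ],
    [ quant  => qr/(?:[*+?]|\{\d+(?:,\d*)?\})\??/ ],
    [ escape => qr/\\./s ],
    [ text   => qr/[^\\()|*+?{]+/ ],
);

sub new {
    my ($class, $pattern) = @_;
    return bless { str => $pattern, pos => 0, error => '' }, $class;
}

sub done  { my $self = shift; return $self->{pos} >= length $self->{str} }
sub error { my $self = shift; return $self->{error} }

sub chunk {
    my ($self, $len) = @_;
    return substr $self->{str}, $self->{pos}, $len // 30;
}

sub next {
    my $self = shift;
    return undef if $self->done;
    my $rest = substr $self->{str}, $self->{pos};
    for my $rule (@RULES) {
        my ($type, $re) = @$rule;
        if ($rest =~ /\A($re)/) {
            $self->{pos} += length $1;    # consume the matched token
            return [ $type, $1 ];
        }
    }
    $self->{error} = 'cannot tokenize at: ' . $self->chunk;
    return undef;
}

package main;

my $lexer = Toy::Lexer->new('reg(ular\s+)?exp?(ression)?');
while (my $token = $lexer->next) {
    printf "%-6s %s\n", @$token;
}
print 'error: ', $lexer->error, "\n" unless $lexer->done;
```

Checking "done" after the loop is what tells a clean end of input apart from a token that could not be recognized, which is the same distinction "next" returning "undef" leaves open.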
Although extracting nodes is the goal of the
"YAPE" modules, the author is at a loss
as to what needs to be extracted from a regex. Currently,
all the "extract" method does is
give you access to the regex's set of back-references:
my $extor = $parser->extract;
while (my $backref = $extor->()) {
# ...
}
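The code reference returned by "extract" follows a common Perl iterator convention: each call yields the next item, and a false value signals the end. The core-Perl sketch below applies the same convention to Perl's own numbering of capture groups (counted by opening parenthesis, left to right), which is the numbering behind capture(N). It works on match results rather than on "YAPE::Regex" nodes, and "capture_iterator" is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: matches $regex against $string and returns a closure
# that yields [group number, captured text] per call, then undef.
# The captured text is undef for a group that did not participate.
sub capture_iterator {
    my ($regex, $string) = @_;
    my @groups;
    if ($string =~ $regex) {
        # $#+ is the number of capture groups in the matched pattern.
        for my $n (1 .. $#+) {
            my $text = defined $-[$n]
                ? substr($string, $-[$n], $+[$n] - $-[$n])
                : undef;
            push @groups, [ $n, $text ];
        }
    }
    return sub { return shift @groups };
}

my $next = capture_iterator(qr/reg(ular\s+)?exp?(ression)?/i, 'Regular Expression');
while (my $group = $next->()) {
    my ($n, $text) = @$group;
    printf "group %d: %s\n", $n, defined $text ? "'$text'" : '(did not match)';
}
```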
"japhy" is very open to
suggestions about the approach to node extraction (both how the API
should look and what it should offer). Preliminary ideas include
extraction keywords like the output of -Dr (or the
"re" module's
"debug" option).
- "YAPE::Regex::Explain"
Presents an explanation of a regular expression, node by
node.
- "YAPE::Regex::Reverse" (Not released)
Reverses the nodes of a regular expression.
The following things are planned for future versions of this
module.
- Create a robust "extract" method
Open to suggestions.
The following is a list of known or reported bugs.
- "use charnames ':full'"
To understand "\N{...}"
properly, you must be using Perl 5.6.0 or higher. However, the parser
only knows how to resolve full names (those made available by use
charnames ':full'). There might be an
option in the future to specify a class name.
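For reference, a "full" name is the official Unicode character name, as accepted by "\N{...}" under "use charnames ':full'". The core-Perl lines below show one such name and the run-time lookups "charnames::vianame" and "charnames::viacode"; they illustrate the naming scheme only and do not exercise "YAPE::Regex".

```perl
use strict;
use warnings;
use charnames ':full';

# A full Unicode name, resolved at compile time by \N{...}.
my $alpha = "\N{GREEK SMALL LETTER ALPHA}";
printf "U+%04X\n", ord $alpha;    # prints U+03B1

# The same lookups at run time.
my $code_point = charnames::vianame('GREEK SMALL LETTER ALPHA');
my $name       = charnames::viacode(0x3B1);
print "$name\n";

# A \N{...} escape inside a pattern uses the same names.
print "matched\n" if $alpha =~ /\N{GREEK SMALL LETTER ALPHA}/;
```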
The "YAPE::Regex::Element"
documentation, for information on the node classes. Also,
"Text::Balanced", Damian Conway's
excellent module, used for matching "(?{ ... })"
and "(??{ ... })" blocks.
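Finding where a "(?{ ... })" block ends is a balanced-delimiter problem, because the embedded code may itself contain braces. This page does not say which "Text::Balanced" function the module calls; the sketch below assumes "extract_bracketed", which in list context returns the extracted text followed by the remainder, and uses an illustrative input string.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Pattern text starting just after the "(?" of a code assertion (illustrative).
my $after_paren = '{ my %h = (n => 1); $h{n}++ })tail';

# Extract the balanced {...} block; the nested braces of $h{n} are tracked.
my ($code, $remainder) = extract_bracketed($after_paren, '{}');
print "code:      $code\n";
print "remainder: $remainder\n";
```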
The original author is Jeff "japhy" Pinyan (CPAN ID:
PINYAN).
Gene Sullivan (gsullivan@cpan.org) is a co-maintainer.
This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself. See perlartistic.