Asm::Preproc::Lexer(3) User Contributed Perl Documentation Asm::Preproc::Lexer(3)

Asm::Preproc::Lexer - Iterator to split input in tokens

  use Asm::Preproc::Lexer;

  my @tokens = (
     BLANKS  => qr/\s+/,       sub {()},
     COMMENT => [qr/\/\*/, qr/\*\//],
                               undef,
     QSTR    => [qr/'/],       sub { my($type, $value) = @_;
                                     [$type, 
                                      substr($value, 1, length($value)-2)] },
     QQSTR   => [qr/"/, qr/"/],
     NUM     => qr/\d+/,
     ID      => qr/[a-z]+/,    sub { my($type, $value) = @_; 
                                     [$type, $value] },
     SYM     => qr/(.)/,       sub { [$1, $1] },
  );

  my $lex = Asm::Preproc::Lexer->new;
  $lex->make_lexer(@tokens);

  my $lex2 = $lex->clone;

  $lex->from(sub {}, @lines);  # read Asm::Preproc::Line from iterator
  my $token = $lex->next;      # isa Asm::Preproc::Token
  $token = $lex->();

This module implements a subclass of Iterator::Simple::Lookahead that reads text from iterators and splits it into tokens, according to the specification given to "make_lexer".

The objects are Iterator::Simple compatible, i.e. they can be used as an argument to "iter()".

The tokenizer reads Asm::Preproc::Line objects and splits them into Asm::Preproc::Token objects on each "next" call. "next" returns "undef" at the end of input.

"new" creates a new tokenizer object, a subclass of Iterator::Simple::Lookahead.

"make_lexer" must be called to create the tokenizer code before the iterator can be used.

"make_lexer" builds the tokenizer for the given token specification. Each token is specified by the following elements:
type
String to identify the token type, unused if the token is discarded (see "BLANKS" and "COMMENT" above).
regexp
One of:
1.
A single regular expression to match the token at the current input position.
2.
A list of one regular expression, to match delimited tokens that use the same delimiter for the start and the end. The token can span multiple lines. See "QSTR" above for an example of multi-line single-quoted strings.
3.
A list of two regular expressions, to match the start of the token at the current input position, and the end of the token. The token can span multiple lines. See "COMMENT" above for an example of multi-line comments.

The regular expression is matched where the previous match finished, and each sub-expression cannot span multiple lines. Parentheses may be used to capture sub-expressions in $1, $2, etc.

It is an error if some input cannot be recognized by any of the given "regexp" expressions; the tokenizer dies with an error message when it reads such input. Therefore the "SYM" token above contains the catch-all expression "qr/(.)/".

transform (optional)
The optional code reference is a transform subroutine. It receives the "type" and "value" of the recognized token, and returns one of:
1.
An array ref with two elements "[$type, $value]", the new "type" and "value" to be returned in the Asm::Preproc::Token object.
2.
An empty list "()" to signal that this token shall be discarded.

As an optimization, the transform subroutine code reference may be set to "undef" to signal that the token will be discarded and need not be accumulated while matching. This is useful for discarding comments up front, instead of collecting the whole comment and passing it to the transform subroutine just to be discarded afterwards. See "COMMENT" above for an example of usage.
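The matching and transform rules described above can be illustrated with a small core-Perl sketch. It is not the code generated by "make_lexer": the rule table layout, the "tokenize_line" helper and the upper-casing "ID" transform are assumptions made for illustration, and only single-line tokens are handled. Each regexp is tried at the current position with "\G", a transform returns either "[$type, $value]" or an empty list, and the catch-all "SYM" rule prevents the unrecognized-input error.

```perl
use strict;
use warnings;

# Hypothetical rule table: [ type, regexp, optional transform ].
# This layout is an assumption of the sketch, not the module's format.
my @rules = (
    [ BLANKS => qr/\s+/,    sub { () } ],                   # discarded
    [ NUM    => qr/\d+/ ],                                  # kept as-is
    [ ID     => qr/[a-z]+/, sub { [ $_[0], uc $_[1] ] } ],  # new value
    [ SYM    => qr/./,      sub { [ $_[1], $_[1] ] } ],     # catch-all
);

# Hypothetical helper: split one line into [type, value] pairs.
sub tokenize_line {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    TOKEN: while (pos($text) < length($text)) {
        for my $rule (@rules) {
            my ($type, $re, $transform) = @$rule;
            # \G anchors the match where the previous one finished;
            # /c keeps pos() unchanged when a rule fails to match.
            next unless $text =~ /\G($re)/gc;
            my $value = $1;
            # Call the transform in list context: an empty list discards.
            my @result = $transform ? $transform->($type, $value)
                                    : [ $type, $value ];
            push @tokens, @result;
            next TOKEN;
        }
        die "cannot recognize input at offset " . pos($text) . "\n";
    }
    return @tokens;
}

print join(" ", map { "$_->[0]:$_->[1]" } tokenize_line("ab 12+c")), "\n";
# prints "ID:AB NUM:12 +:+ ID:C"
```

Without the "SYM" rule, the "+" in the input would reach the "die" statement, which mirrors the error behaviour described above.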

"clone" creates a copy of this tokenizer object without compiling a new lexing subroutine. The copied object has all pending input cleared.

"from" inserts the given input at the head of the tokenizer's input queue. The input is either a list of Asm::Preproc::Line objects, or an iterator function that returns an Asm::Preproc::Line object on each call.

The input list and iterator can also return plain scalar strings, which are converted to Asm::Preproc::Line on the fly, but the information on input file location for error messages will not be available.

The newly inserted input is processed before continuing with whatever was already in the queue.
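The queueing order described above can be sketched in core Perl as follows. This is not the module's implementation: "from_input" and "next_line" are hypothetical helpers, and plain strings stand in for Asm::Preproc::Line objects. New input is placed in front of the pending input, and an iterator is dropped from the queue once it returns "undef".

```perl
use strict;
use warnings;

# Pending input: plain strings or iterator code references.
my @queue;

# Hypothetical helper: insert new input at the head of the queue.
sub from_input { unshift @queue, @_; return }

# Hypothetical helper: fetch the next line, or undef when all is consumed.
sub next_line {
    while (@queue) {
        my $head = $queue[0];
        if (ref($head) eq 'CODE') {
            my $line = $head->();
            return $line if defined $line;
            shift @queue;            # iterator exhausted, drop it
        }
        else {
            return shift @queue;     # plain string
        }
    }
    return;
}

my @seen;
from_input("a", "b");
my $first = next_line();
push @seen, $first;

# Input inserted now is read before the pending "b".
my @included = ("x", "y");
from_input(sub { shift @included });
while (defined(my $line = next_line())) {
    push @seen, $line;
}
print "@seen\n";    # prints "a x y b"
```

This is the behaviour an include-style directive needs: the included lines are consumed first, and reading then resumes with the rest of the including file.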

"peek" peeks at the Nth element of the stream; inherited from Iterator::Simple::Lookahead.

"next" retrieves the next token from the input stream as an Asm::Preproc::Token object; inherited from Iterator::Simple::Lookahead.

See Asm::Preproc.
2019-03-03 perl v5.32.1
