|o||Token Nodes, which have a constant semantics, initialized when the token is read.|
|o||Rule Nodes, which have a dynamic semantics, based on action names and actions, as described below.|
|o||Null Nodes, which have a constant semantics, initialized when the recognizer is created.|
When a Marpa grammar is created, its dynamic semantics is specified indirectly, as <B>action names</B>. To implement its semantics of action names and actions, Marpa must do three things:
o Determine the action name.
o Resolve the action name to an action. An action is a Perl closure.
o Call the Perl closure to produce the actual result.
An action name and an action are also used to create the per-parse-tree variable, as described below. The per-parse-tree variable is a special case, but it is intended to be used as part of the semantics.
For every input token, there is an associated <B>token node</B>. Token nodes are leaf nodes in the parse tree. Tokens always have a <B>token symbol</B>. At lexing time, they can be assigned a <B>token value</B>. If no token value is assigned at lex time, the token value defaults to a Perl undef.
Nodes which are ancestors of token nodes are called <B>rule nodes</B>. Rule nodes are always associated with a rule. The value of a rule node is computed at Node Evaluation Time. Applications can specify, on a per-rule basis, Perl closures to evaluate rule nodes. The Perl closures which produce the value of rule nodes are called value actions.
A value action's arguments will be a per-parse-tree variable followed by the values of its child nodes in lexical order. The return value of a value action becomes the value of the node. A value action is always evaluated in scalar context. If there is no value action for a rule node, the value of the rule node is a Perl undef.
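The calling convention for a value action can be illustrated with a minimal sketch. It does not use Marpa itself: the rule, the name do_add and the hand-written call that stands in for Node Evaluation Time are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical value action for a rule such as: Sum ::= Sum '+' Term
# The first argument is the per-parse-tree variable; the remaining
# arguments are the values of the child nodes, in lexical order.
sub do_add {
    my ( $per_parse, $left, $plus, $right ) = @_;
    return $left + $right;
}

# Stand-in for Node Evaluation Time: the action is called in scalar
# context and its return value becomes the value of the rule node.
my $per_parse  = {};
my $node_value = do_add( $per_parse, 2, '+', 3 );
print "$node_value\n";    # prints 5
```

Because the right hand side of an ordinary rule has a fixed length, the action can unpack its arguments into named variables.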
Some rules are sequence rules. Sequence rule nodes are also rule nodes. Everything said above about rule nodes applies to sequence rule nodes. Specifically, the arguments to the value actions for sequence rules are the per-parse-tree variable followed by the values of the child nodes in lexical order.
The difference (and it is a big one) is that in an ordinary rule, the right hand side is fixed in length, and that length is known when you are writing the code for the value action. In a sequence rule, the number of right hand side symbols is not known until node evaluation time. The value action of a sequence rule node must be a Perl closure capable of dealing with a variable number of arguments.
Sequence semantics work best when every child node in the sequence has the same semantics. When that is not the case, writing the sequence using ordinary non-sequence rules should be considered as an alternative.
By default, if a sequence rule has separators, the separators are thrown away before the value action is called. This means that separators do not appear in the @_ array of the Perl closure which is the value action. If the value of the keep rule property is a Perl true value, separators are kept, and do appear in the value action's @_ array.
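As a rough sketch of what a sequence rule's value action sees, the following plain Perl code calls a hypothetical action do_list by hand, once with the child values as they would look with separators discarded and once as they would look with keep set. The input "a,b,c" and all names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical value action for a sequence rule. The number of child
# values is not known until the action is called, so it slurps them.
sub do_list {
    my ( $per_parse, @children ) = @_;
    return [@children];
}

# Simulated child values for the input "a,b,c".
my @without_separators = ( 'a', 'b', 'c' );              # default
my @with_separators    = ( 'a', ',', 'b', ',', 'c' );    # keep is true

my $dropped = do_list( {}, @without_separators );
my $kept    = do_list( {}, @with_separators );
print scalar( @{$kept} ), ' ', scalar( @{$dropped} ), "\n";    # prints "5 3"
```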
A <B>null node</B> is a node which derives the zero-length, or empty, string. By default, the value of a null node is a Perl undef. This is adequate for many applications, but Marpa allows other values to be specified for null nodes. In fact, Marpa allows an arbitrarily complex null semantics. Readers interested in null nodes with values other than undef, or who would like to read a more detailed account of how Marpa's null semantics works, should turn to the document on null semantics.
Most applications will find that the order in which Marpa executes its semantics just works. This section describes that order in detail. These details can matter in some applications, for example, those which exploit side effects.
When the semantics are applied to a parse tree, they produce a value called a <B>parse result</B>. Because Marpa allows ambiguous parsing, each parse can produce a <B>parse series</B>: a series of zero or more parse trees, each with its own parse result. The first call to the recognizer's value method after the recognizer is created is the start of the first parse series. The first parse series continues until there is a call to the reset_evaluation method or until the recognizer is destroyed. Usually, an application is only interested in a single parse series.
When the reset_evaluation method is called for a recognizer, it begins a new parse series. The new parse series continues until there is another call to the reset_evaluation method, or until the recognizer is destroyed.
While processing a recognizer, we have:
o A Recognizer Setup Phase, which occurs during the call of a recognizer's new method. It is followed by
o the processing of zero or more parse series.
While processing a parse series, we have:
o A Series Setup Phase, which occurs during the first call of the recognizer's value method for that series. It is followed by
o the processing of zero or more parse trees.
While processing a parse tree, we have:
o A Tree Setup Phase, which occurs during the call of the recognizer's value method for that parse tree. It is followed by
o a Tree Traversal Phase.
<B>Node Evaluation Time</B> is the Tree Traversal Phase, as seen from the point of view of each rule node. It is not a separate phase.
In the Recognizer Setup Phase, the null values of the symbols are determined. The null values of symbols never change; they are in effect properties of the grammar.
During the Series Setup Phase all value action names are resolved to Perl closures, the value actions. The value actions are never called in the Series Setup Phase. Value actions are called in the Tree Traversal Phase. Also during the Series Setup Phase, the logic which ranks parse trees is executed.
In the Tree Setup Phase, the per-parse-tree variable is created. If a constructor was found for the action_object, it is run at this point, and the per-parse-tree variable is its return value. Exactly one Tree Setup Phase occurs for each parse tree.
During the Tree Traversal Phase, the value actions are called. Node Evaluation Time is the Tree Traversal Phase, as seen from the point of view of the individual nodes of the parse tree. If a node has a value action, it is called at Node Evaluation Time.
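The nesting of the phases can be pictured with a small simulation. This is only a sketch of the ordering described in this section, not Marpa's code: run_recognizer is a hypothetical driver that records phase names, and the number of parse trees in each series is supplied by hand.

```perl
use strict;
use warnings;

my @log;

# Hypothetical driver that only records the order of the phases.
# Its argument is an array ref giving the number of parse trees
# in each parse series.
sub run_recognizer {
    my ($trees_per_series) = @_;
    push @log, 'recognizer setup';    # once per recognizer
    for my $tree_count ( @{$trees_per_series} ) {
        push @log, 'series setup';    # once per parse series
        for ( 1 .. $tree_count ) {
            push @log, 'tree setup';        # once per parse tree
            push @log, 'tree traversal';    # value actions run here
        }
    }
    return;
}

# Two parse series: the first with two parse trees, the second with one.
run_recognizer( [ 2, 1 ] );
print join( "\n", @log ), "\n";
```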
Marpa finds the action for each rule based on rule and symbol properties and on the grammar's named arguments. Specifically, Marpa attempts the following, in order:
o Resolving an action based on the action property of the rule, if one is defined.
o Resolving an action based on the lhs property of the rule.
o Resolving an action based on the default_action named argument of the grammar, if one is defined.
o Defaulting to a virtual action, one which returns a Perl undef.
Resolution of action names is described below. If the action property or the default_action named argument is defined, but does not resolve successfully, Marpa throws an exception. Marpa prefers to fail fast in these cases, because they often indicate a mistake in writing the application.
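The order in which an action is sought for a rule can be sketched in plain Perl. This is an illustration under stated assumptions, not Marpa's implementation: the rule and grammar are modeled as simple hashes, and resolve is a hypothetical stand-in for action name resolution that looks names up in a fixed table.

```perl
use strict;
use warnings;

# Hypothetical stand-in for action name resolution: returns a closure
# for a known name, or undef if the name does not resolve.
my %known = (
    'do_add' => sub { return 'from action property' },
    'Sum'    => sub { return 'from lhs property' },
);
sub resolve { my ($name) = @_; return $known{$name}; }

# Sketch of the lookup order for one rule.
sub find_action {
    my ( $rule, $grammar ) = @_;
    if ( defined $rule->{action} ) {
        my $action = resolve( $rule->{action} );
        die "Cannot resolve action $rule->{action}\n" if not defined $action;
        return $action;
    }
    my $by_lhs = resolve( $rule->{lhs} );    # failure here is not fatal
    return $by_lhs if defined $by_lhs;
    if ( defined $grammar->{default_action} ) {
        my $action = resolve( $grammar->{default_action} );
        die "Cannot resolve default_action $grammar->{default_action}\n"
            if not defined $action;
        return $action;
    }
    return sub { return undef };    # virtual action
}

print find_action( { lhs => 'Sum' }, {} )->(), "\n";    # prints "from lhs property"
```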
Action names come from these sources:
o The default_action named argument of Marpa's grammar.
o The action property of Marpa's rules.
o The new constructor in the package specified by the action_object named argument of the Marpa grammar.
o The lhs property of Marpa's rules.
The recognizer's closures named argument allows the user to directly control the mapping from action names to actions. The value of the closures named argument is a reference to a hash whose keys are action names and whose hash values are CODE refs.
If an action name is the key of an entry in the closures hash, it resolves to the closure referenced by the value part of that hash entry. Resolution via the closures named argument is called <B>explicit resolution</B>.
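The shape of a closures hash, and the lookup that explicit resolution amounts to, can be sketched as follows. The hash contents, the action name "!count" and the helper resolve_explicitly are hypothetical; only the idea of a hash from action names to CODE refs comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical closures hash: keys are action names, values are CODE refs.
my %closures = (
    '!count' => sub {
        my ( $per_parse, @children ) = @_;
        return scalar @children;
    },
);

# Sketch of explicit resolution: the name resolves only if it is a key
# of the closures hash. Otherwise undef is returned and other kinds of
# resolution would be tried.
sub resolve_explicitly {
    my ( $action_name, $closures ) = @_;
    my $closure = $closures->{$action_name};
    return ref $closure eq 'CODE' ? $closure : undef;
}

my $action = resolve_explicitly( '!count', \%closures );
print $action->( {}, 'a', 'b', 'c' ), "\n";    # prints 3
```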
When explicit resolution is the only kind of resolution that is wanted, it is best to pick a name that is very unlikely to be the name of a Perl closure. Many of Marpa::HTML's action names are intended for explicit resolution only. In Marpa::HTML those action names begin with an exclamation mark ("!"), and that convention is recommended.
If explicit resolution fails, Marpa transforms the action name into a <B>fully qualified</B> Perl name. An action name that contains a double colon ("::") or a single quote ("'") is considered to be a fully qualified name. Any other action name is considered to be a <B>bare action name</B>.
If the action name to be resolved is already a fully qualified name, it is not further transformed. It will be resolved in the form it was received, or not at all.
For bare action names, Marpa tries to qualify them by adding a package name. If the actions grammar named argument is defined, Marpa uses it as the package name. Otherwise, if the action_object grammar named argument is defined, Marpa uses it as the package name. Once Marpa has fully qualified the action name, Marpa looks for a Perl closure with that name.
Marpa will not attempt to resolve an action name that it cannot fully qualify. This implies that, for an action name to resolve successfully, one of these four things must be the case:
o The action name resolves explicitly.
o The action name is fully qualified to begin with.
o The actions named argument is defined.
o The action_object named argument is defined.
In all but one circumstance, failure to resolve an action name is thrown as an exception. Marpa is more lenient when a rule attempts to use the lhs rule property as an action name. That is the one case in which Marpa will look at other alternatives.
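The qualification rules in this section can be sketched in plain Perl. This is a simplified illustration, not Marpa's code: the grammar's named arguments are modeled as a hash, and the package My_Actions and both helper functions are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical semantic closure living in its own namespace.
sub My_Actions::do_add { return 'added' }

# Sketch: turn an action name into a fully qualified Perl name, or
# return undef when there is no package to qualify it with.
sub qualify_action_name {
    my ( $action_name, $grammar_args ) = @_;
    return $action_name if $action_name =~ /::|'/;    # already fully qualified
    my $package = $grammar_args->{actions};
    $package = $grammar_args->{action_object} if not defined $package;
    return undef if not defined $package;
    return $package . q{::} . $action_name;
}

# Look for a Perl closure with the fully qualified name.
sub closure_by_name {
    my ($qualified_name) = @_;
    no strict 'refs';
    return undef if not defined &{$qualified_name};
    return \&{$qualified_name};
}

my $name = qualify_action_name( 'do_add', { actions => 'My_Actions' } );
print "$name\n";    # prints My_Actions::do_add
```

Setting the hypothetical actions entry to "main" in this sketch would make bare names resolve in the main namespace.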
Marpa's philosophy is to require that the programmer be specific about action names. This can be an inconvenience, but Marpa prefers this to silently executing unintended code.
Generally it is a good practice to keep the semantic Perl closures in their own namespace. But if, for example, the user wants to leave the semantic closures in the main namespace, she can specify "main" as the value of the actions named argument.
In the Tree Setup Phase, Marpa creates a per-parse-tree variable. This becomes the first argument of the semantic Perl closures for the rule nodes. If the grammar's action_object named argument is not defined, the per-parse-tree variable is initialized to an empty hash ref.
Most data for the value actions of the rules will be passed up the parse tree. The actions will see the values of the rule node's child nodes as arguments, and will return their own value to be seen as an argument by their parent node. The per-parse-tree variable can be used for data which does not conveniently fit this model.
The lifetime of the per-parse-tree variable extends into the Tree Traversal Phase. During the Tree Traversal Phase, Marpa's internals never alter the per-parse-tree variable; it is reserved for use by the application.
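One possible use of the per-parse-tree variable is to collect data that does not flow naturally up the tree. The sketch below gathers warnings as a side effect while the node values are computed. The actions do_number and do_add, the warnings key and the hand-written calls that stand in for tree traversal are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical value action that records a warning in the
# per-parse-tree variable and returns the numeric value.
sub do_number {
    my ( $per_parse, $digits ) = @_;
    push @{ $per_parse->{warnings} }, "leading zero in $digits"
        if $digits =~ /\A0\d/;
    return $digits + 0;
}

# Hypothetical value action that only passes data up the tree.
sub do_add {
    my ( $per_parse, $left, $plus, $right ) = @_;
    return $left + $right;
}

# Stand-in for one Tree Setup Phase followed by a Tree Traversal Phase.
my $per_parse = {};    # the default when action_object is not defined
my $value     = do_add(
    $per_parse,
    do_number( $per_parse, '07' ),
    '+',
    do_number( $per_parse, '5' ),
);
print "$value\n";    # prints 12
```

After the traversal, the application can inspect the collected warnings in the per-parse-tree variable, independently of the parse result.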
If the grammar's action_object named argument has a defined value, that value is treated as the name of a class. The <B>action object constructor</B> is the new method in the action_object class.
The action object constructor is called in the Tree Setup Phase. The return value of the action object constructor becomes the per-parse-tree variable. It is a fatal error if the grammar's action_object named argument is defined, but does not name a class with a new method.
Resolution of the action object constructor is resolution of the <B>action object constructor name</B>. The action object constructor name is formed by concatenating the literal string "::new" to the value of the action_object named argument.
All standard rules apply when resolving the action object constructor name. In particular, bypass via explicit resolution applies to the action object constructor. If the action object constructor name is a hash key in the recognizer's closures named argument, then the value referred to by that hash entry becomes the action object constructor.
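How the per-parse-tree variable might be obtained from the action object constructor can be sketched as follows. This is not Marpa's implementation: the class My_Actions and the function make_per_parse_variable are hypothetical, and passing the class name as the constructor's first argument is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical action_object class with a constructor.
package My_Actions;
sub new { my ($class) = @_; return bless { count => 0 }, $class; }

package main;

# Sketch of the Tree Setup Phase: form the constructor name, try
# explicit resolution first, then look for a Perl closure of that name.
sub make_per_parse_variable {
    my ( $action_object, $closures ) = @_;
    $closures = {} if not defined $closures;
    return {} if not defined $action_object;    # default: empty hash ref
    my $constructor_name = $action_object . '::new';
    my $constructor      = $closures->{$constructor_name};
    if ( not defined $constructor ) {
        no strict 'refs';
        die "No constructor $constructor_name\n"
            if not defined &{$constructor_name};
        $constructor = \&{$constructor_name};
    }
    # Assumption: the class name is passed as the first argument.
    return $constructor->($action_object);
}

my $per_parse = make_per_parse_variable( 'My_Actions', {} );
print ref($per_parse), "\n";    # prints My_Actions
```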
If a grammar has both the actions and the action_object named arguments defined, all action names <B>except</B> for the action object constructor will be resolved in the actions package or not at all. The action object constructor is always in the action_object class, if it is anywhere.
If a parse is ambiguous, all parses are returned, with no duplication. By default, the order is arbitrary, but it is also possible to control the order. Details are in the document on parse order.
Marpa allows grammars with infinite loops. These are universally considered useless in practical applications. Marpa supports all the semantics for these that should be necessary and more. Because it can handle infinite loops, Marpa can accurately claim to support <B>every grammar</B> that can be written in BNF.
Marpa applies semantics to infinite loops by breaking the loop at some convenient point, so that an infinite regress is prevented. The exact location at which the loop is broken varies among Marpa implementations. If an infinite loop is given a null semantics, the location of the break will not matter. A null semantics is the default.
More could be done. In particular, a precise definition of where loops are broken would allow applications to depend on the details of the structure of infinite loops. But practical interest in this seems non-existent. For those who want to know more, the details are in a separate document.
Copyright 2012 Jeffrey Kegler This file is part of Marpa::XS. Marpa::XS is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Marpa::XS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with Marpa::XS. If not, see http://www.gnu.org/licenses/.
|perl v5.20.3||MARPA::XS::SEMANTICS (3)||2016-04-05|