A symbol is <B>nulledB> if it produces the empty sentence.
<B>Nulling symbolsB> are those which are *always* nulled.
<B>Nullable symbolsB> are those which are *sometimes* nulled.
<B>Non-nullable symbolsB> are those which are *never* nulled.
All nulling symbols are also nullable symbols.
A <B>proper nullableB> is any nullable symbol which is not a nulling symbol.
In other words,
a proper nullable is a symbol that
might or might not be nulled.

Nullable symbols were a problem in previous versions of Earley parsers.
Aycock and Horspool 2002
outlined a new approach for dealing with them.
I use their ideas with modifications of my own.

Marpa rewrites its grammar to eliminate proper nullables.
It does this by turning every proper nullable into two symbols:
a non-nullable variant and a nulling variant.
The non-nullable variant keeps the original symbol’s name,
but is no longer allowed to appear in places
where it might be nulled.

The name of the nulling variant
is that of the original symbol with the nulling tag ("`[]`") suffixed.
When the nulling variant is used,
it must be nulled.

The newly introduced nulling symbols will not appear on any left hand sides,
with one exception:
grammars that allow a null parse
will have a nulling start rule.
Except for the nulling start symbol,
Marpa marks nulling symbols internally and
recognizes them directly,
without the need for empty rules.

Rules with proper nullables on the RHS
have to be replaced with
new rules covering every possible combination of
the non-nullable and nulling variants.
That rewrite is called the CHAF rewrite.

Here’s an example of a CHAF rewrite:
`
`

`
{ lhs => statement,
rhs => [
qw/optional_whitespace expression
optional_whitespace optional_modifier
optional_whitespace/
]
}
`

This rule contains four instances
of proper nullables,
illustrating my fear
that grammars of practical interest will
have lots of proper nullables
on right hand sides.
`optional_whitespace`
and `optional_modifier` are both
proper nullables and
`optional_whitespace`
appears three times.

Here’s is the output from `show_rules`, showing what Marpa does with this rule:

`
`

`
0: statement -> optional_whitespace expression optional_whitespace optional_modifier optional_whitespace
/* !used */
`

`
`

`
15: statement -> optional_whitespace expression statement[R0:2] /* vrhs real=2 */
16: statement -> optional_whitespace expression optional_whitespace[] optional_modifier[] optional_whitespace[]
17: statement -> optional_whitespace[] expression statement[R0:2] /* vrhs real=2 */
18: statement -> optional_whitespace[] expression optional_whitespace[] optional_modifier[] optional_whitespace[]
19: statement[R0:2] -> optional_whitespace statement[R0:3] /* vlhs vrhs real=1 */
20: statement[R0:2] -> optional_whitespace optional_modifier[] optional_whitespace[] /* vlhs real=3 */
21: statement[R0:2] -> optional_whitespace[] statement[R0:3] /* vlhs vrhs real=1 */
22: statement[R0:3] -> optional_modifier optional_whitespace /* vlhs real=2 */
23: statement[R0:3] -> optional_modifier optional_whitespace[] /* vlhs real=2 */
24: statement[R0:3] -> optional_modifier[] optional_whitespace /* vlhs real=2 */
`

Rule 0 is the original rule. Because Marpa has rewritten it,
the rule is marked "`!used`",
telling later stages in the precomputation to ignore it.
Marpa breaks Rule 0 up into three pieces,
each with no more than two proper nullables.
Marpa then eliminates the proper nullables in each piece by
<B>factoringB>.

To <B>factorB> a piece, Marpa first rewrites it into multiple rules,
one for each possible combination of nulled and non-nulled symbols.
Marpa then replaces each proper nullable
with its nulling or non-nullable variant, as appropriate.
There are two symbol variants for each proper nullable.
With a maximum of two proper nullables for each piece,
each with two variants, there are at most four combinations.
A rule for one of these combinations is called a
<B>factored ruleB>, or a <B>factorB>.

In the example in the above display,
the original rule (Rule 0) was broken into three pieces.
Rule 0 had 5 RHS symbols,
and the three pieces contain, respectively, the first two RHS symbols;
the third RHS symbol;
and the last two (that is, 4th and 5th) RHS symbols.

The first piece of Rule 0 is factored into four rules: Rules 15 to 18.
The second piece is factored into three rules: Rules 19 to 21.
The third and last piece of Rule 0 is also factored into three rules:
Rules 22 to 24.

When a rule is broken into pieces, the
left hand side of the first piece is the left hand
side of the original rule.
New symbols are introduced to be the left hand sides
of the other pieces.
The names of the new LHS symbols are formed
by suffixing a tag to
the name of the original left hand side.
The suffixed tag begins with a capital R and identifies the
location of the piece in the original rule.
For example,
the tag "`[R0:3]`" indicates that this
symbol is the left hand side for the piece of Rule 0 which begins at right hand symbol 3
(the first symbol is symbol 0).

When a new LHS symbol is created for a piece,
the worst case is that the new LHS is
also a proper nullable.
This worst case occurs when
all the original symbols in the
piece for the new LHS
and all symbols in all subsequent pieces are proper nullables.

A newly created LHS symbol will always appear on the RHS of the previous piece.
If the newly created symbol is a proper nullable,
then it counts against the limit of two proper nullables for that previous piece,
and it must be factored,
just the same as if it was
one of the proper nullables appearing in the original rule.

Rule 0 is such a worst case.
The last three symbols
of the right hand side are all
proper nullables.
That means that all the symbols in the last two pieces of the original rule
are proper nullables.
Therefore both of the newly created LHS symbols are proper nullables.

The original Rule 0 has 4 proper nullables.
After the CHAF rewrite,
there are 6 proper nullables:
the original 4 plus the 2 symbols newly created to serve as left hand sides.
This is why, in order to have at most 2 proper nullables per piece,
the original rule must to be divided into 3 pieces.

CHAF preserves user semantics.
Marpa,
when it splits the rule into
pieces and factors the pieces,
inserts logic to gather and preserve the values of child nodes.
The user’s semantic actions
see the original child values
as if the CHAF rewrite had never occurred.

Internally, Marpa converts productions specified as sequences into BNF productions.
The conversion is done in a standard way. For example,
`
`

`
{
lhs => statements,
rhs => [qw/statement/],
separator => comma,
min => 1
}
`

becomes

`
`

`
2: statements -> statements[Subseq:8:5] /* vrhs real=0 */
3: statements -> statements[Subseq:8:5] comma /* vrhs real=1 */
4: statements[Subseq:8:5] -> statement /* vlhs real=1 */
5: statements[Subseq:8:5] -> statements[Subseq:8:5] comma statement /* vlhs vrhs real=2 */
`

In the added symbol, the tag "`[Subseq:8:5]` indicates a special
symbol introduced to serve as the LHS of the sequence.
The `8:5` indentifies the sequence using the symbol ID’s of the two
symbols involved.
`statements`, the LHS symbol, has symbol ID 8.
`statement`", the RHS symbol, has symbol ID 5.

Here’s another example, this time of a sequence without a separator.
The rule

`
`

`
{ lhs => block, rhs => [qw/statements/], min => 0 },
`

is rewritten by Marpa as follows:

`
`

`
7: block -> /* empty !used */
8: block -> block[Subseq:0:8] /* vrhs real=0 */
9: block[Subseq:0:8] -> statements /* vlhs real=1 */
10: block[Subseq:0:8] -> block[Subseq:0:8] statements /* vlhs vrhs real=1 */
`

Note that rule 7, the empty rule with the `block` symbol as its LHS,
is marked "`!used`".
`block` is a proper nullable, and rules from sequence conversion
undergo the same rewriting of proper nullables as any other rules.
In this case a nulling variant of `block` (`block[]`)
was created.
That made the empty rule created by the sequence conversion
useless,
and that is why it was marked "`!used`".