Here-docs are incredibly handy when writing Perl, but incredibly tricky
when parsing it, primarily because they don't follow the general flow of input.
They jump ahead and nab lines directly off the input buffer. Whitespace
and newlines may not matter in most Perl code, but they matter in here-docs.
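To make the "jump ahead" behaviour concrete, here is a minimal core-Perl sketch. It is not PPI's actual tokenizer. The function name split_heredocs is hypothetical, and it only recognises <<"TERM", <<'TERM' and bare <<TERM declarations. It does not handle <<~ indented here-docs or here-doc markers inside strings and comments. When a line declares a here-doc, the sketch shifts the following lines straight off the remaining input buffer until it sees the terminator, so the rest of the parse never sees them.

```perl
use strict;
use warnings;

# Hypothetical sketch, not PPI's tokenizer: split source into the lines
# that normal parsing sees and the here-doc bodies taken off the buffer.
sub split_heredocs {
    my ($source) = @_;
    my @buffer = split /^/, $source;    # one element per line, newlines kept
    my @code;                           # lines left for "normal" parsing
    my @bodies;                         # captured here-doc bodies, in order

    while (defined(my $line = shift @buffer)) {
        push @code, $line;

        # Each here-doc declared on this line consumes the next lines of
        # input, in declaration order, up to its terminator line.
        while ($line =~ /<<(["']?)([A-Za-z_]\w*)\1/g) {
            my $term = $2;
            my @body;
            while (defined(my $next = shift @buffer)) {
                last if $next =~ /\A\Q$term\E\r?\n?\z/;
                push @body, $next;
            }
            push @bodies, { terminator => $term, lines => \@body };
        }
    }
    return (\@code, \@bodies);
}
```

For example, given my $x = <<"END" . "tail"; followed by two body lines and END, the body and terminator lines are removed from the code stream. The rest of the statement, . "tail";, stays on the declaring line, even though the here-doc body sits below it.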
They are also tricky to store as an object. They look sort of like an
operator and a string, but they don't act like it. And they have a second
section that should be something like a separate token, but isn't, because a
string can span from above the here-doc content to below it.
So when parsing, this is what we do.
Firstly, the PPI::Token::HereDoc object does not represent the <<
operator, or the END_FLAG, or the content, or even the terminator.
It represents all of them at once.
The token itself has only the declaration part as its content.
  # This is what the content of a HereDoc token is
  <<FOO

  # Or this
  <<"FOO"

  # Or even this
  <<      'FOO'
That is, the operator, any whitespace separator, and the quoted or bare
terminator. So when you call the content method on a HereDoc token, you
get only the declaration, for example << "FOO".
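The three pieces of a declaration (operator, whitespace separator, quoted or bare terminator) can be pulled apart with a single regular expression. parse_declaration below is a hypothetical helper for illustration only; PPI does its own parsing internally. It ignores <<~ indented here-docs and <<`...` command here-docs, and it rejects a bare terminator separated from << by whitespace, since Perl requires a bare terminator to follow << directly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PPI): split a here-doc declaration into
# operator, whitespace separator, optional quote character and terminator.
# Not handled: <<~ indented here-docs and <<`...` command here-docs.
sub parse_declaration {
    my ($decl) = @_;
    $decl =~ /\A(<<)(\s*)(?:"([^"]*)"|'([^']*)'|([A-Za-z_]\w*))\z/
        or return;
    my ($op, $ws) = ($1, $2);
    my ($quote, $term) =
          defined $3 ? ('"', $3)
        : defined $4 ? ("'", $4)
        :              ('',  $5);

    # A bare terminator has to follow << directly, with no whitespace
    return if $quote eq '' && length $ws;

    return {
        operator   => $op,
        whitespace => $ws,
        quote      => $quote,
        terminator => $term,
    };
}
```

All three declarations in the examples above parse, and in each case the string passed in is exactly what the token's content would be.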
As far as the content method is concerned, the here-doc body and the
terminator line do not exist.
The body lines are available through the heredoc method, and the name of
the terminator through the terminator method.
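A rough model of how one token can carry more than its content: the declaration is the content, while the body and terminator are stored alongside it. Sketch::HereDoc is a hypothetical stand-in that borrows the content, heredoc and terminator method names from the text. It is not PPI::Token::HereDoc's implementation, and it assumes the body is kept as a list of newline-terminated lines.

```perl
use strict;
use warnings;

package Sketch::HereDoc;    # hypothetical stand-in, not PPI::Token::HereDoc

sub new {
    my ($class, %args) = @_;
    return bless {
        content    => $args{content},                   # e.g. '<<"FOO"'
        heredoc    => [ @{ $args{heredoc} || [] } ],    # body lines, newlines kept
        terminator => $args{terminator},                # e.g. 'FOO'
    }, $class;
}

# Only the declaration counts as the token's content
sub content    { $_[0]{content} }

# The body is held on the side and returned as a list of lines
sub heredoc    { @{ $_[0]{heredoc} } }

# Just the terminator's name, without quotes or newline
sub terminator { $_[0]{terminator} }

package main;
```

With this split, anything that only looks at content (length, location, round-tripping a single line) sees a short declaration such as <<"FOO", and the body has to be asked for explicitly.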
To make things work in the way you expect, PPI has to play some games
when doing line/column location calculation for tokens, and also during
the content parsing and generation processes.
Documents cannot simply be recreated by stitching together the token
contents; they require a somewhat more expensive procedure, but the extra
expense should be negligible unless you are processing huge quantities
of them.
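Why naive stitching fails, and what the extra work involves, can be shown with a hypothetical token list made of plain hashrefs rather than PPI objects. naive_join concatenates contents and silently drops the body. serialize holds back any here-doc seen on the current line and writes its body and terminator right after the next newline. As simplifications, it always ends the terminator line with "\n" and ignores here-docs still pending at end of input. The same deferred-newline bookkeeping is what shifts the line numbers of every token after a here-doc.

```perl
use strict;
use warnings;

# Hypothetical token format (not PPI's): each token is a hashref with
# 'content'; here-doc tokens also carry 'heredoc' (array of body lines)
# and 'terminator'.

# Stitching contents together loses every here-doc body
sub naive_join { join '', map { $_->{content} } @_ }

sub serialize {
    my @pending;    # here-docs declared on the current line
    my $out = '';
    for my $token (@_) {
        my $text = $token->{content};
        if (@pending && $text =~ /\A([^\n]*\n)(.*)\z/s) {
            my ($head, $tail) = ($1, $2);
            $out .= $head;    # finish the declaring line first
            # Then emit each pending body followed by its terminator line
            $out .= join('', @{ $_->{heredoc} }, "$_->{terminator}\n")
                for @pending;
            @pending = ();
            $out .= $tail;
        }
        else {
            $out .= $text;
        }
        push @pending, $token if exists $token->{heredoc};
    }
    return $out;
}
```

For print <<"END"; exit;, split into tokens with the body "hi\n" attached to the here-doc token, naive_join yields the two code lines only. serialize yields the declaring line, then hi, then END, then exit;.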
Please note that due to the immature nature of PPI in general, we expect
HereDocs to be a rich (bad) source of corner-case bugs for quite a while,
but for the most part they should more or less DWYM.