NOTE: this filter is primarily for development and testing; it is not
designed for large datasets (it can use a lot of RAM if fed enough data).
It is useful with the revml: destination to get RevML output in a
desired order. Otherwise, the sorting built into the change
aggregator should suffice.
The default sort spec is name,rev_id, which is handy for VCP's
test suite: it puts all revisions in a predictable order so that
the output RevML can be compared to the input RevML.
NOTE: this is primarily for development use; not all fields may work
correctly. All plain string fields should work, as should name,
rev_id, change_id, and their source_... equivalents (which are parsed and
compared piecewise), and time and mod_time (which are stored as …).
Plain case-sensitive string comparison is used for all fields other than
those mentioned in the preceding paragraphs.
This sort may be slow for extremely large data sets: it sorts by
comparing revs to each other field by field instead of by generating
indexes, and VCP::Rev is not designed to be fast when accessing
fields one at a time. This can be changed if need be.
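The field-by-field comparison described above, driven by a sort spec such as the default name,rev_id, can be sketched in Python as follows. This is an illustration only: VCP itself is written in Perl, and `Rev`, `SPEC`, and `compare_revs` are hypothetical stand-ins rather than VCP's API. For brevity, every field here is compared as a plain string, which is a simplification of the piecewise rev_id handling described later.

```python
from dataclasses import dataclass
from functools import cmp_to_key

# Hypothetical stand-in for VCP::Rev; only the default spec's fields are modeled.
@dataclass
class Rev:
    name: str
    rev_id: str

SPEC = ("name", "rev_id")  # mirrors the default sort spec name,rev_id

def _cmp(a, b):
    """Three-way comparison returning -1, 0, or 1."""
    return (a > b) - (a < b)

def compare_revs(x, y, spec=SPEC):
    """Compare two revs field by field, in sort-spec order.

    The first field that differs decides the order; later fields are
    consulted only on a tie. Fields are fetched anew on every comparison.
    """
    for field in spec:
        result = _cmp(getattr(x, field), getattr(y, field))
        if result:
            return result
    return 0

revs = [Rev("b.c", "2"), Rev("a.c", "3"), Rev("a.c", "1")]
ordered = sorted(revs, key=cmp_to_key(compare_revs))

# An index-style alternative: build each rev's key once, then sort on it.
by_key = sorted(revs, key=lambda r: tuple(getattr(r, f) for f in SPEC))
```

Both orderings agree; the key-based version fetches each field once per rev instead of once per comparison, which is the kind of change the paragraph above leaves open.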
Each change_id or rev_id is split into segments
suitable for sorting.
The splits occur at the following points:
1. Before and after each substring of consecutive digits
2. Before and after each substring of consecutive letters
3. Before and after each non-alphanumeric character
The substrings are greedy: each is as long as possible, and non-alphanumeric
characters are discarded. So 11..22aa33 is split into 5 segments
(the empty segment is what lies between the two dots):
( 11, "", 22, "aa", 33 ).
If a segment is numeric, it is left padded with 10 NUL characters.
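A minimal Python sketch of the splitting and padding rules above follows. The names `split_segments` and `pad_segment` are hypothetical, letters are assumed to be ASCII, and "left padded with 10 NUL characters" is read here as padding each numeric segment to a width of 10 with leading NULs, so that shorter numbers sort first under plain string comparison. That reading is an assumption, not a description of VCP's code.

```python
import re

def split_segments(rev_id):
    """Split a rev_id or change_id into sortable segments.

    Each non-alphanumeric character acts as a separator and is discarded;
    the text between separators is further split into greedy runs of
    digits and runs of (ASCII) letters. The empty text between two
    adjacent separators is kept as an empty segment.
    """
    segments = []
    for piece in re.split(r"[^A-Za-z0-9]", rev_id):
        if piece == "":
            segments.append("")
        else:
            segments.extend(re.findall(r"[0-9]+|[A-Za-z]+", piece))
    return segments

def pad_segment(segment, width=10):
    """Left-pad a numeric segment with NULs (assumed fixed width of 10)."""
    return segment.rjust(width, "\0") if segment.isdigit() else segment

print(split_segments("11..22aa33"))  # ['11', '', '22', 'aa', '33']
```

Numbers longer than ten digits are left unpadded by `rjust` and would no longer compare correctly in this sketch.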
This algorithm treats 1.52 as revision 1, minor revision 52, not as
the floating-point number 1.52. So the following sort order is maintained:
The substring "pre" might be treated specially at some point.
(At least) the following cases are not handled by this algorithm:
1. floating-point rev_ids: 1.0, 1.1, 1.11, 1.12, 1.2
2. letters as "prereleases": 1.0a, 1.0b, 1.0, 1.1a, 1.1
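Under the same assumptions as before (ASCII letters, numeric segments padded to ten characters with leading NULs), a sort key built from the padded segments reproduces the revision-style ordering described above and shows how both unhandled cases come out. `sort_key` is a hypothetical helper, not part of VCP.

```python
import re

def sort_key(rev_id):
    """Build a sort key: split into segments, NUL-pad the numeric ones."""
    segments = []
    for piece in re.split(r"[^A-Za-z0-9]", rev_id):
        # An empty piece (between adjacent separators) stays an empty segment.
        parts = re.findall(r"[0-9]+|[A-Za-z]+", piece) if piece else [""]
        segments.extend(p.rjust(10, "\0") if p.isdigit() else p for p in parts)
    return tuple(segments)

# 1.52 is revision 1, minor revision 52, so it sorts after 1.6 and 1.10.
print(sorted(["1.52", "1.6", "1.10"], key=sort_key))
# ['1.6', '1.10', '1.52']

# Case 1: a floating-point reading is not honored; 1.2 sorts before 1.11.
print(sorted(["1.0", "1.1", "1.11", "1.12", "1.2"], key=sort_key))
# ['1.0', '1.1', '1.2', '1.11', '1.12']

# Case 2: "prerelease" letters sort after the bare release, not before it.
print(sorted(["1.0a", "1.0b", "1.0", "1.1a", "1.1"], key=sort_key))
# ['1.0', '1.0a', '1.0b', '1.1', '1.1a']
```

In case 2, 1.0 yields the segments (1, 0) and 1.0a yields (1, 0, a); a key that is a prefix of another sorts first, which is why the letters end up after the release.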