Manual Reference Pages - GENEZZO::UTIL (3)
PackRow2 takes a list of items and packs them (non-destructively) into a
string of <= maxsize bytes. If offset is not specified, it builds the
string starting with the last item in the list, prepending it with
each preceding item until it runs out of space or the list is fully
consumed. If the packer runs out of space, it returns the offset into
the list where it stopped. The offset may be supplied as an argument
to this function, and the packer will pack the remainder of the list
starting at the offset, working back to the beginning of the list.
The final argument to the packer is a next pointer, a string that
identifies the location of the next part of a row split into multiple
pieces. Since the packer processes a list from back to front, the
address of the next piece can be obtained before constructing the
preceding piece. If the packer can process a complete list, it
returns an array containing a single packed string, a byte string
consisting of a count of the number of packed items, followed by
length/value pairs for each item. If the packer runs out of space, it
returns an array of the packed string and the offset of the remaining
items.
For example, given the list @a = qw(alpha bravo charlie delta), and a
maxsize=15, PackRow2 returns a packed string (something like
\x01\x05delta) and the offset 3, indicating that the last item in the
list was processed, and the packer ran out of space at the third item.
The packed string could be stored in a pushhash, which would return an
index, e.g. 5/2, suitable for a next pointer. Packing the remainder
of the list generates another packed string
(e.g. \x02\x07charlie\x035/2) and the offset 2. The packing and storage
process continues until the entire list is consumed.
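
The back-to-front loop described above can be sketched in a few lines.
The following Python model is an illustration only, not the
Genezzo::Util implementation: the name pack_row, its signature, and the
one-byte count and length fields are assumptions, and the null
bitstring is left out. Because of that omission its sizes differ from
the real packer, so maxsize is set to 13 here to reproduce the split
points of the example.

```python
def pack_row(items, maxsize, offset=None, next_ptr=None):
    """Hypothetical model of back-to-front packing (not the Genezzo API).

    Packs items[:offset] starting from the last one. Returns (packed, None)
    when the list is fully consumed, else (packed, remaining_offset).
    """
    end = len(items) if offset is None else offset
    # Assumption: the next pointer travels as the final length/value pair
    # and is reserved first, so it is never split.
    tail = [] if next_ptr is None else [next_ptr.encode()]
    body = b"".join(bytes([len(t)]) + t for t in tail)
    count = len(tail)
    size = 1 + len(body)          # one count byte plus the reserved pointer
    i = end
    while i > 0:
        raw = items[i - 1].encode()
        if len(raw) > 255:
            raise ValueError("this model uses one-byte lengths")
        need = 1 + len(raw)       # length byte plus value
        if size + need > maxsize:
            break                 # out of space: stop before this item
        body = bytes([len(raw)]) + raw + body   # prepend the preceding item
        size += need
        count += 1
        i -= 1
    if i == end and end > 0:
        raise ValueError("maxsize too small for the next item")
    packed = bytes([count]) + body
    return (packed, None) if i == 0 else (packed, i)


row = ["alpha", "bravo", "charlie", "delta"]
first, off = pack_row(row, 13)                       # packs only delta
second, off2 = pack_row(row, 13, offset=off, next_ptr="5/2")
print(first, off)
print(second, off2)
```

In this model the first call yields the delta piece and offset 3. The
second call, given the hypothetical pointer 5/2, yields the charlie
piece followed by the pointer, and offset 2.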
The packed string always contains a bitstring to identify null
columns, which is used by UnPackRow to correctly distinguish between
nulls and zero length strings.
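
A small model shows why a null bitstring is needed: a null and a
zero-length string both pack to a zero-length value, and only the bit
tells them apart. This Python sketch assumes a layout of one count
byte, a bitmap of one bit per column, then length/value pairs. The
layout and the names pack_nullable and unpack_nullable are assumptions
for illustration, not the format PackRow and UnPackRow actually use.

```python
def pack_nullable(items):
    """Pack strings and None values; None is marked in a null bitmap."""
    n = len(items)
    if n > 255:
        raise ValueError("this model uses a one-byte item count")
    bitmap = bytearray((n + 7) // 8)      # one bit per column
    body = bytearray()
    for i, item in enumerate(items):
        if item is None:
            bitmap[i // 8] |= 1 << (i % 8)   # set the null bit
            body.append(0)                   # null stored as zero length
        else:
            raw = item.encode()
            if len(raw) > 255:
                raise ValueError("this model uses one-byte lengths")
            body.append(len(raw))
            body += raw
    return bytes([n]) + bytes(bitmap) + bytes(body)


def unpack_nullable(packed):
    """Inverse of pack_nullable: the bitmap decides between None and ''."""
    n = packed[0]
    nbytes = (n + 7) // 8
    bitmap = packed[1:1 + nbytes]
    pos = 1 + nbytes
    out = []
    for i in range(n):
        length = packed[pos]
        pos += 1
        raw = packed[pos:pos + length]
        pos += length
        is_null = (bitmap[i // 8] >> (i % 8)) & 1
        out.append(None if is_null else raw.decode())
    return out


packed = pack_nullable(["a", None, "", "b"])
print(packed)
print(unpack_nullable(packed))
```

Without the bitmap, the second and third columns would be
indistinguishable after unpacking.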
Since the next pointer is used to find the next part of a split row,
it must always remain whole: if it were split, how could you find the
next piece? The next pointer is a convention supported by
PackRow/UnPackRow to facilitate the construction of methods that
manipulate split rows. The packing function only flattens an array
into a byte string or series of strings; it does not provide any
intrinsic support to traverse these strings. Functions that
manipulate packed rows may use additional structures to support
multi-part rows, such as external metadata in the block row directory,
or specialized metadata columns embedded in the row itself.
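
Since traversal is left to the caller, a function that reads a split
row has to follow the chain itself. The sketch below is a Python model
under stated assumptions: the store is a plain dictionary standing in
for a pushhash, each entry carries a has_next flag standing in for
external metadata such as a block row directory, and pieces use a
simplified count plus length/value layout without the null bitstring.
The names unpack_piece and fetch_row and the keys 5/2 through 5/5 are
hypothetical.

```python
def unpack_piece(packed):
    """Decode one piece: a count byte, then length/value pairs."""
    count = packed[0]
    pos = 1
    out = []
    for _ in range(count):
        length = packed[pos]
        pos += 1
        out.append(packed[pos:pos + length].decode())
        pos += length
    return out


def fetch_row(store, head):
    """Follow next pointers from the head piece and rebuild the row."""
    row = []
    seen = set()
    key = head
    while key is not None:
        if key in seen:
            raise ValueError("cycle in next-pointer chain")
        seen.add(key)
        packed, has_next = store[key]
        cols = unpack_piece(packed)
        # Assumption: when has_next is set, the last item is the pointer.
        key = cols.pop() if has_next else None
        row.extend(cols)
    return row


# Pieces are stored tail first, so the head of the row is stored last.
store = {
    "5/2": (b"\x01\x05delta", False),
    "5/3": (b"\x02\x07charlie\x035/2", True),
    "5/4": (b"\x02\x05bravo\x035/3", True),
    "5/5": (b"\x02\x05alpha\x035/4", True),
}
print(fetch_row(store, "5/5"))
```

The flag could equally live in a metadata column inside the piece;
keeping it outside the packed string is just the simpler choice for a
sketch.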
column splitting (fragmentation)
The packer can support rows with individual columns that exceed the
maxsize. The offset can simultaneously maintain the current column
position, as well as the current character offset in that column.
It's wicked complicated. Generally, we say that a row is split into
row pieces, and the row pieces are chained (via the next pointers),
which lets us reconstruct a complete row. Individual columns that are
split are said to be fragmented.
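
The idea of an offset that tracks both a column and a position inside
that column can be modeled with a two-part cursor. The Python sketch
below is a simplification under stated assumptions: it counts only
payload characters against a budget (no count, length, or null
bitstring overhead), and the names next_piece and reassemble and the
(column, char_end) cursor are hypothetical, not the encoding the packer
uses for its offset.

```python
def next_piece(items, budget, cursor=None):
    """Take up to `budget` characters, working back to front.

    Returns (piece, cursor). A piece is a list of (column, fragment)
    pairs in column order; cursor is None once the row is consumed.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if cursor is None:
        col = len(items) - 1
        end = len(items[col]) if items else 0
    else:
        col, end = cursor
    piece = []
    room = budget
    while col >= 0 and room > 0:
        take = min(room, end)
        piece.insert(0, (col, items[col][end - take:end]))
        room -= take
        end -= take
        if end == 0:              # column finished: move to the previous one
            col -= 1
            if col >= 0:
                end = len(items[col])
    return piece, (None if col < 0 else (col, end))


def reassemble(pieces, ncols):
    """Rebuild columns from pieces listed in the order they were produced."""
    cols = [""] * ncols
    for piece in reversed(pieces):    # the last piece produced is the row head
        for col, text in piece:
            cols[col] += text
    return cols


row = ["key", "abcdefghij"]
pieces = []
cursor = None
while True:
    piece, cursor = next_piece(row, 4, cursor)
    pieces.append(piece)
    if cursor is None:
        break
print(pieces)
print(reassemble(pieces, len(row)))
```

Here the second column is fragmented across three pieces, and the
third piece also carries the tail of the first column.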
The packer could be extended to support more complex structures than
arrays of scalars. In lieu of this ability, these structures can be
flattened using Data::Dumper or YAML to large strings.
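
The flattening step amounts to serializing the structure to one string
before packing and parsing it back after unpacking. The Python sketch
below uses the standard json module as a stand-in for Data::Dumper or
YAML; the helper names flatten_column and restore_column are
hypothetical.

```python
import json


def flatten_column(value):
    """Serialize a nested structure to a single string column."""
    return json.dumps(value, sort_keys=True)


def restore_column(text):
    """Parse a flattened column back into its nested structure."""
    return json.loads(text)


nested = {"tags": ["a", "b"], "dims": {"w": 3, "h": 4}}
row = ["alpha", flatten_column(nested)]   # now an array of scalars
print(row)
print(restore_column(row[1]) == nested)
```

The flattened string is an ordinary scalar, so it can be packed, split,
and fragmented like any other column.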
Genezzo::Util - Utility functions
Should bundle all data file utility functions, such as FileGetHeaderInfo, SetHeaderInfo, etc., under a separate Util::DataFile module
FileGetHeaderInfo: need to handle the case of a header which exceeds a single block. Probably should keep increasing the buffer size until it finds a null terminator (within reason).
packrow: store metadata in col0 vs trailing col with next ptr
packrow: check pack format for a zero len row of zero cols. Does it need a nullvec?
packrow/unpackrow: in Perl 5.8 could use the nifty repeating templates to our advantage.
packrow: could generate skiplists as col zero metadata tracking byte position and column numbers to speed lookups
Jeffrey I. Cohen, firstname.lastname@example.org
Copyright (c) 2003-2007 Jeffrey I Cohen. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Address bug reports and comments to: email@example.com
For more information, please visit the Genezzo homepage
perl v5.20.3    GENEZZO::UTIL (3)    2007-01-23