GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages
Spreadsheet::ReadGnumeric(3) User Contributed Perl Documentation Spreadsheet::ReadGnumeric(3)

Spreadsheet::ReadGnumeric - read a Gnumeric file, return Spreadsheet::Read

Version 0.4

    use Spreadsheet::ReadGnumeric;
    my $reader = Spreadsheet::ReadGnumeric->new
        (rc => 1, cells => 1,   # these are the defaults
         process_formulas => 0);
    my $book = $reader->parse('spreadsheet.gnumeric');
    my $n_sheets = $book->[0]{sheets};
    my $sheet_1 = $book->[1];
    my $b12 = $sheet_1->{B12};
    # or equivalently:
    my $b12 = $sheet_1->{cell}[2][12];
    say $b12;   # => "not a vitamin", e.g.

Given a source of saved Gnumeric data, "Spreadsheet::ReadGnumeric" parses it and returns the result in the same format as "Spreadsheet::Read". In fact, "Spreadsheet::Read" has a few more bells and whistles, such as transposing the resulting sheets and providing an object-oriented interface to the resulting data, so it may be easier to access this module through "Spreadsheet::Read", even if you only want to parse Gnumeric data.

Note that Gnumeric only saves raw cell values, and not their formatted versions. In particular, Gnumeric saves formulas, and not formula values, never mind formatted formula values.

After creating an instance with "new", call "parse" with a file name, a stream, or a string containing the spreadsheet data. The "attr", "cells", "convert_colors" "gzipped_p", "merge", "minimal_attributes", and "rc" slots control how "Spreadsheet::ReadGnumeric" parses its input. They may also be used as initializer keywords to the "new" method.

The rest of these deal with the XML parsing mechanism and are probably not of interest (unless you need to write a subclass in order to extract more information).

attr

Slot that, if true, directs the parsing methods to retain style information in the returned "$sheet->{attr}" array of each sheet. If the value of this slot is "keep", an arrayref of each sheet's "Spreadsheet::Gnumeric::StyleRegion" instances is saved in "$sheet->{style_regions}".

The "attr" value is normally supplied to the "new" method as an initializer. See the "Spreadsheet::Read" Cell Attributes <https://metacpan.org/pod/Spreadsheet::Read#Cell-Attributes> documentation for details of the resulting attributes. See Spreadsheet::Gnumeric::StyleRegion for details of how attribute conversion is performed.

See also the "merge" slot, which requests additional processing of cell merge information.

cells

Slot that contains the value of the "Spreadsheet::Read" "cells" option. When true, this populates the "$sheet->{$cell_name}" with the corresponding cell data, where $cell_name is an alphanumeric cell name such as "B17" or "AA3". Defaults to a true value.

chars

Slot that contains the text content of an XML element being parsed.

convert_colors

Slot that contains a boolean which, if defined and true, causes Gnumeric colors to be converted to the more common HTML-style format. Gnumeric specifies colors as RGB triples using hex numbers of one to four digits for each primary color, separated by colons, so that white is "FFFF:FFFF:FFFF", and cyan is "0:FC00:F800". Conversion pads each primary to four digits, takes the first two, and prepends "#", producing "#FFFFFF" for white and "#00FCF8" for cyan.

For "Spreadsheet::Read" compatibility, the default is true. This value is ignored unless "attr" is true.

current_attrs

Slot that contains the attribute of an element being parsed.

current_elt

Slot that contains the name of an element being parsed, including any namespace prefix (see "namespaces").

element_stack

Slot that contains an arrayref that is a stack of containing element contexts. The last entry in this is the parent of the current element being parsed, and is itself an arrayref of the values of

    [ current_elt, current_attrs, chars, namespaces ]

while that containing element was being parsed.

find_style_for_cell

Given zero-based column and row indices, look in "style_regions" for a suitable "Spreadsheet::Gnumeric::StyleRegion" instance, and return its attribute hash.

formulas

Slot that contains an arrayref indexed by the "ExprID" attribute of formula cells. Each entry is itself an arrayref of four elements, the zero-based column and row indices where the formula is defined, the original formula as a string (complete with leading "="), and the tokenized formula; this is initially "undef" and only filled in if the formula is referenced elsewhere. The "process_formulas" slot describes how this is done.

gzipped_p

Slot that determines whether "parse" will test whether its argument is gzipped. There are three possibilities: Undefined, other false, and true. If undefined, then "parse" will open the file, make the test, and then reset the stream to the beginning (see "stream_gzipped_p"). Otherwise, "parse" will take the caller's word for it.

merge

Slot that requests additional processing of cell merging information. If true, (a) the value of the "merged" attribute for merged cells is the name of the upper left corner of the rectangle of merged cells, and (b) the contents and attributes of that upper-left cell are copied into all of the others. The default is false.

This flag is ignored unless the "attr" slot is also true.

minimal_attributes

Slot that, if true, causes attributes to be filled in only for cells that have content, i.e. actually appear as a "<Cell>" in the file with a nonempty string string. Otherwise, attributes are filled in for the full "$sheet->{maxrow}" vs "$sheet->{maxcol}" array. The default is false.

This flag is ignored unless the "attr" slot is also true.

namespaces

Slot that contains a hash of namespace prefix to defining URL, for all prefixes in scope. These are never modified by side effect, so that they can be copied into inner scopes. We use this to decide whether any given name is really in the schema we care about.

new

Creates a new "Spreadsheet::ReadGnumeric" instance. Keywords are any of the method names listed as slots.

parse

Given an input source, reads it and returns the "Spreadsheet::Read"-compatible data structure, constructed according to the options set up at instantiation time. If the "gzipped_p" slot is defined and true or the content starts with gzip magic (see "stream_gzipped_p"), which is the normal state of saved Gnumeric spreadsheets, it is uncompressed while being read. If "gzipped_p" is not defined, the stream is read to see how it starts and then rewound.

The input can be a reference (which is taken to be an open stream), a file name, or a literal string that contains the data. If given an non-seekable stream, the "stream_gzipped_p" slot must be defined in order to skip the test for compressed input.

process_formulas

Slot which, if true, requests that spreadsheet parsing should go to the trouble of reconstituting formulas. The default is false, which may change, since formula processing probably doesn't cost very much.

When Gnumeric copies a formula, it offsets the cell names referenced in the formula by the change in location. When the spreadsheet is saved, if neither formula is modified, the first time the equation is mentioned it is saved as a string with a unique "ExprID" and the second time the same "ExprID" is cited and the string is empty. If "process_formulas" is false, the second and all subsequent references to the formula are left empty, but if "process_formulas" is true, the correct equation is reconstituted for each cell in the same way the copies were created by Gnumeric.

rc

Slot that contains the value of the "Spreadsheet::Read" "rc" option. When true, this populates the "$sheet->{cell}[$col][$row]" arrays. Defaults to a true value.

sheet

Slot that contains the sheet currently under construction.

sheets

Read-only slot that contains the resulting "Spreadsheet::Read"-compatible data structure. This slot is initialized when its value is first requested; it is expected that the value is then updated by side-effect.

stream_gzipped_p

Given a stream open to the start of the data and an optional file name, determine whether we need to uncompress the data and then reset the stream back to the beginning. If the "gzipped_p" slot is defined, then we just return that and leave the stream untouched. The file name is only used for error messages.

style_attributes

Internal slot used to store a hashref of spreadsheet attributes during parsing.

style_regions

Slot that contains an arrayref of "Spreadsheet::Gnumeric::StyleRegion" instances, built by the style parsing code for each sheet and cleared when the sheet is completed.

"Spreadsheet::ReadGnumeric" uses "XML::Parser::Lite" to decode the XML into its component elements, calling the "_process_Foo_elt" method at each "</Foo>" closing tag (explicit or implicit), where "Foo" is the case-sensitive element name. If no such method exists, the element is ignored, though its content may already have been processed. For instance, we don't care about "<Sheets>" because it exists only as a container for "<Sheet>" elements, which we do process.

Elements with XMLNS prefixes (using the ":" separator) must be associated with the Gnumeric schema in order to be visible to this parsing mechanism. Elements without prefixes are assumed to be part of the Gnumeric schema (but the only Gnumeric files I've seen so far use prefixes on all elements). Note that element names stored in "current_elt" and "element_stack" have already been stripped of any Gnumeric-specific prefixes.

When the "_process_Foo_elt" method is called, it can examine the "current_attrs" hash for its attributes and the "chars" string for its character content; all element content will already have been processed. If necessary, it can examine the "element_stack" slot to check context. For example, "_process_Name_elt" needs to do this because it only processes Sheet element names. The "element_stack" also makes the attributes of enclosing elements available.

Consequently, extending "Spreadsheet::ReadGnumeric" to extract more information out of Gnumeric files should be a simple matter of defining a subclass with suitable "_process_*_elt" methods that examine their context, extract their content, and stuff the result into the "sheet" and/or "sheets" slot values accordingly. At least that's my theory, and I'm sticking to it.

Note that the "clip", "strip", and "pivot" options of "Spreadsheet::Read" are handled by its "_clipsheets" sub, so I do not intend to do anything about them here.

Please report any bugs or feature requests to "bug-spreadsheet-readgnumeric at rt.cpan.org", or through the web interface at <https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Spreadsheet-ReadGnumeric>.

See <http://www.gnumeric.org/>.
Frontend to multiple spreadsheet formats, with additional options for manipulating the result, plus the option to give them object capabilities. This module also describes the output data structure in depth.
This is a PDF file by David Gilbert dated 2001-11-05, available at <https://www.jfree.org/jworkbook/download/gnumeric-xml.pdf> (GFDL). I am not aware of any more recent version.

Bob Rogers "<rogers at rgrjr.com>", based heavily on the "Spreadsheet::Read" architecture.

This software is Copyright (c) 2023 by Bob Rogers.

This is free software, licensed under:

  The Artistic License 2.0 (GPL Compatible)
2023-06-08 perl v5.40.2

Search for    or go to Top of page |  Section 3 |  Main Index

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with ManDoc.