 |
|
| |
Table(3) |
User Contributed Perl Documentation |
Table(3) |
Data::Table - Data type related to database tables, spreadsheets,
CSV/TSV files, HTML table displays, etc.
News: The package now includes "Perl Data::Table Cookbook" (PDF), which may serve as a better learning material.
To download the free Cookbook, visit https://sites.google.com/site/easydatabase/
# some cool ways to use Table.pm
use Data::Table;
$header = ["name", "age"];
$data = [
["John", 20],
["Kate", 18],
["Mike", 23]
];
$t = Data::Table->new($data, $header, 0); # Construct a table object with
# $data, $header, $type=0 (consider
# $data as the rows of the table).
print $t->csv; # Print out the table as a csv file.
$t = Data::Table::fromCSV("aaa.csv"); # Read a csv file into a table object
### Since version 1.51, a new method fromFile can automatically guess the correct file format
# either CSV or TSV file, file with or without a column header line
# e.g.
# $t = Data::Table::fromFile("aaa.csv");
# is equivalent.
print $t->html; # Display a 'portrait' HTML TABLE on web.
use DBI;
$dbh= DBI->connect("DBI:mysql:test", "test", "") or die $DBI::errstr;
my $minAge = 10;
$t = Data::Table::fromSQL($dbh, "select * from mytable where age >= ?", [$minAge]);
# Construct a table form an SQL
# database query.
$t->sort("age", 0, 0); # Sort by col 'age',numerical,ascending
print $t->html2; # Print out a 'landscape' HTML Table.
$row = $t->delRow(2); # Delete the third row (index=2).
$t->addRow($row, 4); # Add the deleted row back as fifth row.
@rows = $t->delRows([0..2]); # Delete three rows (row 0 to 2).
$col = $t->delCol("age"); # Delete column 'age'.
$t->addCol($col, "age",2); # Add column 'age' as the third column
@cols = $t->delCols(["name","phone","ssn"]);
# Delete 3 columns at the same time.
$name = $t->elm(2,"name"); # Element access
$t2=$t->subTable([1, 3..4],['age', 'name']);
# Extract a sub-table
$t->rename("Entry", "New Entry"); # Rename column 'Entry' by 'New Entry'
$t->replace("Entry", [1..$t->nofRow()], "New Entry");
# Replace column 'Entry' by an array of
# numbers and rename it as 'New Entry'
$t->swap("age","ssn"); # Swap the positions of column 'age'
# with column 'ssn' in the table.
$t->colMap('name', sub {return uc}); # Map a function to a column
$t->sort('age',0,0,'name',1,0); # Sort table first by the numerical
# column 'age' and then by the
# string column 'name' in ascending
# order
$t2=$t->match_pattern('$_->[0] =~ /^L/ && $_->[3]<0.2');
# Select the rows that matched the
# pattern specified
$t2=$t->match_pattern_hash('$_{"Amino acid"} =~ /^L-a/ && $_{"Grams \"(a.a.)\""}<0.2'));
# use column name in the pattern, method added in 1.62
$t2=$t->match_string('John'); # Select the rows that matches 'John'
# in any column
$t2=$t->clone(); # Make a copy of the table.
$t->rowMerge($t2); # Merge two tables
$t->colMerge($t2);
$t = Data::Table->new( # create an employ salary table
[
['Tom', 'male', 'IT', 65000],
['John', 'male', 'IT', 75000],
['Tom', 'male', 'IT', 65000],
['John', 'male', 'IT', 75000],
['Peter', 'male', 'HR', 85000],
['Mary', 'female', 'HR', 80000],
['Nancy', 'female', 'IT', 55000],
['Jack', 'male', 'IT', 88000],
['Susan', 'female', 'HR', 92000]
],
['Name', 'Sex', 'Department', 'Salary'], 0);
sub average { # this is an subroutine calculate mathematical average, ignore NULL
my @data = @_;
my ($sum, $n) = (0, 0);
foreach $x (@data) {
next unless $x;
$sum += $x; $n++;
}
return ($n>0)?$sum/$n:undef;
}
$t2 = $t->group(["Department","Sex"],["Name", "Salary"], [sub {scalar @_}, \&average], ["Nof Employee", "Average Salary"]);
# For each (Department,Sex) pair, calculate the number of employees and average salary
$t2 = $t2->pivot("Sex", 0, "Average Salary", ["Department"]);
# Show average salary information in a Department by Sex spreadsheet
This perl package uses perl5 objects to make it easy for
manipulating spreadsheet data among disk files, database, and Web
publishing.
A table object contains a header and a two-dimensional array of
scalars. Four class methods Data::fromFile, Data::Table::fromCSV,
Data::Table::fromTSV, and Data::Table::fromSQL allow users to create a table
object from a CSV/TSV file or a database SQL selection in a snap.
Table methods provide basic access, add, delete row(s) or
column(s) operations, as well as more advanced sub-table extraction, table
sorting, record matching via keywords or patterns, table merging, and web
publishing. Data::Table class also provides a straightforward interface to
other popular Perl modules such as DBI and GD::Graph.
The most updated version of the Perl Data::Table Cookbook is
available at
https://sites.google.com/site/easydatabase/
We use Data::Table instead of Table, because Table.pm has already
been used inside PerlQt module in CPAN.
A table object has three data members:
- 1. $data:
- a reference to an array of array-references. It's basically a reference to
a two-dimensional array.
- 2. $header:
- a reference to a string array. The array contains all the column
names.
- 3. $type = 1 or 0.
- 1 means that @$data is an array of table columns (fields) (column-based);
0 means that @$data is an array of table rows (records) (row-based);
Row-based/Column-based are two internal implementations for a
table object. E.g., if a spreadsheet consists of two columns lastname and
age. In a row-based table, $data = [ ['Smith', 29],
['Dole', 32] ]. In a column-based table, $data = [
['Smith', 'Dole'], [29, 32] ].
Two implementations have their pros and cons for different
operations. Row-based implementation is better for sorting and pattern
matching, while column-based one is better for adding/deleting/swapping
columns.
Users only need to specify the implementation type of the table
upon its creation via Data::Table::new, and can forget about it afterwards.
Implementation type of a table should be considered volatile, because
methods switch table objects from one type into another internally. Be
advised that row/column/element references gained via table::rowRef,
table::rowRefs, table::colRef, table::colRefs, or table::elmRef may become
stale after other method calls afterwards.
For those who want to inherit from the Data::Table class, internal
method table::rotate is used to switch from one implementation type into
another. There is an additional internal assistant data structure called
colHash in our current implementation. This hash table stores all column
names and their corresponding column index number as key-value pairs for
fast conversion. This gives users an option to use column name wherever a
column ID is expected, so that user don't have to use table::colIndex all
the time. E.g., you may say
$t->rename('oldColName', 'newColName') instead of
$t->rename($t->colIndex('oldColName'),
'newColIdx').
- $Data::Table::VERSION
- @Data::Table::OK
- see table::match_string, table::match_pattern, and
table::match_pattern_hash Since 1.62, we recommend you to use
$table->{OK} instead, which is a local array
reference.
- @Data::Table::MATCH
- see table::match_string, table::match_pattern, and
table::match_pattern_hash Since 1.67, we return the matched row indices in
an array. Data::Table::MATCH is this array reference. Here is an example
of setting a max price of 20 to all items with UnitPrice > 20.
$t_product->match_pattern_hash('$_{UnitPrice} > 20');
$t_product->setElm($t_product->{MATCH}, 'UnitPrice', 20);
- %Data::Table::DEFAULTS
- Store default settings, currently it contains CSV_DELIMITER (set to ','),
CSV_QUALIFER (set to '"'), and OS (set to 0). see table::fromCSV,
table::csv, table::fromTSV, table::tsv for details.
Syntax: return_type method_name ( [ parameter [ = default_value ]]
[, parameter [ = default_value ]] )
If method_name starts with table::, this is an instance method, it
can be used as $t->method( parameters ), where
$t is a table reference.
If method_name starts with Data::Table::, this is a class method,
it should be called as
Data::Table::method, e.g., $t =
Data::Table::fromCSV("filename.csv").
Conventions for local variables:
colID: either a numerical column index or a column name;
rowIdx: numerical row index;
rowIDsRef: reference to an array of column IDs;
rowIdcsRef: reference to an array of row indices;
rowRef, colRef: reference to an array of scalars;
data: ref_to_array_of_ref_to_array of data values;
header: ref to array of column headers;
table: a table object, a blessed reference.
- table Data::Table::new (
$data = [], $header = [], $type = 0, $enforceCheck = 1)
- create a new table. It returns a table object upon success, undef
otherwise. $data: points to the spreadsheet data.
$header: points to an array of column names.
Before version 1.69, a column name must have at least one non-digit
character. Since version 1.69, this is relaxed. Although integer and
numeric column names can now be accepted, when accessing a column by
integer, it is first interpreted as a column name.
$type: 0 or 1 for row-based/column-based
spreadsheet. $enforceCheck: 1/0 to turn on/off
initial checking on the size of each row/column to make sure the data
argument indeed points to a valid structure. In 1.63, we introduce
constants Data::Table::ROW_BASED and Data::Table::COL_BASED as synonyms
for $type. To create an empty Data::Table, use new
Data::Table([], [], Data::Table::ROW_BASED);
- table table::subTable
($rowIdcsRef, $colIDsRef, $arg_ref)
- create a new table, which is a subset of the original. It returns a table
object. $rowIdcsRef: points to an array of row
indices (or a true/false row mask array).
$colIDsRef: points to an array of column IDs. The
function make a copy of selected elements from the original table.
Undefined $rowIdcsRef or
$colIDsRef is interpreted as all rows or all
columns. The elements in $colIDsRef may be
modified as a side effect before version 1.62, fixed in 1.62. If
$arg_ref->{useRowMask} is set to 1,
$rowIdcsRef is a true/false row mask array, where
rows marked as TRUE will be returned. Row mask array is typically the
Data::Table::OK set by match_string/match_pattern/match_pattern_hash
methods.
- table
table::clone
- make a clone of the original. It return a table object, equivalent to
table::subTable(undef,undef).
- table
Data::Table::fromCSV ($name_or_handler, $includeHeader = 1, $header =
["col1", ... ], {OS=>$Data::Table::DEFAULTS{'OS'},
delimiter=>$Data::Table::DEFAULTS{'CSV_DELIMITER'},
qualifier=>$Data::Table::DEFAULTS{'CSV_QUALIFIER'}, skip_lines=>0,
skip_pattern=>undef,
encoding=>$Data::Table::DEFAULTS{'ENCODING'}})
- create a table from a CSV file. return a table object.
$name_or_handler: the CSV file name or an already
opened file handler. If a handler is used, it's not closed upon return. To
read from STDIN, use Data::Table::fromCSV(\*STDIN, 1).
$includeHeader: 0 or 1 to ignore/interpret the
first line in the file as column names, If it is set to 0, the array in
$header is used. If
$header is not supplied, the default column names
are "col1", "col2", ... optional named argument OS
specifies under which operating system the CSV file was generated. 0 for
UNIX, 1 for PC and 2 for MAC. If not specified,
$Data::Table::DEFAULTS{'OS'} is used, which
defaults to UNIX. Basically linebreak is defined as "\n",
"\r\n" and "\r" for three systems, respectively.
optional name argument delimiter and qualifier let user
replace comma and double-quote by other meaningful single characters.
<b>Exception</b>: if the delimiter or the qualifier is a
special symbol in regular expression, you must escape it by '\'. For
example, in order to use pipe symbol as the delimiter, you must specify
the delimiter as '\|'.
optional name argument skip_lines let you specify how many
lines in the csv file should be skipped, before the data are
interpretted.
optional name argument skip_pattern let you specify a regular
expression. Lines that match the regular expression will be skipped.
optional name argument encoding let you specify an encoding
method of the csv file. This option is added to fromCSV, fromTSV,
fromFile since version 1.69.
The following example reads a DOS format CSV file and writes a
MAC format:
$t = Data::Table:fromCSV('A_DOS_CSV_FILE.csv', 1, undef, {OS=>1});
$t->csv(1, {OS=>2, file=>'A_MAC_CSV_FILE.csv'});
open(SRC, 'A_DOS_CSV_FILE.csv') or die "Cannot open A_DOS_CSV_FILE.csv to read!";
$t = Data::Table::fromCSV(\*SRC, 1);
close(SRC);
The following example reads a non-standard CSV file with : as
the delimiter, ' as the qaulifier
my $s="col_A:col_B:col_C\n1:2, 3 or 5:3.5\none:'one:two':'double\", single'''";
open my $fh, "<", \$s or die "Cannot open in-memory file\n";
my $t_fh=Data::Table::fromCSV($fh, 1, undef, {delimiter=>':', qualifier=>"'"});
close($fh);
print $t_fh->csv;
# convert to the standard CSV (comma as the delimiter, double quote as the qualifier)
# col_A,col_B,col_C
# 1,"2, 3 or 5",3.5
# one,one:two,"double"", single'"
print $t->csv(1, {delimiter=>':', qualifier=>"'"}); # prints the csv file use the original definition
The following example reads bbb.csv file (included in the
package) by skipping the first line (skip_lines=>1), then treats any
line that starts with '#' (or space comma) as comments
(skip_pattern=>'^\s*#'), use ':' as the delimiter.
$t = Data::Table::fromCSV("bbb.csv", 1, undef, {skip_lines=>1, delimiter=>':', skip_pattern=>'^\s*#'});
Use the optional name argument encoding to specify file
encoding method.
$t = Data::Table::fromCSV("bbb.csv", 1,
undef, {encoding=>'UTF-8'});
- table table::fromCSVi
($name, $includeHeader = 1, $header = ["col1", ... ])
- Same as Data::Table::fromCSV. However, this is an instant method (that's
what 'i' stands for), which can be inherited.
- table
Data::Table::fromTSV ($name, $includeHeader = 1, $header =
["col1", ... ], {OS=>$Data::Table::DEFAULTS{'OS'},
skip_lines=>0, skip_pattern=>undef, transform_element=>1,
encoding=>$Data::Table::DEFAULTS{'ENCODING'}})
- create a table from a TSV file. return a table object.
$name: the TSV file name or an already opened file
handler. If a handler is used, it's not closed upon return. To read from
STDIN, use Data::Table::fromTSV(\*STDIN, 1).
$includeHeader: 0 or 1 to ignore/interpret the
first line in the file as column names, If it is set to 0, the array in
$header is used. If
$header is not supplied, the default column names
are "col1", "col2", ... optional named argument OS
specifies under which operating system the TSV file was generated. 0 for
UNIX, 1 for P C and 2 for MAC. If not specified,
$Data::Table::DEFAULTS{'OS'} is used, which
defaults to UNIX. Basically linebreak is defined as "\n",
"\r\n" and "\r" for three systems, respectively.
<b>Exception</b>: if the delimiter or the qualifier is a
special symbol in regular expression, you must escape it by '\'. For
example, in order to use pipe symbol as the delimiter, you must specify
the delimiter as '\|'.
optional name argument skip_lines let you specify how many
lines in the csv file should be skipped, before the data are
interpretted.
optional name argument skip_pattern let you specify a regular
expression. Lines that match the regular expression will be skipped.
optional name argument transform_element let you switch on/off
\t to tab, \N to undef (etc.) transformation. See TSV FORMAT for
details. However, elements are always transformed when export table to
tsv format, because not escaping an element containing a tab will be
disasterous.
optional name argument encoding enables one to provide an
encoding method when open the tsv file.
See similar examples under Data::Table::fromCSV;
Note: read "TSV FORMAT" section for details.
- table table::fromTSVi
($name, $includeHeader = 1, $header = ["col1", ... ])
- Same as Data::Table::fromTSV. However, this is an instant method (that's
what 'i' stands for), which can be inherited.
- table
Data::Table::fromFile ($file_name, $arg_ref = {linesChecked=>2,
allowNumericHeader=>0,
encoding=>$Data::Table::DEFAULTS{'ENCODING'}})
- create a table from a text file. return a table object.
$file_name: the file name (cannot take a file
handler). linesChecked: the first number of lines used for guessing the
input format. The delimiter will have to produce the same number of
columns for these lines. By default only check the first 2 lines, 0 means
all lines in the file. $arg_ref can take
additional parameters, such as OS, has_header, delimiter,
transform_element, etc. Encoding allows one to specify encoding methods
used to open the file, which defaults to UTF-8.
fromFile is added after version 1.51. It relies on the
following new methods to automatically figure out the correct file
format in order to call fromCSV or fromTSV internally:
fromFileGuessOS($file_name, {encoding=>'UTF-8'})
returns integer, 0 for UNIX, 1 for PC, 2 for MAC
fromFileGetTopLines($file_name, $os, $lineNumber, {encoding=>'UTF-8'}) # $os defaults to fromFileGuessOS($file_name), if not specified
returns an array of strings, each string represents each row with linebreak removed.
fromFileGuessDelimiter($lineArrayRef) # guess delimiter from ",", "\t", ":";
returns the guessed delimiter string.
fromFileIsHeader($line_concent, $delimiter, $allowNumericHeader) # $delimiter defaults to $Data::Table::DEFAULTS{'CSV_DELIMITER'}
returns 1 or 0.
It first ask fromFileGuessOS to figure out which OS (UNIX, PC
or MAC) generated the input file. The fetch the first linesChecked lines
using fromFileGetTopLines. It then guesses the best delimiter using
fromFileGuessDelimiter, then it checks if the first line looks like a
column header row using fromFileIsHeader. Since fromFileGuessOS and
fromFileGetTopLines needs to open/close the input file, these methods
can only take file name, not file handler. If user specify formatting
parameters in $arg_ref, the routine will skip
the corresponding guess work. At the end, fromFile simply calls either
fromCSV or fromTSV with $arg_ref forwarded. So
if you call fromFile({transform_element=>0}) on a TSV file,
transform_elment will be passed onto fromTSV calls internally.
fromFileGuessOS finds the linebreak that gives shortest first
line (in the priority of UNIX, PC, MAC upon tie). fromFileGuessDelimiter
works based on the assumption that the correct delimiter will produce
equal number of columns for the given rows. If multiple matches, it
chooses the delimiter that gives maximum number of columns. If none
matches, it returns the default delimiter. fromFileIsHeader works based
on the assumption that no column header can be empty or numeric values.
However, if we allow numeric column names (especially integer column
names), set {allowNumericHeader => 1}
- table
Data::Table::fromSQL ($dbh, $sql, $vars)
- create a table from the result of an SQL selection query. It returns a
table object upon success or undef otherwise.
$dbh: a valid database handler. Typically
$dbh is obtained from DBI->connect, see
"Interface to Database" or DBI.pm. $sql:
an SQL query string or a DBI::st object (starting in version 1.61).
$vars: optional reference to an array of variable
values, required if $sql contains '?'s which need
to be replaced by the corresponding variable values upon execution, see
DBI.pm for details. Hint: in MySQL, Data::Table::fromSQL($dbh, 'show
tables from test') will also create a valid table object.
Data::Table::fromSQL now can take DBI::st instead of a SQL
string. This is introduced, so that variable binding (such as CLOB/BLOB)
can be done outside the method, for example:
$sql = 'insert into test_table (id, blob_data) values (1, :val)';
$sth = $dbh->prepare($sql);
$sth->bind_param(':val', $blob, {ora_type => SQLT_BIN});
Data::Table::fromSQL($dbh, $sth);
- table
Data::Table::fromSQLi ($dbh, $sql, $vars)
- Same as Data::Table::fromSQL. However, this is an instant method (that's
what 'i' stands for), whic h can be inherited.
- int table::colIndex
($colID)
- translate a column name into its numerical position, the first column has
index 0 as in as any perl array. return -1 for invalid column names.
Since 1.69, we allow integer to be used as a column header.
The integer $colID will first be checked against
column names, if matched, the corresponding column index is returned.
E.g., if column name for the 3rd column is "1",
colIndex(1) will return 2 instead of 1! In such case, if one need
to access the second column, one has to access it by column name, i.e.,
$t->col(($t->header)[1]).
- int table::nofCol
- return number of columns.
- int table::nofRow
- return number of rows.
- int table::lastCol
- return the index of the last columns, i.e., nofCol - 1.
- int table::lastRow
- return the index of the last rows, i.e., nofRow - 1; This is syntax sugar.
# these two are equivalent
foreach my $i (0 .. $t->lastRow)
foreach my $i (0 .. $t->nofRow - 1)
- bool table::isEmpty
- return whether the table has any column, introduced in 1.63.
- bool
table::hasCol($colID)
- returns whether the colID is a table column, introduced in 1.63.
- bool
table::colName($colNumericIndex)
- returns the column name for a numeric column index, notice the first
column has an index of 0. Introduced in 1.68.
- scalar table::elm
($rowIdx, $colID)
- return the value of a table element at [$rowIdx,
$colID], undef if $rowIdx
or $colID is invalid.
- refto_scalar
table::elmRef ($rowIdx, $colID)
- return the reference to a table element at [$rowIdx,
$colID], to allow possible modification. It
returns undef for invalid $rowIdx or
$colID.
- array table::header
($header)
- Without argument, it returns an array of column names. Otherwise, use the
new header.
- int table::type
- return the implementation type of the table (row-based/column-based) at
the time, be aware that the type of a table should be considered as
volatile during method calls.
- string table::csv
($header, {OS=>$Data::Table::DEFAULTS{'OS'}, file=>undef,
delimiter=>$Data::Table::DEFAULTS{'CSV_DELIMITER'},
qualifier=>$Data::Table::DEFAULTS{'CSV_QAULIFIER'}})
- return a string corresponding to the CSV representation of the table.
$header controls whether to print the header line,
1 for yes, 0 for no. optional named argument OS specifies for which
operating system the CSV file is generated. 0 for UNIX, 1 for P C and 2
for MAC. If not specified,
$Data::Table::DEFAULTS{'OS'} is used. Basically
linebreak is defined as "\n", "\r\n" and
"\r" for three systems, respectively. if 'file' is given, the
csv content will be written into it, besides returning the string. One may
specify custom delimiter and qualifier if the other than default are
desired.
- string
table::tsv
- return a string corresponding to the TSV representation of the table.
$header controls whether to print the header line,
1 for yes, 0 for no. optional named argument OS specifies for which
operating system the TSV file is generated. 0 for UNIX, 1 for P C and 2
for MAC. If not specified,
$Data::Table::DEFAULTS{'OS'} is used. Basically
linebreak is defined as "\n", "\r\n" and
"\r" for three systems, respectively. if 'file' is given, the
tsv content will be written into it, besides returning the string.
Note: read "TSV FORMAT" section for details.
- string table::html
($colorArrayRef_or_colorHashRef =
["#D4D4BF","#ECECE4","#CCCC99"], $tag_tbl =
{border => '1'}, $tag_tr = {align => 'left'}, $tag_th = {align =>
'center'}, $tag_td = {col3 => 'align="right"
valign="bottom"', 4 => 'align="left"'}, $l_portrait =
1, $callback = undef )
- return a string corresponding to a 'Portrait/Landscape'-style html-tagged
table. $colorArrayRef_or_colorHashRef: If a hash
reference is provided, it will take three CSS class names for odd data
rows, even data rows and for the header row. The default hash is
{even=>"data_table_even", odd=>"data_table_odd",
header=>"data_table_header"). If a hash reference is not
found, a reference to an array of three color strings is expected to
provided for backgrounds for even-row records, odd-row records, and -der
row, respectively. A default color array
("#D4D4BF","#ECECE4","#CCCC99") will be used
if $colors isn't defined.
Since version 1.74, users can prevent default coloring by
passing in a color array reference ["", "",
""].
Before version 1.59, the parameter can only accept an array
reference.
$tag_tbl: a reference to a hash that
specifies any legal attributes such as name, border, id, class, etc. for
the TABLE tag.
$tag_tr: a reference to a hash that
specifies any legal attributes for the TR tag.
$tag_th: a reference to a hash that
specifies any legal attributes for the TH tag.
$tag_td: a reference to a hash that
specifies any legal attributes for the TD tag.
Notice $tag_tr and
$tag_th controls all the rows and columns of the
whole table. The keys of the hash are the attribute names in these
cases. However, $tag_td is column specific,
i.e., you should specify TD attributes for every column separately. The
key of %$tag_td are either column names or column indices, the value is
a reference to a hash. E.g., $tag_td = {col3
=> {'style'=>'background-color:#cccc99;'}}. However, before
version 1.74, the value is the full string to be inserted into the TD
tag. E.g., $tag_td = {col3 => 'align=right
valign=bottom} only change the TD tag in "col3" to be <TD
align=right valign=bottom>;. This format is still supported for
backward compatibility.
$portrait controls the layout of the
table. The default is 1, i.e., the table is shown in the
"Portrait" style, like in Excel. 0 means
"Landscape". Since version 1.59, tbody and thead tags are
added to the portrait mode output.
Since version 1.74, $callback is
introduced to give users fine control on the tag for each cell, i.e.,
for each th/td cells. $callback is a subroutine
reference, where the sub is expected to take parameters ($tag,
$row_index, $col_index,
$col_name, $table),
$tag is reference to a hash containing existing
TH/TD tags, the sub will return a new tag. The rest of the parameters
give sub access to the identity of the table cell, as well as the table
itself.
If the following example, the callback function colors each
UnitPrice cell based on whether its value is >=20 or <20. It
colors each Discontinued cell based on whether its value is TRUE or
FALSE. One can also control the column header cells, which has row index
of -1. That is the reason we use "$row >=0 " within
callback to make sure it cell is not a column header.
$t=Data::Table::fromCSV("Data-Table-1.74/Product.csv",1,undef, {'OS'=>1});
my $callback = sub {
my ($tag, $row, $col, $colName, $table) = @_;
if ($row >=0 && $colName eq 'UnitPrice') {
$tag->{'style'} = 'background-color:'. (($table->elm($row, $col)>=20) ? '#fc8d59':'#91bfdb') . ';';
}
if ($row >=0 && $colName eq 'Discontinued') {
$tag->{'style'} = 'background-color:'. (($table->elm($row, $col) eq 'TRUE') ? '#999999':'#af8dc3') .';';
}
return $tag;
};
print $t->html(undef, undef, undef, undef, undef, undef, $callback);
Attention: You will have to escape HTML-Entities yourself (for
example '<' as '<'), if you have characters in you table which
need to be escaped. You can do this for example with the
escapeHTML-function from CGI.pm (or the HTML::Entities module).
use CGI qw(escapeHTML);
[...]
$t->colMap($columnname, sub{escapeHTML($_)}); # for every column, where HTML-Entities occur.
- string table::html2
($colors = ["#D4D4BF","#ECECE4","#CCCC99"],
$specs = {'name' => '', 'border' => '1', ...})
- This method is deprecated. It's here for compatibility. It now simple call
html method with $portrait = 0, see previous
description.
return a string corresponding to a "Landscape"
html-tagged table. This is useful to present a table with many columns,
but very few entries. Check the above table::html for parameter
descriptions.
- string
table::wiki(...)
- This method accepts the same parameters as table::html, returns a
wikitable instead.
- string
table::wiki2(...)
- This method accepts the same parameters as table::html2, returns a
wikitable instead in landscape orientation.
- int table::setElm ($rowIdx,
$colID, $val)
- modify the value of a table element at [$rowIdx,
$colID] to a new value
$val. It returns 1 upon success, undef otherwise.
In 1.68, setElm can manipulate multiple elements, i.e.,
$rowIdx and $colIdx can be
references to an index array, and setElm() will modifies all cells
defined by the grid.
$t->setElm([0..2], ['ColA', 'ColB'], 'new value');
$t->setElm(0, [1..2], 'new value');
# puts a limit on the price of all expensive items
$t_product->match_pattern_hash('$_{UnitPrice} > 20');
$t_product->setElm($t_product->{MATCH}, 'UnitPrice', 20);
- int table::addRow ($rowRef,
$rowIdx = table::nofRow, $arg_ref = {addNewCol => 0})
- add a new row ($rowRef may point to the actual list of scalars, or it can
be a hash_ref (supported since version 1.60)). If
$rowRef points to a hash, the method will lookup
the value of a field by ts column name:
$rowRef->{colName}, if not found, undef is used
for that field. The new row will be referred as
$rowIdx as the result. E.g., addRow($aRow, 0) will
put the new row as the very first row. By default, it appends a row to the
end. In 1.67, we support {addNewCol => 1}, if specified, a new column
will be automatically created for each new element encountered in the
$rowRef.
# automatically add a new column "aNewColumn" to $t, in order to hold the new value
$t->addRow({anExistingColumn => 123, aNewColumn => "XYZ"}, undef, {addNewCol => 1});
# $t only had one column, after this call, it will contain a new column 'col2', in order to hold the new value
$t->addRow([123, "XYZ"], undef, {addNewCol => 1});
It returns 1 upon success, undef otherwise.
- refto_array
table::delRow ( $rowIdx )
- delete a row at $rowIdx. It will the reference to
the deleted row.
- refto_array
table::delRows ( $rowIdcsRef )
- delete rows in @$rowIdcsRef. It will return an array of deleted rows in
the same order of $rowIdcsRef upon success. upon
success.
- int table::addCol ($colRef,
$colName, $colIdx = numCol)
- add a new column ($colRef points to the actual data), the new column will
be referred as $colName or
$colIdx as the result. E.g., addCol($aCol,
'newCol', 0) will put the new column as the very first column. By default,
append a column to the end. It will return 1 upon success or undef
otherwise. In 1.68, $colRef can be a scalar, which
is the default value that can be used to create the new column. E.g., to
create a new column with default value of undef, 0, 'default',
respectively, one can do:
$t->addCol(undef, 'NewCol');
$t->addCol(0, 'NewIntCol');
$t->addCol('default', 'NewStringCol');
- refto_array
table::delCol ($colID)
- delete a column at $colID return the reference to
the deleted column.
- arrayof_refto_array
table::delCols ($colIDsRef)
- delete a list of columns, pointed by $colIDsRef.
It will return an array of deleted columns in the same order of
$colIDsRef upon success.
- refto_array
table::rowRef ($rowIdx)
- return a reference to the row at $rowIdx upon
success or undef otherwise.
- refto_arrayof_refto_array
table::rowRefs ($rowIdcsRef)
- return a reference to array of row references upon success, undef
otherwise.
- array table::row
($rowIdx)
- return a copy of the row at $rowIdx upon success
or undef otherwise.
- refto_hash
table::rowHashRef ($rowIdx)
- return a reference to a hash, which contains a copy of the row at
$rowIdx, upon success or undef otherwise. The keys
in the hash are column names, and the values are corresponding elements in
that row. The hash is a copy, therefore modifying the hash values doesn't
change the original table.
- refto_array
table::colRef ($colID)
- return a reference to the column at $colID upon
success.
- refto_arrayof_refto_array
table::colRefs ($colIDsRef)
- return a reference to array of column references upon success.
- array table::col
($colID)
- return a copy to the column at $colID upon success
or undef otherwise.
- int table::rename ($colID,
$newName)
- rename the column at $colID to a
$newName (the newName must be valid, and should
not be identical to any other existing column names). It returns 1 upon
success or undef otherwise.
- refto_array
table::replace ($oldColID, $newColRef, $newName)
- replace the column at $oldColID by the array
pointed by $newColRef, and renamed it to
$newName. $newName is
optional if you don't want to rename the column. It returns 1 upon success
or undef otherwise.
- int table::swap ($colID1,
$colID2)
- swap two columns referred by $colID1 and
$colID2. It returns 1 upon success or undef
otherwise.
- int table::moveCol($colID,
$colIdx, $newColName)
- move column referred by $colID to a new location
$colIdx. If $newColName is
specified, the column will be renamed as well. It returns 1 upon success
or undef otherwise.
- int
table::reorder($colIDRefs, $arg_ref)
- Rearrange the columns according to the order specified in
$colIDRef. Columns not specified in the reference
array will be appended to the end! If one would like to drop columns not
specified, set $arg_ref to {keepRest => 0}.
reorder() changes the table itself, while subTable(undef,
$colIDRefs) will return a new table.
reorder() might also runs faster than subTable, as elements may not
need to be copied.
- int table::colMap ($colID,
$fun)
- foreach element in column $colID, map a function
$fun to it. It returns 1 upon success or undef
otherwise. This is a handy way to format a column. E.g. if a column named
URL contains URL strings, colMap("URL", sub {"<a
href='$_'>$_</a>"}) before html() will change each
URL into a clickable hyper link while displayed in a web browser.
- int table::colsMap
($fun)
- foreach row in the table, map a function $fun to
it. It can do whatever colMap can do and more. It returns 1 upon success
or undef otherwise. colMap function only give $fun
access to the particular element per row, while colsMap give
$fun full access to all elements per row. E.g. if
two columns named duration and unit (["2", "hrs"],
["30", "sec"]). colsMap(sub {$_->[0] .= "
(".$_->[1].")"; } will change each row into (["2
hrs", "hrs"], ["30 sec", "sec"]). As
show, in the $func, a column element should be
referred as $_->[$colIndex].
- int table::sort($colID1,
$type1, $order1, $colID2, $type2, $order2, ... )
- sort a table in place. First sort by column
$colID1 in $order1 as
$type1, then sort by
$colID2 in $order2 as
$type2, ... $type is 0 for
numerical and 1 for others; $order is 0 for
ascending and 1 for descending;
In 1.62, instead of memorize these numbers, you can use
constants instead (notice constants do not start with '$').
Data::Table::NUMBER
Data::Table::STRING
Data::Table::ASC
Data::Table::DESC
Sorting is done in the priority of colID1, colID2, ... It
returns 1 upon success or undef otherwise. Notice the table is
rearranged as a result! This is different from perl's list sort, which
returns a sorted copy while leave the original list untouched, the
authors feel inplace sorting is more natural.
table::sort can take a user supplied operator, this is useful
when neither numerical nor alphabetic order is correct.
$Well=["A_1", "A_2", "A_11", "A_12", "B_1", "B_2", "B_11", "B_12"];
$t = Data::Table->new([$Well], ["PlateWell"], 1);
$t->sort("PlateWell", 1, 0);
print join(" ", $t->col("PlateWell"));
# prints: A_1 A_11 A_12 A_2 B_1 B_11 B_12 B_2
# in string sorting, "A_11" and "A_12" appears before "A_2";
my $my_sort_func = sub {
my @a = split /_/, $_[0];
my @b = split /_/, $_[1];
my $res = ($a[0] cmp $b[0]) || (int($a[1]) <=> int($b[1]));
};
$t->sort("PlateWell", $my_sort_func, 0);
print join(" ", $t->col("PlateWell"));
# prints the correct order: A_1 A_2 A_11 A_12 B_1 B_2 B_11 B_12
- table
table::match_pattern ($pattern, $countOnly)
- return a new table consisting those rows evaluated to be true by
$pattern upon success or undef otherwise. If
$countOnly is set to 1, it simply returns the
number of rows that matches the string without making a new copy of table.
$countOnly is 0 by default.
Side effect: @Data::Table::OK (should
use $t->{OK} after 1.62) stores a true/false
array for the original table rows. Using it, users can find out what are
the rows being selected/unselected. Side effect:
@Data::Table::MATCH stores a reference to an
array containing all row indices for matched rows.
In the $pattern string, a column
element should be referred as
$_->[$colIndex]. E.g.,
match_pattern('$_->[0]>3 &&
$_->[1]=~/^L') retrieve all the rows where
its first column is greater than 3 and second column starts with letter
'L'. Notice it only takes colIndex, column names are not acceptable
here!
- table
table::match_pattern_hash ($pattern, $countOnly)
- return a new table consisting those rows evaluated to be true by
$pattern upon success or undef otherwise. If
$countOnly is set to 1, it simply returns the
number of rows that matches the string without making a new copy of table.
$countOnly is 0 by default.
Side effect: @Data::Table::OK stores a
reference to a true/false array for the original table rows. Using it,
users can find out what are the rows being selected/unselected. Side
effect: @Data::Table::MATCH stores a reference
to an array containing all row indices for matched rows.
In the $pattern string, a column
element should be referred as ${column_name}.
match_pattern_hash() is added in 1.62. The difference between
this method and match_pattern is each row is fed to the pattern as a
hash %_. In the case of match_pattern, each row
is fed as an array ref $_. The pattern for
match_pattern_hash() becomes much cleaner.
If a table has two columns: Col_A as the 1st column and Col_B
as the 2nd column, a filter "Col_A > 2 AND Col_B < 2" is
written before as
$t->match_pattern('$_->[0] > 2
&& $_->[1] <2'); where we need to
figure out $t->colIndex('Col_A') is 0 and
$t->colIndex('Col_B') is 1, in order to build
the pattern. Now you can use column name directly in the pattern:
$t->match_pattern_hash('$_{Col_A} >2
&& $_{Col_B} <2'); This method
creates $t->{OK}, as well as
@Data::Table::OK, same as
match_pattern().
Simple boolean operators such as and/or can be directly put
into the pattern string. More complex logic can also be supported in the
example below:
my $t= Data::Table->new([[2,5,'Jan'], [1,6,'Feb'], [-3,2,'Apr'], [6,-4,'Dec']], ['X','Y','Month'], 0);
# we need to use our instead of my, so that %Q1 is accessible within match_pattern_hash
our %Q1 = ('Jan'=>1, 'Feb'=>1, 'Mar'=>1);
# find records belongin to Q1 months, we need to use %::Q1 to access the Q1 defined outside Data::Table
$t2=$t->match_pattern_hash('exists $::Q1{$_{Month}}');
similarly, subroutines can be accessed inside
match_pattern_hash using "::":
sub in_Q1 {
my $x = shift;
return ($x eq 'Jan' or $x eq 'Feb' or $x eq 'Mar');
}
$t2=$t->match_pattern_hash('::in_Q1($_{Month})');
However, such usage is discouraged, as
match_pattern_hash() does not throw errors when the pattern is
invalid. For complex filtering logic, we strongly recommend you stick to
row-based looping.
- table
table::match_string ($s, $caseIgnore, $countOnly)
- return a new table consisting those rows contains string
$s in any of its fields upon success, undef
otherwise. if $caseIgnore evaluated to true, case
will is be ignored (s/$s/i). If $countOnly is set
to 1, it simply returns the number of rows that matches the string without
making a new copy of table. $countOnly is 0 by
default.
Side effect: @Data::Table::OK stores a
reference to a true/false array for the original table rows. Side
effect: @Data::Table::MATCH stores a reference
to an array containing all row indices for matched rows. Using it, users
can find out what are the rows being selected/unselected. The
$s string is actually treated as a regular
expression and applied to each row element, therefore one can actually
specify several keywords by saying, for instance,
match_string('One|Other').
- table
table::rowMask($mask, $complement)
- mask is reference to an array, where elements are evaluated to be true or
false. The size of the mask must be equal to the nofRow of the table.
return a new table consisting those rows where the corresponding mask
element is true (or false, when complement is set to true).
E.g.,
$t1=$tbl->match_string('keyword');
$t2=$tbl->rowMask(\@Data::Table::OK, 1)
creates two new tables. $t1 contains all rows
match 'keyword', while $t2 contains all other
rows.
mask is reference to an array, where elements are evaluated to
be true or false. The size of the mask must be equal to the nofRow of
the table. return
a new table consisting those rows where the corresponding mask element is
true (or false, when complement is set to true).
E.g.,
$t1=$tbl->match_string('keyword');
$t2=$tbl->rowMask(\@Data::Table::OK, 1)
creates two new tables. $t1 contains all rows
match 'keyword', while $t2 contains all other
rows.
- table
table::iterator({$reverse => 0})
- Returns a reference to a enumerator routine, which enables one to loop
through each table row. If $reverse is set to 1,
it will enumerate backward. The convenience here is each row is fetch as a
rowHashRef, so one can easily access row elements by name.
my $next = $t_product->iterator();
while (my $row = $next->()) {
# have access to a row as a hash reference, access row number by &$next(1);
$t_product->setElm($next->(1), 'ProductName', 'New! '.$row->{ProductName});
}
In this example, each $row is fetched
as a hash reference, so one can access the elements by
$row->{colName}. Be aware that the elements
in the hash is a copy of the original table elements, so modifying
$row->{colName} does not modify the original
table. If table modification is intended, one needs to obtain the row
index of the returned row. $next->(1) call
with a non-empty argument returns the row index of the record that was
previously fetched with $next->(). In this
example, one uses the row index to modify the original table.
- table
table::each_group($colsToGroupBy, $funsToApply)
- Primary key columns are specified in
$colsToGroupBy. All rows are grouped by primary
keys first (keys sorted as string). Then for each group, subroutines
$funToAppy is applied to corresponding rows.
$funToApply are passed with two parameters
($tableRef, $rowIDsRef). All rows sharing the key
are passed in as a Data::Table object (with all columns and in the order
of ascending row index) in the first parameter. The second optional
parameter contains an array of row indices of the group members. Since all
rows in the passed-in table contains the same keys, the key value can be
obtained from its first table row.
- table
table::group($colsToGroupBy, $colsToCalculate, $funsToApply, $newColNames,
$keepRestCols)
- Primary key columns are specified in
$colsToGroupBy. All rows are grouped by primary
keys first. Then for each group, an array of subroutines (in
$funsToAppy) are applied to corresponding columns
and yield a list of new columns (specified in
$newColNames).
$colsToGroupBy,
$colsToCalculate are references to array of
colIDs. $funsToApply is a reference to array of
subroutine references. $newColNames are a
reference to array of new column name strings. If specified, the size of
arrays pointed by $colsToCalculate,
$funsToApply and
$newColNames should be i dentical. A column may
be used more than once in $colsToCalculate.
$keepRestCols is default to 1 (was
introduced as 0 in 1.64, changed to 1 in 1.66 for backward
compatibility) introduced in 1.64), otherwise, the remaining columns are
returned with the first encountered value of that group.
E.g., an employee salary table $t
contains the following columns: Name, Sex, Department, Salary. (see
examples in the SYNOPSIS)
$t2 = $t->group(["Department","Sex"],["Name", "Salary"], [sub {scalar @_}, \&average], ["Nof Employee", "Average Salary"], 0);
Department, Sex are used together as the primary key columns,
a new column "Nof Employee" is created by counting the number
of employee names in each group, a new column "Average Salary"
is created by averaging the Salary data falled into each group. As the
result, we have the head count and average salary information for each
(Department, Sex) pair. With your own functions (such as sum, product,
average, standard deviation, etc), group method is very handy for
accounting purpose. If primary key columns are not defined, all records
will be treated as one group.
$t2 = $t->group(undef,["Name", "Salary"], [sub {scalar @_}, \&average], ["Nof Employee", "Average Salary"], 0);
The above statement will output the total number of employees
and their average salary as one line.
- table
table::pivot($colToSplit, $colToSplitIsStringOrNumeric, $colToFill,
$colsToGroupBy, $keepRestCols)
- Every unique values in a column (specified by
$colToSplit) become a new column. undef value
become "NULL".
$colToSplitIsStringOrNumeric is set to numeric (0
or Data::Table:NUMBER), the new column names are prefixed by
"oldColumnName=". The new cell element is filled by the value
specified by $colToFill (was 1/0 before version
1.63).
Note: yes, it seems I made an incompatible change in version
1.64, where $colToSplitIsStringOrNumber used to
be $colToSplitIsNumeric, where 0 meant STRING
and 1 meant NUMBER. Now it is opposite. However, I also added auto-type
detection code, that this parameter essentially is auto-guessed and most
old code should behave the same as before.
When primary key columns are specified by
$colsToGroupBy, all records sharing the same
primary key collapse into one row, with values in
$colToFill filling the corresponding new
columns. If $colToFill is not specified, a cell
is filled with the number of records fall into that cell.
$colToSplit and
$colToFill are colIDs.
$colToSplitIsNumeric is 1/0.
$colsToGroupBy is a reference to array of
colIDs. $keepRestCols is 1/0, by default is 0.
If $keepRestCols is off, only primary key
columns and new columns are exported, otherwise, all the rest columns
are exported as well.
E.g., applying pivot method to the resultant table of the
example of the group method.
$t2->pivot("Sex", 0, "Average Salary",["Department"]);
This creates a 2x3 table, where Departments are use as row
keys, Sex (female and male) become two new columns. "Average
Salary" values are used to fill the new table elements. Used
together with group method, pivot method is very handy for accounting
type of analysis. If $colsToGroupBy is left as
undef, all rows are treated as one group. If
$colToSplit is left as undef, the method will
generate a column named "(all)" that matches all records share
the corresponding primary key.
- table
table::melt($keyCols, $variableCols, $arg_ref)
- The idea of melt() and cast() are taken from Hadley
Wickham's Reshape package in R language. A table is first melt()
into a tall-skiny format, where measurements are stored in the format of a
variable-value pair per row. Such a format can then be easily
cast() into various contingency tables.
One needs to specify the columns consisting of primary keys,
columns that are consider as variable columns. The output variable
column is named 'variable' unless specified by
$arg_ref{variableColName}. The output value
column is named 'value', unless specified in
$arg_ref{valueColName}. By default NULL values
are not output, unless $arg_ref{skip_NULL} is
set to false. By default empty string values are kept, unless one sets
skip_empty to `.
For each object (id), we measure variable x1 and x2 at two time points
$t = new Data::Table([[1,1,5,6], [1,2,3,5], [2,1,6,1], [2,2,2,4]], ['id','time','x1','x2'], Data::Table::ROW_BASED);
# id time x1 x2
# 1 1 5 6
# 1 2 3 5
# 2 1 6 1
# 2 2 2 4
# melting a table into a tall-and-skinny table
$t2 = $t->melt(['id','time']);
#id time variable value
# 1 1 x1 5
# 1 1 x2 6
# 1 2 x1 3
# 1 2 x2 5
# 2 1 x1 6
# 2 1 x2 1
# 2 2 x1 2
# 2 2 x2 4
# casting the table, &average is a method to calculate mean
# for each object (id), we calculate average value of x1 and x2 over time
$t3 = $t2->cast(['id'],'variable',Data::Table::STRING,'value', \&average);
# id x1 x2
# 1 4 5.5
# 2 4 2.5
- table
table::cast($colsToGroupBy, $colToSplit, $colToSplitIsStringOrNumeric,
$colToCalculate, $funToApply)
- see melt(), as melt() and cast() are meant to use
together.
The table has been melten before. cast() group the
table according to primary keys specified in
$colsToGroupBy. For each group of objects
sharing the same id, it further groups values (specified by
$colToCalculate) according to unique variable
names (specified by $colToSplit). Then it
applies subroutine $funToApply to obtain an
aggregate value. For the output, each unique primary key will be a row,
each unique variable name will become a column, the cells are the
calculated aggregated value.
If $colsToGroupBy is undef, all rows
are treated as within the same group. If
$colToSplit is undef, a new column
"(all)" is used to hold the results.
$t = Data::Table->new( # create an employ salary table
[
['Tom', 'male', 'IT', 65000],
['John', 'male', 'IT', 75000],
['Tom', 'male', 'IT', 65000],
['John', 'male', 'IT', 75000],
['Peter', 'male', 'HR', 85000],
['Mary', 'female', 'HR', 80000],
['Nancy', 'female', 'IT', 55000],
['Jack', 'male', 'IT', 88000],
['Susan', 'female', 'HR', 92000]
],
['Name', 'Sex', 'Department', 'Salary'], Data::Table::ROW_BASED);
# get a Department x Sex contingency table, get average salary across all four groups
print $t->cast(['Department'], 'Sex', Data::Table::STRING, 'Salary', \&average)->csv(1);
Department,female,male
IT,55000,73600
HR,86000,85000
# get average salary for each department
print $t->cast(['Department'], undef, Data::Table::STRING, 'Salary', \&average)->csv(1);
Department,(all)
IT,70500
HR,85666.6666666667
# get average salary for each gender
print $t->cast(['Sex'], undef, Data::Table::STRING, 'Salary', \&average)->csv(1);
Sex,(all)
male,75500
female,75666.6666666667
# get average salary for all records
print $t->cast(undef, undef, Data::Table::STRING, 'Salary', \&average)->csv(1);
(all)
75555.5555555556
- int table::rowMerge ($tbl,
$argRef)
- Append all the rows in the table object $tbl to
the original rows. Before 1.62, the merging table
$tbl must have the same number of columns as the
original, as well as the columns are in exactly the same order. It returns
1 upon success, undef otherwise. The table object
$tbl should not be used afterwards, since it
becomes part of the new table.
Since 1.62, you may provide {byName =>1, addNewCol=>1}
as $argRef. If byName is set to 1, the columns
in in $tbl do not need to be in the same order
as they are in the first table, instead the column name is used for the
matching. If addNewCol is set to 1, if $tbl
contains a new column name that does not already exist in the first
table, this new column will be automatically added to the resultant
table. Typically, you want to specify there two options
simultaneously.
- int table::colMerge ($tbl,
$argRef)
- Append all the columns in table object $tbl to the
original columns. Table $tbl must have the same
number of rows as the original. It returns 1 upon success, undef
otherwise. Table $tbl should not be used
afterwards, since it becomes part of the new table.
Since 1.62, you can specify {renameCol => 1} as
$argRef. This is to auto fix any column name
collision. If $tbl contains a column that
already exists in the first table, it will be renamed (by a suffix _2)
to avoid the collision.
- table table::join
($tbl, $type, $cols1, $cols2, $argRef)
- Join two tables. The following join types are supported (defined by
$type):
0: inner join 1: left outer join 2: right outer join 3: full
outer join
In 1.62, instead of memorize these numbers, you can use
constants instead (notice constants do not start with '$').
Data::Table::INNER_JOIN
Data::Table::LEFT_JOIN
Data::Table::RIGHT_JOIN
Data::Table::FULL_JOIN
$cols1 and
$cols2 are references to array of colIDs, where
rows with the same elements in all listed columns are merged. As the
result table, columns listed in $cols2 are
deleted, before a new table is returned.
The implementation is hash-join, the running time should be
linear with respect to the sum of number of rows in the two tables
(assume both tables fit in memory).
If the non-key columns of the two tables share the same name,
the routine will fail, as the result table cannot contain two columns of
the same name. In 1.62, one can specify {renameCol=>1} as
$argRef, so that the second column will be
automatically renamed (with suffix _2) to avoid collision.
If you would like to treat the NULLs in the key columns as
empty string, set {NULLasEmpty => 1}. If you do not want to treat
NULLs as empty strings, but you still like the NULLs in two tables to be
considered as equal (but not equal to ''), set {matchNULL => 1}.
Obviously if NULLasEmpty is set to 1, matchNULL will have no effect.
All internal methods are mainly implemented for used by other
methods in the Table class. Users should avoid using them. Nevertheless,
they are listed here for developers who would like to understand the code
and may derive a new class from Data::Table.
- int table::rotate
- convert the internal structure of a table between row-based and
column-based. return 1 upon success, undef otherwise.
- string
csvEscape($string, {delimiter=>, qualifier})
- Encode a scalar into a CSV-formatted field.
optional named arguments: delimiter and qualifier, in case
user wants to use characters other than the defaults. The default
delimiter and qualifier is taken from
$Data::Table::DEFAULTS{'CSV_DELIMITER'}
(defaults to ',') and
$Data::Table::DEFAULTS{'CSV_QUALIFIER'}
(defaults to '"'), respectively.
Please note that this function only escape one element in a
table. To escape the whole table row, you need to
join($delimiter, map {csvEscape($_)} @row .
$endl; $endl refers to
End-of-Line, which you may or may not want to add, and it is
OS-dependent. Therefore, csvEscape method is kept to the simplest form
as an element transformer.
- refto_array
parseCSV($string)
- Break a CSV encoded string to an array of scalars (check it out, we did it
the cool way).
optional argument size: specify the expected number of fields
after csv-split. optional named arguments: delimiter and qualifier, in
case user wants to use characters other than the defaults. respectively.
The default delimiter and qualifier is taken from
$Data::Table::DEFAULTS{'CSV_DELIMITER'}
(defaults to ',') and
$Data::Table::DEFAULTS{'CSV_QUALIFIER'}
(defaults to '"'), respectively.
- string
tsvEscape($rowRef)
- Encode a scalar into a TSV-formatted string.
There is no standard for TSV format as far as we know. CSV format
can't handle binary data very well, therefore, we choose the TSV format to
overcome this limitation.
We define TSV based on MySQL convention.
"\0", "\n", "\t", "\r", "\b", "'", "\"", and "\\" are all escaped by '\' in the TSV file.
(Warning: MySQL treats '\f' as 'f', and it's not escaped here)
Undefined values are represented as '\N'.
However, you can switch off this transformation by setting
{transform_element => 0} in the fromTSV or tsv method. Before if a cell
reads 'A line break is \n', it is read in as 'A link break is [return]' in
memory. When use tsv method to export, it is transformed back to 'A line
break is \n'. However, if it is exported as a csv, the [return] will break
the format. Now if transform_element is set to 0, the cell is stored as 'A
line break is \n' in memory, so that csv export will be correct. However, do
remember to set {transform_element => 0} in tsv export method, otherwise,
the cell will become 'A line break is \\n'. Be aware that trasform_element
controls column headers as well.
Spreadsheet is a very generic type, therefore Data::Table class
provides an easy interface between databases, web pages, CSV/TSV files,
graphics packages, etc.
Here is a summary (partially repeat) of some classic usages of
Data::Table.
use DBI;
$dbh= DBI->connect("DBI:mysql:test", "test", "") or die $DBI::errstr;
my $minAge = 10;
$t = Data::Table::fromSQL($dbh, "select * from mytable where age >= ?", [$minAge]);
print $t->html;
$t = fromFile("mydata.csv"); # after version 1.51
$t = fromFile("mydata.tsv"); # after version 1.51
$t = fromCSV("mydata.csv");
$t->sort(1,1,0);
print $t->csv;
Same for TSV
Read in two tables from NorthWind.xls file, writes them out to
XLSX format. See Data::Table::Excel module for details.
use Data::Table::Excel;
my ($tableObjects, $tableNames)=xls2tables("NorthWind.xls");
$t_category = $tableObjects[0];
$t_product = $tableObjects[1];
tables2xlsx("NorthWind.xlsx", [$t_category, $t_product]);
use GD::Graph::points;
$graph = GD::Graph::points->new(400, 300);
$t2 = $t->match('$_->[1] > 20 && $_->[3] < 35.7');
my $gd = $graph->plot($t->colRefs([0,2]));
open(IMG, '>mygraph.png') or die $!;
binmode IMG;
print IMG $gd->png;
close IMG;
Copyright 1998-2008, Yingyao Zhou & Guangzhou Zou. All rights
reserved.
It was first written by Zhou in 1998, significantly improved and
maintained by Zou since 1999. The authors thank Tong Peng and Yongchuang Tao
for valuable suggestions. We also thank those who kindly reported bugs, some
of them are acknowledged in the "Changes" file.
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
Please send bug reports and comments to: easydatabase at gmail dot
com. When sending bug reports, please provide the version of Table.pm, the
version of Perl.
DBI, GD::Graph, Data::Table::Excel.
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc.
|