Manual Reference Pages  -  HPL_PDPANCRT (3)

NAME

HPL_pdpancrT - Crout panel factorization.

CONTENTS

Synopsis
Description
Arguments
See Also

SYNOPSIS

#include "hpl.h"

void HPL_pdpancrT( HPL_T_panel * PANEL, const int M, const int N, const int ICOFF, double * WORK );

DESCRIPTION

HPL_pdpancrT factorizes a panel of columns that is a sub-array of a larger one-dimensional panel A using the Crout variant of the usual one-dimensional algorithm. The lower triangular N0-by-N0 upper block of the panel is stored in transpose form.

Bi-directional exchange is used to perform the swap::broadcast operations at once for one column in the panel. This results in fewer, slightly larger messages than usual. On P processes and assuming bi-directional links, the running time of this function can be approximated by (when N is equal to N0):

N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
N0^2 * ( M - N0/3 ) * gam2-3

where M is the local number of rows of the panel, lat and bdwth are the latency and bandwidth of the network for double precision real words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS rate of execution. The recursive algorithm indeed makes it possible to nearly achieve Level 3 BLAS performance in the panel factorization. On many modern machines, however, this operation is latency bound, meaning that its cost can be estimated by the latency portion alone, N0 * log_2(P) * lat. Mono-directional links will double this communication cost.
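The estimate above can be evaluated numerically to see which term dominates for a given machine. The following is a minimal sketch, not part of HPL: the function name panel_time_estimate and its parameter names are hypothetical. It assumes lat is in seconds, bdwth is in double precision words per second, and gam is expressed as seconds per floating-point operation so that the second term has units of time. It also assumes log_2(P) is rounded up to a whole number of exchange steps, which matches the formula exactly only when P is a power of two.

```c
#include <assert.h>

/* Hypothetical helper (not an HPL routine): evaluates
 *   N0 * log_2(P) * ( lat + (2*N0 + 4) / bdwth ) + N0^2 * ( M - N0/3 ) * gam
 * Assumptions: lat in seconds, bdwth in words per second,
 * gam in seconds per floating-point operation. */
static double panel_time_estimate(int n0, int m, int p,
                                  double lat, double bdwth, double gam)
{
    assert(n0 > 0 && m >= n0 && p >= 1 && bdwth > 0.0);

    /* Assumption: log_2(P) rounded up to whole exchange steps. */
    int steps = 0;
    for (int span = 1; span < p; span *= 2) {
        steps++;
    }

    /* Communication term: one exchange per column, 2*N0 + 4 words each. */
    double comm = (double)n0 * steps * (lat + (2.0 * n0 + 4.0) / bdwth);

    /* Computation term: Level 2 and Level 3 BLAS work on the panel. */
    double comp = (double)n0 * n0 * (m - n0 / 3.0) * gam;

    return comm + comp;
}
```

Setting gam to zero isolates the communication term, and setting bdwth very large leaves only the latency portion N0 * log_2(P) * lat mentioned above.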

Note that one iteration of the main loop is unrolled. The local computation of the absolute value max of the next column is performed just after its update by the current column. This brings the current column through cache only once at each step. The current implementation does not perform any blocking for this sequence of BLAS operations; however, the design allows for plugging in an optimal (machine-specific) specialized BLAS-like kernel. This idea was suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
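The idea of searching for the pivot candidate immediately after the update can be illustrated with a single fused loop. This is a simplified sketch only: it is not the HPL implementation, which uses BLAS calls and the routines listed under SEE ALSO. The function name update_and_find_max and the plain scaled-subtraction update are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fused kernel (not an HPL routine): updates the next column
 * with the current one, next[i] -= alpha * cur[i], and tracks the index of
 * the largest absolute value in the same pass, so each element of the
 * updated column is touched once while it is still in cache. */
static size_t update_and_find_max(double *next, const double *cur,
                                  double alpha, size_t m)
{
    assert(m > 0);

    size_t imax = 0;
    double vmax = -1.0; /* smaller than any absolute value */

    for (size_t i = 0; i < m; i++) {
        next[i] -= alpha * cur[i];
        double a = next[i] < 0.0 ? -next[i] : next[i];
        if (a > vmax) {
            vmax = a;
            imax = i;
        }
    }
    return imax; /* local index of the pivot candidate */
}
```

Doing the update and the search in separate passes would read the updated column twice; the fused form is one way to obtain the single pass through cache described above.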

ARGUMENTS

PANEL (local input/output) HPL_T_panel *
  On entry, PANEL points to the data structure containing the panel information.
M (local input) const int
  On entry, M specifies the local number of rows of sub(A).
N (local input) const int
  On entry, N specifies the local number of columns of sub(A).
ICOFF (global input) const int
  On entry, ICOFF specifies the row and column offset of sub(A) in A.
WORK (local workspace) double *
  On entry, WORK is a work array of size at least 2*(4+2*N0).
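The minimum workspace length stated for WORK can be computed before allocating the buffer. The sketch below is an illustration under assumptions: the helper names pancr_work_len and pancr_work_alloc are hypothetical and not part of HPL, and the length is taken to be counted in double precision elements since WORK is a double *.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not an HPL routine): minimum number of doubles
 * required for WORK, i.e. 2*(4+2*N0). */
static size_t pancr_work_len(int n0)
{
    assert(n0 >= 0);
    return 2u * (4u + 2u * (size_t)n0);
}

/* Hypothetical helper: allocates a workspace of the minimum length.
 * Returns NULL on allocation failure; the caller releases it with free(). */
static double *pancr_work_alloc(int n0)
{
    return malloc(pancr_work_len(n0) * sizeof(double));
}
```

For example, N0 = 8 gives 2*(4+16) = 40 doubles.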

SEE ALSO

HPL_dlocmax (3), HPL_dlocswpN (3), HPL_dlocswpT (3), HPL_pdmxswp (3), HPL_pdpancrN (3), HPL_pdpanllN (3), HPL_pdpanllT (3), HPL_pdpanrlN (3), HPL_pdpanrlT (3).


HPL 2.1 HPL_PDPANCRT (3) October 26, 2012
