HPL_pdrpanrlN - Right-looking recursive panel factorization.
#include "hpl.h"
void HPL_pdrpanrlN( HPL_T_panel * PANEL,
                    const int M,
                    const int N,
                    const int ICOFF,
                    double * WORK );
HPL_pdrpanrlN factorizes a panel of columns using the recursive Right-looking
variant of the one-dimensional algorithm. The lower-triangular N0-by-N0 upper
block of the panel is stored in no-transpose form (i.e., just like the input
matrix itself).
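To make the recursion concrete, here is a minimal serial sketch in C. It assumes a single process, a column-major M-by-N panel with leading dimension lda, and a split into two halves at every level. It is not HPL's implementation: the names `panel_rlu`, `rec_lu`, `swap_rows` and `AIJ` are hypothetical, and there is no MPI, no swap::broadcast and no tuning of the recursion depth. Each level factors the left half, applies the unit lower-triangular solve and the trailing (right-looking) update to the right half, then recurses on the right half; row interchanges are applied across all panel columns.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Element (i,j) of a column-major array with leading dimension lda. */
#define AIJ(A, lda, i, j) ((A)[(size_t)(i) + (size_t)(j) * (size_t)(lda)])

/* Swap rows r1 and r2 across all ncols columns of the panel. */
static void swap_rows(double *A, int lda, int ncols, int r1, int r2)
{
    if (r1 == r2) return;
    for (int j = 0; j < ncols; j++) {
        double t = AIJ(A, lda, r1, j);
        AIJ(A, lda, r1, j) = AIJ(A, lda, r2, j);
        AIJ(A, lda, r2, j) = t;
    }
}

/* Factor columns [j0, j0+n) of an m-by-ntot panel, working on rows [j0, m). */
static void rec_lu(double *A, int lda, int m, int ntot, int j0, int n, int *ipiv)
{
    if (n == 1) {
        /* Base case: choose the entry of largest magnitude in column j0. */
        int p = j0;
        for (int i = j0 + 1; i < m; i++)
            if (fabs(AIJ(A, lda, i, j0)) > fabs(AIJ(A, lda, p, j0))) p = i;
        ipiv[j0] = p;
        swap_rows(A, lda, ntot, j0, p);
        double piv = AIJ(A, lda, j0, j0);
        if (piv != 0.0)                         /* leave a zero column as is */
            for (int i = j0 + 1; i < m; i++) AIJ(A, lda, i, j0) /= piv;
        return;
    }
    int n1 = n / 2, n2 = n - n1, j1 = j0 + n1;

    rec_lu(A, lda, m, ntot, j0, n1, ipiv);      /* 1. factor the left half */

    for (int c = j1; c < j1 + n2; c++)          /* 2. A12 := inv(L11) * A12 */
        for (int k = j0; k < j1; k++)
            for (int i = k + 1; i < j1; i++)
                AIJ(A, lda, i, c) -= AIJ(A, lda, i, k) * AIJ(A, lda, k, c);

    for (int c = j1; c < j1 + n2; c++)          /* 3. A22 -= A21 * A12 */
        for (int k = j0; k < j1; k++)
            for (int i = j1; i < m; i++)
                AIJ(A, lda, i, c) -= AIJ(A, lda, i, k) * AIJ(A, lda, k, c);

    rec_lu(A, lda, m, ntot, j1, n2, ipiv);      /* 4. factor the right half */
}

/* Hypothetical entry point: LU of an m-by-n column-major panel, m >= n.
 * On return L (unit diagonal, not stored) and U overwrite A, and ipiv[j]
 * is the 0-based row exchanged with row j at step j. */
static void panel_rlu(double *A, int lda, int m, int n, int *ipiv)
{
    assert(m >= n && lda >= m);
    if (n > 0) rec_lu(A, lda, m, n, 0, n, ipiv);
}
```

For the 3-by-3 matrix with rows (2, 1, 1), (4, 3, 3) and (8, 7, 9), stored column-major, this sketch produces ipiv = {2, 2, 2} and an upper factor with rows (8, 7, 9), (0, -0.75, -1.25) and (0, 0, -2/3).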
Bi-directional exchange is used to perform the swap::broadcast operations at
once for one column of the panel, which results in fewer but slightly larger
messages than usual. On P processes, and assuming bi-directional links, the
running time of this function can be approximated (when N is equal to N0) by:
N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
N0^2 * ( M - N0/3 ) * gam2-3
where M is the local number of rows of the panel, lat and bdwth are the latency
and bandwidth of the network for double-precision real words, and gam2-3 is an
estimate of the Level 2 and Level 3 BLAS rate of execution, expressed as time
per floating-point operation. The recursive algorithm indeed allows the panel
factorization to come close to Level 3 BLAS performance. On many modern
machines, however, this operation is latency bound, meaning that its cost can
be estimated by the latency term alone, N0 * log_2( P ) * lat. Mono-directional
links double this communication cost.
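The model above can be evaluated directly. The following C sketch assumes N equals N0, lat in seconds, bdwth in double-precision words per second, and gam23 in seconds per flop; the names `pan_cost`, `ceil_log2` and `pdrpanrl_cost` are hypothetical. To avoid depending on libm it uses ceil(log2 P) exchange steps, which equals log_2( P ) when P is a power of two, and it applies the factor of two for mono-directional links to the whole communication term, which is one reading of the last sentence above.

```c
#include <assert.h>

/* Hypothetical result type for the cost model (all times in seconds). */
typedef struct {
    double comm;         /* N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth) */
    double comp;         /* N0^2 * (M - N0/3) * gam23                  */
    double total;        /* comm + comp                                */
    double latency_only; /* N0 * log_2(P) * lat, latency-bound estimate */
} pan_cost;

/* ceil(log2(p)); equals log_2(P) when P is a power of two. */
static int ceil_log2(int p)
{
    int s = 0;
    while ((1 << s) < p) s++;
    return s;
}

static pan_cost pdrpanrl_cost(int n0, double m, int p, double lat,
                              double bdwth, double gam23, int bidirectional)
{
    assert(n0 >= 0 && p >= 1 && bdwth > 0.0);
    pan_cost c;
    double w = (double)n0;
    /* One exchange round per step for each of the N0 panel columns. */
    double steps = w * (double)ceil_log2(p);
    /* Assumption: mono-directional links double the whole comm term. */
    double links = bidirectional ? 1.0 : 2.0;
    c.comm = links * steps * (lat + (2.0 * w + 4.0) / bdwth);
    c.comp = w * w * (m - w / 3.0) * gam23;   /* ~ flops of an M-by-N0 LU */
    c.total = c.comm + c.comp;
    c.latency_only = links * steps * lat;
    return c;
}
```

With N0 = 64, M = 1000, P = 4, lat = 1e-6, bdwth = 1e8 and gam23 = 1e-9, the sketch gives a communication term of about 3.0e-4 s, of which 1.28e-4 s is latency, and a computation term of about 4.0e-3 s.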
- PANEL (local input/output) HPL_T_panel *
- On entry, PANEL points to the data structure containing the panel
information.
- M (local input) const int
- On entry, M specifies the local number of rows of sub(A).
- N (local input) const int
- On entry, N specifies the local number of columns of sub(A).
- ICOFF (global input) const int
- On entry, ICOFF specifies the row and column offset of sub(A) in A.
- WORK (local workspace) double *
- On entry, WORK is a work array of size at least 2*(4+2*N0).
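As a small how-to for the WORK argument, the minimum size stated above can be turned into a heap allocation. This sketch assumes the caller already knows N0; the helper names `pdrpanrl_work_len` and `pdrpanrl_work_alloc` are hypothetical and not part of HPL.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of doubles WORK must hold: 2*(4 + 2*N0) = 8 + 4*N0. */
static size_t pdrpanrl_work_len(int n0)
{
    assert(n0 >= 0);
    return 2u * (4u + 2u * (size_t)n0);
}

/* Allocate a WORK array of the minimum documented size.
 * Returns NULL if the allocation fails; the caller must free() it. */
static double *pdrpanrl_work_alloc(int n0)
{
    return malloc(pdrpanrl_work_len(n0) * sizeof(double));
}
```

For N0 = 64 this reserves 264 doubles; the resulting pointer would then be passed as WORK.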
See also:
HPL_dlocmax (3),
HPL_dlocswpN (3),
HPL_dlocswpT (3),
HPL_pdmxswp (3),
HPL_pdpancrN (3),
HPL_pdpancrT (3),
HPL_pdpanllN (3),
HPL_pdpanllT (3),
HPL_pdpanrlN (3),
HPL_pdpanrlT (3),
HPL_pdrpancrN (3),
HPL_pdrpancrT (3),
HPL_pdrpanllN (3),
HPL_pdrpanllT (3),
HPL_pdrpanrlT (3),
HPL_pdfact (3).