ORTE_SNAPC - Open RTE MCA Snapshot Coordination (SnapC) Framework: Overview of
Open RTE's SnapC framework and selected modules. Open MPI 3.1.3
Open RTE can coordinate the generation of a global snapshot of a parallel job
from many distributed local snapshots. The components in this framework
determine how to initiate the checkpoint of the parallel application, gather
the distributed local snapshots, and provide the user with a global snapshot
handle reference that can be used to restart the parallel application.
For a process to use the Open RTE SnapC components, it must adhere to a few
programmatic requirements.
First, the program must call ORTE_INIT early in its execution. This function
should be called only once, and a process cannot be checkpointed until it has
called this function.
The program must also call ORTE_FINALIZE.
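The ordering constraint above can be pictured as a small state machine. The following is only a sketch under stated assumptions: `demo_init`, `demo_checkpointable`, and `demo_finalize` are hypothetical stand-ins and not part of the ORTE API. Rejecting a second initialization and refusing checkpoints after finalization are modeling choices made for illustration, not documented ORTE behavior.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; these are NOT the ORTE API. */
typedef enum { RT_NOT_STARTED, RT_RUNNING, RT_FINALIZED } rt_state_t;

static rt_state_t rt_state = RT_NOT_STARTED;

/* Models ORTE_INIT: accepted exactly once, before anything else. */
int demo_init(void)
{
    if (rt_state != RT_NOT_STARTED) {
        return -1;              /* assumption: a repeated call is rejected */
    }
    rt_state = RT_RUNNING;
    return 0;
}

/* A checkpoint request is only honored once initialization has happened. */
bool demo_checkpointable(void)
{
    return rt_state == RT_RUNNING;
}

/* Models ORTE_FINALIZE: accepted only after a successful init. */
int demo_finalize(void)
{
    if (rt_state != RT_RUNNING) {
        return -1;
    }
    rt_state = RT_FINALIZED;
    return 0;
}
```

In this model a process is checkpointable only in the window between the two calls, which is why the initialization call belongs early in the program.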
A user may initiate a checkpoint of a parallel application with the
orte-checkpoint(1) command and restart it from the resulting global snapshot
with the orte-restart(1) command.
Open RTE ships with one SnapC component: full.
The following MCA parameters apply to all components:
- Set the verbosity level for all components. Default is 0, or silent except
The full component gathers the local snapshots onto the disk local to the
Head Node Process (HNP) before completing the checkpoint of the process. This
component does not currently support replicated HNPs or timer-based gathering
of local snapshot data. This is a 3-tiered hierarchy of
The full component has the following MCA parameters:
- The component's priority to use when selecting the most appropriate
component for a run.
- Set the verbosity level for this component. Default is 0, or silent except
The none component simply selects no SnapC component. All of the SnapC
function calls return immediately with ORTE_SUCCESS.
This component is the last component to be selected by default. This means that
if another component is available and the none component was not explicitly
requested, ORTE will attempt to activate all of the available components
before falling back to this component.
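The fallback rule above can be modeled in a few lines of C. This is a simplified sketch, not ORTE's selection code: the `snapc_candidate_t` type and the `select_snapc` function are hypothetical, and returning NULL when an explicitly requested component is unavailable is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one candidate component. */
typedef struct {
    const char *name;      /* component name, e.g. "full" */
    int         priority;  /* higher values are preferred */
    bool        available; /* whether the component could be activated */
} snapc_candidate_t;

/*
 * Returns the name of the component to use.
 * requested == NULL means the user did not ask for a specific component.
 */
const char *select_snapc(const snapc_candidate_t *c, size_t n,
                         const char *requested)
{
    assert(n == 0 || c != NULL);

    if (requested != NULL) {
        /* An explicit request for "none" is always honored. */
        if (strcmp(requested, "none") == 0) {
            return "none";
        }
        for (size_t i = 0; i < n; i++) {
            if (c[i].available && strcmp(c[i].name, requested) == 0) {
                return c[i].name;
            }
        }
        return NULL; /* assumption: an unusable request is an error */
    }

    /* No explicit request: try every real component first. */
    const snapc_candidate_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!c[i].available || strcmp(c[i].name, "none") == 0) {
            continue;
        }
        if (best == NULL || c[i].priority > best->priority) {
            best = &c[i];
        }
    }

    /* Fall back to "none" only when nothing else could be activated. */
    return best != NULL ? best->name : "none";
}
```

With a single available candidate named "full" and no explicit request, the sketch returns "full". If that candidate is marked unavailable, it returns "none". The priority values passed in are whatever the caller supplies; none are taken from Open RTE's defaults.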
orte-checkpoint(1), orte-restart(1), opal-checkpoint(1), opal-restart(1),