nvidia-cuda-mps-control - NVIDIA CUDA Multi Process Service
management program
nvidia-cuda-mps-control [-d | -f]
MPS is a runtime service designed to let multiple MPI processes
using CUDA run concurrently in a way that's transparent to the MPI
program. A CUDA program runs in MPS mode if the MPS control daemon is
running on the system.
When CUDA is first initialized in a program, the CUDA driver
attempts to connect to the MPS control daemon. If the connection attempt
fails, the program continues to run as it normally would without MPS. If,
however, the connection attempt to the control daemon succeeds, the CUDA
driver then requests the daemon to start an MPS server on its behalf. If
there's an MPS server already running, and the user id of that server
process matches that of the requesting client process, the control daemon
simply notifies the client process of it, which then proceeds to connect to
the server. If there's no MPS server already running on the system, the
control daemon launches an MPS server with the same user id (UID) as that of
the requesting client process. If there's an MPS server already running, but
with a different user id than that of the client process, the control daemon
requests the existing server to shut down as soon as all its clients are
done. Once the existing server has terminated, the control daemon launches a
new server with the same user id as that of the queued client process.
The MPS server creates the shared GPU context, and manages its
clients. An MPS server can support a finite number of CUDA contexts,
determined by the hardware architecture it is running on. For compute
capability SM 3.5 through SM 6.0 the limit is 16 clients per GPU at a time.
Compute capability SM 7.0 has a limit of 48. MPS is transparent to CUDA
programs, with all the complexity of communication between the client
process, the server and the control daemon hidden within the driver
binaries.
Currently, CUDA MPS is available on 64-bit Linux only, and requires a
device that supports Unified Virtual Addressing (UVA) and has compute
capability SM 3.5 or higher. Applications requiring pre-CUDA 4.0 APIs are
not supported under CUDA MPS. Certain capabilities are only available
starting with compute capability SM 7.0.
Refer to the MPS documentation on NVIDIA Docs for more
details.
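A minimal sketch of this lifecycle from a shell (my_cuda_app stands in
for any CUDA application and needs no MPS-specific changes; the -d option
and the quit command are described below):

    nvidia-cuda-mps-control -d            # start the control daemon (typically as root)
    ./my_cuda_app                         # the CUDA driver connects to MPS transparently
    echo quit | nvidia-cuda-mps-control   # shut down the daemon and its servers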
- -d
- Start the MPS control daemon in background mode, assuming the user
has enough privilege (e.g. root). The parent process exits once the control
daemon has started listening for client connections.
- -f
- Start the MPS control daemon in foreground mode, assuming the user
has enough privilege (e.g. root). Debug messages are sent to standard
output.
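For example, either mode can be selected at launch (a sketch; use one or
the other, not both at once):

    nvidia-cuda-mps-control -d   # background: returns once the daemon is listening
    nvidia-cuda-mps-control -f   # foreground: debug messages appear on stdout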
- Relax the single-UID requirement, allowing clients with any UID to connect
to and share an MPS server started as root.
- (no arguments)
- Start the front-end management user interface to the MPS control
daemon, which needs to be started first. The front-end UI keeps reading
commands from stdin until EOF. Commands are separated by the newline
character. If an invalid command is issued and rejected, an error message
is printed to stdout. The exit status of the front-end UI is zero if
communication with the daemon is successful. A non-zero value is returned if
the daemon is not found or the connection to the daemon is broken
unexpectedly. See the "quit" command below for more information about the
exit status.
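For non-interactive use, a command can be piped into the front-end UI from
a shell, for example (assuming the daemon is already running):

    echo get_server_list | nvidia-cuda-mps-control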
Commands supported by the MPS control daemon (an example session follows
this list):
- get_server_list
- Print out a list of PIDs of all MPS servers.
- get_server_status PID
- Print out the status of the server with the given PID.
- start_server -uid UID
- Start a new MPS server for the specified user (UID).
- shutdown_server PID [-f]
- Shut down the MPS server with the given PID. The MPS server only exits
after all clients disconnect, and it may accept new clients while a
connected client remains. -f forces an immediate shutdown. If a client
launches a faulty kernel that runs forever, a forced shutdown of the MPS
server may be required, since the MPS server creates and issues GPU work
on behalf of its clients.
- get_client_list PID
- Print out a list of PIDs of all clients connected to the MPS server with
the given PID.
- quit [-t TIMEOUT]
- Shut down the MPS control daemon process and all MPS servers. The MPS
control daemon stops accepting new clients while waiting for current MPS
servers and MPS clients to finish. If TIMEOUT is specified (in seconds),
the daemon will force MPS servers to shut down if they are still running
after TIMEOUT seconds.
This command is synchronous. The front-end UI waits for the daemon to
shut down, then returns the daemon's exit status. The exit status is zero
if and only if all MPS servers have exited gracefully.
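A sketch of a short management session with these commands (assumes the
daemon is running and a CUDA client has already caused a server to start;
the PID 12345 is illustrative output, not something to copy verbatim):

    echo get_server_list | nvidia-cuda-mps-control
    # prints: 12345
    echo "get_client_list 12345" | nvidia-cuda-mps-control
    echo "quit -t 10" | nvidia-cuda-mps-control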
Commands available to the Volta MPS control daemon (a second example
follows this list):
- get_device_client_list [PID]
- List the devices and PIDs of client applications that enumerated this
device. It optionally takes the server instance PID.
- set_default_active_thread_percentage percentage
- Set the default active thread percentage for MPS servers. If there is
already a server spawned, this command will only affect the next server.
The set value is lost if a quit command is executed. The default is
100.
- get_default_active_thread_percentage
- Query the current default available thread percentage.
- set_active_thread_percentage PID percentage
- Set the active thread percentage for the MPS server instance of the given
PID. All clients created with that server afterwards will observe the new
limit. Existing clients are not affected.
- get_active_thread_percentage PID
- Query the current available thread percentage of the MPS server instance
of the given PID.
- set_default_device_pinned_mem_limit dev value
- Set the default device pinned memory limit for the MPS servers. If
there is already a server spawned, this command will only affect the next
server. The value must be in the form of an integer followed by a
qualifier, either "G" or "M", which specifies the value in Gigabytes or
Megabytes respectively.
- get_default_device_pinned_mem_limit dev
- Query the current default pinned memory limit for the device.
- set_device_pinned_mem_limit PID dev value
- Override the device pinned memory limit for the MPS server of the given
PID. All the clients created with that server afterwards will observe the
new limit. Existing clients are not affected.
- get_device_pinned_mem_limit PID dev
- Query the current device pinned memory limit of the MPS server instance of
the given PID for the device dev.
- terminate_client <server PID> <client PID>
- Terminate all the outstanding GPU work of the MPS client process of the
given client PID running on the MPS server denoted by <server PID>.
- ps [-p PID]
- Report a snapshot of the current client processes.
- set_default_client_priority priority
- Set the client priority level that will be used for new clients. Priority
values are only considered as hints to the CUDA driver and can be ignored
or overridden depending on platform. priority follows a convention
where smaller numbers are higher priorities, and the default priority
value is 0. The only other supported value for priority is 1, which
represents a below-normal priority level.
- get_default_client_priority
- Query the current priority value that will be used for new clients.
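A sketch exercising some of the Volta commands through the piped front-end
UI (the percentage, server PID, device index, and memory limit are all
illustrative):

    echo "set_default_active_thread_percentage 50" | nvidia-cuda-mps-control
    echo get_default_active_thread_percentage | nvidia-cuda-mps-control
    # prints: 50
    echo "set_device_pinned_mem_limit 12345 0 2G" | nvidia-cuda-mps-control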
- CUDA_MPS_PIPE_DIRECTORY
- Specify the directory that contains the named pipes and UNIX domain
sockets used for communication among the MPS control, MPS server, and MPS
clients. The value of this environment variable should be consistent in
the MPS control daemon and all MPS client processes. The default directory
is /tmp/nvidia-mps (see the example after this list).
- CUDA_MPS_LOG_DIRECTORY
- Specify the directory that contains the MPS log files. This variable is
used by the MPS control daemon only. Default directory is
/var/log/nvidia-mps
- CUDA_VISIBLE_DEVICES
- Specify which CUDA devices are visible to the control daemon. Takes either
an index value or UUID.
- CUDA_DEVICE_MAX_CONNECTIONS
- Specify the preferred number of connections between the host and
device.
- CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
- Specify the portion of available threads that clients can use on Volta+
hardware. Setting this for the control daemon will set the default active
thread percentage for all servers spawned, while setting this for a client
or client context will constrain the active thread percentage for that
unit and cannot be higher than the active thread percentage value set for
the control daemon.
- CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING
- Specify whether individual client contexts are allowed to have different
values for CUDA_MPS_ACTIVE_THREAD_PERCENTAGE.
- CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
- Specify the amount of GPU memory available to MPS client processes.
- CUDA_MPS_CLIENT_PRIORITY
- Specify the default client priority value at initialization.
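A sketch of starting the daemon with a customized environment (the
directories are illustrative and must exist and be writable; client
processes must export the same CUDA_MPS_PIPE_DIRECTORY value to find the
daemon):

    export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps_pipe
    export CUDA_MPS_LOG_DIRECTORY=/tmp/mps_log
    export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50
    nvidia-cuda-mps-control -d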
Log files created by the MPS control daemon in the directory specified by
CUDA_MPS_LOG_DIRECTORY:
- control.log
- Records the startup and shutdown of the MPS control daemon, user commands
issued with their results, and the status of MPS servers.
- server.log
- Records the startup and shutdown of MPS servers, and the status of MPS
clients.
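The daemon's activity can then be followed in the default log location, for
example (a sketch; adjust the path if CUDA_MPS_LOG_DIRECTORY is set):

    tail -f /var/log/nvidia-mps/control.log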