cupyx.distributed.init_process_group

cupyx.distributed.init_process_group(n_devices, rank, *, backend='nccl', host=None, port=None, use_mpi=False)

Start cupyx.distributed and obtain a communicator.

This call initializes the distributed environment. It must be called by every process that is involved in the communication.

Only a single device is allowed per returned communicator. It is the user's responsibility to select the appropriate GPU before creating and using the communicator.

Currently, the user needs to specify the rank of each process and the total number of processes, and must start all the processes on the different hosts manually.
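Because ranks are assigned by hand, one possible pattern is to pass them on the command line of each process that is started. The following is a minimal sketch under that assumption: the helper `parse_launch_args` and the flags `--n-devices` and `--rank` are hypothetical names chosen for illustration and are not part of cupyx.distributed.

```python
import argparse


def parse_launch_args(argv):
    """Hypothetical helper: read this process's rank and the process count."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-devices", type=int, required=True)
    parser.add_argument("--rank", type=int, required=True)
    args = parser.parse_args(argv)
    # init_process_group documents the constraint 0 <= rank < n_devices,
    # so reject invalid combinations before any communicator is created.
    if not 0 <= args.rank < args.n_devices:
        raise ValueError(
            f"rank must satisfy 0 <= rank < {args.n_devices}, got {args.rank}")
    return args.n_devices, args.rank


# A process started with "--n-devices 2 --rank 1" would obtain the two
# values it then passes to cupyx.distributed.init_process_group.
n_devices, rank = parse_launch_args(["--n-devices", "2", "--rank", "1"])
```

Each manually started process would run the same script with a different `--rank` value, so the only per-process difference is the command line.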

The process with rank 0 spawns a TCP server in a subprocess that listens on the port given by the environment variable CUPYX_DISTRIBUTED_PORT. Rank 0 must be executed on the host given by the environment variable CUPYX_DISTRIBUTED_HOST. If these values are not specified, ‘127.0.0.1’ and 13333 are used by default.
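The fallback to default values described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper `resolve_rendezvous` is a hypothetical name, and the assumption that explicitly passed `host` and `port` arguments take precedence over the environment variables is ours.

```python
import os

# Defaults stated in the documentation above.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 13333


def resolve_rendezvous(host=None, port=None, environ=None):
    """Hypothetical helper: choose the host and port of the rank-0 server."""
    env = os.environ if environ is None else environ
    # Assumption: an explicit argument wins; otherwise use the environment
    # variable, and fall back to the documented default if it is unset.
    if host is None:
        host = env.get("CUPYX_DISTRIBUTED_HOST", DEFAULT_HOST)
    if port is None:
        port = int(env.get("CUPYX_DISTRIBUTED_PORT", DEFAULT_PORT))
    return host, port


# With nothing set, the documented defaults are returned.
default_endpoint = resolve_rendezvous(environ={})
```

Passing `environ` explicitly keeps the sketch easy to check without modifying the real process environment.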

Note that this feature is expected to be used within a trusted cluster environment.

Example

>>> import cupy
>>> def process_0():
...     import cupyx.distributed
...     cupy.cuda.Device(0).use()
...     comm = cupyx.distributed.init_process_group(2, 0)
...     array = cupy.ones(1)
...     comm.broadcast(array, 0)
...
>>> def process_1():
...     import cupyx.distributed
...     cupy.cuda.Device(1).use()
...     comm = cupyx.distributed.init_process_group(2, 1)
...     array = cupy.zeros(1)
...     comm.broadcast(array, 0)
...     cupy.equal(array, cupy.ones(1))

Parameters:

n_devices (int) – Total number of devices that will be used in the distributed execution.
rank (int) – Unique id of the GPU that the communicator is associated with. Its value must satisfy 0 <= rank < n_devices.
backend (str) – Backend to use for the communications. Optional, defaults to “nccl”.
host (str) – Host address for the process rendezvous on initialization. Defaults to None.
port (int) – Port for the process rendezvous on initialization. Defaults to None.
use_mpi (bool) – If False, MPI is not used for synchronization, and the provided TCP server is used instead to exchange CPU-only information. Defaults to False.

Returns:

Object used to perform communications. It adheres to the Backend specification.

Return type:

Backend