cupy.cuda.nccl.NcclCommunicator

class cupy.cuda.nccl.NcclCommunicator(int ndev, tuple commId, int rank)

Initialize an NCCL communicator for one device controlled by one process.

Parameters:

ndev (int) – Total number of GPUs (ranks) in the communicator.
commId (tuple) – The unique ID returned by get_unique_id(). Every rank must pass the same ID.
rank (int) – The rank of the GPU managed by the current process, in the range 0 to ndev - 1.

Returns:

An NcclCommunicator instance.

Return type:

NcclCommunicator

Note

This constructor creates an NCCL communicator for use in a multi-process environment, typically managed by MPI or multiprocessing. To control multiple devices from a single process, use initAll() instead.

See also

ncclCommInitRank
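As a rough sketch of the multi-process workflow described in the note above, the code below assumes a single machine with at least `ndev` GPUs, uses Python's `multiprocessing` with the spawn start method, and maps rank r to GPU r (an assumption; MPI launchers may map ranks differently). The parent creates one ID with `get_unique_id()` and hands it to every worker; each worker builds its communicator and runs an `allReduce`. The names `worker`, `main`, and `expected_sum` are hypothetical helpers, not part of CuPy.

```python
import multiprocessing as mp


def expected_sum(ndev):
    # Hypothetical helper: rank r contributes the value r + 1, so an
    # NCCL_SUM all-reduce should leave 1 + 2 + ... + ndev in every element.
    return ndev * (ndev + 1) // 2


def worker(rank, ndev, uid):
    # CuPy is imported here so this module can be imported without CuPy.
    import cupy
    from cupy.cuda import nccl

    # Assumption: process `rank` drives GPU `rank`. Select the device
    # before creating the communicator so NCCL binds to it.
    with cupy.cuda.Device(rank):
        comm = nccl.NcclCommunicator(ndev, uid, rank)
        x = cupy.full(4, rank + 1, dtype=cupy.float32)
        y = cupy.empty_like(x)
        stream = cupy.cuda.Stream.null
        # Buffers and the stream are passed as raw pointers (intptr_t);
        # count is the number of elements, not bytes.
        comm.allReduce(x.data.ptr, y.data.ptr, x.size,
                       nccl.NCCL_FLOAT32, nccl.NCCL_SUM, stream.ptr)
        stream.synchronize()
        assert bool((y == expected_sum(ndev)).all())
        comm.destroy()


def main(ndev=2):
    from cupy.cuda import nccl

    # One unique ID is created once and shared with every rank.
    uid = nccl.get_unique_id()
    # Use spawn, not fork: a forked child cannot reuse the parent's CUDA state.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(r, ndev, uid))
             for r in range(ndev)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    assert all(p.exitcode == 0 for p in procs)
```

To run it, call `main()` from under an `if __name__ == "__main__":` guard, which the spawn start method needs because each child re-imports the main module.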

Methods

abort(self)

allGather(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, intptr_t stream)

allReduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, intptr_t stream)

bcast(self, intptr_t buff, int count, int datatype, int root, intptr_t stream)

broadcast(self, intptr_t sendbuff, intptr_t recvbuff, int count, int datatype, int root, intptr_t stream)

check_async_error(self)

destroy(self)

device_id(self)

static initAll(devices)

Initialize NCCL communicators for multiple devices in a single process.

Parameters:

devices (int or list of int) – The number of GPUs, or a list of GPU IDs, to use. If an int n is given, GPUs #0 through #(n - 1) are used.

Returns:

A list of NcclCommunicator instances.

Return type:

list

Note

This method creates a group of NCCL communicators in a single process, each controlling one device, as in this example:

from cupy.cuda import nccl
# Use 3 GPUs: #0, #2, and #3
comms = nccl.NcclCommunicator.initAll([0, 2, 3])
assert len(comms) == 3

In a multi-process setup, use the class constructor, NcclCommunicator(ndev, commId, rank), instead.
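Building on the initAll() example above, a minimal sketch of running one collective across the communicators it returns might look like this. Because a single thread drives every device, the per-device calls are wrapped in `nccl.groupStart()` and `nccl.groupEnd()`, which NCCL requires in that setting; without the grouping, the first `allReduce` can block while waiting for ranks that have not been called yet. The names `resolve_devices` and `allreduce_single_process` are hypothetical, and the sketch assumes the requested GPUs are present.

```python
def resolve_devices(devices):
    # Hypothetical helper mirroring the documented rule for initAll():
    # an int n means GPUs 0..n-1, a list names the GPUs explicitly.
    if isinstance(devices, int):
        return list(range(devices))
    return list(devices)


def allreduce_single_process(devices):
    # CuPy is imported here so this module can be imported without CuPy.
    import cupy
    from cupy.cuda import nccl

    comms = nccl.NcclCommunicator.initAll(devices)
    assert len(comms) == len(resolve_devices(devices))

    # Allocate one send and one receive buffer on each communicator's GPU.
    sends, recvs = [], []
    for comm in comms:
        with cupy.cuda.Device(comm.device_id()):
            sends.append(cupy.ones(4, dtype=cupy.float32))
            recvs.append(cupy.empty(4, dtype=cupy.float32))

    # One thread issues the collective for every device, so group the calls.
    nccl.groupStart()
    for comm, x, y in zip(comms, sends, recvs):
        with cupy.cuda.Device(comm.device_id()):
            comm.allReduce(x.data.ptr, y.data.ptr, x.size,
                           nccl.NCCL_FLOAT32, nccl.NCCL_SUM,
                           cupy.cuda.Stream.null.ptr)
    nccl.groupEnd()

    # Wait for each device, copy its result to the host, then clean up.
    results = []
    for comm, y in zip(comms, recvs):
        with cupy.cuda.Device(comm.device_id()):
            cupy.cuda.Stream.null.synchronize()
            results.append(y.get())
        comm.destroy()
    return results
```

With the three GPUs from the example above, `allreduce_single_process([0, 2, 3])` should return three host arrays filled with 3.0, since each device contributes a vector of ones.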