cupy.cuda.nccl.NcclCommunicator

class cupy.cuda.nccl.NcclCommunicator(int ndev, tuple commId, int rank)

    Initialize an NCCL communicator for one device controlled by one process.
    Parameters:
        - ndev (int) – Total number of GPUs to be used.
        - commId (tuple) – The unique ID returned by get_unique_id().
        - rank (int) – The rank of the GPU managed by the current process.

    Returns: An NcclCommunicator instance.

    Return type: NcclCommunicator

    Note

    This method is for creating an NCCL communicator in a multi-process environment, typically managed by MPI or multiprocessing. For controlling multiple devices from a single process, use initAll() instead.
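    As a concrete illustration of this pattern, here is a minimal sketch that spawns one worker process per GPU. It assumes a machine with two GPUs and uses the 'spawn' start method, since forking after CUDA initialization is unsafe; the worker function is hypothetical, while get_unique_id(), NcclCommunicator, the NCCL_* constants, and cupy.cuda.Device are CuPy APIs:

        import multiprocessing

        import cupy
        from cupy.cuda import nccl

        def worker(ndev, comm_id, rank):
            cupy.cuda.Device(rank).use()                 # one GPU per process
            comm = nccl.NcclCommunicator(ndev, comm_id, rank)
            buf = cupy.arange(4, dtype=cupy.float32)
            # In-place all-reduce: sendbuf and recvbuf may be the same pointer.
            comm.allReduce(buf.data.ptr, buf.data.ptr, buf.size,
                           nccl.NCCL_FLOAT32, nccl.NCCL_SUM,
                           cupy.cuda.Stream.null.ptr)
            cupy.cuda.Stream.null.synchronize()
            comm.destroy()

        if __name__ == '__main__':
            ndev = 2
            comm_id = nccl.get_unique_id()               # one ID, shared by all ranks
            ctx = multiprocessing.get_context('spawn')   # fork is unsafe with CUDA
            procs = [ctx.Process(target=worker, args=(ndev, comm_id, rank))
                     for rank in range(ndev)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()

    In an MPI setup, the rank would typically come from the MPI communicator and the unique ID would be broadcast from rank 0 instead of being passed as a process argument.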
Methods
abort(self)

allGather(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, intptr_t stream)

allReduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, intptr_t stream)

bcast(self, intptr_t buff, int count, int datatype, int root, intptr_t stream)

broadcast(self, intptr_t sendbuff, intptr_t recvbuff, int count, int datatype, int root, intptr_t stream)

check_async_error(self)

destroy(self)

device_id(self)
static initAll(devices)

    Initialize NCCL communicators for multiple devices in a single process.

    Parameters: devices (int or list of int) – The number of GPUs or a list of GPU IDs to be used. In the former case, the first devices GPUs (IDs 0 through devices - 1) are used.

    Returns: A list of NcclCommunicator instances.

    Return type: list

    Note

    This method is for creating a group of NCCL communicators, each controlling one device, in a single process like this:

        from cupy.cuda import nccl

        # Use 3 GPUs: #0, #2, and #3
        comms = nccl.NcclCommunicator.initAll([0, 2, 3])
        assert len(comms) == 3

    In a multi-process setup, use the default initializer instead.
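    When one process drives several communicators like this, the collective must be queued for every communicator inside a single group call, otherwise the process deadlocks waiting on its own ranks. A minimal sketch, assuming two GPUs and the groupStart()/groupEnd() helpers exposed by cupy.cuda.nccl:

        import cupy
        from cupy.cuda import nccl

        comms = nccl.NcclCommunicator.initAll(2)      # GPUs #0 and #1
        bufs = []
        for comm in comms:
            with cupy.cuda.Device(comm.device_id()):
                bufs.append(cupy.arange(4, dtype=cupy.float32))

        # Queue the collective for every communicator before NCCL starts
        # waiting on peers; without the group call this loop would hang.
        nccl.groupStart()
        for comm, buf in zip(comms, bufs):
            with cupy.cuda.Device(comm.device_id()):
                comm.allReduce(buf.data.ptr, buf.data.ptr, buf.size,
                               nccl.NCCL_FLOAT32, nccl.NCCL_SUM,
                               cupy.cuda.Stream.null.ptr)
        nccl.groupEnd()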
rank_id(self)

reduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, int root, intptr_t stream)

reduceScatter(self, intptr_t sendbuf, intptr_t recvbuf, size_t recvcount, int datatype, int op, intptr_t stream)

size(self)
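All of the collectives above follow the same raw-pointer convention: buffers are device pointers (for a CuPy array a, that is a.data.ptr), count is an element count, datatype and op are module constants such as nccl.NCCL_FLOAT32 and nccl.NCCL_SUM, and stream is a CUDA stream pointer (cupy.cuda.Stream.null.ptr for the default stream). As a minimal sketch, assuming two GPUs and reusing the single-process initAll() pattern from above, bcast() broadcasts rank 0's buffer in place:

    import cupy
    from cupy.cuda import nccl

    comms = nccl.NcclCommunicator.initAll(2)
    bufs = []
    for comm in comms:
        with cupy.cuda.Device(comm.device_id()):
            # Rank 0 holds the payload; the other rank starts zeroed.
            bufs.append(cupy.arange(4, dtype=cupy.float32)
                        if comm.rank_id() == 0
                        else cupy.zeros(4, dtype=cupy.float32))

    nccl.groupStart()
    for comm, buf in zip(comms, bufs):
        with cupy.cuda.Device(comm.device_id()):
            comm.bcast(buf.data.ptr, buf.size, nccl.NCCL_FLOAT32,
                       0, cupy.cuda.Stream.null.ptr)  # root = rank 0
    nccl.groupEnd()

    with cupy.cuda.Device(comms[1].device_id()):
        cupy.cuda.Stream.null.synchronize()
        assert (bufs[1] == cupy.arange(4, dtype=cupy.float32)).all()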