cupy.cuda.nccl.NcclCommunicator

class cupy.cuda.nccl.NcclCommunicator(int ndev, tuple commId, int rank)

Initialize an NCCL communicator for one device controlled by one process.

Parameters
  • ndev (int) – Total number of GPUs to be used.

  • commId (tuple) – The unique ID returned by get_unique_id(). Every process in the communicator must pass the same value.

  • rank (int) – The rank of the GPU managed by the current process.

Returns

An NcclCommunicator instance.

Return type

NcclCommunicator

Note

Use this constructor to create an NCCL communicator in a multi-process environment, typically managed by MPI or multiprocessing. To control multiple devices from a single process, use initAll() instead.

See also

ncclCommInitRank
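A minimal sketch of the multi-process pattern, assuming two GPUs, that rank r drives GPU r, and Python's multiprocessing with the "spawn" start method. The parent creates the ID with nccl.get_unique_id() and passes it to every worker; each worker selects its own device and calls the constructor with the same ndev and commId but its own rank. Because ncclCommInitRank is collective, the constructor blocks until all ndev ranks have joined, so every process is started before any result is collected. The names worker and run are hypothetical helpers, not part of CuPy.

```python
import multiprocessing as mp


def worker(rank, ndev, uid, results):
    # Hypothetical helper. CuPy is imported inside the child so each
    # spawned process sets up CUDA on its own.
    import cupy
    from cupy.cuda import nccl

    with cupy.cuda.Device(rank):  # assumption: rank r drives GPU r
        # Same ndev and uid in every process; only rank differs.
        comm = nccl.NcclCommunicator(ndev, uid, rank)
        results.put((comm.rank_id(), comm.size(), comm.device_id()))
        comm.destroy()


def run(ndev=2):
    # Hypothetical helper that launches one process per rank.
    from cupy.cuda import nccl

    uid = nccl.get_unique_id()  # a picklable tuple shared with all ranks
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(r, ndev, uid, results))
             for r in range(ndev)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on put().
    out = sorted(results.get() for _ in range(ndev))
    for p in procs:
        p.join()
    return out
```

Call run() from under an if __name__ == "__main__": guard, which the spawn start method requires. Each returned tuple holds a rank, the communicator size, and the CUDA device ID used by that rank.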

Methods

abort(self)
allGather(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, intptr_t stream)
allReduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, intptr_t stream)
bcast(self, intptr_t buff, int count, int datatype, int root, intptr_t stream)
broadcast(self, intptr_t sendbuff, intptr_t recvbuff, int count, int datatype, int root, intptr_t stream)
check_async_error(self)
destroy(self)
device_id(self)
static initAll(devices)

Initialize NCCL communicators for multiple devices in a single process.

Parameters

devices (int or list of int) – The number of GPUs, or a list of GPU IDs, to be used. If an int is given, GPUs 0 through devices - 1 are used.

Returns

A list of NcclCommunicator instances.

Return type

list

Note

Use this method to create a group of NCCL communicators in a single process, each controlling one device:

from cupy.cuda import nccl
# Use 3 GPUs: #0, #2, and #3
comms = nccl.NcclCommunicator.initAll([0, 2, 3])
assert len(comms) == 3

In a multi-process setup, use the NcclCommunicator(ndev, commId, rank) constructor instead.

See also

ncclCommInitAll
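The following sketch, assuming at least two visible GPUs, uses the communicators returned by initAll() for a sum allReduce. Since one thread drives several devices, the per-device calls are wrapped in nccl.groupStart() and nccl.groupEnd(); NCCL requires group calls in this situation because an ungrouped collective can block waiting for ranks that have not been launched yet. Buffers are passed as raw device pointers (x.data.ptr), counts are in elements, and the stream is an integer handle (cupy.cuda.Stream.null.ptr). The function name allreduce_sum is hypothetical.

```python
def allreduce_sum(devices=(0, 1), n=4):
    # Hypothetical helper: sum-allReduce across devices from one process.
    import cupy
    from cupy.cuda import nccl

    devices = list(devices)
    comms = nccl.NcclCommunicator.initAll(devices)
    sendbufs, recvbufs = [], []
    for i, dev in enumerate(devices):
        with cupy.cuda.Device(dev):
            # Device i contributes the value i + 1 in every element.
            sendbufs.append(cupy.full(n, i + 1, dtype=cupy.float32))
            recvbufs.append(cupy.empty(n, dtype=cupy.float32))

    # One thread drives several devices, so the calls must be grouped.
    nccl.groupStart()
    for comm, dev, s, r in zip(comms, devices, sendbufs, recvbufs):
        with cupy.cuda.Device(dev):
            comm.allReduce(s.data.ptr, r.data.ptr, n, nccl.NCCL_FLOAT32,
                           nccl.NCCL_SUM, cupy.cuda.Stream.null.ptr)
    nccl.groupEnd()

    results = []
    for dev, r in zip(devices, recvbufs):
        with cupy.cuda.Device(dev):
            # Copy to host on the same stream, after the collective.
            results.append(cupy.asnumpy(r))
    for comm in comms:
        comm.destroy()
    return results
```

With two devices, each returned array should hold 3.0 in every element (1 + 2).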

rank_id(self)
recv(self, intptr_t recvbuf, size_t count, int datatype, int peer, intptr_t stream)
reduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, int root, intptr_t stream)
reduceScatter(self, intptr_t sendbuf, intptr_t recvbuf, size_t recvcount, int datatype, int op, intptr_t stream)
send(self, intptr_t sendbuf, size_t count, int datatype, int peer, intptr_t stream)
size(self)
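The count arguments of allGather() and reduceScatter() are per rank and in elements, not bytes, following NCCL semantics: for allGather, count is what each rank sends and recvbuf must hold count * size() elements; for reduceScatter, recvcount is what each rank receives and sendbuf must hold recvcount * size() elements. The sketch below encodes those rules in hypothetical helpers gather_sizes and scatter_sizes and uses them to allocate buffers, assuming comm is an already initialized communicator whose device is current (for example, one process per GPU).

```python
def gather_sizes(count, nranks):
    """Hypothetical helper: (send, recv) element counts for allGather."""
    return count, count * nranks


def scatter_sizes(recvcount, nranks):
    """Hypothetical helper: (send, recv) element counts for reduceScatter."""
    return recvcount * nranks, recvcount


def gather_then_scatter(comm, count=4):
    # Hypothetical helper; assumes comm's device is the current device.
    import cupy
    from cupy.cuda import nccl

    nranks = comm.size()
    stream = cupy.cuda.get_current_stream().ptr

    send_n, recv_n = gather_sizes(count, nranks)
    x = cupy.full(send_n, comm.rank_id(), dtype=cupy.float32)
    gathered = cupy.empty(recv_n, dtype=cupy.float32)
    # Rank r's block lands at gathered[r * count:(r + 1) * count] everywhere.
    comm.allGather(x.data.ptr, gathered.data.ptr, count,
                   nccl.NCCL_FLOAT32, stream)

    send_n, recv_n = scatter_sizes(count, nranks)
    assert gathered.size == send_n
    part = cupy.empty(recv_n, dtype=cupy.float32)
    # Sum across ranks, then hand block r of the result to rank r.
    comm.reduceScatter(gathered.data.ptr, part.data.ptr, count,
                       nccl.NCCL_FLOAT32, nccl.NCCL_SUM, stream)
    return part
```

Because every rank gathers the same array, each element of part on rank r should equal size() * r once the stream completes.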
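send() and recv() are point-to-point operations that require NCCL 2.7 or later; a send on one rank is matched by a recv on the peer with the same count and datatype. In a ring shift, where every rank sends to rank + 1 and receives from rank - 1, both calls go inside one nccl.groupStart()/nccl.groupEnd() pair so that neither side blocks waiting for the other. This is a sketch assuming comm is an initialized communicator whose device is current; ring_peers and ring_shift are hypothetical helper names.

```python
def ring_peers(rank, nranks):
    """Hypothetical helper: (send_to, recv_from) for a one-step ring shift."""
    return (rank + 1) % nranks, (rank - 1) % nranks


def ring_shift(comm, n=8):
    # Hypothetical helper; assumes comm's device is the current device.
    import cupy
    from cupy.cuda import nccl

    rank, nranks = comm.rank_id(), comm.size()
    send_to, recv_from = ring_peers(rank, nranks)
    stream = cupy.cuda.get_current_stream().ptr

    out = cupy.full(n, rank, dtype=cupy.int32)
    inbox = cupy.empty(n, dtype=cupy.int32)
    # Group the pair so the send and the matching recv progress together.
    nccl.groupStart()
    comm.send(out.data.ptr, n, nccl.NCCL_INT32, send_to, stream)
    comm.recv(inbox.data.ptr, n, nccl.NCCL_INT32, recv_from, stream)
    nccl.groupEnd()
    return inbox  # holds recv_from in every element once the stream completes
```

Python's % operator returns a non-negative result here, so rank 0 correctly receives from rank nranks - 1.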