cupy.cuda.nccl.NcclCommunicator

class cupy.cuda.nccl.NcclCommunicator(int ndev, bytes commId, int rank, NcclConfig config=None)

Initialize an NCCL communicator for one device controlled by one process.

Parameters:
  • ndev (int) – Total number of GPUs to be used.

  • commId (bytes) – The unique ID returned by get_unique_id().

  • rank (int) – The rank of the GPU managed by the current process.

  • config (NcclConfig) – Configuration for communicator creation. None by default.

Returns:

An NcclCommunicator instance.

Return type:

NcclCommunicator

Note

Use this constructor to create an NCCL communicator in a multi-process environment, typically managed by MPI or multiprocessing. To control multiple devices from a single process, use initAll() instead.
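The multi-process setup described in the note can be sketched with the standard multiprocessing module. This is a minimal illustration, not a reference implementation: the helper names worker and launch are hypothetical, the mapping of rank i to GPU i is an assumption, and running it requires CuPy built with NCCL plus at least ndev GPUs.

```python
import multiprocessing


def worker(rank, ndev, comm_id):
    """Create one communicator for one GPU in this process (hypothetical helper)."""
    # Imported here so that merely loading this module does not require CuPy.
    import cupy
    from cupy.cuda import nccl

    # Assumption for this sketch: rank i drives GPU i.
    cupy.cuda.Device(rank).use()
    comm = nccl.NcclCommunicator(ndev, comm_id, rank)
    try:
        print(f"rank {comm.rank_id()} of {comm.size()} on device {comm.device_id()}")
    finally:
        comm.destroy()


def launch(ndev):
    """Start ndev worker processes that share one unique ID (hypothetical helper)."""
    from cupy.cuda import nccl

    # Every process must receive the same ID, so it is created once in the parent.
    comm_id = nccl.get_unique_id()
    # "spawn" avoids inheriting any CUDA state from the parent process.
    ctx = multiprocessing.get_context("spawn")
    procs = [
        ctx.Process(target=worker, args=(rank, ndev, comm_id))
        for rank in range(ndev)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]
```

Because the spawn start method re-imports the main module in each child, a call such as launch(2) should sit under an `if __name__ == "__main__":` guard.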

Methods

abort(self)
allGather(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, intptr_t stream)
allReduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, intptr_t stream)
bcast(self, intptr_t buff, size_t count, int datatype, int root, intptr_t stream)
broadcast(self, intptr_t sendbuff, intptr_t recvbuff, size_t count, int datatype, int root, intptr_t stream)
check_async_error(self)
commSplit(self, int color, int key, NcclConfig config=None)

Split the communicator into multiple, disjoint communicators.

Parameters:
  • color (int) – Controls the assignment of processes to communicators. Processes with the same color are assigned to the same communicator. If color is -1, the process is not included in any communicator.

  • key (int) – Controls the rank assignment within the new communicator. The process with the lowest key value is assigned rank 0.

  • config (NcclConfig) – Configuration for communicator creation. None by default.

Returns:

A new communicator, or None if color is -1.

Return type:

NcclCommunicator

Note

This method requires NCCL 2.18.1 or newer. There must be no outstanding NCCL operations on the communicator when it is split; otherwise, the call may deadlock.

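from cupy.cuda import nccl

# world_size, uid, rank, color, and key are supplied by the caller;
# uid is the ID from nccl.get_unique_id(), shared by all processes.
comm = nccl.NcclCommunicator(world_size, uid, rank)
new_comm = comm.commSplit(color, key)
if new_comm is not None:
    # use new_comm for collective communication
    new_comm.destroy()
comm.destroy()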

See also

ncclCommSplit
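The color and key rules above can be checked without any GPU by simulating the grouping in plain Python. The function simulate_comm_split below is a hypothetical helper, not part of CuPy or NCCL. It assumes that ranks with equal keys keep their relative order from the original communicator, which is how the NCCL documentation describes ncclCommSplit.

```python
from collections import defaultdict

# Color value documented above as "not included in any communicator".
NO_COLOR = -1


def simulate_comm_split(colors, keys):
    """Map each original rank to (color, new_rank), or None if excluded.

    colors[i] and keys[i] are the arguments that original rank i would
    pass to commSplit(). Pure-Python illustration; no NCCL calls are made.
    """
    if len(colors) != len(keys):
        raise ValueError("colors and keys must have one entry per rank")

    # Group ranks by color, skipping ranks that opt out.
    groups = defaultdict(list)
    for old_rank, (color, key) in enumerate(zip(colors, keys)):
        if color != NO_COLOR:
            groups[color].append((key, old_rank))

    layout = {old_rank: None for old_rank in range(len(colors))}
    for color, members in groups.items():
        # Sorting (key, old_rank) pairs gives the lowest key new rank 0 and
        # breaks ties by original rank (assumption stated in the lead-in).
        for new_rank, (_, old_rank) in enumerate(sorted(members)):
            layout[old_rank] = (color, new_rank)
    return layout


# Four ranks: ranks 0 and 2 share color 0, rank 1 is alone, rank 3 opts out.
layout = simulate_comm_split(colors=[0, 1, 0, -1], keys=[5, 0, 2, 0])
```

Here rank 2 becomes rank 0 of the color-0 communicator because its key (2) is lower than rank 0's key (5), and rank 3 receives None.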

destroy(self)
device_id(self)
static initAll(devices)

Initialize NCCL communicators for multiple devices in a single process.

Parameters:

devices (int or list of int) – The number of GPUs, or a list of GPU IDs, to use. If an int is given, GPUs 0 through devices - 1 are used.

Returns:

A list of NcclCommunicator instances.

Return type:

list

Note

This method is for creating a group of NCCL communicators, each controlling one device, in a single process like this:

from cupy.cuda import nccl
# Use 3 GPUs: #0, #2, and #3
comms = nccl.NcclCommunicator.initAll([0, 2, 3])
assert len(comms) == 3

In a multi-process setup, use the default initializer instead.

See also

ncclCommInitAll
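The two accepted forms of the devices argument can be made concrete with a small pure-Python sketch. The helper resolve_devices is hypothetical and only mirrors the rule documented above; CuPy performs the actual selection inside initAll().

```python
def resolve_devices(devices):
    """Return the GPU IDs that initAll(devices) is documented to use.

    Hypothetical helper for illustration; it makes no CUDA or NCCL calls.
    """
    if isinstance(devices, int):
        # An int n selects the first n GPUs: 0, 1, ..., n - 1.
        return list(range(devices))
    # A list is taken as the explicit GPU IDs, in the given order.
    return list(devices)


first_three = resolve_devices(3)
explicit = resolve_devices([0, 2, 3])
```

With this reading, initAll(3) and initAll([0, 1, 2]) would select the same GPUs, and the returned list holds one communicator per selected device.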

rank_id(self)
recv(self, intptr_t recvbuf, size_t count, int datatype, int peer, intptr_t stream)
reduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, int root, intptr_t stream)
reduceScatter(self, intptr_t sendbuf, intptr_t recvbuf, size_t recvcount, int datatype, int op, intptr_t stream)
send(self, intptr_t sendbuf, size_t count, int datatype, int peer, intptr_t stream)
size(self)
__eq__(value, /)

Return self==value.

__ne__(value, /)

Return self!=value.

__lt__(value, /)

Return self<value.

__le__(value, /)

Return self<=value.

__gt__(value, /)

Return self>value.

__ge__(value, /)

Return self>=value.

Attributes

comm