cupy.cuda.nccl.NcclCommunicator
- class cupy.cuda.nccl.NcclCommunicator(int ndev, bytes commId, int rank, NcclConfig config=None)
Initialize an NCCL communicator for one device controlled by one process.
- Parameters:
ndev (int) – Total number of GPUs to be used.
commId (bytes) – The unique ID returned by get_unique_id().
rank (int) – The rank of the GPU managed by the current process.
config (NcclConfig) – Configuration for communicator creation. None by default.
- Returns:
An NcclCommunicator instance.
- Return type:
NcclCommunicator
Note
This method is for creating an NCCL communicator in a multi-process environment, typically managed by MPI or multiprocessing. For controlling multiple devices by one process, use initAll() instead.
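The multi-process setup described in the note can be sketched as follows. This is a minimal illustration, not the library's prescribed pattern: it assumes a single node, uses Python's multiprocessing with the "spawn" start method to distribute the ID, and introduces the hypothetical names local_device_for, worker, and launch. Only get_unique_id(), the constructor, rank_id(), size(), and destroy() come from this page.

```python
import multiprocessing as mp


def local_device_for(rank, gpus_per_node):
    # Hypothetical helper: assumes ranks are laid out node by node, so the
    # local GPU index is the rank modulo the number of GPUs per node.
    return rank % gpus_per_node


def worker(ndev, comm_id, rank, gpus_per_node):
    # Imported here so that each spawned process initializes CUDA on its own.
    import cupy
    from cupy.cuda import nccl

    # Select the GPU this process manages before creating the communicator.
    with cupy.cuda.Device(local_device_for(rank, gpus_per_node)):
        # Every rank passes the same ndev and comm_id, and its own rank.
        comm = nccl.NcclCommunicator(ndev, comm_id, rank)
        assert comm.rank_id() == rank
        assert comm.size() == ndev
        comm.destroy()


def launch(ndev):
    from cupy.cuda import nccl

    # One process creates the unique ID; all ranks must receive this value.
    comm_id = nccl.get_unique_id()
    ctx = mp.get_context("spawn")  # avoid forking a process that may hold CUDA state
    procs = [
        ctx.Process(target=worker, args=(ndev, comm_id, rank, ndev))
        for rank in range(ndev)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

On a machine with at least two GPUs, calling launch(2) from under an `if __name__ == "__main__":` guard should return one exit code per rank. With MPI, the ID would instead be broadcast from rank 0 to the other ranks.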
Methods
- abort(self)
- allGather(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, intptr_t stream)
- allReduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, intptr_t stream)
- bcast(self, intptr_t buff, size_t count, int datatype, int root, intptr_t stream)
- broadcast(self, intptr_t sendbuff, intptr_t recvbuff, size_t count, int datatype, int root, intptr_t stream)
- check_async_error(self)
- commSplit(self, int color, int key, NcclConfig config=None)
Split the communicator into multiple, disjoint communicators.
- Parameters:
color (int) – Controls the assignment of processes to communicators. Processes with the same color are assigned to the same communicator. If color is -1, the process is not included in any communicator.
key (int) – Controls the rank assignment within the new communicator. The process with the lowest key value is assigned rank 0.
config (NcclConfig) – Configuration for communicator creation. None by default.
- Returns:
A new communicator, or None if color is -1.
- Return type:
NcclCommunicator or None
Note
This method requires NCCL 2.18.1 or newer. There must be no outstanding NCCL operations on the communicator when it is split; otherwise, a deadlock may occur.
    from cupy.cuda import nccl

    comm = nccl.NcclCommunicator(world_size, uid, rank)
    new_comm = comm.commSplit(color, key)
    if new_comm is not None:
        # use new_comm for collective communication
        new_comm.destroy()
    comm.destroy()
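To make the roles of color and key concrete, the following pure-Python sketch simulates the grouping without touching any GPU. The function simulate_split is hypothetical and is not part of CuPy. It also assumes that ties between equal keys are broken by the rank in the parent communicator, which this page does not state.

```python
from collections import defaultdict


def simulate_split(colors, keys):
    """Map each parent rank to (color, new_rank), or to None for color -1."""
    groups = defaultdict(list)
    for old_rank, (color, key) in enumerate(zip(colors, keys)):
        if color != -1:  # color -1 means "join no communicator"
            groups[color].append((key, old_rank))

    result = {old_rank: None for old_rank in range(len(colors))}
    for color, members in groups.items():
        # Lowest key gets rank 0; equal keys fall back to the parent rank
        # (an assumption made for this illustration).
        for new_rank, (_key, old_rank) in enumerate(sorted(members)):
            result[old_rank] = (color, new_rank)
    return result


layout = simulate_split(colors=[0, 0, 1, 1, -1], keys=[1, 0, 0, 0, 0])
print(layout)
# {0: (0, 1), 1: (0, 0), 2: (1, 0), 3: (1, 1), 4: None}
```

Here parent ranks 0 and 1 share color 0, and rank 1 becomes rank 0 of the new communicator because it has the lower key. Parent rank 4 passes color -1 and would receive None from commSplit.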
- destroy(self)
- device_id(self)
- static initAll(devices)
Initialize NCCL communicators for multiple devices in a single process.
- Parameters:
devices (int or list of int) – The number of GPUs or a list of GPUs to be used. For the former case, the first devices GPUs will be used.
- Returns:
A list of NcclCommunicator instances.
- Return type:
list
Note
This method is for creating a group of NCCL communicators, each controlling one device, in a single process like this:
    from cupy.cuda import nccl

    # Use 3 GPUs: #0, #2, and #3
    comms = nccl.NcclCommunicator.initAll([0, 2, 3])
    assert len(comms) == 3
In a multi-process setup, use the default initializer instead.
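As a rough sketch of how communicators from initAll() might be used in one process, the example below sums a small array across devices with allReduce(). The helper names expected_sum and all_reduce_sum are hypothetical. The example assumes that cupy.cuda.nccl exposes groupStart(), groupEnd(), NCCL_FLOAT32, and NCCL_SUM, and that buffer and stream arguments are passed as raw pointers via .data.ptr and Stream.null.ptr.

```python
def expected_sum(n_ranks):
    # Rank r contributes the value r + 1, so the reduced value is 1 + 2 + ... + n.
    return n_ranks * (n_ranks + 1) // 2


def all_reduce_sum(devices, n=4):
    # Imported lazily so that this module can be loaded without a GPU.
    import cupy
    from cupy.cuda import nccl

    comms = nccl.NcclCommunicator.initAll(devices)

    # Allocate one send and one receive buffer on each communicator's device.
    send, recv = [], []
    for comm in comms:
        with cupy.cuda.Device(comm.device_id()):
            send.append(cupy.full(n, comm.rank_id() + 1, dtype=cupy.float32))
            recv.append(cupy.empty(n, dtype=cupy.float32))

    # One thread drives several devices, so the calls are issued as a group.
    nccl.groupStart()
    for comm, x, y in zip(comms, send, recv):
        with cupy.cuda.Device(comm.device_id()):
            comm.allReduce(x.data.ptr, y.data.ptr, x.size,
                           nccl.NCCL_FLOAT32, nccl.NCCL_SUM,
                           cupy.cuda.Stream.null.ptr)
    nccl.groupEnd()

    # Wait for each device's default stream, then copy the result to the host.
    results = []
    for comm, y in zip(comms, recv):
        with cupy.cuda.Device(comm.device_id()):
            cupy.cuda.Stream.null.synchronize()
            results.append(cupy.asnumpy(y))
        comm.destroy()
    return results
```

On a machine with two GPUs, all_reduce_sum(2) should return two host arrays whose elements all equal expected_sum(2).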
- rank_id(self)
- recv(self, intptr_t recvbuf, size_t count, int datatype, int peer, intptr_t stream)
- reduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, int root, intptr_t stream)
- reduceScatter(self, intptr_t sendbuf, intptr_t recvbuf, size_t recvcount, int datatype, int op, intptr_t stream)
- send(self, intptr_t sendbuf, size_t count, int datatype, int peer, intptr_t stream)
- size(self)
- __eq__(value, /)
Return self==value.
- __ne__(value, /)
Return self!=value.
- __lt__(value, /)
Return self<value.
- __le__(value, /)
Return self<=value.
- __gt__(value, /)
Return self>value.
- __ge__(value, /)
Return self>=value.
Attributes
- comm