cupy.cuda.nccl.NcclCommunicator

class cupy.cuda.nccl.NcclCommunicator(int ndev, tuple commId, int rank)

    Initialize an NCCL communicator for one device controlled by one process.
    Parameters:
        - ndev (int) – Total number of GPUs to be used.
        - commId (tuple) – The unique ID returned by get_unique_id().
        - rank (int) – The rank of the GPU managed by the current process.

    Returns: An NcclCommunicator instance.

    Return type: NcclCommunicator

    Note

    This method is for creating an NCCL communicator in a multi-process environment, typically managed by MPI or multiprocessing. For controlling multiple devices from a single process, use initAll() instead.
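    As a concrete illustration of this pattern, here is a minimal sketch that spawns one worker process per GPU. It assumes a machine with two GPUs and uses the 'spawn' start method, since forking after CUDA initialization is unsafe; the worker function is hypothetical, while get_unique_id(), NcclCommunicator, the NCCL_* constants, and cupy.cuda.Device are CuPy APIs:

        import multiprocessing

        import cupy
        from cupy.cuda import nccl

        def worker(ndev, comm_id, rank):
            cupy.cuda.Device(rank).use()                 # one GPU per process
            comm = nccl.NcclCommunicator(ndev, comm_id, rank)
            buf = cupy.arange(4, dtype=cupy.float32)
            # In-place all-reduce: sendbuf and recvbuf may be the same pointer.
            comm.allReduce(buf.data.ptr, buf.data.ptr, buf.size,
                           nccl.NCCL_FLOAT32, nccl.NCCL_SUM,
                           cupy.cuda.Stream.null.ptr)
            cupy.cuda.Stream.null.synchronize()
            comm.destroy()

        if __name__ == '__main__':
            ndev = 2
            comm_id = nccl.get_unique_id()               # one ID, shared by all ranks
            ctx = multiprocessing.get_context('spawn')   # fork is unsafe with CUDA
            procs = [ctx.Process(target=worker, args=(ndev, comm_id, rank))
                     for rank in range(ndev)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()

    In an MPI setup, the rank would typically come from the MPI communicator and the unique ID would be broadcast from rank 0 instead of being passed as a process argument.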
Methods
abort(self)

allGather(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, intptr_t stream)

allReduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, intptr_t stream)

bcast(self, intptr_t buff, int count, int datatype, int root, intptr_t stream)

broadcast(self, intptr_t sendbuff, intptr_t recvbuff, int count, int datatype, int root, intptr_t stream)

check_async_error(self)

destroy(self)

device_id(self)
static initAll(devices)

    Initialize NCCL communicators for multiple devices in a single process.

    Parameters: devices (int or list of int) – The number of GPUs or a list of GPU IDs to be used. In the former case, the first devices GPUs (IDs 0 through devices - 1) are used.

    Returns: A list of NcclCommunicator instances.

    Return type: list

    Note

    This method is for creating a group of NCCL communicators, each controlling one device, in a single process like this:

        from cupy.cuda import nccl

        # Use 3 GPUs: #0, #2, and #3
        comms = nccl.NcclCommunicator.initAll([0, 2, 3])
        assert len(comms) == 3

    In a multi-process setup, use the default initializer instead.
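    When one process drives several communicators like this, the collective must be queued for every communicator inside a single group call, otherwise the process deadlocks waiting on its own ranks. A minimal sketch, assuming two GPUs and the groupStart()/groupEnd() helpers exposed by cupy.cuda.nccl:

        import cupy
        from cupy.cuda import nccl

        comms = nccl.NcclCommunicator.initAll(2)      # GPUs #0 and #1
        bufs = []
        for comm in comms:
            with cupy.cuda.Device(comm.device_id()):
                bufs.append(cupy.arange(4, dtype=cupy.float32))

        # Queue the collective for every communicator before NCCL starts
        # waiting on peers; without the group call this loop would hang.
        nccl.groupStart()
        for comm, buf in zip(comms, bufs):
            with cupy.cuda.Device(comm.device_id()):
                comm.allReduce(buf.data.ptr, buf.data.ptr, buf.size,
                               nccl.NCCL_FLOAT32, nccl.NCCL_SUM,
                               cupy.cuda.Stream.null.ptr)
        nccl.groupEnd()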
rank_id(self)

reduce(self, intptr_t sendbuf, intptr_t recvbuf, size_t count, int datatype, int op, int root, intptr_t stream)

reduceScatter(self, intptr_t sendbuf, intptr_t recvbuf, size_t recvcount, int datatype, int op, intptr_t stream)

size(self)
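All of the collectives above follow the same raw-pointer convention: buffers are device pointers (for a CuPy array a, that is a.data.ptr), count is an element count, datatype and op are module constants such as nccl.NCCL_FLOAT32 and nccl.NCCL_SUM, and stream is a CUDA stream pointer (cupy.cuda.Stream.null.ptr for the default stream). As a minimal sketch, assuming two GPUs and reusing the single-process initAll() pattern from above, bcast() broadcasts rank 0's buffer in place:

    import cupy
    from cupy.cuda import nccl

    comms = nccl.NcclCommunicator.initAll(2)
    bufs = []
    for comm in comms:
        with cupy.cuda.Device(comm.device_id()):
            # Rank 0 holds the payload; the other rank starts zeroed.
            bufs.append(cupy.arange(4, dtype=cupy.float32)
                        if comm.rank_id() == 0
                        else cupy.zeros(4, dtype=cupy.float32))

    nccl.groupStart()
    for comm, buf in zip(comms, bufs):
        with cupy.cuda.Device(comm.device_id()):
            comm.bcast(buf.data.ptr, buf.size, nccl.NCCL_FLOAT32,
                       0, cupy.cuda.Stream.null.ptr)  # root = rank 0
    nccl.groupEnd()

    with cupy.cuda.Device(comms[1].device_id()):
        cupy.cuda.Stream.null.synchronize()
        assert (bufs[1] == cupy.arange(4, dtype=cupy.float32)).all()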