cupyx.distributed.NCCLBackend
- class cupyx.distributed.NCCLBackend(n_devices, rank, host='127.0.0.1', port=13333, use_mpi=False)
Interface that uses NVIDIA’s NCCL to perform communications.
- Parameters:
n_devices (int) – Total number of devices that will be used in the distributed execution.
rank (int) – Unique id of the GPU that the communicator is associated with. Its value must satisfy 0 <= rank < n_devices.
host (str, optional) – host address for the process rendezvous on initialization. Defaults to “127.0.0.1”.
port (int, optional) – port used for the process rendezvous on initialization. Defaults to 13333.
use_mpi (bool, optional) – if True, use MPI for initialization and synchronization; otherwise use the included TCP server. Defaults to False.
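As a minimal sketch of how the constructor parameters fit together, the following starts one process per rank on a single host and builds a backend in each. The names `valid_rank`, `worker`, and `launch` are illustrative helpers, not part of cupyx, and the mapping of rank i to GPU i is an assumption.

```python
import multiprocessing


def valid_rank(rank, n_devices):
    """Return True when rank satisfies the documented 0 <= rank < n_devices."""
    return 0 <= rank < n_devices


def worker(rank, n_devices, host="127.0.0.1", port=13333):
    # Hypothetical per-process entry point, not part of cupyx.
    # CuPy is imported here so the module also loads on machines without a GPU.
    import cupy
    from cupyx.distributed import NCCLBackend

    if not valid_rank(rank, n_devices):
        raise ValueError(f"rank {rank} is outside [0, {n_devices})")
    # Assumption: rank i drives GPU i on a single host.
    cupy.cuda.Device(rank).use()
    comm = NCCLBackend(n_devices, rank, host=host, port=port)
    comm.barrier()  # blocks on the CPU until every rank reaches the barrier


def launch(n_devices):
    # Assumption: one "spawn"-ed process per rank.
    ctx = multiprocessing.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(rank, n_devices))
             for rank in range(n_devices)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]
```

Because the processes are spawned, `launch(2)` would normally be called from under an `if __name__ == "__main__":` guard on a machine with at least two GPUs.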
Methods
- all_gather(in_array, out_array, count, stream=None)
Performs an all gather operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent.
out_array (cupy.ndarray) – array where the result will be stored.
count (int) – Number of elements to send to each rank.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
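The sketch below illustrates one way to size the buffers for `all_gather`. It assumes the usual NCCL layout, in which each rank contributes `count` elements and the block from rank r lands at offset r * count of `out_array`. `all_gather_model` and `all_gather_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def all_gather_model(per_rank_inputs):
    """Pure-Python model: concatenate every rank's block in rank order."""
    gathered = [x for block in per_rank_inputs for x in block]
    # Every rank ends up holding the same gathered buffer.
    return [list(gathered) for _ in per_rank_inputs]


def all_gather_step(comm, rank, n_devices, count=3):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    in_array = cupy.full(count, rank, dtype=cupy.float32)
    # Assumption: the output must hold count elements from each rank.
    out_array = cupy.empty(n_devices * count, dtype=cupy.float32)
    comm.all_gather(in_array, out_array, count)
    return out_array
```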
- all_reduce(in_array, out_array, op='sum', stream=None)
Performs an all reduce operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent.
out_array (cupy.ndarray) – array where the result will be stored.
op (str) – reduction operation, one of ‘sum’, ‘prod’, ‘min’, or ‘max’. Arrays of complex type only support ‘sum’. Defaults to ‘sum’.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
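To make the effect of `op` concrete, here is a small sketch: a pure-Python model of the element-wise reduction across ranks, followed by a per-rank call. `all_reduce_model` and `all_reduce_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
import math

# The four documented reduction names mapped to Python equivalents.
_OPS = {"sum": sum, "prod": math.prod, "min": min, "max": max}


def all_reduce_model(per_rank_inputs, op="sum"):
    """Pure-Python model: combine element i of every rank's input."""
    combine = _OPS[op]
    reduced = [combine(column) for column in zip(*per_rank_inputs)]
    # Every rank receives the same reduced result.
    return [list(reduced) for _ in per_rank_inputs]


def all_reduce_step(comm, rank, op="sum"):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    in_array = cupy.arange(4, dtype=cupy.float32) + rank
    out_array = cupy.empty_like(in_array)
    comm.all_reduce(in_array, out_array, op=op)
    return out_array
```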
- all_to_all(in_array, out_array, stream=None)
Performs an all to all operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent. Its shape must be (total_ranks, …).
out_array (cupy.ndarray) – array where the result will be stored. Its shape must be (total_ranks, …).
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
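The shape requirement on both arrays is easier to see with a sketch. It assumes the usual all-to-all convention: row j of the `in_array` on rank i is delivered to rank j and stored in row i of that rank's `out_array`. `all_to_all_model` and `all_to_all_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def all_to_all_model(per_rank_inputs):
    """Pure-Python model: out[dst][src] = in[src][dst] (a transpose)."""
    n = len(per_rank_inputs)
    return [[per_rank_inputs[src][dst] for src in range(n)]
            for dst in range(n)]


def all_to_all_step(comm, rank, n_devices, width=2):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    # Both arrays have one leading row per rank, as the parameters require.
    in_array = cupy.full((n_devices, width), rank, dtype=cupy.float32)
    out_array = cupy.empty((n_devices, width), dtype=cupy.float32)
    comm.all_to_all(in_array, out_array)
    return out_array
```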
- barrier()
Performs a barrier operation.
The barrier is performed on the CPU and is an explicit synchronization mechanism that halts thread progression.
- broadcast(in_out_array, root=0, stream=None)
Performs a broadcast operation.
- Parameters:
in_out_array (cupy.ndarray) – array to be sent by the root rank. Other ranks will receive the broadcast data in this array.
root (int, optional) – rank of the process that will send the broadcast. Defaults to 0.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
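Since `in_out_array` plays both roles, a short sketch may help: the root fills the array, every other rank passes a buffer of the same shape and dtype, and after the call all of them hold the root's data. `broadcast_model` and `broadcast_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def broadcast_model(per_rank_buffers, root=0):
    """Pure-Python model: every rank's buffer becomes a copy of root's."""
    return [list(per_rank_buffers[root]) for _ in per_rank_buffers]


def broadcast_step(comm, rank, root=0, size=4):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    if rank == root:
        in_out_array = cupy.arange(size, dtype=cupy.float32)
    else:
        # Placeholder contents that the broadcast overwrites.
        in_out_array = cupy.zeros(size, dtype=cupy.float32)
    comm.broadcast(in_out_array, root=root)
    return in_out_array
```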
- gather(in_array, out_array, root=0, stream=None)
Performs a gather operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent.
out_array (cupy.ndarray) – array where the result will be stored. Its shape must be (total_ranks, …).
root (int, optional) – rank that will receive in_array from other ranks. Defaults to 0.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
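A sketch of a gather call follows. It assumes that the root stores the `in_array` of rank r in row r of its `out_array`, and it allocates a correctly shaped `out_array` on every rank, which is the conservative reading of the shape requirement. `gather_model` and `gather_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def gather_model(per_rank_inputs, root=0):
    """Pure-Python model: only root's output holds one row per rank."""
    n = len(per_rank_inputs)
    # None marks ranks whose output is not filled by the gather.
    return [[list(row) for row in per_rank_inputs] if rank == root else None
            for rank in range(n)]


def gather_step(comm, rank, n_devices, root=0, width=2):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    in_array = cupy.full(width, rank, dtype=cupy.float32)
    # Assumption: every rank passes an out_array with total_ranks rows.
    out_array = cupy.zeros((n_devices, width), dtype=cupy.float32)
    comm.gather(in_array, out_array, root=root)
    return out_array
```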
- recv(out_array, peer, stream=None)
Performs a receive operation.
- Parameters:
out_array (cupy.ndarray) – array used to receive data.
peer (int) – rank of the process that the data will be received from.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
- reduce(in_array, out_array, root=0, op='sum', stream=None)
Performs a reduce operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent.
out_array (cupy.ndarray) – array where the result will be stored. It is only modified on the root process.
root (int, optional) – rank of the process that will perform the reduction. Defaults to 0.
op (str) – reduction operation, one of ‘sum’, ‘prod’, ‘min’, or ‘max’. Arrays of complex type only support ‘sum’. Defaults to ‘sum’.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
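The following sketch highlights that only the root's `out_array` is written. `reduce_model` and `reduce_step` are illustrative names, the `combine` argument of the model is a stand-in for `op` (for example `max` for `op='max'`), and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def reduce_model(per_rank_inputs, root=0, combine=sum):
    """Pure-Python model: root gets the element-wise reduction."""
    reduced = [combine(column) for column in zip(*per_rank_inputs)]
    # None marks ranks whose out_array is left untouched.
    return [reduced if rank == root else None
            for rank in range(len(per_rank_inputs))]


def reduce_step(comm, rank, root=0, op="sum"):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    in_array = cupy.arange(4, dtype=cupy.float32) * (rank + 1)
    out_array = cupy.zeros_like(in_array)
    comm.reduce(in_array, out_array, root=root, op=op)
    return out_array  # meaningful only on the root rank
```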
- reduce_scatter(in_array, out_array, count, op='sum', stream=None)
Performs a reduce scatter operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent.
out_array (cupy.ndarray) – array where the result will be stored.
count (int) – Number of elements to send to each rank.
op (str) – reduction operation, one of ‘sum’, ‘prod’, ‘min’, or ‘max’. Arrays of complex type only support ‘sum’. Defaults to ‘sum’.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
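As a sketch of how `count` relates the two buffers, the example below assumes the usual NCCL reduce-scatter layout: each rank supplies n_devices * count elements, the inputs are reduced element-wise, and rank r receives the r-th block of `count` elements. `reduce_scatter_model` and `reduce_scatter_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def reduce_scatter_model(per_rank_inputs, count):
    """Pure-Python model for op='sum': reduce, then hand out blocks."""
    n = len(per_rank_inputs)
    summed = [sum(column) for column in zip(*per_rank_inputs)]
    # Rank r keeps elements [r * count, (r + 1) * count).
    return [summed[r * count:(r + 1) * count] for r in range(n)]


def reduce_scatter_step(comm, rank, n_devices, count=2):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    # Assumption: the input holds one block of count elements per rank.
    in_array = cupy.arange(n_devices * count, dtype=cupy.float32)
    out_array = cupy.empty(count, dtype=cupy.float32)
    comm.reduce_scatter(in_array, out_array, count, op="sum")
    return out_array
```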
- scatter(in_array, out_array, root=0, stream=None)
Performs a scatter operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent. Its shape must be (total_ranks, …).
out_array (cupy.ndarray) – array where the result will be stored.
root (int, optional) – rank that will send in_array to other ranks. Defaults to 0.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
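A sketch of a scatter call follows. It assumes that rank r receives row r of the root's `in_array`, and that the contents of `in_array` on non-root ranks are ignored even though the array must still have the required shape. `scatter_model` and `scatter_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def scatter_model(per_rank_inputs, root=0):
    """Pure-Python model: rank r receives row r of root's input."""
    n = len(per_rank_inputs)
    return [list(per_rank_inputs[root][r]) for r in range(n)]


def scatter_step(comm, rank, n_devices, root=0, width=2):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    # Assumption: every rank passes an in_array with total_ranks rows,
    # but only the root's values are actually sent.
    in_array = cupy.arange(n_devices * width, dtype=cupy.float32)
    in_array = in_array.reshape(n_devices, width)
    out_array = cupy.empty(width, dtype=cupy.float32)
    comm.scatter(in_array, out_array, root=root)
    return out_array
```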
- send(array, peer, stream=None)
Performs a send operation.
- Parameters:
array (cupy.ndarray) – array to be sent.
peer (int) – rank of the process that array will be sent to.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
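A point-to-point transfer needs a matching `send` on one rank and `recv` on another. The sketch below pairs the two calls. `p2p_role` and `p2p_step` are illustrative names, `comm` is assumed to be an `NCCLBackend` already constructed for this rank, and the receiver is assumed to know the shape and dtype of the incoming array in advance.

```python
def p2p_role(rank, sender=0, receiver=1):
    """Decide which side of the transfer this rank takes."""
    if rank == sender:
        return "send"
    if rank == receiver:
        return "recv"
    return "idle"


def p2p_step(comm, rank, sender=0, receiver=1, size=4):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so p2p_role runs without a GPU

    role = p2p_role(rank, sender, receiver)
    if role == "send":
        array = cupy.arange(size, dtype=cupy.float32)
        comm.send(array, receiver)
        return array
    if role == "recv":
        # Assumption: same size and dtype as the sender's array.
        out_array = cupy.empty(size, dtype=cupy.float32)
        comm.recv(out_array, sender)
        return out_array
    return None  # ranks outside the pair do nothing
```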
- send_recv(in_array, out_array, peer, stream=None)
Performs a send and receive operation.
- Parameters:
in_array (cupy.ndarray) – array to be sent.
out_array (cupy.ndarray) – array used to receive data.
peer (int) – rank of the process to send in_array to and receive out_array from.
stream (cupy.cuda.Stream, optional) – if supported, stream to perform the communication.
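Because `send_recv` sends to and receives from the same `peer`, one natural use is a pairwise exchange. The sketch below pairs ranks as (0, 1), (2, 3), and so on, which assumes an even number of ranks. `exchange_peer`, `send_recv_model`, and `send_recv_step` are illustrative names, and `comm` is assumed to be an `NCCLBackend` already constructed for this rank.

```python
def exchange_peer(rank):
    """Pair ranks as (0, 1), (2, 3), ... by flipping the lowest bit."""
    return rank ^ 1


def send_recv_model(per_rank_inputs):
    """Pure-Python model: each rank ends up with its partner's input."""
    return [list(per_rank_inputs[exchange_peer(rank)])
            for rank in range(len(per_rank_inputs))]


def send_recv_step(comm, rank, size=4):
    # Hypothetical helper; comm is an NCCLBackend built for this rank.
    import cupy  # deferred so the model above runs without a GPU

    in_array = cupy.full(size, rank, dtype=cupy.float32)
    out_array = cupy.empty(size, dtype=cupy.float32)
    comm.send_recv(in_array, out_array, exchange_peer(rank))
    return out_array
```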
- __eq__(value, /)
Return self==value.
- __ne__(value, /)
Return self!=value.
- __lt__(value, /)
Return self<value.
- __le__(value, /)
Return self<=value.
- __gt__(value, /)
Return self>value.
- __ge__(value, /)
Return self>=value.