Communication between processes

init_process_group(n_devices, rank, *[, ...])

Start cupyx.distributed and obtain a communicator.

NCCLBackend(n_devices, rank[, host, port, ...])

Interface that uses NVIDIA's NCCL to perform communications.
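A communicator obtained from init_process_group, or an NCCLBackend built directly, runs collective operations to which every rank contributes a buffer. The pure-Python sketch below is only a mental model of what an element-wise all-reduce leaves on each rank. `simulate_all_reduce` is a hypothetical helper and is not part of cupyx.distributed. The real backend performs the reduction on the GPUs through NCCL, not in a Python loop.

```python
from typing import List


def simulate_all_reduce(per_rank: List[List[float]], op: str = "sum") -> List[List[float]]:
    """Return what each rank would hold after an element-wise all-reduce.

    Hypothetical CPU-only illustration; not part of cupyx.distributed.
    """
    n = len(per_rank[0])
    if any(len(buf) != n for buf in per_rank):
        raise ValueError("all ranks must contribute buffers of equal length")
    reducers = {"sum": sum, "max": max, "min": min}
    if op not in reducers:
        raise ValueError(f"unsupported op: {op}")
    reduce = reducers[op]
    # Combine position i across every rank's buffer.
    reduced = [reduce(buf[i] for buf in per_rank) for i in range(n)]
    # Every rank ends up with its own copy of the same reduced result.
    return [list(reduced) for _ in per_rank]


# Two ranks, each contributing a 3-element buffer.
print(simulate_all_reduce([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]))
# [[11.0, 22.0, 33.0], [11.0, 22.0, 33.0]]
```

The important property is that every rank receives the full reduced result, not just a designated root.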

ndarray distributed across devices

distributed_array(array, index_map[, mode])

Creates a distributed array from the given data.
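distributed_array takes the data plus an index_map that says which part of the array each device should hold. The sketch below makes that mapping concrete without GPUs by slicing an ordinary nested list. `take_block` and `split_by_index_map` are hypothetical helpers. The layout used here, a dict from device ID to a list of (row_slice, col_slice) tuples, is a simplified assumption for illustration. The real function places each chunk on its CUDA device, and its optional `mode` argument, which this sketch ignores, governs how the chunks relate to the logical array.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
Index2D = Tuple[slice, slice]


def take_block(matrix: Matrix, index: Index2D) -> Matrix:
    """Return matrix[rows, cols] for a (row_slice, col_slice) pair."""
    rows, cols = index
    return [row[cols] for row in matrix[rows]]


def split_by_index_map(matrix: Matrix, index_map: Dict[int, List[Index2D]]) -> Dict[int, List[Matrix]]:
    """Hypothetical helper: collect the chunk(s) each device would hold."""
    return {dev: [take_block(matrix, idx) for idx in indices]
            for dev, indices in index_map.items()}


data = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 matrix holding 0..15
# Device 0 owns the top two rows, device 1 the bottom two.
index_map = {
    0: [(slice(0, 2), slice(None))],
    1: [(slice(2, 4), slice(None))],
}
chunks = split_by_index_map(data, index_map)
print(chunks[0])  # [[[0, 1, 2, 3], [4, 5, 6, 7]]]
```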

DistributedArray(shape, dtype, chunks_map)

Multi-dimensional array distributed across multiple CUDA devices.

make_2d_index_map(i_partitions, ...)

Create an index_map for a 2D matrix with a specified blocking.
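A 2D blocking can be described by partition boundaries on each axis plus an owner for every block. The function below is a hypothetical pure-Python re-creation of that idea. It assumes boundary lists such as [0, 2, 4] and a 2D grid of device-ID sets, and it only shows the general shape of the resulting index_map. It does not reproduce CuPy's exact parameter names or output ordering.

```python
from typing import Dict, List, Set, Tuple

Index2D = Tuple[slice, slice]


def sketch_2d_index_map(
    row_bounds: List[int],
    col_bounds: List[int],
    owners: List[List[Set[int]]],
) -> Dict[int, List[Index2D]]:
    """Hypothetical helper: map device IDs to the 2D blocks they own.

    Block (i, j) spans rows row_bounds[i]:row_bounds[i + 1] and
    columns col_bounds[j]:col_bounds[j + 1]; owners[i][j] is the set
    of devices that hold a copy of that block.
    """
    index_map: Dict[int, List[Index2D]] = {}
    for i in range(len(row_bounds) - 1):
        for j in range(len(col_bounds) - 1):
            block = (
                slice(row_bounds[i], row_bounds[i + 1]),
                slice(col_bounds[j], col_bounds[j + 1]),
            )
            for dev in sorted(owners[i][j]):
                index_map.setdefault(dev, []).append(block)
    return index_map


# A 4x6 matrix cut into 2x2 blocks; block (1, 1) is replicated on devices 0 and 3.
imap = sketch_2d_index_map([0, 2, 4], [0, 3, 6], [[{0}, {1}], [{2}, {3, 0}]])
for dev in sorted(imap):
    print(dev, imap[dev])
```

Assigning one block to several devices, as with block (1, 1) above, gives a simple form of replication. A single device can also own several blocks.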

matmul(a, b[, out])

Matrix multiplication between distributed arrays.
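Multiplying matrices split into 2D blocks rests on the identity C[I][J] = sum over K of A[I][K] @ B[K][J], so each output block needs one row of A's blocks and one column of B's blocks. The pure-Python sketch below checks that identity on small lists. `blocked_matmul`, `matmul_dense`, and the block size `bs` are hypothetical. The sketch does not show how matmul schedules block products or moves data between devices.

```python
from typing import List

Matrix = List[List[int]]


def matmul_dense(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product, used as the reference result."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def blocked_matmul(a: Matrix, b: Matrix, bs: int) -> Matrix:
    """Hypothetical sketch: compute a @ b one bs x bs output block at a time.

    Each output block C[I][J] accumulates A[I][K] @ B[K][J] over the
    shared block index K; in a distributed setting these partial
    products could be computed on the devices owning those blocks.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, bs):          # output block row I
        for j0 in range(0, p, bs):      # output block column J
            for k0 in range(0, m, bs):  # shared block index K
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, p)):
                        c[i][j] += sum(a[i][k] * b[k][j]
                                       for k in range(k0, min(k0 + bs, m)))
    return c


a = [[r * 4 + c for c in range(4)] for r in range(4)]
b = [[1 if r == c else 2 for c in range(4)] for r in range(4)]
print(blocked_matmul(a, b, bs=2) == matmul_dense(a, b))  # True
```

The `min(...)` bounds let the same loop handle matrices whose dimensions are not multiples of the block size.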