Routines (NumPy)

The following pages describe the NumPy-compatible routines that CuPy provides. They cover a subset of the routines available in NumPy.

CUB/cuTENSOR backend for some CuPy routines

Some CuPy reduction routines, including sum(), amin(), amax(), argmin(), argmax(), and other functions built on top of them, can be accelerated by switching to the CUB or cuTENSOR backend. These backends can be enabled by setting the CUPY_ACCELERATORS environment variable as documented here. Note that although the accelerated reductions are generally faster, there can be exceptions depending on the data layout. In particular, the CUB backend only supports reductions over contiguous axes.
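As an illustration, the sketch below enables both backends from within Python and then reduces over a contiguous axis. It assumes that CuPy reads CUPY_ACCELERATORS when it is first imported, so the variable is set before the import; the helper name contiguous_sum_demo is hypothetical and only serves this example. Setting the variable in the shell before launching Python should work equally well.

```python
import os

# Assumption: CuPy reads this variable at import time, so it must be set
# before the first "import cupy". The value is a comma-separated list of
# backend names.
os.environ["CUPY_ACCELERATORS"] = "cub,cutensor"


def contiguous_sum_demo():
    """Sum over the last axis of a C-ordered array on the GPU.

    Hypothetical helper for illustration. For a C-contiguous array the last
    axis is contiguous in memory, which is the case the CUB reduction supports.
    """
    import cupy as cp  # imported lazily so the variable above is already set

    x = cp.arange(12, dtype=cp.float32).reshape(3, 4)
    row_sums = x.sum(axis=1)      # reduction over the contiguous axis
    return cp.asnumpy(row_sums)   # copy the result back to host memory
```

On a machine with a working CuPy installation and a GPU, calling contiguous_sum_demo() returns the three row sums as a NumPy array. Whether the reduction is actually dispatched to CUB or cuTENSOR depends on the installed CuPy build and the array layout.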

CUB also accelerates other routines, such as inclusive scans (e.g., cumsum()), histograms, sparse matrix-vector multiplications (not applicable in CUDA 11), and cupy.ReductionKernel.

In any case, we recommend that users run their own benchmarks to determine whether CUB/cuTENSOR actually improves performance for their workload.
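One possible way to run such a comparison is sketched below. Because the backend selection is assumed to be fixed when CuPy is imported, each configuration is timed in a fresh Python process. The child script uses cupyx.profiler.benchmark; the array shape, the repeat count, and the helper names env_with_accelerators and run_benchmark are arbitrary choices made for this example.

```python
import os
import subprocess
import sys

# Child script run in a separate process for each backend setting.
# The array size and n_repeat are illustrative values, not recommendations.
CHILD_SCRIPT = """
import cupy as cp
from cupyx.profiler import benchmark

x = cp.random.random((4096, 4096), dtype=cp.float32)
print(benchmark(cp.sum, args=(x,), kwargs={"axis": 1}, n_repeat=100))
"""


def env_with_accelerators(backends):
    """Return a copy of the current environment with CUPY_ACCELERATORS set."""
    env = dict(os.environ)
    env["CUPY_ACCELERATORS"] = backends
    return env


def run_benchmark(backends):
    """Time cupy.sum in a fresh process using the given backend list."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env_with_accelerators(backends),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_benchmark("cub") and run_benchmark("cutensor") on a GPU machine prints the CPU and GPU timings reported by the profiler for each setting. Repeating the measurement with the shapes, dtypes, and reduction axes of the real workload gives a more reliable picture than a single synthetic case.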