CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. CuPy acts as a drop-in replacement to run existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms.
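
As a minimal sketch of the drop-in idea, the snippet below writes a function against a module alias `xp`, which is CuPy when a GPU is usable and NumPy otherwise. The `normalize` helper and the fallback logic are illustrative assumptions, not part of CuPy itself.

```python
import numpy as np

try:
    import cupy as cp
    # Importing CuPy can succeed on a machine without a usable GPU,
    # so check for a device before selecting it.
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # fall back to NumPy on the CPU


def normalize(x):
    """Scale each row to unit L2 norm (hypothetical helper for illustration)."""
    norms = xp.linalg.norm(x, axis=1, keepdims=True)
    return x / norms


a = xp.asarray([[3.0, 4.0], [6.0, 8.0]])
unit_rows = normalize(a)

# Copy the result to host memory when it lives on the GPU.
host = unit_rows.get() if xp is not np else unit_rows
print(host)
```

The same `normalize` body runs unchanged on either backend. Only the array creation and the final host transfer depend on which module `xp` refers to.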

CuPy provides an ndarray, sparse matrices, and the associated routines for GPU devices, all with the same API as NumPy and SciPy.

Routines are backed by CUDA libraries (cuBLAS, cuFFT, cuSPARSE, cuSOLVER, cuRAND), Thrust, CUB, and cuTENSOR to provide the best performance.

It is also possible to easily implement custom CUDA kernels that work with ndarray using:

  • Kernel Templates: Quickly define element-wise and reduction operations as a single CUDA kernel

  • Raw Kernel: Import existing CUDA C/C++ code

  • Just-in-time Transpiler (JIT): Generate CUDA kernels from Python source code

  • Kernel Fusion: Fuse multiple CuPy operations into a single CUDA kernel
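
Two of the approaches listed above, a kernel template and kernel fusion, can be sketched as follows. The kernel name `squared_diff` and the sample data are assumptions chosen for illustration, and the GPU part runs only when CuPy and a CUDA device are available. A NumPy reference result is computed either way.

```python
import numpy as np

try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except ImportError:
    gpu_ok = False

x_host = np.arange(6, dtype=np.float32)
y_host = np.ones(6, dtype=np.float32)
expected = (x_host - y_host) ** 2  # NumPy reference result
template_result = fused_result = None

if gpu_ok:
    # Kernel template: the body is applied independently to each element.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',   # input arguments
        'float32 z',              # output argument
        'z = (x - y) * (x - y)',  # per-element CUDA C++ body
        'squared_diff')           # kernel name

    # Kernel fusion: the decorated function is compiled into one kernel
    # instead of launching a separate kernel per arithmetic operation.
    @cp.fuse()
    def fused_squared_diff(x, y):
        return (x - y) * (x - y)

    x, y = cp.asarray(x_host), cp.asarray(y_host)
    template_result = cp.asnumpy(squared_diff(x, y))
    fused_result = cp.asnumpy(fused_squared_diff(x, y))
    print(template_result, fused_result)
```

Both variants compute the same values. The template gives control over the CUDA C++ body, while fusion keeps the code in plain Python.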

CuPy can run in multi-GPU or cluster environments. The distributed communication package (cupyx.distributed) provides collective and peer-to-peer primitives for ndarray, backed by NCCL.

For users who need finer-grained control over performance, low-level CUDA features are also accessible:

  • Stream and Event: CUDA stream and per-thread default stream are supported by all APIs

  • Memory Pool: Customizable memory allocator with a built-in memory pool

  • Profiler: Supports profiling code using CUDA Profiler and NVTX

  • Host API Binding: Directly call CUDA libraries, such as NCCL, cuDNN, cuTENSOR, and cuSPARSELt APIs from Python
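
A minimal sketch of the stream, event, and memory pool features listed above is shown here. The array size and variable names are arbitrary assumptions, and the GPU part is skipped when CuPy or a CUDA device is unavailable.

```python
try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except ImportError:
    gpu_ok = False

elapsed_ms = None
used_while_alive = None

if gpu_ok:
    pool = cp.get_default_memory_pool()
    stream = cp.cuda.Stream(non_blocking=True)
    start, stop = cp.cuda.Event(), cp.cuda.Event()

    with stream:  # CuPy work inside this block is enqueued on `stream`
        start.record(stream)
        a = cp.ones(1_000_000, dtype=cp.float32)
        total = a.sum()
        stop.record(stream)

    stop.synchronize()  # wait until the work recorded before `stop` is done
    elapsed_ms = cp.cuda.get_elapsed_time(start, stop)
    used_while_alive = pool.used_bytes()

    del a, total
    pool.free_all_blocks()  # release cached, unused blocks back to the device
    print(f"{elapsed_ms:.3f} ms, {used_while_alive} bytes in use before release")
```

Events recorded on the same stream bracket the enqueued work, so the elapsed time covers the allocation and the reduction without a full device synchronization.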

CuPy implements standard APIs for data exchange and interoperability, such as DLPack, CUDA Array Interface, __array_ufunc__ (NEP 13), __array_function__ (NEP 18), and Array API Standard. Thanks to these protocols, CuPy easily integrates with NumPy, PyTorch, TensorFlow, MPI4Py, and other libraries that support these standards.
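
The protocols above can be illustrated with a short sketch. For simplicity it assumes only NumPy and CuPy are installed, so the DLPack step has CuPy re-import its own array rather than one from another framework. The `checks` dictionary is a hypothetical helper for collecting results, and the GPU part is skipped when no device is available.

```python
import numpy as np

try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except ImportError:
    gpu_ok = False

checks = {}

if gpu_ok:
    x = cp.arange(4, dtype=cp.float32)

    # NEP 13 (__array_ufunc__): a NumPy ufunc applied to a CuPy array
    # is dispatched to CuPy and the result stays on the GPU.
    checks["ufunc"] = isinstance(np.sqrt(x), cp.ndarray)

    # NEP 18 (__array_function__): other NumPy functions dispatch the same way.
    checks["function"] = isinstance(np.sum(x), cp.ndarray)

    # DLPack: zero-copy exchange, so a write through `y` is visible through `x`.
    y = cp.from_dlpack(x)
    y[0] = 42
    checks["dlpack_shares_memory"] = float(x[0]) == 42.0

    # Explicit transfer to host memory yields a regular NumPy array.
    checks["host_copy"] = isinstance(cp.asnumpy(x), np.ndarray)
    print(checks)
```

Any producer that implements the DLPack protocol could take the place of `x` in the `cp.from_dlpack` call.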

In an AMD ROCm environment, CuPy automatically translates all CUDA API calls to ROCm HIP (hipBLAS, hipFFT, hipSPARSE, hipRAND, hipCUB, hipThrust, RCCL, etc.), allowing code written using CuPy to run on both NVIDIA and AMD GPUs without any modification.

Project Goal

The goal of the CuPy project is to provide Python users with GPU acceleration capabilities without requiring in-depth knowledge of the underlying GPU technologies. The CuPy team focuses on providing:

  • Complete NumPy and SciPy API coverage to become a full drop-in replacement, as well as advanced CUDA features to maximize performance.

  • A mature, high-quality library that serves as a fundamental package for all projects needing acceleration, from a lab environment to a large-scale cluster.