CuPy – NumPy & SciPy for GPU#

Overview#

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. CuPy acts as a drop-in replacement to run existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms.

CuPy provides an ndarray, sparse matrices, and the associated routines for GPU devices, all having the same API as NumPy and SciPy.

Routines are backed by CUDA libraries (cuBLAS, cuFFT, cuSPARSE, cuSOLVER, cuRAND), Thrust, CUB, and cuTENSOR to provide the best performance.

It is also possible to easily implement custom CUDA kernels that work with ndarray using:

  • Kernel Templates: Quickly define element-wise and reduction operations as a single CUDA kernel

  • Raw Kernel: Import existing CUDA C/C++ code

  • Just-in-time Transpiler (JIT): Generate CUDA kernel from Python source code

  • Kernel Fusion: Fuse multiple CuPy operations into a single CUDA kernel

CuPy can run in multi-GPU or cluster environments. The distributed communication package (cupyx.distributed) provides collective and peer-to-peer primitives for ndarray, backed by NCCL.

For users who need more fine-grained control over performance, low-level CUDA features are also accessible:

  • Stream and Event: CUDA stream and per-thread default stream are supported by all APIs

  • Memory Pool: Customizable memory allocator with a built-in memory pool

  • Profiler: Supports profiling code using CUDA Profiler and NVTX

  • Host API Binding: Directly call CUDA libraries, such as NCCL, cuDNN, cuTENSOR, and cuSPARSELt APIs from Python

CuPy implements standard APIs for data exchange and interoperability, such as DLPack, CUDA Array Interface, __array_ufunc__ (NEP 13), __array_function__ (NEP 18), and Array API Standard. Thanks to these protocols, CuPy easily integrates with NumPy, PyTorch, TensorFlow, MPI4Py, and any other libraries supporting the standard.

In an AMD ROCm environment, CuPy automatically translates all CUDA API calls to ROCm HIP (hipBLAS, hipFFT, hipSPARSE, hipRAND, hipCUB, hipThrust, RCCL, etc.), allowing code written using CuPy to run on both NVIDIA and AMD GPUs without any modification.

Project Goal#

The goal of the CuPy project is to provide Python users with GPU acceleration capabilities without requiring in-depth knowledge of the underlying GPU technologies. The CuPy team focuses on providing:

  • Complete NumPy and SciPy API coverage to become a full drop-in replacement, as well as advanced CUDA features to maximize performance.

  • A mature, high-quality library as a fundamental package for all projects needing acceleration, from a lab environment to a large-scale cluster.

Installation#

Requirements#

  • NVIDIA CUDA GPU with the Compute Capability 3.0 or larger.

  • CUDA Toolkit: v10.2 / v11.0 / v11.1 / v11.2 / v11.3 / v11.4 / v11.5 / v11.6 / v11.7 / v11.8 / v12.0 / v12.1

    • If you have multiple versions of CUDA Toolkit installed, CuPy will automatically choose one of the CUDA installations. See Working with Custom CUDA Installation for details.

    • This requirement is optional if you install CuPy from conda-forge. However, you still need to have a compatible driver installed for your GPU. See Installing CuPy from Conda-Forge for details.

  • Python: v3.8 / v3.9 / v3.10 / v3.11

Note

Currently, CuPy is tested against Ubuntu 18.04 LTS / 20.04 LTS (x86_64), CentOS 7 / 8 (x86_64) and Windows Server 2016 (x86_64).

Python Dependencies#

NumPy/SciPy-compatible API in CuPy v12 is based on NumPy 1.24 and SciPy 1.10, and has been tested against the following versions:

Note

SciPy and Optuna are optional dependencies and will not be installed automatically.

Note

Before installing CuPy, we recommend upgrading setuptools and pip:

$ python -m pip install -U setuptools pip

Additional CUDA Libraries#

Part of the CUDA features in CuPy will be activated only when the corresponding libraries are installed.

  • cuTENSOR: v1.4 / v1.5 / v1.6 / v1.7

  • NCCL: v2.8 / v2.9 / v2.10 / v2.11 / v2.12 / v2.13 / v2.14 / v2.15 / v2.16 / v2.17

    • The library to perform collective multi-GPU / multi-node computations.

  • cuDNN: v7.6 / v8.0 / v8.1 / v8.2 / v8.3 / v8.4 / v8.5 / v8.6 / v8.7 / v8.8

    • The library to accelerate deep neural network computations.

  • cuSPARSELt: v0.2.0

    • The library to accelerate sparse matrix-matrix multiplication.

Installing CuPy#

Installing CuPy from PyPI#

Wheels (precompiled binary packages) are available for Linux and Windows. Package names are different depending on your CUDA Toolkit version.

CUDA                               Command
v10.2 (x86_64 / aarch64)           pip install cupy-cuda102
v11.0 (x86_64)                     pip install cupy-cuda110
v11.1 (x86_64)                     pip install cupy-cuda111
v11.2 ~ 11.8 (x86_64 / aarch64)    pip install cupy-cuda11x
v12.x (x86_64 / aarch64)           pip install cupy-cuda12x
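The mapping from CUDA Toolkit version to wheel name can be written as a small lookup. The following is a minimal sketch and not part of CuPy: the function name `wheel_name_for_cuda` is hypothetical, and it encodes only the versions listed in the table above.

```python
def wheel_name_for_cuda(version: str) -> str:
    """Return the PyPI wheel name for a CUDA Toolkit version such as "11.8".

    Hypothetical helper; it only mirrors the table in this section.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) == (10, 2):
        return "cupy-cuda102"
    if (major, minor) == (11, 0):
        return "cupy-cuda110"
    if (major, minor) == (11, 1):
        return "cupy-cuda111"
    if major == 11 and 2 <= minor <= 8:
        return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    raise ValueError(f"no wheel listed for CUDA {version}")


print(wheel_name_for_cuda("11.8"))  # cupy-cuda11x
```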

Note

To enable features provided by additional CUDA libraries (cuTENSOR / NCCL / cuDNN), you need to install them manually. If you installed CuPy via wheels and do not already have these libraries installed, you can use the installer command below to set them up:

$ python -m cupyx.tools.install_library --cuda 11.x --library cutensor

Note

Append --pre -f https://pip.cupy.dev/pre options to install pre-releases (e.g., pip install cupy-cuda11x --pre -f https://pip.cupy.dev/pre).

When using wheels, be careful not to install multiple CuPy packages at the same time. These packages conflict with each other and with the cupy package (source installation). Make sure that only one CuPy package (cupy or cupy-cudaXX, where XX is a CUDA version) is installed:

$ pip freeze | grep cupy
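The same check can be done from Python, which also works where grep is unavailable (for example, on Windows). This is a minimal sketch using only the standard library; the helper names are hypothetical, and the name prefixes it matches (cupy, cupy-cuda*, cupy-rocm*) are taken from the package names mentioned in this document.

```python
from importlib import metadata


def find_cupy_packages(names):
    """Return the sorted distribution names that look like CuPy packages."""
    prefixes = ("cupy-cuda", "cupy-rocm")
    found = set()
    for name in names:
        if not name:
            continue  # skip distributions with missing metadata
        normalized = name.lower().replace("_", "-")
        if normalized == "cupy" or normalized.startswith(prefixes):
            found.add(normalized)
    return sorted(found)


def installed_cupy_packages():
    """List CuPy distributions installed in the current environment."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return find_cupy_packages(names)


packages = installed_cupy_packages()
if len(packages) > 1:
    print("Conflicting CuPy packages:", ", ".join(packages))
else:
    print("CuPy packages found:", packages)
```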

Installing CuPy from Conda-Forge#

Conda/Anaconda is a cross-platform package management solution widely used in scientific computing and other fields. The above pip install instruction is compatible with conda environments. Alternatively, for both Linux (x86_64, ppc64le, aarch64-sbsa) and Windows once the CUDA driver is correctly set up, you can also install CuPy from the conda-forge channel:

$ conda install -c conda-forge cupy

and conda will install a pre-built CuPy binary package for you, along with the CUDA runtime libraries (cudatoolkit). It is not necessary to install CUDA Toolkit in advance.

Conda has a built-in mechanism to determine and install the latest version of cudatoolkit supported by your driver. However, if for any reason you need to force-install a particular CUDA version (say 11.0), you can do:

$ conda install -c conda-forge cupy cudatoolkit=11.0

Note

cuDNN, cuTENSOR, and NCCL are available on conda-forge as optional dependencies. The following command can install them all at once:

$ conda install -c conda-forge cupy cudnn cutensor nccl

Each of them can also be installed separately as needed.

Note

If you encounter any problem with CuPy installed from conda-forge, please feel free to report to cupy-feedstock, and we will help investigate if it is just a packaging issue in conda-forge’s recipe or a real issue in CuPy.

Note

If you did not install CUDA Toolkit by yourself, the nvcc compiler might not be available, as the cudatoolkit package from conda-forge does not include the nvcc compiler toolchain. If you would like to use it from a local CUDA installation, you need to make sure the version of CUDA Toolkit matches that of cudatoolkit to avoid surprises.

Installing CuPy from Source#

Use of wheel packages is recommended whenever possible. However, if wheels cannot meet your requirements (e.g., you are running a non-Linux environment or want to use a version of CUDA / cuDNN / NCCL not supported by wheels), you can also build CuPy from source.

Note

CuPy source build requires g++-6 or later. For Ubuntu 18.04, run apt-get install g++. For Ubuntu 16.04, CentOS 6 or 7, follow the instructions in Build fails on Ubuntu 16.04, CentOS 6 or 7.

Note

When installing CuPy from source, features provided by additional CUDA libraries will be disabled if these libraries are not available at the build time. See Installing cuDNN and NCCL for the instructions.

Note

If you upgrade or downgrade the version of CUDA Toolkit, cuDNN, NCCL or cuTENSOR, you may need to reinstall CuPy. See Reinstalling CuPy for details.

You can install the latest stable release version of the CuPy source package via pip.

$ pip install cupy

If you want to install the latest development version of CuPy from a cloned Git repository:

$ git clone --recursive https://github.com/cupy/cupy.git
$ cd cupy
$ pip install .

Note

Cython 0.29.22 or later is required to build CuPy from source. It will be automatically installed during the build process if not available.

Uninstalling CuPy#

Use pip to uninstall CuPy:

$ pip uninstall cupy

Note

If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).

Note

If CuPy is installed via conda, run conda uninstall cupy instead.

Upgrading CuPy#

Just use pip install with -U option:

$ pip install -U cupy

Note

If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).

Reinstalling CuPy#

To reinstall CuPy, please uninstall CuPy and then install it. When reinstalling CuPy, we recommend using --no-cache-dir option as pip caches the previously built binaries:

$ pip uninstall cupy
$ pip install cupy --no-cache-dir

Note

If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).

Using CuPy inside Docker#

Official Docker images are provided. Use NVIDIA Container Toolkit to run the CuPy image with GPU support. You can log in to the environment with bash and run the Python interpreter:

$ docker run --gpus all -it cupy/cupy /bin/bash

Or run the interpreter directly:

$ docker run --gpus all -it cupy/cupy /usr/bin/python3

FAQ#

pip fails to install CuPy#

Please make sure that you are using the latest setuptools and pip:

$ pip install -U setuptools pip

Use the -vvvv option with the pip command to display the full installation log:

$ pip install cupy -vvvv

If you are using sudo to install CuPy, note that the sudo command does not propagate environment variables. If you need to pass an environment variable (e.g., CUDA_PATH), specify it inside the sudo command like this:

$ sudo CUDA_PATH=/opt/nvidia/cuda pip install cupy

If you are using certain versions of conda, it may fail to build CuPy with error g++: error: unrecognized command line option ‘-R’. This is due to a bug in conda (see conda/conda#6030 for details). If you encounter this problem, please upgrade your conda.

Installing cuDNN and NCCL#

We recommend installing cuDNN and NCCL using binary packages (i.e., using apt or yum) provided by NVIDIA.

If you want to install tar-gz version of cuDNN and NCCL, we recommend installing it under the CUDA_PATH directory. For example, if you are using Ubuntu, copy *.h files to include directory and *.so* files to lib64 directory:

$ cp /path/to/cudnn.h $CUDA_PATH/include
$ cp /path/to/libcudnn.so* $CUDA_PATH/lib64

The destination directories depend on your environment.

If you want to use cuDNN or NCCL installed in another directory, please use CFLAGS, LDFLAGS and LD_LIBRARY_PATH environment variables before installing CuPy:

$ export CFLAGS=-I/path/to/cudnn/include
$ export LDFLAGS=-L/path/to/cudnn/lib
$ export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH

Working with Custom CUDA Installation#

If you have installed CUDA in a non-default directory or have multiple CUDA versions on the same host, you may need to manually specify the CUDA installation directory to be used by CuPy.

CuPy uses the first CUDA installation directory found in the following order:

  1. CUDA_PATH environment variable.

  2. The parent directory of nvcc command. CuPy looks for nvcc command from PATH environment variable.

  3. /usr/local/cuda

For example, you can build CuPy against a non-default CUDA directory by setting the CUDA_PATH environment variable:

$ CUDA_PATH=/opt/nvidia/cuda pip install cupy
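The search order can be sketched in a few lines of Python. This is an illustration of the three rules above, not CuPy's actual implementation: the function name `find_cuda_path` is hypothetical, it assumes nvcc lives in a bin subdirectory of the CUDA installation, and it does not verify that the returned directory exists.

```python
import os
import shutil


def find_cuda_path(environ=None, which=shutil.which, default="/usr/local/cuda"):
    """Return the CUDA directory selected by the documented search order.

    Illustrative only; CuPy's real discovery logic may differ in detail.
    """
    environ = os.environ if environ is None else environ

    # 1. CUDA_PATH environment variable.
    cuda_path = environ.get("CUDA_PATH")
    if cuda_path:
        return cuda_path

    # 2. Parent directory of the nvcc command found on PATH.
    #    Assumption: nvcc is at <cuda>/bin/nvcc, so go up two levels.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))

    # 3. Default location.
    return default


print(find_cuda_path())
```

The `environ` and `which` parameters exist only so that the order can be exercised without changing the real environment.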

Note

CUDA installation discovery is also performed at runtime using the rule above. Depending on your system configuration, you may also need to set LD_LIBRARY_PATH environment variable to $CUDA_PATH/lib64 at runtime.

CuPy always raises cupy.cuda.compiler.CompileException#

If CuPy raises a CompileException for almost everything, it is possible that CuPy cannot correctly detect the CUDA installation on your system. The following error messages are commonly observed in such cases.

  • nvrtc: error: failed to load builtins

  • catastrophic error: cannot open source file "cuda_fp16.h"

  • error: cannot overload functions distinguished by return type alone

  • error: identifier "__half_raw" is undefined

Please try setting the LD_LIBRARY_PATH and CUDA_PATH environment variables. For example, if you have CUDA installed at /usr/local/cuda-9.2:

$ export CUDA_PATH=/usr/local/cuda-9.2
$ export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH

Also see Working with Custom CUDA Installation.

Build fails on Ubuntu 16.04, CentOS 6 or 7#

In order to build CuPy from source on systems with legacy GCC (g++-5 or earlier), you need to manually set up g++-6 or later and configure NVCC environment variable.

On Ubuntu 16.04:

$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt update
$ sudo apt install g++-6
$ export NVCC="nvcc --compiler-bindir gcc-6"

On CentOS 6 / 7:

$ sudo yum install centos-release-scl
$ sudo yum install devtoolset-7-gcc-c++
$ source /opt/rh/devtoolset-7/enable
$ export NVCC="nvcc --compiler-bindir gcc"

Using CuPy on AMD GPU (experimental)#

CuPy has experimental support for AMD GPUs (ROCm).

Requirements#

The following ROCm libraries are required:

$ sudo apt install hipblas hipsparse rocsparse rocrand rocthrust rocsolver rocfft hipcub rocprim rccl

Environment Variables#

When building or running CuPy for ROCm, the following environment variables are effective.

  • ROCM_HOME: directory containing the ROCm software (e.g., /opt/rocm).

Docker#

You can try running CuPy for ROCm using Docker.

$ docker run -it --device=/dev/kfd --device=/dev/dri --group-add video cupy/cupy-rocm

Installing Binary Packages#

Wheels (precompiled binary packages) are available for Linux (x86_64). Package names are different depending on your ROCm version.

ROCm    Command
v4.3    $ pip install cupy-rocm-4-3
v5.0    $ pip install cupy-rocm-5-0

Building CuPy for ROCm From Source#

To build CuPy from source, set the CUPY_INSTALL_USE_HIP, ROCM_HOME, and HCC_AMDGPU_TARGET environment variables. (HCC_AMDGPU_TARGET is the ISA name supported by your GPU. Run rocminfo and use the value displayed in the Name: line (e.g., gfx900). You can specify a comma-separated list of ISAs if you have multiple GPUs of different architectures.)

$ export CUPY_INSTALL_USE_HIP=1
$ export ROCM_HOME=/opt/rocm
$ export HCC_AMDGPU_TARGET=gfx906
$ pip install cupy

Note

If you don’t specify the HCC_AMDGPU_TARGET environment variable, CuPy will be built for the GPU architectures available on the build host. This behavior is specific to ROCm builds; when building CuPy for NVIDIA CUDA, the build result is not affected by the host configuration.

Limitations#

The following features are not available due to limitations of ROCm or because they are specific to CUDA:

  • CUDA Array Interface

  • cuTENSOR

  • Handling extremely large arrays whose size is around 32-bit boundary (HIP is known to fail with sizes 2**32-1024)

  • Atomic addition in FP16 (cupy.ndarray.scatter_add and cupyx.scatter_add)

  • Multi-GPU FFT and FFT callback

  • Some random number generation algorithms

  • Several options in RawKernel/RawModule APIs: Jitify, dynamic parallelism

  • Per-thread default stream

The following features are not yet supported:

  • Sparse matrices (cupyx.scipy.sparse)

  • cuDNN (hipDNN)

  • Hermitian/symmetric eigenvalue solver (cupy.linalg.eigh)

  • Polynomial roots (uses Hermitian/symmetric eigenvalue solver)

  • Splines in cupyx.scipy.interpolate (make_interp_spline, spline modes of RegularGridInterpolator/interpn), as they depend on sparse matrices.

The following features may not work in edge cases (e.g., some combinations of dtype):

Note

We are investigating the root causes of the issues. They are not necessarily CuPy’s issues, but ROCm may have some potential bugs.

  • cupy.ndarray.__getitem__ (#4653)

  • cupy.ix_ (#4654)

  • Some polynomial routines (#4758, #4759)

  • cupy.broadcast (#4662)

  • cupy.convolve (#4668)

  • cupy.correlate (#4781)

  • Some random sampling routines (cupy.random, #4770)

  • cupy.linalg.einsum

  • cupyx.scipy.ndimage and cupyx.scipy.signal (#4878, #4879, #4880)

User Guide#

This user guide provides an overview of CuPy and explains its important features; details are found in CuPy API Reference.

Basics of CuPy#

In this section, you will learn about the following things:

  • Basics of cupy.ndarray

  • The concept of current device

  • host-device and device-device array transfer

Basics of cupy.ndarray#

CuPy is a GPU array backend that implements a subset of the NumPy interface. In the following code, cp is an abbreviation of cupy, following the standard convention of abbreviating numpy as np:

>>> import numpy as np
>>> import cupy as cp

The cupy.ndarray class is at the core of CuPy and is a replacement class for NumPy’s numpy.ndarray.

>>> x_gpu = cp.array([1, 2, 3])

x_gpu above is an instance of cupy.ndarray. As one can see, CuPy’s syntax here is identical to that of NumPy. The main difference between cupy.ndarray and numpy.ndarray is that the CuPy arrays are allocated on the current device, which we will talk about later.

Most array manipulations are also done in a way similar to NumPy. Take the Euclidean norm (a.k.a. L2 norm), for example. NumPy has the numpy.linalg.norm() function, which calculates it on the CPU.

>>> x_cpu = np.array([1, 2, 3])
>>> l2_cpu = np.linalg.norm(x_cpu)

Using CuPy, we can perform the same calculations on GPU in a similar way:

>>> x_gpu = cp.array([1, 2, 3])
>>> l2_gpu = cp.linalg.norm(x_gpu)

CuPy implements many functions on cupy.ndarray objects. See the reference for the supported subset of NumPy API. Knowledge of NumPy will help you utilize most of the CuPy features. We, therefore, recommend you familiarize yourself with the NumPy documentation.

Current Device#

CuPy has a concept of a current device, which is the default GPU device on which the allocation, manipulation, calculation, etc., of arrays take place. Suppose the ID of the current device is 0. In such a case, the following code would create an array x_on_gpu0 on GPU 0.

>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])

To switch to another GPU device, use the Device context manager:

>>> with cp.cuda.Device(1):
...    x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])

All CuPy operations (except for multi-GPU features and device-to-device copy) are performed on the currently active device.

In general, CuPy functions expect that the array is on the same device as the current one. Passing an array stored on a non-current device may work depending on the hardware configuration but is generally discouraged as it may not be performant.

Note

If the array’s device and the current device mismatch, CuPy functions try to establish peer-to-peer memory access (P2P) between them so that the current device can directly read the array from another device. Note that P2P is available only when the topology permits it. If P2P is unavailable, such an attempt will fail with ValueError.

cupy.ndarray.device attribute indicates the device on which the array is allocated.

>>> with cp.cuda.Device(1):
...    x = cp.array([1, 2, 3, 4, 5])
>>> x.device
<CUDA Device 1>

Note

When only one device is available, explicit device switching is not needed.

Current Stream#

Associated with the concept of current devices are current streams, which help avoid explicitly passing streams in every single operation so as to keep the APIs pythonic and user-friendly. In CuPy, all CUDA operations such as data transfer (see the Data Transfer section) and kernel launches are enqueued onto the current stream, and the queued tasks on the same stream will be executed in serial (but asynchronously with respect to the host).

The default current stream in CuPy is CUDA’s null stream (i.e., stream 0). It is also known as the legacy default stream, which is unique per device. However, the current stream can be changed using the cupy.cuda.Stream API; see Accessing CUDA Functionalities for an example. The current stream in CuPy can be retrieved using cupy.cuda.get_current_stream().

It is worth noting that CuPy’s current stream is managed on a per thread, per device basis, meaning that on different Python threads or different devices the current stream (if not the null stream) can be different.

Data Transfer#

Move arrays to a device#

cupy.asarray() can be used to move a numpy.ndarray, a list, or any object that can be passed to numpy.array() to the current device:

>>> x_cpu = np.array([1, 2, 3])
>>> x_gpu = cp.asarray(x_cpu)  # move the data to the current device.

cupy.asarray() can accept cupy.ndarray, which means we can transfer the array between devices with this function.

>>> with cp.cuda.Device(0):
...     x_gpu_0 = cp.array([1, 2, 3])  # create an array in GPU 0
>>> with cp.cuda.Device(1):
...     x_gpu_1 = cp.asarray(x_gpu_0)  # move the array to GPU 1

Note

cupy.asarray() does not copy the input array if possible. So, if you pass an array that already resides on the current device, it returns the input object itself.

If you do need a copy in this situation, use cupy.array() with copy=True. In fact, cupy.asarray() is equivalent to cupy.array(arr, dtype, copy=False).

Move array from a device to the host#

Moving a device array to the host can be done by cupy.asnumpy() as follows:

>>> x_gpu = cp.array([1, 2, 3])  # create an array in the current device
>>> x_cpu = cp.asnumpy(x_gpu)  # move the array to the host.

We can also use cupy.ndarray.get():

>>> x_cpu = x_gpu.get()

Memory management#

Check Memory Management for a detailed description of how memory is managed in CuPy using memory pools.

How to write CPU/GPU agnostic code#

CuPy’s compatibility with NumPy makes it possible to write CPU/GPU agnostic code. For this purpose, CuPy implements the cupy.get_array_module() function that returns a reference to cupy if any of its arguments resides on a GPU and numpy otherwise. Here is an example of a CPU/GPU agnostic function that computes softplus, log(1 + exp(x)), in a numerically stable way:

>>> # Stable implementation of log(1 + exp(x))
>>> def softplus(x):
...     xp = cp.get_array_module(x)  # 'xp' is a standard usage in the community
...     print("Using:", xp.__name__)
...     return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))
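The return expression is algebraically equal to log(1 + exp(x)) but never evaluates exp() on a large positive argument, which is why it is described as stable. A short derivation, splitting on the sign of \(x\):

\[
\log(1 + e^{x}) =
\begin{cases}
\log\bigl(e^{x}\,(1 + e^{-x})\bigr) = x + \log(1 + e^{-x}), & x \ge 0,\\
0 + \log(1 + e^{x}), & x < 0.
\end{cases}
\]

Since \(-|x| = -x\) for \(x \ge 0\) and \(-|x| = x\) for \(x < 0\), both cases can be written as \(\max(0, x) + \log(1 + e^{-|x|})\), and \(\log(1 + t)\) is what log1p(t) computes.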

When you need to manipulate CPU and GPU arrays, an explicit data transfer may be required to move them to the same location – either CPU or GPU. For this purpose, CuPy implements two sister methods called cupy.asnumpy() and cupy.asarray(). Here is an example that demonstrates the use of both methods:

>>> x_cpu = np.array([1, 2, 3])
>>> y_cpu = np.array([4, 5, 6])
>>> x_cpu + y_cpu
array([5, 7, 9])
>>> x_gpu = cp.asarray(x_cpu)
>>> x_gpu + y_cpu
Traceback (most recent call last):
...
TypeError: Unsupported type <class 'numpy.ndarray'>
>>> cp.asnumpy(x_gpu) + y_cpu
array([5, 7, 9])
>>> cp.asnumpy(x_gpu) + cp.asnumpy(y_cpu)
array([5, 7, 9])
>>> x_gpu + cp.asarray(y_cpu)
array([5, 7, 9])
>>> cp.asarray(x_gpu) + cp.asarray(y_cpu)
array([5, 7, 9])

The cupy.asnumpy() method returns a NumPy array (array on the host), whereas cupy.asarray() method returns a CuPy array (array on the current device). Both methods can accept arbitrary input, meaning that they can be applied to any data that is located on either the host or device and can be converted to an array.

User-Defined Kernels#

CuPy provides easy ways to define three types of CUDA kernels: elementwise kernels, reduction kernels and raw kernels. This section describes how to define and call each kind of kernel.

Basics of elementwise kernels#

An elementwise kernel can be defined by the ElementwiseKernel class. The instance of this class defines a CUDA kernel which can be invoked by the __call__ method of this instance.

A definition of an elementwise kernel consists of four parts: an input argument list, an output argument list, a loop body code, and the kernel name. For example, a kernel that computes a squared difference \(f(x, y) = (x - y)^2\) is defined as follows:

>>> squared_diff = cp.ElementwiseKernel(
...    'float32 x, float32 y',
...    'float32 z',
...    'z = (x - y) * (x - y)',
...    'squared_diff')

The argument lists consist of comma-separated argument definitions. Each argument definition consists of a type specifier and an argument name. Names of NumPy data types can be used as type specifiers.

Note

n, i, and names starting with an underscore _ are reserved for internal use.

The above kernel can be called on either scalars or arrays with broadcasting:

>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> y = cp.arange(5, dtype=np.float32)
>>> squared_diff(x, y)
array([[ 0.,  0.,  0.,  0.,  0.],
       [25., 25., 25., 25., 25.]], dtype=float32)
>>> squared_diff(x, 5)
array([[25., 16.,  9.,  4.,  1.],
       [ 0.,  1.,  4.,  9., 16.]], dtype=float32)

Output arguments can be explicitly specified (next to the input arguments):

>>> z = cp.empty((2, 5), dtype=np.float32)
>>> squared_diff(x, y, z)
array([[ 0.,  0.,  0.,  0.,  0.],
       [25., 25., 25., 25., 25.]], dtype=float32)

Type-generic kernels#

If a type specifier is one character, then it is treated as a type placeholder. It can be used to define type-generic kernels. For example, the above squared_diff kernel can be made type-generic as follows:

>>> squared_diff_generic = cp.ElementwiseKernel(
...     'T x, T y',
...     'T z',
...     'z = (x - y) * (x - y)',
...     'squared_diff_generic')

Type placeholders with the same character in the kernel definition indicate the same type. The actual type of these placeholders is determined by the actual argument type. The ElementwiseKernel class first checks the output arguments and then the input arguments to determine the actual type. If no output arguments are given on the kernel invocation, then only the input arguments are used to determine the type.

The type placeholder can be used in the loop body code:

>>> squared_diff_generic = cp.ElementwiseKernel(
...     'T x, T y',
...     'T z',
...     '''
...         T diff = x - y;
...         z = diff * diff;
...     ''',
...     'squared_diff_generic')

More than one type placeholder can be used in a kernel definition. For example, the above kernel can be further made generic over multiple arguments:

>>> squared_diff_super_generic = cp.ElementwiseKernel(
...     'X x, Y y',
...     'Z z',
...     'z = (x - y) * (x - y)',
...     'squared_diff_super_generic')

Note that this kernel requires the output argument to be explicitly specified, because the type Z cannot be automatically determined from the input arguments.

Raw argument specifiers#

The ElementwiseKernel class does the indexing with broadcasting automatically, which is useful to define most elementwise computations. On the other hand, we sometimes want to write a kernel with manual indexing for some arguments. We can tell the ElementwiseKernel class to use manual indexing by adding the raw keyword preceding the type specifier.

We can use the special variable i and method _ind.size() for the manual indexing. i indicates the index within the loop. _ind.size() indicates the total number of elements to apply the elementwise operation to. Note that it represents the size after the broadcast operation.

For example, a kernel that adds two vectors with reversing one of them can be written as follows:

>>> add_reverse = cp.ElementwiseKernel(
...     'T x, raw T y', 'T z',
...     'z = x + y[_ind.size() - i - 1]',
...     'add_reverse')

(Note that this is an artificial example; you can write such an operation simply as z = x + y[::-1] without defining a new kernel.) A raw argument can be used like an array. The indexing operator y[_ind.size() - i - 1] involves an indexing computation on y, so y can be arbitrarily shaped and strided.

Note that raw arguments are not involved in the broadcasting. If you want to mark all arguments as raw, you must specify the size argument on invocation, which defines the value of _ind.size().
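To make the manual indexing concrete, here is a pure-Python model of what the add_reverse loop body computes for one-dimensional inputs. It is only a sketch for reasoning about the index arithmetic: the function name is hypothetical, and it runs sequentially on the CPU rather than in a CUDA kernel.

```python
def add_reverse_reference(x, y):
    """Pure-Python model of the add_reverse kernel for 1-D sequences."""
    n = len(x)  # plays the role of _ind.size()
    z = [0] * n
    for i in range(n):  # on the GPU, these iterations run in parallel
        # x is indexed automatically with i; y is a raw argument,
        # so its index is written out by hand.
        z[i] = x[i] + y[n - i - 1]
    return z


print(add_reverse_reference([1, 2, 3], [10, 20, 30]))  # [31, 22, 13]
```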

Texture memory#

Texture objects (TextureObject) can be passed to ElementwiseKernel with their type marked by a unique type placeholder distinct from any other types used in the same kernel, as its actual datatype is determined when populating the texture memory. The texture coordinates can be computed in the kernel by the per-thread loop index i.

Reduction kernels#

Reduction kernels can be defined by the ReductionKernel class. We can use it by defining four parts of the kernel code:

  1. Identity value: This value is used for the initial value of reduction.

  2. Mapping expression: It is used for the pre-processing of each element to be reduced.

  3. Reduction expression: It is an operator to reduce the multiple mapped values. The special variables a and b are used for its operands.

  4. Post mapping expression: It is used to transform the resulting reduced values. The special variable a is used as its input. Output should be written to the output parameter.

ReductionKernel class automatically inserts other code fragments that are required for an efficient and flexible reduction implementation.

For example, L2 norm along specified axes can be written as follows:

>>> l2norm_kernel = cp.ReductionKernel(
...     'T x',  # input params
...     'T y',  # output params
...     'x * x',  # map
...     'a + b',  # reduce
...     'y = sqrt(a)',  # post-reduction map
...     '0',  # identity value
...     'l2norm'  # kernel name
... )
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> l2norm_kernel(x, axis=1)
array([ 5.477226 , 15.9687195], dtype=float32)
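The role of each part can be checked against a pure-Python model of the same reduction. This sketch is not how ReductionKernel is implemented (the real kernel reduces in parallel on the GPU); the function name is hypothetical, and it only reproduces the arithmetic for axis=1 on a two-dimensional list.

```python
import math
from functools import reduce


def l2norm_reference(rows):
    """Pure-Python model of l2norm_kernel(x, axis=1) for a 2-D list."""
    identity = 0.0  # identity value: '0'
    results = []
    for row in rows:
        # Mapping expression 'x * x', applied to each element.
        mapped = [x * x for x in row]
        # Reduction expression 'a + b', starting from the identity.
        a = reduce(lambda a, b: a + b, mapped, identity)
        # Post mapping expression 'y = sqrt(a)'.
        results.append(math.sqrt(a))
    return results


x = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(l2norm_reference(x))
```

The two printed values are sqrt(30) and sqrt(255), which agree with the float32 result above up to rounding.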

Note

The raw specifier can only be used when the axes to be reduced are at the head of the shape. That is, if you use the raw specifier for at least one argument, the axis argument must be 0 or a contiguous increasing sequence of integers starting from 0, like (0, 1), (0, 1, 2), etc.

Note

Texture memory is not yet supported in ReductionKernel.

Raw kernels#

Raw kernels can be defined by the RawKernel class. By using raw kernels, you can define kernels from raw CUDA source.

A RawKernel object allows you to call the kernel with CUDA’s cuLaunchKernel interface. In other words, you have control over the grid size, block size, shared memory size, and stream.

>>> add_kernel = cp.RawKernel(r'''
... extern "C" __global__
... void my_add(const float* x1, const float* x2, float* y) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     y[tid] = x1[tid] + x2[tid];
... }
... ''', 'my_add')
>>> x1 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
>>> x2 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
>>> y = cp.zeros((5, 5), dtype=cp.float32)
>>> add_kernel((5,), (5,), (x1, x2, y))  # grid, block and arguments
>>> y
array([[ 0.,  2.,  4.,  6.,  8.],
       [10., 12., 14., 16., 18.],
       [20., 22., 24., 26., 28.],
       [30., 32., 34., 36., 38.],
       [40., 42., 44., 46., 48.]], dtype=float32)
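In the launch above, the grid (5,) and block (5,) together start 25 threads, one per element of the 5×5 arrays. Because my_add as written has no bounds check, the number of threads launched should not exceed the number of elements. The pure-Python sketch below (a hypothetical helper, not a CuPy API) enumerates the tid values the kernel computes for a one-dimensional launch.

```python
def global_thread_ids(grid_x, block_x):
    """Enumerate tid = blockDim.x * blockIdx.x + threadIdx.x for a 1-D launch."""
    return [
        block_x * block_idx + thread_idx
        for block_idx in range(grid_x)
        for thread_idx in range(block_x)
    ]


tids = global_thread_ids(grid_x=5, block_x=5)
print(len(tids), tids[0], tids[-1])  # 25 0 24
```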

Raw kernels operating on complex-valued arrays can be created as well:

>>> complex_kernel = cp.RawKernel(r'''
... #include <cupy/complex.cuh>
... extern "C" __global__
... void my_func(const complex<float>* x1, const complex<float>* x2,
...              complex<float>* y, float a) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     y[tid] = x1[tid] + a * x2[tid];
... }
... ''', 'my_func')
>>> x1 = cp.arange(25, dtype=cp.complex64).reshape(5, 5)
>>> x2 = 1j*cp.arange(25, dtype=cp.complex64).reshape(5, 5)
>>> y = cp.zeros((5, 5), dtype=cp.complex64)
>>> complex_kernel((5,), (5,), (x1, x2, y, cp.float32(2.0)))  # grid, block and arguments
>>> y
array([[ 0. +0.j,  1. +2.j,  2. +4.j,  3. +6.j,  4. +8.j],
       [ 5.+10.j,  6.+12.j,  7.+14.j,  8.+16.j,  9.+18.j],
       [10.+20.j, 11.+22.j, 12.+24.j, 13.+26.j, 14.+28.j],
       [15.+30.j, 16.+32.j, 17.+34.j, 18.+36.j, 19.+38.j],
       [20.+40.j, 21.+42.j, 22.+44.j, 23.+46.j, 24.+48.j]],
      dtype=complex64)

Note that while we encourage the usage of complex<T> types for complex numbers (available by including <cupy/complex.cuh> as shown above), for CUDA codes already written using functions from cuComplex.h there is no need to make the conversion yourself: just set the option translate_cucomplex=True when creating a RawKernel instance.

The CUDA kernel attributes can be retrieved by either accessing the attributes dictionary, or by accessing the RawKernel object’s attributes directly; the latter can also be used to set certain attributes:

>>> add_kernel = cp.RawKernel(r'''
... extern "C" __global__
... void my_add(const float* x1, const float* x2, float* y) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     y[tid] = x1[tid] + x2[tid];
... }
... ''', 'my_add')
>>> add_kernel.attributes  
{'max_threads_per_block': 1024, 'shared_size_bytes': 0, 'const_size_bytes': 0, 'local_size_bytes': 0, 'num_regs': 10, 'ptx_version': 70, 'binary_version': 70, 'cache_mode_ca': 0, 'max_dynamic_shared_size_bytes': 49152, 'preferred_shared_memory_carveout': -1}
>>> add_kernel.max_dynamic_shared_size_bytes  
49152
>>> add_kernel.max_dynamic_shared_size_bytes = 50000  # set a new value for the attribute  
>>> add_kernel.max_dynamic_shared_size_bytes  
50000

Dynamic parallelism is supported by RawKernel. You just need to provide the linking flag (such as -dc) to RawKernel’s options argument. The static CUDA device runtime library (cudadevrt) is automatically discovered by CuPy. For further detail, see CUDA Toolkit’s documentation.

Accessing texture (surface) memory in RawKernel is supported via CUDA Runtime’s Texture (Surface) Object API, see the documentation for TextureObject (SurfaceObject) as well as CUDA C Programming Guide. For using the Texture Reference API, which is marked as deprecated as of CUDA Toolkit 10.1, see the introduction to RawModule below.

If your kernel relies on the C++ std library headers such as <type_traits>, it is likely you will encounter compilation errors. In this case, try enabling CuPy’s Jitify support by setting jitify=True when creating the RawKernel instance. It provides basic C++ std support to remedy common errors.

Note

The kernel does not have return values. You need to pass both input arrays and output arrays as arguments.

Note

When using printf() in your CUDA kernel, you may need to synchronize the stream to see the output. You can use cupy.cuda.Stream.null.synchronize() if you are using the default stream.

Note

In all of the examples above, we declare the kernels in an extern "C" block, indicating that the C linkage is used. This is to ensure the kernel names are not mangled so that they can be retrieved by name.

Kernel arguments#

Python primitive types and NumPy scalars are passed to the kernel by value. Array arguments (pointer arguments) have to be passed as CuPy ndarrays. No validation is performed by CuPy for arguments passed to the kernel, including types and number of arguments.

Note especially that when passing a CuPy ndarray, its dtype should match the type of the argument declared in the function signature of the CUDA source code (unless you are casting arrays intentionally).

As an example, cupy.float32 and cupy.uint64 arrays must be passed to the argument typed as float* and unsigned long long*, respectively. CuPy does not directly support arrays of non-primitive types such as float3, but nothing prevents you from casting a float* or void* to a float3* in a kernel.

Python primitive types, int, float, complex and bool map to long long, double, cuDoubleComplex and bool, respectively.

NumPy scalars (numpy.generic) and NumPy arrays (numpy.ndarray) of size one are passed to the kernel by value. This means that you can pass by value any base NumPy types such as numpy.int8 or numpy.float64, provided the kernel arguments match in size. You can refer to this table to match CuPy/NumPy dtype and CUDA types:

CuPy/NumPy type    Corresponding kernel types                  itemsize (bytes)
---------------    ----------------------------------------    ----------------
bool               bool                                        1
int8               char, signed char                           1
int16              short, signed short                         2
int32              int, signed int                             4
int64              long long, signed long long                 8
uint8              unsigned char                               1
uint16             unsigned short                              2
uint32             unsigned int                                4
uint64             unsigned long long                          8
float16            half                                        2
float32            float                                       4
float64            double                                      8
complex64          float2, cuFloatComplex, complex<float>      8
complex128         double2, cuDoubleComplex, complex<double>   16

The CUDA standard guarantees that the size of fundamental types on the host and device always match. The itemsizes of size_t, ptrdiff_t, intptr_t, uintptr_t, long, signed long and unsigned long are, however, platform dependent. To pass any CUDA vector builtins such as float3 or any other user-defined structure as kernel arguments (provided it matches the device-side kernel parameter type), see Custom user types below.
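The itemsizes in the table can be checked on the host with NumPy alone. The sketch below transcribes one kernel type per row into a dictionary (KERNEL_TYPES is a name chosen for illustration) and also shows why a Python float may need to be wrapped in a NumPy scalar when the kernel parameter is declared as float.

```python
import numpy as np

# (kernel type, itemsize in bytes) per dtype, transcribed from the table.
KERNEL_TYPES = {
    'bool': ('bool', 1),
    'int8': ('signed char', 1),
    'int16': ('short', 2),
    'int32': ('int', 4),
    'int64': ('long long', 8),
    'uint8': ('unsigned char', 1),
    'uint16': ('unsigned short', 2),
    'uint32': ('unsigned int', 4),
    'uint64': ('unsigned long long', 8),
    'float16': ('half', 2),
    'float32': ('float', 4),
    'float64': ('double', 8),
    'complex64': ('complex<float>', 8),
    'complex128': ('complex<double>', 16),
}

# Collect every dtype whose NumPy itemsize disagrees with the table.
mismatches = [name for name, (_, size) in KERNEL_TYPES.items()
              if np.dtype(name).itemsize != size]
print(mismatches)  # []

# A Python float is passed as a double (8 bytes); a NumPy float32 scalar
# occupies 4 bytes and matches a kernel parameter declared as float.
a = np.float32(2.0)
print(a.itemsize, np.dtype(float).itemsize)  # 4 8
```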

Custom user types#

It is possible to use custom types (composite types such as structures and structures of structures) as kernel arguments by defining a custom NumPy dtype. When doing this, it is your responsibility to match host and device structure memory layout. The CUDA standard guarantees that the size of fundamental types on the host and device always match. It may however impose device alignment requirements on composite types. This means that for composite types the struct member offsets may be different from what you might expect.

When a kernel argument is passed by value, the CUDA driver will copy exactly sizeof(param_type) bytes starting from the beginning of the NumPy object data pointer, where param_type is the parameter type in your kernel. You have to match param_type’s memory layout (ex: size, alignment and struct padding/packing) by defining a corresponding NumPy dtype.

For builtin CUDA vector types such as int2 and double4 and other packed structures with named members you can directly define such NumPy dtypes as the following:

>>> import numpy as np
>>> names = ['x', 'y', 'z']
>>> types = [np.float32]*3
>>> float3 = np.dtype({'names': names, 'formats': types})
>>> arg = np.random.rand(3).astype(np.float32).view(float3)
>>> print(arg)  
[(0.9940819, 0.62873816, 0.8953669)]
>>> arg['x'] = 42.0
>>> print(arg)  
[(42., 0.62873816, 0.8953669)]

Here arg can be used directly as a kernel argument. When there is no need to name fields you may prefer this syntax to define packed structures such as vectors or matrices:

>>> import numpy as np
>>> float5x5 = np.dtype({'names': ['dummy'], 'formats': [(np.float32,(5,5))]})
>>> arg = np.random.rand(25).astype(np.float32).view(float5x5)
>>> print(arg.itemsize)
100

Here arg represents a 100-byte scalar (i.e. a NumPy array of size 1) that can be passed by value to any kernel. Kernel parameters are passed by value in a dedicated 4kB memory bank which has its own cache with broadcast. The upper bound on the total size of kernel parameters is thus 4kB (see this link). It may be important to note that this dedicated memory bank is not shared with the device __constant__ memory space.

For now, CuPy offers no helper routines to create user defined composite types. Such composite types can however be built recursively using NumPy dtype offsets and itemsize capabilities, see cupy/examples/custum_struct for examples of advanced usage.
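As a small illustration of the offsets and itemsize capabilities mentioned above, the sketch below describes a two-member structure in three ways: packed, naturally aligned (align=True), and with explicit offsets. The field names flag and value are made up for this example; the device-side struct they are meant to mirror would be something like struct { signed char flag; double value; }, and whether NumPy's natural alignment agrees with the device layout of your own struct is something you still have to verify.

```python
import numpy as np

fields = [('flag', np.int8), ('value', np.float64)]

# Packed layout: members are stored back to back, no padding.
packed = np.dtype(fields)
print(packed.itemsize, packed.fields['value'][1])    # 9 1

# Naturally aligned layout: 7 padding bytes are inserted before `value`.
aligned = np.dtype(fields, align=True)
print(aligned.itemsize, aligned.fields['value'][1])  # 16 8

# The same layout spelled out with explicit offsets and itemsize.
explicit = np.dtype({
    'names': ['flag', 'value'],
    'formats': [np.int8, np.float64],
    'offsets': [0, 8],
    'itemsize': 16,
})
print(explicit.itemsize, explicit.fields['value'][1])  # 16 8
```

Note that some CUDA builtin vector types have stricter alignment than their members (float4, for instance, is aligned to 16 bytes), so align=True alone is not always sufficient for nested structures; explicit offsets give full control in those cases.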

Warning

You cannot directly pass static arrays as kernel arguments with the type arg[N] syntax where N is a compile time constant. The signature of __global__ void kernel(float arg[5]) is seen as __global__ void kernel(float* arg) by the compiler. If you want to pass five floats to the kernel by value you need to define a custom structure struct float5 { float val[5]; }; and modify the kernel signature to __global__ void kernel(float5 arg).
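On the host side, a NumPy dtype matching the float5 structure from the warning above could be defined as follows. This is a sketch that reuses the single-field syntax shown earlier; the field name val simply mirrors the struct member.

```python
import numpy as np

# One field holding five consecutive float32 values (20 bytes in total),
# mirroring `struct float5 { float val[5]; };` on the device side.
float5 = np.dtype({'names': ['val'], 'formats': [(np.float32, (5,))]})

arg = np.arange(5, dtype=np.float32).view(float5)
print(arg.itemsize)  # 20
print(arg.shape)     # (1,)
print(arg['val'])    # [[0. 1. 2. 3. 4.]]
```

Here arg is a size-one array whose single element is 20 bytes long, so it is passed to the kernel by value.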

Raw modules#

For dealing with a large raw CUDA source or loading an existing CUDA binary, the RawModule class can be handier. It can be initialized either from CUDA source code or from a path to a CUDA binary. It accepts most of the same arguments as RawKernel. The needed kernels can then be retrieved by calling the get_function() method, which returns a RawKernel instance that can be invoked as discussed above.

>>> loaded_from_source = r'''
... extern "C"{
...
... __global__ void test_sum(const float* x1, const float* x2, float* y, \
...                          unsigned int N)
... {
...     unsigned int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     if (tid < N)
...     {
...         y[tid] = x1[tid] + x2[tid];
...     }
... }
...
... __global__ void test_multiply(const float* x1, const float* x2, float* y, \
...                               unsigned int N)
... {
...     unsigned int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     if (tid < N)
...     {
...         y[tid] = x1[tid] * x2[tid];
...     }
... }
...
... }'''
>>> module = cp.RawModule(code=loaded_from_source)
>>> ker_sum = module.get_function('test_sum')
>>> ker_times = module.get_function('test_multiply')
>>> N = 10
>>> x1 = cp.arange(N**2, dtype=cp.float32).reshape(N, N)
>>> x2 = cp.ones((N, N), dtype=cp.float32)
>>> y = cp.zeros((N, N), dtype=cp.float32)
>>> ker_sum((N,), (N,), (x1, x2, y, N**2))   # y = x1 + x2
>>> assert cp.allclose(y, x1 + x2)
>>> ker_times((N,), (N,), (x1, x2, y, N**2)) # y = x1 * x2
>>> assert cp.allclose(y, x1 * x2)

The instruction above for using complex numbers in RawKernel also applies to RawModule.

For CUDA kernels that need to access global symbols, such as constant memory, the get_global() method can be used, see its documentation for further detail.

Note that the deprecated API cupy.RawModule.get_texref() has been removed since CuPy vX.X due to the removal of texture reference support from CUDA.

To support C++ template kernels, RawModule additionally provides a name_expressions argument. A list of template specializations should be provided, so that the corresponding kernels can be generated and retrieved by type:

>>> code = r'''
... template<typename T>
... __global__ void fx3(T* arr, int N) {
...     unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
...     if (tid < N) {
...         arr[tid] = arr[tid] * 3;
...     }
... }
... '''
>>>
>>> name_exp = ['fx3<float>', 'fx3<double>']
>>> mod = cp.RawModule(code=code, options=('-std=c++11',),
...     name_expressions=name_exp)
>>> ker_float = mod.get_function(name_exp[0])  # compilation happens here
>>> N=10
>>> a = cp.arange(N, dtype=cp.float32)
>>> ker_float((1,), (N,), (a, N))
>>> a
array([ 0.,  3.,  6.,  9., 12., 15., 18., 21., 24., 27.], dtype=float32)
>>> ker_double = mod.get_function(name_exp[1])
>>> a = cp.arange(N, dtype=cp.float64)
>>> ker_double((1,), (N,), (a, N))
>>> a
array([ 0.,  3.,  6.,  9., 12., 15., 18., 21., 24., 27.])

Note

The name expressions used to both initialize a RawModule instance and retrieve the kernels are the original (un-mangled) kernel names with all template parameters unambiguously specified. The name mangling and demangling are handled under the hood so that users do not need to worry about it.
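When the specialization is chosen from an array's dtype at run time, the name expressions can be built programmatically. The sketch below is one possible way to do so; CTYPES and name_expression are hypothetical names introduced for illustration, and the mapping covers only a few dtypes.

```python
import numpy as np

# Partial dtype -> C type mapping (extend as needed for your kernels).
CTYPES = {
    np.dtype(np.float32): 'float',
    np.dtype(np.float64): 'double',
    np.dtype(np.int32): 'int',
}


def name_expression(kernel, dtype):
    """Build an un-mangled name such as 'fx3<float>' (hypothetical helper)."""
    return f'{kernel}<{CTYPES[np.dtype(dtype)]}>'


name_exp = [name_expression('fx3', dt) for dt in (np.float32, np.float64)]
print(name_exp)  # ['fx3<float>', 'fx3<double>']
```

The same function can then be used both when constructing the RawModule and when looking up the kernel for a given array, for example with name_expression('fx3', a.dtype).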

Kernel fusion#

cupy.fuse() is a decorator that fuses functions. This decorator can be used to define an elementwise or reduction kernel more easily than ElementwiseKernel or ReductionKernel.

By using this decorator, we can define the squared_diff kernel as follows:

>>> @cp.fuse()
... def squared_diff(x, y):
...     return (x - y) * (x - y)

The above kernel can be called on scalars, NumPy arrays, or CuPy arrays, just like the original function.

>>> x_cp = cp.arange(10)
>>> y_cp = cp.arange(10)[::-1]
>>> squared_diff(x_cp, y_cp)
array([81, 49, 25,  9,  1,  1,  9, 25, 49, 81])
>>> x_np = np.arange(10)
>>> y_np = np.arange(10)[::-1]
>>> squared_diff(x_np, y_np)
array([81, 49, 25,  9,  1,  1,  9, 25, 49, 81])

At the first function call, the fused function analyzes the original function based on the abstracted information of arguments (e.g. their dtypes and ndims) and creates and caches an actual CUDA kernel. From the second function call with the same input types, the fused function calls the previously cached kernel, so it is highly recommended to reuse the same decorated functions instead of decorating local functions that are defined multiple times.

cupy.fuse() also supports simple reduction kernels.

>>> @cp.fuse()
... def sum_of_products(x, y):
...     return cp.sum(x * y, axis = -1)

You can specify the kernel name by using the kernel_name keyword argument as follows:

>>> @cp.fuse(kernel_name='squared_diff')
... def squared_diff(x, y):
...     return (x - y) * (x - y)

Note

Currently, cupy.fuse() can fuse only simple elementwise and reduction operations. Most other routines (e.g. cupy.matmul(), cupy.reshape()) are not supported.

JIT kernel definition#

The cupyx.jit.rawkernel decorator can create raw CUDA kernels from Python functions.

In this section, a Python function wrapped with the decorator is called a target function.

A target function consists of elementary scalar operations, and users have to manage how to parallelize them. CuPy’s array operations which automatically parallelize operations (e.g., add(), sum()) are not supported. If a custom kernel based on such array functions is desired, please refer to the Kernel fusion section.

Basic Usage#

Here is a short example for how to write a cupyx.jit.rawkernel to copy the values from x to y using a grid-stride loop:

>>> from cupyx import jit
>>>
>>> @jit.rawkernel()
... def elementwise_copy(x, y, size):
...     tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
...     ntid = jit.gridDim.x * jit.blockDim.x
...     for i in range(tid, size, ntid):
...         y[i] = x[i]

>>> size = cupy.uint32(2 ** 22)
>>> x = cupy.random.normal(size=(size,), dtype=cupy.float32)
>>> y = cupy.empty((size,), dtype=cupy.float32)

>>> elementwise_copy((128,), (1024,), (x, y, size))  # RawKernel style
>>> assert (x == y).all()

>>> elementwise_copy[128, 1024](x, y, size)  #  Numba style
>>> assert (x == y).all()

Both launch styles shown above are supported. The first two entries are the grid and block sizes, respectively. grid (RawKernel style (128,) or Numba style [128]) gives the size of the grid, i.e., the number of blocks in each dimension; block ((1024,) or [1024]) gives the dimensions of each thread block. Please refer to cupyx.jit._interface._JitRawKernel for details. Launching a CUDA kernel on a GPU with pre-determined grid/block sizes requires a basic understanding of the CUDA Programming Model.
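To see why the grid-stride loop in elementwise_copy handles any size regardless of the chosen grid and block, the index pattern can be emulated sequentially on the host. This is a pure-Python sketch for illustration only (grid_stride_visits is a hypothetical name); on the GPU the threads run in parallel rather than in nested loops.

```python
def grid_stride_visits(grid, block, size):
    """Count how many times each index is visited by a 1-D grid-stride loop."""
    ntid = grid * block  # total number of threads in the launch
    visits = [0] * size
    for block_idx in range(grid):
        for thread_idx in range(block):
            tid = block_idx * block + thread_idx
            # Same loop as in the kernel: start at tid, stride by ntid.
            for i in range(tid, size, ntid):
                visits[i] += 1
    return visits


# 32 threads in total, 100 elements: each element is handled exactly once.
visits = grid_stride_visits(grid=4, block=8, size=100)
print(all(v == 1 for v in visits))  # True
```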

Compilation is deferred until the first function call. CuPy’s JIT compiler infers the types of the arguments at call time, and caches the compiled kernels to speed up subsequent calls.

See Custom kernels for a full list of API.

Basic Design#

CuPy’s JIT compiler generates CUDA code via the Python AST. We decided not to analyze the target function through Python bytecode, in order to avoid performance degradation. CUDA source code generated from Python bytecode would not be effectively optimized by the CUDA compiler, because for-loops and other control statements of the target function are fully transformed into jump instructions when the target function is converted to bytecode.
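The difference between the two representations can be observed with the standard library alone. The sketch below does not use CuPy's compiler; it only inspects an ordinary Python function (the function copy is made up for this example) to show that the loop survives as a structured node in the AST, while the bytecode contains only iteration and jump instructions.

```python
import ast
import dis

src = '''
def copy(x, y, size):
    for i in range(size):
        y[i] = x[i]
'''

# In the AST, the loop is still a single structured `For` node.
tree = ast.parse(src)
loops = [node for node in ast.walk(tree) if isinstance(node, ast.For)]
print(len(loops))  # 1

# In the bytecode, the same loop is expressed with FOR_ITER and jumps.
namespace = {}
exec(compile(src, '<sketch>', 'exec'), namespace)
opnames = [ins.opname for ins in dis.get_instructions(namespace['copy'])]
jumps = [op for op in opnames if op == 'FOR_ITER' or 'JUMP' in op]
print(len(jumps) > 0)  # True
```

The exact jump opcodes differ between Python versions, which is why the example only checks that some are present.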

Typing rule#

The types of local variables are inferred at the first assignment in the function. The first assignment must be done at the top-level of the function; in other words, it must not be in if/else bodies or for-loops.

Limitations#

JIT does not work inside Python’s interactive interpreter (REPL) as the compiler needs to get the source code of the target function.

Accessing CUDA Functionalities#

Streams and Events#

In this section we discuss the basic usage of CUDA streams and events. For the API reference please see Streams and events. For their roles in the CUDA programming model, please refer to CUDA Programming Guide.

CuPy provides high-level Python APIs Stream and Event for creating streams and events, respectively. Data copies and kernel launches are enqueued onto the Current Stream, which can be queried via get_current_stream() and changed either by setting up a context manager:

>>> import numpy as np
>>>
>>> a_np = np.arange(10)
>>> s = cp.cuda.Stream()
>>> with s:
...     a_cp = cp.asarray(a_np)  # H2D transfer on stream s
...     b_cp = cp.sum(a_cp)      # kernel launched on stream s
...     assert s == cp.cuda.get_current_stream()
...
>>> # fall back to the previous stream in use (here the default stream)
>>> # when going out of the scope of s

or by using the use() method:

>>> s = cp.cuda.Stream()
>>> s.use()  # any subsequent operations are done on stream s
<Stream ... (device ...)>
>>> b_np = cp.asnumpy(b_cp)
>>> assert s == cp.cuda.get_current_stream()
>>> cp.cuda.Stream.null.use()  # fall back to the default (null) stream
<Stream 0 (device -1)>
>>> assert cp.cuda.Stream.null == cp.cuda.get_current_stream()

Events can be created either manually or through the record() method. Event objects can be used for timing GPU activities (via get_elapsed_time()) or setting up inter-stream dependencies:

>>> e1 = cp.cuda.Event()
>>> e1.record()
>>> a_cp = b_cp * a_cp + 8
>>> e2 = cp.cuda.get_current_stream().record()
>>>
>>> # set up a stream order
>>> s2 = cp.cuda.Stream()
>>> s2.wait_event(e2)
>>> with s2:
...     # the a_cp is guaranteed updated when this copy (on s2) starts
...     a_np = cp.asnumpy(a_cp)
>>>
>>> # timing
>>> e2.synchronize()
>>> t = cp.cuda.get_elapsed_time(e1, e2)  # only include the compute time, not the copy time

Just like the Device objects, Stream and Event objects can also be used for synchronization.

Note

In CuPy, the Stream objects are managed on a per-thread, per-device basis.

Note

On NVIDIA GPUs, there are two stream singleton objects null and ptds, referred to as the legacy default stream and the per-thread default stream, respectively. CuPy uses the former as default when no user-defined stream is in use. To change this behavior, set the environment variable CUPY_CUDA_PER_THREAD_DEFAULT_STREAM to 1, see Environment variables. This is not applicable to AMD GPUs.

To interoperate with streams created in other Python libraries, CuPy provides the ExternalStream API to wrap an existing stream pointer (given as a Python int). See Interoperability for details.

CUDA Driver and Runtime API#

Under construction. Please see Runtime API for the API reference.

Fast Fourier Transform with CuPy#

CuPy covers the full Fast Fourier Transform (FFT) functionalities provided in NumPy (cupy.fft) and a subset in SciPy (cupyx.scipy.fft). In addition to those high-level APIs that can be used as is, CuPy provides additional features to

  1. access advanced routines that cuFFT offers for NVIDIA GPUs,

  2. better control the performance and behavior of the FFT routines.

Some of these features are experimental (subject to change, deprecation, or removal, see API Compatibility Policy) or may be absent in hipFFT/rocFFT targeting AMD GPUs.

SciPy FFT backend#

Since SciPy v1.4, a backend mechanism is provided so that users can register different FFT backends and use SciPy’s API to perform the actual transform with the target backend, such as CuPy’s cupyx.scipy.fft module. For one-time usage, the context manager scipy.fft.set_backend() can be used:

import cupy as cp
import cupyx.scipy.fft as cufft
import scipy.fft

a = cp.random.random(100).astype(cp.complex64)
with scipy.fft.set_backend(cufft):
    b = scipy.fft.fft(a)  # equivalent to cufft.fft(a)

However, such usage can be tedious. Alternatively, users can register a backend through scipy.fft.register_backend() or scipy.fft.set_global_backend() to avoid using context managers:

import cupy as cp
import cupyx.scipy.fft as cufft
import scipy.fft
scipy.fft.set_global_backend(cufft)

a = cp.random.random(100).astype(cp.complex64)
b = scipy.fft.fft(a)  # equivalent to cufft.fft(a)

Note

Please refer to SciPy FFT documentation for further information.

Note

To use the backend together with an explicit plan argument requires SciPy version 1.5.0 or higher. See below for how to create FFT plans.

User-managed FFT plans#

For performance reasons, users may wish to create, reuse, and manage the FFT plans themselves. CuPy provides a high-level experimental API get_fft_plan() for this need. Users specify the transform to be performed as they would with most of the high-level FFT APIs, and a plan will be generated based on the input.

import cupy as cp
from cupyx.scipy.fft import get_fft_plan

a = cp.random.random((4, 64, 64)).astype(cp.complex64)
plan = get_fft_plan(a, axes=(1, 2), value_type='C2C')  # for batched, C2C, 2D transform

The returned plan can be used either explicitly as an argument with the cupyx.scipy.fft APIs:

import cupyx.scipy.fft

# the rest of the arguments must match those used when generating the plan
out = cupyx.scipy.fft.fft2(a, axes=(1, 2), plan=plan)

or as a context manager for the cupy.fft APIs:

with plan:
    # the arguments must match those used when generating the plan
    out = cp.fft.fft2(a, axes=(1, 2))

FFT plan cache#

However, there are occasions when users may not want to manage the FFT plans by themselves. Moreover, plans could also be reused internally in CuPy’s routines, to which user-managed plans would not be applicable. Therefore, starting with CuPy v8 we provide a built-in plan cache, enabled by default. The plan cache is maintained on a per-device, per-thread basis, and can be retrieved by the get_plan_cache() API.

>>> import cupy as cp
>>>
>>> cache = cp.fft.config.get_plan_cache()
>>> cache.show_info()
------------------- cuFFT plan cache (device 0) -------------------
cache enabled? True
current / max size   : 0 / 16 (counts)
current / max memsize: 0 / (unlimited) (bytes)
hits / misses: 0 / 0 (counts)

cached plans (most recently used first):

>>> # perform a transform, which would generate a plan and cache it
>>> a = cp.random.random((4, 64, 64))
>>> out = cp.fft.fftn(a, axes=(1, 2))
>>> cache.show_info()  # hit = 0
------------------- cuFFT plan cache (device 0) -------------------
cache enabled? True
current / max size   : 1 / 16 (counts)
current / max memsize: 262144 / (unlimited) (bytes)
hits / misses: 0 / 1 (counts)

cached plans (most recently used first):
key: ((64, 64), (64, 64), 1, 4096, (64, 64), 1, 4096, 105, 4, 'C', 2, None), plan type: PlanNd, memory usage: 262144

>>> # perform the same transform again, the plan is looked up from cache and reused
>>> out = cp.fft.fftn(a, axes=(1, 2))
>>> cache.show_info()  # hit = 1
------------------- cuFFT plan cache (device 0) -------------------
cache enabled? True
current / max size   : 1 / 16 (counts)
current / max memsize: 262144 / (unlimited) (bytes)
hits / misses: 1 / 1 (counts)

cached plans (most recently used first):
key: ((64, 64), (64, 64), 1, 4096, (64, 64), 1, 4096, 105, 4, 'C', 2, None), plan type: PlanNd, memory usage: 262144

>>> # clear the cache
>>> cache.clear()
>>> cp.fft.config.show_plan_cache_info()  # = cache.show_info(), for all devices
=============== cuFFT plan cache info (all devices) ===============
------------------- cuFFT plan cache (device 0) -------------------
cache enabled? True
current / max size   : 0 / 16 (counts)
current / max memsize: 0 / (unlimited) (bytes)
hits / misses: 0 / 0 (counts)

cached plans (most recently used first):

The returned PlanCache object has other methods for finer control, such as setting the cache size (either by counts or by memory usage). If the size is set to 0, the cache is disabled. Please refer to its documentation for more detail.

Note

As shown above each FFT plan has an associated working area allocated. If an out-of-memory error happens, one may want to inspect, clear, or limit the plan cache.

Note

The plans returned by get_fft_plan() are not cached.

FFT callbacks#

cuFFT provides FFT callbacks for merging pre- and/or post- processing kernels with the FFT routines so as to reduce the access to global memory. This capability is supported experimentally by CuPy. Users need to supply custom load and/or store kernels as strings, and set up a context manager via set_cufft_callbacks(). Note that the load (store) kernel pointer has to be named as d_loadCallbackPtr (d_storeCallbackPtr).

import cupy as cp

# a load callback that overwrites the input array to 1
code = r'''
__device__ cufftComplex CB_ConvertInputC(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr)
{
    cufftComplex x;
    x.x = 1.;
    x.y = 0.;
    return x;
}
__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInputC;
'''

a = cp.random.random((64, 128, 128)).astype(cp.complex64)

# this fftn call uses callback
with cp.fft.config.set_cufft_callbacks(cb_load=code):
    b = cp.fft.fftn(a, axes=(1,2))

# this call does not use a callback
c = cp.fft.fftn(cp.ones(shape=a.shape, dtype=cp.complex64), axes=(1,2))

# result agrees
assert cp.allclose(b, c)

# "static" plans are also cached, but are distinct from their no-callback counterparts
cp.fft.config.get_plan_cache().show_info()

Note

Internally, this feature requires recompiling a Python module for each distinct pair of load and store kernels. Therefore, the first invocation will be very slow, and this cost is amortized if the callbacks can be reused in the subsequent calculations. The compiled modules are cached on disk, with a default location of $HOME/.cupy/callback_cache that can be changed by the environment variable CUPY_CACHE_DIR.

Multi-GPU FFT#

CuPy currently provides two kinds of experimental support for multi-GPU FFT.

Warning

Using multiple GPUs to perform FFT is not guaranteed to be more performant. The rule of thumb is that if the transform fits on one GPU, you should avoid using multiple GPUs.

The first kind of support is with the high-level fft() and ifft() APIs, which require the input array to reside on one of the participating GPUs. The multi-GPU calculation is done under the hood, and by the end of the calculation the result again resides on the device where it started. Currently only 1D complex-to-complex (C2C) transforms are supported; complex-to-real (C2R) or real-to-complex (R2C) transforms (such as rfft() and friends) are not. The transform can be either batched (batch size > 1) or not (batch size = 1).

import cupy as cp

cp.fft.config.use_multi_gpus = True
cp.fft.config.set_cufft_gpus([0, 1])  # use GPU 0 & 1

shape = (64, 64)  # batch size = 64
dtype = cp.complex64
a = cp.random.random(shape).astype(dtype)  # reside on GPU 0

b = cp.fft.fft(a)  # computed on GPU 0 & 1, reside on GPU 0

If you need to perform 2D/3D transforms (ex: fftn()) instead of 1D (ex: fft()), it would likely still work, but in this particular use case it loops over the transformed axes under the hood (which is exactly what is done in NumPy too), which could lead to suboptimal performance.

The second kind of usage is to use the low-level, private CuPy APIs. You need to construct a Plan1d object and use it as if you are programming in C/C++ with cuFFT. Using this approach, your input array can reside on the host as a numpy.ndarray so that its size can be much larger than what a single GPU can accommodate, which is one of the main reasons to run multi-GPU FFT.

import numpy as np
import cupy as cp

# no need to touch cp.fft.config, as we are using low-level API

shape = (64, 64)
dtype = np.complex64
a = np.random.random(shape).astype(dtype)  # reside on CPU

if len(shape) == 1:
    batch = 1
    nx = shape[0]
elif len(shape) == 2:
    batch = shape[0]
    nx = shape[1]

# compute via cuFFT
cufft_type = cp.cuda.cufft.CUFFT_C2C  # single-precision c2c
plan = cp.cuda.cufft.Plan1d(nx, cufft_type, batch, devices=[0,1])
out_cp = np.empty_like(a)  # output on CPU
plan.fft(a, out_cp, cp.cuda.cufft.CUFFT_FORWARD)

out_np = np.fft.fft(a)  # use NumPy's fft
# np.fft.fft always returns np.complex128
if dtype is np.complex64:
    out_np = out_np.astype(dtype)

# check result
assert np.allclose(out_cp, out_np, rtol=1e-4, atol=1e-7)

For this use case, please consult the cuFFT documentation on multi-GPU transform for further detail.

Note

The multi-GPU plans are cached if auto-generated via the high-level APIs, but not if manually generated via the low-level APIs.

Half-precision FFT#

cuFFT provides cufftXtMakePlanMany and cufftXtExec routines to support a wide range of FFT needs, including 64-bit indexing and half-precision FFT. CuPy provides experimental support for this capability via the new (though private) XtPlanNd API. On supported hardware, a half-precision FFT can be twice as fast as its single-precision counterpart. NumPy does not yet provide the necessary infrastructure for half-precision complex numbers (i.e., numpy.complex32), though, so the steps for this feature are currently a bit more involved than in common cases.

import cupy as cp
import numpy as np


shape = (1024, 256, 256)  # input array shape
idtype = odtype = edtype = 'E'  # = numpy.complex32 in the future

# store the input/output arrays as fp16 arrays twice as long, as complex32 is not yet available
a = cp.random.random((shape[0], shape[1], 2*shape[2])).astype(cp.float16)
out = cp.empty_like(a)

# FFT with cuFFT
plan = cp.cuda.cufft.XtPlanNd(shape[1:],
                              shape[1:], 1, shape[1]*shape[2], idtype,
                              shape[1:], 1, shape[1]*shape[2], odtype,
                              shape[0], edtype,
                              order='C', last_axis=-1, last_size=None)

plan.fft(a, out, cp.cuda.cufft.CUFFT_FORWARD)

# FFT with NumPy
a_np = cp.asnumpy(a).astype(np.float32)  # upcast
a_np = a_np.view(np.complex64)
out_np = np.fft.fftn(a_np, axes=(-2,-1))
out_np = np.ascontiguousarray(out_np).astype(np.complex64)  # downcast
out_np = out_np.view(np.float32)
out_np = out_np.astype(np.float16)

# don't worry about accuracy for now, as we probably lost a lot during casting
print('ok' if cp.mean(cp.abs(out - cp.asarray(out_np))) < 0.1 else 'not ok')
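The "fp16 arrays twice as long" storage used above simply interleaves the real and imaginary parts along the last axis. The following NumPy-only sketch shows that packing and its inverse on a tiny array; the values are chosen to be exactly representable in half precision, so the round trip is lossless here, which is not true in general.

```python
import numpy as np

z = np.array([[1 + 2j, 3 - 4j],
              [0.5 + 0.25j, -8 + 16j]], dtype=np.complex64)

# Pack: view as (real, imag) float32 pairs, then downcast to float16.
packed = z.view(np.float32).astype(np.float16)
print(packed.shape)  # (2, 4): the last axis is twice as long
print(packed[0])     # [ 1.  2.  3. -4.]

# Unpack: upcast to float32, then reinterpret the pairs as complex64.
unpacked = packed.astype(np.float32).view(np.complex64)
print(np.array_equal(unpacked, z))  # True
```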

The 64-bit indexing support for all high-level FFT APIs is planned for a future CuPy release.

Memory Management#

CuPy uses a memory pool for memory allocations by default. The memory pool significantly improves performance by mitigating the overhead of memory allocation and CPU/GPU synchronization.

There are two different memory pools in CuPy:

  • Device memory pool (GPU device memory), which is used for GPU memory allocations.

  • Pinned memory pool (non-swappable CPU memory), which is used during CPU-to-GPU data transfer.

Attention

When you monitor the memory usage (e.g., using nvidia-smi for GPU memory or ps for CPU memory), you may notice that memory is not freed even after the array instance goes out of scope. This is expected behavior, as the default memory pool “caches” the allocated memory blocks.

See Low-level CUDA support for the details of memory management APIs.

For using pinned memory more conveniently, we also provide a few high-level APIs in the cupyx namespace, including cupyx.empty_pinned(), cupyx.empty_like_pinned(), cupyx.zeros_pinned(), and cupyx.zeros_like_pinned(). They return NumPy arrays backed by pinned memory. If CuPy’s pinned memory pool is in use, the pinned memory is allocated from the pool.

Note

CuPy v8 and above provide an FFT plan cache that could use a portion of device memory if FFT and related functions are used. The memory taken can be released by shrinking or disabling the cache.

Memory Pool Operations#

The memory pool instance provides statistics about memory allocation. To access the default memory pool instance, use cupy.get_default_memory_pool() and cupy.get_default_pinned_memory_pool(). You can also free all unused memory blocks held in the memory pool. See the example code below for details:

import cupy
import numpy

mempool = cupy.get_default_memory_pool()
pinned_mempool = cupy.get_default_pinned_memory_pool()

# Create an array on CPU.
# NumPy allocates 400 bytes in CPU (not managed by CuPy memory pool).
a_cpu = numpy.ndarray(100, dtype=numpy.float32)
print(a_cpu.nbytes)                      # 400

# You can access statistics of these memory pools.
print(mempool.used_bytes())              # 0
print(mempool.total_bytes())             # 0
print(pinned_mempool.n_free_blocks())    # 0

# Transfer the array from CPU to GPU.
# This allocates 400 bytes from the device memory pool, and another 400
# bytes from the pinned memory pool.  The allocated pinned memory will be
# released just after the transfer is complete.  Note that the actual
# allocation size may be rounded to a larger value than the requested size
# for performance.
a = cupy.array(a_cpu)
print(a.nbytes)                          # 400
print(mempool.used_bytes())              # 512
print(mempool.total_bytes())             # 512
print(pinned_mempool.n_free_blocks())    # 1

# When the array goes out of scope, the allocated device memory is released
# and kept in the pool for future reuse.
a = None  # (or `del a`)
print(mempool.used_bytes())              # 0
print(mempool.total_bytes())             # 512
print(pinned_mempool.n_free_blocks())    # 1

# You can clear the memory pool by calling `free_all_blocks`.
mempool.free_all_blocks()
pinned_mempool.free_all_blocks()
print(mempool.used_bytes())              # 0
print(mempool.total_bytes())             # 0
print(pinned_mempool.n_free_blocks())    # 0

See cupy.cuda.MemoryPool and cupy.cuda.PinnedMemoryPool for details.
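The jump from a 400-byte request to 512 bytes of pool usage in the example above can be reproduced with simple arithmetic. The sketch below assumes that requests are rounded up to a multiple of 512 bytes, which is consistent with the figures shown but is not guaranteed for every CuPy version; the constant and function names are hypothetical and not part of the CuPy API.

```python
# Assumption: the pool rounds each request up to a multiple of 512 bytes,
# matching the 400 -> 512 figures in the example above.
ALLOCATION_UNIT = 512  # hypothetical constant used only in this sketch


def rounded_allocation_size(requested_bytes, unit=ALLOCATION_UNIT):
    """Round a request up to the next multiple of `unit` bytes."""
    if requested_bytes <= 0:
        return 0
    return ((requested_bytes + unit - 1) // unit) * unit


print(rounded_allocation_size(400))   # 512
print(rounded_allocation_size(512))   # 512
print(rounded_allocation_size(513))   # 1024
```

This is why used_bytes() and total_bytes() can exceed the sum of the nbytes of your arrays.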

Limiting GPU Memory Usage#

You can hard-limit the amount of GPU memory that can be allocated by using the CUPY_GPU_MEMORY_LIMIT environment variable (see Environment variables for details).

# Set the hard-limit to 1 GiB:
#   $ export CUPY_GPU_MEMORY_LIMIT="1073741824"

# You can also specify the limit in fraction of the total amount of memory
# on the GPU. If you have a GPU with 2 GiB memory, the following is
# equivalent to the above configuration.
#   $ export CUPY_GPU_MEMORY_LIMIT="50%"

import cupy
print(cupy.get_default_memory_pool().get_limit())  # 1073741824
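The equivalence of the two settings above is plain arithmetic. The helper below is a hypothetical illustration of how a byte count and a percentage map to the same limit on a 2 GiB device; it is not CuPy's own parser, and the function name is an assumption.

```python
def memory_limit_in_bytes(setting, total_device_bytes):
    """Interpret a limit given as a byte count ("1073741824") or a percentage ("50%")."""
    setting = setting.strip()
    if setting.endswith("%"):
        fraction = float(setting[:-1]) / 100.0
        return int(total_device_bytes * fraction)
    return int(setting)


two_gib = 2 * 1024**3
print(memory_limit_in_bytes("1073741824", two_gib))  # 1073741824
print(memory_limit_in_bytes("50%", two_gib))         # 1073741824
```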

You can also set the limit (or override the value specified via the environment variable) using cupy.cuda.MemoryPool.set_limit(). In this way, you can use a different limit for each GPU device.

import cupy

mempool = cupy.get_default_memory_pool()

with cupy.cuda.Device(0):
    mempool.set_limit(size=1024**3)  # 1 GiB

with cupy.cuda.Device(1):
    mempool.set_limit(size=2*1024**3)  # 2 GiB

Note

CUDA allocates some GPU memory outside of the memory pool (such as the CUDA context, library handles, etc.). Depending on the usage, such memory may take from one to a few hundred MiB. It is not counted against the limit.

Changing Memory Pool#

You can use your own memory allocator instead of the default memory pool by passing the memory allocation function to cupy.cuda.set_allocator() / cupy.cuda.set_pinned_memory_allocator(). The memory allocator function should take 1 argument (the requested size in bytes) and return cupy.cuda.MemoryPointer / cupy.cuda.PinnedMemoryPointer.
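As a minimal sketch of the one-argument allocator contract described above, the wrapper below records every requested size and then delegates to an existing allocation function. The names make_logging_allocator and install_logging_allocator are hypothetical; only set_allocator(), get_default_memory_pool(), and the pool's malloc method come from CuPy.

```python
def make_logging_allocator(base_malloc, log):
    """Wrap an allocation function so that every requested size is recorded.

    `base_malloc` is any callable that takes a size in bytes and returns the
    pointer object CuPy expects (for example, the `malloc` method of a
    `cupy.cuda.MemoryPool`). `log` is a list that receives the sizes.
    """
    def allocator(size):
        log.append(size)
        return base_malloc(size)
    return allocator


def install_logging_allocator(log):
    """Route CuPy device allocations through the wrapper (requires a GPU)."""
    import cupy
    pool = cupy.get_default_memory_pool()
    cupy.cuda.set_allocator(make_logging_allocator(pool.malloc, log))
```

Delegating to the default pool's malloc keeps the caching behavior; the wrapper only observes the requests.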

CuPy provides two such allocators for using managed memory and stream ordered memory on GPU; see cupy.cuda.malloc_managed() and cupy.cuda.malloc_async(), respectively, for details. To enable a memory pool backed by managed memory, you can construct a new MemoryPool instance with its allocator set to malloc_managed() as follows:

import cupy

# Use managed memory
cupy.cuda.set_allocator(cupy.cuda.MemoryPool(cupy.cuda.malloc_managed).malloc)

Note that if you pass malloc_managed() directly to set_allocator() without constructing a MemoryPool instance, when the memory is freed it will be released back to the system immediately, which may or may not be desired.

Stream Ordered Memory Allocator is a feature added in CUDA 11.2. CuPy provides an experimental interface to it. Similar to CuPy’s memory pool, Stream Ordered Memory Allocator also allocates/deallocates memory asynchronously from/to a memory pool in a stream-ordered fashion. The key difference is that it is a built-in feature implemented in the CUDA driver by NVIDIA, so other CUDA applications in the same process can easily allocate memory from the same pool.

To enable a memory pool that manages stream ordered memory, you can construct a new MemoryAsyncPool instance:

import cupy

# Use asynchronous stream ordered memory
cupy.cuda.set_allocator(cupy.cuda.MemoryAsyncPool().malloc)

# Create a custom stream
s = cupy.cuda.Stream()

# This would allocate memory asynchronously on stream s
with s:
    a = cupy.empty((100,), dtype=cupy.float64)

Note that in this case we do not use the MemoryPool class. The MemoryAsyncPool takes a different input argument from that of MemoryPool to indicate which pool to use. Please refer to MemoryAsyncPool’s documentation for further detail.

Note that if you pass malloc_async() directly to set_allocator() without constructing a MemoryAsyncPool instance, the device’s current memory pool will be used.

When using stream ordered memory, it is important that you maintain correct stream semantics yourself using, for example, the Stream and Event APIs (see Streams and Events for details); CuPy does not attempt to act smartly for you. Upon deallocation, the memory is freed asynchronously either on the stream on which it was allocated (first attempt), or on any current CuPy stream (second attempt). The stream on which the memory was allocated is permitted to be destroyed before all memory allocated on it is freed.

In addition, applications/libraries that internally use cudaMalloc (CUDA’s default, synchronous allocator) could have unexpected interplay with Stream Ordered Memory Allocator. Specifically, memory freed to the memory pool might not be immediately visible to cudaMalloc, leading to potential out-of-memory errors. In this case, you can either call free_all_blocks() or manually perform an (event/stream/device) synchronization, and then retry.
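The release-and-retry advice above can be sketched as a small helper. This is only an illustration under assumptions: call_with_memory_retry is a hypothetical name, and which exception the other library raises on out-of-memory depends on that library, so it is passed in explicitly.

```python
def call_with_memory_retry(fn, release, oom_errors, max_retries=1):
    """Call `fn`; on an out-of-memory error, run `release` and try again.

    `release` should return cached memory to the driver, e.g. by calling the
    pool's free_all_blocks() and/or synchronizing the device.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except oom_errors:
            if attempt == max_retries:
                raise
            release()


# Possible usage with CuPy (requires a GPU; `other_library_call` is hypothetical):
#
#     import cupy
#     pool = cupy.cuda.MemoryAsyncPool()
#     cupy.cuda.set_allocator(pool.malloc)
#
#     def release():
#         pool.free_all_blocks()
#         cupy.cuda.Device().synchronize()
#
#     result = call_with_memory_retry(other_library_call, release, (MemoryError,))
```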

Currently the MemoryAsyncPool interface is experimental. In particular, while its API is largely identical to that of MemoryPool, several of the pool’s methods require a sufficiently new driver (and, of course, supported hardware, CUDA version, and platform) due to CUDA’s limitations.

You can even disable the default memory pool by the code below. Be sure to do this before any other CuPy operations.

import cupy

# Disable memory pool for device memory (GPU)
cupy.cuda.set_allocator(None)

# Disable memory pool for pinned memory (CPU).
cupy.cuda.set_pinned_memory_allocator(None)

Performance Best Practices#

Here we gather a few tips and pieces of advice for improving CuPy’s performance.

Benchmarking#

It is essential to identify the performance bottleneck before making any attempt to optimize your code. To help set up a baseline benchmark, CuPy provides a utility, cupyx.profiler.benchmark(), for measuring the elapsed time of a Python function on both CPU and GPU:

>>> from cupyx.profiler import benchmark
>>>
>>> def my_func(a):
...     return cp.sqrt(cp.sum(a**2, axis=-1))
...
>>> a = cp.random.random((256, 1024))
>>> print(benchmark(my_func, (a,), n_repeat=20))  
my_func             :    CPU:   44.407 us   +/- 2.428 (min:   42.516 / max:   53.098) us     GPU-0:  181.565 us   +/- 1.853 (min:  180.288 / max:  188.608) us

Because GPU executions run asynchronously with respect to CPU executions, a common pitfall in GPU programming is to mistakenly measure the elapsed time using CPU timing utilities (such as time.perf_counter() from the Python Standard Library or the %timeit magic from IPython), which have no knowledge of the GPU runtime. cupyx.profiler.benchmark() addresses this by setting up CUDA events on the Current Stream right before and after the function to be measured and synchronizing over the end event (see Streams and Events for details). Below we sketch what is done internally in cupyx.profiler.benchmark():

>>> import time
>>> start_gpu = cp.cuda.Event()
>>> end_gpu = cp.cuda.Event()
>>>
>>> start_gpu.record()
>>> start_cpu = time.perf_counter()
>>> out = my_func(a)
>>> end_cpu = time.perf_counter()
>>> end_gpu.record()
>>> end_gpu.synchronize()
>>> t_gpu = cp.cuda.get_elapsed_time(start_gpu, end_gpu)
>>> t_cpu = end_cpu - start_cpu

Additionally, cupyx.profiler.benchmark() runs a few warm-up runs to reduce timing fluctuation and exclude the overhead in first invocations.
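When cupyx.profiler.benchmark() is not an option, a cruder alternative is to synchronize explicitly before reading the CPU clock. The sketch below is not what benchmark() does internally: it measures combined wall-clock time rather than separate CPU and GPU times, and the function name time_with_sync is hypothetical. It does show the warm-up and synchronization steps discussed above.

```python
import time


def time_with_sync(fn, synchronize, n_warmup=3, n_repeat=10):
    """Return per-call wall-clock times in seconds, synchronizing around each call."""
    for _ in range(n_warmup):
        fn()                      # warm-up runs absorb one-time overheads
    times = []
    for _ in range(n_repeat):
        synchronize()             # drain earlier work before starting the clock
        start = time.perf_counter()
        fn()
        synchronize()             # wait for the work launched by fn to finish
        times.append(time.perf_counter() - start)
    return times


# Possible usage with CuPy (requires a GPU), reusing my_func and a from above:
#
#     sync = cp.cuda.get_current_stream().synchronize
#     times = time_with_sync(lambda: my_func(a), sync)
```

Without the second synchronize() call, the timer would usually stop as soon as the kernels are launched, not when they finish.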

One-Time Overheads#

Be aware of these overheads when benchmarking CuPy code.

Context Initialization#

It may take several seconds when calling a CuPy function for the first time in a process. This is because the CUDA driver creates a CUDA context during the first CUDA API call in CUDA applications.

Kernel Compilation#

CuPy uses on-the-fly kernel synthesis. When a kernel call is required, it compiles kernel code optimized for the dimensions and dtypes of the given arguments, sends it to the GPU device, and executes the kernel.

CuPy caches the kernel code sent to GPU device within the process, which reduces the kernel compilation time on further calls.

The compiled code is also cached in the directory ${HOME}/.cupy/kernel_cache (the path can be overridden by setting the CUPY_CACHE_DIR environment variable). This allows the compiled kernel binary to be reused across processes.
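To check where the on-disk cache would live for the current environment, the lookup described above can be mirrored in a few lines. This is a sketch of the documented rule only; kernel_cache_dir is a hypothetical helper, not a CuPy function.

```python
import os


def kernel_cache_dir(environ=None):
    """Return CUPY_CACHE_DIR if set, otherwise the default under the home directory."""
    environ = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser("~"), ".cupy", "kernel_cache")
    return environ.get("CUPY_CACHE_DIR", default)


print(kernel_cache_dir())
print(kernel_cache_dir({"CUPY_CACHE_DIR": "/tmp/my_cupy_cache"}))  # /tmp/my_cupy_cache
```

Deleting that directory forces recompilation on the next run, which is useful when measuring compilation overhead.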

In-depth profiling#

Under construction. To mark with NVTX/rocTX ranges, you can use the cupyx.profiler.time_range() API. To start/stop the profiler, you can use the cupyx.profiler.profile() API.

Use CUB/cuTENSOR backends for reduction and other routines#

For reduction operations (such as sum(), prod(), amin(), amax(), argmin(), argmax()) and many more routines built upon them, CuPy ships with our own implementations so that things just work out of the box. However, there are dedicated efforts to further accelerate these routines, such as CUB and cuTENSOR.

To support more performant backends wherever applicable, CuPy v8 introduced the environment variable CUPY_ACCELERATORS, which lets users specify the desired backends (and the order in which they are tried). For example, consider summing over a 256-cubic array:

>>> from cupyx.profiler import benchmark
>>> a = cp.random.random((256, 256, 256), dtype=cp.float32)
>>> print(benchmark(a.sum, (), n_repeat=100))  
sum                 :    CPU:   12.101 us   +/- 0.694 (min:   11.081 / max:   17.649) us     GPU-0:10174.898 us   +/-180.551 (min:10084.576 / max:10595.936) us

We can see that it takes about 10 ms to run (on this GPU). However, if we launch the Python session using CUPY_ACCELERATORS=cub python, we get a ~100x speedup for free (only ~0.1 ms):

>>> print(benchmark(a.sum, (), n_repeat=100))  
sum                 :    CPU:   20.569 us   +/- 5.418 (min:   13.400 / max:   28.439) us     GPU-0:  114.740 us   +/- 4.130 (min:  108.832 / max:  122.752) us

CUB is a backend shipped together with CuPy. It also accelerates other routines, such as inclusive scans (e.g., cumsum()), histograms, sparse matrix-vector multiplications (not applicable in CUDA 11), and ReductionKernel. cuTENSOR offers optimized performance for binary elementwise ufuncs, reduction, and tensor contraction. If cuTENSOR is installed, setting CUPY_ACCELERATORS=cub,cutensor, for example, would try CUB first and fall back to cuTENSOR if CUB does not provide the needed support. If neither backend is applicable, CuPy falls back to its default implementation.

Note that while the accelerated reductions are faster in general, there can be exceptions depending on the data layout. In particular, the CUB reduction only supports reduction over contiguous axes. In any case, we recommend running some benchmarks to determine whether CUB/cuTENSOR offers better performance.

Note

CuPy v11 and above uses CUB by default. To turn it off, you need to explicitly specify the environment variable CUPY_ACCELERATORS="".
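When comparing backends from a driver script, it can be convenient to build the environment for each run programmatically. The helper below is a hypothetical sketch (accelerator_env is not a CuPy function); it only prepares the CUPY_ACCELERATORS value, which must be in place when the Python session that imports CuPy is launched.

```python
import os


def accelerator_env(backends, base_env=None):
    """Return a copy of the environment with CUPY_ACCELERATORS set.

    `backends` lists the backends in the order they should be tried;
    an empty list yields "", which turns the accelerators off.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUPY_ACCELERATORS"] = ",".join(backends)
    return env


print(accelerator_env(["cub", "cutensor"])["CUPY_ACCELERATORS"])  # cub,cutensor
print(repr(accelerator_env([])["CUPY_ACCELERATORS"]))             # ''
```

The returned dictionary can be passed as the env argument of subprocess.run() to launch the same benchmark script once per configuration.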

Overlapping work using streams#

Under construction.

Use JIT compiler#

Under construction. For now please refer to JIT kernel definition for a quick introduction.

Prefer float32 over float64#

Under construction.

Interoperability#

CuPy can be used in conjunction with other libraries.

NumPy#

cupy.ndarray implements the __array_ufunc__ interface (see NEP 13: A Mechanism for Overriding Ufuncs for details). This enables NumPy ufuncs to operate directly on CuPy arrays. The __array_ufunc__ feature requires NumPy 1.13 or later.

import cupy
import numpy

arr = cupy.random.randn(1, 2, 3, 4).astype(cupy.float32)
result = numpy.sum(arr)
print(type(result))  # => <class 'cupy._core.core.ndarray'>

cupy.ndarray also implements the __array_function__ interface (see NEP 18: A dispatch mechanism for NumPy’s high level array functions for details). This enables code using NumPy to operate directly on CuPy arrays. The __array_function__ feature requires NumPy 1.16 or later; as of NumPy 1.17, __array_function__ is enabled by default.

Numba#

Numba is a Python JIT compiler with NumPy support.

cupy.ndarray implements __cuda_array_interface__, which is the CUDA array interchange interface compatible with Numba v0.39.0 or later (see CUDA Array Interface for details). This means you can pass CuPy arrays to kernels JITed with Numba. The following is a simple example borrowed from numba/numba#2860:

import cupy
from numba import cuda

@cuda.jit
def add(x, y, out):
    start = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(start, x.shape[0], stride):
        out[i] = x[i] + y[i]

a = cupy.arange(10)
b = a * 2
out = cupy.zeros_like(a)

print(out)  # => [0 0 0 0 0 0 0 0 0 0]

add[1, 32](a, b, out)

print(out)  # => [ 0  3  6  9 12 15 18 21 24 27]

In addition, cupy.asarray() supports zero-copy conversion from Numba CUDA array to CuPy array.

import numpy
import numba
import cupy

x = numpy.arange(10)  # type: numpy.ndarray
x_numba = numba.cuda.to_device(x)  # type: numba.cuda.cudadrv.devicearray.DeviceNDArray
x_cupy = cupy.asarray(x_numba)  # type: cupy.ndarray

Warning

__cuda_array_interface__ specifies that the object lifetime must be managed by the user, so the behavior is undefined if the exported object is destroyed while still in use by the consumer library.

Note

CuPy uses two environment variables controlling the exchange behavior: CUPY_CUDA_ARRAY_INTERFACE_SYNC and CUPY_CUDA_ARRAY_INTERFACE_EXPORT_VERSION.

mpi4py#

MPI for Python (mpi4py) is a Python wrapper for the Message Passing Interface (MPI) libraries.

MPI is the most widely used standard for high-performance inter-process communications. Recently several MPI vendors, including MPICH, Open MPI and MVAPICH, have extended their support beyond the MPI-3.1 standard to enable “CUDA-awareness”; that is, passing CUDA device pointers directly to MPI calls to avoid explicit data movement between the host and the device.

With the __cuda_array_interface__ (as mentioned above) and DLPack data exchange protocols (see DLPack below) implemented in CuPy, mpi4py now provides (experimental) support for passing CuPy arrays to MPI calls, provided that mpi4py is built against a CUDA-aware MPI implementation. The following is a simple example borrowed from the mpi4py Tutorial:

# To run this script with N MPI processes, do
# mpiexec -n N python this_script.py

import cupy
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Allreduce
sendbuf = cupy.arange(10, dtype='i')
recvbuf = cupy.empty_like(sendbuf)
comm.Allreduce(sendbuf, recvbuf)
assert cupy.allclose(recvbuf, sendbuf*size)

This feature was added in mpi4py 3.1.0. See the mpi4py website for more information.

PyTorch#

PyTorch is a machine learning framework that provides high-performance, differentiable tensor operations.

PyTorch also supports __cuda_array_interface__, so zero-copy data exchange between CuPy and PyTorch is possible. The only caveat is that PyTorch creates CPU tensors by default, which do not have the __cuda_array_interface__ property defined, so users need to ensure the tensor is already on the GPU before exchanging.

>>> import cupy as cp
>>> import torch
>>>
>>> # convert a torch tensor to a cupy array
>>> a = torch.rand((4, 4), device='cuda')
>>> b = cp.asarray(a)
>>> b *= b
>>> b
array([[0.8215962 , 0.82399917, 0.65607935, 0.30354425],
       [0.422695  , 0.8367199 , 0.00208597, 0.18545236],
       [0.00226746, 0.46201342, 0.6833052 , 0.47549972],
       [0.5208748 , 0.6059282 , 0.1909013 , 0.5148635 ]], dtype=float32)
>>> a
tensor([[0.8216, 0.8240, 0.6561, 0.3035],
        [0.4227, 0.8367, 0.0021, 0.1855],
        [0.0023, 0.4620, 0.6833, 0.4755],
        [0.5209, 0.6059, 0.1909, 0.5149]], device='cuda:0')
>>> # check the underlying memory pointer is the same
>>> assert a.__cuda_array_interface__['data'][0] == b.__cuda_array_interface__['data'][0]
>>>
>>> # convert a cupy array to a torch tensor
>>> a = cp.arange(10)
>>> b = torch.as_tensor(a, device='cuda')
>>> b += 3
>>> b
tensor([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12], device='cuda:0')
>>> a
array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
>>> assert a.__cuda_array_interface__['data'][0] == b.__cuda_array_interface__['data'][0]

PyTorch also supports zero-copy data exchange through DLPack (see DLPack below):

import cupy
import torch

from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack

# Create a PyTorch tensor.
tx1 = torch.randn(1, 2, 3, 4).cuda()

# Convert it into a DLPack tensor.
dx = to_dlpack(tx1)

# Convert it into a CuPy array.
cx = cupy.from_dlpack(dx)

# Convert it back to a PyTorch tensor.
tx2 = from_dlpack(cx.toDlpack())

pytorch-pfn-extras library provides additional integration features with PyTorch, including memory pool sharing and stream sharing:

>>> import cupy
>>> import torch
>>> import pytorch_pfn_extras as ppe
>>>
>>> # Perform CuPy memory allocation using the PyTorch memory pool.
>>> ppe.cuda.use_torch_mempool_in_cupy()
>>> torch.cuda.memory_allocated()
0
>>> arr = cupy.arange(10)
>>> torch.cuda.memory_allocated()
512
>>>
>>> # Change the default stream in PyTorch and CuPy:
>>> stream = torch.cuda.Stream()
>>> with ppe.cuda.stream(stream):
...     ...

Using custom kernels in PyTorch#

With the DLPack protocol, it becomes very simple to implement functions in PyTorch using CuPy user-defined kernels. Below is an example of a PyTorch autograd function that computes the forward and backward pass of the logarithm using cupy.RawKernel objects.

import cupy
import torch


cupy_custom_kernel_fwd = cupy.RawKernel(
    r"""
extern "C" __global__
void cupy_custom_kernel_fwd(const float* x, float* y, int size) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < size)
        y[tid] = log(x[tid]);
}
""",
    "cupy_custom_kernel_fwd",
)


cupy_custom_kernel_bwd = cupy.RawKernel(
    r"""
extern "C" __global__
void cupy_custom_kernel_bwd(const float* x, float* gy, float* gx, int size) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < size)
        gx[tid] = gy[tid] / x[tid];
}
""",
    "cupy_custom_kernel_bwd",
)


class CuPyLog(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.input = x
        # Enforce contiguous arrays to simplify RawKernel indexing.
        cupy_x = cupy.ascontiguousarray(cupy.from_dlpack(x.detach()))
        cupy_y = cupy.empty(cupy_x.shape, dtype=cupy_x.dtype)
        x_size = cupy_x.size
        bs = 128
        # RawKernel is called as kernel(grid, block, args).
        cupy_custom_kernel_fwd(
            ((x_size + bs - 1) // bs,), (bs,), (cupy_x, cupy_y, x_size)
        )
        # the ownership of the device memory backing cupy_y is implicitly
        # transferred to torch_y, so this operation is safe even after
        # going out of scope of this function.
        torch_y = torch.from_dlpack(cupy_y)
        return torch_y

    @staticmethod
    def backward(ctx, grad_y):
        # Enforce contiguous arrays to simplify RawKernel indexing.
        cupy_input = cupy.from_dlpack(ctx.input.detach()).ravel()
        cupy_grad_y = cupy.from_dlpack(grad_y.detach()).ravel()
        cupy_grad_x = cupy.zeros(cupy_grad_y.shape, dtype=cupy_grad_y.dtype)
        gy_size = cupy_grad_y.size
        bs = 128
        # RawKernel is called as kernel(grid, block, args).
        cupy_custom_kernel_bwd(
            ((gy_size + bs - 1) // bs,),
            (bs,),
            (cupy_input, cupy_grad_y, cupy_grad_x, gy_size),
        )
        # the ownership of the device memory backing cupy_grad_x is implicitly
        # transferred to torch_grad_x, so this operation is safe even after
        # going out of scope of this function.
        torch_grad_x = torch.from_dlpack(cupy_grad_x)
        return torch_grad_x

Note

Directly feeding a torch.Tensor to cupy.from_dlpack() is only supported in the (new) DLPack data exchange protocol added in CuPy v10+ and PyTorch 1.10+. For earlier versions, you will need to wrap the Tensor with torch.utils.dlpack.to_dlpack() as shown in the above examples.

RMM#

RMM (RAPIDS Memory Manager) provides highly configurable memory allocators.

RMM provides an interface to allow CuPy to allocate memory from the RMM memory pool instead of from CuPy’s own pool. Setting it up is as simple as:

import cupy
import rmm
cupy.cuda.set_allocator(rmm.rmm_cupy_allocator)

Sometimes, a more performant allocator may be desirable. RMM provides an option to switch the allocator:

import cupy
import rmm
rmm.reinitialize(pool_allocator=True)  # can also set init pool size etc here
cupy.cuda.set_allocator(rmm.rmm_cupy_allocator)

For more information on CuPy’s memory management, see Memory Management.

DLPack#

DLPack is a specification of tensor structure to share tensors among frameworks.

CuPy supports importing from and exporting to DLPack data structure (cupy.from_dlpack() and cupy.ndarray.toDlpack()).

Here is a simple example:

import cupy

# Create a CuPy array.
cx1 = cupy.random.randn(1, 2, 3, 4).astype(cupy.float32)

# Convert it into a DLPack tensor.
dx = cx1.toDlpack()

# Convert it back to a CuPy array.
cx2 = cupy.from_dlpack(dx)

TensorFlow also supports DLPack, so zero-copy data exchange between CuPy and TensorFlow through DLPack is possible:

>>> import tensorflow as tf
>>> import cupy as cp
>>>
>>> # convert a TF tensor to a cupy array
>>> with tf.device('/GPU:0'):
...     a = tf.random.uniform((10,))
...
>>> a
<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([0.9672388 , 0.57568085, 0.53163004, 0.6536236 , 0.20479882,
       0.84908986, 0.5852566 , 0.30355775, 0.1733712 , 0.9177849 ],
      dtype=float32)>
>>> a.device
'/job:localhost/replica:0/task:0/device:GPU:0'
>>> cap = tf.experimental.dlpack.to_dlpack(a)
>>> b = cp.from_dlpack(cap)
>>> b *= 3
>>> b
array([1.4949363 , 0.60699713, 1.3276931 , 1.5781245 , 1.1914308 ,
       2.3180873 , 1.9560868 , 1.3932796 , 1.9299742 , 2.5352407 ],
      dtype=float32)
>>> a
<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([1.4949363 , 0.60699713, 1.3276931 , 1.5781245 , 1.1914308 ,
       2.3180873 , 1.9560868 , 1.3932796 , 1.9299742 , 2.5352407 ],
      dtype=float32)>
>>>
>>> # convert a cupy array to a TF tensor
>>> a = cp.arange(10)
>>> cap = a.toDlpack()
>>> b = tf.experimental.dlpack.from_dlpack(cap)
>>> b.device
'/job:localhost/replica:0/task:0/device:GPU:0'
>>> b
<tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Be aware that in TensorFlow all tensors are immutable, so in the latter case any changes in b cannot be reflected in the CuPy array a.

Note that as of DLPack v0.5, for correctness the above approach (implicitly) requires users to ensure that the conversion (both importing and exporting a CuPy array) happens on the same CUDA/HIP stream. If in doubt, the current CuPy stream in use can be fetched by, for example, calling cupy.cuda.get_current_stream(). Please consult the other framework’s documentation for how to access and control the streams.

DLPack data exchange protocol#

To obviate user-managed streams and DLPack tensor objects, the DLPack data exchange protocol provides a mechanism to shift the responsibility from users to libraries. Any compliant object (such as cupy.ndarray) must implement a pair of methods, __dlpack__ and __dlpack_device__. The function cupy.from_dlpack() accepts such an object and returns a cupy.ndarray that is safely accessible on CuPy’s current stream. Likewise, cupy.ndarray can be exported via any compliant library’s from_dlpack() function.
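Before handing an object to from_dlpack(), it can help to check that it exposes the two methods named above. The sketch below is a duck-typing check only: supports_dlpack_protocol and the two dummy classes are hypothetical, and passing the check does not guarantee that the exporter's device or dtype is supported by the consumer.

```python
def supports_dlpack_protocol(obj):
    """Return True if `obj` exposes callable __dlpack__ and __dlpack_device__ methods."""
    return (
        callable(getattr(obj, "__dlpack__", None))
        and callable(getattr(obj, "__dlpack_device__", None))
    )


class DummyExporter:
    """Stand-in for a compliant array type; the method bodies are placeholders."""

    def __dlpack__(self, stream=None):
        raise NotImplementedError("placeholder for a real capsule export")

    def __dlpack_device__(self):
        raise NotImplementedError("placeholder for a real device query")


class LegacyOnly:
    """Stand-in for an object without the protocol methods."""


print(supports_dlpack_protocol(DummyExporter()))  # True
print(supports_dlpack_protocol(LegacyOnly()))     # False
```

Objects that fail the check need the explicit DLPack tensor route shown earlier, for example via toDlpack() or to_dlpack().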

Note

CuPy uses CUPY_DLPACK_EXPORT_VERSION to control how to handle tensors backed by CUDA managed memory.

Device Memory Pointers#

Import#

CuPy provides the UnownedMemory API, which allows interoperating with GPU device memory allocated in other libraries.

import cupy

# Create a memory chunk from a raw pointer and its size.
# (The address below is a placeholder for a pointer obtained from another library.)
mem = cupy.cuda.UnownedMemory(140359025819648, 1024, owner=None)

# Wrap it as a MemoryPointer.
memptr = cupy.cuda.MemoryPointer(mem, offset=0)

# Create an ndarray view backed by the memory pointer.
arr = cupy.ndarray((16, 16), dtype=cupy.float32, memptr=memptr)
assert arr.nbytes <= arr.data.mem.size

Be aware that you are responsible for specifying a correct shape, dtype, strides, and order such that it fits in the chunk when creating an ndarray view.
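The size check implied above can be done before creating the view. The sketch below assumes a dense, contiguous layout (so the view needs exactly the product of the shape times the item size) and uses hypothetical helper names; custom strides would need a more careful bound.

```python
import math


def required_nbytes(shape, itemsize):
    """Bytes needed by a dense, contiguous array of the given shape."""
    return math.prod(shape) * itemsize


def fits_in_chunk(shape, itemsize, chunk_nbytes, offset=0):
    """Check that the view, starting at `offset`, stays inside the chunk."""
    return offset + required_nbytes(shape, itemsize) <= chunk_nbytes


# A (16, 16) float32 view needs 16 * 16 * 4 = 1024 bytes.
print(fits_in_chunk((16, 16), 4, 1024))             # True
print(fits_in_chunk((16, 17), 4, 1024))             # False
print(fits_in_chunk((16, 16), 4, 1024, offset=64))  # False
```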

The UnownedMemory API does not manage the lifetime of the memory allocation. You must ensure that the pointer is alive while in use by CuPy. In case the pointer lifetime is managed by a Python object, you can pass it to the owner argument of the UnownedMemory to keep the reference to the object.

Export#

You can pass memory pointers allocated in CuPy to other libraries.

import cupy

arr = cupy.arange(10)
print(arr.data.ptr, arr.nbytes)  # => 140359025819648 80

The memory allocated by CuPy will be freed when the ndarray (arr) is destroyed. You must keep the ndarray instance alive while the pointer is in use by other libraries.

CUDA Stream Pointers#

Import#

CuPy provides the ExternalStream API, which allows interoperating with CUDA streams created in other libraries.

import cupy
import torch

# Create a stream on PyTorch.
s = torch.cuda.Stream()

# Switch the current stream in PyTorch.
with torch.cuda.stream(s):
    # Switch the current stream in CuPy, using the pointer of the stream created in PyTorch.
    with cupy.cuda.ExternalStream(s.cuda_stream):
        # This block runs on the same CUDA stream.
        torch.arange(10, device='cuda')
        cupy.arange(10)

The ExternalStream API does not manage the lifetime of the stream. You must ensure that the stream pointer is alive while in use by CuPy.

You also need to make sure that the ExternalStream object is used on the device where the stream was created. CuPy can validate that for you if you pass the device_id argument when creating the ExternalStream.

Export#

You can pass streams created in CuPy to other libraries.

import cupy

s = cupy.cuda.Stream()
print(s.ptr, s.device_id)  # => 93997451352336 0

The CUDA stream will be destroyed when the Stream (s) is destroyed. You must keep the Stream instance alive while the pointer is in use by other libraries.

Differences between CuPy and NumPy#

The interface of CuPy is designed to follow that of NumPy. However, there are some differences.

Cast behavior from float to integer#

Some float-to-integer casts have undefined behavior in the C++ specification. Casting a negative float to an unsigned integer and casting infinity to an integer are two such cases. NumPy’s behavior depends on your CPU architecture. These are the results on an Intel CPU:

>>> np.array([-1], dtype=np.float32).astype(np.uint32)
array([4294967295], dtype=uint32)
>>> cupy.array([-1], dtype=np.float32).astype(np.uint32)
array([0], dtype=uint32)
>>> np.array([float('inf')], dtype=np.float32).astype(np.int32)
array([-2147483648], dtype=int32)
>>> cupy.array([float('inf')], dtype=np.float32).astype(np.int32)
array([2147483647], dtype=int32)

Random methods support dtype argument#

NumPy’s random value generator does not support a dtype argument and instead always returns a float64 value. We support the option in CuPy because cuRAND, which is used in CuPy, supports both float32 and float64.

>>> np.random.randn(dtype=np.float32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: randn() got an unexpected keyword argument 'dtype'
>>> cupy.random.randn(dtype=np.float32)    
array(0.10689262300729752, dtype=float32)

Out-of-bounds indices#

CuPy handles out-of-bounds indices differently by default from NumPy when using integer array indexing. NumPy handles them by raising an error, but CuPy wraps around them.

>>> x = np.array([0, 1, 2])
>>> x[[1, 3]] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3
>>> x = cupy.array([0, 1, 2])
>>> x[[1, 3]] = 10
>>> x
array([10, 10,  2])
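The CuPy result above is what you get if each out-of-bounds index is reduced modulo the axis length. The pure-Python sketch below models that reading on a plain list; it is an illustration consistent with the output shown, not CuPy's implementation, and the function name is hypothetical.

```python
def assign_with_wraparound(values, indices, new_value):
    """Assign `new_value` at each index, wrapping indices modulo the length."""
    result = list(values)
    size = len(result)
    for index in indices:
        result[index % size] = new_value
    return result


# Index 3 wraps to 3 % 3 == 0, so positions 1 and 0 are overwritten.
print(assign_with_wraparound([0, 1, 2], [1, 3], 10))  # [10, 10, 2]
```

Because no error is raised, an off-by-one index silently modifies another element, so validating indices on the host may be worthwhile when porting NumPy code.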

Duplicate values in indices#

CuPy’s __setitem__ behaves differently from NumPy’s when integer arrays reference the same location multiple times. In that case, the value that is actually stored is undefined. Here is an example in CuPy.

>>> a = cupy.zeros((2,))
>>> i = cupy.arange(10000) % 2
>>> v = cupy.arange(10000).astype(np.float32)
>>> a[i] = v
>>> a  
array([ 9150.,  9151.])

NumPy stores the value corresponding to the last element among elements referencing duplicate locations.

>>> a_cpu = np.zeros((2,))
>>> i_cpu = np.arange(10000) % 2
>>> v_cpu = np.arange(10000).astype(np.float32)
>>> a_cpu[i_cpu] = v_cpu
>>> a_cpu
array([9998., 9999.])
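The NumPy result follows from applying the assignments one after another, so the last write to each location wins. The pure-Python sketch below models that sequential rule with lists (the function name is hypothetical); on a GPU the writes happen in parallel with no fixed order, which is why the CuPy result is undefined.

```python
def assign_in_order(size, indices, values):
    """Apply result[index] = value sequentially; later writes overwrite earlier ones."""
    result = [0.0] * size
    for index, value in zip(indices, values):
        result[index] = value
    return result


indices = [k % 2 for k in range(10000)]
values = [float(k) for k in range(10000)]
print(assign_in_order(2, indices, values))  # [9998.0, 9999.0]
```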

Zero-dimensional array#

Reduction methods#

NumPy’s reduction functions (e.g. numpy.sum()) return scalar values (e.g. numpy.float32). However, their CuPy counterparts return zero-dimensional cupy.ndarray objects. That is because CuPy scalar values (e.g. cupy.float32) are aliases of NumPy scalar values and are allocated in CPU memory. If these types were returned, a synchronization between GPU and CPU would be required. If you want to use scalar values, cast the returned arrays explicitly.

>>> type(np.sum(np.arange(3))) == np.int64
True
>>> type(cupy.sum(cupy.arange(3))) == cupy.ndarray
True

Type promotion#

CuPy automatically promotes the dtypes of cupy.ndarray operands in a function that takes two or more operands; the result dtype is determined by the dtypes of the inputs. This differs from NumPy’s type promotion rule when the operands contain zero-dimensional arrays. Zero-dimensional numpy.ndarray objects are treated as if they were scalar values when they appear among the operands of a NumPy function. This may affect the dtype of the output, depending on the values of the “scalar” inputs.

>>> (np.array(3, dtype=np.int32) * np.array([1., 2.], dtype=np.float32)).dtype
dtype('float32')
>>> (np.array(300000, dtype=np.int32) * np.array([1., 2.], dtype=np.float32)).dtype
dtype('float64')
>>> (cupy.array(3, dtype=np.int32) * cupy.array([1., 2.], dtype=np.float32)).dtype
dtype('float64')

Matrix type (numpy.matrix)#

SciPy returns numpy.matrix (a subclass of numpy.ndarray) when dense matrices are computed from sparse matrices (e.g., coo_matrix + ndarray). However, CuPy returns cupy.ndarray for such operations.

There is no plan to provide a numpy.matrix equivalent in CuPy, because the use of numpy.matrix has been discouraged since NumPy 1.15.

Data types#

CuPy arrays cannot have non-numeric data types such as strings or objects. See Overview for details.

Universal Functions only work with CuPy array or scalar#

Unlike NumPy, Universal Functions in CuPy only work with CuPy arrays or scalars. They do not accept other objects (e.g., lists or numpy.ndarray).

>>> np.power([np.arange(5)], 2)
array([[ 0,  1,  4,  9, 16]])
>>> cupy.power([cupy.arange(5)], 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unsupported type <class 'list'>

Random seed arrays are hashed to scalars#

Like NumPy, CuPy’s RandomState objects accept seeds either as numbers or as full NumPy arrays.

>>> seed = np.array([1, 2, 3, 4, 5])
>>> rs = cupy.random.RandomState(seed=seed)

However, unlike NumPy, array seeds are hashed down to a single number and so may not pass as much entropy to the underlying random number generator.

NaN (not-a-number) handling#

By default CuPy’s reduction functions (e.g., cupy.sum()) handle NaNs in complex numbers differently from NumPy’s counterparts:

>>> a = [0.5 + 3.7j, complex(0.7, np.nan), complex(np.nan, -3.9), complex(np.nan, np.nan)]
>>>
>>> a_np = np.asarray(a)
>>> print(a_np.max(), a_np.min())
(0.7+nanj) (0.7+nanj)
>>>
>>> a_cp = cp.asarray(a_np)
>>> print(a_cp.max(), a_cp.min())
(nan-3.9j) (nan-3.9j)

The reason is that the reduction is performed internally in a strided fashion, so it does not ensure a proper comparison order and cannot follow NumPy’s rule of always propagating the first-encountered NaN. Note that this difference does not apply when CUB is enabled (which is the default for CuPy v11 or later).

Contiguity / Strides#

To provide the best performance, the contiguity of a resulting ndarray is not guaranteed to match that of NumPy’s output.

>>> a = np.array([[1, 2], [3, 4]], order='F')
>>> print((a + a).flags.f_contiguous)
True
>>> a = cp.array([[1, 2], [3, 4]], order='F')
>>> print((a + a).flags.f_contiguous)
False
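
When downstream code depends on a particular memory layout, the layout can be requested explicitly instead of relying on what an operation happens to return. A minimal sketch, assuming a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available:

```python
import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

a = xp.array([[1, 2], [3, 4]], order="F")

# Request Fortran order explicitly; a copy is made only if one is needed.
total = xp.asfortranarray(a + a)
print(total.flags.f_contiguous)
```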

API Compatibility Policy#

This document describes the design policy on the compatibility of CuPy APIs. The development team should follow this policy when deciding to add, extend, or change APIs and their behaviors.

This document is written for both users and developers. Users can decide how far their code should depend on CuPy’s implementation details based on this document. Developers should read this document before creating pull requests that change the interface. Note that this document may contain ambiguities about the level of compatibility supported.

Versioning and Backward Compatibilities#

The updates of CuPy are classified into three levels: major, minor, and revision. These levels provide distinct degrees of backward compatibility.

  • A major update contains disruptive changes that break backward compatibility.

  • A minor update contains additions and extensions to the APIs that preserve backward compatibility.

  • A revision update contains improvements to the API implementations without changing any API specifications.

Note that we do not support full backward compatibility, which is almost infeasible for Python-based APIs, since there is no way to completely hide the implementation details.

Processes to Break Backward Compatibilities#

Deprecation, Dropping, and Its Preparation#

Any API may be deprecated in a minor update. In such a case, a deprecation note is added to the API documentation, and the API implementation is changed to fire a deprecation warning (if possible). There should be an alternative way to implement the functionality that was previously written using the deprecated API.

Any APIs may be marked as to be dropped in the future. In such a case, the dropping is stated in the documentation with the major version number on which the API is planned to be dropped, and the API implementation is changed to fire a future warning (if possible).

The actual dropping should be done through the following steps:

  • Make the API deprecated. At this point, users should not use the deprecated API in their new application codes.

  • After that, mark the API as to be dropped in the future. This must be done in a minor update different from the one that introduced the deprecation.

  • At the major version announced in the above update, drop the API.

Consequently, it takes at least two minor versions to drop any APIs after the first deprecation.
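
The first two stages of this process can be sketched with Python's standard warnings module. All function names and the version number below are hypothetical and only illustrate the shape of the process; they are not taken from CuPy.

```python
import warnings


def new_api(x):
    """Hypothetical replacement that users are expected to migrate to."""
    return x * 2


def old_api_stage1(x):
    """Stage 1 (some minor update): the API is deprecated."""
    warnings.warn(
        "old_api is deprecated; use new_api instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api(x)


def old_api_stage2(x):
    """Stage 2 (a later minor update): the API is marked as to be dropped."""
    warnings.warn(
        "old_api will be dropped in v99 (hypothetical major version)",
        FutureWarning,
        stacklevel=2,
    )
    return new_api(x)


# Record the warnings to show which category each stage emits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = [old_api_stage1(1), old_api_stage2(1)]
categories = [w.category for w in caught]
print(categories)
```

In stage 3, at the announced major version, the function would simply be removed.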

API Changes and Its Preparation#

Any API may be marked as to be changed in the future when the change is not backward compatible. In such a case, the change is stated in the documentation together with the version number in which the API is planned to be changed, and the API implementation is changed to fire a future warning for the affected usages.

The actual change should be done in the following steps:

  • Announce that the API will be changed in the future. At this point, the actual version of change need not be accurate.

  • After the announcement, mark the API as to be changed in the future with version number of planned changes. At this point, users should not use the marked API in their new application codes.

  • At the major update announced in the above update, change the API.

Supported Backward Compatibility#

This section defines backward compatibilities that minor updates must maintain.

Documented Interface#

CuPy has official API documentation. Many applications can be written based on the documented features. We support backward compatibility of documented features. In other words, code based only on the documented features runs correctly with minor- and revision-updated versions.

Developers are encouraged to give implementation-detail objects names that make their status apparent. For example, attributes outside of the documented APIs should have one or more leading underscores in their names.

Undocumented behaviors#

Behaviors of CuPy implementation not stated in the documentation are undefined. Undocumented behaviors are not guaranteed to be stable between different minor/revision versions.

A minor update may contain changes to undocumented behaviors. For example, suppose an API X is added in a minor update. In the previous version, attempts to use X cause AttributeError. This behavior is not stated in the documentation, so it is undefined. Thus, adding the API X in a minor version is permissible.

A revision update may also contain changes to undefined behaviors. A typical example is a bug fix. Another example is an improvement to the implementation, which may change internal object structures not shown in the documentation. As a consequence, even revision updates do not support compatibility of pickling, unless the full layout of pickled objects is clearly documented.

Documentation Error#

Compatibility is basically determined by the documentation, though the documentation sometimes contains errors. Assuming that the documentation always takes precedence over the implementation may make the APIs confusing. We may therefore fix documentation errors in any update, even if the fix breaks compatibility with respect to the documentation.

Note

Developers MUST NOT fix the documentation and the implementation of the same functionality at the same time in a revision update as a “bug fix”. Such a change completely breaks backward compatibility. If you want to fix bugs on both sides, first fix the documentation so that it matches the implementation, and then start the API changing procedure described above.

Object Attributes and Properties#

Object attributes and properties are sometimes replaced by each other in minor updates. This does not break user code, except for code that depends on how the attributes and properties are implemented.

Functions and Methods#

Methods may be replaced by callable attributes in minor updates, keeping the parameters and return values compatible. This does not break user code, except for code that depends on how the methods and callable attributes are implemented.

Exceptions and Warnings#

The specification of which exceptions are raised is considered part of standard backward compatibility. Future versions raise no exception for correct usages that the documentation allows, unless the API changing process has been completed.

On the other hand, warnings may be added at any minor updates for any APIs. It means minor updates do not keep backward compatibility of warnings.

Installation Compatibility#

The installation process is another concern of compatibilities. We support environmental compatibilities in the following ways.

  • Any changes to dependent libraries that force modifications to existing environments must be done in major updates. Such changes include the following cases:

    • dropping supported versions of dependent libraries (e.g. dropping cuDNN v2)

    • adding new mandatory dependencies (e.g. adding h5py to setup_requires)

  • Supporting optional packages/libraries may be done in minor updates (e.g. supporting h5py in optional features).

Note

Installation compatibility does not guarantee that all the features of CuPy run correctly on supported environments. CuPy may contain bugs that only occur in certain environments. Such bugs should be fixed in later updates.

API Reference#


The N-dimensional array (ndarray)#

cupy.ndarray is the CuPy counterpart of NumPy’s numpy.ndarray. It provides an intuitive interface for a fixed-size multidimensional array which resides in a CUDA device.

For the basic concept of ndarrays, please refer to the NumPy documentation.

cupy.ndarray(self, shape[, dtype, memptr, ...])

Multi-dimensional array on a CUDA device.

Conversion to/from NumPy arrays#

cupy.ndarray and numpy.ndarray are not implicitly convertible to each other. This means that NumPy functions cannot take cupy.ndarray s as inputs, and vice versa.

Note that converting between cupy.ndarray and numpy.ndarray incurs data transfer between the host (CPU) and the GPU device, which is costly in terms of performance.

cupy.array(obj[, dtype, copy, order, subok, ...])

Creates an array on the current device.

cupy.asarray(a[, dtype, order])

Converts an object to array.

cupy.asnumpy(a[, stream, order, out])

Returns an array on the host memory from an arbitrary source array.
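
A round trip between host and device can be sketched as follows. The guard around the import is an assumption added so that the snippet also runs on machines without CuPy or a GPU; in that case the same arithmetic is done on the host only.

```python
import numpy as np

try:  # Assumption: skip the device path when CuPy or a GPU is unavailable.
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()
except Exception:
    cp = None

host = np.arange(6, dtype=np.float32).reshape(2, 3)

if cp is not None:
    device = cp.asarray(host)        # host -> device transfer
    doubled = device * 2             # computed on the GPU
    back = cp.asnumpy(doubled)       # device -> host transfer
else:
    back = host * 2                  # CPU-only fallback

print(type(back).__name__, back.dtype)
```

Because each transfer is costly, it usually pays to keep intermediate results on the device and convert only the final output.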

Code compatibility features#

cupy.ndarray is designed to be interchangeable with numpy.ndarray in terms of code compatibility as much as possible. But occasionally, you will need to know whether the arrays you’re handling are cupy.ndarray or numpy.ndarray. One example is when invoking module-level functions such as cupy.sum() or numpy.sum(). In such situations, cupy.get_array_module() can be used.

cupy.get_array_module(*args)

Returns the array module for arguments.

cupyx.scipy.get_array_module(*args)

Returns the array module for arguments.
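
A typical use of cupy.get_array_module() is a function that works on either array type. The sketch below assumes a small hypothetical wrapper, `array_module`, that returns NumPy when CuPy cannot be imported, so that the example runs with or without a GPU.

```python
import numpy as np

try:  # Assumption: CuPy may not be installed on the machine running this.
    import cupy as cp
except Exception:
    cp = None


def array_module(*arrays):
    """Hypothetical wrapper: defer to cupy.get_array_module when possible."""
    return cp.get_array_module(*arrays) if cp is not None else np


def softplus(x):
    """Numerically stable log(1 + exp(x)) for NumPy or CuPy arrays."""
    xp = array_module(x)
    return xp.maximum(0, x) + xp.log1p(xp.exp(-xp.abs(x)))


values = softplus(np.array([-1.0, 0.0, 1.0]))
print(values)
```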

Universal functions (cupy.ufunc)#

CuPy provides universal functions (a.k.a. ufuncs) to support various elementwise operations. CuPy’s ufuncs support the following features of NumPy’s ufuncs:

  • Broadcasting

  • Output type determination

  • Casting rules
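
The first two features can be seen in a single call. A minimal sketch, assuming a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available:

```python
import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

column = xp.arange(3, dtype=xp.int32).reshape(3, 1)
row = xp.ones(4, dtype=xp.float32)

# Broadcasting: (3, 1) combined with (4,) gives (3, 4).
# Output type determination: int32 with float32 arrays promotes to float64.
out = xp.add(column, row)
print(out.shape, out.dtype)
```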

ufunc#

ufunc(name, nin, nout, _Ops ops[, preamble, ...])

Universal function.

Methods#

These methods are only available for selected ufuncs.

Hint

In case you need support for other ufuncs, submit a feature request along with your use-case in the tracker issue.

Available ufuncs#

Math operations#

add

Adds two arrays elementwise.

subtract

Subtracts arguments elementwise.

multiply

Multiplies two arrays elementwise.

matmul

matmul(x1, x2, /, out=None, **kwargs)

divide

Elementwise true division (i.e. division as floating values).

logaddexp

Computes log(exp(x1) + exp(x2)) elementwise.

logaddexp2

Computes log2(exp2(x1) + exp2(x2)) elementwise.

true_divide

Elementwise true division (i.e. division as floating values).

floor_divide

Elementwise floor division (i.e. integer quotient).

negative

Takes numerical negative elementwise.

positive

Takes numerical positive elementwise.

power

Computes x1 ** x2 elementwise.

float_power

First array elements raised to powers from second array, element-wise.

remainder

Computes the remainder of Python division elementwise.

mod

Computes the remainder of Python division elementwise.

fmod

Computes the remainder of C division elementwise.

divmod

absolute

Elementwise absolute value function.

fabs

Calculates absolute values element-wise.

rint

Rounds each element of an array to the nearest integer.

sign

Elementwise sign function.

heaviside

Compute the Heaviside step function.

conj

Returns the complex conjugate, element-wise.

conjugate

Returns the complex conjugate, element-wise.

exp

Elementwise exponential function.

exp2

Elementwise exponentiation with base 2.

log

Elementwise natural logarithm function.

log2

Elementwise binary logarithm function.

log10

Elementwise common logarithm function.

expm1

Computes exp(x) - 1 elementwise.

log1p

Computes log(1 + x) elementwise.

sqrt

Elementwise square root function.

square

Elementwise square function.

cbrt

Elementwise cube root function.

reciprocal

Computes 1 / x elementwise.

gcd

Computes gcd of x1 and x2 elementwise.

lcm

Computes lcm of x1 and x2 elementwise.

Trigonometric functions#

sin

Elementwise sine function.

cos

Elementwise cosine function.

tan

Elementwise tangent function.

arcsin

Elementwise inverse-sine function (a.k.a. arcsine function).

arccos

Elementwise inverse-cosine function (a.k.a. arccosine function).

arctan

Elementwise inverse-tangent function (a.k.a. arctangent function).

arctan2

Elementwise inverse-tangent of the ratio of two arrays.

hypot

Computes the hypotenuse of orthogonal vectors of given length.

sinh

Elementwise hyperbolic sine function.

cosh

Elementwise hyperbolic cosine function.

tanh

Elementwise hyperbolic tangent function.

arcsinh

Elementwise inverse of hyperbolic sine function.

arccosh

Elementwise inverse of hyperbolic cosine function.

arctanh

Elementwise inverse of hyperbolic tangent function.

degrees

Converts angles from radians to degrees elementwise.

radians

Converts angles from degrees to radians elementwise.

deg2rad

Converts angles from degrees to radians elementwise.

rad2deg

Converts angles from radians to degrees elementwise.

Bit-twiddling functions#

bitwise_and

Computes the bitwise AND of two arrays elementwise.

bitwise_or

Computes the bitwise OR of two arrays elementwise.

bitwise_xor

Computes the bitwise XOR of two arrays elementwise.

invert

Computes the bitwise NOT of an array elementwise.

left_shift

Shifts the bits of each integer element to the left.

right_shift

Shifts the bits of each integer element to the right.

Comparison functions#

greater

Tests elementwise if x1 > x2.

greater_equal

Tests elementwise if x1 >= x2.

less

Tests elementwise if x1 < x2.

less_equal

Tests elementwise if x1 <= x2.

not_equal

Tests elementwise if x1 != x2.

equal

Tests elementwise if x1 == x2.

logical_and

Computes the logical AND of two arrays.

logical_or

Computes the logical OR of two arrays.

logical_xor

Computes the logical XOR of two arrays.

logical_not

Computes the logical NOT of an array.

maximum

Takes the maximum of two arrays elementwise.

minimum

Takes the minimum of two arrays elementwise.

fmax

Takes the maximum of two arrays elementwise.

fmin

Takes the minimum of two arrays elementwise.

Floating functions#

isfinite

Tests finiteness elementwise.

isinf

Tests if each element is the positive or negative infinity.

isnan

Tests if each element is a NaN.

fabs

Calculates absolute values element-wise.

signbit

Tests elementwise if the sign bit is set.

copysign

Returns the first argument with the sign bit of the second elementwise.

nextafter

Computes the nearest neighbor float values towards the second argument.

modf

Extracts the fractional and integral parts of an array elementwise.

ldexp

Computes x1 * 2 ** x2 elementwise.

frexp

Decomposes each element to mantissa and two's exponent.

fmod

Computes the remainder of C division elementwise.

floor

Rounds each element of an array to its floor integer.

ceil

Rounds each element of an array to its ceiling integer.

trunc

Rounds each element of an array towards zero.

Generalized Universal Functions#

In addition to regular ufuncs, CuPy also provides a wrapper class to convert regular CuPy functions into Generalized Universal Functions, as in NumPy (https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html). This makes keyword arguments such as axes, order, and dtype available automatically, without needing to implement them explicitly in the wrapped function.

GeneralizedUFunc(func, signature, **kwargs)

Creates a Generalized Universal Function by wrapping a user provided function with the signature.

Routines (NumPy)#

The following pages describe NumPy-compatible routines. These functions cover a subset of NumPy routines.

Array creation routines#

Ones and zeros#

empty(shape[, dtype, order])

Returns an array without initializing the elements.

empty_like(a[, dtype, order, subok, shape])

Returns a new array with same shape and dtype of a given array.

eye(N[, M, k, dtype, order])

Returns a 2-D array with ones on the diagonals and zeros elsewhere.

identity(n[, dtype])

Returns a 2-D identity array.

ones(shape[, dtype, order])

Returns a new array of given shape and dtype, filled with ones.

ones_like(a[, dtype, order, subok, shape])

Returns an array of ones with same shape and dtype as a given array.

zeros(shape[, dtype, order])

Returns a new array of given shape and dtype, filled with zeros.

zeros_like(a[, dtype, order, subok, shape])

Returns an array of zeros with same shape and dtype as a given array.

full(shape, fill_value[, dtype, order])

Returns a new array of given shape and dtype, filled with a given value.

full_like(a, fill_value[, dtype, order, ...])

Returns a full array with same shape and dtype as a given array.
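
A few of these creation routines in use. This is a minimal sketch, assuming a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available:

```python
import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

zeros = xp.zeros((2, 3), dtype=xp.float32)
sevens = xp.full_like(zeros, 7)   # inherits shape and dtype from `zeros`
upper = xp.eye(3, k=1)            # ones on the first superdiagonal

print(sevens.dtype, sevens.shape)
print(upper)
```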

From existing data#

array(obj[, dtype, copy, order, subok, ndmin])

Creates an array on the current device.

asarray(a[, dtype, order])

Converts an object to array.

asanyarray(a[, dtype, order])

Converts an object to array.

ascontiguousarray(a[, dtype])

Returns a C-contiguous array.

copy(a[, order])

Creates a copy of a given array on the current device.

frombuffer(*args, **kwargs)

Interpret a buffer as a 1-dimensional array.

fromfile(*args, **kwargs)

Reads an array from a file.

fromfunction(*args, **kwargs)

Construct an array by executing a function over each coordinate.

fromiter(*args, **kwargs)

Create a new 1-dimensional array from an iterable object.

fromstring(*args, **kwargs)

A new 1-D array initialized from text data in a string.

loadtxt(*args, **kwargs)

Load data from a text file.

Numerical ranges#

arange(start[, stop, step, dtype])

Returns an array with evenly spaced values within a given interval.

linspace(start, stop[, num, endpoint, ...])

Returns an array with evenly-spaced values within a given interval.

logspace(start, stop[, num, endpoint, base, ...])

Returns an array with evenly-spaced values on a log-scale.

meshgrid(*xi, **kwargs)

Return coordinate matrices from coordinate vectors.

mgrid

Construct a multi-dimensional "meshgrid".

ogrid

Construct a multi-dimensional "meshgrid".

Building matrices#

diag(v[, k])

Extracts a diagonal or constructs a diagonal array.

diagflat(v[, k])

Creates a diagonal array from the flattened input.

tri(N[, M, k, dtype])

Creates an array with ones at and below the given diagonal.

tril(m[, k])

Returns a lower triangle of an array.

triu(m[, k])

Returns an upper triangle of an array.

vander(x[, N, increasing])

Returns a Vandermonde matrix.

Array manipulation routines#

Basic operations#

copyto(dst, src[, casting, where])

Copies values from one array to another with broadcasting.

shape(a)

Returns the shape of an array.

Changing array shape#

reshape(a, newshape[, order])

Returns an array with new shape and same elements.

ravel(a[, order])

Returns a flattened array.

Transpose-like operations#

moveaxis(a, source, destination)

Moves axes of an array to new positions.

rollaxis(a, axis[, start])

Moves the specified axis backwards to the given place.

swapaxes(a, axis1, axis2)

Swaps the two axes.

transpose(a[, axes])

Permutes the dimensions of an array.

See also

cupy.ndarray.T
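
The effect of the transpose-like routines is easiest to see on the shapes they produce. A minimal sketch, assuming a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available:

```python
import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

a = xp.zeros((2, 3, 4))

shapes = {
    "moveaxis(a, 0, -1)": xp.moveaxis(a, 0, -1).shape,           # first axis goes last
    "swapaxes(a, 0, 2)": xp.swapaxes(a, 0, 2).shape,             # exchange axes 0 and 2
    "transpose(a, (1, 0, 2))": xp.transpose(a, (1, 0, 2)).shape, # explicit permutation
    "a.T": a.T.shape,                                            # reverse all axes
}
for name, shape in shapes.items():
    print(name, shape)
```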

Changing number of dimensions#

atleast_1d(*arys)

Converts arrays to arrays with dimensions >= 1.

atleast_2d(*arys)

Converts arrays to arrays with dimensions >= 2.

atleast_3d(*arys)

Converts arrays to arrays with dimensions >= 3.

broadcast(*arrays)

Object that performs broadcasting.

broadcast_to(array, shape)

Broadcast an array to a given shape.

broadcast_arrays(*args)

Broadcasts given arrays.

expand_dims(a, axis)

Expands the dimensions of an array.

squeeze(a[, axis])

Removes size-one axes from the shape of an array.

Changing kind of array#

asarray(a[, dtype, order])

Converts an object to array.

asanyarray(a[, dtype, order])

Converts an object to array.

asfarray(a[, dtype])

Converts array elements to float type.

asfortranarray(a[, dtype])

Return an array laid out in Fortran order in memory.

ascontiguousarray(a[, dtype])

Returns a C-contiguous array.

asarray_chkfinite(a[, dtype, order])

Converts the given input to an array, and raises an error if the input contains NaNs or Infs.

require(a[, dtype, requirements])

Return an array which satisfies the requirements.

Joining arrays#

concatenate(tup[, axis, out, dtype, casting])

Joins arrays along an axis.

stack(tup[, axis, out, dtype, casting])

Stacks arrays along a new axis.

vstack(tup, *[, dtype, casting])

Stacks arrays vertically.

hstack(tup, *[, dtype, casting])

Stacks arrays horizontally.

dstack(tup)

Stacks arrays along the third axis.

column_stack(tup)

Stacks 1-D and 2-D arrays as columns into a 2-D array.

row_stack(tup, *[, dtype, casting])

Stacks arrays vertically.

Splitting arrays#

split(ary, indices_or_sections[, axis])

Splits an array into multiple sub arrays along a given axis.

array_split(ary, indices_or_sections[, axis])

Splits an array into multiple sub arrays along a given axis.

dsplit(ary, indices_or_sections)

Splits an array into multiple sub arrays along the third axis.

hsplit(ary, indices_or_sections)

Splits an array into multiple sub arrays horizontally.

vsplit(ary, indices_or_sections)

Splits an array into multiple sub arrays along the first axis.
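
Joining and splitting are inverse operations, as the following sketch shows. It assumes a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available:

```python
import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

a = xp.arange(6).reshape(2, 3)

joined = xp.concatenate([a, a + 10], axis=0)  # along an existing axis: (4, 3)
stacked = xp.stack([a, a + 10])               # along a new axis: (2, 2, 3)

# Splitting the concatenated array recovers the two original pieces.
top, bottom = xp.vsplit(joined, 2)
print(joined.shape, stacked.shape)
```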

Tiling arrays#

tile(A, reps)

Construct an array by repeating A the number of times given by reps.

repeat(a, repeats[, axis])

Repeat arrays along an axis.

Adding and removing elements#

delete(arr, indices[, axis])

Delete values from an array along the specified axis.

append(arr, values[, axis])

Append values to the end of an array.

resize(a, new_shape)

Return a new array with the specified shape.

unique(ar[, return_index, return_inverse, ...])

Find the unique elements of an array.

trim_zeros(filt[, trim])

Trim the leading and/or trailing zeros from a 1-D array or sequence.

Rearranging elements#

flip(a[, axis])

Reverse the order of elements in an array along the given axis.

fliplr(a)

Flip array in the left/right direction.

flipud(a)

Flip array in the up/down direction.

reshape(a, newshape[, order])

Returns an array with new shape and same elements.

roll(a, shift[, axis])

Roll array elements along a given axis.

rot90(a[, k, axes])

Rotate an array by 90 degrees in the plane specified by axes.

Binary operations#

Elementwise bit operations#

bitwise_and

Computes the bitwise AND of two arrays elementwise.

bitwise_or

Computes the bitwise OR of two arrays elementwise.

bitwise_xor

Computes the bitwise XOR of two arrays elementwise.

invert

Computes the bitwise NOT of an array elementwise.

left_shift

Shifts the bits of each integer element to the left.

right_shift

Shifts the bits of each integer element to the right.

Bit packing#

packbits(a[, axis, bitorder])

Packs the elements of a binary-valued array into bits in a uint8 array.

unpackbits(a[, axis, bitorder])

Unpacks elements of a uint8 array into a binary-valued output array.
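
The two routines round-trip when the number of bits is a multiple of eight. A minimal sketch, assuming a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available:

```python
import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

bits = xp.array([1, 0, 1, 0, 0, 0, 0, 1], dtype=xp.uint8)

packed = xp.packbits(bits)        # eight bits -> one byte, most significant first
unpacked = xp.unpackbits(packed)  # one byte -> eight bits again
print(packed, unpacked)
```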

Output formatting#

binary_repr(num[, width])

Return the binary representation of the input number as a string.

Data type routines#

can_cast(from_, to[, casting])

Returns True if cast between data types can occur according to the casting rule.

min_scalar_type(a)

For scalar a, returns the data type with the smallest size and smallest scalar kind which can hold its value.

result_type(*arrays_and_dtypes)

Returns the type that results from applying the NumPy type promotion rules to the arguments.

common_type(*arrays)

Return a scalar type which is common to the input arrays.

promote_types (alias of numpy.promote_types())

obj2sctype (alias of numpy.obj2sctype())

Creating data types#

dtype (alias of numpy.dtype)

format_parser (alias of numpy.format_parser)

Data type information#

finfo (alias of numpy.finfo)

iinfo (alias of numpy.iinfo)

MachAr (alias of numpy.MachAr)

Data type testing#

issctype (alias of numpy.issctype())

issubdtype (alias of numpy.issubdtype())

issubsctype (alias of numpy.issubsctype())

issubclass_ (alias of numpy.issubclass_())

find_common_type (alias of numpy.find_common_type())

Miscellaneous#

typename (alias of numpy.typename())

sctype2char (alias of numpy.sctype2char())

mintypecode (alias of numpy.mintypecode())

Discrete Fourier Transform (cupy.fft)#

Standard FFTs#

fft(a[, n, axis, norm])

Compute the one-dimensional FFT.

ifft(a[, n, axis, norm])

Compute the one-dimensional inverse FFT.

fft2(a[, s, axes, norm])

Compute the two-dimensional FFT.

ifft2(a[, s, axes, norm])

Compute the two-dimensional inverse FFT.

fftn(a[, s, axes, norm])

Compute the N-dimensional FFT.

ifftn(a[, s, axes, norm])

Compute the N-dimensional inverse FFT.

Real FFTs#

rfft(a[, n, axis, norm])

Compute the one-dimensional FFT for real input.

irfft(a[, n, axis, norm])

Compute the one-dimensional inverse FFT for real input.

rfft2(a[, s, axes, norm])

Compute the two-dimensional FFT for real input.

irfft2(a[, s, axes, norm])

Compute the two-dimensional inverse FFT for real input.

rfftn(a[, s, axes, norm])

Compute the N-dimensional FFT for real input.

irfftn(a[, s, axes, norm])

Compute the N-dimensional inverse FFT for real input.

Hermitian FFTs#

hfft(a[, n, axis, norm])

Compute the FFT of a signal that has Hermitian symmetry.

ihfft(a[, n, axis, norm])

Compute the inverse FFT of a signal that has Hermitian symmetry.

Helper routines#

fftfreq(n[, d])

Return the FFT sample frequencies.

rfftfreq(n[, d])

Return the FFT sample frequencies for real input.

fftshift(x[, axes])

Shift the zero-frequency component to the center of the spectrum.

ifftshift(x[, axes])

The inverse of fftshift().

CuPy-specific APIs#

See the description below for details.

config.set_cufft_callbacks(...)

A context manager for setting up load and/or store callbacks.

config.set_cufft_gpus(gpus)

Set the GPUs to be used in multi-GPU FFT.

config.get_plan_cache()

Get the per-thread, per-device plan cache, or create one if not found.

config.show_plan_cache_info()

Show all of the plan caches' info on this thread.

Normalization#

The default normalization (norm is "backward" or None) has the direct transforms unscaled and the inverse transforms scaled by \(1/n\). If the keyword argument norm is "forward", it is the exact opposite of "backward": the direct transforms are scaled by \(1/n\) and the inverse transforms are unscaled. Finally, if the keyword argument norm is "ortho", both transforms are scaled by \(1/\sqrt{n}\).
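
The three modes differ only in where the scaling factor is applied. The sketch below uses a naive pure-Python DFT (not cuFFT, and not how cupy.fft is implemented) purely to illustrate the scaling; the function name `dft` is hypothetical.

```python
import cmath


def dft(x, norm="backward", inverse=False):
    """Naive O(n^2) DFT that mirrors the three normalization modes."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    if norm == "ortho":
        scale = 1 / n ** 0.5                # both directions scaled by 1/sqrt(n)
    elif norm == "forward":
        scale = 1.0 if inverse else 1 / n   # direct transform scaled by 1/n
    else:                                   # "backward" or None
        scale = 1 / n if inverse else 1.0   # inverse transform scaled by 1/n
    return [value * scale for value in out]


signal = [1.0, 1.0, 1.0, 1.0]
dc_terms = {
    norm: dft(signal, norm)[0].real
    for norm in ("backward", "forward", "ortho")
}
print(dc_terms)
```

Whichever mode is chosen, applying the direct and inverse transforms with the same norm recovers the input.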

Code compatibility features#

NumPy’s FFT functions always return a numpy.ndarray whose type is numpy.complex128 or numpy.float64. CuPy functions do not follow this behavior; they return numpy.complex64 or numpy.float32 if the type of the input is numpy.float16, numpy.float32, or numpy.complex64.

Internally, cupy.fft always generates a cuFFT plan (see the cuFFT documentation for detail) corresponding to the desired transform. When possible, an n-dimensional plan will be used, as opposed to applying separate 1D plans for each axis to be transformed. Using n-dimensional planning can provide better performance for multidimensional transforms, but requires more GPU memory than separable 1D planning. The user can disable n-dimensional planning by setting cupy.fft.config.enable_nd_planning = False. This ability to adjust the planning type is a deviation from the NumPy API, which does not use precomputed FFT plans.

Moreover, the automatic plan generation can be suppressed by using an existing plan returned by cupyx.scipy.fftpack.get_fft_plan() as a context manager. This is again a deviation from NumPy.

Finally, when using the high-level NumPy-like FFT APIs as listed above, internally the cuFFT plans are cached for possible reuse. The plan cache can be retrieved by get_plan_cache(), and its current status can be queried by show_plan_cache_info(). For finer control of the plan cache, see PlanCache.

Multi-GPU FFT#

cupy.fft can use multiple GPUs. To enable (disable) this feature, set cupy.fft.config.use_multi_gpus to True (False). Next, to set the number of GPUs or the participating GPU IDs, use the function cupy.fft.config.set_cufft_gpus(). All of the limitations listed in the cuFFT documentation apply here. In particular, using more than one GPU does not guarantee better performance.

Functional programming#

Note

cupy.vectorize applies the JIT compiler to the given Python function. See JIT kernel definition for details.

apply_along_axis(func1d, axis, arr, *args, ...)

Apply a function to 1-D slices along the given axis.

vectorize(pyfunc[, otypes, doc, excluded, ...])

Generalized function class.

piecewise(x, condlist, funclist)

Evaluate a piecewise-defined function.

Indexing routines#

Generating index arrays#

c_

r_

nonzero(a)

Return the indices of the elements that are non-zero.

where(condition[, x, y])

Return elements, either from x or y, depending on condition.

indices(dimensions[, dtype])

Returns an array representing the indices of a grid.

mask_indices(n, mask_func[, k])

Return the indices to access (n, n) arrays, given a masking function.

tril_indices(n[, k, m])

Returns the indices of the lower triangular matrix.

tril_indices_from(arr[, k])

Returns the indices for the lower-triangle of arr.

triu_indices(n[, k, m])

Returns the indices of the upper triangular matrix.

triu_indices_from(arr[, k])

Returns indices for the upper-triangle of arr.

ix_(*args)

Construct an open mesh from multiple sequences.

ravel_multi_index(multi_index, dims[, mode, ...])

Converts a tuple of index arrays into an array of flat indices, applying boundary modes to the multi-index.

unravel_index(indices, dims[, order])

Converts array of flat indices into a tuple of coordinate arrays.

diag_indices(n[, ndim])

Return the indices to access the main diagonal of an array.

diag_indices_from(arr)

Return the indices to access the main diagonal of an n-dimensional array.

Indexing-like operations#

take(a, indices[, axis, out])

Takes elements of an array at specified indices along an axis.

take_along_axis(a, indices, axis)

Take values from the input array by matching 1d index and data slices.

choose(a, choices[, out, mode])

compress(condition, a[, axis, out])

Returns selected slices of an array along given axis.

diag(v[, k])

Extracts a diagonal or constructs a diagonal array.

diagonal(a[, offset, axis1, axis2])

Returns specified diagonals.

select(condlist, choicelist[, default])

Return an array drawn from elements in choicelist, depending on conditions.

lib.stride_tricks.as_strided(x[, shape, strides])

Create a view into the array with the given shape and strides.
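
A common pairing is argsort with take_along_axis, which sorts each row using precomputed indices. A minimal sketch, assuming a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available:

```python
import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

a = xp.array([[30, 10, 20], [3, 1, 2]])

order = xp.argsort(a, axis=1)                       # per-row sorting indices
sorted_rows = xp.take_along_axis(a, order, axis=1)  # apply them row by row
outer_cols = xp.take(a, xp.array([0, 2]), axis=1)   # pick columns 0 and 2
main_diag = xp.diagonal(a)                          # elements a[0, 0], a[1, 1]

print(sorted_rows)
```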

Inserting data into arrays#

place(arr, mask, vals)

Change elements of an array based on conditional and input values.

put(a, ind, v[, mode])

Replaces specified elements of an array with given values.

putmask(a, mask, values)

Changes elements of an array inplace, based on a conditional mask and input values.

fill_diagonal(a, val[, wrap])

Fills the main diagonal of the given array of any dimensionality.

Iterating over arrays#

flatiter(a)

Flat iterator object to iterate over arrays.

Input and output#

NumPy binary files (NPY, NPZ)#

load(file[, mmap_mode, allow_pickle])

Loads arrays or pickled objects from .npy, .npz or pickled file.

save(file, arr[, allow_pickle])

Saves an array to a binary file in .npy format.

savez(file, *args, **kwds)

Saves one or more arrays into a file in uncompressed .npz format.

savez_compressed(file, *args, **kwds)

Saves one or more arrays into a file in compressed .npz format.
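
A save/load round trip can be sketched with an in-memory buffer standing in for a file on disk. The sketch assumes a NumPy fallback (aliased as `xp`) when CuPy or a GPU is not available, and that the routines accept file-like objects in the same way as their NumPy counterparts.

```python
import io

import numpy as np

try:  # Assumption: fall back to NumPy when CuPy or a GPU is unavailable.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()
except Exception:
    xp = np

original = xp.arange(4, dtype=xp.float32)

buffer = io.BytesIO()          # stands in for an open .npy file
xp.save(buffer, original)      # written in .npy format
buffer.seek(0)                 # rewind before reading back
restored = xp.load(buffer)

print(restored.dtype, restored.shape)
```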

Text files#

loadtxt(*args, **kwargs)

Load data from a text file.

savetxt(fname, X, *args, **kwargs)

Save an array to a text file.

genfromtxt(*args, **kwargs)

Load data from text file, with missing values handled as specified.

fromstring(*args, **kwargs)

A new 1-D array initialized from text data in a string.

String formatting#

array2string(a, *args, **kwargs)

Return a string representation of an array.

array_repr(arr[, max_line_width, precision, ...])

Returns the string representation of an array.

array_str(arr[, max_line_width, precision, ...])

Returns the string representation of the content of an array.

format_float_positional(x, *args, **kwargs)

Format a floating-point scalar as a decimal string in positional notation.

format_float_scientific(x, *args, **kwargs)

Format a floating-point scalar as a decimal string in scientific notation.

Base-n representations#

binary_repr(num[, width])

Return the binary representation of the input number as a string.

base_repr(number[, base, padding])

Return a string representation of a number in the given base system.

Linear algebra (cupy.linalg)#

Matrix and vector products#

dot(a, b[, out])

Returns a dot product of two arrays.

vdot(a, b)

Returns the dot product of two vectors.

inner(a, b)

Returns the inner product of two arrays.

outer(a, b[, out])

Returns the outer product of two vectors.

matmul

matmul(x1, x2, /, out=None, **kwargs)

tensordot(a, b[, axes])

Returns the tensor dot product of two arrays along specified axes.

einsum(subscripts, *operands[, dtype, optimize])

Evaluates the Einstein summation convention on the operands.

linalg.matrix_power(M, n)

Raise a square matrix to the (integer) power n.

kron(a, b)

Returns the kronecker product of two arrays.

Decompositions#

linalg.cholesky(a)

Cholesky decomposition.

linalg.qr(a[, mode])

QR decomposition.

linalg.svd(a[, full_matrices, compute_uv])

Singular Value Decomposition.

Matrix eigenvalues#

linalg.eigh(a[, UPLO])

Return the eigenvalues and eigenvectors of a complex Hermitian (conjugate symmetric) or a real symmetric matrix.

linalg.eigvalsh(a[, UPLO])

Compute the eigenvalues of a complex Hermitian or real symmetric matrix.

Norms and other numbers#

linalg.norm(x[, ord, axis, keepdims])

Returns one of matrix norms specified by ord parameter.

linalg.det(a)

Returns the determinant of an array.

linalg.matrix_rank(M[, tol])

Return matrix rank of array using SVD method.

linalg.slogdet(a)

Returns sign and logarithm of the determinant of an array.

trace(a[, offset, axis1, axis2, dtype, out])

Returns the sum along the diagonals of an array.

Solving equations and inverting matrices#

linalg.solve(a, b)

Solves a linear matrix equation.

linalg.tensorsolve(a, b[, axes])

Solves tensor equations denoted by ax = b.

linalg.lstsq(a, b[, rcond])

Return the least-squares solution to a linear matrix equation.

linalg.inv(a)

Computes the inverse of a matrix.

linalg.pinv(a[, rcond])

Compute the Moore-Penrose pseudoinverse of a matrix.

linalg.tensorinv(a[, ind])

Computes the inverse of a tensor.

Logic functions#

Truth value testing#

all(a[, axis, out, keepdims])

Tests whether all array elements along a given axis evaluate to True.

any(a[, axis, out, keepdims])

Tests whether any array elements along a given axis evaluate to True.

union1d(arr1, arr2)

Find the union of two arrays.

Array contents#

isfinite

Tests finiteness elementwise.

isinf

Tests if each element is the positive or negative infinity.

isnan

Tests if each element is a NaN.

isneginf(x[, out])

Test element-wise for negative infinity, return result as bool array.

isposinf(x[, out])

Test element-wise for positive infinity, return result as bool array.

Array type testing#

iscomplex(x)

Returns a bool array, where True if input element is complex.

iscomplexobj(x)

Check for a complex type or an array of complex numbers.

isfortran(a)

Returns True if the array is Fortran contiguous but not C contiguous.

isreal(x)

Returns a bool array, where True if input element is real.

isrealobj(x)

Return True if x is not a complex type or an array of complex numbers.

isscalar(element)

Returns True if the type of element is a scalar type.

Logic operations#

logical_and

Computes the logical AND of two arrays.

logical_or

Computes the logical OR of two arrays.

logical_not

Computes the logical NOT of an array.

logical_xor

Computes the logical XOR of two arrays.

Comparison#

allclose(a, b[, rtol, atol, equal_nan])

Returns True if two arrays are element-wise equal within a tolerance.

isclose(a, b[, rtol, atol, equal_nan])

Returns a boolean array where two arrays are equal within a tolerance.

array_equal(a1, a2[, equal_nan])

Returns True if two arrays are element-wise exactly equal.

array_equiv(a1, a2)

Returns True if all elements are equal or shape consistent, i.e., one input array can be broadcasted to create the same shape as the other.

greater

Tests elementwise if x1 > x2.

greater_equal

Tests elementwise if x1 >= x2.

less

Tests elementwise if x1 < x2.

less_equal

Tests elementwise if x1 <= x2.

equal

Tests elementwise if x1 == x2.

not_equal

Tests elementwise if x1 != x2.

Mathematical functions#

Trigonometric functions#

sin

Elementwise sine function.

cos

Elementwise cosine function.

tan

Elementwise tangent function.

arcsin

Elementwise inverse-sine function (a.k.a. arcsine function).

arccos

Elementwise inverse-cosine function (a.k.a. arccosine function).

arctan

Elementwise inverse-tangent function (a.k.a. arctangent function).

hypot

Computes the hypotenuse of orthogonal vectors of given length.

arctan2

Elementwise inverse-tangent of the ratio of two arrays.

degrees

Converts angles from radians to degrees elementwise.

radians

Converts angles from degrees to radians elementwise.

unwrap(p[, discont, axis, period])

Unwrap by taking the complement of large deltas w.r.t. the period.

deg2rad

Converts angles from degrees to radians elementwise.

rad2deg

Converts angles from radians to degrees elementwise.

Hyperbolic functions#

sinh

Elementwise hyperbolic sine function.

cosh

Elementwise hyperbolic cosine function.

tanh

Elementwise hyperbolic tangent function.

arcsinh

Elementwise inverse of hyperbolic sine function.

arccosh

Elementwise inverse of hyperbolic cosine function.

arctanh

Elementwise inverse of hyperbolic tangent function.

Rounding#

around(a[, decimals, out])

Rounds to the given number of decimals.

round_(a[, decimals, out])

rint

Rounds each element of an array to the nearest integer.

fix

If the given value x is positive, it returns floor(x); otherwise, it returns ceil(x).

floor

Rounds each element of an array to its floor integer.

ceil

Rounds each element of an array to its ceiling integer.

trunc

Rounds each element of an array towards zero.
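
As a rough illustration of how the rounding routines above differ on negative inputs and on ties, the sketch below applies each of them to the same array. The `xp` alias and the NumPy fallback are assumptions added so that the snippet also runs on a machine without a usable GPU; they are not part of the CuPy API.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.runtime.getDeviceCount() > 0 else np
except Exception:
    xp = np  # assumption: fall back to NumPy when CuPy or a GPU is unavailable

x = xp.array([-1.5, -0.5, 0.5, 1.5])

rounded = {
    "floor": xp.floor(x),  # toward negative infinity
    "ceil": xp.ceil(x),    # toward positive infinity
    "trunc": xp.trunc(x),  # toward zero
    "fix": xp.fix(x),      # toward zero, same values as trunc
    "rint": xp.rint(x),    # nearest integer, ties go to the even neighbour
}

for name, values in rounded.items():
    print(f"{name:>5}: {values.tolist()}")
```

Note that `rint` maps both -0.5 and 0.5 to zero, whereas `floor` and `ceil` move them in opposite directions.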

Sums, products, differences#

prod(a[, axis, dtype, out, keepdims])

Returns the product of an array along given axes.

sum(a[, axis, dtype, out, keepdims])

Returns the sum of an array along given axes.

nanprod(a[, axis, dtype, out, keepdims])

Returns the product of an array along given axes treating Not a Numbers (NaNs) as one.

nansum(a[, axis, dtype, out, keepdims])

Returns the sum of an array along given axes treating Not a Numbers (NaNs) as zero.

cumprod(a[, axis, dtype, out])

Returns the cumulative product of an array along a given axis.

cumsum(a[, axis, dtype, out])

Returns the cumulative sum of an array along a given axis.

nancumprod(a[, axis, dtype, out])

Returns the cumulative product of an array along a given axis treating Not a Numbers (NaNs) as one.

nancumsum(a[, axis, dtype, out])

Returns the cumulative sum of an array along a given axis treating Not a Numbers (NaNs) as zero.

diff(a[, n, axis, prepend, append])

Calculate the n-th discrete difference along the given axis.

gradient(f, *varargs[, axis, edge_order])

Return the gradient of an N-dimensional array.

ediff1d(arr[, to_end, to_begin])

Calculates the difference between consecutive elements of an array.

cross(a, b[, axisa, axisb, axisc, axis])

Returns the cross product of two vectors.

trapz(y[, x, dx, axis])

Integrate along the given axis using the composite trapezoidal rule.
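
The NaN-aware reductions listed above replace each NaN with the identity of the operation (0 for sums, 1 for products) before reducing. The following minimal sketch shows this on a small array; the `xp` alias and the NumPy fallback are assumptions that let the snippet run without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.runtime.getDeviceCount() > 0 else np
except Exception:
    xp = np  # assumption: fall back to NumPy when CuPy or a GPU is unavailable

x = xp.array([1.0, float("nan"), 2.0, 3.0])

plain_sum = xp.sum(x)            # NaN propagates through the ordinary sum
total = xp.nansum(x)             # NaN contributes 0
product = xp.nanprod(x)          # NaN contributes 1
running_sum = xp.nancumsum(x)    # cumulative sum with NaN treated as 0
running_prod = xp.nancumprod(x)  # cumulative product with NaN treated as 1

print(float(plain_sum), float(total), float(product))
print(running_sum.tolist(), running_prod.tolist())
```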

Exponents and logarithms#

exp

Elementwise exponential function.

expm1

Computes exp(x) - 1 elementwise.

exp2

Elementwise exponentiation with base 2.

log

Elementwise natural logarithm function.

log10

Elementwise common logarithm function.

log2

Elementwise binary logarithm function.

log1p

Computes log(1 + x) elementwise.

logaddexp

Computes log(exp(x1) + exp(x2)) elementwise.

logaddexp2

Computes log2(exp2(x1) + exp2(x2)) elementwise.

Other special functions#

i0

Modified Bessel function of the first kind, order 0.

sinc

Elementwise sinc function.

Floating point routines#

signbit

Tests elementwise if the sign bit is set.

copysign

Returns the first argument with the sign bit of the second elementwise.

frexp

Decomposes each element to mantissa and two's exponent.

ldexp

Computes x1 * 2 ** x2 elementwise.

nextafter

Computes the nearest neighbor float values towards the second argument.

Rational routines#

lcm

Computes lcm of x1 and x2 elementwise.

gcd

Computes gcd of x1 and x2 elementwise.

Arithmetic operations#

add

Adds two arrays elementwise.

reciprocal

Computes 1 / x elementwise.

positive

Takes numerical positive elementwise.

negative

Takes numerical negative elementwise.

multiply

Multiplies two arrays elementwise.

divide

Elementwise true division (i.e. division as floating values).

power

Computes x1 ** x2 elementwise.

subtract

Subtracts arguments elementwise.

true_divide

Elementwise true division (i.e. division as floating values).

floor_divide

Elementwise floor division (i.e. integer quotient).

float_power

First array elements raised to powers from second array, element-wise.

fmod

Computes the remainder of C division elementwise.

mod

Computes the remainder of Python division elementwise.

modf

Extracts the fractional and integral parts of an array elementwise.

remainder

Computes the remainder of Python division elementwise.

divmod
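
The difference between the "C division" remainder (`fmod`) and the "Python division" remainder (`mod`, `remainder`) only shows up when the operands have mixed signs. The sketch below compares them on integer inputs; the `xp` alias and the NumPy fallback are assumptions added so the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.runtime.getDeviceCount() > 0 else np
except Exception:
    xp = np  # assumption: fall back to NumPy when CuPy or a GPU is unavailable

a = xp.array([-7, 7, -7, 7])
b = xp.array([3, 3, -3, -3])

c_style = xp.fmod(a, b)          # result takes the sign of the dividend
py_style = xp.mod(a, b)          # result takes the sign of the divisor
quotient = xp.floor_divide(a, b)

print("fmod:        ", c_style.tolist())
print("mod:         ", py_style.tolist())
print("floor_divide:", quotient.tolist())

# Floor division and the Python-style remainder reconstruct the dividend.
reconstructed = quotient * b + py_style
```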

Handling complex numbers#

angle(z[, deg])

Returns the angle of the complex argument.

real(val)

Returns the real part of the elements of the array.

imag(val)

Returns the imaginary part of the elements of the array.

conj

Returns the complex conjugate, element-wise.

conjugate

Returns the complex conjugate, element-wise.

Miscellaneous#

convolve(a, v[, mode])

Returns the discrete, linear convolution of two one-dimensional sequences.

clip(a, a_min, a_max[, out])

Clips the values of an array to a given interval.

sqrt

Elementwise square root function.

cbrt

Elementwise cube root function.

square

Elementwise square function.

absolute

Elementwise absolute value function.

fabs

Calculates absolute values element-wise.

sign

Elementwise sign function.

maximum

Takes the maximum of two arrays elementwise.

minimum

Takes the minimum of two arrays elementwise.

fmax

Takes the maximum of two arrays elementwise.

fmin

Takes the minimum of two arrays elementwise.

nan_to_num(x[, copy, nan, posinf, neginf])

Replace NaN with zero and infinity with large finite numbers (default behaviour) or with the numbers defined by the user using the nan, posinf and/or neginf keywords.

heaviside

Compute the Heaviside step function.

real_if_close(a[, tol])

If input is complex with all imaginary parts close to zero, return real parts.

interp(x, xp, fp[, left, right, period])

One-dimensional linear interpolation.

Miscellaneous routines#

Memory ranges#

byte_bounds(a)

Returns pointers to the end-points of an array.

shares_memory(a, b[, max_work])

may_share_memory(a, b[, max_work])

Utility#

show_config(*[, _full])

Prints the current runtime configuration to standard output.

Matlab-like Functions#

who([vardict])

Print the CuPy arrays in the given dictionary.

Padding arrays#

pad(array, pad_width[, mode])

Pads an array with specified widths and values.

Polynomials#

Power Series (cupy.polynomial.polynomial)#

Misc Functions#

polyvander(x, deg)

Computes the Vandermonde matrix of given degree.

polycompanion(c)

Computes the companion matrix of c.

Polyutils#

Functions#

as_series(alist[, trim])

Returns argument as a list of 1-d arrays.

trimseq(seq)

Removes small polynomial series coefficients.

trimcoef(c[, tol])

Removes small trailing coefficients from a polynomial.

Poly1d#

Basics#

poly1d(c_or_r[, r, variable])

A one-dimensional polynomial class.

poly(seq_of_zeros)

Computes the coefficients of a polynomial with the given roots sequence.

polyval(p, x)

Evaluates a polynomial at specific values.

roots(p)

Computes the roots of a polynomial with given coefficients.

Fitting#

polyfit(x, y, deg[, rcond, full, w, cov])

Returns the least squares fit of polynomial of degree deg to the data y sampled at x.

Arithmetic#

polyadd(a1, a2)

Computes the sum of two polynomials.

polysub(a1, a2)

Computes the difference of two polynomials.

polymul(a1, a2)

Computes the product of two polynomials.

Random sampling (cupy.random)#

Differences between cupy.random and numpy.random:

  • Most functions under cupy.random support the dtype option, which does not exist in the corresponding NumPy APIs. This option enables generation of float32 values directly without any space overhead.

  • cupy.random.default_rng() uses XORWOW bit generator by default.

  • Random states cannot be serialized. See the description below for details.

  • CuPy does not guarantee that the same number generator is used across major versions. This means that numbers generated by cupy.random in a new major version may differ from those of the previous one, even if the same seed and distribution are used.
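
The dtype option mentioned in the list above can be sketched as follows. The helper names `draw_float32` and `draw_with_generator` are hypothetical, and the guard at the top is an assumption that simply skips the demonstration when CuPy or a GPU is not available.

```python
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        cp = None
except Exception:
    cp = None  # assumption: skip the demo when CuPy or a GPU is unavailable


def draw_float32(n, seed=0):
    """Hypothetical helper: n uniform samples generated directly as float32."""
    cp.random.seed(seed)
    # dtype is the CuPy-specific option; numpy.random.uniform has no such argument.
    return cp.random.uniform(0.0, 1.0, size=n, dtype=cp.float32)


def draw_with_generator(n, seed=0):
    """Hypothetical helper: the same idea using the Generator API."""
    rng = cp.random.default_rng(seed)
    return rng.random(n, dtype=cp.float32)


if cp is not None:
    legacy = draw_float32(4)
    modern = draw_with_generator(4)
    print(legacy.dtype, modern.dtype)
```

Requesting float32 at generation time avoids producing float64 values first and casting them afterwards.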

New Random Generator API#

Random Generator#

default_rng([seed])

Construct a new Generator with the default BitGenerator (XORWOW).

Generator(bit_generator)

Container for the BitGenerators.

Bit Generators#

BitGenerator([seed])

Generic BitGenerator.

CuPy provides the following bit generator implementations:

XORWOW([seed, size])

BitGenerator that uses cuRAND XORWOW device generator.

MRG32k3a([seed, size])

BitGenerator that uses cuRAND MRG32k3a device generator.

Philox4x3210([seed, size])

BitGenerator that uses cuRAND Philox4x3210 device generator.

Legacy Random Generation#

RandomState([seed, method])

Portable container of a pseudo-random number generator.

Functions in cupy.random#

beta(a, b[, size, dtype])

Beta distribution.

binomial(n, p[, size, dtype])

Binomial distribution.

bytes(length)

Returns random bytes.

chisquare(df[, size, dtype])

Chi-square distribution.

choice(a[, size, replace, p])

Returns an array of random values from a given 1-D array.

dirichlet(alpha[, size, dtype])

Dirichlet distribution.

exponential(scale[, size, dtype])

Exponential distribution.

f(dfnum, dfden[, size, dtype])

F distribution.

gamma(shape[, scale, size, dtype])

Gamma distribution.

geometric(p[, size, dtype])

Geometric distribution.

gumbel([loc, scale, size, dtype])

Returns an array of samples drawn from a Gumbel distribution.

hypergeometric(ngood, nbad, nsample[, size, ...])

Hypergeometric distribution.

laplace([loc, scale, size, dtype])

Laplace distribution.

logistic([loc, scale, size, dtype])

Logistic distribution.

lognormal([mean, sigma, size, dtype])

Returns an array of samples drawn from a log normal distribution.

logseries(p[, size, dtype])

Log series distribution.

multinomial(n, pvals[, size])

Returns an array from multinomial distribution.

multivariate_normal(mean, cov[, size, ...])

Multivariate normal distribution.

negative_binomial(n, p[, size, dtype])

Negative binomial distribution.

noncentral_chisquare(df, nonc[, size, dtype])

Noncentral chi-square distribution.

noncentral_f(dfnum, dfden, nonc[, size, dtype])

Noncentral F distribution.

normal([loc, scale, size, dtype])

Returns an array of normally distributed samples.

pareto(a[, size, dtype])

Pareto II or Lomax distribution.

permutation(a)

Returns a permuted range or a permutation of an array.

poisson([lam, size, dtype])

Poisson distribution.

power(a[, size, dtype])

Power distribution.

rand(*size, **kwarg)

Returns an array of uniform random values over the interval [0, 1).

randint(low[, high, size, dtype])

Returns a scalar or an array of integer values over [low, high).

randn(*size, **kwarg)

Returns an array of standard normal random values.

random([size, dtype])

Returns an array of random values over the interval [0, 1).

random_integers(low[, high, size])

Return a scalar or an array of integer values over [low, high].

random_sample([size, dtype])

Returns an array of random values over the interval [0, 1).

ranf([size, dtype])

Returns an array of random values over the interval [0, 1).

rayleigh([scale, size, dtype])

Rayleigh distribution.

sample([size, dtype])

Returns an array of random values over the interval [0, 1).

seed([seed])

Resets the state of the random number generator with a seed.

shuffle(a)

Shuffles an array.

standard_cauchy([size, dtype])

Standard Cauchy distribution.

standard_exponential([size, dtype])

Standard exponential distribution.

standard_gamma(shape[, size, dtype])

Standard gamma distribution.

standard_normal([size, dtype])

Returns an array of samples drawn from the standard normal distribution.

standard_t(df[, size, dtype])

Standard Student's t distribution.

triangular(left, mode, right[, size, dtype])

Triangular distribution.

uniform([low, high, size, dtype])

Returns an array of uniformly-distributed samples over an interval.

vonmises(mu, kappa[, size, dtype])

von Mises distribution.

wald(mean, scale[, size, dtype])

Wald distribution.

weibull(a[, size, dtype])

Weibull distribution.

zipf(a[, size, dtype])

Zipf distribution.

CuPy currently provides neither cupy.random.get_state nor cupy.random.set_state. Use the following CuPy-specific APIs instead. Note that these functions use a cupy.random.RandomState instance to represent the internal state, which cannot be serialized.

get_random_state()

Gets the state of the random number generator for the current device.

set_random_state(rs)

Sets the state of the random number generator for the current device.
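
One way to use these two functions for reproducibility is to install a freshly seeded RandomState before each run. The sketch below assumes that approach; `reseed` is a hypothetical helper, and the guard at the top is an assumption that skips the demonstration when CuPy or a GPU is not available.

```python
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        cp = None
except Exception:
    cp = None  # assumption: skip the demo when CuPy or a GPU is unavailable


def reseed(seed):
    """Hypothetical helper: install a new RandomState for the current device."""
    rs = cp.random.RandomState(seed=seed)
    cp.random.set_random_state(rs)
    return rs


if cp is not None:
    reseed(1234)
    first = cp.random.rand(3)

    reseed(1234)
    second = cp.random.rand(3)

    # Both draws start from an identically seeded state.
    same = bool((first == second).all())

    # The object returned here is a RandomState, not a serializable tuple.
    current = cp.random.get_random_state()
    print(same, type(current).__name__)
```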

Set routines#

Making proper sets#

unique(ar[, return_index, return_inverse, ...])

Find the unique elements of an array.

Boolean operations#

in1d(ar1, ar2[, assume_unique, invert])

Tests whether each element of a 1-D array is also present in a second array.

intersect1d(arr1, arr2[, assume_unique, ...])

Find the intersection of two arrays.

isin(element, test_elements[, ...])

Calculates element in test_elements, broadcasting over element only.

setdiff1d(ar1, ar2[, assume_unique])

Find the set difference of two arrays.

setxor1d(ar1, ar2[, assume_unique])

Find the set exclusive-or of two arrays.

Sorting, searching, and counting#

Sorting#

sort(a[, axis, kind])

Returns a sorted copy of an array with a stable sorting algorithm.

lexsort(keys)

Perform an indirect sort using an array of keys.

argsort(a[, axis, kind])

Returns the indices that would sort an array with a stable sorting.

msort(a)

Returns a copy of an array sorted along the first axis.

sort_complex(a)

Sort a complex array using the real part first, then the imaginary part.

partition(a, kth[, axis])

Returns a partitioned copy of an array.

argpartition(a, kth[, axis])

Returns the indices that would partially sort an array.

Searching#

argmax(a[, axis, dtype, out, keepdims])

Returns the indices of the maximum along an axis.

nanargmax(a[, axis, dtype, out, keepdims])

Return the indices of the maximum values in the specified axis ignoring NaNs.

argmin(a[, axis, dtype, out, keepdims])

Returns the indices of the minimum along an axis.

nanargmin(a[, axis, dtype, out, keepdims])

Return the indices of the minimum values in the specified axis ignoring NaNs.

argwhere(a)

Return the indices of the elements that are non-zero.

nonzero(a)

Return the indices of the elements that are non-zero.

flatnonzero(a)

Return indices that are non-zero in the flattened version of a.

where(condition[, x, y])

Return elements, either from x or y, depending on condition.

searchsorted(a, v[, side, sorter])

Finds indices where elements should be inserted to maintain order.

extract(condition, a)

Return the elements of an array that satisfy some condition.

Counting#

count_nonzero(a[, axis])

Counts the number of non-zero values in the array.

Statistics#

Order statistics#

amin(a[, axis, out, keepdims])

Returns the minimum of an array or the minimum along an axis.

amax(a[, axis, out, keepdims])

Returns the maximum of an array or the maximum along an axis.

nanmin(a[, axis, out, keepdims])

Returns the minimum of an array along an axis ignoring NaN.

nanmax(a[, axis, out, keepdims])

Returns the maximum of an array along an axis ignoring NaN.

ptp(a[, axis, out, keepdims])

Returns the range of values (maximum - minimum) along an axis.

percentile(a, q[, axis, out, ...])

Computes the q-th percentile of the data along the specified axis.

quantile(a, q[, axis, out, overwrite_input, ...])

Computes the q-th quantile of the data along the specified axis.

Averages and variances#

median(a[, axis, out, overwrite_input, keepdims])

Compute the median along the specified axis.

average(a[, axis, weights, returned, keepdims])

Returns the weighted average along an axis.

mean(a[, axis, dtype, out, keepdims])

Returns the arithmetic mean along an axis.

std(a[, axis, dtype, out, ddof, keepdims])

Returns the standard deviation along an axis.

var(a[, axis, dtype, out, ddof, keepdims])

Returns the variance along an axis.

nanmedian(a[, axis, out, overwrite_input, ...])

Compute the median along the specified axis, while ignoring NaNs.

nanmean(a[, axis, dtype, out, keepdims])

Returns the arithmetic mean along an axis ignoring NaN values.

nanstd(a[, axis, dtype, out, ddof, keepdims])

Returns the standard deviation along an axis ignoring NaN values.

nanvar(a[, axis, dtype, out, ddof, keepdims])

Returns the variance along an axis ignoring NaN values.

Correlations#

corrcoef(a[, y, rowvar, bias, ddof, dtype])

Returns the Pearson product-moment correlation coefficients of an array.

correlate(a, v[, mode])

Returns the cross-correlation of two 1-dimensional sequences.

cov(a[, y, rowvar, bias, ddof, fweights, ...])

Returns the covariance matrix of an array.

Histograms#

histogram(x[, bins, range, weights, density])

Computes the histogram of a set of data.

histogram2d(x, y[, bins, range, weights, ...])

Compute the bi-dimensional histogram of two data samples.

histogramdd(sample[, bins, range, weights, ...])

Compute the multidimensional histogram of some data.

bincount(x[, weights, minlength])

Count number of occurrences of each value in array of non-negative ints.

digitize(x, bins[, right])

Finds the indices of the bins to which each value in input array belongs.

Test support (cupy.testing)#

Asserts#

Hint

These APIs can accept both numpy.ndarray and cupy.ndarray.

assert_array_almost_equal(x, y[, decimal, ...])

Raises an AssertionError if objects are not equal up to desired precision.

assert_allclose(actual, desired[, rtol, ...])

Raises an AssertionError if objects are not equal up to desired tolerance.

assert_array_almost_equal_nulp(x, y[, nulp])

Compare two arrays relatively to their spacing.

assert_array_max_ulp(a, b[, maxulp, dtype])

Check that all items of arrays differ in at most N Units in the Last Place.

assert_array_equal(x, y[, err_msg, verbose, ...])

Raises an AssertionError if two array_like objects are not equal.

assert_array_less(x, y[, err_msg, verbose])

Raises an AssertionError if array_like objects are not ordered by less than.

CuPy-specific APIs#

Asserts#

assert_array_list_equal(xlist, ylist[, ...])

Compares lists of arrays pairwise with assert_array_equal.

NumPy-CuPy Consistency Check#

The following decorators are for testing consistency between CuPy functions and their NumPy counterparts.

numpy_cupy_allclose([rtol, atol, err_msg, ...])

Decorator that checks NumPy results and CuPy ones are close.

numpy_cupy_array_almost_equal([decimal, ...])

Decorator that checks NumPy results and CuPy ones are almost equal.

numpy_cupy_array_almost_equal_nulp([nulp, ...])

Decorator that checks results of NumPy and CuPy are equal w.r.t. spacing.

numpy_cupy_array_max_ulp([maxulp, dtype, ...])

Decorator that checks results of NumPy and CuPy are equal w.r.t. ulp.

numpy_cupy_array_equal([err_msg, verbose, ...])

Decorator that checks NumPy results and CuPy ones are equal.

numpy_cupy_array_list_equal([err_msg, ...])

Decorator that checks that the resulting lists from NumPy and CuPy are equal.

numpy_cupy_array_less([err_msg, verbose, ...])

Decorator that checks the CuPy result is less than NumPy result.
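
A minimal sketch of how a consistency decorator is typically applied: the decorated test takes an `xp` argument, is run once with NumPy and once with CuPy, and the two return values are compared. The class name `TestCumsum` is hypothetical, and the guard is an assumption that skips the demonstration when CuPy or a GPU is not available.

```python
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        cp = None
except Exception:
    cp = None  # assumption: skip the demo when CuPy or a GPU is unavailable

if cp is not None:
    from cupy import testing

    class TestCumsum:  # hypothetical test class
        @testing.numpy_cupy_allclose(rtol=1e-6)
        def test_cumsum(self, xp):
            # xp is numpy on one call and cupy on the other.
            a = xp.arange(12, dtype=xp.float32).reshape(3, 4)
            return xp.cumsum(a, axis=1)

    # Calling the decorated method runs both backends and raises
    # AssertionError if the results are not close.
    TestCumsum().test_cumsum()
    checked = True
```

In a real test suite the method would be collected by a test runner rather than called by hand.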

Parameterized dtype Test#

The following decorators offer a standard way to parameterize tests over a single dtype or a combination of dtypes.

for_dtypes(dtypes[, name])

Decorator for parameterized dtype test.

for_all_dtypes([name, no_float16, no_bool, ...])

Decorator that checks the fixture with all dtypes.

for_float_dtypes([name, no_float16])

Decorator that checks the fixture with float dtypes.

for_signed_dtypes([name])

Decorator that checks the fixture with signed dtypes.

for_unsigned_dtypes([name])

Decorator that checks the fixture with unsigned dtypes.

for_int_dtypes([name, no_bool])

Decorator that checks the fixture with integer and optionally bool dtypes.

for_complex_dtypes([name])

Decorator that checks the fixture with complex dtypes.

for_dtypes_combination(types[, names, full])

Decorator that checks the fixture with a product set of dtypes.

for_all_dtypes_combination([names, ...])

Decorator that checks the fixture with a product set of all dtypes.

for_signed_dtypes_combination([names, full])

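Decorator for parameterized test w.r.t. the product set of signed dtypes.

for_unsigned_dtypes_combination([names, full])

Decorator for parameterized test w.r.t. the product set of unsigned dtypes.

for_int_dtypes_combination([names, no_bool, ...])

Decorator for parameterized test w.r.t. the product set of integer (and optionally bool) dtypes.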
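
The dtype decorators are usually stacked on top of a consistency decorator, so that the same test body is repeated once per dtype. The sketch below assumes that pattern; `TestSquare` is a hypothetical class, and the guard is an assumption that skips the demonstration when CuPy or a GPU is not available.

```python
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        cp = None
except Exception:
    cp = None  # assumption: skip the demo when CuPy or a GPU is unavailable

if cp is not None:
    from cupy import testing

    class TestSquare:  # hypothetical test class
        @testing.for_float_dtypes(no_float16=True)
        @testing.numpy_cupy_allclose(rtol=1e-5)
        def test_square(self, xp, dtype):
            # dtype is injected by for_float_dtypes, xp by numpy_cupy_allclose.
            a = xp.linspace(0, 1, 8, dtype=dtype)
            return xp.square(a)

    # Runs the NumPy/CuPy comparison once for each float dtype.
    TestSquare().test_square()
    checked = True
```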

Parameterized order Test#

The following decorators offer the standard way to parameterize tests with orders.

for_orders(orders[, name])

Decorator to parameterize tests with order.

for_CF_orders([name])

Decorator that checks the fixture with orders 'C' and 'F'.

Window functions#

Various windows#

bartlett(M)

Returns the Bartlett window.

blackman(M)

Returns the Blackman window.

hamming(M)

Returns the Hamming window.

hanning(M)

Returns the Hanning window.

kaiser(M, beta)

Return the Kaiser window.

Routines (SciPy)#

The following pages describe SciPy-compatible routines. These functions cover a subset of SciPy routines.

Discrete Fourier transforms (cupyx.scipy.fft)#

Fast Fourier Transforms (FFTs)#

fft(x[, n, axis, norm, overwrite_x, plan])

Compute the one-dimensional FFT.

ifft(x[, n, axis, norm, overwrite_x, plan])

Compute the one-dimensional inverse FFT.

fft2(x[, s, axes, norm, overwrite_x, plan])

Compute the two-dimensional FFT.

ifft2(x[, s, axes, norm, overwrite_x, plan])

Compute the two-dimensional inverse FFT.

fftn(x[, s, axes, norm, overwrite_x, plan])

Compute the N-dimensional FFT.

ifftn(x[, s, axes, norm, overwrite_x, plan])

Compute the N-dimensional inverse FFT.

rfft(x[, n, axis, norm, overwrite_x, plan])

Compute the one-dimensional FFT for real input.

irfft(x[, n, axis, norm, overwrite_x, plan])

Compute the one-dimensional inverse FFT for real input.

rfft2(x[, s, axes, norm, overwrite_x, plan])

Compute the two-dimensional FFT for real input.

irfft2(x[, s, axes, norm, overwrite_x, plan])

Compute the two-dimensional inverse FFT for real input.

rfftn(x[, s, axes, norm, overwrite_x, plan])

Compute the N-dimensional FFT for real input.

irfftn(x[, s, axes, norm, overwrite_x, plan])

Compute the N-dimensional inverse FFT for real input.

hfft(x[, n, axis, norm, overwrite_x, plan])

Compute the FFT of a signal that has Hermitian symmetry.

ihfft(x[, n, axis, norm, overwrite_x, plan])

Compute the inverse FFT of a signal that has Hermitian symmetry.

hfft2(x[, s, axes, norm, overwrite_x, plan])

Compute the FFT of a two-dimensional signal that has Hermitian symmetry.

ihfft2(x[, s, axes, norm, overwrite_x, plan])

Compute the inverse FFT of a two-dimensional signal that has Hermitian symmetry.

hfftn(x[, s, axes, norm, overwrite_x, plan])

Compute the FFT of an N-dimensional signal that has Hermitian symmetry.

ihfftn(x[, s, axes, norm, overwrite_x, plan])

Compute the inverse FFT of an N-dimensional signal that has Hermitian symmetry.

Discrete Cosine and Sine Transforms (DCT and DST)#

dct(x[, type, n, axis, norm, overwrite_x])

Return the Discrete Cosine Transform of an array, x.

idct(x[, type, n, axis, norm, overwrite_x])

Return the Inverse Discrete Cosine Transform of an array, x.

dctn(x[, type, s, axes, norm, overwrite_x])

Compute a multidimensional Discrete Cosine Transform.

idctn(x[, type, s, axes, norm, overwrite_x])

Compute a multidimensional inverse Discrete Cosine Transform.

dst(x[, type, n, axis, norm, overwrite_x])

Return the Discrete Sine Transform of an array, x.

idst(x[, type, n, axis, norm, overwrite_x])

Return the Inverse Discrete Sine Transform of an array, x.

dstn(x[, type, s, axes, norm, overwrite_x])

Compute a multidimensional Discrete Sine Transform.

idstn(x[, type, s, axes, norm, overwrite_x])

Compute a multidimensional inverse Discrete Sine Transform.

Fast Hankel Transforms#

fht(a, dln, mu[, offset, bias])

Compute the fast Hankel transform.

ifht(A, dln, mu[, offset, bias])

Compute the inverse fast Hankel transform.

Helper functions#

fftshift(x[, axes])

Shift the zero-frequency component to the center of the spectrum.

ifftshift(x[, axes])

The inverse of fftshift().

fftfreq(n[, d])

Return the FFT sample frequencies.

rfftfreq(n[, d])

Return the FFT sample frequencies for real input.

next_fast_len(target[, real])

Find the next fast size to fft.

Code compatibility features#

  1. As with other FFT modules in CuPy, FFT functions in this module can take advantage of an existing cuFFT plan (returned by get_fft_plan()) to accelerate the computation. The plan can be either passed in explicitly via the keyword-only plan argument or used as a context manager. The exceptions are the DCT and DST transforms, which do not currently support a plan argument.

  2. The boolean switch cupy.fft.config.enable_nd_planning also affects the FFT functions in this module; see Discrete Fourier Transform (cupy.fft). This switch is ignored when planning manually using get_fft_plan().

  3. Like in scipy.fft, all FFT functions in this module have an optional argument overwrite_x (default is False), which has the same semantics as in scipy.fft: when it is set to True, the input array x can (not will) be overwritten arbitrarily. For this reason, when an in-place FFT is desired, the user should always reassign the input in the following manner: x = cupyx.scipy.fft.fft(x, ..., overwrite_x=True, ...).

  4. The cupyx.scipy.fft module can also be used as a backend for scipy.fft, for example by installing it with scipy.fft.set_backend(cupyx.scipy.fft). This allows scipy.fft to work with both NumPy and CuPy arrays. For more information, see SciPy FFT backend.

  5. The boolean switch cupy.fft.config.use_multi_gpus also affects the FFT functions in this module, see Discrete Fourier Transform (cupy.fft). Moreover, this switch is honored when planning manually using get_fft_plan().

  6. Both type II and III DCT and DST transforms are implemented. Type I and IV transforms are currently unavailable.
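
Items 1 and 3 of the list above can be sketched together: a plan is created once with get_fft_plan(), passed explicitly through the plan argument, and the result of an overwrite_x=True call is assigned back to the input name. The array shape and variable names are illustrative assumptions, and the guard skips the demonstration when CuPy or a GPU is not available.

```python
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        cp = None
except Exception:
    cp = None  # assumption: skip the demo when CuPy or a GPU is unavailable

if cp is not None:
    import cupyx.scipy.fft as cufft

    # A batch of four 64x64 complex arrays, transformed over the last two axes.
    x = cp.random.random((4, 64, 64)).astype(cp.complex64)

    # Build the cuFFT plan once so it can be reused across calls.
    plan = cufft.get_fft_plan(x, axes=(1, 2), value_type="C2C")

    # Pass the plan explicitly through the keyword-only argument.
    reference = cufft.fft2(x, axes=(1, 2), plan=plan)

    # With overwrite_x=True the input may or may not be reused as the output,
    # so the result is always assigned back to x.
    x = cufft.fft2(x, axes=(1, 2), overwrite_x=True, plan=plan)

    agree = bool(cp.allclose(x, reference))
    print(agree)
```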

Legacy discrete Fourier transforms (cupyx.scipy.fftpack)#

Note

As of SciPy version 1.4.0, scipy.fft is recommended over scipy.fftpack. Consider using cupyx.scipy.fft instead.

Fast Fourier Transforms (FFTs)#

fft(x[, n, axis, overwrite_x, plan])

Compute the one-dimensional FFT.

ifft(x[, n, axis, overwrite_x, plan])

Compute the one-dimensional inverse FFT.

fft2(x[, shape, axes, overwrite_x, plan])

Compute the two-dimensional FFT.

ifft2(x[, shape, axes, overwrite_x, plan])

Compute the two-dimensional inverse FFT.

fftn(x[, shape, axes, overwrite_x, plan])

Compute the N-dimensional FFT.

ifftn(x[, shape, axes, overwrite_x, plan])

Compute the N-dimensional inverse FFT.

rfft(x[, n, axis, overwrite_x, plan])

Compute the one-dimensional FFT for real input.

irfft(x[, n, axis, overwrite_x])

Compute the one-dimensional inverse FFT for real input.

get_fft_plan(a[, shape, axes, value_type])

Generate a CUDA FFT plan for transforming up to three axes.

Code compatibility features#
  1. As with other FFT modules in CuPy, FFT functions in this module can take advantage of an existing cuFFT plan (returned by get_fft_plan()) to accelerate the computation. The plan can be either passed in explicitly via the plan argument or used as a context manager. The plan argument is currently experimental and its interface may change in a future version. The get_fft_plan() function has no counterpart in scipy.fftpack.

  2. The boolean switch cupy.fft.config.enable_nd_planning also affects the FFT functions in this module; see Discrete Fourier Transform (cupy.fft). This switch is ignored when planning manually using get_fft_plan().

  3. As in scipy.fftpack, all FFT functions in this module have an optional argument overwrite_x (default is False), which has the same semantics as in scipy.fftpack: when it is set to True, the input array x may (but need not) be overwritten arbitrarily. For this reason, when an in-place FFT is desired, the user should always reassign the input in the following manner: x = cupyx.scipy.fftpack.fft(x, ..., overwrite_x=True, ...).

  4. The boolean switch cupy.fft.config.use_multi_gpus also affects the FFT functions in this module, see Discrete Fourier Transform (cupy.fft). Moreover, this switch is honored when planning manually using get_fft_plan().
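The two ways of supplying a plan described in item 1 can be sketched as below. The helper name fft_both_ways and the GPU availability check are assumptions made for this illustration; the plan interface itself is experimental, so treat this as a sketch rather than a stable recipe.

```python
try:
    import cupy as cp
    import cupyx.scipy.fftpack as cufftpack
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no usable GPU
    gpu_available = False


def fft_both_ways(x):
    """Run the same 1-D FFT with an explicit plan and with a plan context."""
    plan = cufftpack.get_fft_plan(x)        # plan matching x's shape and dtype
    explicit = cufftpack.fft(x, plan=plan)  # plan passed via the plan argument
    with plan:                              # plan picked up from the context
        implicit = cufftpack.fft(x)
    return explicit, implicit


if gpu_available:
    signal = cp.random.random(64).astype(cp.complex64)
    explicit, implicit = fft_both_ways(signal)
    plans_agree = bool(cp.allclose(explicit, implicit))
else:
    plans_agree = None  # nothing to compare without a GPU
```

Reusing one plan pays off mainly when many transforms of the same shape and dtype are executed.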

Interpolation (cupyx.scipy.interpolate)#

Univariate interpolation#

BarycentricInterpolator(xi[, yi, axis])

The interpolating polynomial for a set of points.

KroghInterpolator(xi, yi[, axis])

Interpolating polynomial for a set of points.

barycentric_interpolate(xi, yi, x[, axis])

Convenience function for polynomial interpolation.

krogh_interpolate(xi, yi, x[, der, axis])

Convenience function for polynomial interpolation.

pchip_interpolate(xi, yi, x[, der, axis])

Convenience function for pchip interpolation.

CubicHermiteSpline(x, y, dydx[, axis, ...])

Piecewise-cubic interpolator matching values and first derivatives.

PchipInterpolator(x, y[, axis, extrapolate])

PCHIP 1-D monotonic cubic interpolation.

Akima1DInterpolator(x, y[, axis])

Akima interpolator.

PPoly(c, x[, extrapolate, axis])

Piecewise polynomial in terms of coefficients and breakpoints. The polynomial between x[i] and x[i + 1] is written in the local power basis.

BPoly(c, x[, extrapolate, axis])

Piecewise polynomial in terms of coefficients and breakpoints.

1-D Splines#

BSpline(t, c, k[, extrapolate, axis])

Univariate spline in the B-spline basis.

make_interp_spline(x, y[, k, t, bc_type, ...])

Compute the (coefficients of) interpolating B-spline.

splder(tck[, n])

Compute the spline representation of the derivative of a given spline.

splantider(tck[, n])

Compute the spline for the antiderivative (integral) of a given spline.

Multivariate interpolation#

Unstructured data:

RBFInterpolator(y, d[, neighbors, ...])

Radial basis function (RBF) interpolation in N dimensions.

For data on a grid:

interpn(points, values, xi[, method, ...])

Multidimensional interpolation on regular or rectilinear grids.

RegularGridInterpolator(points, values[, ...])

Interpolation on a regular or rectilinear grid in arbitrary dimensions.

Tensor product polynomials:

NdPPoly(c, x[, extrapolate])

Piecewise tensor product polynomial.

Linear algebra (cupyx.scipy.linalg)#

Basics#

solve_triangular(a, b[, trans, lower, ...])

Solve the equation a x = b for x, assuming a is a triangular matrix.

tril(m[, k])

Make a copy of a matrix with elements above the k-th diagonal zeroed.

triu(m[, k])

Make a copy of a matrix with elements below the k-th diagonal zeroed.

Decompositions#

lu(a[, permute_l, overwrite_a, check_finite])

LU decomposition.

lu_factor(a[, overwrite_a, check_finite])

LU decomposition.

lu_solve(lu_and_piv, b[, trans, ...])

Solve an equation system, a * x = b, given the LU factorization of a.

Special Matrices#

block_diag(*arrs)

Create a block diagonal matrix from provided arrays.

circulant(c)

Construct a circulant matrix.

companion(a)

Create a companion matrix.

convolution_matrix(a, n[, mode])

Construct a convolution matrix.

dft(n[, scale])

Discrete Fourier transform matrix.

fiedler(a)

Returns a symmetric Fiedler matrix.

fiedler_companion(a)

Returns a Fiedler companion matrix.

hadamard(n[, dtype])

Construct a Hadamard matrix.

hankel(c[, r])

Construct a Hankel matrix.

helmert(n[, full])

Create a Helmert matrix of order n.

hilbert(n)

Create a Hilbert matrix of order n.

kron(a, b)

Kronecker product.

leslie(f, s)

Create a Leslie matrix.

toeplitz(c[, r])

Construct a Toeplitz matrix.

tri(N[, M, k, dtype])

Construct (N, M) matrix filled with ones at and below the k-th diagonal.

Multidimensional image processing (cupyx.scipy.ndimage)#

Filters#

convolve(input, weights[, output, mode, ...])

Multi-dimensional convolution.

convolve1d(input, weights[, axis, output, ...])

One-dimensional convolution.

correlate(input, weights[, output, mode, ...])

Multi-dimensional correlate.

correlate1d(input, weights[, axis, output, ...])

One-dimensional correlate.

gaussian_filter(input, sigma[, order, ...])

Multi-dimensional Gaussian filter.

gaussian_filter1d(input, sigma[, axis, ...])

One-dimensional Gaussian filter along the given axis.

gaussian_gradient_magnitude(input, sigma[, ...])

Multi-dimensional gradient magnitude using Gaussian derivatives.

gaussian_laplace(input, sigma[, output, ...])

Multi-dimensional Laplace filter using Gaussian second derivatives.

generic_filter(input, function[, size, ...])

Compute a multi-dimensional filter using the provided raw kernel or reduction kernel.

generic_filter1d(input, function, filter_size)

Compute a 1D filter along the given axis using the provided raw kernel.

generic_gradient_magnitude(input, derivative)

Multi-dimensional gradient magnitude filter using a provided derivative function.

generic_laplace(input, derivative2[, ...])

Multi-dimensional Laplace filter using a provided second derivative function.

laplace(input[, output, mode, cval])

Multi-dimensional Laplace filter based on approximate second derivatives.

maximum_filter(input[, size, footprint, ...])

Multi-dimensional maximum filter.

maximum_filter1d(input, size[, axis, ...])

Compute the maximum filter along a single axis.

median_filter(input[, size, footprint, ...])

Multi-dimensional median filter.

minimum_filter(input[, size, footprint, ...])

Multi-dimensional minimum filter.

minimum_filter1d(input, size[, axis, ...])

Compute the minimum filter along a single axis.

percentile_filter(input, percentile[, size, ...])

Multi-dimensional percentile filter.

prewitt(input[, axis, output, mode, cval])

Compute a Prewitt filter along the given axis.

rank_filter(input, rank[, size, footprint, ...])

Multi-dimensional rank filter.

sobel(input[, axis, output, mode, cval])

Compute a Sobel filter along the given axis.

uniform_filter(input[, size, output, mode, ...])

Multi-dimensional uniform filter.

uniform_filter1d(input, size[, axis, ...])

One-dimensional uniform filter along the given axis.

Fourier filters#

fourier_ellipsoid(input, size[, n, axis, output])

Multidimensional ellipsoid Fourier filter.

fourier_gaussian(input, sigma[, n, axis, output])

Multidimensional Gaussian Fourier filter.

fourier_shift(input, shift[, n, axis, output])

Multidimensional Fourier shift filter.

fourier_uniform(input, size[, n, axis, output])

Multidimensional uniform Fourier filter.

Interpolation#

affine_transform(input, matrix[, offset, ...])

Apply an affine transformation.

map_coordinates(input, coordinates[, ...])

Map the input array to new coordinates by interpolation.

rotate(input, angle[, axes, reshape, ...])

Rotate an array.

shift(input, shift[, output, order, mode, ...])

Shift an array.

spline_filter(input[, order, output, mode])

Multidimensional spline filter.

spline_filter1d(input[, order, axis, ...])

Calculate a 1-D spline filter along the given axis.

zoom(input, zoom[, output, order, mode, ...])

Zoom an array.

Measurements#

center_of_mass(input[, labels, index])

Calculate the center of mass of the values of an array at labels.

extrema(input[, labels, index])

Calculate the minimums and maximums of the values of an array at labels, along with their positions.

histogram(input, min, max, bins[, labels, index])

Calculate the histogram of the values of an array, optionally at labels.

label(input[, structure, output])

Labels features in an array.

labeled_comprehension(input, labels, index, ...)

Array resulting from applying func to each labeled region.

maximum(input[, labels, index])

Calculate the maximum of the values of an array over labeled regions.

maximum_position(input[, labels, index])

Find the positions of the maximums of the values of an array at labels.

mean(input[, labels, index])

Calculates the mean of the values of an n-D image array, optionally at specified sub-regions.

median(input[, labels, index])

Calculate the median of the values of an array over labeled regions.

minimum(input[, labels, index])

Calculate the minimum of the values of an array over labeled regions.

minimum_position(input[, labels, index])

Find the positions of the minimums of the values of an array at labels.

standard_deviation(input[, labels, index])

Calculates the standard deviation of the values of an n-D image array, optionally at specified sub-regions.

sum_labels(input[, labels, index])

Calculates the sum of the values of an n-D image array, optionally at specified sub-regions.

variance(input[, labels, index])

Calculates the variance of the values of an n-D image array, optionally at specified sub-regions.

Morphology#

binary_closing(input[, structure, ...])

Multidimensional binary closing with the given structuring element.

binary_dilation(input[, structure, ...])

Multidimensional binary dilation with the given structuring element.

binary_erosion(input[, structure, ...])

Multidimensional binary erosion with a given structuring element.

binary_fill_holes(input[, structure, ...])

Fill the holes in binary objects.

binary_hit_or_miss(input[, structure1, ...])

Multidimensional binary hit-or-miss transform.

binary_opening(input[, structure, ...])

Multidimensional binary opening with the given structuring element.

binary_propagation(input[, structure, mask, ...])

Multidimensional binary propagation with the given structuring element.

black_tophat(input[, size, footprint, ...])

Multidimensional black tophat filter.

generate_binary_structure(rank, connectivity)

Generate a binary structure for binary morphological operations.

grey_closing(input[, size, footprint, ...])

Calculates a multi-dimensional greyscale closing.

grey_dilation(input[, size, footprint, ...])

Calculates a greyscale dilation.

grey_erosion(input[, size, footprint, ...])

Calculates a greyscale erosion.

grey_opening(input[, size, footprint, ...])

Calculates a multi-dimensional greyscale opening.

iterate_structure(structure, iterations[, ...])

Iterate a structure by dilating it with itself.

morphological_gradient(input[, size, ...])

Multidimensional morphological gradient.

morphological_laplace(input[, size, ...])

Multidimensional morphological laplace.

white_tophat(input[, size, footprint, ...])

Multidimensional white tophat filter.

OpenCV mode#

cupyx.scipy.ndimage supports an additional mode, opencv. When it is given, the function behaves like cv2.warpAffine or cv2.resize. Example:

import cupy as cp
import cupyx.scipy.ndimage
import cv2

# cv2.imread returns a NumPy array on the host; move it to the GPU first.
im = cp.asarray(cv2.imread('TODO'))  # replace 'TODO' with your image path

# Homogeneous 4x4 matrix for a (height, width, channels) image:
# scale the two spatial axes by 0.5 and leave the channel axis unchanged.
trans_mat = cp.eye(4)
trans_mat[0, 0] = trans_mat[1, 1] = 0.5

smaller_shape = (im.shape[0] // 2, im.shape[1] // 2, 3)
smaller = cp.zeros(smaller_shape)  # preallocate memory for the resized image

cupyx.scipy.ndimage.affine_transform(im, trans_mat, output_shape=smaller_shape,
                                     output=smaller, mode='opencv')

cv2.imwrite('smaller.jpg', cp.asnumpy(smaller))  # save the smaller image locally

Signal processing (cupyx.scipy.signal)#

Convolution#

convolve(in1, in2[, mode, method])

Convolve two N-dimensional arrays.

correlate(in1, in2[, mode, method])

Cross-correlate two N-dimensional arrays.

fftconvolve(in1, in2[, mode, axes])

Convolve two N-dimensional arrays using FFT.

oaconvolve(in1, in2[, mode, axes])

Convolve two N-dimensional arrays using the overlap-add method.

convolve2d(in1, in2[, mode, boundary, fillvalue])

Convolve two 2-dimensional arrays.

correlate2d(in1, in2[, mode, boundary, ...])

Cross-correlate two 2-dimensional arrays.

sepfir2d(input, hrow, hcol)

Convolve with a 2-D separable FIR filter.

choose_conv_method(in1, in2[, mode])

Find the fastest convolution/correlation method.

Filtering#

order_filter(a, domain, rank)

Perform an order filter on an N-D array.

medfilt(volume[, kernel_size])

Perform a median filter on an N-dimensional array.

medfilt2d(input[, kernel_size])

Median filter a 2-dimensional array.

wiener(im[, mysize, noise])

Perform a Wiener filter on an N-dimensional array.

symiirorder1(input, c0, z1[, precision])

Implement a smoothing IIR filter with mirror-symmetric boundary conditions using a cascade of first-order sections. The second section uses a reversed sequence.

symiirorder2(input, r, omega[, precision])

Implement a smoothing IIR filter with mirror-symmetric boundary conditions using a cascade of second-order sections. The second section uses a reversed sequence.

lfilter(b, a, x[, axis, zi])

Filter data along one-dimension with an IIR or FIR filter.

lfiltic(b, a, y[, x])

Construct initial conditions for lfilter given input and output vectors.

lfilter_zi(b, a)

Construct initial conditions for lfilter for step response steady-state.

filtfilt(b, a, x[, axis, padtype, padlen, ...])

Apply a digital filter forward and backward to a signal.

savgol_filter(x, window_length, polyorder[, ...])

Apply a Savitzky-Golay filter to an array.

deconvolve(signal, divisor)

Deconvolves divisor out of signal using inverse filtering.

detrend(data[, axis, type, bp, overwrite_data])

Remove linear trend along axis from data.

Filter design#

bilinear(b, a[, fs])

Return a digital IIR filter from an analog one using a bilinear transform.

bilinear_zpk(z, p, k, fs)

Return a digital IIR filter from an analog one using a bilinear transform.

savgol_coeffs(window_length, polyorder[, ...])

Compute the coefficients for a 1-D Savitzky-Golay FIR filter.

Sparse matrices (cupyx.scipy.sparse)#

CuPy supports sparse matrices using cuSPARSE. These matrices have the same interface as SciPy’s sparse matrices.

Conversion to/from SciPy sparse matrices#

cupyx.scipy.sparse.*_matrix and scipy.sparse.*_matrix are not implicitly convertible to each other. That means, SciPy functions cannot take cupyx.scipy.sparse.*_matrix objects as inputs, and vice versa.

  • To convert SciPy sparse matrices to CuPy, pass them to the constructor of the corresponding CuPy sparse matrix class.

  • To convert CuPy sparse matrices to SciPy, use the get method of the CuPy sparse matrix instance.

Note that converting between CuPy and SciPy incurs data transfer between the host (CPU) device and the GPU device, which is costly in terms of performance.

Conversion to/from CuPy ndarrays#
  • To convert a CuPy ndarray to a CuPy sparse matrix, pass it to the constructor of the corresponding CuPy sparse matrix class.

  • To convert a CuPy sparse matrix to a CuPy ndarray, use the toarray method of the sparse matrix instance (e.g., cupyx.scipy.sparse.csr_matrix.toarray()).

Converting between CuPy ndarray and CuPy sparse matrices does not incur host-device data transfer; the data is copied within the GPU device.
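The four conversions described above can be sketched in one round trip. The helper name round_trip, the sample matrix, and the GPU availability check are assumptions made for this illustration; without a GPU the snippet simply skips the device part.

```python
import numpy as np
import scipy.sparse

try:
    import cupy as cp
    import cupyx.scipy.sparse as cusparse
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no usable GPU
    gpu_available = False


def round_trip(host_csr):
    """SciPy -> CuPy sparse -> CuPy ndarray -> CuPy sparse -> SciPy."""
    device_csr = cusparse.csr_matrix(host_csr)  # host-to-device transfer
    dense = device_csr.toarray()                # cupy.ndarray, stays on the GPU
    device_again = cusparse.csr_matrix(dense)   # dense to sparse, stays on the GPU
    return device_again.get()                   # device-to-host transfer


host = scipy.sparse.csr_matrix(
    np.array([[1.0, 0.0, 2.0],
              [0.0, 0.0, 3.0],
              [4.0, 0.0, 0.0]])
)
result = round_trip(host) if gpu_available else host
same_values = np.allclose(result.toarray(), host.toarray())
```

Only the first and last steps cross the host-device boundary, so in performance-sensitive code it is usually preferable to keep intermediate results on the GPU.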

Contents#
Sparse matrix classes#

coo_matrix(arg1[, shape, dtype, copy])

COOrdinate format sparse matrix.

csc_matrix(arg1[, shape, dtype, copy])

Compressed Sparse Column matrix.

csr_matrix(arg1[, shape, dtype, copy])

Compressed Sparse Row matrix.

dia_matrix(arg1[, shape, dtype, copy])

Sparse matrix with DIAgonal storage.

spmatrix([maxprint])

Base class of all sparse matrices.

Functions#

Building sparse matrices:

eye(m[, n, k, dtype, format])

Creates a sparse matrix with ones on diagonal.

identity(n[, dtype, format])

Creates an identity matrix in sparse format.

kron(A, B[, format])

Kronecker product of sparse matrices A and B.

kronsum(A, B[, format])

Kronecker sum of sparse matrices A and B.

diags(diagonals[, offsets, shape, format, dtype])

Construct a sparse matrix from diagonals.

spdiags(data, diags, m, n[, format])

Creates a sparse matrix from diagonals.

tril(A[, k, format])

Returns the lower triangular portion of a matrix in sparse format.

triu(A[, k, format])

Returns the upper triangular portion of a matrix in sparse format.

bmat(blocks[, format, dtype])

Builds a sparse matrix from sparse sub-blocks.

hstack(blocks[, format, dtype])

Stacks sparse matrices horizontally (column wise).

vstack(blocks[, format, dtype])

Stacks sparse matrices vertically (row wise).

rand(m, n[, density, format, dtype, ...])

Generates a random sparse matrix.

random(m, n[, density, format, dtype, ...])

Generates a random sparse matrix.

Sparse matrix tools:

find(A)

Returns the indices and values of the nonzero elements of a matrix.

Identifying sparse matrices:

issparse(x)

Checks if a given matrix is a sparse matrix.

isspmatrix(x)

Checks if a given matrix is a sparse matrix.

isspmatrix_csc(x)

Checks if a given matrix is of CSC format.

isspmatrix_csr(x)

Checks if a given matrix is of CSR format.

isspmatrix_coo(x)

Checks if a given matrix is of COO format.

isspmatrix_dia(x)

Checks if a given matrix is of DIA format.

Submodules#

csgraph

linalg

Exceptions#

Sparse linear algebra (cupyx.scipy.sparse.linalg)#

Abstract linear operators#

LinearOperator(shape, matvec[, rmatvec, ...])

Common interface for performing matrix vector products.

aslinearoperator(A)

Return A as a LinearOperator.

Matrix norms#

norm(x[, ord, axis])

Norm of a cupyx.scipy.sparse.spmatrix.

Solving linear problems#

Direct methods for linear equation systems:

spsolve(A, b)

Solves a sparse linear system A x = b.

spsolve_triangular(A, b[, lower, ...])

Solves a sparse triangular system A x = b.

factorized(A)

Return a function for solving a sparse linear system, with A pre-factorized.

Iterative methods for linear equation systems:

cg(A, b[, x0, tol, maxiter, M, callback, atol])

Uses Conjugate Gradient iteration to solve Ax = b.

gmres(A, b[, x0, tol, restart, maxiter, M, ...])

Uses Generalized Minimal RESidual iteration to solve Ax = b.

cgs(A, b[, x0, tol, maxiter, M, callback, atol])

Use Conjugate Gradient Squared iteration to solve Ax = b.

minres(A, b[, x0, shift, tol, maxiter, M, ...])

Uses MINimum RESidual iteration to solve Ax = b.

Iterative methods for least-squares problems:

lsqr(A, b)

Solves linear system with QR decomposition.

lsmr(A, b[, x0, damp, atol, btol, conlim, ...])

Iterative solver for least-squares problems.

Matrix factorizations#

Eigenvalue problems:

eigsh(a[, k, which, ncv, maxiter, tol, ...])

Find k eigenvalues and eigenvectors of the real symmetric square matrix or complex Hermitian matrix A.

lobpcg(A, X[, B, M, Y, tol, maxiter, ...])

Locally Optimal Block Preconditioned Conjugate Gradient Method (LOBPCG).

Singular value problems:

svds(a[, k, ncv, tol, which, maxiter, ...])

Finds the largest k singular values/vectors for a sparse matrix.

Complete or incomplete LU factorizations:

splu(A[, permc_spec, diag_pivot_thresh, ...])

Computes the LU decomposition of a sparse square matrix.

spilu(A[, drop_tol, fill_factor, drop_rule, ...])

Computes the incomplete LU decomposition of a sparse square matrix.

SuperLU(obj)

Compressed sparse graph routines (cupyx.scipy.sparse.csgraph)#

Note

The csgraph module uses pylibcugraph as a backend. You need to install the pylibcugraph package <https://anaconda.org/rapidsai/pylibcugraph> from the rapidsai Conda channel to use the features listed on this page.

Note

Currently, the csgraph module is not supported on AMD ROCm platforms.

Contents#

connected_components(csgraph[, directed, ...])

Analyzes the connected components of a sparse graph.

Spatial algorithms and data structures (cupyx.scipy.spatial)#

Note

The spatial module uses pylibraft as a backend. You need to install the pylibraft package <https://anaconda.org/rapidsai/pylibraft> from the rapidsai Conda channel to use the features listed on this page.

Note

Currently, the spatial module is not supported on AMD ROCm platforms.

Functions#

distance_matrix(x, y[, p])

Compute the distance matrix.

Distance computations (cupyx.scipy.spatial.distance)#

Note

The distance module uses pylibraft as a backend. You need to install the pylibraft package <https://anaconda.org/rapidsai/pylibraft> from the rapidsai Conda channel to use the features listed on this page.

Note

Currently, the distance module is not supported on AMD ROCm platforms.

Distance matrix computations#

Distance matrix computation from a collection of raw observation vectors stored in a rectangular array.

pdist(X[, metric, out])

Compute distance between observations in n-dimensional space.

cdist(XA, XB[, metric, out])

Compute distance between each pair of the two collections of inputs.

distance_matrix(x, y[, p])

Compute the distance matrix.

Distance functions#

Distance functions between two numeric vectors u and v. Computing distances over a large collection of vectors is inefficient for these functions. Use cdist for this purpose.

minkowski(u, v, p)

Compute the Minkowski distance between two 1-D arrays.

canberra(u, v)

Compute the Canberra distance between two 1-D arrays.

chebyshev(u, v)

Compute the Chebyshev distance between two 1-D arrays.

cityblock(u, v)

Compute the City Block (Manhattan) distance between two 1-D arrays.

correlation(u, v)

Compute the correlation distance between two 1-D arrays.

cosine(u, v)

Compute the Cosine distance between two 1-D arrays.

hamming(u, v)

Compute the Hamming distance between two 1-D arrays.

euclidean(u, v)

Compute the Euclidean distance between two 1-D arrays.

jensenshannon(u, v)

Compute the Jensen-Shannon distance between two 1-D arrays.

russellrao(u, v)

Compute the Russell-Rao distance between two 1-D arrays.

sqeuclidean(u, v)

Compute the squared Euclidean distance between two 1-D arrays.

hellinger(u, v)

Compute the Hellinger distance between two 1-D arrays.

kl_divergence(u, v)

Compute the Kullback-Leibler divergence between two 1-D arrays.

Special functions (cupyx.scipy.special)#

Bessel functions#

j0

Bessel function of the first kind of order 0.

j1

Bessel function of the first kind of order 1.

k0

Modified Bessel function of the second kind of order 0.

k0e

Exponentially scaled modified Bessel function K of order 0.

k1

Modified Bessel function of the second kind of order 1.

k1e

Exponentially scaled modified Bessel function K of order 1.

y0

Bessel function of the second kind of order 0.

y1

Bessel function of the second kind of order 1.

yn

Bessel function of the second kind of order n.

i0

Modified Bessel function of order 0.

i0e

Exponentially scaled modified Bessel function of order 0.

i1

Modified Bessel function of order 1.

i1e

Exponentially scaled modified Bessel function of order 1.

spherical_yn(n, z[, derivative])

Spherical Bessel function of the second kind or its derivative.

Raw statistical functions#

bdtr

Binomial distribution cumulative distribution function.

bdtrc

Binomial distribution survival function.

bdtri

Inverse function to bdtr with respect to p.

btdtr

Cumulative distribution function of the beta distribution.

btdtri

The p-th quantile of the beta distribution.

fdtr

F cumulative distribution function.

fdtrc

F survival function.

fdtri

The p-th quantile of the F-distribution.

gdtr

Gamma distribution cumulative distribution function.

gdtrc

Gamma distribution survival function.

nbdtr

Negative binomial distribution cumulative distribution function.

nbdtrc

Negative binomial distribution survival function.

nbdtri

Inverse function to nbdtr with respect to p.

pdtr

Poisson cumulative distribution function.

pdtrc

Poisson survival function.

pdtri

Inverse function to pdtr with respect to m.

chdtr

Chi-square cumulative distribution function.

chdtrc

Chi-square survival function.

chdtri

Inverse to chdtrc with respect to x.

ndtr

Cumulative distribution function of normal distribution.

log_ndtr

Logarithm of Gaussian cumulative distribution function.

ndtri

Inverse of the cumulative distribution function of the standard normal distribution.

logit

Logit function.

expit

Logistic sigmoid function (expit).

log_expit

Logarithm of the logistic sigmoid function.

boxcox

Compute the Box-Cox transformation.

boxcox1p

Compute the Box-Cox transformation of 1 + x.

inv_boxcox

Compute the inverse of the Box-Cox transformation.

inv_boxcox1p

Compute the inverse of the Box-Cox transformation of 1 + x.

Information Theory functions#

entr

Elementwise function for computing entropy.

rel_entr

Elementwise function for computing relative entropy.

kl_div

Elementwise function for computing Kullback-Leibler divergence.

huber

Elementwise function for computing the Huber loss.

pseudo_huber

Elementwise function for computing the Pseudo-Huber loss.

Error function and Fresnel integrals#

erf

Error function.

erfc

Complementary error function.

erfcx

Scaled complementary error function.

erfinv

Inverse function of error function.

erfcinv

Inverse function of complementary error function.

Legendre functions#

lpmv

Associated Legendre function of integer order and real degree.

sph_harm

Spherical Harmonic.

Other special functions#

exp1

Exponential integral E1.

expi

Exponential integral Ei.

expn

Generalized exponential integral En.

exprel

Computes (exp(x) - 1) / x.

softmax(x[, axis])

Softmax function.

log_softmax(x[, axis])

Compute the logarithm of the softmax function.

zeta

Hurwitz zeta function.

zetac

Riemann zeta function minus 1.

Convenience functions#

cbrt

Cube root.

exp10

Computes 10**x.

exp2

Computes 2**x.

radian

Convert degrees, minutes, and seconds to radians.

cosdg

Cosine of x with x in degrees.

sindg

Sine of x with x in degrees.

tandg

Tangent of x with x in degrees.

cotdg

Cotangent of x with x in degrees.

log1p

Elementwise function for scipy.special.log1p.

expm1

Computes exp(x) - 1.

cosm1

Computes cos(x) - 1.

round(a[, decimals, out])

xlogy

Compute x*log(y) so that the result is 0 if x = 0.

xlog1py

Compute x*log1p(y) so that the result is 0 if x = 0.

logsumexp(a[, axis, b, keepdims, return_sign])

Compute the log of the sum of exponentials of input elements.

sinc

Elementwise sinc function.

Statistical functions (cupyx.scipy.stats)#

Summary statistics#

trim_mean(a, proportiontocut[, axis])

Return mean of array after trimming distribution from both tails.

entropy(pk[, qk, base, axis])

Calculate the entropy of a distribution for given probability values.

Other statistical functionality#

boxcox_llf(lmb, data)

The boxcox log-likelihood function.

zmap(scores, compare[, axis, ddof, nan_policy])

Calculate the relative z-scores.

zscore(a[, axis, ddof, nan_policy])

Compute the z-score.

CuPy-specific functions#

CuPy-specific functions are placed under the cupyx namespace.

cupyx.rsqrt

Returns the reciprocal square root.

cupyx.scatter_add(a, slices, value)

Adds given values to specified elements of an array.

cupyx.scatter_max(a, slices, value)

Stores a maximum value of elements specified by indices to an array.

cupyx.scatter_min(a, slices, value)

Stores a minimum value of elements specified by indices to an array.

cupyx.empty_pinned(shape[, dtype, order])

Returns a new, uninitialized NumPy array with the given shape and dtype.

cupyx.empty_like_pinned(a[, dtype, order, ...])

Returns a new, uninitialized NumPy array with the same shape and dtype as those of the given array.

cupyx.zeros_pinned(shape[, dtype, order])

Returns a new, zero-initialized NumPy array with the given shape and dtype.

cupyx.zeros_like_pinned(a[, dtype, order, ...])

Returns a new, zero-initialized NumPy array with the same shape and dtype as those of the given array.

Profiling utilities#

cupyx.profiler.benchmark(func[, args, ...])

Timing utility for measuring time spent by both CPU and GPU.

cupyx.profiler.time_range([message, ...])

Mark function calls with ranges using NVTX/rocTX.

cupyx.profiler.profile()

Enable CUDA profiling within a with statement.

DLPack utilities#

Below are helper functions for creating a cupy.ndarray from either a DLPack tensor or any object supporting the DLPack data exchange protocol. For further detail see DLPack.

cupy.from_dlpack(array)

Zero-copy conversion between array objects compliant with the DLPack data exchange protocol.
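The zero-copy behaviour can be checked with a small sketch. The NumPy fallback is an assumption added so that the snippet also runs without a GPU (numpy.from_dlpack requires a reasonably recent NumPy); the array contents are arbitrary.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    xp = np  # assumed fallback for illustration only

source = xp.arange(4, dtype=xp.float32)

# from_dlpack accepts any object implementing the DLPack protocol and
# returns an array that shares the same underlying buffer.
view = xp.from_dlpack(source)

# Because no copy was made, a write through one name is visible via the other.
source[0] = 42
shares_memory = bool(view[0] == 42)
```

The same call accepts arrays from other libraries that implement the protocol, provided the data lives on a device the consumer can access.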

Automatic Kernel Parameters Optimizations (cupyx.optimizing)#

cupyx.optimizing.optimize(*[, key, path, ...])

Context manager that optimizes kernel launch parameters.

Low-level CUDA support#

Device management#

cupy.cuda.Device([device])

Object that represents a CUDA device.

Memory management#

cupy.get_default_memory_pool()

Returns CuPy default memory pool for GPU memory.

cupy.get_default_pinned_memory_pool()

Returns CuPy default memory pool for pinned memory.

cupy.cuda.Memory(size_t size)

Memory allocation on a CUDA device.

cupy.cuda.MemoryAsync(size_t size, stream)

Asynchronous memory allocation on a CUDA device.

cupy.cuda.ManagedMemory(size_t size)

Managed memory (Unified memory) allocation on a CUDA device.

cupy.cuda.UnownedMemory(intptr_t ptr, ...)

CUDA memory that is not owned by CuPy.

cupy.cuda.PinnedMemory(size[, flags])

Pinned memory allocation on host.

cupy.cuda.MemoryPointer(BaseMemory mem, ...)

Pointer to a location in device memory.

cupy.cuda.PinnedMemoryPointer(mem, ...)

Pointer to pinned memory.

cupy.cuda.malloc_managed(size_t size)

Allocate managed memory (unified memory).

cupy.cuda.malloc_async(size_t size)

(Experimental) Allocate memory from Stream Ordered Memory Allocator.

cupy.cuda.alloc(size)

Calls the current allocator.

cupy.cuda.alloc_pinned_memory(size_t size)

Calls the current allocator.

cupy.cuda.get_allocator()

Returns the current allocator for GPU memory.

cupy.cuda.set_allocator([allocator])

Sets the current allocator for GPU memory.

cupy.cuda.using_allocator([allocator])

Sets a thread-local allocator for GPU memory inside a context manager.

cupy.cuda.set_pinned_memory_allocator([...])

Sets the current allocator for the pinned memory.

cupy.cuda.MemoryPool([allocator])

Memory pool for all GPU devices on the host.

cupy.cuda.MemoryAsyncPool([pool_handles])

(Experimental) CUDA memory pool for all GPU devices on the host.

cupy.cuda.PinnedMemoryPool([allocator])

Memory pool for pinned memory on the host.

cupy.cuda.PythonFunctionAllocator(...)

Allocator with python functions to perform memory allocation.

cupy.cuda.CFunctionAllocator(intptr_t param, ...)

Allocator with C function pointers to allocation routines.

Memory hook#

cupy.cuda.MemoryHook()

Base class of hooks for Memory allocations.

cupy.cuda.memory_hooks.DebugPrintHook([...])

Memory hook that prints debug information.

cupy.cuda.memory_hooks.LineProfileHook([...])

Code line CuPy memory profiler.

Streams and events#

cupy.cuda.Stream([null, non_blocking, ptds])

CUDA stream.

cupy.cuda.ExternalStream(ptr[, device_id])

CUDA stream not managed by CuPy.

cupy.cuda.get_current_stream()

Gets current CUDA stream.

cupy.cuda.Event([block, disable_timing, ...])

CUDA event, a synchronization point of CUDA streams.

cupy.cuda.get_elapsed_time(start_event, ...)

Gets the elapsed time between two events.

Graphs#

cupy.cuda.Graph(*args, **kwargs)

The CUDA graph object.

Texture and surface memory#

cupy.cuda.texture.ChannelFormatDescriptor(...)

A class that holds the channel format description.

cupy.cuda.texture.CUDAarray(...)

Allocate a CUDA array (cudaArray_t) that can be used as texture memory.

cupy.cuda.texture.ResourceDescriptor(...)

A class that holds the resource description.

cupy.cuda.texture.TextureDescriptor([...])

A class that holds the texture description.

cupy.cuda.texture.TextureObject(...)

A class that holds a texture object.

cupy.cuda.texture.SurfaceObject(...)

A class that holds a surface object.

Profiler#

cupy.cuda.profile()

Enables CUDA profiling during a with statement.

cupy.cuda.profiler.initialize(...)

Initialize the CUDA profiler.

cupy.cuda.profiler.start()

Enable profiling.

cupy.cuda.profiler.stop()

Disable profiling.

cupy.cuda.nvtx.Mark(message, int id_color=-1)

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.MarkC(message, uint32_t color=0)

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.RangePush(message, ...)

Starts a nested range.

cupy.cuda.nvtx.RangePushC(message, ...)

Starts a nested range.

cupy.cuda.nvtx.RangePop()

Ends a nested range started by a RangePush*() call.

NCCL#

cupy.cuda.nccl.NcclCommunicator(int ndev, ...)

Initialize an NCCL communicator for one device controlled by one process.

cupy.cuda.nccl.get_build_version()

cupy.cuda.nccl.get_version()

Returns the runtime version of NCCL.

cupy.cuda.nccl.get_unique_id()

cupy.cuda.nccl.groupStart()

Start a group of NCCL calls.

cupy.cuda.nccl.groupEnd()

End a group of NCCL calls.

Runtime API#

CuPy wraps CUDA Runtime APIs to provide the native CUDA operations. Please check the CUDA Runtime API documentation to use these functions.

cupy.cuda.runtime.driverGetVersion()

cupy.cuda.runtime.runtimeGetVersion()

cupy.cuda.runtime.getDevice()

cupy.cuda.runtime.getDeviceProperties(int device)

cupy.cuda.runtime.deviceGetAttribute(...)

cupy.cuda.runtime.deviceGetByPCIBusId(...)

cupy.cuda.runtime.deviceGetPCIBusId(int device)

cupy.cuda.runtime.deviceGetDefaultMemPool(...)

Get the default mempool on the current device.

cupy.cuda.runtime.deviceGetMemPool(int device)

Get the current mempool on the current device.

cupy.cuda.runtime.deviceSetMemPool(...)

Set the current mempool on the current device to pool.

cupy.cuda.runtime.memPoolCreate(...)

cupy.cuda.runtime.memPoolDestroy(intptr_t pool)

cupy.cuda.runtime.memPoolTrimTo(...)

cupy.cuda.runtime.getDeviceCount()

cupy.cuda.runtime.setDevice(int device)

cupy.cuda.runtime.deviceSynchronize()

cupy.cuda.runtime.deviceCanAccessPeer(...)

cupy.cuda.runtime.deviceEnablePeerAccess(...)

cupy.cuda.runtime.deviceGetLimit(int limit)

cupy.cuda.runtime.deviceSetLimit(int limit, ...)

cupy.cuda.runtime.malloc(size_t size)

cupy.cuda.runtime.mallocManaged(size_t size, ...)

cupy.cuda.runtime.malloc3DArray(...)

cupy.cuda.runtime.mallocArray(...)

cupy.cuda.runtime.mallocAsync(size_t size, ...)

cupy.cuda.runtime.mallocFromPoolAsync(...)

cupy.cuda.runtime.hostAlloc(size_t size, ...)

cupy.cuda.runtime.hostRegister(intptr_t ptr, ...)

cupy.cuda.runtime.hostUnregister(intptr_t ptr)

cupy.cuda.runtime.free(intptr_t ptr)

cupy.cuda.runtime.freeHost(intptr_t ptr)

cupy.cuda.runtime.freeArray(intptr_t ptr)

cupy.cuda.runtime.freeAsync(intptr_t ptr, ...)

cupy.cuda.runtime.memGetInfo()

cupy.cuda.runtime.memcpy(intptr_t dst, ...)

cupy.cuda.runtime.memcpyAsync(intptr_t dst, ...)

cupy.cuda.runtime.memcpyPeer(intptr_t dst, ...)

cupy.cuda.runtime.memcpyPeerAsync(...)

cupy.cuda.runtime.memcpy2D(intptr_t dst, ...)

cupy.cuda.runtime.memcpy2DAsync(...)

cupy.cuda.runtime.memcpy2DFromArray(...)

cupy.cuda.runtime.memcpy2DFromArrayAsync(...)

cupy.cuda.runtime.memcpy2DToArray(...)

cupy.cuda.runtime.memcpy2DToArrayAsync(...)

cupy.cuda.runtime.memcpy3D(...)

cupy.cuda.runtime.memcpy3DAsync(...)

cupy.cuda.runtime.memset(intptr_t ptr, ...)

cupy.cuda.runtime.memsetAsync(intptr_t ptr, ...)

cupy.cuda.runtime.memPrefetchAsync(...)

cupy.cuda.runtime.memAdvise(intptr_t devPtr, ...)

cupy.cuda.runtime.pointerGetAttributes(...)

cupy.cuda.runtime.streamCreate()

cupy.cuda.runtime.streamCreateWithFlags(...)

cupy.cuda.runtime.streamDestroy(intptr_t stream)

cupy.cuda.runtime.streamSynchronize(...)

cupy.cuda.runtime.streamAddCallback(...)

cupy.cuda.runtime.streamQuery(intptr_t stream)

cupy.cuda.runtime.streamWaitEvent(...)

cupy.cuda.runtime.launchHostFunc(...)

cupy.cuda.runtime.eventCreate()

cupy.cuda.runtime.eventCreateWithFlags(...)

cupy.cuda.runtime.eventDestroy(intptr_t event)

cupy.cuda.runtime.eventElapsedTime(...)

cupy.cuda.runtime.eventQuery(intptr_t event)

cupy.cuda.runtime.eventRecord(...)

cupy.cuda.runtime.eventSynchronize(...)

cupy.cuda.runtime.ipcGetMemHandle(...)

cupy.cuda.runtime.ipcOpenMemHandle(...)

cupy.cuda.runtime.ipcCloseMemHandle(...)

cupy.cuda.runtime.ipcGetEventHandle(...)

cupy.cuda.runtime.ipcOpenEventHandle(...)
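As an illustration of the thin runtime wrappers listed above, the sketch below enumerates devices and queries their memory. `describe_devices` is a hypothetical helper, and the `HAVE_GPU` guard is an assumption added so that the snippet degrades gracefully without CuPy or a CUDA device.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is missing, or no usable CUDA device was found
    cp = None
    HAVE_GPU = False


def describe_devices():
    """Collect (id, name, free_bytes, total_bytes) per device (hypothetical helper)."""
    rt = cp.cuda.runtime
    devices = []
    for dev in range(rt.getDeviceCount()):
        props = rt.getDeviceProperties(dev)
        name = props["name"]
        if isinstance(name, bytes):  # the name may be returned as bytes
            name = name.decode()
        with cp.cuda.Device(dev):    # memGetInfo reports on the current device
            free_bytes, total_bytes = rt.memGetInfo()
        devices.append((dev, name, free_bytes, total_bytes))
    return devices


if HAVE_GPU:
    print("driver:", cp.cuda.runtime.driverGetVersion(),
          "runtime:", cp.cuda.runtime.runtimeGetVersion())
    for entry in describe_devices():
        print(entry)
```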

Custom kernels#

cupy.ElementwiseKernel(in_params, ...[, ...])

User-defined elementwise kernel.

cupy.ReductionKernel(unicode in_params, ...)

User-defined reduction kernel.

cupy.RawKernel(unicode code, unicode name, ...)

User-defined custom kernel.

cupy.RawModule(unicode code=None, *, ...[, ...])

User-defined custom module.

cupy.fuse(*args, **kwargs)

Decorator that fuses a function.
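A short sketch of `cupy.ElementwiseKernel` follows. The kernel body, the name `squared_diff`, and the `make_squared_diff` builder are illustrative choices, and the `HAVE_GPU` guard is an assumption added so that the snippet loads without CuPy or a CUDA device.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is missing, or no usable CUDA device was found
    cp = None
    HAVE_GPU = False


def make_squared_diff():
    """Build an element-wise kernel computing (x - y)^2 (illustrative)."""
    return cp.ElementwiseKernel(
        'float32 x, float32 y',   # input parameters
        'float32 z',              # output parameter
        'z = (x - y) * (x - y)',  # CUDA code applied to each element
        'squared_diff')           # kernel name


if HAVE_GPU:
    squared_diff = make_squared_diff()
    x = cp.arange(6, dtype=cp.float32)
    y = cp.full(6, 2, dtype=cp.float32)
    print(squared_diff(x, y))
```

The kernel is compiled on first call, so constructing it is cheap; broadcasting between the inputs follows the usual array rules.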

JIT kernel definition#

Supported Python built-in functions include: range, len(), max(), min().

Note

If loop unrolling is needed, use cupyx.jit.range() instead of the built-in range.

cupyx.jit.rawkernel(*[, mode, device])

A decorator that compiles a Python function into a CUDA kernel.

cupyx.jit.threadIdx

dim3 threadIdx

cupyx.jit.blockDim

dim3 blockDim

cupyx.jit.blockIdx

dim3 blockIdx

cupyx.jit.gridDim

dim3 gridDim

cupyx.jit.grid(ndim)

Compute the thread index in the grid.

cupyx.jit.gridsize(ndim)

Compute the grid size.

cupyx.jit.laneid()

Returns the lane ID of the calling thread, ranging in [0, jit.warpsize).

cupyx.jit.warpsize

Returns the number of threads in a warp.

cupyx.jit.range(*args[, unroll])

Range with loop unrolling support.

cupyx.jit.syncthreads()

Calls __syncthreads().

cupyx.jit.syncwarp(*[, mask])

Calls __syncwarp().

cupyx.jit.shfl_sync(mask, var, val_id, *[, ...])

Calls the __shfl_sync function.

cupyx.jit.shfl_up_sync(mask, var, val_id, *)

Calls the __shfl_up_sync function.

cupyx.jit.shfl_down_sync(mask, var, val_id, *)

Calls the __shfl_down_sync function.

cupyx.jit.shfl_xor_sync(mask, var, val_id, *)

Calls the __shfl_xor_sync function.

cupyx.jit.shared_memory(dtype, size[, alignment])

Allocates shared memory and returns it as a 1-D array.

cupyx.jit.atomic_add(array, index, value[, ...])

Calls the atomicAdd function to operate atomically on array[index].

cupyx.jit.atomic_sub(array, index, value[, ...])

Calls the atomicSub function to operate atomically on array[index].

cupyx.jit.atomic_exch(array, index, value[, ...])

Calls the atomicExch function to operate atomically on array[index].

cupyx.jit.atomic_min(array, index, value[, ...])

Calls the atomicMin function to operate atomically on array[index].

cupyx.jit.atomic_max(array, index, value[, ...])

Calls the atomicMax function to operate atomically on array[index].

cupyx.jit.atomic_inc(array, index, value[, ...])

Calls the atomicInc function to operate atomically on array[index].

cupyx.jit.atomic_dec(array, index, value[, ...])

Calls the atomicDec function to operate atomically on array[index].

cupyx.jit.atomic_cas(array, index, value[, ...])

Calls the atomicCAS function to operate atomically on array[index].

cupyx.jit.atomic_and(array, index, value[, ...])

Calls the atomicAnd function to operate atomically on array[index].

cupyx.jit.atomic_or(array, index, value[, ...])

Calls the atomicOr function to operate atomically on array[index].

cupyx.jit.atomic_xor(array, index, value[, ...])

Calls the atomicXor function to operate atomically on array[index].

cupyx.jit.cg.this_grid()

Returns the current grid group (_GridGroup).

cupyx.jit.cg.this_thread_block()

Returns the current thread block group (_ThreadBlockGroup).

cupyx.jit.cg.sync(group)

Calls cg::sync().

cupyx.jit.cg.memcpy_async(group, dst, ...[, ...])

Calls cg::memcpy_async().

cupyx.jit.cg.wait(group)

Calls cg::wait().

cupyx.jit.cg.wait_prior(group)

Calls cg::wait_prior<N>().

cupyx.jit._interface._JitRawKernel(func, ...)

JIT CUDA kernel object.
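The JIT entries above can be combined into a grid-stride copy kernel, sketched below with `cupyx.jit.rawkernel`, `cupyx.jit.grid` and `cupyx.jit.gridsize`. The `make_copy_kernel` builder, the launch configuration, and the `HAVE_GPU` guard are assumptions of this sketch.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is missing, or no usable CUDA device was found
    cp = None
    HAVE_GPU = False


def make_copy_kernel():
    """Define a JIT kernel lazily so the import only happens when needed."""
    from cupyx import jit

    @jit.rawkernel()
    def elementwise_copy(x, y, size):
        tid = jit.grid(1)        # global index of this thread
        ntid = jit.gridsize(1)   # total number of threads in the grid
        for i in range(tid, size, ntid):  # grid-stride loop
            y[i] = x[i]

    return elementwise_copy


if HAVE_GPU:
    kernel = make_copy_kernel()
    size = cp.uint32(1 << 20)
    x = cp.arange(1 << 20, dtype=cp.float32)
    y = cp.zeros_like(x)
    kernel((128,), (1024,), (x, y, size))  # (grid, block, arguments)
    print(bool((x == y).all()))
```

The grid-stride loop lets a fixed launch configuration cover arrays of any length.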

Kernel binary memoization#

cupy.memoize(bool for_each_device=False)

Makes a function memoizing the result for each argument and device.

cupy.clear_memo()

Clears the memoized results for all functions decorated by memoize.
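One plausible use of `cupy.memoize` is caching kernel objects keyed by their arguments, sketched below. The `make_kernel_factory` builder, the `scale_kernel` function, and the `HAVE_GPU` guard are hypothetical names introduced for this illustration.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is missing, or no usable CUDA device was found
    cp = None
    HAVE_GPU = False


def make_kernel_factory():
    """Return a memoized function that builds one kernel per dtype name."""

    @cp.memoize(for_each_device=True)
    def scale_kernel(dtype_name):
        # Built once per (device, dtype_name); later calls reuse the object.
        return cp.ElementwiseKernel(
            f'{dtype_name} x, {dtype_name} a',
            f'{dtype_name} y',
            'y = a * x',
            f'scale_{dtype_name}')

    return scale_kernel


if HAVE_GPU:
    factory = make_kernel_factory()
    k1 = factory('float32')
    k2 = factory('float32')
    print(k1 is k2)  # the second call returns the cached kernel object
```

Calling `cupy.clear_memo()` afterwards would drop the cached results, so the next call builds a new kernel object.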

Distributed#

The following pages describe the APIs used to easily perform communication between different processes in CuPy.

init_process_group(n_devices, rank, *[, ...])

Start cupyx.distributed and obtain a communicator.

NCCLBackend(n_devices, rank[, host, port, ...])

Interface that uses NVIDIA's NCCL to perform communications.

Environment variables#

For runtime#

Here are the environment variables that CuPy uses at runtime.

CUDA_PATH#

Path to the directory containing CUDA. The parent of the directory containing nvcc is used as default. When nvcc is not found, /usr/local/cuda is used. See Working with Custom CUDA Installation for details.

CUPY_CACHE_DIR#

Default: ${HOME}/.cupy/kernel_cache

Path to the directory to store kernel cache. See Performance Best Practices for details.
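For scripts that need to locate or clean the kernel cache, the documented default can be mirrored as in the sketch below. `resolve_cache_dir` is a hypothetical helper, not a CuPy function, and it assumes that `${HOME}` corresponds to the user's home directory as returned by `os.path.expanduser`.

```python
import os


def resolve_cache_dir(environ=None):
    """Return CUPY_CACHE_DIR if set, else the documented default location."""
    env = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser("~"), ".cupy", "kernel_cache")
    return env.get("CUPY_CACHE_DIR", default)


print(resolve_cache_dir({"CUPY_CACHE_DIR": "/tmp/cupy_cache"}))
print(resolve_cache_dir({}))
```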

CUPY_CACHE_SAVE_CUDA_SOURCE#

Default: 0

If set to 1, the CUDA source file will be saved along with the compiled binary in the cache directory for debugging purposes. Note: the source file will not be saved if the compiled binary is already stored in the cache.

CUPY_CACHE_IN_MEMORY#

Default: 0

If set to 1, CUPY_CACHE_DIR and CUPY_CACHE_SAVE_CUDA_SOURCE will be ignored, and the cache is in memory. This environment variable allows reducing disk I/O, but is ignored when nvcc is set to be the compiler backend.

CUPY_DUMP_CUDA_SOURCE_ON_ERROR#

Default: 0

If set to 1, when CUDA kernel compilation fails, CuPy dumps CUDA kernel code to standard error.

CUPY_CUDA_COMPILE_WITH_DEBUG#

Default: 0

If set to 1, CUDA kernel will be compiled with debug information (--device-debug and --generate-line-info).

CUPY_GPU_MEMORY_LIMIT#

Default: 0 (unlimited)

The amount of memory that can be allocated for each device. The value can be specified in absolute bytes or fraction (e.g., "90%") of the total memory of each GPU. See Memory Management for details.
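To make the two accepted forms concrete, the sketch below interprets a value either as absolute bytes or as a percentage of a device's total memory. `parse_memory_limit` is a hypothetical helper written for illustration; CuPy's own parsing may differ in detail.

```python
def parse_memory_limit(value, total_bytes):
    """Interpret a CUPY_GPU_MEMORY_LIMIT-style string.

    Returns a byte count, where 0 means unlimited.
    """
    text = value.strip()
    if text.endswith("%"):
        fraction = float(text[:-1]) / 100.0
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"percentage out of range: {value!r}")
        return int(total_bytes * fraction)  # fraction of this device's memory
    return int(text)                        # absolute number of bytes


print(parse_memory_limit("50%", 8 * 1024 ** 3))
print(parse_memory_limit("1073741824", 8 * 1024 ** 3))
```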

CUPY_SEED#

Set the seed for random number generators.

CUPY_EXPERIMENTAL_SLICE_COPY#

Default: 0

If set to 1, the following syntax is enabled:

cupy_ndarray[:] = numpy_ndarray

CUPY_ACCELERATORS#

Default: "cub" (in a ROCm HIP environment, the default value is "", i.e., no accelerators are used)

A comma-separated string of backend names (cub, cutensor, or cutensornet) which indicates the acceleration backends used in CuPy operations and their priority (in descending order). By default, all accelerators are disabled on HIP and only CUB is enabled on CUDA.
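The value format can be illustrated with a small parser that returns backend names in priority order. `parse_accelerators` and `KNOWN_BACKENDS` are hypothetical names for this sketch and do not reflect how CuPy reads the variable internally.

```python
KNOWN_BACKENDS = ("cub", "cutensor", "cutensornet")


def parse_accelerators(value):
    """Split a CUPY_ACCELERATORS-style string into names, highest priority first."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in KNOWN_BACKENDS]
    if unknown:
        raise ValueError(f"unknown accelerator(s): {unknown}")
    return names


print(parse_accelerators("cub,cutensor"))
print(parse_accelerators(""))  # empty string: no accelerators
```

In practice the variable would typically be set in the environment before `cupy` is imported, for example in the shell that launches the program.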

CUPY_TF32#

Default: 0

If set to 1, it allows CUDA libraries to use Tensor Cores TF32 compute for 32-bit floating point compute.

CUPY_CUDA_ARRAY_INTERFACE_SYNC#

Default: 1

This controls CuPy’s behavior as a Consumer. If set to 0, a stream synchronization will not be performed when a device array provided by an external library that implements the CUDA Array Interface is being consumed by CuPy. For more detail, see the Synchronization requirement in the CUDA Array Interface v3 documentation.

CUPY_CUDA_ARRAY_INTERFACE_EXPORT_VERSION#

Default: 3

This controls CuPy’s behavior as a Producer. If set to 2, the CuPy stream on which the data is being operated will not be exported and thus the Consumer (another library) will not perform any stream synchronization. For more detail, see the Synchronization requirement in the CUDA Array Interface v3 documentation.

CUPY_DLPACK_EXPORT_VERSION#

Default: 0.6

This controls CuPy’s DLPack support. Currently, setting a value smaller than 0.6 disguises managed memory as normal device memory, which enables data exchange with libraries that have not updated their DLPack support, whereas starting with 0.6, CUDA managed memory is correctly recognized as a valid device type.

NVCC#

Default: nvcc

Define the compiler to use when compiling CUDA source. Note that most CuPy kernels are built with NVRTC; this environment variable is only effective for RawKernel/RawModule with the nvcc backend or when using cub as the accelerator.

CUPY_CUDA_PER_THREAD_DEFAULT_STREAM#

Default: 0

If set to 1, CuPy will use the CUDA per-thread default stream, effectively causing each host thread to automatically execute in its own stream, unless the CUDA default (null) stream or a user-created stream is specified. If set to 0 (default), the CUDA default (null) stream is used, unless the per-thread default stream (ptds) or a user-created stream is specified.

CUPY_COMPILE_WITH_PTX#

Default: 0

By default, CuPy directly compiles kernels into SASS (CUBIN) to support CUDA Enhanced Compatibility. If set to 1, CuPy instead compiles kernels into PTX and lets the CUDA Driver assemble SASS from PTX. This option is only effective for CUDA 11.1 or later; CuPy always compiles into PTX on earlier CUDA versions. Also, this option only applies when NVRTC is selected as the compilation backend. The NVCC backend always compiles into SASS (CUBIN).

CUDA Toolkit Environment Variables

In addition to the environment variables listed above, as in any CUDA program, all of the CUDA environment variables listed in the CUDA Toolkit Documentation will also be honored.

Note

When CUPY_ACCELERATORS or NVCC environment variables are set, g++-6 or later is required as the runtime host compiler. Please refer to Installing CuPy from Source for the details on how to install g++.

For installation#

These environment variables are used during installation (building CuPy from source).

CUTENSOR_PATH#

Path to the cuTENSOR root directory that contains lib and include directories. (experimental)

CUPY_INSTALL_USE_HIP#

Default: 0

If set to 1, CuPy is built for AMD ROCm Platform (experimental). For building the ROCm support, see Installing Binary Packages for further detail.

CUPY_USE_CUDA_PYTHON#

Default: 0

If set to 1, CuPy is built using CUDA Python.

CUPY_NVCC_GENERATE_CODE#

Build CuPy for a particular CUDA architecture. For example:

CUPY_NVCC_GENERATE_CODE="arch=compute_60,code=sm_60"

To specify multiple archs, concatenate the arch=... strings with semicolons (;). If current is specified, the currently installed GPU architectures are detected automatically at build time. When this is not set, the default is to support all architectures.
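The semicolon-joined format can be generated from a list of compute capabilities, as in the sketch below. `generate_code_value` is a hypothetical helper, and the capability numbers are only examples.

```python
def generate_code_value(compute_capabilities):
    """Build a CUPY_NVCC_GENERATE_CODE value from capabilities such as 60 or 70."""
    return ";".join(
        f"arch=compute_{cc},code=sm_{cc}" for cc in compute_capabilities
    )


print(generate_code_value([60, 70]))
```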

CUPY_NUM_BUILD_JOBS#

Default: 4

Sets the number of processes used to build the extensions in parallel, which enables or disables parallel builds.

CUPY_NUM_NVCC_THREADS#

Default: 2

Sets the number of threads used to compile files using nvcc, which enables or disables nvcc parallel compilation.

Additionally, the environment variables CUDA_PATH and NVCC are also respected at build time.

Comparison Table#

Here is a list of NumPy / SciPy APIs and their corresponding CuPy implementations.

A - in the CuPy column denotes that a CuPy implementation is not provided yet. We welcome contributions for these functions.
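Because the names in the tables below match one-to-one, the same function can often run against either module. The sketch below picks CuPy when a device is usable and falls back to NumPy otherwise; `pick_array_module` and `normalize` are hypothetical helpers for this illustration, and only `asarray`, `mean` and `std` from the table are used.

```python
import numpy as np


def pick_array_module():
    """Return cupy if it imports and a device is present, else numpy."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:  # CuPy is missing, or no usable CUDA device was found
        pass
    return np


def normalize(xp, values):
    """Shift to zero mean and scale to unit standard deviation."""
    a = xp.asarray(values, dtype=xp.float64)
    return (a - xp.mean(a)) / xp.std(a)


xp = pick_array_module()
result = normalize(xp, [1.0, 2.0, 3.0, 4.0])
# CuPy arrays expose .get() to copy to the host; NumPy arrays are used as-is.
host = result.get() if hasattr(result, "get") else result
print(xp.__name__, host)
```

Passing the module as a parameter keeps the numerical code free of any direct dependency on either library.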

NumPy / CuPy APIs#

Module-Level#

NumPy

CuPy

numpy.DataSource

cupy.DataSource (alias of numpy.DataSource)

numpy.ScalarType

-

numpy.abs

cupy.abs

numpy.absolute

cupy.absolute

numpy.add

cupy.add

numpy.all

cupy.all

numpy.allclose

cupy.allclose

numpy.alltrue

cupy.alltrue

numpy.amax

cupy.amax

numpy.amin

cupy.amin

numpy.angle

cupy.angle

numpy.any

cupy.any

numpy.append

cupy.append

numpy.apply_along_axis

cupy.apply_along_axis

numpy.apply_over_axes

-

numpy.arange

cupy.arange

numpy.arccos

cupy.arccos

numpy.arccosh

cupy.arccosh

numpy.arcsin

cupy.arcsin

numpy.arcsinh

cupy.arcsinh

numpy.arctan

cupy.arctan

numpy.arctan2

cupy.arctan2

numpy.arctanh

cupy.arctanh

numpy.argmax

cupy.argmax

numpy.argmin

cupy.argmin

numpy.argpartition

cupy.argpartition

numpy.argsort

cupy.argsort

numpy.argwhere

cupy.argwhere

numpy.around

cupy.around

numpy.array

cupy.array

numpy.array2string

cupy.array2string

numpy.array_equal

cupy.array_equal

numpy.array_equiv

cupy.array_equiv

numpy.array_repr

cupy.array_repr

numpy.array_split

cupy.array_split

numpy.array_str

cupy.array_str

numpy.asanyarray

cupy.asanyarray

numpy.asarray

cupy.asarray

numpy.asarray_chkfinite

cupy.asarray_chkfinite

numpy.ascontiguousarray

cupy.ascontiguousarray

numpy.asfarray

cupy.asfarray

numpy.asfortranarray

cupy.asfortranarray

numpy.asmatrix

- 1

numpy.atleast_1d

cupy.atleast_1d

numpy.atleast_2d

cupy.atleast_2d

numpy.atleast_3d

cupy.atleast_3d

numpy.average

cupy.average

numpy.bartlett

cupy.bartlett

numpy.base_repr

cupy.base_repr

numpy.binary_repr

cupy.binary_repr

numpy.bincount

cupy.bincount

numpy.bitwise_and

cupy.bitwise_and

numpy.bitwise_not

cupy.bitwise_not

numpy.bitwise_or

cupy.bitwise_or

numpy.bitwise_xor

cupy.bitwise_xor

numpy.blackman

cupy.blackman

numpy.block

-

numpy.bmat

- 1

numpy.bool_

cupy.bool_ (alias of numpy.bool_)

numpy.broadcast

cupy.broadcast

numpy.broadcast_arrays

cupy.broadcast_arrays

numpy.broadcast_shapes

cupy.broadcast_shapes (alias of numpy.broadcast_shapes)

numpy.broadcast_to

cupy.broadcast_to

numpy.busday_count

- 2

numpy.busday_offset

- 2

numpy.busdaycalendar

- 2

numpy.byte

cupy.byte (alias of numpy.byte)

numpy.byte_bounds

cupy.byte_bounds

numpy.bytes_

- 3

numpy.c_

cupy.c_

numpy.can_cast

cupy.can_cast

numpy.cast

-

numpy.cbrt

cupy.cbrt

numpy.cdouble

cupy.cdouble (alias of numpy.cdouble)

numpy.ceil

cupy.ceil

numpy.cfloat

cupy.cfloat (alias of numpy.cfloat)

numpy.character

- 3

numpy.chararray

- 3

numpy.choose

cupy.choose

numpy.clip

cupy.clip

numpy.clongdouble

-

numpy.clongfloat

-

numpy.column_stack

cupy.column_stack

numpy.common_type

cupy.common_type

numpy.compare_chararrays

- 3

numpy.complex128

cupy.complex128 (alias of numpy.complex128)

numpy.complex256

-

numpy.complex64

cupy.complex64 (alias of numpy.complex64)

numpy.complex_

cupy.complex_ (alias of numpy.complex_)

numpy.complexfloating

cupy.complexfloating (alias of numpy.complexfloating)

numpy.compress

cupy.compress

numpy.concatenate

cupy.concatenate

numpy.conj

cupy.conj

numpy.conjugate

cupy.conjugate

numpy.convolve

cupy.convolve

numpy.copy

cupy.copy

numpy.copysign

cupy.copysign

numpy.copyto

cupy.copyto

numpy.corrcoef

cupy.corrcoef

numpy.correlate

cupy.correlate

numpy.cos

cupy.cos

numpy.cosh

cupy.cosh

numpy.count_nonzero

cupy.count_nonzero

numpy.cov

cupy.cov

numpy.cross

cupy.cross

numpy.csingle

cupy.csingle (alias of numpy.csingle)

numpy.cumprod

cupy.cumprod

numpy.cumproduct

cupy.cumproduct

numpy.cumsum

cupy.cumsum

numpy.datetime64

- 2

numpy.datetime_as_string

- 2

numpy.datetime_data

- 2

numpy.deg2rad

cupy.deg2rad

numpy.degrees

cupy.degrees

numpy.delete

cupy.delete

numpy.deprecate

-

numpy.deprecate_with_doc

-

numpy.diag

cupy.diag

numpy.diag_indices

cupy.diag_indices

numpy.diag_indices_from

cupy.diag_indices_from

numpy.diagflat

cupy.diagflat

numpy.diagonal

cupy.diagonal

numpy.diff

cupy.diff

numpy.digitize

cupy.digitize

numpy.disp

cupy.disp (alias of numpy.disp)

numpy.divide

cupy.divide

numpy.divmod

cupy.divmod

numpy.dot

cupy.dot

numpy.double

cupy.double (alias of numpy.double)

numpy.dsplit

cupy.dsplit

numpy.dstack

cupy.dstack

numpy.dtype

cupy.dtype (alias of numpy.dtype)

numpy.ediff1d

cupy.ediff1d

numpy.einsum

cupy.einsum

numpy.einsum_path

-

numpy.empty

cupy.empty

numpy.empty_like

cupy.empty_like

numpy.equal

cupy.equal

numpy.errstate

- 4

numpy.exp

cupy.exp

numpy.exp2

cupy.exp2

numpy.expand_dims

cupy.expand_dims

numpy.expm1

cupy.expm1

numpy.extract

cupy.extract

numpy.eye

cupy.eye

numpy.fabs

cupy.fabs

numpy.fill_diagonal

cupy.fill_diagonal

numpy.find_common_type

cupy.find_common_type (alias of numpy.find_common_type)

numpy.finfo

cupy.finfo (alias of numpy.finfo)

numpy.fix

cupy.fix

numpy.flatiter

cupy.flatiter

numpy.flatnonzero

cupy.flatnonzero

numpy.flexible

- 3

numpy.flip

cupy.flip

numpy.fliplr

cupy.fliplr

numpy.flipud

cupy.flipud

numpy.float128

-

numpy.float16

cupy.float16 (alias of numpy.float16)

numpy.float32

cupy.float32 (alias of numpy.float32)

numpy.float64

cupy.float64 (alias of numpy.float64)

numpy.float_

cupy.float_ (alias of numpy.float_)

numpy.float_power

cupy.float_power

numpy.floating

cupy.floating (alias of numpy.floating)

numpy.floor

cupy.floor

numpy.floor_divide

cupy.floor_divide

numpy.fmax

cupy.fmax

numpy.fmin

cupy.fmin

numpy.fmod

cupy.fmod

numpy.format_float_positional

cupy.format_float_positional

numpy.format_float_scientific

cupy.format_float_scientific

numpy.format_parser

cupy.format_parser (alias of numpy.format_parser)

numpy.frexp

cupy.frexp

numpy.from_dlpack

cupy.from_dlpack

numpy.frombuffer

cupy.frombuffer

numpy.fromfile

cupy.fromfile

numpy.fromfunction

cupy.fromfunction

numpy.fromiter

cupy.fromiter

numpy.frompyfunc

-

numpy.fromregex

- 5

numpy.fromstring

cupy.fromstring

numpy.full

cupy.full

numpy.full_like

cupy.full_like

numpy.gcd

cupy.gcd

numpy.generic

cupy.generic (alias of numpy.generic)

numpy.genfromtxt

cupy.genfromtxt

numpy.geomspace

-

numpy.get_array_wrap

cupy.get_array_wrap (alias of numpy.get_array_wrap)

numpy.get_include

-

numpy.get_printoptions

cupy.get_printoptions (alias of numpy.get_printoptions)

numpy.getbufsize

-

numpy.geterr

- 4

numpy.geterrcall

- 4

numpy.geterrobj

- 4

numpy.gradient

cupy.gradient

numpy.greater

cupy.greater

numpy.greater_equal

cupy.greater_equal

numpy.half

cupy.half (alias of numpy.half)

numpy.hamming

cupy.hamming

numpy.hanning

cupy.hanning

numpy.heaviside

cupy.heaviside

numpy.histogram

cupy.histogram

numpy.histogram2d

cupy.histogram2d

numpy.histogram_bin_edges

-

numpy.histogramdd

cupy.histogramdd

numpy.hsplit

cupy.hsplit

numpy.hstack

cupy.hstack

numpy.hypot

cupy.hypot

numpy.i0

cupy.i0

numpy.identity

cupy.identity

numpy.iinfo

cupy.iinfo (alias of numpy.iinfo)

numpy.imag

cupy.imag

numpy.in1d

cupy.in1d

numpy.index_exp

cupy.index_exp (alias of numpy.index_exp)

numpy.indices

cupy.indices

numpy.inexact

cupy.inexact (alias of numpy.inexact)

numpy.info

-

numpy.inner

cupy.inner

numpy.insert

-

numpy.int16

cupy.int16 (alias of numpy.int16)

numpy.int32

cupy.int32 (alias of numpy.int32)

numpy.int64

cupy.int64 (alias of numpy.int64)

numpy.int8

cupy.int8 (alias of numpy.int8)

numpy.int_

cupy.int_ (alias of numpy.int_)

numpy.intc

cupy.intc (alias of numpy.intc)

numpy.integer

cupy.integer (alias of numpy.integer)

numpy.interp

cupy.interp

numpy.intersect1d

cupy.intersect1d

numpy.intp

cupy.intp (alias of numpy.intp)

numpy.invert

cupy.invert

numpy.is_busday

- 2

numpy.isclose

cupy.isclose

numpy.iscomplex

cupy.iscomplex

numpy.iscomplexobj

cupy.iscomplexobj

numpy.isfinite

cupy.isfinite

numpy.isfortran

cupy.isfortran

numpy.isin

cupy.isin

numpy.isinf

cupy.isinf

numpy.isnan

cupy.isnan

numpy.isnat

- 2

numpy.isneginf

cupy.isneginf

numpy.isposinf

cupy.isposinf

numpy.isreal

cupy.isreal

numpy.isrealobj

cupy.isrealobj

numpy.isscalar

cupy.isscalar

numpy.issctype

cupy.issctype (alias of numpy.issctype)

numpy.issubclass_

cupy.issubclass_ (alias of numpy.issubclass_)

numpy.issubdtype

cupy.issubdtype (alias of numpy.issubdtype)

numpy.issubsctype

cupy.issubsctype (alias of numpy.issubsctype)

numpy.iterable

cupy.iterable (alias of numpy.iterable)

numpy.ix_

cupy.ix_

numpy.kaiser

cupy.kaiser

numpy.kron

cupy.kron

numpy.lcm

cupy.lcm

numpy.ldexp

cupy.ldexp

numpy.left_shift

cupy.left_shift

numpy.less

cupy.less

numpy.less_equal

cupy.less_equal

numpy.lexsort

cupy.lexsort

numpy.linspace

cupy.linspace

numpy.load

cupy.load

numpy.loadtxt

cupy.loadtxt

numpy.log

cupy.log

numpy.log10

cupy.log10

numpy.log1p

cupy.log1p

numpy.log2

cupy.log2

numpy.logaddexp

cupy.logaddexp

numpy.logaddexp2

cupy.logaddexp2

numpy.logical_and

cupy.logical_and

numpy.logical_not

cupy.logical_not

numpy.logical_or

cupy.logical_or

numpy.logical_xor

cupy.logical_xor

numpy.logspace

cupy.logspace

numpy.longcomplex

-

numpy.longdouble

-

numpy.longfloat

cupy.longfloat (alias of numpy.longfloat)

numpy.longlong

cupy.longlong (alias of numpy.longlong)

numpy.lookfor

-

numpy.mask_indices

cupy.mask_indices

numpy.mat

- 1

numpy.matmul

cupy.matmul

numpy.matrix

- 1

numpy.max

cupy.max

numpy.maximum

cupy.maximum

numpy.maximum_sctype

-

numpy.may_share_memory

cupy.may_share_memory

numpy.mean

cupy.mean

numpy.median

cupy.median

numpy.memmap

-

numpy.meshgrid

cupy.meshgrid

numpy.mgrid

cupy.mgrid

numpy.min

cupy.min

numpy.min_scalar_type

cupy.min_scalar_type

numpy.minimum

cupy.minimum

numpy.mintypecode

cupy.mintypecode (alias of numpy.mintypecode)

numpy.mod

cupy.mod

numpy.modf

cupy.modf

numpy.moveaxis

cupy.moveaxis

numpy.msort

cupy.msort

numpy.multiply

cupy.multiply

numpy.nan_to_num

cupy.nan_to_num

numpy.nanargmax

cupy.nanargmax

numpy.nanargmin

cupy.nanargmin

numpy.nancumprod

cupy.nancumprod

numpy.nancumsum

cupy.nancumsum

numpy.nanmax

cupy.nanmax

numpy.nanmean

cupy.nanmean

numpy.nanmedian

cupy.nanmedian

numpy.nanmin

cupy.nanmin

numpy.nanpercentile

-

numpy.nanprod

cupy.nanprod

numpy.nanquantile

-

numpy.nanstd

cupy.nanstd

numpy.nansum

cupy.nansum

numpy.nanvar

cupy.nanvar

numpy.nbytes

-

numpy.ndarray

cupy.ndarray

numpy.ndenumerate

-

numpy.ndim

cupy.ndim

numpy.ndindex

cupy.ndindex (alias of numpy.ndindex)

numpy.nditer

-

numpy.negative

cupy.negative

numpy.nested_iters

-

numpy.newaxis

cupy.newaxis (alias of numpy.newaxis)

numpy.nextafter

cupy.nextafter

numpy.nonzero

cupy.nonzero

numpy.not_equal

cupy.not_equal

numpy.number

cupy.number (alias of numpy.number)

numpy.obj2sctype

cupy.obj2sctype (alias of numpy.obj2sctype)

numpy.object_

- 3

numpy.ogrid

cupy.ogrid

numpy.ones

cupy.ones

numpy.ones_like

cupy.ones_like

numpy.outer

cupy.outer

numpy.packbits

cupy.packbits

numpy.pad

cupy.pad

numpy.partition

cupy.partition

numpy.percentile

cupy.percentile

numpy.piecewise

cupy.piecewise

numpy.place

cupy.place

numpy.poly

cupy.poly 6

numpy.poly1d

cupy.poly1d

numpy.polyadd

cupy.polyadd

numpy.polyder

- 6

numpy.polydiv

- 6

numpy.polyfit

cupy.polyfit

numpy.polyint

- 6

numpy.polymul

cupy.polymul

numpy.polysub

cupy.polysub

numpy.polyval

cupy.polyval

numpy.positive

cupy.positive

numpy.power

cupy.power

numpy.printoptions

cupy.printoptions (alias of numpy.printoptions)

numpy.prod

cupy.prod

numpy.product

cupy.product

numpy.promote_types

cupy.promote_types (alias of numpy.promote_types)

numpy.ptp

cupy.ptp

numpy.put

cupy.put

numpy.put_along_axis

-

numpy.putmask

cupy.putmask

numpy.quantile

cupy.quantile

numpy.r_

cupy.r_

numpy.rad2deg

cupy.rad2deg

numpy.radians

cupy.radians

numpy.ravel

cupy.ravel

numpy.ravel_multi_index

cupy.ravel_multi_index

numpy.real

cupy.real

numpy.real_if_close

cupy.real_if_close

numpy.recarray

- 5

numpy.recfromcsv

- 5

numpy.recfromtxt

- 5

numpy.reciprocal

cupy.reciprocal

numpy.record

- 5

numpy.remainder

cupy.remainder

numpy.repeat

cupy.repeat

numpy.require

cupy.require

numpy.reshape

cupy.reshape

numpy.resize

cupy.resize

numpy.result_type

cupy.result_type

numpy.right_shift

cupy.right_shift

numpy.rint

cupy.rint

numpy.roll

cupy.roll

numpy.rollaxis

cupy.rollaxis

numpy.roots

cupy.roots

numpy.rot90

cupy.rot90

numpy.round

cupy.round

numpy.round_

cupy.round_

numpy.row_stack

cupy.row_stack

numpy.s_

cupy.s_ (alias of numpy.s_)

numpy.safe_eval

cupy.safe_eval (alias of numpy.safe_eval)

numpy.save

cupy.save

numpy.savetxt

cupy.savetxt

numpy.savez

cupy.savez

numpy.savez_compressed

cupy.savez_compressed

numpy.sctype2char

cupy.sctype2char (alias of numpy.sctype2char)

numpy.sctypeDict

-

numpy.sctypes

-

numpy.searchsorted

cupy.searchsorted

numpy.select

cupy.select

numpy.set_numeric_ops

- 7

numpy.set_printoptions

cupy.set_printoptions (alias of numpy.set_printoptions)

numpy.set_string_function

cupy.set_string_function (alias of numpy.set_string_function)

numpy.setbufsize

-

numpy.setdiff1d

cupy.setdiff1d

numpy.seterr

- 4

numpy.seterrcall

- 4

numpy.seterrobj

- 4

numpy.setxor1d

cupy.setxor1d

numpy.shape

cupy.shape

numpy.shares_memory

cupy.shares_memory

numpy.short

cupy.short (alias of numpy.short)

numpy.show_config

cupy.show_config

numpy.show_runtime

-

numpy.sign

cupy.sign

numpy.signbit

cupy.signbit

numpy.signedinteger

cupy.signedinteger (alias of numpy.signedinteger)

numpy.sin

cupy.sin

numpy.sinc

cupy.sinc

numpy.single

cupy.single (alias of numpy.single)

numpy.singlecomplex

cupy.singlecomplex (alias of numpy.singlecomplex)

numpy.sinh

cupy.sinh

numpy.size

cupy.size

numpy.sometrue

cupy.sometrue

numpy.sort

cupy.sort

numpy.sort_complex

cupy.sort_complex

numpy.source

-

numpy.spacing

-

numpy.split

cupy.split

numpy.sqrt

cupy.sqrt

numpy.square

cupy.square

numpy.squeeze

cupy.squeeze

numpy.stack

cupy.stack

numpy.std

cupy.std

numpy.str_

- 3

numpy.string_

- 3

numpy.subtract

cupy.subtract

numpy.sum

cupy.sum

numpy.swapaxes

cupy.swapaxes

numpy.take

cupy.take

numpy.take_along_axis

cupy.take_along_axis

numpy.tan

cupy.tan

numpy.tanh

cupy.tanh

numpy.tensordot

cupy.tensordot

numpy.tile

cupy.tile

numpy.timedelta64

- 2

numpy.trace

cupy.trace

numpy.transpose

cupy.transpose

numpy.trapz

cupy.trapz

numpy.tri

cupy.tri

numpy.tril

cupy.tril

numpy.tril_indices

cupy.tril_indices

numpy.tril_indices_from

cupy.tril_indices_from

numpy.trim_zeros

cupy.trim_zeros

numpy.triu

cupy.triu

numpy.triu_indices

cupy.triu_indices

numpy.triu_indices_from

cupy.triu_indices_from

numpy.true_divide

cupy.true_divide

numpy.trunc

cupy.trunc

numpy.typecodes

-

numpy.typename

cupy.typename (alias of numpy.typename)

numpy.ubyte

cupy.ubyte (alias of numpy.ubyte)

numpy.ufunc

cupy.ufunc

numpy.uint

cupy.uint (alias of numpy.uint)

numpy.uint16

cupy.uint16 (alias of numpy.uint16)

numpy.uint32

cupy.uint32 (alias of numpy.uint32)

numpy.uint64

cupy.uint64 (alias of numpy.uint64)

numpy.uint8

cupy.uint8 (alias of numpy.uint8)

numpy.uintc

cupy.uintc (alias of numpy.uintc)

numpy.uintp

cupy.uintp (alias of numpy.uintp)

numpy.ulonglong

cupy.ulonglong (alias of numpy.ulonglong)

numpy.unicode_

- 3

numpy.union1d

cupy.union1d

numpy.unique

cupy.unique

numpy.unpackbits

cupy.unpackbits

numpy.unravel_index

cupy.unravel_index

numpy.unsignedinteger

cupy.unsignedinteger (alias of numpy.unsignedinteger)

numpy.unwrap

cupy.unwrap

numpy.ushort

cupy.ushort (alias of numpy.ushort)

numpy.vander

cupy.vander

numpy.var

cupy.var

numpy.vdot

cupy.vdot

numpy.vectorize

cupy.vectorize

numpy.void

- 3

numpy.vsplit

cupy.vsplit

numpy.vstack

cupy.vstack

numpy.where

cupy.where

numpy.who

cupy.who

numpy.zeros

cupy.zeros

numpy.zeros_like

cupy.zeros_like

Multi-Dimensional Array#

NumPy

CuPy

numpy.ndarray.T

cupy.ndarray.T

numpy.ndarray.all

cupy.ndarray.all

numpy.ndarray.any

cupy.ndarray.any

numpy.ndarray.argmax

cupy.ndarray.argmax

numpy.ndarray.argmin

cupy.ndarray.argmin

numpy.ndarray.argpartition

cupy.ndarray.argpartition

numpy.ndarray.argsort

cupy.ndarray.argsort

numpy.ndarray.astype

cupy.ndarray.astype

numpy.ndarray.base

cupy.ndarray.base

numpy.ndarray.byteswap

- 8

numpy.ndarray.choose

cupy.ndarray.choose

numpy.ndarray.clip

cupy.ndarray.clip

numpy.ndarray.compress

cupy.ndarray.compress

numpy.ndarray.conj

cupy.ndarray.conj

numpy.ndarray.conjugate

cupy.ndarray.conjugate

numpy.ndarray.copy

cupy.ndarray.copy

numpy.ndarray.ctypes

-

numpy.ndarray.cumprod

cupy.ndarray.cumprod

numpy.ndarray.cumsum

cupy.ndarray.cumsum

numpy.ndarray.data

cupy.ndarray.data

numpy.ndarray.diagonal

cupy.ndarray.diagonal

numpy.ndarray.dot

cupy.ndarray.dot

numpy.ndarray.dtype

cupy.ndarray.dtype

numpy.ndarray.dump

cupy.ndarray.dump

numpy.ndarray.dumps

cupy.ndarray.dumps

numpy.ndarray.fill

cupy.ndarray.fill

numpy.ndarray.flags

cupy.ndarray.flags

numpy.ndarray.flat

cupy.ndarray.flat

numpy.ndarray.flatten

cupy.ndarray.flatten

numpy.ndarray.getfield

-

numpy.ndarray.imag

cupy.ndarray.imag

numpy.ndarray.item

cupy.ndarray.item

numpy.ndarray.itemset

-

numpy.ndarray.itemsize

cupy.ndarray.itemsize

numpy.ndarray.max

cupy.ndarray.max

numpy.ndarray.mean

cupy.ndarray.mean

numpy.ndarray.min

cupy.ndarray.min

numpy.ndarray.nbytes

cupy.ndarray.nbytes

numpy.ndarray.ndim

cupy.ndarray.ndim

numpy.ndarray.newbyteorder

- [8]

numpy.ndarray.nonzero

cupy.ndarray.nonzero

numpy.ndarray.partition

cupy.ndarray.partition

numpy.ndarray.prod

cupy.ndarray.prod

numpy.ndarray.ptp

cupy.ndarray.ptp

numpy.ndarray.put

cupy.ndarray.put

numpy.ndarray.ravel

cupy.ndarray.ravel

numpy.ndarray.real

cupy.ndarray.real

numpy.ndarray.repeat

cupy.ndarray.repeat

numpy.ndarray.reshape

cupy.ndarray.reshape

numpy.ndarray.resize

-

numpy.ndarray.round

cupy.ndarray.round

numpy.ndarray.searchsorted

cupy.ndarray.searchsorted

numpy.ndarray.setfield

-

numpy.ndarray.setflags

-

numpy.ndarray.shape

cupy.ndarray.shape

numpy.ndarray.size

cupy.ndarray.size

numpy.ndarray.sort

cupy.ndarray.sort

numpy.ndarray.squeeze

cupy.ndarray.squeeze

numpy.ndarray.std

cupy.ndarray.std

numpy.ndarray.strides

cupy.ndarray.strides

numpy.ndarray.sum

cupy.ndarray.sum

numpy.ndarray.swapaxes

cupy.ndarray.swapaxes

numpy.ndarray.take

cupy.ndarray.take

numpy.ndarray.tobytes

cupy.ndarray.tobytes

numpy.ndarray.tofile

cupy.ndarray.tofile

numpy.ndarray.tolist

cupy.ndarray.tolist

numpy.ndarray.tostring

- [7]

numpy.ndarray.trace

cupy.ndarray.trace

numpy.ndarray.transpose

cupy.ndarray.transpose

numpy.ndarray.var

cupy.ndarray.var

numpy.ndarray.view

cupy.ndarray.view
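
Most `numpy.ndarray` methods and attributes in the table above have a same-named `cupy.ndarray` counterpart, so array code can often be written once against a module alias. The sketch below is illustrative only: the `xp` alias and the NumPy fallback are assumptions added here so that the snippet also runs on a machine without CuPy or a usable GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)        # touch the device so a missing GPU fails here
    xp = cp
except Exception:      # CuPy not installed or no usable GPU: fall back to NumPy
    xp = np

a = xp.arange(12, dtype=xp.float32).reshape(3, 4)

col_sums = a.sum(axis=0)        # ndarray.sum
flat_max = int(a.argmax())      # ndarray.argmax, index into the flattened array
t = a.T.astype(xp.float64)      # ndarray.T and ndarray.astype

# tolist() returns plain Python objects on the host in both libraries.
print(col_sums.tolist(), flat_max, t.shape)
```

Rows marked `-`, such as `numpy.ndarray.byteswap` or `numpy.ndarray.resize`, have no CuPy method, so code that relies on them needs a different approach on the GPU.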

Linear Algebra#

NumPy

CuPy

numpy.linalg.cholesky

cupy.linalg.cholesky

numpy.linalg.cond

-

numpy.linalg.det

cupy.linalg.det

numpy.linalg.eig

-

numpy.linalg.eigh

cupy.linalg.eigh

numpy.linalg.eigvals

-

numpy.linalg.eigvalsh

cupy.linalg.eigvalsh

numpy.linalg.inv

cupy.linalg.inv

numpy.linalg.lstsq

cupy.linalg.lstsq

numpy.linalg.matrix_power

cupy.linalg.matrix_power

numpy.linalg.matrix_rank

cupy.linalg.matrix_rank

numpy.linalg.multi_dot

-

numpy.linalg.norm

cupy.linalg.norm

numpy.linalg.pinv

cupy.linalg.pinv

numpy.linalg.qr

cupy.linalg.qr

numpy.linalg.slogdet

cupy.linalg.slogdet

numpy.linalg.solve

cupy.linalg.solve

numpy.linalg.svd

cupy.linalg.svd

numpy.linalg.tensorinv

cupy.linalg.tensorinv

numpy.linalg.tensorsolve

cupy.linalg.tensorsolve
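
The table above lists `numpy.linalg.eig`, `eigvals`, `cond`, and `multi_dot` without a CuPy counterpart, while `solve`, `eigh`, and `norm` are available. A minimal sketch of staying within the covered subset follows; the `xp` alias and the NumPy fallback are assumptions made here so the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)        # touch the device so a missing GPU fails here
    xp = cp
except Exception:      # CuPy not installed or no usable GPU: fall back to NumPy
    xp = np

a = xp.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric, so eigh applies
b = xp.array([1.0, 2.0])

x = xp.linalg.solve(a, b)                # solves a @ x = b
w, v = xp.linalg.eigh(a)                 # eigenvalues and eigenvectors

# Residual of the linear solve, converted to a host float.
residual = float(xp.linalg.norm(a @ x - b))
print(residual, w.tolist())
```

For a symmetric or Hermitian matrix, `eigh` is the natural choice in either library, so the missing `eig` row only matters for general matrices.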

Discrete Fourier Transform#

NumPy

CuPy

numpy.fft.fft

cupy.fft.fft

numpy.fft.fft2

cupy.fft.fft2

numpy.fft.fftfreq

cupy.fft.fftfreq

numpy.fft.fftn

cupy.fft.fftn

numpy.fft.fftshift

cupy.fft.fftshift

numpy.fft.hfft

cupy.fft.hfft

numpy.fft.ifft

cupy.fft.ifft

numpy.fft.ifft2

cupy.fft.ifft2

numpy.fft.ifftn

cupy.fft.ifftn

numpy.fft.ifftshift

cupy.fft.ifftshift

numpy.fft.ihfft

cupy.fft.ihfft

numpy.fft.irfft

cupy.fft.irfft

numpy.fft.irfft2

cupy.fft.irfft2

numpy.fft.irfftn

cupy.fft.irfftn

numpy.fft.rfft

cupy.fft.rfft

numpy.fft.rfft2

cupy.fft.rfft2

numpy.fft.rfftfreq

cupy.fft.rfftfreq

numpy.fft.rfftn

cupy.fft.rfftn
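
Every `numpy.fft` function in the table above has a `cupy.fft` counterpart with the same name. As a hedged illustration, the sketch below runs a real-input transform and its inverse; the `xp` alias, the NumPy fallback, and the test signal are assumptions introduced for this example.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)        # touch the device so a missing GPU fails here
    xp = cp
except Exception:      # CuPy not installed or no usable GPU: fall back to NumPy
    xp = np

n = 64
t = xp.arange(n) / n
signal = xp.sin(2 * np.pi * 5 * t)        # exactly 5 cycles over the window

spectrum = xp.fft.rfft(signal)            # one-sided spectrum of a real signal
peak_bin = int(xp.abs(spectrum).argmax()) # bin holding the largest magnitude
restored = xp.fft.irfft(spectrum, n=n)    # inverse transform, same length

max_err = float(xp.abs(restored - signal).max())
print(peak_bin, max_err)
```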

Random Sampling#

NumPy

CuPy

numpy.random.BitGenerator

cupy.random.BitGenerator

numpy.random.Generator

cupy.random.Generator

numpy.random.MT19937

-

numpy.random.PCG64

-

numpy.random.PCG64DXSM

-

numpy.random.Philox

-

numpy.random.RandomState

cupy.random.RandomState

numpy.random.SFC64

-

numpy.random.SeedSequence

-

numpy.random.beta

cupy.random.beta

numpy.random.binomial

cupy.random.binomial

numpy.random.bytes

cupy.random.bytes

numpy.random.chisquare

cupy.random.chisquare

numpy.random.choice

cupy.random.choice

numpy.random.default_rng

cupy.random.default_rng

numpy.random.dirichlet

cupy.random.dirichlet

numpy.random.exponential

cupy.random.exponential

numpy.random.f

cupy.random.f

numpy.random.gamma

cupy.random.gamma

numpy.random.geometric

cupy.random.geometric

numpy.random.get_bit_generator

-

numpy.random.get_state

-

numpy.random.gumbel

cupy.random.gumbel

numpy.random.hypergeometric

cupy.random.hypergeometric

numpy.random.laplace

cupy.random.laplace

numpy.random.logistic

cupy.random.logistic

numpy.random.lognormal

cupy.random.lognormal

numpy.random.logseries

cupy.random.logseries

numpy.random.multinomial

cupy.random.multinomial

numpy.random.multivariate_normal

cupy.random.multivariate_normal

numpy.random.negative_binomial

cupy.random.negative_binomial

numpy.random.noncentral_chisquare

cupy.random.noncentral_chisquare

numpy.random.noncentral_f

cupy.random.noncentral_f

numpy.random.normal

cupy.random.normal

numpy.random.pareto

cupy.random.pareto

numpy.random.permutation

cupy.random.permutation

numpy.random.poisson

cupy.random.poisson

numpy.random.power

cupy.random.power

numpy.random.rand

cupy.random.rand

numpy.random.randint

cupy.random.randint

numpy.random.randn

cupy.random.randn

numpy.random.random

cupy.random.random

numpy.random.random_integers

cupy.random.random_integers

numpy.random.random_sample

cupy.random.random_sample

numpy.random.ranf

cupy.random.ranf

numpy.random.rayleigh

cupy.random.rayleigh

numpy.random.sample

cupy.random.sample

numpy.random.seed

cupy.random.seed

numpy.random.set_bit_generator

-

numpy.random.set_state

-

numpy.random.shuffle

cupy.random.shuffle

numpy.random.standard_cauchy

cupy.random.standard_cauchy

numpy.random.standard_exponential

cupy.random.standard_exponential

numpy.random.standard_gamma

cupy.random.standard_gamma

numpy.random.standard_normal

cupy.random.standard_normal

numpy.random.standard_t

cupy.random.standard_t

numpy.random.triangular

cupy.random.triangular

numpy.random.uniform

cupy.random.uniform

numpy.random.vonmises

cupy.random.vonmises

numpy.random.wald

cupy.random.wald

numpy.random.weibull

cupy.random.weibull

numpy.random.zipf

cupy.random.zipf
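
According to the table above, the distribution functions, `seed`, `default_rng`, and `RandomState` are available in `cupy.random`, whereas the NumPy bit generators (`MT19937`, `PCG64`, `Philox`, `SFC64`) and `get_state`/`set_state` are not. The sketch below sticks to the shared names; the `xp` alias and NumPy fallback are assumptions added for illustration.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)        # touch the device so a missing GPU fails here
    xp = cp
except Exception:      # CuPy not installed or no usable GPU: fall back to NumPy
    xp = np

# Generator-style API, available under the same name in both libraries.
rng = xp.random.default_rng(0)
samples = rng.standard_normal(1000)

# Legacy module-level API.
xp.random.seed(0)
u = xp.random.uniform(0.0, 1.0, size=10)

print(samples.shape, float(u.min()), float(u.max()))
```

The same seed should not be expected to produce the same numbers in NumPy and CuPy, since the underlying generators differ; only the interface is shared.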

Polynomials#

NumPy

CuPy

numpy.polynomial.Chebyshev

-

numpy.polynomial.Hermite

-

numpy.polynomial.HermiteE

-

numpy.polynomial.Laguerre

-

numpy.polynomial.Legendre

-

numpy.polynomial.Polynomial

-

numpy.polynomial.set_default_printstyle

-

Power Series#

NumPy

CuPy

numpy.polynomial.polynomial.ABCPolyBase

-

numpy.polynomial.polynomial.Polynomial

-

numpy.polynomial.polynomial.normalize_axis_index

-

numpy.polynomial.polynomial.polyadd

-

numpy.polynomial.polynomial.polycompanion

cupy.polynomial.polynomial.polycompanion

numpy.polynomial.polynomial.polyder

-

numpy.polynomial.polynomial.polydiv

-

numpy.polynomial.polynomial.polydomain

-

numpy.polynomial.polynomial.polyfit

-

numpy.polynomial.polynomial.polyfromroots

-

numpy.polynomial.polynomial.polygrid2d

-

numpy.polynomial.polynomial.polygrid3d

-

numpy.polynomial.polynomial.polyint

-

numpy.polynomial.polynomial.polyline

-

numpy.polynomial.polynomial.polymul

-

numpy.polynomial.polynomial.polymulx

-

numpy.polynomial.polynomial.polyone

-

numpy.polynomial.polynomial.polypow

-

numpy.polynomial.polynomial.polyroots

-

numpy.polynomial.polynomial.polysub

-

numpy.polynomial.polynomial.polytrim

-

numpy.polynomial.polynomial.polyval

-

numpy.polynomial.polynomial.polyval2d

-

numpy.polynomial.polynomial.polyval3d

-

numpy.polynomial.polynomial.polyvalfromroots

-

numpy.polynomial.polynomial.polyvander

cupy.polynomial.polynomial.polyvander

numpy.polynomial.polynomial.polyvander2d

-

numpy.polynomial.polynomial.polyvander3d

-

numpy.polynomial.polynomial.polyx

-

numpy.polynomial.polynomial.polyzero

-

Polyutils#

NumPy

CuPy

numpy.polynomial.polyutils.absolute

-

numpy.polynomial.polyutils.as_series

cupy.polynomial.polyutils.as_series

numpy.polynomial.polyutils.dragon4_positional

-

numpy.polynomial.polyutils.dragon4_scientific

-

numpy.polynomial.polyutils.format_float

-

numpy.polynomial.polyutils.getdomain

-

numpy.polynomial.polyutils.mapdomain

-

numpy.polynomial.polyutils.mapparms

-

numpy.polynomial.polyutils.trimcoef

cupy.polynomial.polyutils.trimcoef

numpy.polynomial.polyutils.trimseq

cupy.polynomial.polyutils.trimseq

SciPy / CuPy APIs#

Discrete Fourier Transform#

SciPy

CuPy

scipy.fft.dct

cupyx.scipy.fft.dct

scipy.fft.dctn

cupyx.scipy.fft.dctn

scipy.fft.dst

cupyx.scipy.fft.dst

scipy.fft.dstn

cupyx.scipy.fft.dstn

scipy.fft.fft

cupyx.scipy.fft.fft

scipy.fft.fft2

cupyx.scipy.fft.fft2

scipy.fft.fftfreq

cupyx.scipy.fft.fftfreq

scipy.fft.fftn

cupyx.scipy.fft.fftn

scipy.fft.fftshift

cupyx.scipy.fft.fftshift

scipy.fft.fht

cupyx.scipy.fft.fht

scipy.fft.fhtoffset

-

scipy.fft.get_workers

-

scipy.fft.hfft

cupyx.scipy.fft.hfft

scipy.fft.hfft2

cupyx.scipy.fft.hfft2

scipy.fft.hfftn

cupyx.scipy.fft.hfftn

scipy.fft.idct

cupyx.scipy.fft.idct

scipy.fft.idctn

cupyx.scipy.fft.idctn

scipy.fft.idst

cupyx.scipy.fft.idst

scipy.fft.idstn

cupyx.scipy.fft.idstn

scipy.fft.ifft

cupyx.scipy.fft.ifft

scipy.fft.ifft2

cupyx.scipy.fft.ifft2

scipy.fft.ifftn

cupyx.scipy.fft.ifftn

scipy.fft.ifftshift

cupyx.scipy.fft.ifftshift

scipy.fft.ifht

cupyx.scipy.fft.ifht

scipy.fft.ihfft

cupyx.scipy.fft.ihfft

scipy.fft.ihfft2

cupyx.scipy.fft.ihfft2

scipy.fft.ihfftn

cupyx.scipy.fft.ihfftn

scipy.fft.irfft

cupyx.scipy.fft.irfft

scipy.fft.irfft2

cupyx.scipy.fft.irfft2

scipy.fft.irfftn

cupyx.scipy.fft.irfftn

scipy.fft.next_fast_len

cupyx.scipy.fft.next_fast_len

scipy.fft.register_backend

-

scipy.fft.rfft

cupyx.scipy.fft.rfft

scipy.fft.rfft2

cupyx.scipy.fft.rfft2

scipy.fft.rfftfreq

cupyx.scipy.fft.rfftfreq

scipy.fft.rfftn

cupyx.scipy.fft.rfftn

scipy.fft.set_backend

-

scipy.fft.set_global_backend

-

scipy.fft.set_workers

-

scipy.fft.skip_backend

-

Legacy Discrete Fourier Transform#

SciPy

CuPy

scipy.fftpack.cc_diff

-

scipy.fftpack.cs_diff

-

scipy.fftpack.dct

-

scipy.fftpack.dctn

-

scipy.fftpack.diff

-

scipy.fftpack.dst

-

scipy.fftpack.dstn

-

scipy.fftpack.fft

cupyx.scipy.fftpack.fft

scipy.fftpack.fft2

cupyx.scipy.fftpack.fft2

scipy.fftpack.fftfreq

-

scipy.fftpack.fftn

cupyx.scipy.fftpack.fftn

scipy.fftpack.fftshift

-

scipy.fftpack.hilbert

-

scipy.fftpack.idct

-

scipy.fftpack.idctn

-

scipy.fftpack.idst

-

scipy.fftpack.idstn

-

scipy.fftpack.ifft

cupyx.scipy.fftpack.ifft

scipy.fftpack.ifft2

cupyx.scipy.fftpack.ifft2

scipy.fftpack.ifftn

cupyx.scipy.fftpack.ifftn

scipy.fftpack.ifftshift

-

scipy.fftpack.ihilbert

-

scipy.fftpack.irfft

cupyx.scipy.fftpack.irfft

scipy.fftpack.itilbert

-

scipy.fftpack.next_fast_len

-

scipy.fftpack.rfft

cupyx.scipy.fftpack.rfft

scipy.fftpack.rfftfreq

-

scipy.fftpack.sc_diff

-

scipy.fftpack.shift

-

scipy.fftpack.ss_diff

-

scipy.fftpack.tilbert

-

Interpolation#

SciPy

CuPy

scipy.interpolate.Akima1DInterpolator

cupyx.scipy.interpolate.Akima1DInterpolator

scipy.interpolate.BPoly

cupyx.scipy.interpolate.BPoly

scipy.interpolate.BSpline

cupyx.scipy.interpolate.BSpline

scipy.interpolate.BarycentricInterpolator

cupyx.scipy.interpolate.BarycentricInterpolator

scipy.interpolate.BivariateSpline

-

scipy.interpolate.CloughTocher2DInterpolator

-

scipy.interpolate.CubicHermiteSpline

cupyx.scipy.interpolate.CubicHermiteSpline

scipy.interpolate.CubicSpline

-

scipy.interpolate.InterpolatedUnivariateSpline

-

scipy.interpolate.KroghInterpolator

cupyx.scipy.interpolate.KroghInterpolator

scipy.interpolate.LSQBivariateSpline

-

scipy.interpolate.LSQSphereBivariateSpline

-

scipy.interpolate.LSQUnivariateSpline

-

scipy.interpolate.LinearNDInterpolator

-

scipy.interpolate.NdPPoly

cupyx.scipy.interpolate.NdPPoly

scipy.interpolate.NearestNDInterpolator

-

scipy.interpolate.PPoly

cupyx.scipy.interpolate.PPoly

scipy.interpolate.PchipInterpolator

cupyx.scipy.interpolate.PchipInterpolator

scipy.interpolate.RBFInterpolator

cupyx.scipy.interpolate.RBFInterpolator

scipy.interpolate.Rbf

-

scipy.interpolate.RectBivariateSpline

-

scipy.interpolate.RectSphereBivariateSpline

-

scipy.interpolate.RegularGridInterpolator

cupyx.scipy.interpolate.RegularGridInterpolator

scipy.interpolate.SmoothBivariateSpline

-

scipy.interpolate.SmoothSphereBivariateSpline

-

scipy.interpolate.UnivariateSpline

-

scipy.interpolate.approximate_taylor_polynomial

-

scipy.interpolate.barycentric_interpolate

cupyx.scipy.interpolate.barycentric_interpolate

scipy.interpolate.bisplev

-

scipy.interpolate.bisplrep

-

scipy.interpolate.griddata

-

scipy.interpolate.insert

-

scipy.interpolate.interp1d

-

scipy.interpolate.interp2d

-

scipy.interpolate.interpn

cupyx.scipy.interpolate.interpn

scipy.interpolate.krogh_interpolate

cupyx.scipy.interpolate.krogh_interpolate

scipy.interpolate.lagrange

-

scipy.interpolate.make_interp_spline

cupyx.scipy.interpolate.make_interp_spline

scipy.interpolate.make_lsq_spline

-

scipy.interpolate.make_smoothing_spline

-

scipy.interpolate.pade

-

scipy.interpolate.pchip

cupyx.scipy.interpolate.pchip

scipy.interpolate.pchip_interpolate

cupyx.scipy.interpolate.pchip_interpolate

scipy.interpolate.spalde

-

scipy.interpolate.splantider

cupyx.scipy.interpolate.splantider

scipy.interpolate.splder

cupyx.scipy.interpolate.splder

scipy.interpolate.splev

-

scipy.interpolate.splint

-

scipy.interpolate.splprep

-

scipy.interpolate.splrep

-

scipy.interpolate.sproot

-

Advanced Linear Algebra#

SciPy

CuPy

scipy.linalg.bandwidth

-

scipy.linalg.block_diag

cupyx.scipy.linalg.block_diag

scipy.linalg.cdf2rdf

-

scipy.linalg.cho_factor

-

scipy.linalg.cho_solve

-

scipy.linalg.cho_solve_banded

-

scipy.linalg.cholesky_banded

-

scipy.linalg.circulant

cupyx.scipy.linalg.circulant

scipy.linalg.clarkson_woodruff_transform

-

scipy.linalg.companion

cupyx.scipy.linalg.companion

scipy.linalg.convolution_matrix

cupyx.scipy.linalg.convolution_matrix

scipy.linalg.coshm

-

scipy.linalg.cosm

-

scipy.linalg.cossin

-

scipy.linalg.dft

cupyx.scipy.linalg.dft

scipy.linalg.diagsvd

-

scipy.linalg.eig_banded

-

scipy.linalg.eigh_tridiagonal

-

scipy.linalg.eigvals_banded

-

scipy.linalg.eigvalsh_tridiagonal

-

scipy.linalg.expm

-

scipy.linalg.expm_cond

-

scipy.linalg.expm_frechet

-

scipy.linalg.fiedler

cupyx.scipy.linalg.fiedler

scipy.linalg.fiedler_companion

cupyx.scipy.linalg.fiedler_companion

scipy.linalg.find_best_blas_type

-

scipy.linalg.fractional_matrix_power

-

scipy.linalg.funm

-

scipy.linalg.get_blas_funcs

-

scipy.linalg.get_lapack_funcs

-

scipy.linalg.hadamard

cupyx.scipy.linalg.hadamard

scipy.linalg.hankel

cupyx.scipy.linalg.hankel

scipy.linalg.helmert

cupyx.scipy.linalg.helmert

scipy.linalg.hessenberg

-

scipy.linalg.hilbert

cupyx.scipy.linalg.hilbert

scipy.linalg.invhilbert

-

scipy.linalg.invpascal

-

scipy.linalg.ishermitian

-

scipy.linalg.issymmetric

-

scipy.linalg.khatri_rao

-

scipy.linalg.kron

cupyx.scipy.linalg.kron

scipy.linalg.ldl

-

scipy.linalg.leslie

cupyx.scipy.linalg.leslie

scipy.linalg.logm

-

scipy.linalg.lu

cupyx.scipy.linalg.lu

scipy.linalg.lu_factor

cupyx.scipy.linalg.lu_factor

scipy.linalg.lu_solve

cupyx.scipy.linalg.lu_solve

scipy.linalg.matmul_toeplitz

-

scipy.linalg.matrix_balance

-

scipy.linalg.null_space

-

scipy.linalg.ordqz

-

scipy.linalg.orth

-

scipy.linalg.orthogonal_procrustes

-

scipy.linalg.pascal

-

scipy.linalg.pinvh

-

scipy.linalg.polar

-

scipy.linalg.qr_delete

-

scipy.linalg.qr_insert

-

scipy.linalg.qr_multiply

-

scipy.linalg.qr_update

-

scipy.linalg.qz

-

scipy.linalg.rq

-

scipy.linalg.rsf2csf

-

scipy.linalg.schur

-

scipy.linalg.signm

-

scipy.linalg.sinhm

-

scipy.linalg.sinm

-

scipy.linalg.solve_banded

-

scipy.linalg.solve_circulant

-

scipy.linalg.solve_continuous_are

-

scipy.linalg.solve_continuous_lyapunov

-

scipy.linalg.solve_discrete_are

-

scipy.linalg.solve_discrete_lyapunov

-

scipy.linalg.solve_lyapunov

-

scipy.linalg.solve_sylvester

-

scipy.linalg.solve_toeplitz

-

scipy.linalg.solve_triangular

cupyx.scipy.linalg.solve_triangular

scipy.linalg.solveh_banded

-

scipy.linalg.sqrtm

-

scipy.linalg.subspace_angles

-

scipy.linalg.svdvals

-

scipy.linalg.tanhm

-

scipy.linalg.tanm

-

scipy.linalg.toeplitz

cupyx.scipy.linalg.toeplitz

scipy.linalg.tri

cupyx.scipy.linalg.tri

scipy.linalg.tril

cupyx.scipy.linalg.tril

scipy.linalg.triu

cupyx.scipy.linalg.triu
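
In the table above, `lu_factor`, `lu_solve`, and `solve_triangular` have `cupyx.scipy.linalg` counterparts, while routines such as `cho_factor`, `expm`, and `schur` do not. A minimal sketch of an LU-based solve follows; the `xp` and `sla` aliases and the SciPy fallback are assumptions so that the snippet also runs on a CPU-only machine.

```python
try:
    import cupy as xp
    import cupyx.scipy.linalg as sla
    xp.zeros(1)        # touch the device so a missing GPU fails here
except Exception:      # no CuPy or no usable GPU: fall back to NumPy/SciPy
    import numpy as xp
    import scipy.linalg as sla

a = xp.array([[4.0, 3.0], [6.0, 3.0]])
b = xp.array([10.0, 12.0])

lu_piv = sla.lu_factor(a)       # factor once: (LU matrix, pivot indices)
x = sla.lu_solve(lu_piv, b)     # reuse the factorization to solve a @ x = b

print(x.tolist())
```

Factoring once and calling `lu_solve` repeatedly is useful when the same matrix is solved against many right-hand sides.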

Multidimensional Image Processing#

SciPy

CuPy

scipy.ndimage.affine_transform

cupyx.scipy.ndimage.affine_transform

scipy.ndimage.binary_closing

cupyx.scipy.ndimage.binary_closing

scipy.ndimage.binary_dilation

cupyx.scipy.ndimage.binary_dilation

scipy.ndimage.binary_erosion

cupyx.scipy.ndimage.binary_erosion

scipy.ndimage.binary_fill_holes

cupyx.scipy.ndimage.binary_fill_holes

scipy.ndimage.binary_hit_or_miss

cupyx.scipy.ndimage.binary_hit_or_miss

scipy.ndimage.binary_opening

cupyx.scipy.ndimage.binary_opening

scipy.ndimage.binary_propagation

cupyx.scipy.ndimage.binary_propagation

scipy.ndimage.black_tophat

cupyx.scipy.ndimage.black_tophat

scipy.ndimage.center_of_mass

cupyx.scipy.ndimage.center_of_mass

scipy.ndimage.convolve

cupyx.scipy.ndimage.convolve

scipy.ndimage.convolve1d

cupyx.scipy.ndimage.convolve1d

scipy.ndimage.correlate

cupyx.scipy.ndimage.correlate

scipy.ndimage.correlate1d

cupyx.scipy.ndimage.correlate1d

scipy.ndimage.distance_transform_bf

-

scipy.ndimage.distance_transform_cdt

-

scipy.ndimage.distance_transform_edt

-

scipy.ndimage.extrema

cupyx.scipy.ndimage.extrema

scipy.ndimage.find_objects

-

scipy.ndimage.fourier_ellipsoid

cupyx.scipy.ndimage.fourier_ellipsoid

scipy.ndimage.fourier_gaussian

cupyx.scipy.ndimage.fourier_gaussian

scipy.ndimage.fourier_shift

cupyx.scipy.ndimage.fourier_shift

scipy.ndimage.fourier_uniform

cupyx.scipy.ndimage.fourier_uniform

scipy.ndimage.gaussian_filter

cupyx.scipy.ndimage.gaussian_filter

scipy.ndimage.gaussian_filter1d

cupyx.scipy.ndimage.gaussian_filter1d

scipy.ndimage.gaussian_gradient_magnitude

cupyx.scipy.ndimage.gaussian_gradient_magnitude

scipy.ndimage.gaussian_laplace

cupyx.scipy.ndimage.gaussian_laplace

scipy.ndimage.generate_binary_structure

cupyx.scipy.ndimage.generate_binary_structure

scipy.ndimage.generic_filter

cupyx.scipy.ndimage.generic_filter

scipy.ndimage.generic_filter1d

cupyx.scipy.ndimage.generic_filter1d

scipy.ndimage.generic_gradient_magnitude

cupyx.scipy.ndimage.generic_gradient_magnitude

scipy.ndimage.generic_laplace

cupyx.scipy.ndimage.generic_laplace

scipy.ndimage.geometric_transform

-

scipy.ndimage.grey_closing

cupyx.scipy.ndimage.grey_closing

scipy.ndimage.grey_dilation

cupyx.scipy.ndimage.grey_dilation

scipy.ndimage.grey_erosion

cupyx.scipy.ndimage.grey_erosion

scipy.ndimage.grey_opening

cupyx.scipy.ndimage.grey_opening

scipy.ndimage.histogram

cupyx.scipy.ndimage.histogram

scipy.ndimage.iterate_structure

cupyx.scipy.ndimage.iterate_structure

scipy.ndimage.label

cupyx.scipy.ndimage.label

scipy.ndimage.labeled_comprehension

cupyx.scipy.ndimage.labeled_comprehension

scipy.ndimage.laplace

cupyx.scipy.ndimage.laplace

scipy.ndimage.map_coordinates

cupyx.scipy.ndimage.map_coordinates

scipy.ndimage.maximum

cupyx.scipy.ndimage.maximum

scipy.ndimage.maximum_filter

cupyx.scipy.ndimage.maximum_filter

scipy.ndimage.maximum_filter1d

cupyx.scipy.ndimage.maximum_filter1d

scipy.ndimage.maximum_position

cupyx.scipy.ndimage.maximum_position

scipy.ndimage.mean

cupyx.scipy.ndimage.mean

scipy.ndimage.median

cupyx.scipy.ndimage.median

scipy.ndimage.median_filter

cupyx.scipy.ndimage.median_filter

scipy.ndimage.minimum

cupyx.scipy.ndimage.minimum

scipy.ndimage.minimum_filter

cupyx.scipy.ndimage.minimum_filter

scipy.ndimage.minimum_filter1d

cupyx.scipy.ndimage.minimum_filter1d

scipy.ndimage.minimum_position

cupyx.scipy.ndimage.minimum_position

scipy.ndimage.morphological_gradient

cupyx.scipy.ndimage.morphological_gradient

scipy.ndimage.morphological_laplace

cupyx.scipy.ndimage.morphological_laplace

scipy.ndimage.percentile_filter

cupyx.scipy.ndimage.percentile_filter

scipy.ndimage.prewitt

cupyx.scipy.ndimage.prewitt

scipy.ndimage.rank_filter

cupyx.scipy.ndimage.rank_filter

scipy.ndimage.rotate

cupyx.scipy.ndimage.rotate

scipy.ndimage.shift

cupyx.scipy.ndimage.shift

scipy.ndimage.sobel

cupyx.scipy.ndimage.sobel

scipy.ndimage.spline_filter

cupyx.scipy.ndimage.spline_filter

scipy.ndimage.spline_filter1d

cupyx.scipy.ndimage.spline_filter1d

scipy.ndimage.standard_deviation

cupyx.scipy.ndimage.standard_deviation

scipy.ndimage.sum

cupyx.scipy.ndimage.sum

scipy.ndimage.sum_labels

cupyx.scipy.ndimage.sum_labels

scipy.ndimage.uniform_filter

cupyx.scipy.ndimage.uniform_filter

scipy.ndimage.uniform_filter1d

cupyx.scipy.ndimage.uniform_filter1d

scipy.ndimage.value_indices

-

scipy.ndimage.variance

cupyx.scipy.ndimage.variance

scipy.ndimage.watershed_ift

-

scipy.ndimage.white_tophat

cupyx.scipy.ndimage.white_tophat

scipy.ndimage.zoom

cupyx.scipy.ndimage.zoom
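
Most `scipy.ndimage` filters in the table above are mirrored under `cupyx.scipy.ndimage`; the distance transforms, `find_objects`, `geometric_transform`, `value_indices`, and `watershed_ift` are listed without a counterpart. The sketch below applies one of the mirrored filters; the `xp` and `ndi` aliases, the SciPy fallback, and the test image are assumptions made for this example.

```python
try:
    import cupy as xp
    import cupyx.scipy.ndimage as ndi
    xp.zeros(1)        # touch the device so a missing GPU fails here
except Exception:      # no CuPy or no usable GPU: fall back to NumPy/SciPy
    import numpy as xp
    import scipy.ndimage as ndi

# A single bright pixel in the middle of a 9x9 image.
img = xp.zeros((9, 9), dtype=xp.float64)
img[4, 4] = 1.0

# Gaussian smoothing spreads the impulse while preserving its total mass.
blurred = ndi.gaussian_filter(img, sigma=1.0)

total = float(blurred.sum())
peak_index = int(blurred.argmax())   # index into the flattened image
print(total, peak_index)
```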

Signal Processing#

SciPy

CuPy

scipy.signal.CZT

-

scipy.signal.StateSpace

-

scipy.signal.TransferFunction

-

scipy.signal.ZerosPolesGain

-

scipy.signal.ZoomFFT

-

scipy.signal.abcd_normalize

-

scipy.signal.argrelextrema

-

scipy.signal.argrelmax

-

scipy.signal.argrelmin

-

scipy.signal.band_stop_obj

-

scipy.signal.barthann

-

scipy.signal.bartlett

-

scipy.signal.bessel

-

scipy.signal.besselap

-

scipy.signal.bilinear

cupyx.scipy.signal.bilinear

scipy.signal.bilinear_zpk

cupyx.scipy.signal.bilinear_zpk

scipy.signal.blackman

-

scipy.signal.blackmanharris

-

scipy.signal.bode

-

scipy.signal.bohman

-

scipy.signal.boxcar

-

scipy.signal.bspline

-

scipy.signal.buttap

-

scipy.signal.butter

-

scipy.signal.buttord

-

scipy.signal.cascade

-

scipy.signal.cheb1ap

-

scipy.signal.cheb1ord

-

scipy.signal.cheb2ap

-

scipy.signal.cheb2ord

-

scipy.signal.chebwin

-

scipy.signal.cheby1

-

scipy.signal.cheby2

-

scipy.signal.check_COLA

-

scipy.signal.check_NOLA

-

scipy.signal.chirp

-

scipy.signal.choose_conv_method

cupyx.scipy.signal.choose_conv_method

scipy.signal.cmplx_sort

-

scipy.signal.coherence

-

scipy.signal.cont2discrete

-

scipy.signal.convolve

cupyx.scipy.signal.convolve

scipy.signal.convolve2d

cupyx.scipy.signal.convolve2d

scipy.signal.correlate

cupyx.scipy.signal.correlate

scipy.signal.correlate2d

cupyx.scipy.signal.correlate2d

scipy.signal.correlation_lags

-

scipy.signal.cosine

-

scipy.signal.csd

-

scipy.signal.cspline1d

-

scipy.signal.cspline1d_eval

-

scipy.signal.cspline2d

-

scipy.signal.cubic

-

scipy.signal.cwt

-

scipy.signal.czt

-

scipy.signal.czt_points

-

scipy.signal.daub

-

scipy.signal.dbode

-

scipy.signal.decimate

-

scipy.signal.deconvolve

cupyx.scipy.signal.deconvolve

scipy.signal.detrend

cupyx.scipy.signal.detrend

scipy.signal.dfreqresp

-

scipy.signal.dimpulse

-

scipy.signal.dlsim

-

scipy.signal.dlti

-

scipy.signal.dstep

-

scipy.signal.ellip

-

scipy.signal.ellipap

-

scipy.signal.ellipord

-

scipy.signal.exponential

-

scipy.signal.fftconvolve

cupyx.scipy.signal.fftconvolve

scipy.signal.filtfilt

cupyx.scipy.signal.filtfilt

scipy.signal.find_peaks

-

scipy.signal.find_peaks_cwt

-

scipy.signal.findfreqs

-

scipy.signal.firls

-

scipy.signal.firwin

-

scipy.signal.firwin2

-

scipy.signal.flattop

-

scipy.signal.freqresp

-

scipy.signal.freqs

-

scipy.signal.freqs_zpk

-

scipy.signal.freqz

-

scipy.signal.freqz_zpk

-

scipy.signal.gammatone

-

scipy.signal.gauss_spline

-

scipy.signal.gaussian

-

scipy.signal.gausspulse

-

scipy.signal.general_gaussian

-

scipy.signal.get_window

-

scipy.signal.group_delay

-

scipy.signal.hamming

-

scipy.signal.hann

-

scipy.signal.hilbert

-

scipy.signal.hilbert2

-

scipy.signal.iircomb

-

scipy.signal.iirdesign

-

scipy.signal.iirfilter

-

scipy.signal.iirnotch

-

scipy.signal.iirpeak

-

scipy.signal.impulse

-

scipy.signal.impulse2

-

scipy.signal.invres

-

scipy.signal.invresz

-

scipy.signal.istft

-

scipy.signal.kaiser

-

scipy.signal.kaiser_atten

-

scipy.signal.kaiser_beta

-

scipy.signal.kaiserord

-

scipy.signal.lfilter

cupyx.scipy.signal.lfilter

scipy.signal.lfilter_zi

cupyx.scipy.signal.lfilter_zi

scipy.signal.lfiltic

cupyx.scipy.signal.lfiltic

scipy.signal.lombscargle

-

scipy.signal.lp2bp

cupyx.scipy.signal.lp2bp

scipy.signal.lp2bp_zpk

cupyx.scipy.signal.lp2bp_zpk

scipy.signal.lp2bs

cupyx.scipy.signal.lp2bs

scipy.signal.lp2bs_zpk

cupyx.scipy.signal.lp2bs_zpk

scipy.signal.lp2hp

cupyx.scipy.signal.lp2hp

scipy.signal.lp2hp_zpk

cupyx.scipy.signal.lp2hp_zpk

scipy.signal.lp2lp

cupyx.scipy.signal.lp2lp

scipy.signal.lp2lp_zpk

cupyx.scipy.signal.lp2lp_zpk

scipy.signal.lsim

-

scipy.signal.lsim2

-

scipy.signal.lti

-

scipy.signal.max_len_seq

-

scipy.signal.medfilt

cupyx.scipy.signal.medfilt

scipy.signal.medfilt2d

cupyx.scipy.signal.medfilt2d

scipy.signal.minimum_phase

-

scipy.signal.morlet

-

scipy.signal.morlet2

-

scipy.signal.normalize

cupyx.scipy.signal.normalize

scipy.signal.nuttall

-

scipy.signal.oaconvolve

cupyx.scipy.signal.oaconvolve

scipy.signal.order_filter

cupyx.scipy.signal.order_filter

scipy.signal.parzen

-

scipy.signal.peak_prominences

-

scipy.signal.peak_widths

-

scipy.signal.periodogram

-

scipy.signal.place_poles

-

scipy.signal.qmf

-

scipy.signal.qspline1d

-

scipy.signal.qspline1d_eval

-

scipy.signal.qspline2d

-

scipy.signal.quadratic

-

scipy.signal.remez

-

scipy.signal.resample

-

scipy.signal.resample_poly

-

scipy.signal.residue

-

scipy.signal.residuez

-

scipy.signal.ricker

-

scipy.signal.savgol_coeffs

cupyx.scipy.signal.savgol_coeffs

scipy.signal.savgol_filter

cupyx.scipy.signal.savgol_filter

scipy.signal.sawtooth

-

scipy.signal.sepfir2d

cupyx.scipy.signal.sepfir2d

scipy.signal.sos2tf

-

scipy.signal.sos2zpk

-

scipy.signal.sosfilt

cupyx.scipy.signal.sosfilt

scipy.signal.sosfilt_zi

-

scipy.signal.sosfiltfilt

-

scipy.signal.sosfreqz

-

scipy.signal.spectrogram

-

scipy.signal.spline_filter

-

scipy.signal.square

-

scipy.signal.ss2tf

-

scipy.signal.ss2zpk

-

scipy.signal.step

-

scipy.signal.step2

-

scipy.signal.stft

-

scipy.signal.sweep_poly

-

scipy.signal.symiirorder1

cupyx.scipy.signal.symiirorder1

scipy.signal.symiirorder2

cupyx.scipy.signal.symiirorder2

scipy.signal.tf2sos

-

scipy.signal.tf2ss

-

scipy.signal.tf2zpk

-

scipy.signal.triang

-

scipy.signal.tukey

-

scipy.signal.unique_roots

-

scipy.signal.unit_impulse

-

scipy.signal.upfirdn

-

scipy.signal.vectorstrength

-

scipy.signal.welch

-

scipy.signal.wiener

cupyx.scipy.signal.wiener

scipy.signal.zoom_fft

-

scipy.signal.zpk2sos

-

scipy.signal.zpk2ss

-

scipy.signal.zpk2tf

-
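
The table above shows that filtering and convolution routines such as `lfilter`, `convolve`, and `fftconvolve` have `cupyx.scipy.signal` counterparts, while filter-design helpers such as `butter` and `firwin` are listed as `-`. One possible workaround is to supply the coefficients directly, as in this sketch; the `xp` and `sig` aliases, the SciPy fallback, and the 3-tap moving average are assumptions chosen for illustration.

```python
try:
    import cupy as xp
    import cupyx.scipy.signal as sig
    xp.zeros(1)        # touch the device so a missing GPU fails here
except Exception:      # no CuPy or no usable GPU: fall back to NumPy/SciPy
    import numpy as xp
    import scipy.signal as sig

x = xp.arange(8, dtype=xp.float64)

# Hand-written FIR coefficients for a 3-tap moving average.
b = xp.ones(3) / 3.0
a = xp.ones(1)

y = sig.lfilter(b, a, x)                    # causal FIR filtering
c = sig.convolve(x, b, mode="full")[:8]     # same result via direct convolution

max_diff = float(xp.abs(y - c).max())
print(y.tolist(), max_diff)
```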

Sparse Matrices#

SciPy

CuPy

scipy.sparse.block_diag

-

scipy.sparse.bmat

cupyx.scipy.sparse.bmat

scipy.sparse.bsr_array

-

scipy.sparse.bsr_matrix

-

scipy.sparse.coo_array

-

scipy.sparse.coo_matrix

cupyx.scipy.sparse.coo_matrix

scipy.sparse.csc_array

-

scipy.sparse.csc_matrix

cupyx.scipy.sparse.csc_matrix

scipy.sparse.csr_array

-

scipy.sparse.csr_matrix

cupyx.scipy.sparse.csr_matrix

scipy.sparse.dia_array

-

scipy.sparse.dia_matrix

cupyx.scipy.sparse.dia_matrix

scipy.sparse.diags

cupyx.scipy.sparse.diags

scipy.sparse.dok_array

-

scipy.sparse.dok_matrix

-

scipy.sparse.eye

cupyx.scipy.sparse.eye

scipy.sparse.find

cupyx.scipy.sparse.find

scipy.sparse.hstack

cupyx.scipy.sparse.hstack

scipy.sparse.identity

cupyx.scipy.sparse.identity

scipy.sparse.issparse

cupyx.scipy.sparse.issparse

scipy.sparse.isspmatrix

cupyx.scipy.sparse.isspmatrix

scipy.sparse.isspmatrix_bsr

-

scipy.sparse.isspmatrix_coo

cupyx.scipy.sparse.isspmatrix_coo

scipy.sparse.isspmatrix_csc

cupyx.scipy.sparse.isspmatrix_csc

scipy.sparse.isspmatrix_csr

cupyx.scipy.sparse.isspmatrix_csr

scipy.sparse.isspmatrix_dia

cupyx.scipy.sparse.isspmatrix_dia

scipy.sparse.isspmatrix_dok

-

scipy.sparse.isspmatrix_lil

-

scipy.sparse.kron

cupyx.scipy.sparse.kron

scipy.sparse.kronsum

cupyx.scipy.sparse.kronsum

scipy.sparse.lil_array

-

scipy.sparse.lil_matrix

-

scipy.sparse.load_npz

-

scipy.sparse.rand

cupyx.scipy.sparse.rand

scipy.sparse.random

cupyx.scipy.sparse.random

scipy.sparse.save_npz

-

scipy.sparse.spdiags

cupyx.scipy.sparse.spdiags

scipy.sparse.spmatrix

cupyx.scipy.sparse.spmatrix

scipy.sparse.tril

cupyx.scipy.sparse.tril

scipy.sparse.triu

cupyx.scipy.sparse.triu

scipy.sparse.vstack

cupyx.scipy.sparse.vstack

Sparse Linear Algebra#

SciPy

CuPy

scipy.sparse.linalg.LinearOperator

cupyx.scipy.sparse.linalg.LinearOperator

scipy.sparse.linalg.SuperLU

cupyx.scipy.sparse.linalg.SuperLU

scipy.sparse.linalg.aslinearoperator

cupyx.scipy.sparse.linalg.aslinearoperator

scipy.sparse.linalg.bicg

-

scipy.sparse.linalg.bicgstab

-

scipy.sparse.linalg.cg

cupyx.scipy.sparse.linalg.cg

scipy.sparse.linalg.cgs

cupyx.scipy.sparse.linalg.cgs

scipy.sparse.linalg.eigs

-

scipy.sparse.linalg.eigsh

cupyx.scipy.sparse.linalg.eigsh

scipy.sparse.linalg.expm

-

scipy.sparse.linalg.expm_multiply

-

scipy.sparse.linalg.factorized

cupyx.scipy.sparse.linalg.factorized

scipy.sparse.linalg.gcrotmk

-

scipy.sparse.linalg.gmres

cupyx.scipy.sparse.linalg.gmres

scipy.sparse.linalg.inv

-

scipy.sparse.linalg.lgmres

-

scipy.sparse.linalg.lobpcg

cupyx.scipy.sparse.linalg.lobpcg

scipy.sparse.linalg.lsmr

cupyx.scipy.sparse.linalg.lsmr

scipy.sparse.linalg.lsqr

cupyx.scipy.sparse.linalg.lsqr

scipy.sparse.linalg.minres

cupyx.scipy.sparse.linalg.minres

scipy.sparse.linalg.norm

cupyx.scipy.sparse.linalg.norm

scipy.sparse.linalg.onenormest

-

scipy.sparse.linalg.qmr

-

scipy.sparse.linalg.spilu

cupyx.scipy.sparse.linalg.spilu

scipy.sparse.linalg.splu

cupyx.scipy.sparse.linalg.splu

scipy.sparse.linalg.spsolve

cupyx.scipy.sparse.linalg.spsolve

scipy.sparse.linalg.spsolve_triangular

cupyx.scipy.sparse.linalg.spsolve_triangular

scipy.sparse.linalg.svds

cupyx.scipy.sparse.linalg.svds

scipy.sparse.linalg.tfqmr

-

scipy.sparse.linalg.use_solver

-
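
Of the iterative solvers in the table above, `cg`, `cgs`, `gmres`, and `minres` have `cupyx.scipy.sparse.linalg` counterparts, while `bicg`, `bicgstab`, `qmr`, and others are listed as `-`. The following sketch solves a small symmetric positive-definite system with `cg`; the aliases, the SciPy fallback, and the tridiagonal test matrix are assumptions introduced here.

```python
try:
    import cupy as xp
    import cupyx.scipy.sparse as sp
    import cupyx.scipy.sparse.linalg as spla
    xp.zeros(1)        # touch the device so a missing GPU fails here
except Exception:      # no CuPy or no usable GPU: fall back to NumPy/SciPy
    import numpy as xp
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

n = 50
main = 2.0 * xp.ones(n)
off = -1.0 * xp.ones(n - 1)

# 1-D Laplacian: symmetric positive definite, stored in CSR format.
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")
b = xp.ones(n)

# Default tolerances are used; info == 0 signals convergence.
x, info = spla.cg(A, b)

residual = float(xp.linalg.norm(A.dot(x) - b))
print(info, residual)
```

Only default solver arguments are used because the tolerance keyword has been renamed across SciPy releases, and the CuPy signature may not match every SciPy version.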

Compressed Sparse Graph Routines#

SciPy

CuPy

scipy.sparse.csgraph.bellman_ford

-

scipy.sparse.csgraph.breadth_first_order

-

scipy.sparse.csgraph.breadth_first_tree

-

scipy.sparse.csgraph.connected_components

cupyx.scipy.sparse.csgraph.connected_components

scipy.sparse.csgraph.construct_dist_matrix

-

scipy.sparse.csgraph.csgraph_from_dense

-

scipy.sparse.csgraph.csgraph_from_masked

-

scipy.sparse.csgraph.csgraph_masked_from_dense

-

scipy.sparse.csgraph.csgraph_to_dense

-

scipy.sparse.csgraph.csgraph_to_masked

-

scipy.sparse.csgraph.depth_first_order

-

scipy.sparse.csgraph.depth_first_tree

-

scipy.sparse.csgraph.dijkstra

-

scipy.sparse.csgraph.floyd_warshall

-

scipy.sparse.csgraph.johnson

-

scipy.sparse.csgraph.laplacian

-

scipy.sparse.csgraph.maximum_bipartite_matching

-

scipy.sparse.csgraph.maximum_flow

-

scipy.sparse.csgraph.min_weight_full_bipartite_matching

-

scipy.sparse.csgraph.minimum_spanning_tree

-

scipy.sparse.csgraph.reconstruct_path

-

scipy.sparse.csgraph.reverse_cuthill_mckee

-

scipy.sparse.csgraph.shortest_path

-

scipy.sparse.csgraph.structural_rank

-

Special Functions#

SciPy

CuPy

scipy.special.agm

-

scipy.special.ai_zeros

-

scipy.special.airy

-

scipy.special.airye

-

scipy.special.assoc_laguerre

-

scipy.special.bdtr

cupyx.scipy.special.bdtr

scipy.special.bdtrc

cupyx.scipy.special.bdtrc

scipy.special.bdtri

cupyx.scipy.special.bdtri

scipy.special.bdtrik

-

scipy.special.bdtrin

-

scipy.special.bei

-

scipy.special.bei_zeros

-

scipy.special.beip

-

scipy.special.beip_zeros

-

scipy.special.ber

-

scipy.special.ber_zeros

-

scipy.special.bernoulli

-

scipy.special.berp

-

scipy.special.berp_zeros

-

scipy.special.besselpoly

-

scipy.special.beta

cupyx.scipy.special.beta

scipy.special.betainc

cupyx.scipy.special.betainc

scipy.special.betaincinv

cupyx.scipy.special.betaincinv

scipy.special.betaln

cupyx.scipy.special.betaln

scipy.special.bi_zeros

-

scipy.special.binom

cupyx.scipy.special.binom

scipy.special.boxcox

cupyx.scipy.special.boxcox

scipy.special.boxcox1p

cupyx.scipy.special.boxcox1p

scipy.special.btdtr

cupyx.scipy.special.btdtr

scipy.special.btdtri

cupyx.scipy.special.btdtri

scipy.special.btdtria

-

scipy.special.btdtrib

-

scipy.special.c_roots

-

scipy.special.cbrt

cupyx.scipy.special.cbrt

scipy.special.cg_roots

-

scipy.special.chdtr

cupyx.scipy.special.chdtr

scipy.special.chdtrc

cupyx.scipy.special.chdtrc

scipy.special.chdtri

cupyx.scipy.special.chdtri

scipy.special.chdtriv

-

scipy.special.chebyc

-

scipy.special.chebys

-

scipy.special.chebyt

-

scipy.special.chebyu

-

scipy.special.chndtr

-

scipy.special.chndtridf

-

scipy.special.chndtrinc

-

scipy.special.chndtrix

-

scipy.special.clpmn

-

scipy.special.comb

-

scipy.special.cosdg

cupyx.scipy.special.cosdg

scipy.special.cosm1

cupyx.scipy.special.cosm1

scipy.special.cotdg

cupyx.scipy.special.cotdg

scipy.special.dawsn

-

scipy.special.digamma

cupyx.scipy.special.digamma

scipy.special.diric

-

scipy.special.ellip_harm

-

scipy.special.ellip_harm_2

-

scipy.special.ellip_normal

-

scipy.special.ellipe

-

scipy.special.ellipeinc

-

scipy.special.ellipj

-

scipy.special.ellipk

-

scipy.special.ellipkinc

-

scipy.special.ellipkm1

-

scipy.special.elliprc

-

scipy.special.elliprd

-

scipy.special.elliprf

-

scipy.special.elliprg

-

scipy.special.elliprj

-

scipy.special.entr

cupyx.scipy.special.entr

scipy.special.erf

cupyx.scipy.special.erf

scipy.special.erf_zeros

-

scipy.special.erfc

cupyx.scipy.special.erfc

scipy.special.erfcinv

cupyx.scipy.special.erfcinv

scipy.special.erfcx

cupyx.scipy.special.erfcx

scipy.special.erfi

-

scipy.special.erfinv

cupyx.scipy.special.erfinv

scipy.special.errstate

-

scipy.special.euler

-

scipy.special.eval_chebyc

-

scipy.special.eval_chebys

-

scipy.special.eval_chebyt

-

scipy.special.eval_chebyu

-

scipy.special.eval_gegenbauer

-

scipy.special.eval_genlaguerre

-

scipy.special.eval_hermite

-

scipy.special.eval_hermitenorm

-

scipy.special.eval_jacobi

-

scipy.special.eval_laguerre

-

scipy.special.eval_legendre

-

scipy.special.eval_sh_chebyt

-

scipy.special.eval_sh_chebyu

-

scipy.special.eval_sh_jacobi

-

scipy.special.eval_sh_legendre

-

scipy.special.exp1

cupyx.scipy.special.exp1

scipy.special.exp10

cupyx.scipy.special.exp10

scipy.special.exp2

cupyx.scipy.special.exp2

scipy.special.expi

cupyx.scipy.special.expi

scipy.special.expit

cupyx.scipy.special.expit

scipy.special.expm1

cupyx.scipy.special.expm1

scipy.special.expn

cupyx.scipy.special.expn

scipy.special.exprel

cupyx.scipy.special.exprel

scipy.special.factorial

-

scipy.special.factorial2

-

scipy.special.factorialk

-

scipy.special.fdtr

cupyx.scipy.special.fdtr

scipy.special.fdtrc

cupyx.scipy.special.fdtrc

scipy.special.fdtri

cupyx.scipy.special.fdtri

scipy.special.fdtridfd

-

scipy.special.fresnel

-

scipy.special.fresnel_zeros

-

scipy.special.fresnelc_zeros

-

scipy.special.fresnels_zeros

-

scipy.special.gamma

cupyx.scipy.special.gamma

scipy.special.gammainc

cupyx.scipy.special.gammainc

scipy.special.gammaincc

cupyx.scipy.special.gammaincc

scipy.special.gammainccinv

cupyx.scipy.special.gammainccinv

scipy.special.gammaincinv

cupyx.scipy.special.gammaincinv

scipy.special.gammaln

cupyx.scipy.special.gammaln

scipy.special.gammasgn

cupyx.scipy.special.gammasgn

scipy.special.gdtr

cupyx.scipy.special.gdtr

scipy.special.gdtrc

cupyx.scipy.special.gdtrc

scipy.special.gdtria

-

scipy.special.gdtrib

-

scipy.special.gdtrix

-

scipy.special.gegenbauer

-

scipy.special.genlaguerre

-

scipy.special.geterr

-

scipy.special.h1vp

-

scipy.special.h2vp

-

scipy.special.h_roots

-

scipy.special.hankel1

-

scipy.special.hankel1e

-

scipy.special.hankel2

-

scipy.special.hankel2e

-

scipy.special.he_roots

-

scipy.special.hermite

-

scipy.special.hermitenorm

-

scipy.special.huber

cupyx.scipy.special.huber

scipy.special.hyp0f1

-

scipy.special.hyp1f1

-

scipy.special.hyp2f1

-

scipy.special.hyperu

-

scipy.special.i0

cupyx.scipy.special.i0

scipy.special.i0e

cupyx.scipy.special.i0e

scipy.special.i1

cupyx.scipy.special.i1

scipy.special.i1e

cupyx.scipy.special.i1e

scipy.special.inv_boxcox

cupyx.scipy.special.inv_boxcox

scipy.special.inv_boxcox1p

cupyx.scipy.special.inv_boxcox1p

scipy.special.it2i0k0

-

scipy.special.it2j0y0

-

scipy.special.it2struve0

-

scipy.special.itairy

-

scipy.special.iti0k0

-

scipy.special.itj0y0

-

scipy.special.itmodstruve0

-

scipy.special.itstruve0

-

scipy.special.iv

-

scipy.special.ive

-

scipy.special.ivp

-

scipy.special.j0

cupyx.scipy.special.j0

scipy.special.j1

cupyx.scipy.special.j1

scipy.special.j_roots

-

scipy.special.jacobi

-

scipy.special.jn

-

scipy.special.jn_zeros

-

scipy.special.jnjnp_zeros

-

scipy.special.jnp_zeros

-

scipy.special.jnyn_zeros

-

scipy.special.js_roots

-

scipy.special.jv

-

scipy.special.jve

-

scipy.special.jvp

-

scipy.special.k0

cupyx.scipy.special.k0

scipy.special.k0e

cupyx.scipy.special.k0e

scipy.special.k1

cupyx.scipy.special.k1

scipy.special.k1e

cupyx.scipy.special.k1e

scipy.special.kei

-

scipy.special.kei_zeros

-

scipy.special.keip

-

scipy.special.keip_zeros

-

scipy.special.kelvin

-

scipy.special.kelvin_zeros

-

scipy.special.ker

-

scipy.special.ker_zeros

-

scipy.special.kerp

-

scipy.special.kerp_zeros

-

scipy.special.kl_div

cupyx.scipy.special.kl_div

scipy.special.kn

-

scipy.special.kolmogi

-

scipy.special.kolmogorov

-

scipy.special.kv

-

scipy.special.kve

-

scipy.special.kvp

-

scipy.special.l_roots

-

scipy.special.la_roots

-

scipy.special.laguerre

-

scipy.special.lambertw

-

scipy.special.legendre

-

scipy.special.lmbda

-

scipy.special.log1p

cupyx.scipy.special.log1p

scipy.special.log_expit

cupyx.scipy.special.log_expit

scipy.special.log_ndtr

cupyx.scipy.special.log_ndtr

scipy.special.log_softmax

cupyx.scipy.special.log_softmax

scipy.special.loggamma

cupyx.scipy.special.loggamma

scipy.special.logit

cupyx.scipy.special.logit

scipy.special.logsumexp

cupyx.scipy.special.logsumexp

scipy.special.lpmn

-

scipy.special.lpmv

cupyx.scipy.special.lpmv

scipy.special.lpn

-

scipy.special.lqmn

-

scipy.special.lqn

-

scipy.special.mathieu_a

-

scipy.special.mathieu_b

-

scipy.special.mathieu_cem

-

scipy.special.mathieu_even_coef

-

scipy.special.mathieu_modcem1

-

scipy.special.mathieu_modcem2

-

scipy.special.mathieu_modsem1

-

scipy.special.mathieu_modsem2

-

scipy.special.mathieu_odd_coef

-

scipy.special.mathieu_sem

-

scipy.special.modfresnelm

-

scipy.special.modfresnelp

-

scipy.special.modstruve

-

scipy.special.multigammaln

cupyx.scipy.special.multigammaln

scipy.special.nbdtr

cupyx.scipy.special.nbdtr

scipy.special.nbdtrc

cupyx.scipy.special.nbdtrc

scipy.special.nbdtri

cupyx.scipy.special.nbdtri

scipy.special.nbdtrik

-

scipy.special.nbdtrin

-

scipy.special.ncfdtr

-

scipy.special.ncfdtri

-

scipy.special.ncfdtridfd

-

scipy.special.ncfdtridfn

-

scipy.special.ncfdtrinc

-

scipy.special.nctdtr

-

scipy.special.nctdtridf

-

scipy.special.nctdtrinc

-

scipy.special.nctdtrit

-

scipy.special.ndtr

cupyx.scipy.special.ndtr

scipy.special.ndtri

cupyx.scipy.special.ndtri

scipy.special.ndtri_exp

-

scipy.special.nrdtrimn

-

scipy.special.nrdtrisd

-

scipy.special.obl_ang1

-

scipy.special.obl_ang1_cv

-

scipy.special.obl_cv

-

scipy.special.obl_cv_seq

-

scipy.special.obl_rad1

-

scipy.special.obl_rad1_cv

-

scipy.special.obl_rad2

-

scipy.special.obl_rad2_cv

-

scipy.special.owens_t

-

scipy.special.p_roots

-

scipy.special.pbdn_seq

-

scipy.special.pbdv

-

scipy.special.pbdv_seq

-

scipy.special.pbvv

-

scipy.special.pbvv_seq

-

scipy.special.pbwa

-

scipy.special.pdtr

cupyx.scipy.special.pdtr

scipy.special.pdtrc

cupyx.scipy.special.pdtrc

scipy.special.pdtri

cupyx.scipy.special.pdtri

scipy.special.pdtrik

-

scipy.special.perm

-

scipy.special.poch

cupyx.scipy.special.poch

scipy.special.polygamma

cupyx.scipy.special.polygamma

scipy.special.powm1

-

scipy.special.pro_ang1

-

scipy.special.pro_ang1_cv

-

scipy.special.pro_cv

-

scipy.special.pro_cv_seq

-

scipy.special.pro_rad1

-

scipy.special.pro_rad1_cv

-

scipy.special.pro_rad2

-

scipy.special.pro_rad2_cv

-

scipy.special.ps_roots

-

scipy.special.pseudo_huber

cupyx.scipy.special.pseudo_huber

scipy.special.psi

cupyx.scipy.special.psi

scipy.special.radian

cupyx.scipy.special.radian

scipy.special.rel_entr

cupyx.scipy.special.rel_entr

scipy.special.rgamma

cupyx.scipy.special.rgamma

scipy.special.riccati_jn

-

scipy.special.riccati_yn

-

scipy.special.roots_chebyc

-

scipy.special.roots_chebys

-

scipy.special.roots_chebyt

-

scipy.special.roots_chebyu

-

scipy.special.roots_gegenbauer

-

scipy.special.roots_genlaguerre

-

scipy.special.roots_hermite

-

scipy.special.roots_hermitenorm

-

scipy.special.roots_jacobi

-

scipy.special.roots_laguerre

-

scipy.special.roots_legendre

-

scipy.special.roots_sh_chebyt

-

scipy.special.roots_sh_chebyu

-

scipy.special.roots_sh_jacobi

-

scipy.special.roots_sh_legendre

-

scipy.special.round

cupyx.scipy.special.round

scipy.special.s_roots

-

scipy.special.seterr

-

scipy.special.sh_chebyt

-

scipy.special.sh_chebyu

-

scipy.special.sh_jacobi

-

scipy.special.sh_legendre

-

scipy.special.shichi

-

scipy.special.sici

-

scipy.special.sinc

cupyx.scipy.special.sinc

scipy.special.sindg

cupyx.scipy.special.sindg

scipy.special.smirnov

-

scipy.special.smirnovi

-

scipy.special.softmax

cupyx.scipy.special.softmax

scipy.special.spence

-

scipy.special.sph_harm

cupyx.scipy.special.sph_harm

scipy.special.spherical_in

-

scipy.special.spherical_jn

-

scipy.special.spherical_kn

-

scipy.special.spherical_yn

cupyx.scipy.special.spherical_yn

scipy.special.stdtr

-

scipy.special.stdtridf

-

scipy.special.stdtrit

-

scipy.special.struve

-

scipy.special.t_roots

-

scipy.special.tandg

cupyx.scipy.special.tandg

scipy.special.tklmbda

-

scipy.special.ts_roots

-

scipy.special.u_roots

-

scipy.special.us_roots

-

scipy.special.voigt_profile

-

scipy.special.wofz

-

scipy.special.wright_bessel

-

scipy.special.wrightomega

-

scipy.special.xlog1py

cupyx.scipy.special.xlog1py

scipy.special.xlogy

cupyx.scipy.special.xlogy

scipy.special.y0

cupyx.scipy.special.y0

scipy.special.y0_zeros

-

scipy.special.y1

cupyx.scipy.special.y1

scipy.special.y1_zeros

-

scipy.special.y1p_zeros

-

scipy.special.yn

cupyx.scipy.special.yn

scipy.special.yn_zeros

-

scipy.special.ynp_zeros

-

scipy.special.yv

-

scipy.special.yve

-

scipy.special.yvp

-

scipy.special.zeta

cupyx.scipy.special.zeta

scipy.special.zetac

cupyx.scipy.special.zetac

Statistical Functions#

SciPy

CuPy

scipy.stats.Covariance

-

scipy.stats.alexandergovern

-

scipy.stats.alpha

-

scipy.stats.anderson

-

scipy.stats.anderson_ksamp

-

scipy.stats.anglit

-

scipy.stats.ansari

-

scipy.stats.arcsine

-

scipy.stats.argus

-

scipy.stats.barnard_exact

-

scipy.stats.bartlett

-

scipy.stats.bayes_mvs

-

scipy.stats.bernoulli

-

scipy.stats.beta

-

scipy.stats.betabinom

-

scipy.stats.betaprime

-

scipy.stats.binned_statistic

-

scipy.stats.binned_statistic_2d

-

scipy.stats.binned_statistic_dd

-

scipy.stats.binom

-

scipy.stats.binom_test

-

scipy.stats.binomtest

-

scipy.stats.boltzmann

-

scipy.stats.bootstrap

-

scipy.stats.boschloo_exact

-

scipy.stats.boxcox

-

scipy.stats.boxcox_llf

cupyx.scipy.stats.boxcox_llf

scipy.stats.boxcox_normmax

-

scipy.stats.boxcox_normplot

-

scipy.stats.bradford

-

scipy.stats.brunnermunzel

-

scipy.stats.burr

-

scipy.stats.burr12

-

scipy.stats.cauchy

-

scipy.stats.chi

-

scipy.stats.chi2

-

scipy.stats.chi2_contingency

-

scipy.stats.chisquare

-

scipy.stats.circmean

-

scipy.stats.circstd

-

scipy.stats.circvar

-

scipy.stats.combine_pvalues

-

scipy.stats.cosine

-

scipy.stats.cramervonmises

-

scipy.stats.cramervonmises_2samp

-

scipy.stats.crystalball

-

scipy.stats.cumfreq

-

scipy.stats.describe

-

scipy.stats.dgamma

-

scipy.stats.differential_entropy

-

scipy.stats.directional_stats

-

scipy.stats.dirichlet

-

scipy.stats.dlaplace

-

scipy.stats.dweibull

-

scipy.stats.energy_distance

-

scipy.stats.entropy

cupyx.scipy.stats.entropy

scipy.stats.epps_singleton_2samp

-

scipy.stats.erlang

-

scipy.stats.expectile

-

scipy.stats.expon

-

scipy.stats.exponnorm

-

scipy.stats.exponpow

-

scipy.stats.exponweib

-

scipy.stats.f

-

scipy.stats.f_oneway

-

scipy.stats.fatiguelife

-

scipy.stats.find_repeats

-

scipy.stats.fisher_exact

-

scipy.stats.fisk

-

scipy.stats.fit

-

scipy.stats.fligner

-

scipy.stats.foldcauchy

-

scipy.stats.foldnorm

-

scipy.stats.friedmanchisquare

-

scipy.stats.gamma

-

scipy.stats.gausshyper

-

scipy.stats.gaussian_kde

-

scipy.stats.genexpon

-

scipy.stats.genextreme

-

scipy.stats.gengamma

-

scipy.stats.genhalflogistic

-

scipy.stats.genhyperbolic

-

scipy.stats.geninvgauss

-

scipy.stats.genlogistic

-

scipy.stats.gennorm

-

scipy.stats.genpareto

-

scipy.stats.geom

-

scipy.stats.gibrat

-

scipy.stats.gilbrat

-

scipy.stats.gmean

-

scipy.stats.gompertz

-

scipy.stats.goodness_of_fit

-

scipy.stats.gstd

-

scipy.stats.gumbel_l

-

scipy.stats.gumbel_r

-

scipy.stats.gzscore

-

scipy.stats.halfcauchy

-

scipy.stats.halfgennorm

-

scipy.stats.halflogistic

-

scipy.stats.halfnorm

-

scipy.stats.hmean

-

scipy.stats.hypergeom

-

scipy.stats.hypsecant

-

scipy.stats.invgamma

-

scipy.stats.invgauss

-

scipy.stats.invweibull

-

scipy.stats.invwishart

-

scipy.stats.iqr

-

scipy.stats.jarque_bera

-

scipy.stats.johnsonsb

-

scipy.stats.johnsonsu

-

scipy.stats.kappa3

-

scipy.stats.kappa4

-

scipy.stats.kendalltau

-

scipy.stats.kruskal

-

scipy.stats.ks_1samp

-

scipy.stats.ks_2samp

-

scipy.stats.ksone

-

scipy.stats.kstat

-

scipy.stats.kstatvar

-

scipy.stats.kstest

-

scipy.stats.kstwo

-

scipy.stats.kstwobign

-

scipy.stats.kurtosis

-

scipy.stats.kurtosistest

-

scipy.stats.laplace

-

scipy.stats.laplace_asymmetric

-

scipy.stats.levene

-

scipy.stats.levy

-

scipy.stats.levy_l

-

scipy.stats.levy_stable

-

scipy.stats.linregress

-

scipy.stats.loggamma

-

scipy.stats.logistic

-

scipy.stats.loglaplace

-

scipy.stats.lognorm

-

scipy.stats.logser

-

scipy.stats.loguniform

-

scipy.stats.lomax

-

scipy.stats.mannwhitneyu

-

scipy.stats.matrix_normal

-

scipy.stats.maxwell

-

scipy.stats.median_abs_deviation

-

scipy.stats.median_test

-

scipy.stats.mielke

-

scipy.stats.mode

-

scipy.stats.moment

-

scipy.stats.monte_carlo_test

-

scipy.stats.mood

-

scipy.stats.moyal

-

scipy.stats.multinomial

-

scipy.stats.multiscale_graphcorr

-

scipy.stats.multivariate_hypergeom

-

scipy.stats.multivariate_normal

-

scipy.stats.multivariate_t

-

scipy.stats.mvsdist

-

scipy.stats.nakagami

-

scipy.stats.nbinom

-

scipy.stats.ncf

-

scipy.stats.nchypergeom_fisher

-

scipy.stats.nchypergeom_wallenius

-

scipy.stats.nct

-

scipy.stats.ncx2

-

scipy.stats.nhypergeom

-

scipy.stats.norm

-

scipy.stats.normaltest

-

scipy.stats.norminvgauss

-

scipy.stats.obrientransform

-

scipy.stats.ortho_group

-

scipy.stats.page_trend_test

-

scipy.stats.pareto

-

scipy.stats.pearson3

-

scipy.stats.pearsonr

-

scipy.stats.percentileofscore

-

scipy.stats.permutation_test

-

scipy.stats.planck

-

scipy.stats.pmean

-

scipy.stats.pointbiserialr

-

scipy.stats.poisson

-

scipy.stats.poisson_means_test

-

scipy.stats.power_divergence

-

scipy.stats.powerlaw

-

scipy.stats.powerlognorm

-

scipy.stats.powernorm

-

scipy.stats.ppcc_max

-

scipy.stats.ppcc_plot

-

scipy.stats.probplot

-

scipy.stats.randint

-

scipy.stats.random_correlation

-

scipy.stats.random_table

-

scipy.stats.rankdata

-

scipy.stats.ranksums

-

scipy.stats.rayleigh

-

scipy.stats.rdist

-

scipy.stats.recipinvgauss

-

scipy.stats.reciprocal

-

scipy.stats.relfreq

-

scipy.stats.rice

-

scipy.stats.rv_continuous

-

scipy.stats.rv_discrete

-

scipy.stats.rv_histogram

-

scipy.stats.rvs_ratio_uniforms

-

scipy.stats.scoreatpercentile

-

scipy.stats.sem

-

scipy.stats.semicircular

-

scipy.stats.shapiro

-

scipy.stats.siegelslopes

-

scipy.stats.sigmaclip

-

scipy.stats.skellam

-

scipy.stats.skew

-

scipy.stats.skewcauchy

-

scipy.stats.skewnorm

-

scipy.stats.skewtest

-

scipy.stats.somersd

-

scipy.stats.spearmanr

-

scipy.stats.special_ortho_group

-

scipy.stats.studentized_range

-

scipy.stats.t

-

scipy.stats.theilslopes

-

scipy.stats.tiecorrect

-

scipy.stats.tmax

-

scipy.stats.tmean

-

scipy.stats.tmin

-

scipy.stats.trapezoid

-

scipy.stats.trapz

-

scipy.stats.triang

-

scipy.stats.trim1

-

scipy.stats.trim_mean

cupyx.scipy.stats.trim_mean

scipy.stats.trimboth

-

scipy.stats.truncexpon

-

scipy.stats.truncnorm

-

scipy.stats.truncpareto

-

scipy.stats.truncweibull_min

-

scipy.stats.tsem

-

scipy.stats.tstd

-

scipy.stats.ttest_1samp

-

scipy.stats.ttest_ind

-

scipy.stats.ttest_ind_from_stats

-

scipy.stats.ttest_rel

-

scipy.stats.tukey_hsd

-

scipy.stats.tukeylambda

-

scipy.stats.tvar

-

scipy.stats.uniform

-

scipy.stats.uniform_direction

-

scipy.stats.unitary_group

-

scipy.stats.variation

-

scipy.stats.vonmises

-

scipy.stats.vonmises_line

-

scipy.stats.wald

-

scipy.stats.wasserstein_distance

-

scipy.stats.weibull_max

-

scipy.stats.weibull_min

-

scipy.stats.weightedtau

-

scipy.stats.wilcoxon

-

scipy.stats.wishart

-

scipy.stats.wrapcauchy

-

scipy.stats.yeojohnson

-

scipy.stats.yeojohnson_llf

-

scipy.stats.yeojohnson_normmax

-

scipy.stats.yeojohnson_normplot

-

scipy.stats.yulesimon

-

scipy.stats.zipf

-

scipy.stats.zipfian

-

scipy.stats.zmap

cupyx.scipy.stats.zmap

scipy.stats.zscore

cupyx.scipy.stats.zscore

Footnotes

1

Use of numpy.matrix is discouraged in NumPy, so we have no plan to add it to CuPy.

2

datetime64 and timedelta64 dtypes are currently unsupported.

3

object and string dtypes are not supported on GPUs and are thus left unimplemented in CuPy.

4

Floating-point error handling depends on CPU-specific features that are not available on GPUs.

5

Structured arrays and record arrays are currently unsupported.

6

Use of numpy.poly1d is discouraged in NumPy, so we have stopped adding functions that use its interface.

7

Not supported, as it has been deprecated in NumPy.

8

Not supported, as GPUs only support little-endian byte order.

Python Array API Support#

The Python array API standard aims to provide a coherent set of APIs on which community-developed array and tensor libraries can build. It addresses API fragmentation across the community by specifying concrete function signatures, semantics, and scope of coverage, which makes it possible to write backend-agnostic code that is more portable.

CuPy provides experimental support based on NumPy’s NEP-47, which is in turn based on the v2021 standard. All of the functionality is accessible through the cupy.array_api namespace.

NumPy’s Array API Standard Compatibility page is an excellent starting point for understanding the differences between the APIs in the main namespace and those in the array_api namespace. Keep in mind, however, that the key difference between NumPy and CuPy is that CuPy is a GPU-only library, so CuPy users should be aware of potential device management issues. As in regular CuPy code, the GPU to use can be specified via Device objects; see Device management. GPU-related semantics (e.g. streams and asynchronicity) are also respected. Finally, remember that there are already differences between NumPy and CuPy, although some of them are rectified in the standard.
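The backend-agnostic style described above can be sketched roughly as follows, assuming a CuPy release that still ships the experimental cupy.array_api namespace. The NumPy fallback and the helper name `standardize` are illustrative assumptions rather than part of CuPy; the helper only relies on names the standard defines (`asarray`, `mean`, `std`).

```python
import warnings

try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # the namespace is experimental and warns on import
        import cupy.array_api as xp
    xp.asarray([0.0])  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch still runs


def standardize(x, xp):
    """Shift x to zero mean and scale it to unit standard deviation."""
    return (x - xp.mean(x)) / xp.std(x)


x = xp.asarray([1.0, 2.0, 3.0, 4.0])
z = standardize(x, xp)
print(float(xp.mean(z)), float(xp.std(z)))
```

Because `standardize` receives the namespace as an argument, the same function body can run against whichever conforming library produced `x`.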

Array API Functions#

This section lists all implemented APIs. For detailed documentation, see the array API specification.

cupy.array_api.abs(x, /)[source]#

Array API compatible wrapper for np.abs.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.acos(x, /)[source]#

Array API compatible wrapper for np.arccos.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.acosh(x, /)[source]#

Array API compatible wrapper for np.arccosh.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.add(x1, x2, /)[source]#

Array API compatible wrapper for np.add.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.all(x, /, *, axis=None, keepdims=False)[source]#

Array API compatible wrapper for np.all.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.any(x, /, *, axis=None, keepdims=False)[source]#

Array API compatible wrapper for np.any.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.arange(start, /, stop=None, step=1, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.arange.

See its docstring for more information.

Parameters
  • start (Union[int, float]) –

  • stop (Optional[Union[int, float]]) –

  • step (Union[int, float]) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array
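As a small illustration of the `start`, `stop`, and `step` arguments, the sketch below builds two ranges. It assumes cupy.array_api is importable and falls back to NumPy otherwise; the fallback and the variable names are assumptions made for this example.

```python
try:
    import cupy.array_api as xp
    xp.asarray([0])  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch still runs

a = xp.arange(5)         # only `start` given: counts from 0 up to, but excluding, 5
b = xp.arange(1, 10, 3)  # start=1, stop=10 (exclusive), step=3

# Convert element by element; array API objects are not required to offer tolist().
a_values = [int(a[i]) for i in range(a.shape[0])]
b_values = [int(b[i]) for i in range(b.shape[0])]
print(a_values, b_values)
```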

cupy.array_api.argmax(x, /, *, axis=None, keepdims=False)[source]#

Array API compatible wrapper for np.argmax.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.argmin(x, /, *, axis=None, keepdims=False)[source]#

Array API compatible wrapper for np.argmin.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.argsort(x, /, *, axis=-1, descending=False, stable=True)[source]#

Array API compatible wrapper for np.argsort.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.asarray(obj, /, *, dtype=None, device=None, copy=None)[source]#

Array API compatible wrapper for np.asarray.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.asin(x, /)[source]#

Array API compatible wrapper for np.arcsin.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.asinh(x, /)[source]#

Array API compatible wrapper for np.arcsinh.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.atan(x, /)[source]#

Array API compatible wrapper for np.arctan.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.atan2(x1, x2, /)[source]#

Array API compatible wrapper for np.arctan2.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.atanh(x, /)[source]#

Array API compatible wrapper for np.arctanh.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.bitwise_and(x1, x2, /)[source]#

Array API compatible wrapper for np.bitwise_and.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.bitwise_invert(x, /)[source]#

Array API compatible wrapper for np.invert.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.bitwise_left_shift(x1, x2, /)[source]#

Array API compatible wrapper for np.left_shift.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.bitwise_or(x1, x2, /)[source]#

Array API compatible wrapper for np.bitwise_or.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.bitwise_right_shift(x1, x2, /)[source]#

Array API compatible wrapper for np.right_shift.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.bitwise_xor(x1, x2, /)[source]#

Array API compatible wrapper for np.bitwise_xor.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.broadcast_arrays(*arrays)[source]#

Array API compatible wrapper for np.broadcast_arrays.

See its docstring for more information.

Parameters

arrays (Array) –

Return type

List[Array]

cupy.array_api.broadcast_to(x, /, shape)[source]#

Array API compatible wrapper for np.broadcast_to.

See its docstring for more information.

Parameters
Return type

Array
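A brief sketch of broadcasting a one-dimensional array to a two-dimensional shape. It assumes cupy.array_api is importable and otherwise falls back to NumPy; the fallback and the names `row` and `tiled` are illustrative assumptions.

```python
try:
    import cupy.array_api as xp
    xp.asarray([0])  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch still runs

row = xp.asarray([1, 2, 3])
tiled = xp.broadcast_to(row, (2, 3))  # the single row is repeated along a new leading axis
print(tiled.shape, int(tiled[1, 2]))
```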

cupy.array_api.can_cast(from_, to, /)[source]#

Array API compatible wrapper for np.can_cast.

See its docstring for more information.

Parameters
  • from_ (Union[Dtype, Array]) –

  • to (Dtype) –

Return type

bool
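The following sketch contrasts a widening cast with a narrowing one. It assumes cupy.array_api is importable and falls back to NumPy otherwise; the fallback and the variable names are assumptions for illustration only.

```python
try:
    import cupy.array_api as xp
    xp.asarray([0])  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch still runs

widening = xp.can_cast(xp.int8, xp.int32)      # every int8 value fits in int32
narrowing = xp.can_cast(xp.float64, xp.int32)  # would discard fractional parts
print(widening, narrowing)
```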

cupy.array_api.ceil(x, /)[source]#

Array API compatible wrapper for np.ceil.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.concat(arrays, /, *, axis=0)[source]#

Array API compatible wrapper for np.concatenate.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.cos(x, /)[source]#

Array API compatible wrapper for np.cos.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.cosh(x, /)[source]#

Array API compatible wrapper for np.cosh.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.divide(x1, x2, /)[source]#

Array API compatible wrapper for np.divide.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.empty(shape, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.empty.

See its docstring for more information.

Parameters
  • shape (Union[int, Tuple[int, ...]]) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.empty_like(x, /, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.empty_like.

See its docstring for more information.

Parameters
  • x (Array) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.equal(x1, x2, /)[source]#

Array API compatible wrapper for np.equal.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.exp(x, /)[source]#

Array API compatible wrapper for np.exp.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.expand_dims(x, /, *, axis)[source]#

Array API compatible wrapper for np.expand_dims.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.expm1(x, /)[source]#

Array API compatible wrapper for np.expm1.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.eye(n_rows, n_cols=None, /, *, k=0, dtype=None, device=None)[source]#

Array API compatible wrapper for np.eye.

See its docstring for more information.

Parameters
  • n_rows (int) –

  • n_cols (Optional[int]) –

  • k (int) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.finfo(type, /)[source]#

Array API compatible wrapper for np.finfo.

See its docstring for more information.

Parameters

type (Union[Dtype, Array]) –

Return type

finfo_object

cupy.array_api.flip(x, /, *, axis=None)[source]#

Array API compatible wrapper for np.flip.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.floor(x, /)[source]#

Array API compatible wrapper for np.floor.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.floor_divide(x1, x2, /)[source]#

Array API compatible wrapper for np.floor_divide.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.from_dlpack(x, /)[source]#

Array API compatible wrapper for np.from_dlpack.

See its docstring for more information.

Parameters

x (object) –

Return type

Array

cupy.array_api.full(shape, fill_value, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.full.

See its docstring for more information.

Parameters
  • shape (Union[int, Tuple[int, ...]]) –

  • fill_value (Union[int, float]) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.full_like(x, /, fill_value, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.full_like.

See its docstring for more information.

Parameters
  • x (Array) –

  • fill_value (Union[int, float]) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.greater(x1, x2, /)[source]#

Array API compatible wrapper for np.greater.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.greater_equal(x1, x2, /)[source]#

Array API compatible wrapper for np.greater_equal.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.iinfo(type, /)[source]#

Array API compatible wrapper for np.iinfo.

See its docstring for more information.

Parameters

type (Union[Dtype, Array]) –

Return type

iinfo_object
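A short sketch of querying dtype limits with `iinfo` and `finfo`. It assumes cupy.array_api is importable and falls back to NumPy otherwise; the fallback and the names `i16` and `f32` are assumptions for this example.

```python
try:
    import cupy.array_api as xp
    xp.asarray([0])  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch still runs

i16 = xp.iinfo(xp.int16)    # integer limits
f32 = xp.finfo(xp.float32)  # floating-point limits
print(i16.bits, i16.min, i16.max)
print(f32.bits, f32.eps)
```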

cupy.array_api.isfinite(x, /)[source]#

Array API compatible wrapper for np.isfinite.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.isinf(x, /)[source]#

Array API compatible wrapper for np.isinf.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.isnan(x, /)[source]#

Array API compatible wrapper for np.isnan.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.less(x1, x2, /)[source]#

Array API compatible wrapper for np.less.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.less_equal(x1, x2, /)[source]#

Array API compatible wrapper for np.less_equal.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.linspace(start, stop, /, num, *, dtype=None, device=None, endpoint=True)[source]#

Array API compatible wrapper for np.linspace.

See its docstring for more information.

Parameters
  • start (Union[int, float]) –

  • stop (Union[int, float]) –

  • num (int) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

  • endpoint (bool) –

Return type

Array
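The effect of `endpoint` can be seen by comparing two calls with otherwise identical arguments. This is a sketch that assumes cupy.array_api is importable, with a NumPy fallback added as an assumption for machines without a GPU.

```python
try:
    import cupy.array_api as xp
    xp.asarray([0])  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch still runs

closed = xp.linspace(0.0, 1.0, 5)                     # last sample is the stop value
half_open = xp.linspace(0.0, 1.0, 5, endpoint=False)  # stop value is excluded
print(float(closed[4]), float(half_open[4]))
```

With `endpoint=False` the spacing becomes (stop - start) / num instead of (stop - start) / (num - 1).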

cupy.array_api.log(x, /)[source]#

Array API compatible wrapper for np.log.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.log10(x, /)[source]#

Array API compatible wrapper for np.log10.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.log1p(x, /)[source]#

Array API compatible wrapper for np.log1p.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.log2(x, /)[source]#

Array API compatible wrapper for np.log2.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.logaddexp(x1, x2)[source]#

Array API compatible wrapper for np.logaddexp.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.logical_and(x1, x2, /)[source]#

Array API compatible wrapper for np.logical_and.

See its docstring for more information.

Parameters
  • x1 (Array) –

  • x2 (Array) –

Return type

Array

cupy.array_api.logical_not(x, /)[source]#

Array API compatible wrapper for np.logical_not.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.logical_or(x1, x2, /)[source]#

Array API compatible wrapper for np.logical_or.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.logical_xor(x1, x2, /)[source]#

Array API compatible wrapper for np.logical_xor.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.matmul(x1, x2, /)[source]#

Array API compatible wrapper for np.matmul.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.meshgrid(*arrays, indexing='xy')[source]#

Array API compatible wrapper for np.meshgrid.

See its docstring for more information.

Parameters
Return type

List[Array]

cupy.array_api.multiply(x1, x2, /)[source]#

Array API compatible wrapper for np.multiply.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.negative(x, /)[source]#

Array API compatible wrapper for np.negative.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.nonzero(x, /)[source]#

Array API compatible wrapper for np.nonzero.

See its docstring for more information.

Parameters

x (Array) –

Return type

Tuple[Array, …]

cupy.array_api.not_equal(x1, x2, /)[source]#

Array API compatible wrapper for np.not_equal.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.ones(shape, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.ones.

See its docstring for more information.

Parameters
  • shape (Union[int, Tuple[int, ...]]) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.ones_like(x, /, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.ones_like.

See its docstring for more information.

Parameters
  • x (Array) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.permute_dims(x, /, axes)[source]#

Array API compatible wrapper for np.transpose.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.positive(x, /)[source]#

Array API compatible wrapper for np.positive.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.pow(x1, x2, /)[source]#

Array API compatible wrapper for np.power.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.remainder(x1, x2, /)[source]#

Array API compatible wrapper for np.remainder.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.reshape(x, /, shape)[source]#

Array API compatible wrapper for np.reshape.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.result_type(*arrays_and_dtypes)[source]#

Array API compatible wrapper for np.result_type.

See its docstring for more information.

Parameters

arrays_and_dtypes (Union[Array, Dtype]) –

Return type

Dtype

cupy.array_api.roll(x, /, shift, *, axis=None)[source]#

Array API compatible wrapper for np.roll.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.round(x, /)[source]#

Array API compatible wrapper for np.round.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.sign(x, /)[source]#

Array API compatible wrapper for np.sign.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.sin(x, /)[source]#

Array API compatible wrapper for np.sin.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.sinh(x, /)[source]#

Array API compatible wrapper for np.sinh.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.sort(x, /, *, axis=-1, descending=False, stable=True)[source]#

Array API compatible wrapper for np.sort.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.sqrt(x, /)[source]#

Array API compatible wrapper for np.sqrt.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.square(x, /)[source]#

Array API compatible wrapper for np.square.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.squeeze(x, /, axis)[source]#

Array API compatible wrapper for np.squeeze.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.stack(arrays, /, *, axis=0)[source]#

Array API compatible wrapper for np.stack.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.subtract(x1, x2, /)[source]#

Array API compatible wrapper for np.subtract.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.take(x, indices, /, *, axis)[source]#

Array API compatible wrapper for np.take. See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.tan(x, /)[source]#

Array API compatible wrapper for np.tan.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.tanh(x, /)[source]#

Array API compatible wrapper for np.tanh.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.tril(x, /, *, k=0)[source]#

Array API compatible wrapper for np.tril.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.triu(x, /, *, k=0)[source]#

Array API compatible wrapper for np.triu.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.trunc(x, /)[source]#

Array API compatible wrapper for np.trunc.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.unique_all(x, /)[source]#

Array API compatible wrapper for np.unique.

See its docstring for more information.

Parameters

x (Array) –

Return type

UniqueAllResult

cupy.array_api.unique_inverse(x, /)[source]#

Array API compatible wrapper for np.unique.

See its docstring for more information.

Parameters

x (Array) –

Return type

UniqueInverseResult

cupy.array_api.unique_values(x, /)[source]#

Array API compatible wrapper for np.unique.

See its docstring for more information.

Parameters

x (Array) –

Return type

Array

cupy.array_api.where(condition, x1, x2, /)[source]#

Array API compatible wrapper for np.where.

See its docstring for more information.

Parameters
Return type

Array

cupy.array_api.zeros(shape, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.zeros.

See its docstring for more information.

Parameters
  • shape (Union[int, Tuple[int, ...]]) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

cupy.array_api.zeros_like(x, /, *, dtype=None, device=None)[source]#

Array API compatible wrapper for np.zeros_like.

See its docstring for more information.

Parameters
  • x (Array) –

  • dtype (Optional[Dtype]) –

  • device (Optional[Device]) –

Return type

Array

Array API Compliant Object#

Array is a wrapper class built upon cupy.ndarray to enforce strict compliance with the array API standard. See the documentation for detail.

This object should not be constructed directly. Rather, use one of the creation functions, such as cupy.array_api.asarray().

Array(*args, **kwargs)

n-d array object for the array API namespace.

Contribution Guide#

This is a guide for all contributions to CuPy. The development of CuPy is running on the official repository at GitHub. Anyone that wants to register an issue or to send a pull request should read through this document.

Classification of Contributions#

There are several ways to contribute to the CuPy community:

  1. Registering an issue

  2. Sending a pull request (PR)

  3. Sending a question to CuPy’s Gitter channel, CuPy User Group, or StackOverflow

  4. Open-sourcing an external example

  5. Writing a post about CuPy

This document mainly focuses on 1 and 2, though other contributions are also appreciated.

Development Cycle#

This section explains the development process of CuPy. Before contributing to CuPy, it is strongly recommended to understand the development cycle.

Versioning#

The versioning of CuPy follows PEP 440 and a part of Semantic Versioning. The version number consists of three or four parts: X.Y.Zw, where X denotes the major version, Y denotes the minor version, Z denotes the revision number, and the optional w denotes the pre-release suffix. While the major, minor, and revision numbers follow the rules of semantic versioning, the pre-release suffix follows PEP 440 so that the version string is friendlier to the Python ecosystem.

Note that a major update basically does not contain compatibility-breaking changes from the last release candidate (RC). This is not a strict rule, though; if there is a critical API bug that we have to fix for the major version, we may add breaking changes in the major version update.

As for the backward compatibility, see API Compatibility Policy.
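The X.Y.Zw scheme described above can be made concrete with a small parser. This is only a sketch: parse_version is a hypothetical helper, not part of CuPy, and it assumes that the pre-release suffix is one of a, b, or rc followed by a number (the alpha, beta, and RC pre-releases mentioned in this guide).

```python
import re

# Three numeric parts plus an optional pre-release suffix (a, b, or rc
# followed by a number). Other PEP 440 forms are deliberately not handled.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")


def parse_version(text):
    """Split a version string into (major, minor, revision, pre_release).

    pre_release is None for a final release, else a tuple such as ("rc", 1).
    """
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not an X.Y.Zw version string: {text!r}")
    major, minor, revision, kind, number = match.groups()
    pre_release = (kind, int(number)) if kind else None
    return int(major), int(minor), int(revision), pre_release


print(parse_version("12.0.0rc1"))  # (12, 0, 0, ('rc', 1))
print(parse_version("11.6.0"))     # (11, 6, 0, None)
```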

Release Cycle#

CuPy is developed on two release tracks in parallel. The first one is the track of stable versions, which is a series of revision updates for the latest major version. The second one is the track of development versions, which is a series of pre-releases for the upcoming major version.

Consider that X.0.0 is the latest major version and Y.0.0, Z.0.0 are the succeeding major versions. Then, the timeline of the updates is depicted by the following table.

Date      ver X     ver Y     ver Z
0 weeks   X.0.0rc1  -         -
4 weeks   X.0.0     Y.0.0a1   -
8 weeks   X.1.0*    Y.0.0b1   -
12 weeks  X.2.0*    Y.0.0rc1  -
16 weeks  -         Y.0.0     Z.0.0a1

(* These might be revision releases)

The dates shown in the left-most column are relative to the release of X.0.0rc1. In particular, each revision/minor release is made four weeks after the previous one of the same major version, and the pre-release of the upcoming major version is made at the same time. Whether these releases are revision or minor is determined based on the contents of each update.

Note that there are only three stable releases for the versions X.x.x. During the parallel development of Y.0.0 and Z.0.0a1, the version Y is treated as an almost-stable version and Z is treated as a development version.

If there is a critical bug found in X.x.x after stopping the development of version X, we may release a hot-fix for this version at any time.

We create a milestone for each upcoming release at GitHub. The GitHub milestone is basically used for collecting the issues and PRs resolved in the release.

Git Branches#

The main branch is used to develop pre-release versions. It means that alpha, beta, and RC updates are developed at the main branch. This branch contains the most up-to-date source tree that includes features newly added after the latest major version.

The stable version is developed on a separate branch named vN, where “N” reflects the major version number (we call it a versioned branch). For example, v1.0.0, v1.0.1, and v1.0.2 will be developed on the v1 branch.

Notes for contributors: When you send a pull request, you basically have to send it to the main branch. If the change can also be applied to the stable version, a core team member will apply the same change to the stable version so that the change is also included in the next revision update.

If the change is only applicable to the stable version and not to the main branch, please send it to the versioned branch. We basically only accept changes to the latest versioned branch (where the stable version is developed) unless the fix is critical.

If you want to make a new feature of the main branch available in the current stable version, please send a backport PR to the stable version (the latest vN branch). See the next section for details.

Note: a change that can be applied to both branches should be sent to the main branch. Each release of the stable version is also merged into the development version so that the change is also reflected in the next major version.

Feature Backport PRs#

We basically do not backport any new features of the development version to the stable versions. If you want a feature included in the current stable version and can do the backport work yourself, we welcome such a contribution. In that case, send a backport PR to the latest vN branch. Note that we do not accept feature backport PRs to older versions: we do not run quality assurance workflows (e.g. CI) for older versions, so we cannot ensure that the PR is correctly ported.

There are some rules on sending a backport PR.

  • Start the PR title with the prefix [backport].

  • Clarify the original PR number in the PR description (something like “This is a backport of #XXXX”).

  • (optional) Write to the PR description the motivation of backporting the feature to the stable version.

Please follow these rules when you create a feature backport PR.

Note: PRs that do not include any changes/additions to APIs (e.g. bug fixes, documentation improvements) are usually backported by core dev members. Backport PRs of this kind from any contributor are also appreciated, though, so that the overall development proceeds more smoothly!

Issues and Pull Requests#

In this section, we explain how to send pull requests (PRs).

How to Send a Pull Request#

If you can write code to fix an issue, we encourage you to send a PR.

First of all, before starting to write any code, do not forget to confirm the following points.

  • Read through the Coding Guidelines and Unit Testing.

  • Check which branch you should send the PR to, following Git Branches. If you are not sure which branch to select, please choose the main branch.

In particular, check the branch before writing any code. The current source tree of the chosen branch is the starting point of your change.

After writing your code (including unit tests and hopefully documentation!), send a PR on GitHub. You have to write a precise explanation of what you fix and how; it is the first documentation of your code that developers read, which makes it a very important part of your PR.

Once you send a PR, it is automatically tested on GitHub Actions. After the automatic test passes, core developers will start reviewing your code. Note that this automatic PR test only includes CPU tests.

Note

We are also running continuous integration with GPU tests for the main branch and the versioned branch of the latest major version. Since this service is currently running on our internal server, we do not use it for automatic PR tests to keep the server secure.

If you are planning to add a new feature or modify existing APIs, it is recommended to open an issue and discuss the design first. A design discussion costs the core developers less than a code review. Based on the outcome of the discussion, you can send a PR that is reviewed smoothly and in a shorter time.

Even if your code is not complete, you can send a pull request as a work-in-progress PR by adding the [WIP] prefix to the PR title. If you write a precise explanation about the PR, core developers and other contributors can join the discussion about how to proceed with the PR. A WIP PR is also useful for having discussions based on concrete code.

Coding Guidelines#

Note

Coding guidelines are updated at v5.0. Those who have contributed to older versions should read the guidelines again.

We use PEP8 and a part of OpenStack Style Guidelines related to general coding style as our basic style guidelines.

You can use autopep8 and flake8 commands to check your code.

In order to avoid confusion from using different tool versions, we pin the versions of those tools. Install them with the following command (from within the top directory of CuPy repository):

$ pip install -e '.[stylecheck]'

And check your code with:

$ autopep8 path/to/your/code.py
$ flake8 path/to/your/code.py

To check Cython code, use .flake8.cython configuration file:

$ flake8 --config=.flake8.cython path/to/your/cython/code.pyx

autopep8 can automatically correct Python code to conform to the PEP 8 style guide:

$ autopep8 --in-place path/to/your/code.py

The flake8 command reports the parts of your code that do not follow our style guidelines. Before sending a pull request, be sure to check that your code passes the flake8 check.

Note that the flake8 command is not perfect. It does not check some of the style guidelines. Here is an incomplete list of the rules that flake8 cannot check.

  • Relative imports are prohibited. [H304]

  • Importing non-module symbols is prohibited.

  • Import statements must be organized into three parts: standard libraries, third-party libraries, and internal imports. [H306]

In addition, we restrict the usage of shortcut symbols in our code base. They are symbols imported by packages and sub-packages of cupy. For example, cupy.cuda.Device is a shortcut of cupy.cuda.device.Device. It is not allowed to use such shortcuts in the cupy library implementation. Note that you can still use them in the tests and examples directories.

Once you send a pull request, your coding style is automatically checked by GitHub Actions. The reviewing process starts after the check passes.

CuPy is designed based on NumPy’s API design. CuPy’s source code and documents contain the original NumPy ones. Please note the following when writing documentation.

  • In order to identify overlapping parts, it is preferable to add a remark that the document is copied or altered from the original one. It is also preferable to briefly explain the specification of the function in a short paragraph, and refer to the corresponding function in NumPy so that users can read the detailed document. However, it is possible to include a complete copy of the document with such a remark if the content cannot be summarized in such a way.

  • If a function in CuPy only implements a limited subset of the features of the original one, the document should explicitly describe only what is implemented.

For changes that modify or add new Cython files, please make sure the pointer types follow these guidelines (#1913).

  • Pointers should be void* if only used within Cython, or intptr_t if exposed to the Python space.

  • Memory sizes should be size_t.

  • Memory offsets should be ptrdiff_t.

Note

We are incrementally enforcing the above rules, so some existing code may not follow the above guidelines, but please ensure all new contributions do.

Unit Testing#

Testing is one of the most important parts of your code. You must write test cases and verify your implementation by following our testing guide.

Note that we are using the pytest and mock packages for testing, so install them before writing your code:

$ pip install pytest mock

How to Run Tests#

In order to run unit tests at the repository root, you first have to build Cython files in place by running the following command:

$ pip install -e .

Note

When you modify *.pxd files, before running pip install -e ., you must clean *.cpp and *.so files once with the following command, because Cython does not automatically rebuild those files nicely:

$ git clean -fdx

Once Cython modules are built, you can run unit tests by running the following command at the repository root:

$ python -m pytest

CUDA must be installed to run unit tests.

Some GPU tests require cuDNN to run. In order to skip unit tests that require cuDNN, specify -m='not cudnn' option:

$ python -m pytest path/to/your/test.py -m='not cudnn'

Some GPU tests involve multiple GPUs. If you want to run GPU tests with fewer GPUs than they require, set CUPY_TEST_GPU_LIMIT to the number of available GPUs. For example, if you have only one GPU, launch pytest with the following command to skip multi-GPU tests:

$ export CUPY_TEST_GPU_LIMIT=1
$ python -m pytest path/to/gpu/test.py

Because the tests follow the naming conventions described in Test File and Directory Naming Conventions, you can run all the tests by running the following command at the repository root:

$ python -m pytest

Or you can also specify a root directory to search test scripts from:

$ python -m pytest tests/cupy_tests     # to just run tests of CuPy
$ python -m pytest tests/install_tests  # to just run tests of installation modules

If you modify code that is covered by existing unit tests, you must run the appropriate commands to check that those tests still pass.

Test File and Directory Naming Conventions#

Tests are put into the tests/cupy_tests directory. In order to enable the test runner to find test scripts correctly, we use a special naming convention for the test subdirectories and the test scripts.

  • The name of each subdirectory of tests must end with the _tests suffix.

  • The name of each test script must start with the test_ prefix.

When we write a test for a module, we use the appropriate path and file name for the test script whose correspondence to the tested module is clear. For example, if you want to write a test for a module cupy.x.y.z, the test script must be located at tests/cupy_tests/x_tests/y_tests/test_z.py.
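The mapping from a module name to its test script can be sketched in a few lines. script_path_for is a hypothetical helper written only to illustrate the rule as stated above; it is not part of CuPy or its test tooling, and the actual repository layout may contain exceptions that it does not cover.

```python
from pathlib import PurePosixPath


def script_path_for(module_name):
    """Return the test script path for a module such as 'cupy.x.y.z'."""
    parts = module_name.split(".")
    if len(parts) < 2 or parts[0] != "cupy":
        raise ValueError(f"expected a module under cupy, got {module_name!r}")
    *packages, module = parts[1:]
    # Each intermediate package becomes a '<name>_tests' directory, and the
    # module itself becomes a 'test_<name>.py' script.
    directories = [f"{name}_tests" for name in packages]
    path = PurePosixPath("tests", "cupy_tests", *directories, f"test_{module}.py")
    return str(path)


print(script_path_for("cupy.x.y.z"))
# tests/cupy_tests/x_tests/y_tests/test_z.py
```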

How to Write Tests#

There are many examples of unit tests under the tests directory, so reading some of them is a good and recommended way to learn how to write tests for CuPy. They simply use the unittest package of the standard library, while some tests are using utilities from cupy.testing.

In addition to the Coding Guidelines mentioned above, the following rules are applied to the test code:

  • All test classes must inherit from unittest.TestCase.

  • Use unittest features to write tests, except for the following cases:

    • Use assert statement instead of self.assert* methods (e.g., write assert x == 1 instead of self.assertEqual(x, 1)).

    • Use with pytest.raises(...): instead of with self.assertRaises(...):.

Note

We are incrementally applying the above style. Some existing tests may be using the old style (self.assertRaises, etc.), but all newly written tests should follow the above style.
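As an illustration of the style described above, here is a minimal test class that inherits from unittest.TestCase but uses plain assert statements. The clamp function is a hypothetical stand-in for the code under test, chosen so that the sketch runs without a GPU; a test that expects an exception would use with pytest.raises(...): in the same way, which is left out here to keep the example dependent only on the standard library.

```python
import unittest


def clamp(value, lower, upper):
    """Hypothetical function under test: limit value to [lower, upper]."""
    return max(lower, min(value, upper))


class TestClamp(unittest.TestCase):

    def test_inside_range(self):
        # Plain assert, rather than self.assertEqual(clamp(5, 0, 10), 5).
        assert clamp(5, 0, 10) == 5

    def test_outside_range(self):
        assert clamp(-3, 0, 10) == 0
        assert clamp(42, 0, 10) == 10
```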

In order to write tests for multiple GPUs, use the cupy.testing.multi_gpu() decorator:

import unittest
from cupy import testing

class TestMyFunc(unittest.TestCase):
    ...

    @testing.multi_gpu(2)  # specify the number of required GPUs here
    def test_my_two_gpu_func(self):
        ...

If your test requires too much time, add the cupy.testing.slow decorator. Test functions decorated with slow are skipped if -m='not slow' is given:

import unittest
from cupy import testing

class TestMyFunc(unittest.TestCase):
    ...

    @testing.slow
    def test_my_slow_func(self):
        ...

Once you send a pull request, GitHub Actions automatically checks if your code meets our coding guidelines described above. Since GitHub Actions does not support CUDA, we cannot run unit tests automatically. The reviewing process starts after the automatic check passes. Note that reviewers will test your code without the option to check CUDA-related code.

Note

Some numerically unstable tests might cause errors irrelevant to your changes. In such a case, we ignore the failures and go on to the review process, so do not worry about it!

Documentation#

When adding a new feature to the framework, you also need to document it in the reference.

Note

If you are unsure about how to fix the documentation, you can submit a pull request without doing so. Reviewers will help you fix the documentation appropriately.

The documentation source is stored under the docs directory and written in reStructuredText format.

To build the documentation, you need to install Sphinx:

$ pip install -r docs/requirements.txt

Then you can build the documentation in HTML format locally:

$ cd docs
$ make html

HTML files are generated under the build/html directory. Open index.html in a browser and check that it is rendered as expected.

Note

Docstrings (documentation comments in the source code) are collected from the installed CuPy module. If you modified docstrings, make sure to install the module (e.g., using pip install -e .) before building the documentation.

Tips for Developers#

Here are some tips for developers hacking CuPy source code.

Install as Editable#

During development, we recommend using pip with the -e option to install in editable mode:

$ pip install -e .

Please note that even with -e, you will have to rerun pip install -e . to regenerate C++ sources using Cython if you modified Cython source files (e.g., *.pyx files).

Use ccache#

The NVCC environment variable can be specified at build time to use a custom command instead of nvcc. You can speed up rebuilds using ccache (v3.4 or later) by:

$ export NVCC='ccache nvcc'

Limit Architecture#

Use the CUPY_NVCC_GENERATE_CODE environment variable to reduce the build time by limiting the target CUDA architectures. For example, if you only run your CuPy build with NVIDIA P100 and V100, you can use the following (the value is quoted so that the shell does not treat the semicolon as a command separator):

$ export CUPY_NVCC_GENERATE_CODE="arch=compute_60,code=sm_60;arch=compute_70,code=sm_70"

See Environment variables for the description.
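If you switch between several sets of target GPUs, the value can be assembled programmatically. The sketch below is only an illustration: generate_code_value is a hypothetical helper, not part of CuPy, and it assumes that each entry uses the same number for compute_ and sm_, as in the P100/V100 example above.

```python
def generate_code_value(compute_capabilities):
    """Build a CUPY_NVCC_GENERATE_CODE value from e.g. ["60", "70"]."""
    entries = [
        f"arch=compute_{cc},code=sm_{cc}" for cc in compute_capabilities
    ]
    # Entries are separated by semicolons, so the result needs quoting
    # when it is pasted into a shell command.
    return ";".join(entries)


value = generate_code_value(["60", "70"])
print(value)  # arch=compute_60,code=sm_60;arch=compute_70,code=sm_70
```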

Upgrade Guide#

This page covers changes introduced in each major version that users should know when migrating from older releases. Please see also the Compatibility Matrix for supported environments of each major version.

CuPy v12#

Change in cupy.cuda.Device Behavior#

The CUDA current device (set via cupy.cuda.Device.use() or cudaSetDevice()) will be reactivated when exiting a device context manager. This reverts the change introduced in CuPy v10, making the behavior identical to the one in CuPy v9 or earlier.

This decision was made for better interoperability with other libraries that might mutate the current CUDA device. Suppose the following code:

import cupy
import torch

def do_preprocess_cupy():
    with cupy.cuda.Device(2):
        # ...
        pass

torch.cuda.set_device(1)
do_preprocess_cupy()
print(torch.cuda.current_device())  # -> ???

In CuPy v10 and v11, the code prints 0, which can be surprising for users. In CuPy v12, the code now prints 1, making it easy for both users and library developers to maintain the current device where multiple devices are involved.

Deprecation of cupy.ndarray.scatter_{add,max,min}#

These APIs have been marked as deprecated because the cupy.{add,maximum,minimum}.at ufunc methods have been implemented, which provide equivalent, NumPy-compatible behavior.
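The semantics of the at method can be seen with NumPy's own numpy.add.at, which is used here so that the sketch runs without a GPU. That cupy.add.at gives the same result for CuPy arrays is an assumption based on the NumPy compatibility stated above; it is not verified by this snippet.

```python
import numpy as np

values = np.zeros(4)
indices = np.array([0, 0, 2])

# Buffered fancy-index assignment counts the repeated index 0 only once.
buffered = values.copy()
buffered[indices] += 1

# ufunc.at applies the operation once per index, so index 0 is
# incremented twice.
np.add.at(values, indices, 1)

print(buffered)  # [1. 0. 1. 0.]
print(values)    # [2. 0. 1. 0.]
```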

Requirement Changes#

The following versions are no longer supported in CuPy v12.

  • Python 3.7 or earlier

  • NumPy 1.20 or earlier

  • SciPy 1.6 or earlier

Baseline API Update#

Baseline API has been bumped from NumPy 1.23 and SciPy 1.8 to NumPy 1.24 and SciPy 1.9. CuPy v12 will follow the upstream products’ specifications of these baseline versions.

Update of Docker Images#

CuPy official Docker images (see Installation for details) are now updated to use CUDA 11.8.

CuPy v11#

Unified Binary Package for CUDA 11.2+#

CuPy v11 provides a unified binary package named cupy-cuda11x that supports all CUDA 11.2+ releases. This replaces per-CUDA version binary packages (cupy-cuda112 ~ cupy-cuda117).

Note that CUDA 11.1 or earlier still requires per-CUDA version binary packages. cupy-cuda102, cupy-cuda110, and cupy-cuda111 will be provided for CUDA 10.2, 11.0, and 11.1, respectively.

Requirement Changes#

The following versions are no longer supported in CuPy v11.

  • ROCm 4.2 or earlier

  • NumPy 1.19 or earlier

  • SciPy 1.5 or earlier

CUB Enabled by Default#

CuPy v11 accelerates the computation with CUB by default. If needed, you can turn it off by setting the CUPY_ACCELERATORS environment variable to an empty string ("").
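When the shell environment is not under your control (for example, in a notebook), the variable can also be set from Python. This is a sketch under one assumption: that the variable must already be set when CuPy is imported, so the assignment is placed before the import.

```python
import os

# Assumption: CUPY_ACCELERATORS has to be in the environment before CuPy
# is imported for the setting to take effect.
os.environ["CUPY_ACCELERATORS"] = ""

# import cupy  # would follow here in a real program
```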

Baseline API Update#

Baseline API has been bumped from NumPy 1.21 and SciPy 1.7 to NumPy 1.23 and SciPy 1.8. CuPy v11 will follow the upstream products’ specifications of these baseline versions.

Update of Docker Images#

CuPy official Docker images (see Installation for details) are now updated to use CUDA 11.7 and ROCm 5.0.

CuPy v10#

Dropping CUDA 9.2 / 10.0 / 10.1 Support#

CUDA 10.1 or earlier is no longer supported. Use CUDA 10.2 or later.

Dropping NCCL v2.4 / v2.6 / v2.7 Support#

NCCL v2.4, v2.6, and v2.7 are no longer supported.

Dropping Python 3.6 Support#

Python 3.6 is no longer supported.

Dropping NumPy 1.17 Support#

NumPy 1.17 is no longer supported.

Change in cupy.cuda.Device Behavior#

Current device set via use() will not be honored by the with Device block#

Note

This change has been reverted in CuPy v12. See CuPy v12 section above for details.

The current device set via cupy.cuda.Device.use() will not be reactivated when exiting a device context manager. Existing code that mixes with device: blocks and device.use() may get different results between CuPy v10 and v9.

cupy.cuda.Device(1).use()
with cupy.cuda.Device(0):
    pass
cupy.cuda.Device()  # -> CuPy v10 returns device 0 instead of device 1

This decision was made to serve CuPy users better, but it could lead to surprises to downstream developers depending on CuPy, as essentially CuPy’s Device context manager no longer respects the CUDA cudaSetDevice() API. Mixing device management functionalities (especially using context manager) from different libraries is highly discouraged.

Downstream libraries that still wish to respect the cudaGetDevice()/cudaSetDevice() APIs should avoid managing the current device using the with Device context manager and instead call these APIs explicitly; see, for example, cupy/cupy#5963.

Changes in cupy.cuda.Stream Behavior#

Stream is now managed per-device#

Previously, it was the user’s responsibility to keep the current stream consistent with the current CUDA device. For example, the following code raises an error in CuPy v9 or earlier:

import cupy

with cupy.cuda.Device(0):
    # Create a stream on device 0.
    s0 = cupy.cuda.Stream()

with cupy.cuda.Device(1):
    with s0:
        # Try to use the stream on device 1
        cupy.arange(10)  # -> CUDA_ERROR_INVALID_HANDLE: invalid resource handle

CuPy v10 manages the current stream per-device, thus eliminating the need to switch the stream every time the active device is changed. When using CuPy v10, the above example behaves differently because whenever a stream is created, it is automatically associated with the current device and will be ignored when switching devices. In earlier versions, trying to use s0 on device 1 raises an error because s0 is associated with device 0. However, in v10, s0 is ignored and the default stream for device 1 is used instead.

Current stream set via use() will not be restored when exiting with block#

As with the change to cupy.cuda.Device above, the current stream set via cupy.cuda.Stream.use() will not be reactivated when exiting a stream context manager. Existing code that mixes with stream: blocks and stream.use() may get different results between CuPy v10 and v9.

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()
with s1:
    s2.use()
    with s3:
        pass
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.

Streams can now be shared between threads#

The same cupy.cuda.Stream instance can now safely be shared between multiple threads.

To achieve this, CuPy v10 will not destroy the stream (cudaStreamDestroy) if the stream is the current stream of any thread.

Big-Endian Arrays Automatically Converted to Little-Endian#

cupy.array(), cupy.asarray(), and their variants now always transfer the data to GPU in little-endian byte order.

Previously CuPy was copying the given numpy.ndarray to GPU as-is, regardless of the endianness. In CuPy v10, big-endian arrays are converted to little-endian before the transfer, which is the native byte order on GPUs. This change eliminates the need to manually change the array endianness before creating the CuPy array.
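To make the byte-order difference concrete, the sketch below uses the standard library's struct module as a stand-in for a host buffer. It only illustrates what converting big-endian data to little-endian means; it does not show how CuPy performs the conversion internally.

```python
import struct

values = (1, 256, 65536)

# A host buffer holding three int32 values in big-endian byte order.
big_endian = struct.pack(">3i", *values)

# Decode using the source byte order, then re-encode as little-endian.
little_endian = struct.pack("<3i", *struct.unpack(">3i", big_endian))

print(big_endian.hex())     # 000000010000010000010000
print(little_endian.hex())  # 010000000001000000000100
print(struct.unpack("<3i", little_endian))  # (1, 256, 65536)
```

The values are unchanged; only the order of the bytes within each element differs.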

Baseline API Update#

Baseline API has been bumped from NumPy 1.20 and SciPy 1.6 to NumPy 1.21 and SciPy 1.7. CuPy v10 will follow the upstream products’ specifications of these baseline versions.

API Changes#

Note that deprecated APIs may be removed in future CuPy releases.

Update of Docker Images#

CuPy official Docker images (see Installation for details) are now updated to use CUDA 11.4 and ROCm 4.3.

CuPy v9#

Dropping Support of CUDA 9.0#

CUDA 9.0 is no longer supported. Use CUDA 9.2 or later.

Dropping Support of cuDNN v7.5 and NCCL v2.3#

cuDNN v7.5 (or earlier) and NCCL v2.3 (or earlier) are no longer supported.

Dropping Support of NumPy 1.16 and SciPy 1.3#

NumPy 1.16 and SciPy 1.3 are no longer supported.

Dropping Support of Python 3.5#

Python 3.5 is no longer supported in CuPy v9.

NCCL and cuDNN No Longer Included in Wheels#

NCCL and cuDNN shared libraries are no longer included in wheels (see #4850 for discussions). If you don’t have a previous installation, you can manually install them after installing the wheel; see Installation for details.

cuTENSOR Enabled in Wheels#

cuTENSOR can now be used when installing CuPy via wheels.

cupy.cuda.{nccl,cudnn} Modules Need Explicit Import#

Previously, the cupy.cuda.nccl and cupy.cuda.cudnn modules were automatically imported. Since CuPy v9, these modules need to be explicitly imported (i.e., import cupy.cuda.nccl / import cupy.cuda.cudnn).
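
Because these modules are optional, code that imports them explicitly may want to tolerate their absence. The sketch below shows one possible pattern; try_import is a hypothetical helper written for this example (it is not part of CuPy), and it assumes that a missing module surfaces as an ImportError.

```python
import importlib


def try_import(name):
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper for illustration; not part of CuPy.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Explicit imports are required since CuPy v9; either may be unavailable.
nccl = try_import("cupy.cuda.nccl")
cudnn = try_import("cupy.cuda.cudnn")

print("NCCL module importable:", nccl is not None)
print("cuDNN module importable:", cudnn is not None)
```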

Baseline API Update#

Baseline API has been bumped from NumPy 1.19 and SciPy 1.5 to NumPy 1.20 and SciPy 1.6. CuPy v9 will follow the upstream products’ specifications of these baseline versions.

Following NumPy 1.20, aliases for the Python scalar types (cupy.bool, cupy.int, cupy.float, and cupy.complex) are now deprecated. cupy.bool_, cupy.int_, cupy.float_ and cupy.complex_ should be used instead when required.
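
When migrating a code base, the deprecated aliases can be located and rewritten mechanically. The following is a rough sketch using a regular expression; replace_deprecated_aliases is a hypothetical helper written for this example, and it assumes that CuPy is always referenced as cupy in the source text (it does not handle import cupy as cp or other aliases).

```python
import re

# Matches cupy.bool, cupy.int, cupy.float, and cupy.complex, but not names
# such as cupy.bool_, cupy.int32, or cupy.float64.
_DEPRECATED_ALIAS = re.compile(r"\bcupy\.(bool|int|float|complex)\b")


def replace_deprecated_aliases(source):
    """Append an underscore to each deprecated scalar alias in `source`."""
    return _DEPRECATED_ALIAS.sub(lambda m: m.group(0) + "_", source)


old_line = "x = cupy.zeros(3, dtype=cupy.float)"
print(replace_deprecated_aliases(old_line))
# x = cupy.zeros(3, dtype=cupy.float_)
```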

Update of Docker Images#

CuPy official Docker images (see Installation for details) are now updated to use CUDA 11.2 and Python 3.8.

CuPy v8#

Dropping Support of CUDA 8.0 and 9.1#

CUDA 8.0 and 9.1 are no longer supported. Use CUDA 9.0, 9.2, 10.0, or later.

Dropping Support of NumPy 1.15 and SciPy 1.2#

NumPy 1.15 (or earlier) and SciPy 1.2 (or earlier) are no longer supported.

Update of Docker Images#

  • CuPy official Docker images (see Installation for details) are now updated to use CUDA 10.2 and Python 3.6.

  • SciPy and Optuna are now pre-installed.

CUB Support and Compiler Requirement#

CUB module is now built by default. You can enable the use of CUB by setting CUPY_ACCELERATORS="cub" (see CUPY_ACCELERATORS for details).

Due to this change, g++-6 or later is required when building CuPy from the source. See Installation for details.

The following environment variables are no longer effective:

  • CUB_DISABLED: Use CUPY_ACCELERATORS as aforementioned.

  • CUB_PATH: No longer required as CuPy uses either the CUB source bundled with CUDA (only when using CUDA 11.0 or later) or the one in the CuPy distribution.
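
As a minimal sketch, CUPY_ACCELERATORS can also be set from within Python instead of the shell. This assumes that the variable is read when CuPy is imported, so it is set before the import; the try/except block is only there so the snippet also runs on a machine without CuPy installed.

```python
import os

# Set the variable before importing CuPy (assumption: it is read at import time).
os.environ["CUPY_ACCELERATORS"] = "cub"

try:
    import cupy  # noqa: F401
except ImportError:
    cupy = None  # CuPy is not installed; the setting has no effect here.

print(os.environ["CUPY_ACCELERATORS"])  # cub
```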

API Changes#

  • cupy.scatter_add, which was deprecated in CuPy v4, has been removed. Use cupyx.scatter_add() instead.

  • cupy.sparse module has been deprecated and will be removed in future releases. Use cupyx.scipy.sparse instead.

  • dtype argument of cupy.ndarray.min() and cupy.ndarray.max() has been removed to align with the NumPy specification.

  • cupy.allclose() now returns the result as 0-dim GPU array instead of Python bool to avoid device synchronization.

  • cupy.RawModule now delays the compilation to the time of the first call to align the behavior with cupy.RawKernel.

  • cupy.cuda.*_enabled flags (nccl_enabled, nvtx_enabled, etc.) have been deprecated. Use the cupy.cuda.*.available flags (cupy.cuda.nccl.available, cupy.cuda.nvtx.available, etc.) instead.

  • CHAINER_SEED environment variable is no longer effective. Use CUPY_SEED instead.

CuPy v7#

Dropping Support of Python 2.7 and 3.4#

Starting from CuPy v7, Python 2.7 and 3.4 are no longer supported, as they reached end-of-life (EOL) in January 2020 (2.7) and March 2019 (3.4). Python 3.5.1 is the minimum Python version supported by CuPy v7. If you are using an affected version of Python, please upgrade to one of the later versions listed under Installation.

CuPy v6#

Binary Packages Ignore LD_LIBRARY_PATH#

Prior to CuPy v6, the LD_LIBRARY_PATH environment variable could be used to override the cuDNN / NCCL libraries bundled in the binary distribution (also known as wheels). In CuPy v6, LD_LIBRARY_PATH is ignored during discovery of cuDNN / NCCL; CuPy binary distributions always use the libraries that come with the package to avoid errors caused by unexpected overrides.

CuPy v5#

cupyx.scipy Namespace#

cupyx.scipy namespace has been introduced to provide CUDA-enabled SciPy functions. cupy.sparse module has been renamed to cupyx.scipy.sparse; cupy.sparse will be kept as an alias for backward compatibility.

Dropped Support for CUDA 7.0 / 7.5#

CuPy v5 no longer supports CUDA 7.0 / 7.5.

Update of Docker Images#

CuPy official Docker images (see Installation for details) are now updated to use CUDA 9.2 and cuDNN 7.

To use these images, you may need to upgrade the NVIDIA driver on your host. See Requirements of nvidia-docker for details.

CuPy v4#

Note

The version number has been bumped from v2 to v4 to align with the versioning of Chainer. Therefore, CuPy v3 does not exist.

Default Memory Pool#

Prior to CuPy v4, the memory pool was only enabled by default when CuPy was used with Chainer. In CuPy v4, the memory pool is enabled by default, even when you use CuPy without Chainer. The memory pool significantly improves performance by mitigating the overhead of memory allocation and CPU/GPU synchronization.

Attention

When you monitor GPU memory usage (e.g., using nvidia-smi), you may notice that GPU memory is not freed even after the array instance goes out of scope. This is expected behavior, as the default memory pool “caches” the allocated memory blocks.

To access the default memory pool instance, use get_default_memory_pool() and get_default_pinned_memory_pool(). You can access the statistics and free all unused memory blocks “cached” in the memory pool.

import cupy
a = cupy.ndarray(100, dtype=cupy.float32)
mempool = cupy.get_default_memory_pool()

# For performance, the size of actual allocation may become larger than the requested array size.
print(mempool.used_bytes())   # 512
print(mempool.total_bytes())  # 512

# Even if the array goes out of scope, its memory block is kept in the pool.
a = None
print(mempool.used_bytes())   # 0
print(mempool.total_bytes())  # 512

# You can clear the memory block by calling `free_all_blocks`.
mempool.free_all_blocks()
print(mempool.used_bytes())   # 0
print(mempool.total_bytes())  # 0
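
The 512 bytes reported above for a 400-byte array (100 float32 elements) are consistent with the pool rounding each request up to a fixed allocation unit. The sketch below reproduces that arithmetic in plain Python. The 512-byte unit is an assumption inferred from the numbers in the example, and rounded_size is a hypothetical helper, not a CuPy function.

```python
ALLOCATION_UNIT = 512  # bytes; assumed unit, inferred from the example above


def rounded_size(nbytes, unit=ALLOCATION_UNIT):
    """Round a requested size up to the next multiple of `unit`."""
    return ((nbytes + unit - 1) // unit) * unit


requested = 100 * 4  # 100 float32 elements, 4 bytes each
print(requested, rounded_size(requested))  # 400 512
```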

You can also disable the default memory pool with the code below. Be sure to do this before any other CuPy operations.

import cupy
cupy.cuda.set_allocator(None)
cupy.cuda.set_pinned_memory_allocator(None)

Compute Capability#

CuPy v4 now requires an NVIDIA GPU with Compute Capability 3.0 or higher. See the List of CUDA GPUs to check if your GPU supports Compute Capability 3.0.

CUDA Stream#

As CUDA Stream is fully supported in CuPy v4, cupy.cuda.RandomState.set_stream, the function to change the stream used by the random number generator, has been removed. Please use cupy.cuda.Stream.use() instead.

See the discussion in #306 for more details.

cupyx Namespace#

The cupyx namespace has been introduced to provide features specific to CuPy (i.e., features not provided in NumPy) while avoiding name collisions in the future. See CuPy-specific functions for the list of such functions.

Following this rule, cupy.scatter_add() has been moved to cupyx.scatter_add(). cupy.scatter_add() is still available as an alias, but using cupyx.scatter_add() is encouraged instead.

Update of Docker Images#

CuPy official Docker images (see Installation for details) are now updated to use CUDA 8.0 and cuDNN 6.0. This change was introduced because CUDA 7.5 does not support NVIDIA Pascal GPUs.

To use these images, you may need to upgrade the NVIDIA driver on your host. See Requirements of nvidia-docker for details.

CuPy v2#

Changed Behavior of count_nonzero Function#

For performance reasons, cupy.count_nonzero() has been changed to return zero-dimensional ndarray instead of int when axis=None. See the discussion in #154 for more details.

Compatibility Matrix#

| CuPy | CC [1] | CUDA | ROCm | cuTENSOR | NCCL | cuDNN | Python | NumPy | SciPy | Baseline API Spec. | Docs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| v13 | | | | | | | | | | | latest |
| v12 | 3.0~ | 10.2~ | 4.3~ | 1.4~ | 2.8~ | 7.6~ | 3.8~ | 1.21~ | 1.7~ | NumPy 1.24 & SciPy 1.9 | stable |
| v11 | 3.0~9.0 | 10.2~12.0 | 4.3 & 5.0 | 1.4~1.6 | 2.8~2.16 | 7.6~8.7 | 3.7~3.11 | 1.20~1.24 | 1.6~1.9 | NumPy 1.23 & SciPy 1.8 | v11.6.0 |
| v10 | 3.0~8.x | 10.2~11.7 | 4.0 & 4.2 & 4.3 & 5.0 | 1.3~1.5 | 2.8~2.11 | 7.6~8.4 | 3.7~3.10 | 1.18~1.22 | 1.4~1.8 | NumPy 1.21 & SciPy 1.7 | v10.6.0 |
| v9 | 3.0~8.x | 9.2~11.5 | 3.5~4.3 | 1.2~1.3 | 2.4 & 2.6~2.11 | 7.6~8.2 | 3.6~3.9 | 1.17~1.21 | 1.4~1.7 | NumPy 1.20 & SciPy 1.6 | v9.6.0 |
| v8 | 3.0~8.x | 9.0 & 9.2~11.2 | 3.x [2] | 1.2 | 2.0~2.8 | 7.0~8.1 | 3.5~3.9 | 1.16~1.20 | 1.3~1.6 | NumPy 1.19 & SciPy 1.5 | v8.6.0 |
| v7 | 3.0~8.x | 8.0~11.0 | 2.x [2] | 1.0 | 1.3~2.7 | 5.0~8.0 | 3.5~3.8 | 1.9~1.19 | (not specified) | (not specified) | v7.8.0 |
| v6 | 3.0~7.x | 8.0~10.1 | n/a | n/a | 1.3~2.4 | 5.0~7.5 | 2.7 & 3.4~3.8 | 1.9~1.17 | (not specified) | (not specified) | v6.7.0 |
| v5 | 3.0~7.x | 8.0~10.1 | n/a | n/a | 1.3~2.4 | 5.0~7.5 | 2.7 & 3.4~3.7 | 1.9~1.16 | (not specified) | (not specified) | v5.4.0 |
| v4 | 3.0~7.x | 7.0~9.2 | n/a | n/a | 1.3~2.2 | 4.0~7.1 | 2.7 & 3.4~3.6 | 1.9~1.14 | (not specified) | (not specified) | v4.5.0 |

[1] CUDA Compute Capability

[2] Highly experimental support with limited features.

License#

Copyright (c) 2015 Preferred Infrastructure, Inc.

Copyright (c) 2015 Preferred Networks, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

NumPy#

CuPy is designed based on NumPy’s API. CuPy’s source code and documents contain the original NumPy ones.

Copyright (c) 2005-2016, NumPy Developers.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

SciPy#

CuPy is designed based on SciPy’s API. CuPy’s source code and documents contain the original SciPy ones.

Copyright (c) 2001, 2002 Enthought, Inc.

All rights reserved.

Copyright (c) 2003-2016 SciPy Developers.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of Enthought nor the names of the SciPy Developers may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.