Upgrade Guide

This is a list of changes introduced in each release that users should be aware of when migrating from older versions.

CuPy v10

Dropping CUDA 9.2 / 10.0 / 10.1 Support

CUDA 10.1 or earlier is no longer supported. Use CUDA 10.2 or later.

Dropping NCCL v2.4 Support

NCCL v2.4 is no longer supported.

Dropping Python 3.6 Support

Python 3.6 is no longer supported.

Changes in cupy.cuda.Stream Behavior

Stream is now managed per-device

Previously, it was the user's responsibility to keep the current stream consistent with the current CUDA device. For example, the following code raises an error in CuPy v9 or earlier:

import cupy

with cupy.cuda.Device(0):
    # Create a stream on device 0.
    s0 = cupy.cuda.Stream()

with cupy.cuda.Device(1):
    with s0:
        # Try to use the stream on device 1
        cupy.arange(10)  # -> CUDA_ERROR_INVALID_HANDLE: invalid resource handle

CuPy v10 manages the current stream per device, which eliminates the need to switch the stream every time the active device changes. In CuPy v10, the above example behaves differently: a stream is automatically associated with the device that is current when the stream is created, and it is ignored while another device is active. In earlier versions, trying to use s0 on device 1 raises an error because s0 is associated with device 0. In v10, s0 is ignored and the default stream for device 1 is used instead.
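One way to work with this behavior is to create a separate stream for each device while that device is current, and then enter the device and its own stream together. The following is a minimal sketch under that assumption; the helper names make_streams and arange_on_each_device are hypothetical, and the device IDs passed in are assumed to exist on the machine.

```python
def make_streams(device_ids):
    """Create one stream per device, keyed by device ID."""
    import cupy  # imported lazily so the helper can be defined without a GPU

    streams = {}
    for dev_id in device_ids:
        with cupy.cuda.Device(dev_id):
            # The new stream is associated with the device that is current here.
            streams[dev_id] = cupy.cuda.Stream()
    return streams


def arange_on_each_device(streams, n=10):
    """Run cupy.arange(n) on every device, each on that device's own stream."""
    import cupy

    results = {}
    for dev_id, stream in streams.items():
        # Enter the device first, then the stream that was created on it.
        with cupy.cuda.Device(dev_id), stream:
            results[dev_id] = cupy.arange(n)
    return results
```

On a machine with two GPUs, arange_on_each_device(make_streams([0, 1])) would return one array per device.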

Current stream set via use() will not be restored when exiting with block

The current stream set via cupy.cuda.Stream.use() will not be reactivated when exiting a stream context manager. Existing code that mixes with stream: blocks and stream.use() may produce different results in CuPy v10 than in v9.

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()
with s1:
    s2.use()
    with s3:
        pass
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.
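A possible way to avoid this difference is to express the same intent with nested with blocks only, without calling use() inside them. The sketch below assumes that purely nested context managers restore the enclosing stream on exit in both versions; the function name current_after_inner_block is hypothetical.

```python
def current_after_inner_block(outer, middle, inner):
    """Return the current stream right after the innermost block exits."""
    import cupy  # imported lazily so the helper can be defined without a GPU

    with outer:
        with middle:
            with inner:
                pass
            # The inner block has exited, so `middle` is expected to be
            # the current stream again at this point.
            return cupy.cuda.get_current_stream()
```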

Streams can now be shared between threads

The same cupy.cuda.Stream instance can now safely be shared between multiple threads.

To achieve this, CuPy v10 will not destroy the stream (cudaStreamDestroy) if the stream is the current stream of any thread.
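As an illustration, the sketch below passes one stream to several worker threads. It assumes a single-GPU setup, and it assumes that each thread has to enter the stream itself to make it current for that thread; worker and run_on_shared_stream are hypothetical names.

```python
import threading


def worker(stream, results, index):
    """Do a small computation on the shared stream and store the result."""
    import cupy  # imported lazily so the helper can be defined without a GPU

    # Each thread makes the shared stream current for itself.
    with stream:
        total = cupy.arange(index + 1).sum()
    # Wait for the work queued on the stream before reading the value.
    stream.synchronize()
    results[index] = int(total)


def run_on_shared_stream(stream, num_threads=4):
    """Run `worker` in several threads that all use the same stream."""
    results = {}
    threads = [
        threading.Thread(target=worker, args=(stream, results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A caller would create the stream once, for example with cupy.cuda.Stream(), and pass it to run_on_shared_stream.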

API Changes

Device synchronize detection APIs (cupyx.allow_synchronize() and cupyx.DeviceSynchronized), introduced as an experimental feature in CuPy v8, have been marked as deprecated because it is impossible to detect synchronizations reliably.

The internal API cupy.cuda.compile_with_cache() has been marked as deprecated because better alternatives exist (see RawModule, available since CuPy v7, and RawKernel, available since v5). Despite its long history, this API was never meant to be public. We encourage downstream libraries and users to migrate to the aforementioned public APIs. See User-Defined Kernels for their tutorials.

Deprecated APIs may be removed in future CuPy releases.
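For code that currently compiles CUDA source through compile_with_cache(), a migration to RawKernel might look like the following sketch. The kernel add_arrays and the helpers grid_size and add_with_raw_kernel are made up for illustration, and the sketch assumes one-dimensional, contiguous float32 arrays of equal length.

```python
# CUDA source for a hypothetical element-wise addition kernel.
ADD_SOURCE = r'''
extern "C" __global__
void add_arrays(const float* x, const float* y, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] = x[i] + y[i];
    }
}
'''


def grid_size(n, threads):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads - 1) // threads


def add_with_raw_kernel(x, y, threads=128):
    """Add two float32 CuPy arrays with a RawKernel."""
    import cupy  # imported lazily so the helpers can be defined without a GPU

    kernel = cupy.RawKernel(ADD_SOURCE, "add_arrays")
    out = cupy.empty_like(x)
    n = x.size
    # Launch as kernel(grid, block, args); the scalar is passed as int32
    # to match the `int n` parameter in the CUDA source.
    kernel((grid_size(n, threads),), (threads,), (x, y, out, cupy.int32(n)))
    return out
```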

CuPy v9

Dropping Support of CUDA 9.0

CUDA 9.0 is no longer supported. Use CUDA 9.2 or later.

Dropping Support of cuDNN v7.5 and NCCL v2.3

cuDNN v7.5 (or earlier) and NCCL v2.3 (or earlier) are no longer supported.

Dropping Support of NumPy 1.16 and SciPy 1.3

NumPy 1.16 and SciPy 1.3 are no longer supported.

Dropping Support of Python 3.5

Python 3.5 is no longer supported in CuPy v9.

NCCL and cuDNN No Longer Included in Wheels

NCCL and cuDNN shared libraries are no longer included in wheels (see #4850 for discussions). If you don’t have a previous installation, you can install them manually after installing the wheel; see Installation for details.

cuTENSOR Enabled in Wheels

cuTENSOR can now be used when installing CuPy via wheels.

cupy.cuda.{nccl,cudnn} Modules Need Explicit Import

Previously, the cupy.cuda.nccl and cupy.cuda.cudnn modules were imported automatically. Since CuPy v9, these modules need to be imported explicitly (i.e., import cupy.cuda.nccl / import cupy.cuda.cudnn).
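Code that treats these libraries as optional can wrap the explicit import in a small helper. This is a sketch only: import_optional is a hypothetical name, and it assumes that a missing module or library surfaces as ImportError.

```python
import importlib


def import_optional(module_name):
    """Import a module by name, returning None if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The module, or a shared library it depends on, is not installed.
        return None
```

For example, nccl = import_optional("cupy.cuda.nccl") gives either the module or None, so callers can check the result before using NCCL features.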

Baseline API Changes

Baseline API has been bumped from NumPy 1.19 and SciPy 1.5 to NumPy 1.20 and SciPy 1.6. CuPy v9 will follow the upstream products’ specifications of these baseline versions.

Following NumPy 1.20, aliases for the Python scalar types (cupy.bool, cupy.int, cupy.float, and cupy.complex) are now deprecated. cupy.bool_, cupy.int_, cupy.float_ and cupy.complex_ should be used instead when required.

Update of Docker Images

CuPy official Docker images (see Installation for details) are now updated to use CUDA 11.2 and Python 3.8.

CuPy v8

Dropping Support of CUDA 8.0 and 9.1

CUDA 8.0 and 9.1 are no longer supported. Use CUDA 9.0, 9.2, 10.0, or later.

Dropping Support of NumPy 1.15 and SciPy 1.2

NumPy 1.15 (or earlier) and SciPy 1.2 (or earlier) are no longer supported.

Update of Docker Images

  • CuPy official Docker images (see Installation for details) are now updated to use CUDA 10.2 and Python 3.6.

  • SciPy and Optuna are now pre-installed.

CUB Support and Compiler Requirement

The CUB module is now built by default. You can enable the use of CUB by setting CUPY_ACCELERATORS="cub" (see Environment variables for details).

Due to this change, g++-6 or later is required when building CuPy from source. See Installation for details.

The following environment variables are no longer effective:

  • CUB_DISABLED: Use CUPY_ACCELERATORS as aforementioned.

  • CUB_PATH: No longer required as CuPy uses either the CUB source bundled with CUDA (only when using CUDA 11.0 or later) or the one in the CuPy distribution.
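When the environment is configured from Python rather than from the shell, the setting can be applied before CuPy is imported. The sketch below assumes that CuPy reads CUPY_ACCELERATORS at import time; configure_cub and LEGACY_CUB_VARIABLES are hypothetical names. The helper also reports any of the legacy variables that are still set, since they no longer have an effect.

```python
import os

# Variables that CuPy v8 no longer honors.
LEGACY_CUB_VARIABLES = ("CUB_DISABLED", "CUB_PATH")


def configure_cub(env=os.environ):
    """Enable CUB and return the legacy variables that are still set."""
    env["CUPY_ACCELERATORS"] = "cub"
    return [name for name in LEGACY_CUB_VARIABLES if name in env]
```

Call configure_cub() first and import cupy afterwards; a non-empty return value points at settings that can be cleaned up.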

API Changes

  • cupy.scatter_add, which was deprecated in CuPy v4, has been removed. Use cupyx.scatter_add() instead.

  • cupy.sparse module has been deprecated and will be removed in future releases. Use cupyx.scipy.sparse instead.

  • dtype argument of cupy.ndarray.min() and cupy.ndarray.max() has been removed to align with the NumPy specification.

  • cupy.allclose() now returns the result as 0-dim GPU array instead of Python bool to avoid device synchronization.

  • cupy.RawModule now delays the compilation to the time of the first call to align the behavior with cupy.RawKernel.

  • cupy.cuda.*_enabled flags (nccl_enabled, nvtx_enabled, etc.) have been deprecated. Use the cupy.cuda.*.available flags (cupy.cuda.nccl.available, cupy.cuda.nvtx.available, etc.) instead.

  • CHAINER_SEED environment variable is no longer effective. Use CUPY_SEED instead.
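Code that relied on cupy.allclose() returning a Python bool can convert the result explicitly, as in the sketch below. The helper name arrays_match is hypothetical. Note that the conversion copies the value to the host, so it reintroduces the synchronization that the new return type avoids.

```python
def arrays_match(a, b, rtol=1e-5, atol=1e-8):
    """Compare two CuPy arrays and return a plain Python bool."""
    import cupy  # imported lazily so the helper can be defined without a GPU

    # In CuPy v8 and later this is a 0-dim GPU array, not a Python bool.
    result = cupy.allclose(a, b, rtol=rtol, atol=atol)
    # bool() transfers the single value to the host.
    return bool(result)
```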

CuPy v7

Dropping Support of Python 2.7 and 3.4

Starting from CuPy v7, Python 2.7 and 3.4 are no longer supported, as they reached end-of-life (EOL) in January 2020 (2.7) and March 2019 (3.4). Python 3.5.1 is the minimum Python version supported by CuPy v7. If you are using an affected version of Python, please upgrade to one of the later versions listed under Installation.

CuPy v6

Binary Packages Ignore LD_LIBRARY_PATH

Prior to CuPy v6, the LD_LIBRARY_PATH environment variable could be used to override the cuDNN / NCCL libraries bundled in the binary distribution (also known as wheels). In CuPy v6, LD_LIBRARY_PATH is ignored during discovery of cuDNN / NCCL; CuPy binary distributions always use the libraries that come with the package to avoid errors caused by unexpected overrides.

CuPy v5

cupyx.scipy Namespace

cupyx.scipy namespace has been introduced to provide CUDA-enabled SciPy functions. cupy.sparse module has been renamed to cupyx.scipy.sparse; cupy.sparse will be kept as an alias for backward compatibility.

Dropped Support for CUDA 7.0 / 7.5

CuPy v5 no longer supports CUDA 7.0 / 7.5.

Update of Docker Images

CuPy official Docker images (see Installation for details) are now updated to use CUDA 9.2 and cuDNN 7.

To use these images, you may need to upgrade the NVIDIA driver on your host. See Requirements of nvidia-docker for details.

CuPy v4

Note

The version number has been bumped from v2 to v4 to align with the versioning of Chainer. Therefore, CuPy v3 does not exist.

Default Memory Pool

Prior to CuPy v4, the memory pool was enabled by default only when CuPy was used with Chainer. In CuPy v4, the memory pool is enabled by default even when you use CuPy without Chainer. The memory pool significantly improves performance by mitigating the overhead of memory allocation and CPU/GPU synchronization.

Attention

When you monitor GPU memory usage (e.g., using nvidia-smi), you may notice that GPU memory is not freed even after the array instance goes out of scope. This is expected behavior, as the default memory pool “caches” the allocated memory blocks.

To access the default memory pool instance, use get_default_memory_pool() and get_default_pinned_memory_pool(). You can access the statistics and free all unused memory blocks “cached” in the memory pool.

import cupy
a = cupy.ndarray(100, dtype=cupy.float32)
mempool = cupy.get_default_memory_pool()

# For performance, the size of actual allocation may become larger than the requested array size.
print(mempool.used_bytes())   # 512
print(mempool.total_bytes())  # 512

# Even if the array goes out of scope, its memory block is kept in the pool.
a = None
print(mempool.used_bytes())   # 0
print(mempool.total_bytes())  # 512

# You can clear the memory block by calling `free_all_blocks`.
mempool.free_all_blocks()
print(mempool.used_bytes())   # 0
print(mempool.total_bytes())  # 0

You can also disable the default memory pool with the code below. Be sure to do this before any other CuPy operations.

import cupy
cupy.cuda.set_allocator(None)
cupy.cuda.set_pinned_memory_allocator(None)

Compute Capability

CuPy v4 now requires an NVIDIA GPU with Compute Capability 3.0 or higher. See the List of CUDA GPUs to check if your GPU supports Compute Capability 3.0.

CUDA Stream

As CUDA Stream is fully supported in CuPy v4, cupy.cuda.RandomState.set_stream, the function to change the stream used by the random number generator, has been removed. Please use cupy.cuda.Stream.use() instead.

See the discussion in #306 for more details.

cupyx Namespace

The cupyx namespace has been introduced to provide features specific to CuPy (i.e., features not provided in NumPy) while avoiding name collisions in the future. See CuPy-specific functions for the list of such functions.

Following this rule, cupy.scatter_add() has been moved to cupyx.scatter_add(). cupy.scatter_add() is still available as an alias, but using cupyx.scatter_add() is encouraged.
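As a brief illustration of the new location, the sketch below uses cupyx.scatter_add() to accumulate weights into bins, where repeated indices add up rather than overwrite each other. The helper name sum_by_index is hypothetical; indices and weights are assumed to be one-dimensional CuPy arrays of the same length.

```python
def sum_by_index(indices, weights, size):
    """Accumulate weights[i] into out[indices[i]], summing duplicates."""
    import cupy  # imported lazily so the helper can be defined without a GPU
    import cupyx

    out = cupy.zeros(size, dtype=weights.dtype)
    # scatter_add updates `out` in place and accumulates repeated indices.
    cupyx.scatter_add(out, indices, weights)
    return out
```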

Update of Docker Images

CuPy official Docker images (see Installation for details) are now updated to use CUDA 8.0 and cuDNN 6.0. This change was introduced because CUDA 7.5 does not support NVIDIA Pascal GPUs.

To use these images, you may need to upgrade the NVIDIA driver on your host. See Requirements of nvidia-docker for details.

CuPy v2

Changed Behavior of count_nonzero Function

For performance reasons, cupy.count_nonzero() has been changed to return a zero-dimensional ndarray instead of an int when axis=None. See the discussion in #154 for more details.
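One way to benefit from this change is to keep intermediate counts on the GPU and convert to a Python int only once at the end. The following sketch assumes that pattern; total_nonzero is a hypothetical helper name.

```python
def total_nonzero(arrays):
    """Sum the non-zero counts of several CuPy arrays with one host transfer."""
    import cupy  # imported lazily so the helper can be defined without a GPU

    total = cupy.zeros((), dtype=cupy.int64)
    for x in arrays:
        # Each count is a zero-dimensional ndarray, so the addition
        # stays on the GPU and does not force a synchronization.
        total += cupy.count_nonzero(x)
    # Convert to a Python int only once, after all counts are accumulated.
    return int(total)
```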