Installation#

Requirements#

  • NVIDIA CUDA GPU with Compute Capability 3.0 or higher.

  • CUDA Toolkit: v11.2 / v11.3 / v11.4 / v11.5 / v11.6 / v11.7 / v11.8 / v12.0 / v12.1 / v12.2 / v12.3 / v12.4 / v12.5 / v12.6

    • If you have multiple versions of CUDA Toolkit installed, CuPy will automatically choose one of the CUDA installations. See Working with Custom CUDA Installation for details.

    • This requirement is optional if you install CuPy from conda-forge. However, you still need to have a compatible driver installed for your GPU. See Installing CuPy from Conda-Forge for details.

  • Python: v3.10 / v3.11 / v3.12

Note

Currently, CuPy is tested against Ubuntu 20.04 LTS / 22.04 LTS (x86_64), CentOS 7 / 8 (x86_64) and Windows Server 2016 (x86_64).

Python Dependencies#

NumPy/SciPy-compatible API in CuPy v14 is based on NumPy 2.0 and SciPy 1.13, and has been tested against the following versions:

Note

SciPy and Optuna are optional dependencies and will not be installed automatically.

Note

Before installing CuPy, we recommend upgrading setuptools and pip:

$ python -m pip install -U setuptools pip

Additional CUDA Libraries#

Part of the CUDA features in CuPy will be activated only when the corresponding libraries are installed.

  • cuTENSOR: v2.0

  • NCCL: v2.16 / v2.17

    • The library to perform collective multi-GPU / multi-node computations.

  • cuDNN: v8.8

    • The library to accelerate deep neural network computations.

  • cuSPARSELt: v0.6.0 / v0.6.1

    • The library to accelerate sparse matrix-matrix multiplication.

Installing CuPy#

Installing CuPy from PyPI#

Wheels (precompiled binary packages) are available for Linux and Windows. Package names are different depending on your CUDA Toolkit version.

CUDA                              Command
v11.2 ~ 11.8 (x86_64 / aarch64)   pip install cupy-cuda11x
v12.x (x86_64 / aarch64)          pip install cupy-cuda12x
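
The mapping from CUDA Toolkit version to wheel name can be sketched as a small helper. This is only an illustration: `wheel_for_cuda` is a hypothetical function, not part of CuPy, and it encodes nothing beyond the two rows listed above.

```python
def wheel_for_cuda(cuda_version: str) -> str:
    """Return the CuPy wheel name for a CUDA Toolkit version such as "12.4".

    Hypothetical helper: it only encodes the wheel table from this page.
    """
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 11 and 2 <= minor <= 8:
        return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    raise ValueError(f"no wheel listed for CUDA {cuda_version}")


if __name__ == "__main__":
    print("pip install", wheel_for_cuda("12.4"))
```

Raising `ValueError` for unlisted versions keeps the helper from suggesting a wheel that this page does not document.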

Note

To enable features provided by additional CUDA libraries (cuTENSOR / NCCL / cuDNN), you need to install them manually. If you installed CuPy via wheels and do not already have these libraries installed, you can use the installer command below to set them up:

$ python -m cupyx.tools.install_library --cuda 11.x --library cutensor

Note

Append --pre -U -f https://pip.cupy.dev/pre options to install pre-releases (e.g., pip install cupy-cuda11x --pre -U -f https://pip.cupy.dev/pre).

When using wheels, take care not to install multiple CuPy packages at the same time. These packages conflict with each other and with the cupy package (source installation). Make sure that only one CuPy package (cupy or cupy-cudaXX, where XX is a CUDA version) is installed:

$ pip freeze | grep cupy
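
The same check can be done from Python, which may be convenient where grep is unavailable (for example on Windows). The sketch below uses only the standard library; the helper names are illustrative, and the `cupy-cuda` / `cupy-rocm` prefixes are taken from the package names mentioned on this page.

```python
from importlib import metadata


def cupy_packages(names):
    """Return the sorted CuPy distribution names found among `names`."""
    found = set()
    for name in names:
        normalized = name.lower().replace("_", "-")
        if normalized == "cupy" or normalized.startswith(("cupy-cuda", "cupy-rocm")):
            found.add(normalized)
    return sorted(found)


def installed_cupy_packages():
    """List CuPy distributions installed in the current environment."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return cupy_packages(name for name in names if name)


if __name__ == "__main__":
    packages = installed_cupy_packages()
    if len(packages) > 1:
        print("Conflicting CuPy packages:", ", ".join(packages))
    else:
        print("CuPy packages found:", packages)
```

More than one entry in the result indicates the conflict described above.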

Installing CuPy from Conda-Forge#

Conda is a cross-language, cross-platform package management solution widely used in scientific computing and other fields. The above pip install instruction is compatible with conda environments. Alternatively, on both Linux (x86_64, ppc64le, aarch64-sbsa) and Windows, once the CUDA driver is correctly set up, you can install CuPy from the conda-forge channel:

$ conda install -c conda-forge cupy

and conda will install a pre-built CuPy binary package for you, along with the CUDA runtime libraries (cudatoolkit for CUDA 11 and below, or cuda-XXXXX for CUDA 12 and above). It is not necessary to install CUDA Toolkit in advance.

If you aim at minimizing the installation footprint, you can install the cupy-core package:

$ conda install -c conda-forge cupy-core

which only depends on numpy. None of the CUDA libraries will be installed this way, and it is your responsibility to install the needed dependencies yourself, either from conda-forge or elsewhere. This is equivalent to the cupy-cudaXX wheel installation.

Conda has a built-in mechanism to determine and install the latest version of cudatoolkit or any other CUDA components supported by your driver. However, if for any reason you need to force-install a particular CUDA version (say 11.8), you can do:

$ conda install -c conda-forge cupy cuda-version=11.8

Note

cuDNN, cuTENSOR, and NCCL are available on conda-forge as optional dependencies. The following command can install them all at once:

$ conda install -c conda-forge cupy cudnn cutensor nccl

Each of them can also be installed separately as needed.

Note

If you encounter any problem with CuPy installed from conda-forge, please feel free to report to cupy-feedstock, and we will help investigate if it is just a packaging issue in conda-forge’s recipe or a real issue in CuPy.

Note

If you did not install CUDA Toolkit by yourself, for CUDA 11 and below the nvcc compiler might not be available, as the cudatoolkit package from conda-forge does not include the nvcc compiler toolchain. If you would like to use it from a local CUDA installation, you need to make sure the version of CUDA Toolkit matches that of cudatoolkit to avoid surprises. For CUDA 12 and above, nvcc can be installed on a per-conda environment basis via

$ conda install -c conda-forge cuda-nvcc

Installing CuPy from Source#

Use of wheel packages is recommended whenever possible. However, if wheels cannot meet your requirements (e.g., you are running a non-Linux environment or want to use a version of CUDA / cuDNN / NCCL not supported by wheels), you can also build CuPy from source.

Note

CuPy source build requires g++-6 or later. For Ubuntu 18.04, run apt-get install g++. For Ubuntu 16.04, CentOS 6 or 7, follow the instructions here.

Note

When installing CuPy from source, features provided by additional CUDA libraries will be disabled if these libraries are not available at build time. See Installing cuDNN and NCCL for the instructions.

Note

If you upgrade or downgrade the version of CUDA Toolkit, cuDNN, NCCL or cuTENSOR, you may need to reinstall CuPy. See Reinstalling CuPy for details.

You can install the latest stable release version of the CuPy source package via pip.

$ pip install cupy

If you want to install the latest development version of CuPy from a cloned Git repository:

$ git clone --recursive https://github.com/cupy/cupy.git
$ cd cupy
$ pip install .

Note

Cython 0.29.22 or later is required to build CuPy from source. It will be automatically installed during the build process if not available.
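
After any of the installation routes above, a short smoke test can confirm that CuPy imports and can reach a GPU. This is a minimal sketch, not an official verification procedure; `cupy_smoke_test` is an illustrative name, and the function deliberately returns a message instead of raising so it can also run on machines without CuPy or a GPU.

```python
def cupy_smoke_test() -> str:
    """Return a short status string describing whether CuPy is usable."""
    try:
        import cupy
    except ImportError as exc:
        return f"CuPy is not importable: {exc}"
    try:
        device_count = cupy.cuda.runtime.getDeviceCount()
        # Summing a small array forces a kernel to be compiled and launched.
        total = int(cupy.arange(10).sum())
    except Exception as exc:  # driver, runtime, or kernel compilation problems
        return f"CuPy {cupy.__version__} imported, but the GPU check failed: {exc}"
    return f"CuPy {cupy.__version__}: {device_count} device(s), sum(0..9) = {total}"


if __name__ == "__main__":
    print(cupy_smoke_test())
```

If the GPU check fails, the FAQ entries below on CompileException and NVRTC_ERROR_COMPILATION are the first places to look.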

Uninstalling CuPy#

Use pip to uninstall CuPy:

$ pip uninstall cupy

Note

If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).

Note

If CuPy is installed via conda, run conda uninstall cupy instead.

Upgrading CuPy#

Use pip install with the -U option:

$ pip install -U cupy

Note

If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).

Reinstalling CuPy#

To reinstall CuPy, uninstall it and then install it again. When reinstalling CuPy, we recommend using the --no-cache-dir option, as pip caches the previously built binaries:

$ pip uninstall cupy
$ pip install cupy --no-cache-dir

Note

If you are using a wheel, replace cupy with cupy-cudaXX (where XX is a CUDA version number).

Using CuPy inside Docker#

Official Docker images are provided. Use the NVIDIA Container Toolkit to run the CuPy image with GPU support. You can log in to the environment with bash and run the Python interpreter:

$ docker run --gpus all -it cupy/cupy /bin/bash

Or run the interpreter directly:

$ docker run --gpus all -it cupy/cupy /usr/bin/python3

FAQ#

pip fails to install CuPy#

Please make sure that you are using the latest setuptools and pip:

$ pip install -U setuptools pip

Use the -vvvv option with the pip command to display the full installation log:

$ pip install cupy -vvvv

If you are using sudo to install CuPy, note that the sudo command does not propagate environment variables. If you need to pass environment variables (e.g., CUDA_PATH), specify them inside the sudo command like this:

$ sudo CUDA_PATH=/opt/nvidia/cuda pip install cupy

If you are using certain versions of conda, it may fail to build CuPy with error g++: error: unrecognized command line option ‘-R’. This is due to a bug in conda (see conda/conda#6030 for details). If you encounter this problem, please upgrade your conda.

Installing cuDNN and NCCL#

We recommend installing cuDNN and NCCL using binary packages (i.e., using apt or yum) provided by NVIDIA.

If you want to install the tar.gz versions of cuDNN and NCCL, we recommend installing them under the CUDA_PATH directory. For example, if you are using Ubuntu, copy the *.h files to the include directory and the *.so* files to the lib64 directory:

$ cp /path/to/cudnn.h $CUDA_PATH/include
$ cp /path/to/libcudnn.so* $CUDA_PATH/lib64

The destination directories depend on your environment.

If you want to use cuDNN or NCCL installed in another directory, please use CFLAGS, LDFLAGS and LD_LIBRARY_PATH environment variables before installing CuPy:

$ export CFLAGS=-I/path/to/cudnn/include
$ export LDFLAGS=-L/path/to/cudnn/lib
$ export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH

Working with Custom CUDA Installation#

If you have installed CUDA in a non-default directory or have multiple CUDA versions on the same host, you may need to manually specify the CUDA installation directory to be used by CuPy.

CuPy uses the first CUDA installation directory found, searching in the following order:

  1. CUDA_PATH environment variable.

  2. The parent directory of the nvcc command. CuPy looks for the nvcc command in the PATH environment variable.

  3. /usr/local/cuda

For example, you can build CuPy using a non-default CUDA directory by setting the CUDA_PATH environment variable:

$ CUDA_PATH=/opt/nvidia/cuda pip install cupy

Note

CUDA installation discovery is also performed at runtime using the rule above. Depending on your system configuration, you may also need to set LD_LIBRARY_PATH environment variable to $CUDA_PATH/lib64 at runtime.
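
The search order above can be sketched in plain Python to see which directory would likely be picked on a given machine. This is an approximation for troubleshooting, not CuPy's actual implementation: `find_cuda_path` is a hypothetical helper, and it assumes the usual `<root>/bin/nvcc` layout when deriving the installation root from nvcc.

```python
import os
import shutil


def find_cuda_path(environ=None, which=shutil.which, default="/usr/local/cuda"):
    """Return the first CUDA directory found using the documented order."""
    environ = os.environ if environ is None else environ

    # 1. CUDA_PATH environment variable.
    cuda_path = environ.get("CUDA_PATH")
    if cuda_path:
        return cuda_path

    # 2. nvcc found on PATH; assumed layout is <root>/bin/nvcc.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.abspath(nvcc)))

    # 3. The default installation directory, if it exists.
    if os.path.isdir(default):
        return default
    return None


if __name__ == "__main__":
    print("CUDA directory:", find_cuda_path())
```

The `environ` and `which` parameters exist only so the lookup can be exercised without changing the real environment.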

CuPy always raises NVRTC_ERROR_COMPILATION (6)#

On CUDA 12.2 or later, CUDA Runtime header files are required to compile kernels in CuPy. If CuPy raises an NVRTC_ERROR_COMPILATION with the error message catastrophic error: cannot open source file "vector_types.h" for almost everything, it is likely that CuPy cannot find the header files on your system.

This problem does not happen if you have installed CuPy from conda-forge (i.e., conda install -c conda-forge cupy), as the package cuda-cudart-dev_<platform> that contains the needed headers is correctly installed as a dependency. Please report to the CuPy repository if you encounter issues with Conda-installed CuPy.

If you have installed CuPy from PyPI (i.e., pip install cupy-cuda12x), you can install CUDA headers by running pip install "nvidia-cuda-runtime-cu12==12.X.*" where 12.X is the version of your CUDA installation. Once the headers from the package are recognized, cupy.show_config() will display the path as CUDA Extra Include Dirs:

$ python -c 'import cupy; cupy.show_config()'
...
CUDA Extra Include Dirs      : []
...
NVRTC Version                : (12, 6)
...
$ pip install "nvidia-cuda-runtime-cu12==12.6.*"
...
$ python -c 'import cupy; cupy.show_config()'
...
CUDA Extra Include Dirs      : ['.../site-packages/nvidia/cuda_runtime/include']
...

Alternatively, you can install CUDA headers system-wide (/usr/local/cuda) using NVIDIA’s Apt (or DNF) repository. Install the cuda-cudart-dev-12-X package where 12-X is the version of your cuda-cudart package, e.g.:

$ apt list "cuda-cudart-*"
cuda-cudart-12-6/now 12.6.68-1 amd64 [installed,local]
$ sudo apt install "cuda-cudart-dev-12-6"
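
To check by hand whether the header named in the error message is present, a file-existence test is enough. The sketch below only looks for vector_types.h in the directories it is given; it does not reproduce how CuPy or NVRTC resolve include paths, and the candidate directory (the include directory under CUDA_PATH, falling back to /usr/local/cuda) is an assumption for illustration.

```python
import os


def dirs_with_header(include_dirs, header="vector_types.h"):
    """Return the directories from `include_dirs` that contain `header`."""
    return [d for d in include_dirs if os.path.isfile(os.path.join(d, header))]


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH", "/usr/local/cuda")
    candidates = [os.path.join(cuda_path, "include")]
    hits = dirs_with_header(candidates)
    if hits:
        print("vector_types.h found in:", hits)
    else:
        print("vector_types.h not found in:", candidates)
```

Any path reported under CUDA Extra Include Dirs by cupy.show_config() can be appended to `candidates` in the same way.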

CuPy always raises cupy.cuda.compiler.CompileException#

If CuPy raises a CompileException for almost everything, it is possible that CuPy cannot detect CUDA installed on your system correctly. The following are error messages commonly observed in such cases.

  • nvrtc: error: failed to load builtins

  • catastrophic error: cannot open source file "cuda_fp16.h"

  • error: cannot overload functions distinguished by return type alone

  • error: identifier "__half_raw" is undefined

  • error: no instance of overloaded function "__half::__half" matches the specified type

Please try setting the LD_LIBRARY_PATH and CUDA_PATH environment variables. For example, if you have CUDA installed at /usr/local/cuda-12.6:

$ export CUDA_PATH=/usr/local/cuda-12.6
$ export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
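
Whether both variables are set consistently can be verified with a short script before retrying. This is a sketch under the assumption that the libraries live in the lib64 directory under CUDA_PATH, as in the example above; `cuda_env_problems` is a hypothetical helper name.

```python
import os


def cuda_env_problems(environ=None):
    """Return a list of human-readable problems with the CUDA variables."""
    environ = os.environ if environ is None else environ
    cuda_path = environ.get("CUDA_PATH")
    if not cuda_path:
        return ["CUDA_PATH is not set"]

    problems = []
    lib_dir = os.path.normpath(os.path.join(cuda_path, "lib64"))
    raw_entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    entries = {os.path.normpath(p) for p in raw_entries if p}
    if lib_dir not in entries:
        problems.append(f"{lib_dir} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    for problem in cuda_env_problems() or ["no problems found"]:
        print(problem)
```

An empty list means both variables agree with the layout shown in the example; it does not guarantee that the CUDA installation itself is complete.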

Also see Working with Custom CUDA Installation.

Build fails on Ubuntu 16.04, CentOS 6 or 7#

To build CuPy from source on systems with legacy GCC (g++-5 or earlier), you need to manually set up g++-6 or later and configure the NVCC environment variable.

On Ubuntu 16.04:

$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt update
$ sudo apt install g++-6
$ export NVCC="nvcc --compiler-bindir gcc-6"

On CentOS 6 / 7:

$ sudo yum install centos-release-scl
$ sudo yum install devtoolset-7-gcc-c++
$ source /opt/rh/devtoolset-7/enable
$ export NVCC="nvcc --compiler-bindir gcc"

Using CuPy on AMD GPU (experimental)#

CuPy has experimental support for AMD GPU (ROCm).

Requirements#

The following ROCm libraries are required:

$ sudo apt install hipblas hipsparse rocsparse rocrand hiprand rocthrust rocsolver rocfft hipfft hipcub rocprim rccl roctracer-dev

Environment Variables#

When building or running CuPy for ROCm, the following environment variables are effective.

  • ROCM_HOME: directory containing the ROCm software (e.g., /opt/rocm).

Docker#

You can try running CuPy for ROCm using Docker.

$ docker run -it --device=/dev/kfd --device=/dev/dri --group-add video cupy/cupy-rocm

Installing Binary Packages#

Wheels (precompiled binary packages) are available for Linux (x86_64). Package names are different depending on your ROCm version.

ROCm   Command
v4.3   $ pip install cupy-rocm-4-3
v5.0   $ pip install cupy-rocm-5-0

Building CuPy for ROCm From Source#

To build CuPy from source, set the CUPY_INSTALL_USE_HIP, ROCM_HOME, and HCC_AMDGPU_TARGET environment variables. HCC_AMDGPU_TARGET is the ISA name supported by your GPU: run rocminfo and use the value displayed on the Name: line (e.g., gfx900). You can specify a comma-separated list of ISAs if you have multiple GPUs of different architectures.

$ export CUPY_INSTALL_USE_HIP=1
$ export ROCM_HOME=/opt/rocm
$ export HCC_AMDGPU_TARGET=gfx906
$ pip install cupy

Note

If you don’t specify the HCC_AMDGPU_TARGET environment variable, CuPy will be built for the GPU architectures available on the build host. This behavior is specific to ROCm builds; when building CuPy for NVIDIA CUDA, the build result is not affected by the host configuration.
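
Collecting the ISA names from rocminfo can be scripted. The sketch below extracts gfx names from Name: lines and joins them with commas; the sample text is an illustrative stand-in, not real rocminfo output, and `amdgpu_targets` is a hypothetical helper.

```python
import re

# Illustrative stand-in for rocminfo output; real output contains more fields.
SAMPLE_ROCMINFO = """\
Agent 1
  Name:                    Example CPU
Agent 2
  Name:                    gfx906
  Marketing Name:          Example GPU
Agent 3
  Name:                    gfx900
"""


def amdgpu_targets(rocminfo_output: str) -> str:
    """Return a comma-separated list of unique gfx ISA names, in order."""
    pattern = re.compile(r"^\s*Name:\s*(gfx[0-9a-f]+)\s*$", re.MULTILINE)
    targets = []
    for match in pattern.finditer(rocminfo_output):
        if match.group(1) not in targets:
            targets.append(match.group(1))
    return ",".join(targets)


if __name__ == "__main__":
    print("HCC_AMDGPU_TARGET=" + amdgpu_targets(SAMPLE_ROCMINFO))
```

On a real system, the text could be obtained with `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout` and passed to the same function.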

Limitations#

The following features are not available, either due to limitations of ROCm or because they are specific to CUDA:

  • CUDA Array Interface

  • cuTENSOR

  • Handling extremely large arrays whose size is near the 32-bit boundary (HIP is known to fail with sizes 2**32-1024)

  • Atomic addition in FP16 (cupy.ndarray.scatter_add and cupyx.scatter_add)

  • Multi-GPU FFT and FFT callback

  • Some random number generation algorithms

  • Several options in RawKernel/RawModule APIs: Jitify, dynamic parallelism

  • Per-thread default stream

The following features are not yet supported:

  • Sparse matrices (cupyx.scipy.sparse)

  • cuDNN (hipDNN)

  • Hermitian/symmetric eigenvalue solver (cupy.linalg.eigh)

  • Polynomial roots (uses Hermitian/symmetric eigenvalue solver)

  • Splines in cupyx.scipy.interpolate (make_interp_spline, spline modes of RegularGridInterpolator/interpn), as they depend on sparse matrices.

The following features may not work in edge cases (e.g., some combinations of dtype):

Note

We are investigating the root causes of the issues. They are not necessarily CuPy’s issues, but ROCm may have some potential bugs.

  • cupy.ndarray.__getitem__ (#4653)

  • cupy.ix_ (#4654)

  • Some polynomial routines (#4758, #4759)

  • cupy.broadcast (#4662)

  • cupy.convolve (#4668)

  • cupy.correlate (#4781)

  • Some random sampling routines (cupy.random, #4770)

  • cupy.linalg.einsum

  • cupyx.scipy.ndimage and cupyx.scipy.signal (#4878, #4879, #4880)