CuPy – NumPy-like API accelerated with CUDA¶
This is the CuPy documentation.
Overview¶
CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA.
CuPy consists of cupy.ndarray, the core multi-dimensional array class, and many functions on it. It supports a subset of the numpy.ndarray interface.
The following is a brief overview of the supported subset of the NumPy interface:
- Basic indexing (indexing by ints, slices, newaxes, and Ellipsis)
- Most of Advanced indexing (except for some indexing patterns with boolean masks)
- Data types (dtypes): bool_, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64, complex64, complex128
- Most of the array creation routines (empty, ones_like, diag, etc.)
- Most of the array manipulation routines (reshape, rollaxis, concatenate, etc.)
- All operators with broadcasting
- All universal functions for elementwise operations (except those for complex numbers)
- Linear algebra functions, including product (dot, matmul, etc.) and decomposition (cholesky, svd, etc.), accelerated by cuBLAS
- Reduction along axes (sum, max, argmax, etc.)
CuPy also includes the following features for performance:
- User-defined elementwise CUDA kernels
- User-defined reduction CUDA kernels
- Fusing CUDA kernels to optimize user-defined calculation
- Customizable memory allocator and memory pool
- cuDNN utilities
CuPy uses on-the-fly kernel synthesis: when a kernel call is required, it
compiles kernel code optimized for the shapes and dtypes of the given arguments,
sends it to the GPU device, and executes the kernel. The compiled code is
cached in the ${HOME}/.cupy/kernel_cache directory (this cache path can be
overridden by setting the CUPY_CACHE_DIR environment variable). Compilation may
make the first kernel call slower, but the slowdown does not recur on the
second execution. CuPy also caches the kernel code sent to the GPU
device within the process, which reduces the kernel transfer time on further
calls.
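As a minimal sketch of redirecting the kernel cache, the snippet below sets CUPY_CACHE_DIR from Python. The directory name is a hypothetical choice for illustration, and setting the variable before importing CuPy is an assumption made here to be on the safe side.

```python
import os
import tempfile

# Hypothetical cache location for this sketch; any writable directory works.
cache_dir = os.path.join(tempfile.gettempdir(), "cupy_kernel_cache_demo")
os.makedirs(cache_dir, exist_ok=True)

# Set the variable before any kernel is compiled. Doing it before
# `import cupy` is the simplest way to be sure it is picked up.
os.environ["CUPY_CACHE_DIR"] = cache_dir

print(os.environ["CUPY_CACHE_DIR"])
```

The same effect can be obtained by exporting the variable in the shell that launches the Python process.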
Tutorial¶
Basics of CuPy¶
In this section, you will learn about the following things:
- Basics of cupy.ndarray
- The concept of current device
- Host-device and device-device array transfer
Basics of cupy.ndarray¶
CuPy is a GPU array backend that implements a subset of the NumPy interface. In the following code, cp is an abbreviation of cupy, just as np is the customary abbreviation of numpy:
>>> import numpy as np
>>> import cupy as cp
The cupy.ndarray class is at its core; it is a compatible GPU alternative to numpy.ndarray.
>>> x_gpu = cp.array([1, 2, 3])
x_gpu in the above example is an instance of cupy.ndarray.
Its creation is identical to NumPy’s, except that numpy is replaced with cupy.
The main difference between cupy.ndarray and numpy.ndarray is that the content is allocated in device memory.
Its data is allocated on the current device, which will be explained later.
Most array manipulations are also done in a way similar to NumPy.
Take the Euclidean norm (a.k.a. L2 norm), for example.
NumPy has numpy.linalg.norm() to calculate it on the CPU.
>>> x_cpu = np.array([1, 2, 3])
>>> l2_cpu = np.linalg.norm(x_cpu)
We can calculate it on GPU with CuPy in a similar way:
>>> x_gpu = cp.array([1, 2, 3])
>>> l2_gpu = cp.linalg.norm(x_gpu)
CuPy implements many functions on cupy.ndarray objects.
See the reference for the supported subset of the NumPy API.
Understanding NumPy helps in using most features of CuPy, so we recommend reading the NumPy documentation.
Current Device¶
CuPy has a concept of the current device, which is the default device on which the allocation, manipulation, calculation, etc. of arrays take place. Suppose the ID of the current device is 0. The following code allocates array contents on GPU 0.
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
The current device can be changed by cupy.cuda.Device.use() as follows:
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> cp.cuda.Device(1).use()
>>> x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
If you switch the current GPU temporarily, the with statement comes in handy.
>>> with cp.cuda.Device(1):
...     x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
Most operations of CuPy are done on the current device. Be careful: processing an array on a non-current device will cause an error:
>>> with cp.cuda.Device(0):
...     x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> with cp.cuda.Device(1):
...     x_on_gpu0 * 2  # raises error
Traceback (most recent call last):
...
ValueError: Array device must be same as the current device: array device = 0 while current = 1
The cupy.ndarray.device attribute indicates the device on which the array is allocated.
>>> with cp.cuda.Device(1):
...     x = cp.array([1, 2, 3, 4, 5])
>>> x.device
<CUDA Device 1>
Note
If the environment has only one device, such explicit device switching is not needed.
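The device-switching pattern above can be sketched for however many GPUs are visible. This is an illustrative sketch, not part of the original tutorial: it assumes cupy.cuda.runtime.getDeviceCount() is used to count devices, and it skips the GPU part entirely when CuPy or a CUDA device is unavailable.

```python
try:
    import cupy as cp
    n_devices = cp.cuda.runtime.getDeviceCount()
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp, n_devices = None, 0

device_ids = []
for dev_id in range(n_devices):
    # Make `dev_id` the current device only inside this block.
    with cp.cuda.Device(dev_id):
        x = cp.arange(5)  # allocated on device `dev_id`
    # ndarray.device reports where the array lives; .id is its integer ID.
    device_ids.append(x.device.id)

print(device_ids)
```

On a single-GPU machine the loop runs once and the list contains only device 0, which matches the note that explicit switching is unnecessary there.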
Data Transfer¶
Move arrays to a device¶
cupy.asarray() can be used to move a numpy.ndarray, a list, or any object
that can be passed to numpy.array() to the current device:
>>> x_cpu = np.array([1, 2, 3])
>>> x_gpu = cp.asarray(x_cpu) # move the data to the current device.
cupy.asarray() can also accept a cupy.ndarray, which means we can
transfer the array between devices with this function.
>>> with cp.cuda.Device(0):
...     x_gpu_0 = cp.array([1, 2, 3])  # create an array in GPU 0
>>> with cp.cuda.Device(1):
...     x_gpu_1 = cp.asarray(x_gpu_0)  # move the array to GPU 1
Note
cupy.asarray() does not copy the input array if possible.
So, if you pass an array that is already on the current device, it returns the input object itself.
If you do want to copy the array in this situation, use cupy.array() with copy=True.
In fact, cupy.asarray() is equivalent to cupy.array(arr, dtype, copy=False).
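The no-copy behaviour described in this note can be checked with an identity test. The following is a small sketch under the assumption that a CUDA device is available; without one it does nothing and leaves both flags as None.

```python
try:
    import cupy as cp
    have_gpu = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp, have_gpu = None, False

same_object = copied = None
if have_gpu:
    x_gpu = cp.array([1, 2, 3])
    # asarray() on an array of the current device returns the input itself.
    same_object = cp.asarray(x_gpu) is x_gpu
    # array(..., copy=True) always allocates a new array.
    copied = cp.array(x_gpu, copy=True) is not x_gpu

print(same_object, copied)
```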
Move array from a device to the host¶
Moving a device array to the host can be done by cupy.asnumpy() as follows:
>>> x_gpu = cp.array([1, 2, 3]) # create an array in the current device
>>> x_cpu = cp.asnumpy(x_gpu) # move the array to the host.
We can also use cupy.ndarray.get():
>>> x_cpu = x_gpu.get()
How to write CPU/GPU agnostic code¶
The compatibility of CuPy with NumPy enables us to write CPU/GPU generic code.
It can be made easy by the cupy.get_array_module() function.
This function returns the numpy or cupy module based on its arguments.
A CPU/GPU generic function is defined using it as follows:
>>> # Stable implementation of log(1 + exp(x))
>>> def softplus(x):
...     xp = cp.get_array_module(x)
...     return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))
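When the same code must also run on machines without CuPy installed, one possible approach is a thin wrapper that falls back to NumPy. The helper name array_module below is hypothetical and not part of CuPy; the sketch only exercises the NumPy path.

```python
import numpy as np

try:
    import cupy as cp
except ImportError:
    cp = None  # CuPy is not installed; only the NumPy path is available


def array_module(x):
    """Hypothetical helper: return cupy or numpy depending on `x`."""
    return np if cp is None else cp.get_array_module(x)


def softplus(x):
    # Stable implementation of log(1 + exp(x)), as in the tutorial.
    xp = array_module(x)
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))


x_cpu = np.array([-2.0, 0.0, 2.0])
y_cpu = softplus(x_cpu)
print(type(y_cpu).__name__, y_cpu)
```

Passing a cupy.ndarray to the same softplus would select the cupy module instead, so the function body does not change between CPU and GPU.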
Sometimes, an explicit conversion to a host or device array may be required.
cupy.asarray() and cupy.asnumpy() can be used in agnostic implementations
to get host or device arrays from either CuPy or NumPy arrays.
>>> y_cpu = np.array([4, 5, 6])
>>> x_cpu + y_cpu
array([5, 7, 9])
>>> x_gpu + y_cpu
Traceback (most recent call last):
...
TypeError: Unsupported type <class 'numpy.ndarray'>
>>> cp.asnumpy(x_gpu) + y_cpu
array([5, 7, 9])
>>> cp.asnumpy(x_gpu) + cp.asnumpy(y_cpu)
array([5, 7, 9])
>>> x_gpu + cp.asarray(y_cpu)
array([5, 7, 9])
>>> cp.asarray(x_gpu) + cp.asarray(y_cpu)
array([5, 7, 9])
User-Defined Kernels¶
CuPy provides easy ways to define three types of CUDA kernels: elementwise kernels, reduction kernels, and raw kernels. In this documentation, we describe how to define and call each kind of kernel.
Basics of elementwise kernels¶
An elementwise kernel can be defined by the ElementwiseKernel class.
An instance of this class defines a CUDA kernel which can be invoked by the __call__ method of this instance.
A definition of an elementwise kernel consists of four parts: an input argument list, an output argument list, a loop body code, and the kernel name. For example, a kernel that computes a squared difference \(f(x, y) = (x - y)^2\) is defined as follows:
>>> squared_diff = cp.ElementwiseKernel(
... 'float32 x, float32 y',
... 'float32 z',
... 'z = (x - y) * (x - y)',
... 'squared_diff')
The argument lists consist of comma-separated argument definitions. Each argument definition consists of a type specifier and an argument name. Names of NumPy data types can be used as type specifiers.
Note
n, i, and names starting with an underscore _ are reserved for internal use.
The above kernel can be called on either scalars or arrays with broadcasting:
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> y = cp.arange(5, dtype=np.float32)
>>> squared_diff(x, y)
array([[ 0., 0., 0., 0., 0.],
[25., 25., 25., 25., 25.]], dtype=float32)
>>> squared_diff(x, 5)
array([[25., 16., 9., 4., 1.],
[ 0., 1., 4., 9., 16.]], dtype=float32)
Output arguments can be explicitly specified (next to the input arguments):
>>> z = cp.empty((2, 5), dtype=np.float32)
>>> squared_diff(x, y, z)
array([[ 0., 0., 0., 0., 0.],
[25., 25., 25., 25., 25.]], dtype=float32)
Type-generic kernels¶
If a type specifier is one character, then it is treated as a type placeholder.
It can be used to define type-generic kernels.
For example, the above squared_diff kernel can be made type-generic as follows:
>>> squared_diff_generic = cp.ElementwiseKernel(
... 'T x, T y',
... 'T z',
... 'z = (x - y) * (x - y)',
... 'squared_diff_generic')
Type placeholders of the same character in the kernel definition indicate the same type. The actual type of these placeholders is determined by the actual argument type. The ElementwiseKernel class first checks the output arguments and then the input arguments to determine the actual type. If no output arguments are given on the kernel invocation, then only the input arguments are used to determine the type.
The type placeholder can be used in the loop body code:
>>> squared_diff_generic = cp.ElementwiseKernel(
... 'T x, T y',
... 'T z',
... '''
... T diff = x - y;
... z = diff * diff;
... ''',
... 'squared_diff_generic')
More than one type placeholder can be used in a kernel definition. For example, the above kernel can be further made generic over multiple arguments:
>>> squared_diff_super_generic = cp.ElementwiseKernel(
... 'X x, Y y',
... 'Z z',
... 'z = (x - y) * (x - y)',
... 'squared_diff_super_generic')
Note that this kernel requires the output argument to be explicitly specified, because the type Z cannot be automatically determined from the input arguments.
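To illustrate the explicit output argument, here is a sketch that calls the multi-placeholder kernel with inputs of different dtypes and a preallocated output. The particular dtypes and values are arbitrary choices for illustration, and the GPU part is skipped when no CUDA device is available.

```python
try:
    import cupy as cp
    have_gpu = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp, have_gpu = None, False

result = None
if have_gpu:
    squared_diff_super_generic = cp.ElementwiseKernel(
        'X x, Y y',
        'Z z',
        'z = (x - y) * (x - y)',
        'squared_diff_super_generic')

    x = cp.arange(5, dtype=cp.float32)  # X resolves to float32
    y = cp.full(5, 2, dtype=cp.int32)   # Y resolves to int32
    z = cp.empty(5, dtype=cp.float64)   # Z is taken from this output array
    squared_diff_super_generic(x, y, z)
    result = z.tolist()

print(result)
```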
Raw argument specifiers¶
The ElementwiseKernel class does the indexing with broadcasting automatically, which is useful to define most elementwise computations.
On the other hand, we sometimes want to write a kernel with manual indexing for some arguments.
We can tell the ElementwiseKernel class to use manual indexing by adding the raw keyword preceding the type specifier.
We can use the special variable i and the method _ind.size() for manual indexing.
i indicates the index within the loop.
_ind.size() indicates the total number of elements to which the elementwise operation is applied.
Note that it represents the size after the broadcast operation.
For example, a kernel that adds two vectors while reversing one of them can be written as follows:
>>> add_reverse = cp.ElementwiseKernel(
... 'T x, raw T y', 'T z',
... 'z = x + y[_ind.size() - i - 1]',
... 'add_reverse')
(Note that this is an artificial example; you can write such an operation simply as z = x + y[::-1] without defining a new kernel.)
A raw argument can be used like an array.
The indexing operator y[_ind.size() - i - 1] involves an indexing computation on y, so y can be arbitrarily shaped and strided.
Note that raw arguments are not involved in broadcasting.
If you want to mark all arguments as raw, you must specify the size argument on invocation, which defines the value of _ind.size().
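A sketch of the all-raw case follows. It assumes that a raw output array is passed explicitly and that size is given as a keyword argument; the kernel name reverse_all_raw is made up for this illustration. The GPU part is skipped when no CUDA device is available.

```python
try:
    import cupy as cp
    have_gpu = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp, have_gpu = None, False

result = None
if have_gpu:
    # Both arguments are raw, so no broadcasting or automatic indexing occurs.
    reverse = cp.ElementwiseKernel(
        'raw T x',
        'raw T y',
        'y[i] = x[_ind.size() - i - 1]',
        'reverse_all_raw')

    x = cp.arange(5, dtype=cp.float32)
    y = cp.empty_like(x)
    # `size` sets the loop range, i.e. the value returned by _ind.size().
    reverse(x, y, size=x.size)
    result = y.tolist()

print(result)
```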
Reduction kernels¶
Reduction kernels can be defined by the ReductionKernel class.
We can use it by defining four parts of the kernel code:
- Identity value: This value is used for the initial value of reduction.
- Mapping expression: It is used for the pre-processing of each element to be reduced.
- Reduction expression: It is an operator to reduce the multiple mapped values. The special variables a and b are used for its operands.
- Post mapping expression: It is used to transform the resulting reduced values. The special variable a is used as its input. Output should be written to the output parameter.
The ReductionKernel class automatically inserts other code fragments that are required for an efficient and flexible reduction implementation.
For example, the L2 norm along specified axes can be written as follows:
>>> l2norm_kernel = cp.ReductionKernel(
... 'T x', # input params
... 'T y', # output params
... 'x * x', # map
... 'a + b', # reduce
... 'y = sqrt(a)', # post-reduction map
... '0', # identity value
... 'l2norm' # kernel name
... )
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> l2norm_kernel(x, axis=1)
array([ 5.477226 , 15.9687195], dtype=float32)
Note
The raw specifier is restricted to usages in which the axes to be reduced are put at the head of the shape.
This means that if you want to use the raw specifier for at least one argument, the axis argument must be 0 or a contiguous increasing sequence of integers starting from 0, like (0, 1), (0, 1, 2), etc.
Raw kernels¶
Raw kernels can be defined by the RawKernel class.
By using raw kernels, you can define kernels from raw CUDA source.
A RawKernel object allows you to call the kernel with CUDA’s cuLaunchKernel interface.
In other words, you have control over grid size, block size, shared memory size, and stream.
>>> add_kernel = cp.RawKernel(r'''
... extern "C" __global__
... void my_add(const float* x1, const float* x2, float* y) {
... int tid = blockDim.x * blockIdx.x + threadIdx.x;
... y[tid] = x1[tid] + x2[tid];
... }
... ''', 'my_add')
>>> x1 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
>>> x2 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
>>> y = cp.zeros((5, 5), dtype=cp.float32)
>>> add_kernel((5,), (5,), (x1, x2, y)) # grid, block and arguments
>>> y
array([[ 0., 2., 4., 6., 8.],
[10., 12., 14., 16., 18.],
[20., 22., 24., 26., 28.],
[30., 32., 34., 36., 38.],
[40., 42., 44., 46., 48.]], dtype=float32)
Raw kernels operating on complex-valued arrays can be created as well:
>>> complex_kernel = cp.RawKernel(r'''
... #include <cupy/complex.cuh>
... extern "C" __global__
... void my_func(const complex<float>* x1, const complex<float>* x2,
... complex<float>* y, float a) {
... int tid = blockDim.x * blockIdx.x + threadIdx.x;
... y[tid] = x1[tid] + a * x2[tid];
... }
... ''', 'my_func')
>>> x1 = cp.arange(25, dtype=cp.complex64).reshape(5, 5)
>>> x2 = 1j*cp.arange(25, dtype=cp.complex64).reshape(5, 5)
>>> y = cp.zeros((5, 5), dtype=cp.complex64)
>>> complex_kernel((5,), (5,), (x1, x2, y, cp.float32(2.0))) # grid, block and arguments
>>> y
array([[ 0. +0.j, 1. +2.j, 2. +4.j, 3. +6.j, 4. +8.j],
[ 5.+10.j, 6.+12.j, 7.+14.j, 8.+16.j, 9.+18.j],
[10.+20.j, 11.+22.j, 12.+24.j, 13.+26.j, 14.+28.j],
[15.+30.j, 16.+32.j, 17.+34.j, 18.+36.j, 19.+38.j],
[20.+40.j, 21.+42.j, 22.+44.j, 23.+46.j, 24.+48.j]],
dtype=complex64)
Note that while we encourage the use of complex<T> types for complex numbers (available by including <cupy/complex.cuh> as shown above), for CUDA code already written using functions from cuComplex.h there is no need to make the conversion yourself: just set the option translate_cucomplex=True when creating a RawKernel instance.
The CUDA kernel attributes can be retrieved either by accessing the attributes dictionary
or by accessing the RawKernel object’s attributes directly; the latter can also be used to set certain
attributes:
>>> add_kernel = cp.RawKernel(r'''
... extern "C" __global__
... void my_add(const float* x1, const float* x2, float* y) {
... int tid = blockDim.x * blockIdx.x + threadIdx.x;
... y[tid] = x1[tid] + x2[tid];
... }
... ''', 'my_add')
>>> add_kernel.attributes
{'max_threads_per_block': 1024, 'shared_size_bytes': 0, 'const_size_bytes': 0, 'local_size_bytes': 0, 'num_regs': 10, 'ptx_version': 70, 'binary_version': 70, 'cache_mode_ca': 0, 'max_dynamic_shared_size_bytes': 49152, 'preferred_shared_memory_carveout': -1}
>>> add_kernel.max_dynamic_shared_size_bytes
49152
>>> add_kernel.max_dynamic_shared_size_bytes = 50000 # set a new value for the attribute
>>> add_kernel.max_dynamic_shared_size_bytes
50000
Dynamic parallelism is supported by RawKernel. You just need to provide the linking flag (such as -dc) to RawKernel’s options argument. The static CUDA device runtime library (cudadevrt) is automatically discovered by CuPy. For further detail, see CUDA Toolkit’s documentation.
Accessing texture memory in RawKernel is supported via CUDA Runtime’s Texture Object API; see TextureObject’s documentation as well as the CUDA C Programming Guide. For using the Texture Reference API, which is marked as deprecated as of CUDA Toolkit 10.1, see the introduction to RawModule below.
Note
The kernel does not have return values. You need to pass both input arrays and output arrays as arguments.
Note
No validation will be performed by CuPy for arguments passed to the kernel, including types and number of arguments.
Especially note that when passing an ndarray, its dtype should match the type of the argument declared in the function signature of the CUDA source code (unless you are casting arrays intentionally).
For example, cupy.float32 and cupy.uint64 arrays must be passed to arguments typed as float* and unsigned long long*, respectively.
For Python primitive types, int, float and bool map to long long, double and bool, respectively.
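Because no validation is performed, scalar arguments deserve the same care as arrays. The sketch below is an illustration rather than part of the original documentation: the kernel name scale and the sizes are hypothetical, and it wraps scalars in cp.float32 and cp.int32 so that they match the float and int parameters of the CUDA signature. It also adds a bounds check so that the grid may be larger than the data.

```python
try:
    import cupy as cp
    have_gpu = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp, have_gpu = None, False

n = 1000
threads = 256
blocks = (n + threads - 1) // threads  # round up so every element is covered

ok = None
if have_gpu:
    scale_kernel = cp.RawKernel(r'''
    extern "C" __global__
    void scale(const float* x, float* y, float a, int n) {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        if (tid < n) {  // threads past the end of the data do nothing
            y[tid] = a * x[tid];
        }
    }
    ''', 'scale')

    x = cp.arange(n, dtype=cp.float32)  # matches `const float*`
    y = cp.empty_like(x)                # matches `float*`
    # A bare Python float or int would map to double or long long, so the
    # scalars are wrapped to match `float a` and `int n`.
    scale_kernel((blocks,), (threads,), (x, y, cp.float32(2.0), cp.int32(n)))
    ok = bool(cp.allclose(y, 2 * x))

print(blocks, ok)
```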
Note
When using printf() in your CUDA kernel, you may need to synchronize the stream to see the output.
You can use cupy.cuda.Stream.null.synchronize() if you are using the default stream.
Raw modules¶
For dealing with large raw CUDA source or loading an existing CUDA binary, the RawModule class can be handier. It can be initialized either by CUDA source code or by a path to the CUDA binary. The needed kernels can then be retrieved by calling the get_function() method, which returns a RawKernel instance that can be invoked as discussed above.
>>> loaded_from_source = r'''
... extern "C"{
...
... __global__ void test_sum(const float* x1, const float* x2, float* y, \
... unsigned int N)
... {
... unsigned int tid = blockDim.x * blockIdx.x + threadIdx.x;
... if (tid < N)
... {
... y[tid] = x1[tid] + x2[tid];
... }
... }
...
... __global__ void test_multiply(const float* x1, const float* x2, float* y, \
... unsigned int N)
... {
... unsigned int tid = blockDim.x * blockIdx.x + threadIdx.x;
... if (tid < N)
... {
... y[tid] = x1[tid] * x2[tid];
... }
... }
...
... }'''
>>> module = cp.RawModule(code=loaded_from_source)
>>> ker_sum = module.get_function('test_sum')
>>> ker_times = module.get_function('test_multiply')
>>> N = 10
>>> x1 = cp.arange(N**2, dtype=cp.float32).reshape(N, N)
>>> x2 = cp.ones((N, N), dtype=cp.float32)
>>> y = cp.zeros((N, N), dtype=cp.float32)
>>> ker_sum((N,), (N,), (x1, x2, y, N**2)) # y = x1 + x2
>>> assert cp.allclose(y, x1 + x2)
>>> ker_times((N,), (N,), (x1, x2, y, N**2)) # y = x1 * x2
>>> assert cp.allclose(y, x1 * x2)
The instructions above for using complex numbers in RawKernel also apply to RawModule.
CuPy also supports the Texture Reference API. A handle to the texture reference in a module can be retrieved by name via get_texref(). Then, you need to pass it to TextureReference, along with a resource descriptor and texture descriptor, for binding the reference to the array. (The interface of TextureReference is meant to mimic that of TextureObject to help users make the transition to the latter, since as of CUDA Toolkit 10.1 the former is marked as deprecated.)
Kernel fusion¶
cupy.fuse() is a decorator that fuses functions. This decorator can be used to define an elementwise or reduction kernel more easily than ElementwiseKernel or ReductionKernel.
By using this decorator, we can define the squared_diff kernel as follows:
>>> @cp.fuse()
... def squared_diff(x, y):
...     return (x - y) * (x - y)
The above kernel can be called on scalars, NumPy arrays, or CuPy arrays, like the original function.
>>> x_cp = cp.arange(10)
>>> y_cp = cp.arange(10)[::-1]
>>> squared_diff(x_cp, y_cp)
array([81, 49, 25, 9, 1, 1, 9, 25, 49, 81])
>>> x_np = np.arange(10)
>>> y_np = np.arange(10)[::-1]
>>> squared_diff(x_np, y_np)
array([81, 49, 25, 9, 1, 1, 9, 25, 49, 81])
At the first function call, the fused function analyzes the original function based on the abstracted information of arguments (e.g. their dtypes and ndims) and creates and caches an actual CUDA kernel. From the second function call with the same input types, the fused function calls the previously cached kernel, so it is highly recommended to reuse the same decorated functions instead of decorating local functions that are defined multiple times.
cupy.fuse() also supports simple reduction kernels.
>>> @cp.fuse()
... def sum_of_products(x, y):
...     return cp.sum(x * y, axis=-1)
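A usage sketch for the fused reduction follows; the input shapes and values are arbitrary choices for illustration. The decorated function is defined once and then called, in line with the advice above about reusing decorated functions, and the GPU part is skipped when no CUDA device is available.

```python
try:
    import cupy as cp
    have_gpu = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp, have_gpu = None, False

result = None
if have_gpu:
    @cp.fuse()
    def sum_of_products(x, y):
        # Elementwise product followed by a reduction over the last axis.
        return cp.sum(x * y, axis=-1)

    x = cp.arange(6, dtype=cp.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
    y = cp.ones((2, 3), dtype=cp.float32)
    result = sum_of_products(x, y).tolist()

print(result)
```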
You can specify the kernel name by using the kernel_name keyword argument as follows:
>>> @cp.fuse(kernel_name='squared_diff')
... def squared_diff(x, y):
...     return (x - y) * (x - y)
Note
Currently, cupy.fuse() can fuse only simple elementwise and reduction operations. Most other routines (e.g. cupy.matmul(), cupy.reshape()) are not supported.
Reference Manual¶
This is the official reference of CuPy, a multi-dimensional array on CUDA with a subset of the NumPy interface.
Multi-Dimensional Array (ndarray)¶
cupy.ndarray is the CuPy counterpart of NumPy’s numpy.ndarray.
It provides an intuitive interface for a fixed-size multidimensional array which resides
on a CUDA device.
For the basic concept of ndarrays, please refer to the NumPy documentation.
cupy.ndarray |
Multi-dimensional array on a CUDA device. |
Code compatibility features¶
cupy.ndarray is designed to be interchangeable with numpy.ndarray in terms of code compatibility as much as possible.
But occasionally, you will need to know whether the arrays you’re handling are cupy.ndarray or numpy.ndarray.
One example is when invoking module-level functions such as cupy.sum() or numpy.sum().
In such situations, cupy.get_array_module() can be used.
cupy.get_array_module |
Returns the array module for arguments. |
cupyx.scipy.get_array_module |
Returns the array module for arguments. |
Conversion to/from NumPy arrays¶
cupy.ndarray and numpy.ndarray are not implicitly convertible to each other.
That means NumPy functions cannot take cupy.ndarrays as inputs, and vice versa.
- To convert numpy.ndarray to cupy.ndarray, use cupy.array() or cupy.asarray().
- To convert cupy.ndarray to numpy.ndarray, use cupy.asnumpy() or cupy.ndarray.get().
Note that converting between cupy.ndarray and numpy.ndarray incurs data transfer between
the host (CPU) device and the GPU device, which is costly in terms of performance.
cupy.array |
Creates an array on the current device. |
cupy.asarray |
Converts an object to array. |
cupy.asnumpy |
Returns an array on the host memory from an arbitrary source array. |
Universal Functions (ufunc)¶
CuPy provides universal functions (a.k.a. ufuncs) to support various elementwise operations. CuPy’s ufuncs support the following features of NumPy’s ufuncs:
- Broadcasting
- Output type determination
- Casting rules
CuPy’s ufunc currently does not provide methods such as reduce, accumulate, reduceat, outer, and at.
Ufunc class¶
cupy.ufunc |
Universal function. |
Available ufuncs¶
Math operations¶
cupy.add |
Adds two arrays elementwise. |
cupy.subtract |
Subtracts arguments elementwise. |
cupy.multiply |
Multiplies two arrays elementwise. |
cupy.divide |
Elementwise true division (i.e. |
cupy.logaddexp |
Computes log(exp(x1) + exp(x2)) elementwise. |
cupy.logaddexp2 |
Computes log2(exp2(x1) + exp2(x2)) elementwise. |
cupy.true_divide |
Elementwise true division (i.e. |
cupy.floor_divide |
Elementwise floor division (i.e. |
cupy.negative |
Takes numerical negative elementwise. |
cupy.power |
Computes x1 ** x2 elementwise. |
cupy.remainder |
Computes the remainder of Python division elementwise. |
cupy.mod |
Computes the remainder of Python division elementwise. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.absolute |
Elementwise absolute value function. |
cupy.rint |
Rounds each element of an array to the nearest integer. |
cupy.sign |
Elementwise sign function. |
cupy.exp |
Elementwise exponential function. |
cupy.exp2 |
Elementwise exponentiation with base 2. |
cupy.log |
Elementwise natural logarithm function. |
cupy.log2 |
Elementwise binary logarithm function. |
cupy.log10 |
Elementwise common logarithm function. |
cupy.expm1 |
Computes exp(x) - 1 elementwise. |
cupy.log1p |
Computes log(1 + x) elementwise. |
cupy.sqrt |
Elementwise square root function. |
cupy.square |
Elementwise square function. |
cupy.reciprocal |
Computes 1 / x elementwise. |
Trigonometric functions¶
cupy.sin |
Elementwise sine function. |
cupy.cos |
Elementwise cosine function. |
cupy.tan |
Elementwise tangent function. |
cupy.arcsin |
Elementwise inverse-sine function (a.k.a. |
cupy.arccos |
Elementwise inverse-cosine function (a.k.a. |
cupy.arctan |
Elementwise inverse-tangent function (a.k.a. |
cupy.arctan2 |
Elementwise inverse-tangent of the ratio of two arrays. |
cupy.hypot |
Computes the hypotenuse of orthogonal vectors of given length. |
cupy.sinh |
Elementwise hyperbolic sine function. |
cupy.cosh |
Elementwise hyperbolic cosine function. |
cupy.tanh |
Elementwise hyperbolic tangent function. |
cupy.arcsinh |
Elementwise inverse of hyperbolic sine function. |
cupy.arccosh |
Elementwise inverse of hyperbolic cosine function. |
cupy.arctanh |
Elementwise inverse of hyperbolic tangent function. |
cupy.deg2rad |
Converts angles from degrees to radians elementwise. |
cupy.rad2deg |
Converts angles from radians to degrees elementwise. |
Bit-twiddling functions¶
cupy.bitwise_and |
Computes the bitwise AND of two arrays elementwise. |
cupy.bitwise_or |
Computes the bitwise OR of two arrays elementwise. |
cupy.bitwise_xor |
Computes the bitwise XOR of two arrays elementwise. |
cupy.invert |
Computes the bitwise NOT of an array elementwise. |
cupy.left_shift |
Shifts the bits of each integer element to the left. |
cupy.right_shift |
Shifts the bits of each integer element to the right. |
Comparison functions¶
cupy.greater |
Tests elementwise if x1 > x2 . |
cupy.greater_equal |
Tests elementwise if x1 >= x2 . |
cupy.less |
Tests elementwise if x1 < x2 . |
cupy.less_equal |
Tests elementwise if x1 <= x2 . |
cupy.not_equal |
Tests elementwise if x1 != x2 . |
cupy.equal |
Tests elementwise if x1 == x2 . |
cupy.logical_and |
Computes the logical AND of two arrays. |
cupy.logical_or |
Computes the logical OR of two arrays. |
cupy.logical_xor |
Computes the logical XOR of two arrays. |
cupy.logical_not |
Computes the logical NOT of an array. |
cupy.maximum |
Takes the maximum of two arrays elementwise. |
cupy.minimum |
Takes the minimum of two arrays elementwise. |
cupy.fmax |
Takes the maximum of two arrays elementwise. |
cupy.fmin |
Takes the minimum of two arrays elementwise. |
Floating functions¶
cupy.isfinite |
Tests finiteness elementwise. |
cupy.isinf |
Tests if each element is the positive or negative infinity. |
cupy.isnan |
Tests if each element is a NaN. |
cupy.signbit |
Tests elementwise if the sign bit is set (i.e. |
cupy.copysign |
Returns the first argument with the sign bit of the second elementwise. |
cupy.nextafter |
Computes the nearest neighbor float values towards the second argument. |
cupy.modf |
Extracts the fractional and integral parts of an array elementwise. |
cupy.ldexp |
Computes x1 * 2 ** x2 elementwise. |
cupy.frexp |
Decomposes each element to mantissa and two’s exponent. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.floor |
Rounds each element of an array to its floor integer. |
cupy.ceil |
Rounds each element of an array to its ceiling integer. |
cupy.trunc |
Rounds each element of an array towards zero. |
ufunc.at¶
Currently, CuPy does not support at for ufuncs in general.
However, cupyx.scatter_add() can substitute for add.at, as both behave identically.
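As a sketch of this substitution, the snippet below accumulates values at repeated indices with cupyx.scatter_add(), in the same way numpy.add.at would on the CPU. The array contents are arbitrary illustration values, and the GPU part is skipped when no CUDA device is available.

```python
try:
    import cupy as cp
    import cupyx
    have_gpu = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp, have_gpu = None, False

result = None
if have_gpu:
    a = cp.zeros((6,), dtype=cp.float32)
    i = cp.array([1, 0, 1])  # index 1 appears twice
    v = cp.array([1., 1., 1.], dtype=cp.float32)
    # Unlike `a[i] += v`, repeated indices are accumulated rather than
    # overwritten, which is the behaviour of numpy.add.at.
    cupyx.scatter_add(a, i, v)
    result = a.tolist()

print(result)
```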
Routines¶
The following pages describe NumPy-compatible routines. These functions cover a subset of NumPy routines.
Array Creation Routines¶
Basic creation routines¶
cupy.empty |
Returns an array without initializing the elements. |
cupy.empty_like |
Returns a new array with same shape and dtype of a given array. |
cupy.eye |
Returns a 2-D array with ones on the diagonals and zeros elsewhere. |
cupy.identity |
Returns a 2-D identity array. |
cupy.ones |
Returns a new array of given shape and dtype, filled with ones. |
cupy.ones_like |
Returns an array of ones with same shape and dtype as a given array. |
cupy.zeros |
Returns a new array of given shape and dtype, filled with zeros. |
cupy.zeros_like |
Returns an array of zeros with same shape and dtype as a given array. |
cupy.full |
Returns a new array of given shape and dtype, filled with a given value. |
cupy.full_like |
Returns a full array with same shape and dtype as a given array. |
Creation from other data¶
cupy.array |
Creates an array on the current device. |
cupy.asarray |
Converts an object to array. |
cupy.asanyarray |
Converts an object to array. |
cupy.ascontiguousarray |
Returns a C-contiguous array. |
cupy.copy |
Creates a copy of a given array on the current device. |
cupy.fromfile |
Reads an array from a file. |
Numerical ranges¶
cupy.arange |
Returns an array with evenly spaced values within a given interval. |
cupy.linspace |
Returns an array with evenly-spaced values within a given interval. |
cupy.logspace |
Returns an array with evenly-spaced values on a log-scale. |
cupy.meshgrid |
Return coordinate matrices from coordinate vectors. |
cupy.mgrid |
Construct a multi-dimensional “meshgrid”. |
cupy.ogrid |
Construct a multi-dimensional “meshgrid”. |
Array Manipulation Routines¶
Basic operations¶
cupy.copyto |
Copies values from one array to another with broadcasting. |
Changing array shape¶
cupy.reshape |
Returns an array with new shape and same elements. |
cupy.ravel |
Returns a flattened array. |
Transpose-like operations¶
cupy.moveaxis |
Moves axes of an array to new positions. |
cupy.rollaxis |
Moves the specified axis backwards to the given place. |
cupy.swapaxes |
Swaps the two axes. |
cupy.transpose |
Permutes the dimensions of an array. |
Changing number of dimensions¶
cupy.atleast_1d |
Converts arrays to arrays with dimensions >= 1. |
cupy.atleast_2d |
Converts arrays to arrays with dimensions >= 2. |
cupy.atleast_3d |
Converts arrays to arrays with dimensions >= 3. |
cupy.broadcast |
Object that performs broadcasting. |
cupy.broadcast_to |
Broadcast an array to a given shape. |
cupy.broadcast_arrays |
Broadcasts given arrays. |
cupy.expand_dims |
Expands given arrays. |
cupy.squeeze |
Removes size-one axes from the shape of an array. |
Changing kind of array¶
cupy.asarray |
Converts an object to array. |
cupy.asanyarray |
Converts an object to array. |
cupy.asfortranarray |
Return an array laid out in Fortran order in memory. |
cupy.ascontiguousarray |
Returns a C-contiguous array. |
Joining arrays¶
cupy.concatenate |
Joins arrays along an axis. |
cupy.stack |
Stacks arrays along a new axis. |
cupy.column_stack |
Stacks 1-D and 2-D arrays as columns into a 2-D array. |
cupy.dstack |
Stacks arrays along the third axis. |
cupy.hstack |
Stacks arrays horizontally. |
cupy.vstack |
Stacks arrays vertically. |
Splitting arrays¶
cupy.split |
Splits an array into multiple sub arrays along a given axis. |
cupy.array_split |
Splits an array into multiple sub arrays along a given axis. |
cupy.dsplit |
Splits an array into multiple sub arrays along the third axis. |
cupy.hsplit |
Splits an array into multiple sub arrays horizontally. |
cupy.vsplit |
Splits an array into multiple sub arrays along the first axis. |
Tiling arrays¶
cupy.tile |
Construct an array by repeating A the number of times given by reps. |
cupy.repeat |
Repeat arrays along an axis. |
Adding and removing elements¶
cupy.unique |
Find the unique elements of an array. |
Rearranging elements¶
cupy.flip |
Reverse the order of elements in an array along the given axis. |
cupy.fliplr |
Flip array in the left/right direction. |
cupy.flipud |
Flip array in the up/down direction. |
cupy.reshape |
Returns an array with new shape and same elements. |
cupy.roll |
Roll array elements along a given axis. |
cupy.rot90 |
Rotate an array by 90 degrees in the plane specified by axes. |
Binary Operations¶
Elementwise bit operations¶
cupy.bitwise_and |
Computes the bitwise AND of two arrays elementwise. |
cupy.bitwise_or |
Computes the bitwise OR of two arrays elementwise. |
cupy.bitwise_xor |
Computes the bitwise XOR of two arrays elementwise. |
cupy.invert |
Computes the bitwise NOT of an array elementwise. |
cupy.left_shift |
Shifts the bits of each integer element to the left. |
cupy.right_shift |
Shifts the bits of each integer element to the right. |
Bit packing¶
cupy.packbits |
Packs the elements of a binary-valued array into bits in a uint8 array. |
cupy.unpackbits |
Unpacks elements of a uint8 array into a binary-valued output array. |
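A minimal round trip through the two bit-packing routines might look as follows. The NumPy fallback is an assumption made so the sketch also runs without a GPU.

```python
try:
    import cupy as xp
    xp.zeros(1)  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: NumPy fallback

# Eight binary values; the first element becomes the most significant bit.
bits = xp.array([1, 0, 1, 1, 0, 0, 0, 1], dtype=xp.uint8)

packed = xp.packbits(bits)        # one uint8 holding 0b10110001
restored = xp.unpackbits(packed)  # back to eight 0/1 values

print(packed, restored)
```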
Output formatting¶
cupy.binary_repr |
Return the binary representation of the input number as a string. |
Data Type Routines¶
cupy.can_cast |
Returns True if cast between data types can occur according to the casting rule. |
cupy.result_type |
Returns the type that results from applying the NumPy type promotion rules to the arguments. |
cupy.common_type |
Return a scalar type which is common to the input arrays. |
cupy.promote_types (alias of numpy.promote_types() ) |
cupy.min_scalar_type (alias of numpy.min_scalar_type() ) |
cupy.obj2sctype (alias of numpy.obj2sctype() ) |
Creating data types¶
cupy.dtype (alias of numpy.dtype ) |
cupy.format_parser (alias of numpy.format_parser ) |
Data type information¶
cupy.finfo (alias of numpy.finfo ) |
cupy.iinfo (alias of numpy.iinfo ) |
cupy.MachAr (alias of numpy.MachAr ) |
Data type testing¶
cupy.issctype (alias of numpy.issctype() ) |
cupy.issubdtype (alias of numpy.issubdtype() ) |
cupy.issubsctype (alias of numpy.issubsctype() ) |
cupy.issubclass_ (alias of numpy.issubclass_() ) |
cupy.find_common_type (alias of numpy.find_common_type() ) |
Miscellaneous¶
cupy.typename (alias of numpy.typename() ) |
cupy.sctype2char (alias of numpy.sctype2char() ) |
cupy.mintypecode (alias of numpy.mintypecode() ) |
FFT Functions¶
Standard FFTs¶
cupy.fft.fft |
Compute the one-dimensional FFT. |
cupy.fft.ifft |
Compute the one-dimensional inverse FFT. |
cupy.fft.fft2 |
Compute the two-dimensional FFT. |
cupy.fft.ifft2 |
Compute the two-dimensional inverse FFT. |
cupy.fft.fftn |
Compute the N-dimensional FFT. |
cupy.fft.ifftn |
Compute the N-dimensional inverse FFT. |
Real FFTs¶
cupy.fft.rfft |
Compute the one-dimensional FFT for real input. |
cupy.fft.irfft |
Compute the one-dimensional inverse FFT for real input. |
cupy.fft.rfft2 |
Compute the two-dimensional FFT for real input. |
cupy.fft.irfft2 |
Compute the two-dimensional inverse FFT for real input. |
cupy.fft.rfftn |
Compute the N-dimensional FFT for real input. |
cupy.fft.irfftn |
Compute the N-dimensional inverse FFT for real input. |
Hermitian FFTs¶
cupy.fft.hfft |
Compute the FFT of a signal that has Hermitian symmetry. |
cupy.fft.ihfft |
Compute the inverse FFT of a signal that has Hermitian symmetry. |
Helper routines¶
cupy.fft.fftfreq |
Return the FFT sample frequencies. |
cupy.fft.rfftfreq |
Return the FFT sample frequencies for real input. |
cupy.fft.fftshift |
Shift the zero-frequency component to the center of the spectrum. |
cupy.fft.ifftshift |
The inverse of fftshift() . |
Normalization¶
The default normalization leaves the direct (forward) transforms unscaled and scales the inverse transforms by \(1/n\).
If the keyword argument norm is "ortho", both transforms are scaled by \(1/\sqrt{n}\).
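The two normalization conventions can be compared numerically. The sketch below assumes CuPy with a usable GPU and otherwise falls back to NumPy (an assumption for portability; both libraries accept the norm keyword).

```python
try:
    import cupy as xp
    xp.zeros(1)  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: NumPy fallback

x = xp.arange(8, dtype=xp.float64)
n = x.size

X_default = xp.fft.fft(x)              # forward transform, unscaled
X_ortho = xp.fft.fft(x, norm="ortho")  # forward transform, scaled by 1/sqrt(n)

# With the default convention the inverse carries the 1/n factor,
# so ifft(fft(x)) recovers x.
x_back = xp.fft.ifft(X_default).real

# With "ortho" the transform preserves the signal energy.
energy_time = float(xp.sum(x ** 2))
energy_freq = float(xp.sum(xp.abs(X_ortho) ** 2))
```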
Code compatibility features¶
FFT functions of NumPy always return a numpy.ndarray whose dtype is numpy.complex128 or numpy.float64.
CuPy functions do not follow this behavior: they return numpy.complex64 or numpy.float32 if the dtype of the input is numpy.float16, numpy.float32, or numpy.complex64.
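One way to check the output dtype for a given input dtype is sketched below. The helper name fft_output_dtypes is hypothetical. With CuPy, single-precision inputs are expected to give complex64 as described above; when the sketch falls back to NumPy (an assumption for portability), the result for single-precision inputs depends on the NumPy version in use.

```python
try:
    import cupy as xp
    xp.zeros(1)  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: NumPy fallback


def fft_output_dtypes(xp):
    """Map each input dtype name to the dtype name returned by xp.fft.fft."""
    names = ("float32", "float64", "complex64", "complex128")
    return {name: xp.fft.fft(xp.ones(8, dtype=name)).dtype.name
            for name in names}


result = fft_output_dtypes(xp)
print(result)
```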
Internally, cupy.fft always generates a cuFFT plan (see the cuFFT documentation for detail) corresponding to the desired transform. When possible, an n-dimensional plan will be used, as opposed to applying separate 1D plans for each axis to be transformed. Using n-dimensional planning can provide better performance for multidimensional transforms, but requires more GPU memory than separable 1D planning. The user can disable n-dimensional planning by setting cupy.fft.config.enable_nd_planning = False. This ability to adjust the planning type is a deviation from the NumPy API, which does not use precomputed FFT plans.
Moreover, the automatic plan generation can be suppressed by using an existing plan returned by cupyx.scipy.fftpack.get_fft_plan() as a context manager. This is again a deviation from NumPy.
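The three planning modes described above can be exercised side by side. This is a sketch under the assumption that CuPy and a GPU are available; when they are not, the GPU part is skipped. The array size and the tolerance mentioned afterwards are arbitrary choices for illustration.

```python
try:
    import cupy
    import cupyx.scipy.fftpack
    cupy.zeros(1)  # raises if no usable GPU is present
    gpu_ok = True
except Exception:
    gpu_ok = False  # no CuPy or no GPU: the GPU part below is skipped

rel_diff = None
if gpu_ok:
    a = cupy.random.random((64, 64)).astype(cupy.complex64)

    # 1) Default: an n-dimensional cuFFT plan is used when possible.
    y_nd = cupy.fft.fft2(a)

    # 2) Separable 1D plans: less GPU memory, possibly slower.
    cupy.fft.config.enable_nd_planning = False
    y_1d = cupy.fft.fft2(a)
    cupy.fft.config.enable_nd_planning = True  # restore the default

    # 3) Reuse a precomputed plan through the context manager.
    plan = cupyx.scipy.fftpack.get_fft_plan(a, axes=(0, 1))
    with plan:
        y_plan = cupy.fft.fft2(a)

    # All three paths compute the same transform up to rounding.
    scale = float(cupy.abs(y_nd).max())
    rel_diff = max(float(cupy.abs(y_nd - y_1d).max()),
                   float(cupy.abs(y_nd - y_plan).max())) / scale
```

The planning mode should only affect speed and memory use, so rel_diff is expected to stay at the level of single-precision rounding.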
Indexing Routines¶
cupy.c_ |
|
cupy.r_ |
|
cupy.nonzero |
Return the indices of the elements that are non-zero. |
cupy.where |
Return elements, either from x or y, depending on condition. |
cupy.indices |
Returns an array representing the indices of a grid. |
cupy.ix_ |
Construct an open mesh from multiple sequences. |
cupy.unravel_index |
Converts array of flat indices into a tuple of coordinate arrays. |
cupy.take |
Takes elements of an array at specified indices along an axis. |
cupy.take_along_axis |
Take values from the input array by matching 1d index and data slices. |
cupy.choose |
|
cupy.diag |
Returns a diagonal or a diagonal array. |
cupy.diagonal |
Returns specified diagonals. |
cupy.lib.stride_tricks.as_strided |
Create a view into the array with the given shape and strides. |
cupy.place |
Change elements of an array based on conditional and input values. |
cupy.put |
Replaces specified elements of an array with given values. |
cupy.fill_diagonal |
Fills the main diagonal of the given array of any dimensionality. |
Input and Output¶
NumPy binary files (NPY, NPZ)¶
cupy.load |
Loads arrays or pickled objects from .npy , .npz or pickled file. |
cupy.save |
Saves an array to a binary file in .npy format. |
cupy.savez |
Saves one or more arrays into a file in uncompressed .npz format. |
cupy.savez_compressed |
Saves one or more arrays into a file in compressed .npz format. |
String formatting¶
cupy.array_repr |
Returns the string representation of an array. |
cupy.array_str |
Returns the string representation of the content of an array. |
Base-n representations¶
cupy.binary_repr |
Return the binary representation of the input number as a string. |
cupy.base_repr |
Return a string representation of a number in the given base system. |
Linear Algebra¶
Matrix and vector products¶
cupy.cross |
Returns the cross product of two vectors. |
cupy.dot |
Returns a dot product of two arrays. |
cupy.vdot |
Returns the dot product of two vectors. |
cupy.inner |
Returns the inner product of two arrays. |
cupy.outer |
Returns the outer product of two vectors. |
cupy.matmul |
Returns the matrix product of two arrays and is the implementation of the @ operator introduced in Python 3.5 following PEP465. |
cupy.tensordot |
Returns the tensor dot product of two arrays along specified axes. |
cupy.einsum |
Evaluates the Einstein summation convention on the operands. |
cupy.linalg.matrix_power |
Raise a square matrix to the (integer) power n. |
cupy.kron |
Returns the kronecker product of two arrays. |
Decompositions¶
cupy.linalg.cholesky |
Cholesky decomposition. |
cupy.linalg.qr |
QR decomposition. |
cupy.linalg.svd |
Singular Value Decomposition. |
Matrix eigenvalues¶
cupy.linalg.eigh |
Eigenvalues and eigenvectors of a symmetric matrix. |
cupy.linalg.eigvalsh |
Calculates eigenvalues of a symmetric matrix. |
Norms etc.¶
cupy.linalg.det |
Returns the determinant of an array. |
cupy.linalg.norm |
Returns one of matrix norms specified by ord parameter. |
cupy.linalg.matrix_rank |
Return matrix rank of array using SVD method |
cupy.linalg.slogdet |
Returns sign and logarithm of the determinant of an array. |
cupy.trace |
Returns the sum along the diagonals of an array. |
Solving linear equations¶
cupy.linalg.solve |
Solves a linear matrix equation. |
cupy.linalg.tensorsolve |
Solves tensor equations denoted by ax = b . |
cupy.linalg.lstsq |
Return the least-squares solution to a linear matrix equation. |
cupy.linalg.inv |
Computes the inverse of a matrix. |
cupy.linalg.pinv |
Compute the Moore-Penrose pseudoinverse of a matrix. |
cupy.linalg.tensorinv |
Computes the inverse of a tensor. |
cupyx.scipy.linalg.lu_factor |
LU decomposition. |
cupyx.scipy.linalg.lu_solve |
Solve an equation system, a * x = b , given the LU factorization of a |
cupyx.scipy.linalg.solve_triangular |
Solve the equation a x = b for x, assuming a is a triangular matrix. |
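A small worked example of solving a linear system with the routines above. It assumes CuPy with a usable GPU and falls back to NumPy otherwise (the fallback is an assumption made for portability).

```python
try:
    import cupy as xp
    xp.zeros(1)  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: NumPy fallback

# Solve a @ x = b for x.
a = xp.array([[3.0, 1.0],
              [1.0, 2.0]])
b = xp.array([9.0, 8.0])

x = xp.linalg.solve(a, b)

# The residual norm checks the solution without comparing to a known answer.
residual = float(xp.linalg.norm(xp.dot(a, x) - b))
print(x, residual)
```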
Logic Functions¶
Truth value testing¶
cupy.all |
Tests whether all array elements along a given axis evaluate to True. |
cupy.any |
Tests whether any array elements along a given axis evaluate to True. |
cupy.in1d |
Tests whether each element of a 1-D array is also present in a second array. |
cupy.isin |
Calculates element in test_elements , broadcasting over element only. |
Infinities and NaNs¶
cupy.isfinite |
Tests finiteness elementwise. |
cupy.isinf |
Tests if each element is the positive or negative infinity. |
cupy.isnan |
Tests if each element is a NaN. |
Array type testing¶
cupy.iscomplex |
Returns a bool array, where True if input element is complex. |
cupy.iscomplexobj |
Check for a complex type or an array of complex numbers. |
cupy.isfortran |
Returns True if the array is Fortran contiguous but not C contiguous. |
cupy.isreal |
Returns a bool array, where True if input element is real. |
cupy.isrealobj |
Returns True if x is not a complex type or an array of complex numbers. |
cupy.isscalar |
Returns True if the type of num is a scalar type. |
Logic operations¶
cupy.logical_and |
Computes the logical AND of two arrays. |
cupy.logical_or |
Computes the logical OR of two arrays. |
cupy.logical_not |
Computes the logical NOT of an array. |
cupy.logical_xor |
Computes the logical XOR of two arrays. |
Comparison¶
cupy.allclose |
Returns True if two arrays are element-wise equal within a tolerance. |
cupy.isclose |
Returns a boolean array where two arrays are equal within a tolerance. |
cupy.greater |
Tests elementwise if x1 > x2 . |
cupy.greater_equal |
Tests elementwise if x1 >= x2 . |
cupy.less |
Tests elementwise if x1 < x2 . |
cupy.less_equal |
Tests elementwise if x1 <= x2 . |
cupy.equal |
Tests elementwise if x1 == x2 . |
cupy.not_equal |
Tests elementwise if x1 != x2 . |
Mathematical Functions¶
Trigonometric functions¶
cupy.sin |
Elementwise sine function. |
cupy.cos |
Elementwise cosine function. |
cupy.tan |
Elementwise tangent function. |
cupy.arcsin |
Elementwise inverse-sine function. |
cupy.arccos |
Elementwise inverse-cosine function. |
cupy.arctan |
Elementwise inverse-tangent function. |
cupy.hypot |
Computes the hypotenuse of orthogonal vectors of given length. |
cupy.arctan2 |
Elementwise inverse-tangent of the ratio of two arrays. |
cupy.degrees |
Converts angles from radians to degrees elementwise. |
cupy.radians |
Converts angles from degrees to radians elementwise. |
cupy.unwrap |
Unwrap by changing deltas between values to 2*pi complement. |
cupy.deg2rad |
Converts angles from degrees to radians elementwise. |
cupy.rad2deg |
Converts angles from radians to degrees elementwise. |
Hyperbolic functions¶
cupy.sinh |
Elementwise hyperbolic sine function. |
cupy.cosh |
Elementwise hyperbolic cosine function. |
cupy.tanh |
Elementwise hyperbolic tangent function. |
cupy.arcsinh |
Elementwise inverse of hyperbolic sine function. |
cupy.arccosh |
Elementwise inverse of hyperbolic cosine function. |
cupy.arctanh |
Elementwise inverse of hyperbolic tangent function. |
Rounding¶
cupy.around |
Rounds to the given number of decimals. |
cupy.round_ |
|
cupy.rint |
Rounds each element of an array to the nearest integer. |
cupy.fix |
If the given value x is positive, it returns floor(x). |
cupy.floor |
Rounds each element of an array to its floor integer. |
cupy.ceil |
Rounds each element of an array to its ceiling integer. |
cupy.trunc |
Rounds each element of an array towards zero. |
Sums, products, differences¶
cupy.prod |
Returns the product of an array along given axes. |
cupy.sum |
Returns the sum of an array along given axes. |
cupy.cumprod |
Returns the cumulative product of an array along a given axis. |
cupy.cumsum |
Returns the cumulative sum of an array along a given axis. |
cupy.nansum |
Returns the sum of an array along given axes treating Not a Numbers (NaNs) as zero. |
cupy.nanprod |
Returns the product of an array along given axes treating Not a Numbers (NaNs) as one. |
cupy.diff |
Calculate the n-th discrete difference along the given axis. |
Exponents and logarithms¶
cupy.exp |
Elementwise exponential function. |
cupy.expm1 |
Computes exp(x) - 1 elementwise. |
cupy.exp2 |
Elementwise exponentiation with base 2. |
cupy.log |
Elementwise natural logarithm function. |
cupy.log10 |
Elementwise common logarithm function. |
cupy.log2 |
Elementwise binary logarithm function. |
cupy.log1p |
Computes log(1 + x) elementwise. |
cupy.logaddexp |
Computes log(exp(x1) + exp(x2)) elementwise. |
cupy.logaddexp2 |
Computes log2(exp2(x1) + exp2(x2)) elementwise. |
Other special functions¶
cupy.i0 |
Modified Bessel function of the first kind, order 0. |
cupy.sinc |
Elementwise sinc function. |
Floating point routines¶
cupy.signbit |
Tests elementwise if the sign bit is set. |
cupy.copysign |
Returns the first argument with the sign bit of the second elementwise. |
cupy.frexp |
Decomposes each element to mantissa and two’s exponent. |
cupy.ldexp |
Computes x1 * 2 ** x2 elementwise. |
cupy.nextafter |
Computes the nearest neighbor float values towards the second argument. |
Arithmetic operations¶
cupy.add |
Adds two arrays elementwise. |
cupy.reciprocal |
Computes 1 / x elementwise. |
cupy.negative |
Takes numerical negative elementwise. |
cupy.multiply |
Multiplies two arrays elementwise. |
cupy.divide |
Elementwise true division. |
cupy.power |
Computes x1 ** x2 elementwise. |
cupy.subtract |
Subtracts arguments elementwise. |
cupy.true_divide |
Elementwise true division. |
cupy.floor_divide |
Elementwise floor division. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.mod |
Computes the remainder of Python division elementwise. |
cupy.modf |
Extracts the fractional and integral parts of an array elementwise. |
cupy.remainder |
Computes the remainder of Python division elementwise. |
cupy.divmod |
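The difference between the C-style remainder (fmod) and the Python-style remainder (mod, remainder) shows up only for operands of mixed sign. A short sketch, assuming CuPy with a usable GPU and falling back to NumPy otherwise (the fallback is an assumption for portability):

```python
try:
    import cupy as xp
    xp.zeros(1)  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # assumption: NumPy fallback

x = xp.array([-7.0, 7.0])
y = xp.array([3.0, -3.0])

c_style = xp.fmod(x, y)   # result takes the sign of the dividend x
py_style = xp.mod(x, y)   # result takes the sign of the divisor y

# modf splits each element into fractional and integral parts.
frac, whole = xp.modf(xp.array([2.5, -2.5]))

print(c_style, py_style, frac, whole)
```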
Handling complex numbers¶
cupy.angle |
Returns the angle of the complex argument. |
cupy.real |
Returns the real part of the elements of the array. |
cupy.imag |
Returns the imaginary part of the elements of the array. |
cupy.conj |
Returns the complex conjugate, element-wise. |
Miscellaneous¶
cupy.clip |
Clips the values of an array to a given interval. |
cupy.sqrt |
Elementwise square root function. |
cupy.cbrt |
Elementwise cube root function. |
cupy.square |
Elementwise square function. |
cupy.absolute |
Elementwise absolute value function. |
cupy.sign |
Elementwise sign function. |
cupy.maximum |
Takes the maximum of two arrays elementwise. |
cupy.minimum |
Takes the minimum of two arrays elementwise. |
cupy.fmax |
Takes the maximum of two arrays elementwise. |
cupy.fmin |
Takes the minimum of two arrays elementwise. |
cupy.nan_to_num |
Elementwise nan_to_num function. |
cupy.blackman |
Returns the Blackman window. |
cupy.hamming |
Returns the Hamming window. |
cupy.hanning |
Returns the Hanning window. |
Random Sampling (cupy.random)¶
Differences between cupy.random and numpy.random:
- CuPy provides Legacy Random Generation API (see also: NumPy 1.16 Reference). The new random generator API (numpy.random.Generator class) introduced in NumPy 1.17 has not been implemented yet.
- Most functions under cupy.random support the dtype option, which does not exist in the corresponding NumPy APIs. This option enables generation of float32 values directly without any space overhead.
- CuPy does not guarantee that the same number generator is used across major versions. This means that numbers generated by cupy.random in a new major version may not be the same as in the previous one, even if the same seed and distribution are used.
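The points above can be sketched as follows. The dtype keyword is used only on the CuPy path; the NumPy branch is an assumption added so the sketch runs without a GPU, and it shows the extra cast that the dtype option avoids.

```python
try:
    import cupy as xp
    xp.zeros(1)  # raises if no usable GPU is present
    on_gpu = True
except Exception:
    import numpy as xp  # assumption: NumPy fallback
    on_gpu = False

# Legacy seeding API: the same seed within the same library version
# reproduces the same numbers.
xp.random.seed(0)
a = xp.random.rand(4)
xp.random.seed(0)
b = xp.random.rand(4)

if on_gpu:
    # CuPy-only option: generate float32 values directly.
    x32 = xp.random.uniform(0.0, 1.0, size=4, dtype=xp.float32)
else:
    # NumPy legacy API: generate float64 values, then cast.
    x32 = xp.random.uniform(0.0, 1.0, size=4).astype(xp.float32)
```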
Simple random data¶
cupy.random.rand |
Returns an array of uniform random values over the interval [0, 1) . |
cupy.random.randn |
Returns an array of standard normal random values. |
cupy.random.randint |
Returns a scalar or an array of integer values over [low, high) . |
cupy.random.random_integers |
Return a scalar or an array of integer values over [low, high] |
cupy.random.random_sample |
Returns an array of random values over the interval [0, 1) . |
cupy.random.random |
Returns an array of random values over the interval [0, 1) . |
cupy.random.ranf |
Returns an array of random values over the interval [0, 1) . |
cupy.random.sample |
Returns an array of random values over the interval [0, 1) . |
cupy.random.choice |
Returns an array of random values from a given 1-D array. |
cupy.random.bytes |
Returns random bytes. |
Permutations¶
cupy.random.shuffle |
Shuffles an array. |
cupy.random.permutation |
Returns a permuted range or a permutation of an array. |
Distributions¶
cupy.random.beta |
Beta distribution. |
cupy.random.binomial |
Binomial distribution. |
cupy.random.chisquare |
Chi-square distribution. |
cupy.random.dirichlet |
Dirichlet distribution. |
cupy.random.exponential |
Exponential distribution. |
cupy.random.f |
F distribution. |
cupy.random.gamma |
Gamma distribution. |
cupy.random.geometric |
Geometric distribution. |
cupy.random.gumbel |
Returns an array of samples drawn from a Gumbel distribution. |
cupy.random.hypergeometric |
Hypergeometric distribution. |
cupy.random.laplace |
Laplace distribution. |
cupy.random.logistic |
Logistic distribution. |
cupy.random.lognormal |
Returns an array of samples drawn from a log normal distribution. |
cupy.random.logseries |
Log series distribution. |
cupy.random.multinomial |
Returns an array from multinomial distribution. |
cupy.random.multivariate_normal |
(experimental) Multivariate normal distribution. |
cupy.random.negative_binomial |
Negative binomial distribution. |
cupy.random.noncentral_chisquare |
Noncentral chi-square distribution. |
cupy.random.noncentral_f |
Noncentral F distribution. |
cupy.random.normal |
Returns an array of normally distributed samples. |
cupy.random.pareto |
Pareto II or Lomax distribution. |
cupy.random.poisson |
Poisson distribution. |
cupy.random.power |
Power distribution. |
cupy.random.rayleigh |
Rayleigh distribution. |
cupy.random.standard_cauchy |
Standard Cauchy distribution. |
cupy.random.standard_exponential |
Standard exponential distribution. |
cupy.random.standard_gamma |
Standard gamma distribution. |
cupy.random.standard_normal |
Returns an array of samples drawn from the standard normal distribution. |
cupy.random.standard_t |
Standard Student’s t distribution. |
cupy.random.triangular |
Triangular distribution. |
cupy.random.uniform |
Returns an array of uniformly-distributed samples over an interval. |
cupy.random.vonmises |
von Mises distribution. |
cupy.random.wald |
Wald distribution. |
cupy.random.weibull |
Weibull distribution. |
cupy.random.zipf |
Zipf distribution. |
Random generator¶
cupy.random.RandomState |
Portable container of a pseudo-random number generator. |
cupy.random.seed |
Resets the state of the random number generator with a seed. |
cupy.random.get_random_state |
Gets the state of the random number generator for the current device. |
cupy.random.set_random_state |
Sets the state of the random number generator for the current device. |
Note
CuPy does not provide cupy.random.get_state nor cupy.random.set_state at this time.
Use cupy.random.get_random_state() and cupy.random.set_random_state() instead.
Note that these functions use a cupy.random.RandomState instance to represent the internal state, which cannot be serialized.
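Because the state object cannot be serialized, one possible way to reproduce a stream is to install a freshly seeded RandomState rather than saving and reloading state. The sketch below assumes CuPy and a GPU are available and skips the GPU part otherwise; the seed value is arbitrary.

```python
try:
    import cupy
    cupy.zeros(1)  # raises if no usable GPU is present
    gpu_ok = True
except Exception:
    gpu_ok = False  # no CuPy or no GPU: the GPU part below is skipped

same_stream = None
if gpu_ok:
    # RandomState instance currently used for this device.
    saved = cupy.random.get_random_state()

    # Install a freshly seeded generator, draw, then repeat with the same seed.
    cupy.random.set_random_state(cupy.random.RandomState(seed=1234))
    first = cupy.random.rand(3)
    cupy.random.set_random_state(cupy.random.RandomState(seed=1234))
    second = cupy.random.rand(3)
    same_stream = bool((first == second).all())

    # Put the original generator back.
    cupy.random.set_random_state(saved)
```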
Sorting, Searching, and Counting¶
Sorting¶
cupy.sort |
Returns a sorted copy of an array with a stable sorting algorithm. |
cupy.lexsort |
Perform an indirect sort using an array of keys. |
cupy.argsort |
Returns the indices that would sort an array with a stable sorting. |
cupy.msort |
Returns a copy of an array sorted along the first axis. |
cupy.partition |
Returns a partitioned copy of an array. |
cupy.argpartition |
Returns the indices that would partially sort an array. |
Searching¶
cupy.argmax |
Returns the indices of the maximum along an axis. |
cupy.nanargmax |
Return the indices of the maximum values in the specified axis ignoring NaNs. |
cupy.argmin |
Returns the indices of the minimum along an axis. |
cupy.nanargmin |
Return the indices of the minimum values in the specified axis ignoring NaNs. |
cupy.nonzero |
Return the indices of the elements that are non-zero. |
cupy.flatnonzero |
Return indices that are non-zero in the flattened version of a. |
cupy.where |
Return elements, either from x or y, depending on condition. |
Counting¶
cupy.count_nonzero |
Counts the number of non-zero values in the array. |
Statistical Functions¶
Order statistics¶
cupy.amin |
Returns the minimum of an array or the minimum along an axis. |
cupy.amax |
Returns the maximum of an array or the maximum along an axis. |
cupy.nanmin |
Returns the minimum of an array along an axis ignoring NaN. |
cupy.nanmax |
Returns the maximum of an array along an axis ignoring NaN. |
cupy.percentile |
Computes the q-th percentile of the data along the specified axis. |
Means and variances¶
cupy.average |
Returns the weighted average along an axis. |
cupy.mean |
Returns the arithmetic mean along an axis. |
cupy.var |
Returns the variance along an axis. |
cupy.std |
Returns the standard deviation along an axis. |
cupy.nanmean |
Returns the arithmetic mean along an axis ignoring NaN values. |
cupy.nanvar |
Returns the variance along an axis ignoring NaN values. |
cupy.nanstd |
Returns the standard deviation along an axis ignoring NaN values. |
Histograms¶
cupy.histogram |
Computes the histogram of a set of data. |
cupy.bincount |
Count number of occurrences of each value in array of non-negative ints. |
Correlations¶
cupy.corrcoef |
Returns the Pearson product-moment correlation coefficients of an array. |
cupy.cov |
Returns the covariance matrix of an array. |
CuPy-specific Functions¶
CuPy-specific functions are placed under the cupyx namespace.
cupyx.rsqrt |
Returns the reciprocal square root. |
cupyx.scatter_add |
Adds given values to specified elements of an array. |
cupyx.scatter_max |
Stores a maximum value of elements specified by indices to an array. |
cupyx.scatter_min |
Stores a minimum value of elements specified by indices to an array. |
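A typical use of cupyx.scatter_add is accumulating into an array when the index list contains duplicates. The sketch below assumes CuPy and a GPU; when they are missing it uses numpy.add.at, which is the closest NumPy analogue and is an assumption added here for portability.

```python
import numpy as np

try:
    import cupy
    import cupyx
    cupy.zeros(1)  # raises if no usable GPU is present
    gpu_ok = True
except Exception:
    gpu_ok = False

idx_host = [0, 0, 2]  # index 0 appears twice

if gpu_ok:
    a = cupy.zeros(3, dtype=cupy.float32)
    # Every occurrence of an index contributes to the sum.
    cupyx.scatter_add(a, cupy.array(idx_host), 1.0)
    result = a.get().tolist()
else:
    a = np.zeros(3, dtype=np.float32)
    np.add.at(a, idx_host, 1.0)  # NumPy analogue (assumption)
    result = a.tolist()

print(result)
```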
SciPy-compatible Routines¶
The following pages describe SciPy-compatible routines. These functions cover a subset of SciPy routines.
Discrete Fourier transforms (scipy.fft)¶
Fast Fourier Transforms¶
cupyx.scipy.fft.fft |
Compute the one-dimensional FFT. |
cupyx.scipy.fft.ifft |
Compute the one-dimensional inverse FFT. |
cupyx.scipy.fft.fft2 |
Compute the two-dimensional FFT. |
cupyx.scipy.fft.ifft2 |
Compute the two-dimensional inverse FFT. |
cupyx.scipy.fft.fftn |
Compute the N-dimensional FFT. |
cupyx.scipy.fft.ifftn |
Compute the N-dimensional inverse FFT. |
cupyx.scipy.fft.rfft |
Compute the one-dimensional FFT for real input. |
cupyx.scipy.fft.irfft |
Compute the one-dimensional inverse FFT for real input. |
cupyx.scipy.fft.rfft2 |
Compute the two-dimensional FFT for real input. |
cupyx.scipy.fft.irfft2 |
Compute the two-dimensional inverse FFT for real input. |
cupyx.scipy.fft.rfftn |
Compute the N-dimensional FFT for real input. |
cupyx.scipy.fft.irfftn |
Compute the N-dimensional inverse FFT for real input. |
cupyx.scipy.fft.hfft |
Compute the FFT of a signal that has Hermitian symmetry. |
cupyx.scipy.fft.ihfft |
Compute the inverse FFT of a signal that has Hermitian symmetry. |
Code compatibility features¶
- The boolean switch cupy.fft.config.enable_nd_planning also affects the FFT functions in this module, see FFT Functions. Moreover, as with other FFT modules in CuPy, FFT functions in this module can take advantage of an existing cuFFT plan (returned by cupyx.scipy.fftpack.get_fft_plan()) when used as a context manager.
- Like in scipy.fft, all FFT functions in this module have an optional argument overwrite_x (default is False), which has the same semantics as in scipy.fft: when it is set to True, the input array x can (not will) be overwritten arbitrarily. This is not an in-place FFT; the user should always use the return value from the functions, e.g. x = cupyx.scipy.fft.fft(x, ..., overwrite_x=True, ...).
- The cupyx.scipy.fft module can also be used as a backend for scipy.fft, e.g. by installing it with scipy.fft.set_backend(cupyx.scipy.fft). This can allow scipy.fft to work with both numpy and cupy arrays.
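The backend mechanism and the overwrite_x idiom from the list above can be sketched together. This assumes CuPy, a GPU, and SciPy 1.4.0 or newer; if any of them is missing the GPU part is skipped. The array length and tolerances are arbitrary.

```python
try:
    import cupy
    import cupyx.scipy.fft
    import scipy.fft
    cupy.zeros(1)  # raises if no usable GPU is present
    gpu_ok = True
except Exception:
    gpu_ok = False  # missing CuPy, SciPy, or GPU: skip the GPU part

backend_ok = None
if gpu_ok:
    x = cupy.random.random(256).astype(cupy.complex64)
    reference = cupy.fft.fft(x)

    # scipy.fft dispatches to cupyx.scipy.fft inside the context.
    with scipy.fft.set_backend(cupyx.scipy.fft):
        y = scipy.fft.fft(x)

    # overwrite_x=True allows x to be clobbered, so rebind x to the result.
    x = cupyx.scipy.fft.fft(x, overwrite_x=True)

    backend_ok = (
        isinstance(y, cupy.ndarray)
        and bool(cupy.allclose(y, reference, rtol=1e-4, atol=1e-3))
        and bool(cupy.allclose(x, reference, rtol=1e-4, atol=1e-3))
    )
```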
Note
scipy.fft requires SciPy version 1.4.0 or newer.
Legacy Discrete Fourier transforms (scipy.fftpack)¶
Note
As of SciPy version 1.4.0, scipy.fft is recommended over scipy.fftpack. Consider using cupyx.scipy.fft instead.
Fast Fourier Transforms¶
cupyx.scipy.fftpack.fft |
Compute the one-dimensional FFT. |
cupyx.scipy.fftpack.ifft |
Compute the one-dimensional inverse FFT. |
cupyx.scipy.fftpack.fft2 |
Compute the two-dimensional FFT. |
cupyx.scipy.fftpack.ifft2 |
Compute the two-dimensional inverse FFT. |
cupyx.scipy.fftpack.fftn |
Compute the N-dimensional FFT. |
cupyx.scipy.fftpack.ifftn |
Compute the N-dimensional inverse FFT. |
cupyx.scipy.fftpack.rfft |
Compute the one-dimensional FFT for real input. |
cupyx.scipy.fftpack.irfft |
Compute the one-dimensional inverse FFT for real input. |
cupyx.scipy.fftpack.get_fft_plan |
Generate a CUDA FFT plan for transforming up to three axes. |
Code compatibility features¶
- The get_fft_plan function has no counterpart in scipy.fftpack. It returns a cuFFT plan that can be passed to the FFT functions in this module (using the argument plan) to accelerate the computation. The argument plan is currently experimental and the interface may change in a future version.
- The boolean switch cupy.fft.config.enable_nd_planning also affects the FFT functions in this module, see FFT Functions. This switch is ignored when planning manually using get_fft_plan.
- Like in scipy.fftpack, all FFT functions in this module have an optional argument overwrite_x (default is False), which has the same semantics as in scipy.fftpack: when it is set to True, the input array x can (not will) be destroyed and replaced by the output. For this reason, when an in-place FFT is desired, the user should always reassign the input in the following manner: x = cupyx.scipy.fftpack.fft(x, ..., overwrite_x=True, ...).
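Passing a precomputed plan through the plan argument, combined with the reassignment idiom for overwrite_x, might look like the sketch below. It assumes CuPy and a GPU are available and skips the GPU part otherwise. Since the plan argument is described above as experimental, treat the exact call as illustrative.

```python
try:
    import cupy
    import cupyx.scipy.fftpack
    cupy.zeros(1)  # raises if no usable GPU is present
    gpu_ok = True
except Exception:
    gpu_ok = False  # no CuPy or no GPU: the GPU part below is skipped

rel_err = None
if gpu_ok:
    x = cupy.random.random(1024).astype(cupy.complex64)
    expected = cupy.fft.fft(x)  # reference result, computed before x is reused

    # Build the plan once; it can be reused for arrays of the same shape/dtype.
    plan = cupyx.scipy.fftpack.get_fft_plan(x)

    # x may be destroyed by the call, so always rebind it to the return value.
    x = cupyx.scipy.fftpack.fft(x, plan=plan, overwrite_x=True)

    rel_err = float(cupy.abs(x - expected).max() / cupy.abs(expected).max())
```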
Multi-dimensional image processing¶
CuPy provides multi-dimensional image processing functions.
It supports a subset of the scipy.ndimage interface.
Interpolation¶
cupyx.scipy.ndimage.affine_transform |
Apply an affine transformation. |
cupyx.scipy.ndimage.convolve |
Multi-dimensional convolution. |
cupyx.scipy.ndimage.correlate |
Multi-dimensional correlate. |
cupyx.scipy.ndimage.map_coordinates |
Map the input array to new coordinates by interpolation. |
cupyx.scipy.ndimage.rotate |
Rotate an array. |
cupyx.scipy.ndimage.shift |
Shift an array. |
cupyx.scipy.ndimage.zoom |
Zoom an array. |
OpenCV mode¶
cupyx.scipy.ndimage supports an additional mode, opencv.
If it is given, the function behaves like cv2.warpAffine or cv2.resize.
Sparse matrices¶
CuPy supports sparse matrices using cuSPARSE. These matrices have the same interface as SciPy’s sparse matrices.
Conversion to/from SciPy sparse matrices¶
cupyx.scipy.sparse.*_matrix and scipy.sparse.*_matrix are not implicitly convertible to each other.
That means SciPy functions cannot take cupyx.scipy.sparse.*_matrix objects as inputs, and vice versa.
- To convert SciPy sparse matrices to CuPy, pass them to the constructor of the corresponding CuPy sparse matrix class.
- To convert CuPy sparse matrices to SciPy, use the get method of each CuPy sparse matrix class.
Note that converting between CuPy and SciPy incurs data transfer between the host (CPU) and the GPU device, which is costly in terms of performance.
Conversion to/from CuPy ndarrays¶
- To convert a CuPy ndarray to a CuPy sparse matrix, pass it to the constructor of the CuPy sparse matrix class.
- To convert a CuPy sparse matrix to a CuPy ndarray, use the toarray method of the CuPy sparse matrix instance (e.g., cupyx.scipy.sparse.csr_matrix.toarray()).
Converting between CuPy ndarray and CuPy sparse matrices does not incur data transfer between the host and the GPU; the data is copied inside the GPU device.
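The four conversions described in this section can be sketched in one round trip. This assumes CuPy, SciPy, and a GPU are available; the GPU part is skipped otherwise. The matrix values are arbitrary.

```python
import numpy as np

try:
    import cupy
    import cupyx.scipy.sparse
    import scipy.sparse
    cupy.zeros(1)  # raises if no usable GPU is present
    gpu_ok = True
except Exception:
    gpu_ok = False  # missing CuPy, SciPy, or GPU: skip the GPU part

round_trip_ok = None
if gpu_ok:
    dense_host = np.array([[1.0, 0.0, 2.0],
                           [0.0, 0.0, 3.0]], dtype=np.float32)
    host_csr = scipy.sparse.csr_matrix(dense_host)

    # SciPy -> CuPy: pass to the constructor (host-to-device transfer).
    dev_csr = cupyx.scipy.sparse.csr_matrix(host_csr)

    # CuPy -> SciPy: get() (device-to-host transfer).
    back = dev_csr.get()

    # CuPy sparse -> CuPy ndarray and back again, all on the device.
    dense_dev = dev_csr.toarray()
    dev_again = cupyx.scipy.sparse.csr_matrix(dense_dev)

    round_trip_ok = (
        np.array_equal(back.toarray(), dense_host)
        and bool(cupy.allclose(dev_again.toarray(), dense_dev))
    )
```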
Sparse matrix classes¶
cupyx.scipy.sparse.coo_matrix |
COOrdinate format sparse matrix. |
cupyx.scipy.sparse.csc_matrix |
Compressed Sparse Column matrix. |
cupyx.scipy.sparse.csr_matrix |
Compressed Sparse Row matrix. |
cupyx.scipy.sparse.dia_matrix |
Sparse matrix with DIAgonal storage. |
cupyx.scipy.sparse.spmatrix |
Base class of all sparse matrices. |
Functions¶
Building sparse matrices¶
cupyx.scipy.sparse.diags |
Construct a sparse matrix from diagonals. |
cupyx.scipy.sparse.eye |
Creates a sparse matrix with ones on diagonal. |
cupyx.scipy.sparse.identity |
Creates an identity matrix in sparse format. |
cupyx.scipy.sparse.spdiags |
Creates a sparse matrix from diagonals. |
cupyx.scipy.sparse.rand |
Generates a random sparse matrix. |
cupyx.scipy.sparse.random |
Generates a random sparse matrix. |
Identifying sparse matrices¶
cupyx.scipy.sparse.issparse |
Checks if a given matrix is a sparse matrix. |
cupyx.scipy.sparse.isspmatrix |
Checks if a given matrix is a sparse matrix. |
cupyx.scipy.sparse.isspmatrix_csc |
Checks if a given matrix is of CSC format. |
cupyx.scipy.sparse.isspmatrix_csr |
Checks if a given matrix is of CSR format. |
cupyx.scipy.sparse.isspmatrix_coo |
Checks if a given matrix is of COO format. |
cupyx.scipy.sparse.isspmatrix_dia |
Checks if a given matrix is of DIA format. |
Linear Algebra¶
cupyx.scipy.sparse.linalg.lsqr |
Solves linear system with QR decomposition. |
Special Functions¶
Bessel Functions¶
cupyx.scipy.special.j0 |
Bessel function of the first kind of order 0. |
cupyx.scipy.special.j1 |
Bessel function of the first kind of order 1. |
cupyx.scipy.special.y0 |
Bessel function of the second kind of order 0. |
cupyx.scipy.special.y1 |
Bessel function of the second kind of order 1. |
cupyx.scipy.special.i0 |
Modified Bessel function of order 0. |
cupyx.scipy.special.i1 |
Modified Bessel function of order 1. |
Raw Statistical Functions¶
cupyx.scipy.special.ndtr |
Cumulative distribution function of normal distribution. |
Error Function¶
cupyx.scipy.special.erf |
Error function. |
cupyx.scipy.special.erfc |
Complementary error function. |
cupyx.scipy.special.erfcx |
Scaled complementary error function. |
cupyx.scipy.special.erfinv |
Inverse function of error function. |
cupyx.scipy.special.erfcinv |
Inverse function of complementary error function. |
Other Special Functions¶
cupyx.scipy.special.zeta |
Hurwitz zeta function. |
NumPy-CuPy Generic Code Support¶
cupy.get_array_module |
Returns the array module for arguments. |
cupyx.scipy.get_array_module |
Returns the array module for arguments. |
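A common pattern is to look up the array module from the arguments so that one function body serves both NumPy and CuPy arrays. The function softplus below is a hypothetical example, and the stand-in for get_array_module is an assumption used only when CuPy cannot be imported.

```python
import numpy as np

try:
    import cupy
    get_array_module = cupy.get_array_module
except Exception:
    # Stand-in (assumption): without CuPy every array is a NumPy array.
    def get_array_module(*args):
        return np


def softplus(x):
    """Numerically stable log(1 + exp(x)) for NumPy or CuPy arrays."""
    xp = get_array_module(x)  # numpy for host arrays, cupy for device arrays
    return xp.maximum(x, 0) + xp.log1p(xp.exp(-abs(x)))


y = softplus(np.array([-1.0, 0.0, 1.0]))  # host input, so NumPy is used
print(y)
```

Passing a cupy.ndarray instead would run the same code on the GPU, with no change to the function body.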
Memory Management¶
CuPy uses a memory pool for memory allocations by default. The memory pool significantly improves performance by mitigating the overhead of memory allocation and CPU/GPU synchronization.
There are two different memory pools in CuPy:
- Device memory pool (GPU device memory), which is used for GPU memory allocations.
- Pinned memory pool (non-swappable CPU memory), which is used during CPU-to-GPU data transfer.
Attention
When you monitor the memory usage (e.g., using nvidia-smi for GPU memory or ps for CPU memory), you may notice that memory is not freed even after the array instance goes out of scope.
This is expected behavior, as the default memory pool “caches” the allocated memory blocks.
See Low-Level CUDA Support for the details of memory management APIs.
Memory Pool Operations¶
The memory pool instance provides statistics about memory allocation.
To access the default memory pool instance, use cupy.get_default_memory_pool()
and cupy.get_default_pinned_memory_pool()
.
You can also free all unused memory blocks held in the memory pool.
See the example code below for details:
import cupy
import numpy
mempool = cupy.get_default_memory_pool()
pinned_mempool = cupy.get_default_pinned_memory_pool()
# Create an array on CPU.
# NumPy allocates 400 bytes in CPU (not managed by CuPy memory pool).
a_cpu = numpy.ndarray(100, dtype=numpy.float32)
print(a_cpu.nbytes) # 400
# You can access statistics of these memory pools.
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 0
print(pinned_mempool.n_free_blocks()) # 0
# Transfer the array from CPU to GPU.
# This allocates 400 bytes from the device memory pool, and another 400
# bytes from the pinned memory pool. The allocated pinned memory will be
# released just after the transfer is complete. Note that the actual
# allocation size may be rounded to larger value than the requested size
# for performance.
a = cupy.array(a_cpu)
print(a.nbytes) # 400
print(mempool.used_bytes()) # 512
print(mempool.total_bytes()) # 512
print(pinned_mempool.n_free_blocks()) # 1
# When the array goes out of scope, the allocated device memory is released
# and kept in the pool for future reuse.
a = None # (or `del a`)
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 512
print(pinned_mempool.n_free_blocks()) # 1
# You can clear the memory pool by calling `free_all_blocks`.
mempool.free_all_blocks()
pinned_mempool.free_all_blocks()
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 0
print(pinned_mempool.n_free_blocks()) # 0
See cupy.cuda.MemoryPool
and cupy.cuda.PinnedMemoryPool
for details.
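The rounding mentioned in the example above (a 400-byte request reported as 512 bytes) can be sketched as simple arithmetic. The 512-byte unit below is an assumption inferred from that example output, and `rounded_allocation_size` is a hypothetical helper, not a CuPy function.

```python
def rounded_allocation_size(nbytes, unit=512):
    """Round nbytes up to the next multiple of unit (assumed to be 512 bytes)."""
    return ((nbytes + unit - 1) // unit) * unit


print(rounded_allocation_size(400))  # 512
print(rounded_allocation_size(512))  # 512
print(rounded_allocation_size(513))  # 1024
```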
Limiting GPU Memory Usage¶
You can hard-limit the amount of GPU memory that can be allocated by setting the CUPY_GPU_MEMORY_LIMIT
environment variable (see Environment variables for details).
# Set the hard-limit to 1 GiB:
# $ export CUPY_GPU_MEMORY_LIMIT="1073741824"
# You can also specify the limit as a fraction of the total amount of memory
# on the GPU. If you have a GPU with 2 GiB memory, the following is
# equivalent to the above configuration.
# $ export CUPY_GPU_MEMORY_LIMIT="50%"
import cupy
print(cupy.get_default_memory_pool().get_limit()) # 1073741824
You can also set the limit (or override the value specified via the environment variable) using cupy.cuda.MemoryPool.set_limit()
.
In this way, you can use a different limit for each GPU device.
import cupy
mempool = cupy.get_default_memory_pool()
with cupy.cuda.Device(0):
    mempool.set_limit(size=1024**3)  # 1 GiB
with cupy.cuda.Device(1):
    mempool.set_limit(size=2*1024**3)  # 2 GiB
Note
CUDA allocates some GPU memory outside of the memory pool (such as the CUDA context, library handles, etc.). Depending on the usage, such memory may take one to a few hundred MiB. This memory is not counted toward the limit.
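As an illustration of the two value formats described above (absolute bytes or a percentage of total GPU memory), the following sketch shows one way such a value could be interpreted. `resolve_memory_limit` is a hypothetical helper written for this example; it is not CuPy's own parsing code.

```python
def resolve_memory_limit(value, total_bytes):
    """Interpret a limit given as absolute bytes ("1073741824") or percent ("50%")."""
    value = value.strip()
    if value.endswith("%"):
        return int(total_bytes * float(value[:-1]) / 100)
    return int(value)


two_gib = 2 * 1024**3
print(resolve_memory_limit("1073741824", two_gib))  # 1073741824
print(resolve_memory_limit("50%", two_gib))         # 1073741824
```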
Changing Memory Pool¶
You can use your own memory allocator instead of the default memory pool by passing the memory allocation function to cupy.cuda.set_allocator()
/ cupy.cuda.set_pinned_memory_allocator()
.
The memory allocator function should take 1 argument (the requested size in bytes) and return cupy.cuda.MemoryPointer
/ cupy.cuda.PinnedMemoryPointer
.
You can even disable the default memory pool with the code below. Be sure to do this before any other CuPy operations.
import cupy
# Disable memory pool for device memory (GPU)
cupy.cuda.set_allocator(None)
# Disable memory pool for pinned memory (CPU).
cupy.cuda.set_pinned_memory_allocator(None)
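The allocator interface described above (one argument, the requested size in bytes, returning a pointer object) can be sketched with a small wrapper that records each request. `make_logging_allocator` and `fake_alloc` are hypothetical names; the stand-in allocator exists only so that the sketch runs without a GPU.

```python
def make_logging_allocator(base_alloc, log):
    """Wrap an allocator so that each requested size is appended to log."""
    def allocator(size):
        log.append(size)
        return base_alloc(size)
    return allocator


# Stand-in for a real allocator. With CuPy, one would instead pass something
# like cupy.get_default_memory_pool().malloc and then register the wrapper
# with cupy.cuda.set_allocator(allocator).
def fake_alloc(size):
    return ("pointer", size)


requests = []
allocator = make_logging_allocator(fake_alloc, requests)
allocator(400)
print(requests)  # [400]
```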
Low-Level CUDA Support¶
Device management¶
cupy.cuda.Device |
Object that represents a CUDA device. |
Memory management¶
cupy.get_default_memory_pool |
Returns CuPy default memory pool for GPU memory. |
cupy.get_default_pinned_memory_pool |
Returns CuPy default memory pool for pinned memory. |
cupy.cuda.Memory |
Memory allocation on a CUDA device. |
cupy.cuda.UnownedMemory |
CUDA memory that is not owned by CuPy. |
cupy.cuda.PinnedMemory |
Pinned memory allocation on host. |
cupy.cuda.MemoryPointer |
Pointer to a point on a device memory. |
cupy.cuda.PinnedMemoryPointer |
Pointer of a pinned memory. |
cupy.cuda.alloc |
Calls the current allocator. |
cupy.cuda.alloc_pinned_memory |
Calls the current allocator. |
cupy.cuda.get_allocator |
Returns the current allocator for GPU memory. |
cupy.cuda.set_allocator |
Sets the current allocator for GPU memory. |
cupy.cuda.using_allocator |
Sets a thread-local allocator for GPU memory inside |
cupy.cuda.set_pinned_memory_allocator |
Sets the current allocator for the pinned memory. |
cupy.cuda.MemoryPool |
Memory pool for all GPU devices on the host. |
cupy.cuda.PinnedMemoryPool |
Memory pool for pinned memory on the host. |
Memory hook¶
cupy.cuda.MemoryHook |
Base class of hooks for Memory allocations. |
cupy.cuda.memory_hooks.DebugPrintHook |
Memory hook that prints debug information. |
cupy.cuda.memory_hooks.LineProfileHook |
Code line CuPy memory profiler. |
Streams and events¶
cupy.cuda.Stream |
CUDA stream. |
cupy.cuda.get_current_stream |
Gets current CUDA stream. |
cupy.cuda.Event |
CUDA event, a synchronization point of CUDA streams. |
cupy.cuda.get_elapsed_time |
Gets the elapsed time between two events. |
Texture memory¶
cupy.cuda.texture.ChannelFormatDescriptor |
A class that holds the channel format description. |
cupy.cuda.texture.CUDAarray |
Allocate a CUDA array (cudaArray_t) that can be used as texture memory. |
cupy.cuda.texture.ResourceDescriptor |
A class that holds the resource description. |
cupy.cuda.texture.TextureDescriptor |
A class that holds the texture description. |
cupy.cuda.texture.TextureObject |
A class that holds a texture object. |
cupy.cuda.texture.TextureReference |
A class that holds a texture reference. |
Profiler¶
cupy.cuda.profile |
Enable CUDA profiling during with statement. |
cupy.cuda.profiler.initialize |
Initialize the CUDA profiler. |
cupy.cuda.profiler.start |
Enable profiling. |
cupy.cuda.profiler.stop |
Disable profiling. |
cupy.cuda.nvtx.Mark |
Marks an instantaneous event (marker) in the application. |
cupy.cuda.nvtx.MarkC |
Marks an instantaneous event (marker) in the application. |
cupy.cuda.nvtx.RangePush |
Starts a nested range. |
cupy.cuda.nvtx.RangePushC |
Starts a nested range. |
cupy.cuda.nvtx.RangePop |
Ends a nested range. |
NCCL¶
cupy.cuda.nccl.NcclCommunicator |
Initialize an NCCL communicator for one device controlled by one process. |
cupy.cuda.nccl.get_build_version |
|
cupy.cuda.nccl.get_version |
Returns the runtime version of NCCL. |
cupy.cuda.nccl.get_unique_id |
|
cupy.cuda.nccl.groupStart |
Start a group of NCCL calls. |
cupy.cuda.nccl.groupEnd |
End a group of NCCL calls. |
Runtime API¶
CuPy wraps CUDA Runtime APIs to provide the native CUDA operations. Please check the Original CUDA Runtime API document to use these functions.
Kernel binary memoization¶
cupy.memoize |
Makes a function memoizing the result for each argument and device. |
cupy.clear_memo |
Clears the memoized results for all functions decorated by memoize. |
Custom kernels¶
cupy.ElementwiseKernel |
User-defined elementwise kernel. |
cupy.ReductionKernel |
User-defined reduction kernel. |
cupy.RawKernel |
User-defined custom kernel. |
cupy.RawModule |
User-defined custom module. |
cupy.fuse |
Decorator that fuses a function. |
Interoperability¶
CuPy can also be used in conjunction with other frameworks.
NumPy¶
cupy.ndarray
implements __array_ufunc__
interface (see NEP 13 — A Mechanism for Overriding Ufuncs for details).
This enables NumPy ufuncs to be directly operated on CuPy arrays.
__array_ufunc__
feature requires NumPy 1.13 or later.
import cupy
import numpy
arr = cupy.random.randn(1, 2, 3, 4).astype(cupy.float32)
result = numpy.sum(arr)
print(type(result)) # => <class 'cupy.core.core.ndarray'>
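To illustrate the dispatch mechanism itself, here is a minimal sketch of a class that implements `__array_ufunc__` using NumPy alone. `Wrapped` is a toy class invented for this example and is far simpler than what `cupy.ndarray` does.

```python
import numpy as np


class Wrapped:
    """Toy container that intercepts NumPy ufunc calls (hypothetical)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        # Unwrap our own type, run the ufunc, and wrap the result again.
        raw = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*raw, **kwargs))


result = np.add(Wrapped([1, 2, 3]), 10)
print(type(result).__name__, result.data)  # Wrapped [11 12 13]
```

Because NumPy hands control to the operand's `__array_ufunc__`, the result keeps the operand's type instead of being converted to `numpy.ndarray`.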
cupy.ndarray
also implements __array_function__
interface (see NEP 18 — A dispatch mechanism for NumPy’s high level array functions for details).
This enables code using NumPy to be directly operated on CuPy arrays.
__array_function__
feature requires NumPy 1.16 or later; note that this is currently defined as an experimental feature of NumPy and you need to specify the environment variable (NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1
) to enable it.
Numba¶
Numba is a Python JIT compiler with NumPy support.
cupy.ndarray
implements __cuda_array_interface__
, which is the CUDA array interchange interface compatible with Numba v0.39.0 or later (see CUDA Array Interface for details).
It means you can pass CuPy arrays to kernels JITed with Numba.
The following is a simple example code borrowed from numba/numba#2860:
import cupy
from numba import cuda
@cuda.jit
def add(x, y, out):
    start = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(start, x.shape[0], stride):
        out[i] = x[i] + y[i]
a = cupy.arange(10)
b = a * 2
out = cupy.zeros_like(a)
print(out) # => [0 0 0 0 0 0 0 0 0 0]
add[1, 32](a, b, out)
print(out) # => [ 0 3 6 9 12 15 18 21 24 27]
In addition, cupy.asarray()
supports zero-copy conversion from Numba CUDA array to CuPy array.
import numpy
import numba
import cupy
x = numpy.arange(10) # type: numpy.ndarray
x_numba = numba.cuda.to_device(x) # type: numba.cuda.cudadrv.devicearray.DeviceNDArray
x_cupy = cupy.asarray(x_numba) # type: cupy.ndarray
mpi4py¶
MPI for Python (mpi4py) is a Python wrapper for the Message Passing Interface (MPI) libraries.
MPI is the most widely used standard for high-performance inter-process communications. Recently several MPI vendors, including Open MPI and MVAPICH, have extended their support beyond the v3.1 standard to enable “CUDA-awareness”; that is, passing CUDA device pointers directly to MPI calls to avoid explicit data movement between the host and the device.
With the aforementioned __cuda_array_interface__
standard implemented in CuPy, mpi4py now provides (experimental) support for passing CuPy arrays to MPI calls, provided that mpi4py is built against a CUDA-aware MPI implementation. The following is a simple example borrowed from the mpi4py Tutorial:
# To run this script with N MPI processes, do
# mpiexec -n N python this_script.py
import cupy
from mpi4py import MPI
comm = MPI.COMM_WORLD
size = comm.Get_size()
# Allreduce
sendbuf = cupy.arange(10, dtype='i')
recvbuf = cupy.empty_like(sendbuf)
comm.Allreduce(sendbuf, recvbuf)
assert cupy.allclose(recvbuf, sendbuf*size)
This new feature will be officially released in mpi4py 3.1.0. To try it out, please build mpi4py from source for the time being. See the mpi4py website for more information.
DLPack¶
DLPack is a specification of tensor structure to share tensors among frameworks.
CuPy supports importing from and exporting to DLPack data structure (cupy.fromDlpack()
and cupy.ndarray.toDlpack()
).
cupy.fromDlpack |
Zero-copy conversion from a DLPack tensor to a ndarray . |
Here is a simple example:
import cupy
# Create a CuPy array.
cx1 = cupy.random.randn(1, 2, 3, 4).astype(cupy.float32)
# Convert it into a DLPack tensor.
dx = cx1.toDlpack()
# Convert it back to a CuPy array.
cx2 = cupy.fromDlpack(dx)
Here is an example of converting a PyTorch tensor into cupy.ndarray
.
import cupy
import torch
from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack
# Create a PyTorch tensor.
tx1 = torch.randn(1, 2, 3, 4).cuda()
# Convert it into a DLPack tensor.
dx = to_dlpack(tx1)
# Convert it into a CuPy array.
cx = cupy.fromDlpack(dx)
# Convert it back to a PyTorch tensor.
tx2 = from_dlpack(cx.toDlpack())
Testing Modules¶
CuPy offers testing utilities to support unit testing.
They are under namespace cupy.testing
.
Standard Assertions¶
The assertions have the same names as their NumPy counterparts.
The difference from NumPy is that they can accept both numpy.ndarray
and cupy.ndarray
.
cupy.testing.assert_allclose |
Raises an AssertionError if objects are not equal up to desired tolerance. |
cupy.testing.assert_array_almost_equal |
Raises an AssertionError if objects are not equal up to desired precision. |
cupy.testing.assert_array_almost_equal_nulp |
Compare two arrays relatively to their spacing. |
cupy.testing.assert_array_max_ulp |
Check that all items of arrays differ in at most N Units in the Last Place. |
cupy.testing.assert_array_equal |
Raises an AssertionError if two array_like objects are not equal. |
cupy.testing.assert_array_list_equal |
Compares lists of arrays pairwise with assert_array_equal . |
cupy.testing.assert_array_less |
Raises an AssertionError if array_like objects are not ordered by less than. |
NumPy-CuPy Consistency Check¶
The following decorators test the consistency between CuPy functions and their NumPy counterparts.
cupy.testing.numpy_cupy_allclose |
Decorator that checks NumPy results and CuPy ones are close. |
cupy.testing.numpy_cupy_array_almost_equal |
Decorator that checks NumPy results and CuPy ones are almost equal. |
cupy.testing.numpy_cupy_array_almost_equal_nulp |
Decorator that checks results of NumPy and CuPy are equal w.r.t. |
cupy.testing.numpy_cupy_array_max_ulp |
Decorator that checks results of NumPy and CuPy ones are equal w.r.t. |
cupy.testing.numpy_cupy_array_equal |
Decorator that checks NumPy results and CuPy ones are equal. |
cupy.testing.numpy_cupy_array_list_equal |
Decorator that checks the resulting lists of NumPy and CuPy’s one are equal. |
cupy.testing.numpy_cupy_array_less |
Decorator that checks the CuPy result is less than NumPy result. |
cupy.testing.numpy_cupy_raises |
Decorator that checks the NumPy and CuPy throw same errors. |
Parameterized dtype Test¶
The following decorators offer a standard way to parameterize tests over a single dtype or a combination of dtypes.
cupy.testing.for_dtypes |
Decorator for parameterized dtype test. |
cupy.testing.for_all_dtypes |
Decorator that checks the fixture with all dtypes. |
cupy.testing.for_float_dtypes |
Decorator that checks the fixture with float dtypes. |
cupy.testing.for_signed_dtypes |
Decorator that checks the fixture with signed dtypes. |
cupy.testing.for_unsigned_dtypes |
Decorator that checks the fixture with unsigned dtypes. |
cupy.testing.for_int_dtypes |
Decorator that checks the fixture with integer and optionally bool dtypes. |
cupy.testing.for_complex_dtypes |
Decorator that checks the fixture with complex dtypes. |
cupy.testing.for_dtypes_combination |
Decorator that checks the fixture with a product set of dtypes. |
cupy.testing.for_all_dtypes_combination |
Decorator that checks the fixture with a product set of all dtypes. |
cupy.testing.for_signed_dtypes_combination |
Decorator for parameterized test w.r.t. |
cupy.testing.for_unsigned_dtypes_combination |
Decorator for parameterized test w.r.t. |
cupy.testing.for_int_dtypes_combination |
Decorator for parameterized test w.r.t. |
Parameterized order Test¶
The following decorators offer the standard way to parameterize tests with orders.
cupy.testing.for_orders |
Decorator to parameterize tests with order. |
cupy.testing.for_CF_orders |
Decorator that checks the fixture with orders ‘C’ and ‘F’. |
Profiling¶
time range¶
cupy.prof.TimeRangeDecorator |
Decorator to mark function calls with range in NVIDIA profiler |
cupy.prof.time_range |
A context manager to describe the enclosed block as a nested range |
Environment variables¶
Here are the environment variables CuPy uses.
CUDA_PATH |
Path to the directory containing CUDA.
The parent of the directory containing nvcc is
used as default.
When nvcc is not found, /usr/local/cuda is
used.
See Working with Custom CUDA Installation for details. |
CUPY_CACHE_DIR |
Path to the directory to store kernel cache.
${HOME}/.cupy/kernel_cache is used by default.
See Overview for details. |
CUPY_CACHE_SAVE_CUDA_SOURCE |
If set to 1, CUDA source file will be saved along with compiled binary in the cache directory for debug purpose. It is disabled by default. Note: source file will not be saved if the compiled binary is already stored in the cache. |
CUPY_DUMP_CUDA_SOURCE_ON_ERROR |
If set to 1, when CUDA kernel compilation fails, CuPy dumps CUDA kernel code to standard error. It is disabled by default. |
CUPY_CUDA_COMPILE_WITH_DEBUG |
If set to 1, CUDA kernel will be compiled with
debug information (--device-debug and
--generate-line-info ).
It is disabled by default. |
CUPY_GPU_MEMORY_LIMIT |
The amount of memory that can be allocated for
each device.
The value can be specified in absolute bytes or
fraction (e.g., "90%" ) of the total memory of
each GPU.
See Memory Management for details.
0 (unlimited) is used by default. |
CUPY_SEED |
Set the seed for random number generators. For
historical reasons CHAINER_SEED is used if
CUPY_SEED is unspecified. |
CUPY_EXPERIMENTAL_SLICE_COPY |
If set to 1, the following syntax is enabled:
cupy_ndarray[:] = numpy_ndarray . |
Moreover, as in any CUDA program, all of the CUDA environment variables listed in the CUDA Toolkit Documentation will also be honored.
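As a small illustration of the documented default for CUPY_CACHE_DIR, the sketch below resolves the cache directory the way the table describes it: use the variable if it is set, otherwise fall back to the directory under the home directory. `kernel_cache_dir` is a hypothetical helper and does not reproduce CuPy's internal code.

```python
import os


def kernel_cache_dir(environ=os.environ):
    """Return CUPY_CACHE_DIR if set, else the documented default location."""
    default = os.path.join(os.path.expanduser("~"), ".cupy", "kernel_cache")
    return environ.get("CUPY_CACHE_DIR", default)


print(kernel_cache_dir({"CUPY_CACHE_DIR": "/tmp/cupy_cache"}))  # /tmp/cupy_cache
```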
For installation¶
These environment variables are used during installation (building CuPy from source).
CUDA_PATH |
See the description above. |
CUTENSOR_PATH |
Path to the cuTENSOR root directory that contains lib and
include directories. (experimental) |
NVCC |
Define the compiler to use when compiling CUDA files. |
CUPY_PYTHON_350_FORCE |
Enforce CuPy to be installed against Python 3.5.0 (not recommended). |
CUPY_INSTALL_USE_HIP |
For building the ROCm support, see Install CuPy from Source for further detail. |
CUPY_NVCC_GENERATE_CODE |
To build CuPy for a particular CUDA architecture. For example,
CUPY_NVCC_GENERATE_CODE=compute_60,sm_60 . When this is not
set, the default is to support all architectures. |
Difference between CuPy and NumPy¶
The interface of CuPy is designed to obey that of NumPy. However, there are some differences.
Cast behavior from float to integer¶
Some casts from float to integer are not defined in the C++ specification. Casting a negative float to an unsigned integer and casting infinity to an integer are two such examples. The behavior of NumPy depends on your CPU architecture. This is the result on an Intel CPU:
>>> np.array([-1], dtype=np.float32).astype(np.uint32)
array([4294967295], dtype=uint32)
>>> cupy.array([-1], dtype=np.float32).astype(np.uint32)
array([0], dtype=uint32)
>>> np.array([float('inf')], dtype=np.float32).astype(np.int32)
array([-2147483648], dtype=int32)
>>> cupy.array([float('inf')], dtype=np.float32).astype(np.int32)
array([2147483647], dtype=int32)
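When portable results matter, one option is to clip values into the target range before casting so that the undefined cases never occur. The sketch below uses NumPy only; `saturating_cast` is a hypothetical helper, it maps NaN to 0 by choice, and it is limited to integer types of at most 32 bits because float64 represents their bounds exactly.

```python
import numpy as np


def saturating_cast(x, dtype):
    """Cast floats to an integer dtype (32 bits or fewer), clipping to its range."""
    info = np.iinfo(dtype)
    y = np.asarray(x, dtype=np.float64)
    y = np.where(np.isnan(y), 0.0, y)  # NaN has no integer value; use 0 here
    return np.clip(y, info.min, info.max).astype(dtype)


x = np.array([-1.0, np.inf, 3.7], dtype=np.float32)
print(saturating_cast(x, np.uint32).tolist())  # [0, 4294967295, 3]
```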
Random methods support dtype argument¶
NumPy’s random value generator does not support a dtype argument and instead always returns a float64
value.
We support the option in CuPy because cuRAND, which is used in CuPy, supports both float32
and float64
.
>>> np.random.randn(dtype=np.float32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: randn() got an unexpected keyword argument 'dtype'
>>> cupy.random.randn(dtype=np.float32) # doctest: +SKIP
array(0.10689262300729752, dtype=float32)
Out-of-bounds indices¶
By default, CuPy handles out-of-bounds indices differently from NumPy when using integer array indexing. NumPy raises an error, whereas CuPy wraps the indices around.
>>> x = np.array([0, 1, 2])
>>> x[[1, 3]] = 10
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 1 with size 3
>>> x = cupy.array([0, 1, 2])
>>> x[[1, 3]] = 10
>>> x
array([10, 10, 2])
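The wrap-around result shown for CuPy can be reproduced on the CPU with `numpy.put` and `mode="wrap"`, which maps index 3 to `3 % 3 == 0`. This is only an illustration of wrap-around semantics; it assumes nothing about how CuPy implements it internally.

```python
import numpy as np

x = np.array([0, 1, 2])
np.put(x, [1, 3], 10, mode="wrap")  # index 3 wraps around to index 0
print(x.tolist())  # [10, 10, 2]
```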
Duplicate values in indices¶
CuPy’s __setitem__
behaves differently from NumPy when integer arrays
reference the same location multiple times.
In that case, the value that is actually stored is undefined.
Here is an example with CuPy.
>>> a = cupy.zeros((2,))
>>> i = cupy.arange(10000) % 2
>>> v = cupy.arange(10000).astype(np.float32)
>>> a[i] = v
>>> a # doctest: +SKIP
array([ 9150., 9151.])
NumPy stores the value corresponding to the last element among elements referencing duplicate locations.
>>> a_cpu = np.zeros((2,))
>>> i_cpu = np.arange(10000) % 2
>>> v_cpu = np.arange(10000).astype(np.float32)
>>> a_cpu[i_cpu] = v_cpu
>>> a_cpu
array([9998., 9999.])
Zero-dimensional array¶
Reduction methods¶
NumPy’s reduction functions (e.g. numpy.sum()
) return scalar values (e.g. numpy.float32
).
However, the CuPy counterparts return zero-dimensional cupy.ndarray
s.
That is because CuPy scalar values (e.g. cupy.float32
) are aliases of NumPy scalar values and are allocated in CPU memory.
Returning these types would require synchronization between the GPU and the CPU.
If you want to use scalar values, cast the returned arrays explicitly.
>>> type(np.sum(np.arange(3))) == np.int64
True
>>> type(cupy.sum(cupy.arange(3))) == cupy.core.core.ndarray
True
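A minimal sketch of the explicit cast mentioned above: the `item()` method converts a NumPy scalar or a zero-dimensional array into a Python scalar. The example runs on NumPy; `cupy.ndarray` provides `item()` as well, where the call copies the value to the host. `to_python_scalar` is an illustrative name.

```python
import numpy as np


def to_python_scalar(value):
    """Convert a NumPy scalar or a zero-dimensional array to a Python scalar."""
    return value.item()


total = to_python_scalar(np.sum(np.arange(3)))
print(total, type(total).__name__)  # 3 int
```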
Type promotion¶
CuPy automatically promotes dtypes of cupy.ndarray
s in a function with two or more operands; the result dtype is determined by the dtypes of the inputs.
This differs from NumPy’s type promotion rule when the operands contain zero-dimensional arrays.
Zero-dimensional numpy.ndarray
s are treated as if they were scalar values when they appear as operands of a NumPy function.
This may affect the dtype of the output, depending on the values of the “scalar” inputs.
>>> (np.array(3, dtype=np.int32) * np.array([1., 2.], dtype=np.float32)).dtype
dtype('float32')
>>> (np.array(300000, dtype=np.int32) * np.array([1., 2.], dtype=np.float32)).dtype
dtype('float64')
>>> (cupy.array(3, dtype=np.int32) * cupy.array([1., 2.], dtype=np.float32)).dtype
dtype('float64')
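One way to avoid depending on either promotion rule is to cast the zero-dimensional operand explicitly before the operation. The sketch below assumes the goal is a float32 result and uses NumPy; the variable names are illustrative.

```python
import numpy as np

scale = np.array(3, dtype=np.int32)               # zero-dimensional operand
values = np.array([1.0, 2.0], dtype=np.float32)

# Both operands are float32 after the cast, so no promotion is involved.
result = scale.astype(values.dtype) * values
print(result.dtype)  # float32
```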
Data types¶
CuPy arrays cannot have non-numeric data types such as strings or objects. See Overview for details.
Universal Functions only work with CuPy array or scalar¶
Unlike NumPy, universal functions in CuPy only work with CuPy arrays or scalars.
They do not accept other objects (e.g., lists or numpy.ndarray
).
>>> np.power([np.arange(5)], 2)
array([[ 0, 1, 4, 9, 16]])
>>> cupy.power([cupy.arange(5)], 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Unsupported type <class 'list'>
Random seed arrays are hashed to scalars¶
Like NumPy, CuPy’s RandomState objects accept seeds either as numbers or as full NumPy arrays.
>>> seed = np.array([1, 2, 3, 4, 5])
>>> rs = cupy.random.RandomState(seed=seed)
However, unlike NumPy, array seeds are hashed down to a single number and so may not communicate as much entropy to the underlying random number generator.
Comparison Table¶
Here is a list of NumPy / SciPy APIs and their corresponding CuPy implementations.
A - in the CuPy column denotes that a CuPy implementation is not provided yet.
We welcome contributions for these functions.
NumPy / CuPy APIs¶
Module-Level¶
Multi-Dimensional Array¶
Linear Algebra¶
Discrete Fourier Transform¶
Random Sampling¶
SciPy / CuPy APIs¶
Discrete Fourier Transform¶
Sparse Matrices¶
Sparse Linear Algebra¶
Advanced Linear Algebra¶
Multidimensional Image Processing¶
Special Functions¶
API Compatibility Policy¶
This document describes the design policy on the compatibility of CuPy APIs. The development team should follow this policy when deciding to add, extend, or change APIs and their behaviors.
This document is written for both users and developers. Users can decide how much their code should depend on CuPy’s implementation based on this document. Developers should read through this document before creating pull requests that change the interface. Note that this document may contain ambiguities on the level of supported compatibility.
Versioning and Backward Compatibilities¶
The updates of CuPy are classified into three levels: major, minor, and revision. These types have distinct levels of backward compatibilities.
- Major update contains disruptive changes that break the backward compatibility.
- Minor update contains addition and extension to the APIs keeping the supported backward compatibility.
- Revision update contains improvements on the API implementations without changing any API specifications.
Note that we do not support full backward compatibility, which is almost infeasible for Python-based APIs, since there is no way to completely hide the implementation details.
Processes to Break Backward Compatibilities¶
Deprecation, Dropping, and Its Preparation¶
Any API may be deprecated in a minor update. In such a case, a deprecation note is added to the API documentation, and the API implementation is changed to fire a deprecation warning (if possible). There should be another way to reimplement what was previously written with the deprecated API.
Any APIs may be marked as to be dropped in the future. In such a case, the dropping is stated in the documentation with the major version number on which the API is planned to be dropped, and the API implementation is changed to fire the future warning (if possible).
The actual dropping should be done through the following steps:
- Make the API deprecated. At this point, users should not need the deprecated API in their new application codes.
- After that, mark the API as to be dropped in the future. It must be done in the minor update different from that of the deprecation.
- At the major version announced in the above update, drop the API.
Consequently, it takes at least two minor versions to drop any APIs after the first deprecation.
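The first step of this process, firing a deprecation warning while keeping the old API working, can be sketched with the standard `warnings` module. `old_api` and `new_api` are hypothetical functions used only for illustration.

```python
import warnings


def new_api(x):
    """Replacement for old_api."""
    return x * 2


def old_api(x):
    """Deprecated entry point that still works but warns on every call."""
    warnings.warn(
        "old_api is deprecated; use new_api instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api(x)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    value = old_api(21)

print(value, caught[0].category.__name__)  # 42 DeprecationWarning
```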
API Changes and Its Preparation¶
Any APIs may be marked as to be changed in the future for changes without backward compatibility. In such a case, the change is stated in the documentation with the version number on which the API is planned to be changed, and the API implementation is changed to fire the future warning on the certain usages.
The actual change should be done in the following steps:
- Announce that the API will be changed in the future. At this point, the actual version of change need not be accurate.
- After the announcement, mark the API as to be changed in the future with version number of planned changes. At this point, users should not use the marked API in their new application codes.
- At the major update announced in the above update, change the API.
Supported Backward Compatibility¶
This section defines backward compatibilities that minor updates must maintain.
Documented Interface¶
CuPy has the official API documentation. Many applications can be written based on the documented features. We support backward compatibilities of documented features. In other words, codes only based on the documented features run correctly with minor/revision-updated versions.
Developers are encouraged to use apparent names for objects of implementation details. For example, attributes outside of the documented APIs should have one or more underscores at the prefix of their names.
Undocumented behaviors¶
Behaviors of CuPy implementation not stated in the documentation are undefined. Undocumented behaviors are not guaranteed to be stable between different minor/revision versions.
Minor update may contain changes to undocumented behaviors. For example, suppose an API X is added at the minor update. In the previous version, attempts to use X cause AttributeError. This behavior is not stated in the documentation, so this is undefined. Thus, adding the API X in minor version is permissible.
Revision update may also contain changes to undefined behaviors. Typical example is a bug fix. Another example is an improvement on implementation, which may change the internal object structures not shown in the documentation. As a consequence, even revision updates do not support compatibility of pickling, unless the full layout of pickled objects is clearly documented.
Documentation Error¶
Compatibility is basically determined based on the documentation, though the documentation sometimes contains errors. Assuming that the documentation always takes precedence over the implementation may make the APIs confusing. We therefore may fix documentation errors in any update, even if the fix breaks compatibility with respect to the documentation.
Note
Developers MUST NOT fix the documentation and implementation of the same functionality at the same time in revision updates as “bug fix”. Such a change completely breaks the backward compatibility. If you want to fix the bugs in both sides, first fix the documentation to fit it into the implementation, and start the API changing procedure described above.
Object Attributes and Properties¶
Object attributes and properties are sometimes replaced by each other in minor updates. This does not break user code, except for code that depends on how the attributes and properties are implemented.
Functions and Methods¶
Methods may be replaced by callable attributes in minor updates, keeping the compatibility of parameters and return values. This does not break user code, except for code that depends on how the methods and callable attributes are implemented.
Exceptions and Warnings¶
The specifications of raising exceptions are considered as a part of standard backward compatibilities. No exception is raised in the future versions with correct usages that the documentation allows, unless the API changing process is completed.
On the other hand, warnings may be added at any minor updates for any APIs. It means minor updates do not keep backward compatibility of warnings.
Installation Compatibility¶
The installation process is another concern of compatibilities. We support environmental compatibilities in the following ways.
- Any changes of dependent libraries that force modifications on the existing environments must be done in major updates.
Such changes include following cases:
- dropping supported versions of dependent libraries (e.g. dropping cuDNN v2)
- adding new mandatory dependencies (e.g. adding h5py to setup_requires)
- Supporting optional packages/libraries may be done in minor updates (e.g. supporting h5py in optional features).
Note
The installation compatibility does not guarantee that all the features of CuPy run correctly on supported environments. CuPy may contain bugs that only occur in certain environments. Such bugs should be fixed in some updates.
Contribution Guide¶
This is a guide for all contributions to CuPy. The development of CuPy is running on the official repository at GitHub. Anyone that wants to register an issue or to send a pull request should read through this document.
Classification of Contributions¶
There are several ways to contribute to the CuPy community:
- Registering an issue
- Sending a pull request (PR)
- Sending a question to CuPy User Group
- Open-sourcing an external example
- Writing a post about CuPy
This document mainly focuses on the first two (registering an issue and sending a pull request), though other contributions are also appreciated.
Development Cycle¶
This section explains the development process of CuPy. Before contributing to CuPy, it is strongly recommended to understand the development cycle.
Versioning¶
The versioning of CuPy follows PEP 440 and a part of Semantic versioning.
The version number consists of three or four parts: X.Y.Zw
where X
denotes the major version, Y
denotes the minor version, Z
denotes the revision number, and the optional w
denotes the pre-release suffix.
While the major, minor, and revision numbers follow the rule of semantic versioning, the pre-release suffix follows PEP 440 so that the version string is friendlier to the Python ecosystem.
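The `X.Y.Zw` scheme described above can be illustrated with a short parser. `parse_version` is a hypothetical helper, and the assumption that the suffix is one of `a`, `b`, or `rc` followed by a number is taken from the release names used in this guide (for example `X.0.0rc1`).

```python
import re

# major.minor.revision with an optional pre-release suffix such as a1, b1, rc1
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)((?:a|b|rc)\d+)?$")


def parse_version(text):
    """Split a version string into (major, minor, revision, suffix)."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("not an X.Y.Zw version string: %r" % text)
    major, minor, revision, suffix = match.groups()
    return int(major), int(minor), int(revision), suffix


print(parse_version("7.0.0rc1"))  # (7, 0, 0, 'rc1')
print(parse_version("7.1.0"))     # (7, 1, 0, None)
```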
Note that a major update basically does not contain compatibility-breaking changes from the last release candidate (RC). This is not a strict rule, though; if there is a critical API bug that we have to fix for the major version, we may include breaking changes in the major release.
As for the backward compatibility, see API Compatibility Policy.
Release Cycle¶
We develop two tracks of versions at the same time. The first one is the track of stable versions, which is a series of revision updates for the latest major version. The second one is the track of development versions, which is a series of pre-releases for the upcoming major version.
Consider that X.0.0 is the latest major version and Y.0.0, Z.0.0 are the succeeding major versions.
Then, the timeline of the updates is depicted by the following table.
Date | ver X | ver Y | ver Z |
---|---|---|---|
0 weeks | X.0.0rc1 | – | – |
4 weeks | X.0.0 | Y.0.0a1 | – |
8 weeks | X.1.0* | Y.0.0b1 | – |
12 weeks | X.2.0* | Y.0.0rc1 | – |
16 weeks | – | Y.0.0 | Z.0.0a1 |
(* These might be revision releases)
The dates shown in the left-most column are relative to the release of X.0.0rc1.
In particular, each revision/minor release is made four weeks after the previous one of the same major version, and the pre-release of the upcoming major version is made at the same time.
Whether these releases are revision or minor is determined based on the contents of each update.
Note that there are only three stable releases for the versions X.x.x.
During the parallel development of Y.0.0 and Z.0.0a1, the version Y is treated as an almost-stable version and Z is treated as a development version.
If there is a critical bug found in X.x.x after stopping the development of version X, we may release a hot-fix for this version at any time.
We create a milestone for each upcoming release at GitHub. The GitHub milestone is basically used for collecting the issues and PRs resolved in the release.
Git Branches¶
The master branch is used to develop pre-release versions.
It means that alpha, beta, and RC updates are developed at the master branch.
This branch contains the most up-to-date source tree that includes features newly added after the latest major version.
The stable version is developed at the individual branch named as vN where “N” reflects the version number (we call it a versioned branch).
For example, v1.0.0, v1.0.1, and v1.0.2 will be developed at the v1 branch.
Notes for contributors:
When you send a pull request, you basically have to send it to the master branch.
If the change can also be applied to the stable version, a core team member will apply the same change to the stable version so that the change is also included in the next revision update.
If the change is only applicable to the stable version and not to the master branch, please send it to the versioned branch.
We basically only accept changes to the latest versioned branch (where the stable version is developed) unless the fix is critical.
If you want to make a new feature of the master branch available in the current stable version, please send a backport PR to the stable version (the latest vN branch).
See the next section for details.
Note: a change that can be applied to both branches should be sent to the master branch.
Each release of the stable version is also merged to the development version so that the change is also reflected to the next major version.
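The branch rules above can be summarized in a few lines of Python. This is only a sketch of the decision described in this section; the helper names `versioned_branch` and `target_branch` are hypothetical and do not exist in CuPy or its tooling.

```python
def versioned_branch(version):
    """Return the versioned branch name (vN) for a stable version string."""
    major = version.split('.')[0]
    return 'v' + major


def target_branch(applies_to_master, latest_stable_version):
    """Pick the branch a PR should be sent to, following the notes above."""
    if applies_to_master:
        # Changes applicable to master (or to both branches) go to master.
        return 'master'
    # Changes that only apply to the stable version go to the latest
    # versioned branch.
    return versioned_branch(latest_stable_version)


print(versioned_branch('1.0.2'))
print(target_branch(False, '1.0.2'))
```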
Feature Backport PRs¶
We basically do not backport any new features of the development version to the stable versions.
If you want a feature included in the current stable version and you can do the backport work yourself, we welcome such a contribution.
In such a case, you have to send a backport PR to the latest vN branch.
Note that we do not accept any feature backport PRs to older versions: we are not running quality assurance workflows (e.g. CI) for older versions, so we cannot ensure that the PR is correctly ported.
There are some rules on sending a backport PR.
- Start the PR title with the prefix [backport].
- Clarify the original PR number in the PR description (something like “This is a backport of #XXXX”).
- (optional) Describe in the PR description the motivation for backporting the feature to the stable version.
Please follow these rules when you create a feature backport PR.
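For contributors who script their PR creation, the rules above could be captured as follows. The helper `backport_pr_text` is a hypothetical name introduced for illustration, and the PR number used in the usage line is made up.

```python
def backport_pr_text(original_title, original_pr_number, motivation=None):
    """Build a (title, description) pair that follows the backport rules."""
    # Rule 1: the title starts with the [backport] prefix.
    title = '[backport] ' + original_title
    # Rule 2: the description names the original PR.
    lines = ['This is a backport of #%d.' % original_pr_number]
    # Rule 3 (optional): explain why the feature should be backported.
    if motivation:
        lines.append('')
        lines.append(motivation)
    return title, '\n'.join(lines)


# 1234 is a made-up PR number used only for this example.
title, description = backport_pr_text('Add foo option', 1234)
print(title)
print(description)
```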
Note: PRs that do not include any changes/additions to APIs (e.g. bug fixes, documentation improvements) are usually backported by core dev members. Backport PRs of this kind from any contributor are also appreciated, though, so that the overall development proceeds more smoothly!
Issues and Pull Requests¶
In this section, we explain how to file issues and send pull requests (PRs).
Issue/PR Labels¶
Issues and PRs are labeled by the following tags:
- Bug: bug reports (issues) and bug fixes (PRs)
- Enhancement: implementation improvements without breaking the interface
- Feature: feature requests (issues) and their implementations (PRs)
- NoCompat: disrupts backward compatibility
- Test: test fixes and updates
- Document: document fixes and improvements
- Example: fixes and improvements on the examples
- Install: fixes installation script
- Contribution-Welcome: issues that we request for contribution (only issues are categorized to this)
- Other: other issues and PRs
Multiple tags might be labeled to one issue/PR. Note that revision releases cannot include PRs in Feature and NoCompat categories.
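The restriction on revision releases can be expressed as a simple set check. This is a minimal sketch; the names `REVISION_BLOCKING_LABELS` and `allowed_in_revision_release` are assumptions made for illustration and are not part of any CuPy tooling.

```python
# Labels that, according to the rule above, keep a PR out of revision releases.
REVISION_BLOCKING_LABELS = frozenset(['Feature', 'NoCompat'])


def allowed_in_revision_release(labels):
    """Return True if a PR with these labels may go into a revision release."""
    # A PR may carry several labels; a single blocking label is enough.
    return not (set(labels) & REVISION_BLOCKING_LABELS)


print(allowed_in_revision_release(['Bug', 'Test']))
print(allowed_in_revision_release(['Enhancement', 'NoCompat']))
```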
How to File an Issue¶
When registering an issue, explain precisely how you want CuPy to behave. Bug reports must include necessary and sufficient conditions to reproduce the bugs. Feature requests must state what you want to do with CuPy (and why, if needed). You may include your thoughts on how to realize the feature, though the “what” part is the most important for discussion.
Warning
If you have a question about how to use CuPy, it is highly recommended to send a post to CuPy User Group instead of the issue tracker. The issue tracker is not a place to share knowledge on practices. We may point you to the User Group and immediately close how-to question issues.
How to Send a Pull Request¶
If you can write code to fix an issue, we encourage you to send a PR.
First of all, before starting to write any code, do not forget to confirm the following points.
- Read through the Coding Guidelines and Unit Testing.
- Check the appropriate branch to which you should send the PR, following Git Branches. If you do not have any idea about selecting a branch, please choose the master branch.
In particular, check the branch before writing any code. The current source tree of the chosen branch is the starting point of your change.
After writing your code (including unit tests and hopefully documentation!), send a PR on GitHub. You have to write a precise explanation of what you fix and how; it is the first documentation of your code that developers read, which is a very important part of your PR.
Once you send a PR, it is automatically tested on Travis CI for Linux and Mac OS X, and on AppVeyor for Windows. Your PR needs to pass at least the test for Linux on Travis CI. After the automatic test passes, some of the core developers will start reviewing your code. Note that this automatic PR test only includes CPU tests.
Note
We are also running continuous integration with GPU tests for the master branch and the versioned branch of the latest major version.
Since this service is currently running on our internal server, we do not use it for automatic PR tests to keep the server secure.
If you are planning to add a new feature or modify existing APIs, it is recommended to open an issue and discuss the design first. A design discussion costs the core developers less than a code review. Following the conclusions of the discussion, you can send a PR that is reviewed smoothly in a shorter time.
Even if your code is not complete, you can send a pull request as a work-in-progress PR by putting the [WIP] prefix in the PR title.
If you write a precise explanation about the PR, core developers and other contributors can join the discussion about how to proceed with the PR.
A WIP PR is also useful for having discussions based on concrete code.
Coding Guidelines¶
Note
Coding guidelines are updated at v5.0. Those who have contributed to older versions should read the guidelines again.
We use PEP8 and a part of OpenStack Style Guidelines related to general coding style as our basic style guidelines.
You can use autopep8 and flake8 commands to check your code.
In order to avoid confusion from using different tool versions, we pin the versions of those tools. Install them with the following command (from within the top directory of CuPy repository):
$ pip install -e '.[stylecheck]'
And check your code with:
$ autopep8 path/to/your/code.py
$ flake8 path/to/your/code.py
To check Cython code, use the .flake8.cython configuration file:
$ flake8 --config=.flake8.cython path/to/your/cython/code.pyx
autopep8 can automatically correct Python code to conform to the PEP 8 style guide:
$ autopep8 --in-place path/to/your/code.py
The flake8 command lets you know which parts of your code do not obey our style guidelines.
Before sending a pull request, be sure to check that your code passes the flake8 checking.
Note that the flake8 command is not perfect.
It does not check some of the style guidelines.
Here is a (not-complete) list of the rules that flake8 cannot check.
- Relative imports are prohibited. [H304]
- Importing non-module symbols is prohibited.
- Import statements must be organized into three parts: standard libraries, third-party libraries, and internal imports. [H306]
In addition, we restrict the usage of shortcut symbols in our code base.
They are symbols imported by packages and sub-packages of cupy.
For example, cupy.cuda.Device is a shortcut of cupy.cuda.device.Device.
It is not allowed to use such shortcuts in the cupy library implementation.
Note that you can still use them in tests and examples directories.
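Since the list above says that the prohibition of relative imports is not checked by flake8, a contributor may want a quick local check. The following is a minimal sketch using the standard `ast` module; the function name `find_relative_imports` is hypothetical and this is not a tool shipped with CuPy.

```python
import ast


def find_relative_imports(source):
    """Return the sorted line numbers of relative imports in Python source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # ImportFrom.level > 0 marks a relative import such as `from . import x`.
        if isinstance(node, ast.ImportFrom) and node.level > 0
    )


sample = (
    'import os\n'
    'from . import sibling\n'
    'from cupy.cuda import device\n'
    'from ..pkg import mod\n'
)
print(find_relative_imports(sample))
```

The sketch only covers the relative-import rule; the other rules in the list (non-module symbols, import grouping, shortcut symbols) would need separate checks.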
Once you send a pull request, your coding style is automatically checked by Travis-CI. The reviewing process starts after the check passes.
CuPy is designed based on NumPy’s API design. CuPy’s source code and documents contain the original NumPy ones. Please note the following when writing the document.
- In order to identify overlapping parts, it is preferable to add some remarks that this document is just copied or altered from the original one. It is also preferable to briefly explain the specification of the function in a short paragraph, and refer to the corresponding function in NumPy so that users can read the detailed document. However, it is possible to include a complete copy of the document with such a remark if users cannot summarize in such a way.
- If a function in CuPy only implements a limited amount of features in the original one, users should explicitly describe only what is implemented in the document.
For changes that modify or add new Cython files, please make sure the pointer types follow these guidelines (#1913).
- Pointers should be void* if only used within Cython, or intptr_t if exposed to the Python space.
- Memory sizes should be size_t.
- Memory offsets should be ptrdiff_t.
Note
We are incrementally enforcing the above rules, so some existing code may not follow the above guidelines, but please ensure all new contributions do.
Unit Testing¶
Testing is one of the most important parts of your code. You must write test cases and verify your implementation by following our testing guide.
Note that we are using the pytest and mock packages for testing, so install them before writing your code:
$ pip install pytest mock
How to Run Tests¶
In order to run unit tests at the repository root, you first have to build Cython files in place by running the following command:
$ pip install -e .
Note
When you modify *.pxd files, before running pip install -e ., you must clean *.cpp and *.so files once with the following command, because Cython does not automatically rebuild those files nicely:
$ git clean -fdx
Note
It’s not officially supported, but you can use ccache to reduce compilation time. On Ubuntu 16.04, you can set up as follows:
$ sudo apt-get install ccache
$ export PATH=/usr/lib/ccache:$PATH
See ccache for details.
If you want to use ccache for nvcc, please install ccache v3.3 or later.
You also need to set the environment variable NVCC='ccache nvcc'.
Once Cython modules are built, you can run unit tests by running the following command at the repository root:
$ python -m pytest
CUDA must be installed to run unit tests.
Some GPU tests require cuDNN to run.
In order to skip unit tests that require cuDNN, specify the -m='not cudnn' option:
$ python -m pytest path/to/your/test.py -m='not cudnn'
Some GPU tests involve multiple GPUs.
If you want to run GPU tests with an insufficient number of GPUs, specify the number of available GPUs in CUPY_TEST_GPU_LIMIT.
For example, if you have only one GPU, launch pytest with the following command to skip multi-GPU tests:
$ export CUPY_TEST_GPU_LIMIT=1
$ python -m pytest path/to/gpu/test.py
Following this naming convention, you can run all the tests by running the following command at the repository root:
$ python -m pytest
Or you can also specify a root directory to search test scripts from:
$ python -m pytest tests/cupy_tests # to just run tests of CuPy
$ python -m pytest tests/install_tests # to just run tests of installation modules
If you modify the code related to existing unit tests, you must run appropriate commands.
Test File and Directory Naming Conventions¶
Tests are put into the tests/cupy_tests directory. In order to enable the test runner to find test scripts correctly, we use a special naming convention for the test subdirectories and the test scripts.
- The name of each subdirectory of tests must end with the _tests suffix.
- The name of each test script must start with the test_ prefix.
When we write a test for a module, we use the appropriate path and file name for the test script whose correspondence to the tested module is clear.
For example, if you want to write a test for a module cupy.x.y.z, the test script must be located at tests/cupy_tests/x_tests/y_tests/test_z.py.
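The mapping from a module name to its test script can be written down as a small function. This is a sketch of the naming convention only; `script_path_for` is a hypothetical helper, not something provided by CuPy.

```python
def script_path_for(module_name):
    """Map a module such as 'cupy.x.y.z' to the path of its test script."""
    parts = module_name.split('.')
    if parts[0] != 'cupy' or len(parts) < 2:
        raise ValueError('expected a module under cupy: %r' % module_name)
    # Intermediate packages become directories with the _tests suffix.
    subdirs = [name + '_tests' for name in parts[1:-1]]
    # The module itself becomes a script with the test_ prefix.
    script = 'test_' + parts[-1] + '.py'
    return '/'.join(['tests', 'cupy_tests'] + subdirs + [script])


print(script_path_for('cupy.x.y.z'))
```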
How to Write Tests¶
There are many examples of unit tests under the tests directory, so reading some of them is a good and recommended way to learn how to write tests for CuPy.
They simply use the unittest package of the standard library, while some tests are using utilities from cupy.testing.
In addition to the Coding Guidelines mentioned above, the following rules are applied to the test code:
- All test classes must inherit from unittest.TestCase.
- Use unittest features to write tests, except for the following cases:
  - Use assert statement instead of self.assert* methods (e.g., write assert x == 1 instead of self.assertEqual(x, 1)).
  - Use with pytest.raises(...): instead of with self.assertRaises(...):.
Note
We are incrementally applying the above style.
Some existing tests may be using the old style (self.assertRaises, etc.), but all newly written tests should follow the above style.
Even if your patch includes GPU-related code, your tests should not fail without GPU capability.
Test functions that require CUDA must be tagged by the cupy.testing.attr.gpu decorator:
import unittest
from cupy.testing import attr
class TestMyFunc(unittest.TestCase):
    ...
    @attr.gpu
    def test_my_gpu_func(self):
        ...
The functions tagged by the gpu decorator are skipped if the CUPY_TEST_GPU_LIMIT=0 environment variable is set.
We also have the cupy.testing.attr.cudnn decorator to let pytest know that the test depends on cuDNN.
The test functions decorated by cudnn are skipped if -m='not cudnn' is given.
The test functions decorated by gpu must not depend on multiple GPUs.
In order to write tests for multiple GPUs, use the cupy.testing.attr.multi_gpu() decorator instead:
import unittest
from cupy.testing import attr
class TestMyFunc(unittest.TestCase):
    ...
    @attr.multi_gpu(2) # specify the number of required GPUs here
    def test_my_two_gpu_func(self):
        ...
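To make the interaction between these decorators and CUPY_TEST_GPU_LIMIT concrete, here is a minimal sketch of the skip decision. It is not CuPy's implementation; the function `should_skip` is hypothetical, and it assumes that an unset variable means no limit is applied.

```python
import os


def should_skip(required_gpus, environ=os.environ):
    """Decide whether a test needing `required_gpus` GPUs would be skipped."""
    limit = environ.get('CUPY_TEST_GPU_LIMIT')
    if limit is None:
        # Assumption: no limit is applied when the variable is unset.
        return False
    # A test is skipped when it needs more GPUs than the limit allows.
    return int(limit) < required_gpus


# A single-GPU test with CUPY_TEST_GPU_LIMIT=0, then a two-GPU test with limit 1.
print(should_skip(1, {'CUPY_TEST_GPU_LIMIT': '0'}))
print(should_skip(2, {'CUPY_TEST_GPU_LIMIT': '1'}))
```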
If your test requires too much time, add the cupy.testing.attr.slow decorator.
The test functions decorated by slow are skipped if -m='not slow' is given:
import unittest
from cupy.testing import attr
class TestMyFunc(unittest.TestCase):
    ...
    @attr.slow
    def test_my_slow_func(self):
        ...
Note
If you want to specify two or more attributes, use the and operator, like -m='not cudnn and not slow'.
See the pytest documentation for details.
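When the set of markers to exclude varies (for example in a local test script), the -m expression can be assembled programmatically. This is a small sketch; `exclude_markers` is a hypothetical helper name.

```python
def exclude_markers(markers):
    """Build a pytest -m expression that deselects all given markers."""
    return ' and '.join('not ' + marker for marker in markers)


expression = exclude_markers(['cudnn', 'slow'])
# Argument list suitable for subprocess.run(); the expression is passed
# as a separate argument, so no shell quoting is needed.
command = ['python', '-m', 'pytest', '-m', expression]
print(command)
```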
Once you send a pull request, Travis-CI automatically checks if your code meets our coding guidelines described above. Since Travis-CI does not support CUDA, we cannot run unit tests automatically. The reviewing process starts after the automatic check passes. Note that reviewers will test your code without the option to check CUDA-related code.
Note
Some numerically unstable tests might cause errors irrelevant to your changes. In such a case, we ignore the failures and go on to the review process, so do not worry about it!
Documentation¶
When adding a new feature to the framework, you also need to document it in the reference.
Note
If you are unsure about how to fix the documentation, you can submit a pull request without doing so. Reviewers will help you fix the documentation appropriately.
The documentation source is stored under docs directory and written in reStructuredText format.
To build the documentation, you need to install Sphinx:
$ pip install sphinx sphinx_rtd_theme
Then you can build the documentation in HTML format locally:
$ cd docs
$ make html
HTML files are generated under the build/html directory.
Open index.html with the browser and see if it is rendered as expected.
Note
Docstrings (documentation comments in the source code) are collected from the installed CuPy module. If you modified docstrings, make sure to install the module (e.g., using pip install -e .) before building the documentation.
Installation Guide¶
- Recommended Environments
- Requirements
- Install CuPy
- Install CuPy from conda-forge
- Install CuPy from Source
- Uninstall CuPy
- Upgrade CuPy
- Reinstall CuPy
- Run CuPy with Docker
- FAQ
  - Warning message “cuDNN is not enabled” appears when using Chainer
  - pip fails to install CuPy
  - Installing cuDNN and NCCL
  - Working with Custom CUDA Installation
  - Using custom nvcc command during installation
  - Installation for Developers
  - CuPy always raises cupy.cuda.compiler.CompileException
  - Build fails with CUDA 11.0 on Ubuntu 16.04, CentOS 6 or 7
Recommended Environments¶
We recommend the following Linux distributions.
Note
We are automatically testing CuPy on all the recommended environments above. We cannot guarantee that CuPy works on other environments including Windows and macOS, even if CuPy may seem to be running correctly.
Requirements¶
You need to have the following components to use CuPy.
- NVIDIA CUDA GPU
- Compute Capability of the GPU must be at least 3.0.
- CUDA Toolkit
- Supported Versions: 8.0, 9.0, 9.1, 9.2, 10.0, 10.1, 10.2 and 11.0.
- If you have multiple versions of CUDA Toolkit installed, CuPy will choose one of the CUDA installations automatically. See Working with Custom CUDA Installation for details.
- Python
- Supported Versions: 3.5.1+, 3.6.0+, 3.7.0+ and 3.8.0+.
- NumPy
- Supported Versions: 1.9, 1.10, 1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18 and 1.19.
- NumPy will be installed automatically during the installation of CuPy.
Before installing CuPy, we recommend you to upgrade setuptools and pip:
$ pip install -U setuptools pip
Note
On Windows, CuPy only supports Python 3.6.0 or later.
Note
Python 2 is not supported in CuPy v7.x releases. Please consider migrating to Python 3 or using CuPy v6.x, which is the last version that supports Python 2.
Optional Libraries¶
Some features in CuPy will only be enabled if the corresponding libraries are installed.
- cuDNN (library to accelerate deep neural network computations)
- Supported Versions: v5, v5.1, v6, v7, v7.1, v7.2, v7.3, v7.4, v7.5, v7.6 and v8.0.
- NCCL (library to perform collective multi-GPU / multi-node computations)
- Supported Versions: v1.3.4, v2, v2.1, v2.2, v2.3, v2.4, v2.5, v2.6 and v2.7.
- cuTENSOR (library for high-performance tensor operations)
- Supported Versions: v1.0.0 (experimental)
Install CuPy¶
Wheels (precompiled binary packages) are available for Linux (Python 3.5 or later) and Windows (Python 3.6 or later). Package names are different depending on the CUDA version you have installed on your host.
(For CUDA 8.0)
$ pip install cupy-cuda80
(For CUDA 9.0)
$ pip install cupy-cuda90
(For CUDA 9.1)
$ pip install cupy-cuda91
(For CUDA 9.2)
$ pip install cupy-cuda92
(For CUDA 10.0)
$ pip install cupy-cuda100
(For CUDA 10.1)
$ pip install cupy-cuda101
(For CUDA 10.2)
$ pip install cupy-cuda102
(For CUDA 11.0)
$ pip install cupy-cuda110
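The package names above follow a regular pattern, which the following sketch captures. The helper `wheel_package_name` is hypothetical, and the list of versions is simply copied from this page, so it may be out of date for other CuPy releases.

```python
# CUDA versions for which this page lists a wheel.
WHEEL_CUDA_VERSIONS = ('8.0', '9.0', '9.1', '9.2', '10.0', '10.1', '10.2', '11.0')


def wheel_package_name(cuda_version):
    """Return the wheel package name for a CUDA version such as '10.1'."""
    if cuda_version not in WHEEL_CUDA_VERSIONS:
        raise ValueError('no wheel listed for CUDA %s' % cuda_version)
    major, minor = cuda_version.split('.')
    # The package name is 'cupy-cuda' followed by the version without the dot.
    return 'cupy-cuda' + major + minor


print(wheel_package_name('10.1'))
```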
Note
The latest versions of the cuDNN and NCCL libraries are included in these wheels except for CUDA 11.0. For CUDA 11.0, you need to manually download and install cuDNN 8.0.x. For other CUDA versions, you don’t have to install them manually.
When using wheels, please be careful not to install multiple CuPy packages at the same time.
Any of these packages and the cupy package (source installation) conflict with each other.
Please make sure that only one CuPy package (cupy or cupy-cudaXX where XX is a CUDA version) is installed:
$ pip freeze | grep cupy
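The same check can be done from Python when a shell is not convenient. This is a minimal sketch that filters a list of installed package names; `installed_cupy_packages` is a hypothetical helper, and the name pattern is an assumption based on the package names listed above.

```python
import re

# Matches 'cupy' (source installation) and 'cupy-cudaXX' (wheels).
_CUPY_PACKAGE_RE = re.compile(r'^cupy(-cuda\d+)?$')


def installed_cupy_packages(package_names):
    """Return the sorted CuPy package names found among `package_names`."""
    return sorted(
        name for name in package_names
        if _CUPY_PACKAGE_RE.match(name.lower())
    )


found = installed_cupy_packages(['numpy', 'cupy-cuda101', 'cupy', 'fastrlock'])
if len(found) > 1:
    print('conflicting CuPy packages:', found)
```

On Python 3.8 or later, the input names could be collected from `importlib.metadata.distributions()` by reading each distribution's `metadata['Name']`.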
Install CuPy from conda-forge¶
Conda/Anaconda is a cross-platform package management solution widely used in scientific computing and other fields.
The above pip install instruction is compatible with conda environments. Alternatively, for 64-bit Linux systems, once the CUDA driver is correctly set up, you can install CuPy from the conda-forge channel:
$ conda install -c conda-forge cupy
and conda will install pre-built CuPy and most of the optional dependencies for you, including CUDA runtime libraries (cudatoolkit), NCCL, and cuDNN. It is not necessary to install CUDA Toolkit in advance. If you need to enforce the installation of a particular CUDA version (say 10.0) for driver compatibility, you can do:
$ conda install -c conda-forge cupy cudatoolkit=10.0
Note
Currently cuTENSOR is not yet available on conda-forge.
Note
If you encounter any problem with CuPy from conda-forge, please feel free to report to cupy-feedstock, and we will help investigate if it is just a packaging issue in conda-forge’s recipe or a real issue in CuPy.
Note
If you did not install CUDA Toolkit yourself, the nvcc compiler might not be available.
The cudatoolkit package from Anaconda does not have nvcc included.
Install CuPy from Source¶
It is recommended to use wheels whenever possible. However, if wheels cannot meet your requirements (e.g., you are running a non-Linux environment or want to use a version of CUDA / cuDNN / NCCL not supported by wheels), you can also build CuPy from source.
When installing from source, a C++ compiler such as g++ is required.
You need to install it before installing CuPy.
This is a typical installation method for each platform:
# Ubuntu 16.04
$ apt-get install g++
# CentOS 7
$ yum install gcc-c++
Note
When installing CuPy from source, features provided by optional libraries (cuDNN and NCCL) will be disabled if these libraries are not available at the time of installation. See Installing cuDNN and NCCL for the instructions.
Note
If you upgrade or downgrade the version of CUDA Toolkit, cuDNN or NCCL, you may need to reinstall CuPy. See Reinstall CuPy for details.
Using Tarball¶
The tarball of the source tree is available via pip download cupy or from the release notes page.
You can install CuPy from the tarball:
$ pip install cupy-x.x.x.tar.gz
You can also install the development version of CuPy from a cloned Git repository:
$ git clone --recursive https://github.com/cupy/cupy.git
$ cd cupy
$ pip install .
If you are using the source tree downloaded from GitHub, you need to install Cython 0.28.0 or later (pip install cython).
Uninstall CuPy¶
Use pip to uninstall CuPy:
$ pip uninstall cupy
Note
When you upgrade Chainer, pip sometimes installs the new version without removing the old one in site-packages.
In this case, pip uninstall only removes the latest one.
To ensure that CuPy is completely removed, run the above command repeatedly until pip returns an error.
Note
If you are using a wheel, cupy shall be replaced with cupy-cudaXX (where XX is a CUDA version number).
Note
If CuPy is installed via conda, please do conda uninstall cupy instead.
Upgrade CuPy¶
Just use pip install with the -U option:
$ pip install -U cupy
Note
If you are using a wheel, cupy shall be replaced with cupy-cudaXX (where XX is a CUDA version number).
Reinstall CuPy¶
If you want to reinstall CuPy, please uninstall CuPy and then install it.
When reinstalling CuPy, we recommend using the --no-cache-dir option, as pip caches the previously built binaries:
$ pip uninstall cupy
$ pip install cupy --no-cache-dir
Note
If you are using a wheel, cupy shall be replaced with cupy-cudaXX (where XX is a CUDA version number).
Run CuPy with Docker¶
We are providing the official Docker image. Use nvidia-docker command to run CuPy image with GPU. You can login to the environment with bash, and run the Python interpreter:
$ nvidia-docker run -it cupy/cupy /bin/bash
Or run the interpreter directly:
$ nvidia-docker run -it cupy/cupy /usr/bin/python
FAQ¶
Warning message “cuDNN is not enabled” appears when using Chainer¶
You failed to build CuPy with cuDNN. If you don’t need cuDNN, ignore this message. Otherwise, retry to install CuPy with cuDNN.
See Installing cuDNN and NCCL and pip fails to install CuPy for details.
pip fails to install CuPy¶
Please make sure that you are using the latest setuptools and pip:
$ pip install -U setuptools pip
Use the -vvvv option with the pip command.
This will display all logs of installation:
$ pip install cupy -vvvv
If you are using sudo to install CuPy, note that the sudo command does not propagate environment variables.
If you need to pass environment variables (e.g., CUDA_PATH), you need to specify them inside sudo like this:
$ sudo CUDA_PATH=/opt/nvidia/cuda pip install cupy
If you are using certain versions of conda, it may fail to build CuPy with error g++: error: unrecognized command line option ‘-R’.
This is due to a bug in conda (see conda/conda#6030 for details).
If you encounter this problem, please upgrade your conda.
Installing cuDNN and NCCL¶
We recommend installing cuDNN and NCCL using binary packages (i.e., using apt or yum) provided by NVIDIA.
If you want to install the tar-gz version of cuDNN and NCCL, we recommend you to install it under the CUDA directory.
For example, if you are using Ubuntu, copy *.h files to the include directory and *.so* files to the lib64 directory:
$ cp /path/to/cudnn.h $CUDA_PATH/include
$ cp /path/to/libcudnn.so* $CUDA_PATH/lib64
The destination directories depend on your environment.
If you want to use cuDNN or NCCL installed in another directory, please use the CFLAGS, LDFLAGS and LD_LIBRARY_PATH environment variables before installing CuPy:
export CFLAGS=-I/path/to/cudnn/include
export LDFLAGS=-L/path/to/cudnn/lib
export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH
Note
Use full paths for the environment variables.
distutils, which is used in the setup script, does not expand the home directory mark ~.
Working with Custom CUDA Installation¶
If you have installed CUDA in a non-default directory or have multiple CUDA versions installed, you may need to manually specify the CUDA installation directory to be used by CuPy.
CuPy uses the first CUDA installation directory found in the following order.
- CUDA_PATH environment variable.
- The parent directory of nvcc command. CuPy looks for nvcc command in each directory set in PATH environment variable.
- /usr/local/cuda
For example, you can tell CuPy to use a non-default CUDA directory with the CUDA_PATH environment variable:
$ CUDA_PATH=/opt/nvidia/cuda pip install cupy
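The lookup order described above can be sketched in Python as follows. This is an illustration under stated assumptions, not CuPy's actual code: the function `find_cuda_path` is hypothetical, and it assumes the conventional layout in which nvcc lives in `<root>/bin/nvcc`, so the installation root is taken to be the directory above `bin`.

```python
import os
import shutil


def find_cuda_path(environ=os.environ, which=shutil.which,
                   isdir=os.path.isdir):
    """Return the first CUDA directory found in the documented order."""
    # 1. The CUDA_PATH environment variable.
    cuda_path = environ.get('CUDA_PATH')
    if cuda_path:
        return cuda_path
    # 2. The directory derived from the nvcc command found in PATH.
    nvcc = which('nvcc', path=environ.get('PATH'))
    if nvcc is not None:
        # Assumption: nvcc is at <root>/bin/nvcc, so go up two levels.
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. The default location.
    if isdir('/usr/local/cuda'):
        return '/usr/local/cuda'
    return None


print(find_cuda_path())
```

The `which` and `isdir` parameters exist only so that the lookup can be exercised without a real CUDA installation.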
Note
CUDA installation discovery is also performed at runtime using the rule above.
Depending on your system configuration, you may also need to set the LD_LIBRARY_PATH environment variable to $CUDA_PATH/lib64 at runtime.
Using custom nvcc command during installation¶
If you want to use a custom nvcc compiler (for example, to use ccache) to build CuPy, please set the NVCC environment variable before installing CuPy:
export NVCC='ccache nvcc'
Note
During runtime, you don’t need to set this environment variable since CuPy doesn’t use the nvcc command.
Installation for Developers¶
If you are hacking CuPy source code, we recommend you to use pip with the -e option for editable mode:
$ cd /path/to/cupy/source
$ pip install -e .
Please note that even with -e, you will have to rerun pip install -e . to regenerate C++ sources using Cython if you modified Cython source files (e.g., *.pyx files).
CuPy always raises cupy.cuda.compiler.CompileException¶
If CuPy does not work at all with CompileException, it is possible that CuPy cannot detect CUDA installed on your system correctly.
The following are error messages commonly observed in such cases.
nvrtc: error: failed to load builtins
catastrophic error: cannot open source file "cuda_fp16.h"
error: cannot overload functions distinguished by return type alone
error: identifier "__half_raw" is undefined
Please try setting the LD_LIBRARY_PATH and CUDA_PATH environment variables.
For example, if you have CUDA installed at /usr/local/cuda-9.0:
export CUDA_PATH=/usr/local/cuda-9.0
export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
Also see Working with Custom CUDA Installation.
Build fails with CUDA 11.0 on Ubuntu 16.04, CentOS 6 or 7¶
In order to build CuPy from source with CUDA 11.0 on systems with legacy GCC (g++-5 or earlier), you need to manually set up g++-6 or later and configure the NVCC environment variable.
On Ubuntu 16.04:
$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt update
$ sudo apt install g++-6
$ export NVCC="nvcc --compiler-bindir gcc-6"
On CentOS 6 / 7:
$ sudo yum install centos-release-scl
$ sudo yum install devtoolset-7-gcc-c++
$ source /opt/rh/devtoolset-7/enable
$ export NVCC="nvcc --compiler-bindir gcc-7"
[Experimental] Installation Guide for ROCm environment¶
This is an experimental feature. We recommend it only for advanced users.
Recommended Environments¶
We recommend the following Linux distributions.
- Ubuntu 16.04 / 18.04 LTS (64-bit)
Requirements¶
You need to have the following components to use CuPy.
And please install ROCm libraries.
$ sudo apt install hipblas hipsparse rocrand rocthrust
Before installing CuPy, we recommend you to upgrade setuptools and pip:
$ pip install -U setuptools pip
Install CuPy from Source¶
It is recommended to use wheels whenever possible. However, there are currently no wheels for the ROCm environment, so you have to build it from source.
When installing from source, a C++ compiler such as g++ is required.
You need to install it before installing CuPy.
This is a typical installation method for each platform:
# Ubuntu 16.04
$ apt-get install g++
Note
If you want to upgrade or downgrade the version of ROCm, you may need to reinstall CuPy after that. See Reinstall CuPy for details.
Using pip¶
You can install the CuPy package via pip.
It builds CuPy from source.
$ export HCC_AMDGPU_TARGET=gfx900 # This value should be changed based on your GPU
$ export __HIP_PLATFORM_HCC__
$ export CUPY_INSTALL_USE_HIP=1
$ pip install cupy
Using Tarball¶
The tarball of the source tree is available via pip download cupy or from the release notes page.
You can install CuPy from the tarball:
$ pip install cupy-x.x.x.tar.gz
You can also install the development version of CuPy from a cloned Git repository:
$ git clone --recursive https://github.com/cupy/cupy.git
$ cd cupy
$ export HCC_AMDGPU_TARGET=gfx900 # This value should be changed based on your GPU
$ export __HIP_PLATFORM_HCC__
$ export CUPY_INSTALL_USE_HIP=1
$ pip install .
If you are using the source tree downloaded from GitHub, you need to install Cython 0.28.0 or later (pip install cython).
Uninstall CuPy¶
Use pip to uninstall CuPy:
$ pip uninstall cupy
Note
When you upgrade Chainer, pip sometimes installs the new version without removing the old one in site-packages.
In this case, pip uninstall only removes the latest one.
To ensure that CuPy is completely removed, run the above command repeatedly until pip returns an error.
Upgrade CuPy¶
Just use pip install with the -U option:
$ export HCC_AMDGPU_TARGET=gfx900 # This value should be changed based on your GPU
$ export __HIP_PLATFORM_HCC__
$ export CUPY_INSTALL_USE_HIP=1
$ pip install -U cupy
Reinstall CuPy¶
If you want to reinstall CuPy, please uninstall CuPy first, and then install again.
When reinstalling CuPy, we recommend using the --no-cache-dir option, as pip caches the previously built binaries:
$ pip uninstall cupy
$ export HCC_AMDGPU_TARGET=gfx900 # This value should be changed based on your GPU
$ export __HIP_PLATFORM_HCC__
$ export CUPY_INSTALL_USE_HIP=1
$ pip install cupy --no-cache-dir
FAQ¶
pip fails to install CuPy¶
Please make sure that you are using the latest setuptools and pip:
$ pip install -U setuptools pip
Use the -vvvv option with the pip command to investigate the details of errors.
This displays the full installation log:
$ pip install cupy -vvvv
If you are using sudo to install CuPy, note that the sudo command does not propagate environment variables.
If you need to pass environment variables (e.g., ROCM_HOME), specify them inside the sudo command like this:
$ sudo ROCM_HOME=/opt/rocm pip install cupy
If you are using certain versions of conda, building CuPy may fail with the error g++: error: unrecognized command line option ‘-R’.
This is due to a bug in conda (see conda/conda#6030 for details).
If you encounter this problem, please downgrade or upgrade conda.
Upgrade Guide¶
This is a list of changes introduced in each release that users should be aware of when migrating from older versions. Most changes are carefully designed not to break existing code; however, changes that may break existing code are highlighted with a box.
CuPy v7¶
Dropping Support of Python 2.7 and 3.4¶
Starting from CuPy v7, Python 2.7 and 3.4 are no longer supported, as they reach their end-of-life (EOL) in January 2020 (2.7) and March 2019 (3.4). Python 3.5.1 is the minimum Python version supported by CuPy v7. If you are using an affected version of Python, please upgrade to one of the later versions listed under Installation.
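A script that depends on CuPy v7 could check the interpreter version up front. This is a minimal sketch under the minimum version stated above; the names `MIN_PYTHON` and `python_is_supported` are hypothetical and are not part of CuPy.

```python
import sys

# Minimum interpreter version stated for CuPy v7 in the text above.
MIN_PYTHON = (3, 5, 1)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor, micro) version meets the minimum."""
    # Tuples compare element by element, so (3, 5, 0) < (3, 5, 1).
    return tuple(version_info[:3]) >= MIN_PYTHON


if not python_is_supported():
    print("This Python version is older than the minimum for CuPy v7.")
```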
CuPy v6¶
Binary Packages Ignore LD_LIBRARY_PATH¶
Prior to CuPy v6, the LD_LIBRARY_PATH environment variable could be used to override the cuDNN / NCCL libraries bundled in the binary distribution (also known as wheels).
In CuPy v6, LD_LIBRARY_PATH is ignored during discovery of cuDNN / NCCL; CuPy binary distributions always use the libraries that come with the package to avoid errors caused by unexpected overrides.
CuPy v5¶
cupyx.scipy Namespace¶
The cupyx.scipy namespace has been introduced to provide CUDA-enabled SciPy functions.
The cupy.sparse module has been renamed to cupyx.scipy.sparse; cupy.sparse will be kept as an alias for backward compatibility.
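Code that must run against both the new and the old module name can try the new location first and fall back to the alias. The following is a minimal sketch; `import_first` is a hypothetical helper, not a CuPy function, and the CuPy call is shown only as a comment so that the block runs without a GPU.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names))


# With CuPy installed, prefer the new name and fall back to the old alias:
# sparse = import_first("cupyx.scipy.sparse", "cupy.sparse")
```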
Dropped Support for CUDA 7.0 / 7.5¶
CuPy v5 no longer supports CUDA 7.0 / 7.5.
Update of Docker Images¶
CuPy official Docker images (see Installation Guide for details) are now updated to use CUDA 9.2 and cuDNN 7.
To use these images, you may need to upgrade the NVIDIA driver on your host. See Requirements of nvidia-docker for details.
CuPy v4¶
Note
The version number has been bumped from v2 to v4 to align with the versioning of Chainer. Therefore, CuPy v3 does not exist.
Default Memory Pool¶
Prior to CuPy v4, the memory pool was only enabled by default when CuPy was used with Chainer. In CuPy v4, the memory pool is enabled by default even when you use CuPy without Chainer. The memory pool significantly improves performance by mitigating the overhead of memory allocation and CPU/GPU synchronization.
Attention
When you monitor GPU memory usage (e.g., using nvidia-smi), you may notice that GPU memory is not freed even after the array instance goes out of scope.
This is expected behavior, as the default memory pool “caches” the allocated memory blocks.
To access the default memory pool instances, use get_default_memory_pool() and get_default_pinned_memory_pool().
You can access the statistics and free all unused memory blocks “cached” in the memory pool.
import cupy
a = cupy.ndarray(100, dtype=cupy.float32)
mempool = cupy.get_default_memory_pool()
# For performance, the size of actual allocation may become larger than the requested array size.
print(mempool.used_bytes()) # 512
print(mempool.total_bytes()) # 512
# Even if the array goes out of scope, its memory block is kept in the pool.
a = None
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 512
# You can clear the memory block by calling `free_all_blocks`.
mempool.free_all_blocks()
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 0
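The figure of 512 bytes in the example above is larger than the 400 bytes that 100 float32 elements occupy. One way to read this is that the pool rounds each request up to a multiple of an allocation unit. The sketch below illustrates that arithmetic only; the 512-byte unit is an assumption inferred from the numbers in the example, and `rounded_allocation` is a hypothetical name, not a CuPy API.

```python
# Assumed allocation unit in bytes, inferred from the example output above.
ALLOCATION_UNIT = 512


def rounded_allocation(nbytes, unit=ALLOCATION_UNIT):
    """Round a request of `nbytes` up to the next multiple of `unit`."""
    return ((nbytes + unit - 1) // unit) * unit


requested = 100 * 4  # 100 float32 elements, 4 bytes each
print(requested, rounded_allocation(requested))  # 400 512
```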
You can even disable the default memory pool by the code below. Be sure to do this before any other CuPy operations.
import cupy
cupy.cuda.set_allocator(None)
cupy.cuda.set_pinned_memory_allocator(None)
Compute Capability¶
CuPy v4 now requires an NVIDIA GPU with Compute Capability 3.0 or higher. See the List of CUDA GPUs to check whether your GPU supports Compute Capability 3.0.
CUDA Stream¶
As CUDA Stream is fully supported in CuPy v4, cupy.cuda.RandomState.set_stream, the function to change the stream used by the random number generator, has been removed.
Please use cupy.cuda.Stream.use() instead.
See the discussion in #306 for more details.
cupyx Namespace¶
The cupyx namespace has been introduced to provide features specific to CuPy (i.e., features not provided in NumPy) while avoiding name collisions in the future.
See CuPy-specific Functions for the list of such functions.
Following this rule, cupy.scatter_add() has been moved to cupyx.scatter_add().
cupy.scatter_add() is still available as an alias, but using cupyx.scatter_add() is encouraged.
Update of Docker Images¶
CuPy official Docker images (see Installation Guide for details) are now updated to use CUDA 8.0 and cuDNN 6.0. This change was introduced because CUDA 7.5 does not support NVIDIA Pascal GPUs.
To use these images, you may need to upgrade the NVIDIA driver on your host. See Requirements of nvidia-docker for details.
CuPy v2¶
Changed Behavior of count_nonzero Function¶
For performance reasons, cupy.count_nonzero() has been changed to return a zero-dimensional ndarray instead of an int when axis=None.
See the discussion in #154 for more details.
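Code that relied on the old int return value can convert the result explicitly. The following is a minimal sketch; `as_python_int` is a hypothetical helper, not part of CuPy. It relies on the `item()` method that array objects provide and that plain ints lack. Note that obtaining a Python scalar from a GPU array copies the value to the host, which gives up the performance benefit that motivated the change.

```python
def as_python_int(count):
    """Convert an int or a zero-dimensional array to a plain Python int."""
    item = getattr(count, "item", None)
    # Arrays expose item(), which returns the single element as a Python
    # scalar; plain ints do not have it and are converted directly.
    return int(item()) if callable(item) else int(count)


# With CuPy installed (not run here), the call might look like:
#   n = as_python_int(cupy.count_nonzero(x))
print(as_python_int(3))
```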
License¶
Copyright (c) 2015 Preferred Infrastructure, Inc.
Copyright (c) 2015 Preferred Networks, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
NumPy¶
CuPy is designed based on NumPy’s API. CuPy’s source code and documents contain the original NumPy ones.
Copyright (c) 2005-2016, NumPy Developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SciPy¶
CuPy is designed based on SciPy’s API. CuPy’s source code and documents contain the original SciPy ones.
Copyright (c) 2001, 2002 Enthought, Inc.
All rights reserved.
Copyright (c) 2003-2016 SciPy Developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of Enthought nor the names of the SciPy Developers may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.