CuPy – NumPy-like API accelerated with CUDA¶
This is the CuPy documentation.
Overview¶
CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA.
CuPy consists of cupy.ndarray, the core multi-dimensional array class, and many functions on it. It supports a subset of the numpy.ndarray interface.
The following is a brief overview of the supported subset of the NumPy interface:
- Basic indexing (indexing by ints, slices, newaxes, and Ellipsis)
- Most of advanced indexing (except for some indexing patterns with boolean masks)
- Data types (dtypes): bool_, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64
- Most of the array creation routines (empty, ones_like, diag, etc.)
- Most of the array manipulation routines (reshape, rollaxis, concatenate, etc.)
- All operators with broadcasting
- All universal functions for elementwise operations (except those for complex numbers)
- Linear algebra functions, including product (dot, matmul, etc.) and decomposition (cholesky, svd, etc.), accelerated by cuBLAS
- Reduction along axes (sum, max, argmax, etc.)
CuPy also includes the following features for performance:
- User-defined elementwise CUDA kernels
- User-defined reduction CUDA kernels
- Fusing CUDA kernels to optimize user-defined calculation
- Customizable memory allocator and memory pool
- cuDNN utilities
CuPy uses on-the-fly kernel synthesis: when a kernel call is required, it
compiles kernel code optimized for the shapes and dtypes of the given arguments,
sends it to the GPU device, and executes the kernel. The compiled code is
cached in the $(HOME)/.cupy/kernel_cache directory (this cache path can be
overridden by setting the CUPY_CACHE_DIR environment variable). Compilation may
make the first kernel call slower, but the slowdown disappears from the second
execution onward. CuPy also caches the kernel code sent to the GPU device within
the process, which reduces the kernel transfer time on further calls.
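As a rough illustration of the lookup described above, the following sketch resolves the cache directory in the documented order: the CUPY_CACHE_DIR environment variable if it is set, otherwise the default location under the home directory. The helper name resolve_kernel_cache_dir is hypothetical and not part of CuPy; it only mirrors the behavior described in the text.

```python
import os


def resolve_kernel_cache_dir(environ=None):
    """Hypothetical helper: mirrors the documented lookup, not CuPy's own code."""
    environ = os.environ if environ is None else environ
    override = environ.get("CUPY_CACHE_DIR")
    if override:
        return override
    # Documented default location of the on-disk kernel cache.
    return os.path.join(os.path.expanduser("~"), ".cupy", "kernel_cache")


# To redirect the cache, set the variable before CuPy compiles its first
# kernel, for example at the very top of the program:
# os.environ["CUPY_CACHE_DIR"] = "/tmp/my_cupy_cache"  # hypothetical path
print(resolve_kernel_cache_dir())
```

Setting the variable in the shell that launches the program has the same effect and avoids ordering issues inside the script.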
Tutorial¶
Basics of CuPy¶
In this section, you will learn about the following things:
- Basics of cupy.ndarray
- The concept of current device
- Host-device and device-device array transfer
Basics of cupy.ndarray¶
CuPy is a GPU array backend that implements a subset of the NumPy interface. In the following code, cp is the customary abbreviation of cupy, just as np is of numpy:
>>> import numpy as np
>>> import cupy as cp
The cupy.ndarray class is at its core; it is a compatible GPU alternative to numpy.ndarray.
>>> x_gpu = cp.array([1, 2, 3])
x_gpu in the above example is an instance of cupy.ndarray.
Its creation is identical to NumPy’s, except that numpy is replaced with cupy.
The main difference between cupy.ndarray and numpy.ndarray is that the content is allocated in device memory.
Its data is allocated on the current device, which is explained later.
Most array manipulations are also done in a way similar to NumPy.
Take the Euclidean norm (a.k.a. L2 norm) for example.
NumPy has numpy.linalg.norm() to calculate it on the CPU.
>>> x_cpu = np.array([1, 2, 3])
>>> l2_cpu = np.linalg.norm(x_cpu)
We can calculate it on GPU with CuPy in a similar way:
>>> x_gpu = cp.array([1, 2, 3])
>>> l2_gpu = cp.linalg.norm(x_gpu)
CuPy implements many functions on cupy.ndarray objects.
See the reference for the supported subset of the NumPy API.
Understanding NumPy helps in using most features of CuPy,
so we recommend that you read the NumPy documentation.
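To check that the CPU and GPU norms computed above agree, the GPU value can be copied back to the host and compared. The sketch below uses only the calls shown above plus cupy.asnumpy(); the helper name gpu_module_or_none is hypothetical, and the GPU branch is skipped when CuPy or a CUDA device is not available, so the snippet also runs on a CPU-only machine.

```python
import numpy as np


def gpu_module_or_none():
    """Return the cupy module if a CUDA device is usable, otherwise None."""
    try:
        import cupy as cp
        cp.zeros(1)  # forces a real device allocation
        return cp
    except Exception:
        return None


x_cpu = np.array([1, 2, 3])
l2_cpu = np.linalg.norm(x_cpu)

cp = gpu_module_or_none()
if cp is not None:
    x_gpu = cp.array([1, 2, 3])
    l2_gpu = cp.linalg.norm(x_gpu)  # 0-dimensional array on the GPU
    # Copy the result to the host before comparing with the NumPy value.
    assert np.isclose(float(cp.asnumpy(l2_gpu)), l2_cpu)

print(l2_cpu)  # sqrt(1 + 4 + 9) = sqrt(14)
```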
Current Device¶
CuPy has a concept of the current device, which is the default device on which the allocation, manipulation, calculation, etc. of arrays take place. Suppose the ID of the current device is 0. The following code allocates array contents on GPU 0.
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
The current device can be changed by cupy.cuda.Device.use() as follows:
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> cp.cuda.Device(1).use()
>>> x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
If you want to switch the current GPU only temporarily, the with statement comes in handy.
>>> with cp.cuda.Device(1):
...     x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
Most operations of CuPy are done on the current device. Be careful: processing an array that resides on a non-current device causes an error:
>>> with cp.cuda.Device(0):
...     x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> with cp.cuda.Device(1):
...     x_on_gpu0 * 2  # raises error
Traceback (most recent call last):
...
ValueError: Array device must be same as the current device: array device = 0 while current = 1
The cupy.ndarray.device attribute indicates the device on which the array is allocated.
>>> with cp.cuda.Device(1):
...     x = cp.array([1, 2, 3, 4, 5])
>>> x.device
<CUDA Device 1>
Note
If the environment has only one device, such explicit device switching is not needed.
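A minimal sketch of device-aware allocation follows, assuming that cupy.cuda.runtime.getDeviceCount() reports the number of visible GPUs. The helper names usable_gpu_count and allocate_on_each_device are hypothetical. The code allocates one array per visible device and returns an empty list when CuPy or CUDA is unavailable, so the explicit switching is skipped where it is not needed.

```python
def usable_gpu_count():
    """Number of CUDA devices CuPy can see; 0 if CuPy or CUDA is unavailable."""
    try:
        import cupy as cp
        return cp.cuda.runtime.getDeviceCount()
    except Exception:
        return 0


def allocate_on_each_device(values):
    """Create one copy of `values` on every visible GPU."""
    n_devices = usable_gpu_count()
    if n_devices == 0:
        return []
    import cupy as cp
    arrays = []
    for device_id in range(n_devices):
        # The with-block makes device_id current only for this allocation.
        with cp.cuda.Device(device_id):
            arrays.append(cp.array(values))
    return arrays


arrays = allocate_on_each_device([1, 2, 3, 4, 5])
for device_id, arr in enumerate(arrays):
    # Each array reports the device it was allocated on.
    assert arr.device.id == device_id
print(len(arrays), "array(s) allocated")
```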
Data Transfer¶
Move arrays to a device¶
cupy.asarray() can be used to move a numpy.ndarray, a list, or any object that can be passed to numpy.array() to the current device:
>>> x_cpu = np.array([1, 2, 3])
>>> x_gpu = cp.asarray(x_cpu) # move the data to the current device.
cupy.asarray() also accepts a cupy.ndarray, which means we can transfer an array between devices with this function.
>>> with cp.cuda.Device(0):
...     x_gpu_0 = cp.array([1, 2, 3])  # create an array in GPU 0
>>> with cp.cuda.Device(1):
...     x_gpu_1 = cp.asarray(x_gpu_0)  # move the array to GPU 1
Note
cupy.asarray() does not copy the input array if possible.
So, if you pass an array that is already on the current device, it returns the input object itself.
If you do want a copy in this situation, use cupy.array() with copy=True.
In fact, cupy.asarray() is equivalent to cupy.array(arr, dtype, copy=False).
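The difference between the two calls can be observed through object identity. The sketch below is only an illustration: the helper pick_array_module is hypothetical and falls back to NumPy when CuPy or a CUDA device is unavailable. NumPy's asarray and array follow the same no-copy and copy behavior for an existing array, so the assertions hold in both cases.

```python
import numpy as np


def pick_array_module():
    """Return cupy if a CUDA device is usable, otherwise numpy."""
    try:
        import cupy
        cupy.zeros(1)  # forces a real device allocation
        return cupy
    except Exception:
        return np


xp = pick_array_module()
x = xp.array([1, 2, 3])

same = xp.asarray(x)             # no copy: the input object itself is returned
copied = xp.array(x, copy=True)  # explicit copy on the current device

assert same is x
assert copied is not x

copied[0] = 100  # modifying the copy leaves the original untouched
assert int(x[0]) == 1
```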
Move array from a device to the host¶
Moving a device array to the host can be done by cupy.asnumpy() as follows:
>>> x_gpu = cp.array([1, 2, 3]) # create an array in the current device
>>> x_cpu = cp.asnumpy(x_gpu) # move the array to the host.
We can also use cupy.ndarray.get():
>>> x_cpu = x_gpu.get()
How to write CPU/GPU agnostic code¶
The compatibility of CuPy with NumPy enables us to write CPU/GPU generic code.
The cupy.get_array_module() function makes this easy.
This function returns the numpy or cupy module based on its arguments.
A CPU/GPU generic function can be defined with it as follows:
>>> # Stable implementation of log(1 + exp(x))
>>> def softplus(x):
...     xp = cp.get_array_module(x)
...     return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))
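To see why this form is called stable, the sketch below compares it with the naive formula on large inputs. It is written so that it also runs without CuPy installed: the local get_array_module wrapper is a hypothetical convenience that defers to cupy.get_array_module() when CuPy can be imported and returns numpy otherwise.

```python
import numpy as np


def get_array_module(*args):
    """Hypothetical wrapper: use CuPy's dispatcher if CuPy is importable."""
    try:
        import cupy
    except ImportError:
        return np
    return cupy.get_array_module(*args)


def softplus(x):
    # Stable implementation of log(1 + exp(x)), as in the text above.
    xp = get_array_module(x)
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))


x = np.array([-1000.0, 0.0, 1000.0])
stable = softplus(x)

# The naive formula overflows for large positive inputs.
with np.errstate(over="ignore"):
    naive = np.log(1 + np.exp(x))

print(stable)  # finite everywhere
print(naive)   # last entry overflows to inf
```

Because the exponential is only ever taken of a non-positive number, the stable form cannot overflow, whichever array module is selected.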
User-Defined Kernels¶
CuPy provides easy ways to define two types of CUDA kernels: elementwise kernels and reduction kernels. We first describe how to define and call elementwise kernels, and then describe how to define and call reduction kernels.
Basics of elementwise kernels¶
An elementwise kernel can be defined by the ElementwiseKernel class.
An instance of this class defines a CUDA kernel which can be invoked by the __call__ method of this instance.
A definition of an elementwise kernel consists of four parts: an input argument list, an output argument list, a loop body code, and the kernel name. For example, a kernel that computes a squared difference \(f(x, y) = (x - y)^2\) is defined as follows:
>>> squared_diff = cp.ElementwiseKernel(
...     'float32 x, float32 y',
...     'float32 z',
...     'z = (x - y) * (x - y)',
...     'squared_diff')
The argument lists consist of comma-separated argument definitions. Each argument definition consists of a type specifier and an argument name. Names of NumPy data types can be used as type specifiers.
Note
n, i, and names starting with an underscore _ are reserved for internal use.
The above kernel can be called on either scalars or arrays with broadcasting:
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> y = cp.arange(5, dtype=np.float32)
>>> squared_diff(x, y)
array([[ 0., 0., 0., 0., 0.],
[25., 25., 25., 25., 25.]], dtype=float32)
>>> squared_diff(x, 5)
array([[25., 16., 9., 4., 1.],
[ 0., 1., 4., 9., 16.]], dtype=float32)
Output arguments can be explicitly specified (next to the input arguments):
>>> z = cp.empty((2, 5), dtype=np.float32)
>>> squared_diff(x, y, z)
array([[ 0., 0., 0., 0., 0.],
[25., 25., 25., 25., 25.]], dtype=float32)
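The outputs shown above can be cross-checked on the CPU, since the kernel follows the usual broadcasting rules. The following sketch is plain NumPy and is not a CuPy kernel; it only reproduces the same arithmetic for comparison.

```python
import numpy as np

x = np.arange(10, dtype=np.float32).reshape(2, 5)
y = np.arange(5, dtype=np.float32)

# y of shape (5,) is broadcast against each row of x of shape (2, 5).
row_broadcast = (x - y) * (x - y)
# The scalar 5 is broadcast against every element of x.
scalar_broadcast = (x - 5) * (x - 5)

print(row_broadcast)
print(scalar_broadcast)
```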
Type-generic kernels¶
If a type specifier is one character, then it is treated as a type placeholder.
It can be used to define type-generic kernels.
For example, the above squared_diff kernel can be made type-generic as follows:
>>> squared_diff_generic = cp.ElementwiseKernel(
...     'T x, T y',
...     'T z',
...     'z = (x - y) * (x - y)',
...     'squared_diff_generic')
Type placeholders with the same character in the kernel definition indicate the same type. The actual type of these placeholders is determined by the actual argument type. The ElementwiseKernel class first checks the output arguments and then the input arguments to determine the actual type. If no output arguments are given on the kernel invocation, then only the input arguments are used to determine the type.
The type placeholder can be used in the loop body code:
>>> squared_diff_generic = cp.ElementwiseKernel(
...     'T x, T y',
...     'T z',
...     '''
...         T diff = x - y;
...         z = diff * diff;
...     ''',
...     'squared_diff_generic')
More than one type placeholder can be used in a kernel definition. For example, the above kernel can be further made generic over multiple arguments:
>>> squared_diff_super_generic = cp.ElementwiseKernel(
...     'X x, Y y',
...     'Z z',
...     'z = (x - y) * (x - y)',
...     'squared_diff_super_generic')
Note that this kernel requires the output argument to be explicitly specified, because the type Z cannot be automatically determined from the input arguments.
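A call with an explicit output argument might look like the sketch below. The dtypes chosen here (int32, float32, float64) and the wrapper function run_super_generic are illustrative assumptions; the wrapper returns None when CuPy or a CUDA device is not available, so the snippet can be run anywhere.

```python
import numpy as np


def run_super_generic():
    """Call the multi-placeholder kernel with an explicit output array."""
    try:
        import cupy as cp
        cp.zeros(1)  # forces a real device allocation
    except Exception:
        return None  # no usable GPU: nothing to demonstrate

    kernel = cp.ElementwiseKernel(
        'X x, Y y',
        'Z z',
        'z = (x - y) * (x - y)',
        'squared_diff_super_generic')

    x = cp.arange(5, dtype=cp.int32)         # X is resolved to int32
    y = cp.full(5, 2, dtype=cp.float32)      # Y is resolved to float32
    z = cp.empty(5, dtype=cp.float64)        # Z is resolved from the output
    kernel(x, y, z)
    return cp.asnumpy(z)


result = run_super_generic()
print(result)
```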
Raw argument specifiers¶
The ElementwiseKernel class does the indexing with broadcasting automatically, which is useful to define most elementwise computations.
On the other hand, we sometimes want to write a kernel with manual indexing for some arguments.
We can tell the ElementwiseKernel class to use manual indexing by adding the raw keyword before the type specifier.
We can use the special variable i and the method _ind.size() for manual indexing.
i indicates the index within the loop.
_ind.size() indicates the total number of elements to which the elementwise operation is applied.
Note that it represents the size after the broadcast operation.
For example, a kernel that adds two vectors while reversing one of them can be written as follows:
>>> add_reverse = cp.ElementwiseKernel(
...     'T x, raw T y', 'T z',
...     'z = x + y[_ind.size() - i - 1]',
...     'add_reverse')
(Note that this is an artificial example: you can write such an operation simply as z = x + y[::-1] without defining a new kernel.)
A raw argument can be used like an array.
The indexing operator y[_ind.size() - i - 1] involves an indexing computation on y, so y can be arbitrarily shaped and strided.
Note that raw arguments are not involved in the broadcasting.
If you want to mark all arguments as raw, you must specify the size argument on invocation, which defines the value of _ind.size().
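The index arithmetic of the add_reverse kernel can be spelled out on the CPU for 1-D inputs. The function add_reverse_reference below is a hypothetical NumPy reference, not CuPy code: the Python loop variable i stands for the kernel's special variable i, and n stands for _ind.size().

```python
import numpy as np


def add_reverse_reference(x, y):
    """CPU reference for the add_reverse kernel on 1-D inputs."""
    n = x.size                  # plays the role of _ind.size()
    z = np.empty_like(x)
    for i in range(n):          # i is the loop index, as in the kernel body
        z[i] = x[i] + y[n - i - 1]
    return z


x = np.arange(5, dtype=np.float32)
y = np.array([10, 20, 30, 40, 50], dtype=np.float32)
z = add_reverse_reference(x, y)

# Same result as the slicing form mentioned in the text.
assert np.array_equal(z, x + y[::-1])
print(z)
```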
Reduction kernels¶
Reduction kernels can be defined by the ReductionKernel class.
We can use it by defining four parts of the kernel code:
- Identity value: This value is used as the initial value of the reduction.
- Mapping expression: It is used for the pre-processing of each element to be reduced.
- Reduction expression: It is an operator to reduce the multiple mapped values. The special variables a and b are used for its operands.
- Post mapping expression: It is used to transform the resulting reduced values. The special variable a is used as its input. Output should be written to the output parameter.
The ReductionKernel class automatically inserts other code fragments that are required for an efficient and flexible reduction implementation.
For example, the L2 norm along specified axes can be written as follows:
>>> l2norm_kernel = cp.ReductionKernel(
...     'T x',  # input params
...     'T y',  # output params
...     'x * x',  # map
...     'a + b',  # reduce
...     'y = sqrt(a)',  # post-reduction map
...     '0',  # identity value
...     'l2norm'  # kernel name
... )
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> l2norm_kernel(x, axis=1)
array([ 5.477226 , 15.9687195], dtype=float32)
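To make the role of the four parts concrete, the following sketch evaluates the same map, reduce, post-map, and identity steps on the CPU. The function l2norm_reference is a hypothetical NumPy reference for checking results. It says nothing about how CuPy schedules the reduction on the GPU.

```python
from functools import reduce

import numpy as np


def l2norm_reference(x, axis):
    """CPU reference that follows the same four parts as the kernel above."""
    identity = 0.0                              # identity value: '0'
    moved = np.moveaxis(x, axis, -1)            # put the reduced axis last
    rows = moved.reshape(-1, moved.shape[-1])
    out = np.empty(rows.shape[0], dtype=x.dtype)
    for k, row in enumerate(rows):
        mapped = [float(v) * float(v) for v in row]        # map: x * x
        a = reduce(lambda a, b: a + b, mapped, identity)   # reduce: a + b
        out[k] = np.sqrt(a)                                # post map: y = sqrt(a)
    return out.reshape(moved.shape[:-1])


x = np.arange(10, dtype=np.float32).reshape(2, 5)
result = l2norm_reference(x, axis=1)
print(result)
```

The two values are sqrt(30) and sqrt(255), which match the kernel output shown above.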
Note
The raw specifier is restricted to usages in which the axes to be reduced are put at the head of the shape.
That is, if you want to use the raw specifier for at least one argument, the axis argument must be 0 or a contiguous increasing sequence of integers starting from 0, like (0, 1), (0, 1, 2), etc.
Reference Manual¶
This is the official reference of CuPy, a multi-dimensional array on CUDA with a subset of NumPy interface.
Reference¶
Multi-Dimensional Array (ndarray)¶
cupy.ndarray is the CuPy counterpart of NumPy’s numpy.ndarray.
It provides an intuitive interface for a fixed-size multidimensional array which resides on a CUDA device.
For the basic concept of ndarrays, please refer to the NumPy documentation.
cupy.ndarray |
Multi-dimensional array on a CUDA device. |
Code compatibility features¶
cupy.ndarray is designed to be interchangeable with numpy.ndarray in terms of code compatibility as much as possible.
But occasionally, you will need to know whether the arrays you’re handling are cupy.ndarray or numpy.ndarray.
One example is when invoking module-level functions such as cupy.sum() or numpy.sum().
In such situations, cupy.get_array_module() can be used.
cupy.get_array_module |
Returns the array module for arguments. |
Conversion to/from NumPy arrays¶
cupy.ndarray and numpy.ndarray are not implicitly convertible to each other.
That means NumPy functions cannot take cupy.ndarrays as inputs, and vice versa.
- To convert numpy.ndarray to cupy.ndarray, use cupy.array() or cupy.asarray().
- To convert cupy.ndarray to numpy.ndarray, use cupy.asnumpy() or cupy.ndarray.get().
Note that converting between cupy.ndarray and numpy.ndarray incurs data transfer between the host (CPU) device and the GPU device, which is costly in terms of performance.
cupy.array |
Creates an array on the current device. |
cupy.asarray |
Converts an object to array. |
cupy.asnumpy |
Returns an array on the host memory from an arbitrary source array. |
Universal Functions (ufunc)¶
CuPy provides universal functions (a.k.a. ufuncs) to support various elementwise operations. CuPy’s ufuncs support the following features of NumPy’s ufuncs:
- Broadcasting
- Output type determination
- Casting rules
CuPy’s ufuncs currently do not provide methods such as reduce, accumulate, reduceat, outer, and at.
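Where code written for NumPy relies on ufunc methods such as add.reduce, the module-level reductions listed in this reference can often be used instead. The sketch below shows that substitution; the helper pick_array_module is hypothetical and falls back to NumPy when CuPy or a CUDA device is unavailable.

```python
import numpy as np


def pick_array_module():
    """Return cupy if a CUDA device is usable, otherwise numpy."""
    try:
        import cupy
        cupy.zeros(1)  # forces a real device allocation
        return cupy
    except Exception:
        return np


xp = pick_array_module()
x = xp.arange(6).reshape(2, 3)

# Instead of add.reduce(x, axis=0), use the module-level sum:
col_sums = xp.sum(x, axis=0)
# Instead of maximum.reduce(x, axis=1), use the module-level max:
row_max = xp.max(x, axis=1)

print(col_sums.tolist(), row_max.tolist())
```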
Ufunc class¶
cupy.ufunc |
Universal function. |
Available ufuncs¶
Math operations¶
cupy.add |
Adds two arrays elementwise. |
cupy.subtract |
Subtracts arguments elementwise. |
cupy.multiply |
Multiplies two arrays elementwise. |
cupy.divide |
Elementwise true division. |
cupy.logaddexp |
Computes log(exp(x1) + exp(x2)) elementwise. |
cupy.logaddexp2 |
Computes log2(exp2(x1) + exp2(x2)) elementwise. |
cupy.true_divide |
Elementwise true division. |
cupy.floor_divide |
Elementwise floor division. |
cupy.negative |
Takes numerical negative elementwise. |
cupy.power |
Computes x1 ** x2 elementwise. |
cupy.remainder |
Computes the remainder of Python division elementwise. |
cupy.mod |
Computes the remainder of Python division elementwise. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.absolute |
Elementwise absolute value function. |
cupy.rint |
Rounds each element of an array to the nearest integer. |
cupy.sign |
Elementwise sign function. |
cupy.exp |
Elementwise exponential function. |
cupy.exp2 |
Elementwise exponentiation with base 2. |
cupy.log |
Elementwise natural logarithm function. |
cupy.log2 |
Elementwise binary logarithm function. |
cupy.log10 |
Elementwise common logarithm function. |
cupy.expm1 |
Computes exp(x) - 1 elementwise. |
cupy.log1p |
Computes log(1 + x) elementwise. |
cupy.sqrt |
Elementwise square root function. |
cupy.square |
Elementwise square function. |
cupy.reciprocal |
Computes 1 / x elementwise. |
Trigonometric functions¶
cupy.sin |
Elementwise sine function. |
cupy.cos |
Elementwise cosine function. |
cupy.tan |
Elementwise tangent function. |
cupy.arcsin |
Elementwise inverse-sine function. |
cupy.arccos |
Elementwise inverse-cosine function. |
cupy.arctan |
Elementwise inverse-tangent function. |
cupy.arctan2 |
Elementwise inverse-tangent of the ratio of two arrays. |
cupy.hypot |
Computes the hypotenuse of orthogonal vectors of given length. |
cupy.sinh |
Elementwise hyperbolic sine function. |
cupy.cosh |
Elementwise hyperbolic cosine function. |
cupy.tanh |
Elementwise hyperbolic tangent function. |
cupy.arcsinh |
Elementwise inverse of hyperbolic sine function. |
cupy.arccosh |
Elementwise inverse of hyperbolic cosine function. |
cupy.arctanh |
Elementwise inverse of hyperbolic tangent function. |
cupy.deg2rad |
Converts angles from degrees to radians elementwise. |
cupy.rad2deg |
Converts angles from radians to degrees elementwise. |
Bit-twiddling functions¶
cupy.bitwise_and |
Computes the bitwise AND of two arrays elementwise. |
cupy.bitwise_or |
Computes the bitwise OR of two arrays elementwise. |
cupy.bitwise_xor |
Computes the bitwise XOR of two arrays elementwise. |
cupy.invert |
Computes the bitwise NOT of an array elementwise. |
cupy.left_shift |
Shifts the bits of each integer element to the left. |
cupy.right_shift |
Shifts the bits of each integer element to the right. |
Comparison functions¶
cupy.greater |
Tests elementwise if x1 > x2 . |
cupy.greater_equal |
Tests elementwise if x1 >= x2 . |
cupy.less |
Tests elementwise if x1 < x2 . |
cupy.less_equal |
Tests elementwise if x1 <= x2 . |
cupy.not_equal |
Tests elementwise if x1 != x2 . |
cupy.equal |
Tests elementwise if x1 == x2 . |
cupy.logical_and |
Computes the logical AND of two arrays. |
cupy.logical_or |
Computes the logical OR of two arrays. |
cupy.logical_xor |
Computes the logical XOR of two arrays. |
cupy.logical_not |
Computes the logical NOT of an array. |
cupy.maximum |
Takes the maximum of two arrays elementwise. |
cupy.minimum |
Takes the minimum of two arrays elementwise. |
cupy.fmax |
Takes the maximum of two arrays elementwise. |
cupy.fmin |
Takes the minimum of two arrays elementwise. |
Floating point values¶
cupy.isfinite |
Tests finiteness elementwise. |
cupy.isinf |
Tests if each element is the positive or negative infinity. |
cupy.isnan |
Tests if each element is a NaN. |
cupy.signbit |
Tests elementwise if the sign bit is set. |
cupy.copysign |
Returns the first argument with the sign bit of the second elementwise. |
cupy.nextafter |
Computes the nearest neighbor float values towards the second argument. |
cupy.modf |
Extracts the fractional and integral parts of an array elementwise. |
cupy.ldexp |
Computes x1 * 2 ** x2 elementwise. |
cupy.frexp |
Decomposes each element to mantissa and two’s exponent. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.floor |
Rounds each element of an array to its floor integer. |
cupy.ceil |
Rounds each element of an array to its ceiling integer. |
cupy.trunc |
Rounds each element of an array towards zero. |
ufunc.at¶
Currently, CuPy does not support at for ufuncs in general.
However, cupy.scatter_add() can substitute for add.at, as both behave identically.
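The behavior in question is the accumulation over repeated indices. The sketch below shows it with NumPy's add.at on the CPU; per the text above, cupy.scatter_add() is the GPU counterpart of that call. The snippet itself does not call CuPy, so it runs anywhere.

```python
import numpy as np

indices = [0, 0, 2]

# Plain fancy-index assignment: the repeated index 0 is incremented only once.
buffered = np.zeros(3)
buffered[indices] += 1.0

# add.at: every occurrence of index 0 contributes to the result.
accumulated = np.zeros(3)
np.add.at(accumulated, indices, 1.0)

print(buffered)
print(accumulated)
```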
Routines¶
The following pages describe NumPy-compatible routines. These functions cover a subset of NumPy routines.
Array Creation Routines¶
Basic creation routines¶
cupy.empty |
Returns an array without initializing the elements. |
cupy.empty_like |
Returns a new array with same shape and dtype of a given array. |
cupy.eye |
Returns a 2-D array with ones on the diagonals and zeros elsewhere. |
cupy.identity |
Returns a 2-D identity array. |
cupy.ones |
Returns a new array of given shape and dtype, filled with ones. |
cupy.ones_like |
Returns an array of ones with same shape and dtype as a given array. |
cupy.zeros |
Returns a new array of given shape and dtype, filled with zeros. |
cupy.zeros_like |
Returns an array of zeros with same shape and dtype as a given array. |
cupy.full |
Returns a new array of given shape and dtype, filled with a given value. |
cupy.full_like |
Returns a full array with same shape and dtype as a given array. |
Creation from other data¶
cupy.array |
Creates an array on the current device. |
cupy.asarray |
Converts an object to array. |
cupy.asanyarray |
Converts an object to array. |
cupy.ascontiguousarray |
Returns a C-contiguous array. |
cupy.copy |
Creates a copy of a given array on the current device. |
Numerical ranges¶
cupy.arange |
Returns an array with evenly spaced values within a given interval. |
cupy.linspace |
Returns an array with evenly-spaced values within a given interval. |
cupy.logspace |
Returns an array with evenly-spaced values on a log-scale. |
cupy.meshgrid |
Return coordinate matrices from coordinate vectors. |
Matrix creation¶
cupy.diag |
Returns a diagonal or a diagonal array. |
cupy.diagflat |
Creates a diagonal array from the flattened input. |
Array Manipulation Routines¶
Basic manipulations¶
cupy.copyto |
Copies values from one array to another with broadcasting. |
Shape manipulation¶
cupy.reshape |
Returns an array with new shape and same elements. |
cupy.ravel |
Returns a flattened array. |
Transposition¶
cupy.rollaxis |
Moves the specified axis backwards to the given place. |
cupy.swapaxes |
Swaps the two axes. |
cupy.transpose |
Permutes the dimensions of an array. |
Edit dimensionalities¶
cupy.atleast_1d |
Converts arrays to arrays with dimensions >= 1. |
cupy.atleast_2d |
Converts arrays to arrays with dimensions >= 2. |
cupy.atleast_3d |
Converts arrays to arrays with dimensions >= 3. |
cupy.broadcast |
Object that performs broadcasting. |
cupy.broadcast_arrays |
Broadcasts given arrays. |
cupy.broadcast_to |
Broadcast an array to a given shape. |
cupy.expand_dims |
Expands given arrays. |
cupy.squeeze |
Removes size-one axes from the shape of an array. |
Changing kind of array¶
cupy.asarray |
Converts an object to array. |
cupy.asanyarray |
Converts an object to array. |
cupy.asfortranarray |
Return an array laid out in Fortran order in memory. |
cupy.ascontiguousarray |
Returns a C-contiguous array. |
Joining arrays along axis¶
cupy.concatenate |
Joins arrays along an axis. |
cupy.stack |
Stacks arrays along a new axis. |
cupy.column_stack |
Stacks 1-D and 2-D arrays as columns into a 2-D array. |
cupy.dstack |
Stacks arrays along the third axis. |
cupy.hstack |
Stacks arrays horizontally. |
cupy.vstack |
Stacks arrays vertically. |
Splitting arrays along axis¶
cupy.split |
Splits an array into multiple sub arrays along a given axis. |
cupy.array_split |
Splits an array into multiple sub arrays along a given axis. |
cupy.dsplit |
Splits an array into multiple sub arrays along the third axis. |
cupy.hsplit |
Splits an array into multiple sub arrays horizontally. |
cupy.vsplit |
Splits an array into multiple sub arrays along the first axis. |
Repeating part of arrays along axis¶
cupy.tile |
Construct an array by repeating A the number of times given by reps. |
cupy.repeat |
Repeat arrays along an axis. |
Rearranging elements¶
cupy.flip |
Reverse the order of elements in an array along the given axis. |
cupy.fliplr |
Flip array in the left/right direction. |
cupy.flipud |
Flip array in the up/down direction. |
cupy.reshape |
Returns an array with new shape and same elements. |
cupy.roll |
Roll array elements along a given axis. |
cupy.rot90 |
Rotate an array by 90 degrees in the plane specified by axes. |
Binary Operations¶
Elementwise bit operations¶
cupy.bitwise_and |
Computes the bitwise AND of two arrays elementwise. |
cupy.bitwise_or |
Computes the bitwise OR of two arrays elementwise. |
cupy.bitwise_xor |
Computes the bitwise XOR of two arrays elementwise. |
cupy.invert |
Computes the bitwise NOT of an array elementwise. |
cupy.left_shift |
Shifts the bits of each integer element to the left. |
cupy.right_shift |
Shifts the bits of each integer element to the right. |
Bit packing¶
cupy.packbits |
Packs the elements of a binary-valued array into bits in a uint8 array. |
cupy.unpackbits |
Unpacks elements of a uint8 array into a binary-valued output array. |
Output formatting¶
cupy.binary_repr |
Return the binary representation of the input number as a string. |
Indexing Routines¶
cupy.c_ |
Translates slice objects to concatenation along the second axis. |
cupy.r_ |
Translates slice objects to concatenation along the first axis. |
cupy.nonzero |
Return the indices of the elements that are non-zero. |
cupy.where |
Return elements, either from x or y, depending on condition. |
cupy.ix_ |
Construct an open mesh from multiple sequences. |
cupy.take |
Takes elements of an array at specified indices along an axis. |
cupy.choose |
|
cupy.diag |
Returns a diagonal or a diagonal array. |
cupy.diagonal |
Returns specified diagonals. |
cupy.fill_diagonal |
Fills the main diagonal of the given array of any dimensionality. |
Input and Output¶
NPZ files¶
cupy.load |
Loads arrays or pickled objects from .npy , .npz or pickled file. |
cupy.save |
Saves an array to a binary file in .npy format. |
cupy.savez |
Saves one or more arrays into a file in uncompressed .npz format. |
cupy.savez_compressed |
Saves one or more arrays into a file in compressed .npz format. |
String formatting¶
cupy.array_repr |
Returns the string representation of an array. |
cupy.array_str |
Returns the string representation of the content of an array. |
Base-n representations¶
cupy.binary_repr |
Return the binary representation of the input number as a string. |
cupy.base_repr |
Return a string representation of a number in the given base system. |
Linear Algebra¶
Matrix and vector products¶
cupy.dot |
Returns a dot product of two arrays. |
cupy.vdot |
Returns the dot product of two vectors. |
cupy.inner |
Returns the inner product of two arrays. |
cupy.outer |
Returns the outer product of two vectors. |
cupy.matmul |
Returns the matrix product of two arrays and is the implementation of the @ operator introduced in Python 3. |
cupy.tensordot |
Returns the tensor dot product of two arrays along specified axes. |
cupy.einsum |
Evaluates the Einstein summation convention on the operands. |
cupy.kron |
Returns the kronecker product of two arrays. |
Decompositions¶
cupy.linalg.cholesky |
Cholesky decomposition. |
cupy.linalg.qr |
QR decomposition. |
cupy.linalg.svd |
Singular Value Decomposition. |
Matrix eigenvalues¶
cupy.linalg.eigh |
Eigenvalues and eigenvectors of a symmetric matrix. |
cupy.linalg.eigvalsh |
Calculates eigenvalues of a symmetric matrix. |
Norms etc.¶
cupy.linalg.det |
Returns the determinant of an array. |
cupy.linalg.norm |
Returns one of matrix norms specified by ord parameter. |
cupy.linalg.matrix_rank |
Returns the matrix rank of an array using the SVD method. |
cupy.linalg.slogdet |
Returns the sign and logarithm of the determinant of an array. |
cupy.trace |
Returns the sum along the diagonals of an array. |
Solving linear equations¶
cupy.linalg.solve |
Solves a linear matrix equation. |
cupy.linalg.tensorsolve |
Solves tensor equations denoted by ax = b . |
cupy.linalg.inv |
Computes the inverse of a matrix. |
cupy.linalg.pinv |
Compute the Moore-Penrose pseudoinverse of a matrix. |
Logic Functions¶
Truth value testing¶
cupy.all |
Tests whether all array elements along a given axis evaluate to True. |
cupy.any |
Tests whether any array elements along a given axis evaluate to True. |
Infinities and NaNs¶
cupy.isfinite |
Tests finiteness elementwise. |
cupy.isinf |
Tests if each element is the positive or negative infinity. |
cupy.isnan |
Tests if each element is a NaN. |
Array type testing¶
cupy.isscalar |
Returns True if the type of num is a scalar type. |
Logic operations¶
cupy.logical_and |
Computes the logical AND of two arrays. |
cupy.logical_or |
Computes the logical OR of two arrays. |
cupy.logical_not |
Computes the logical NOT of an array. |
cupy.logical_xor |
Computes the logical XOR of two arrays. |
Comparison operations¶
cupy.greater |
Tests elementwise if x1 > x2 . |
cupy.greater_equal |
Tests elementwise if x1 >= x2 . |
cupy.less |
Tests elementwise if x1 < x2 . |
cupy.less_equal |
Tests elementwise if x1 <= x2 . |
cupy.equal |
Tests elementwise if x1 == x2 . |
cupy.not_equal |
Tests elementwise if x1 != x2 . |
Mathematical Functions¶
Trigonometric functions¶
cupy.sin |
Elementwise sine function. |
cupy.cos |
Elementwise cosine function. |
cupy.tan |
Elementwise tangent function. |
cupy.arcsin |
Elementwise inverse-sine function. |
cupy.arccos |
Elementwise inverse-cosine function. |
cupy.arctan |
Elementwise inverse-tangent function. |
cupy.hypot |
Computes the hypotenuse of orthogonal vectors of given length. |
cupy.arctan2 |
Elementwise inverse-tangent of the ratio of two arrays. |
cupy.deg2rad |
Converts angles from degrees to radians elementwise. |
cupy.rad2deg |
Converts angles from radians to degrees elementwise. |
cupy.degrees |
Converts angles from radians to degrees elementwise. |
cupy.radians |
Converts angles from degrees to radians elementwise. |
Hyperbolic functions¶
cupy.sinh |
Elementwise hyperbolic sine function. |
cupy.cosh |
Elementwise hyperbolic cosine function. |
cupy.tanh |
Elementwise hyperbolic tangent function. |
cupy.arcsinh |
Elementwise inverse of hyperbolic sine function. |
cupy.arccosh |
Elementwise inverse of hyperbolic cosine function. |
cupy.arctanh |
Elementwise inverse of hyperbolic tangent function. |
Rounding¶
cupy.rint |
Rounds each element of an array to the nearest integer. |
cupy.floor |
Rounds each element of an array to its floor integer. |
cupy.ceil |
Rounds each element of an array to its ceiling integer. |
cupy.trunc |
Rounds each element of an array towards zero. |
cupy.fix |
If the given value x is positive, it returns floor(x). |
Sums and products¶
cupy.sum |
Returns the sum of an array along given axes. |
cupy.prod |
Returns the product of an array along given axes. |
cupy.cumsum |
Returns the cumulative sum of an array along a given axis. |
cupy.cumprod |
Returns the cumulative product of an array along a given axis. |
Exponential and logarithm functions¶
cupy.exp |
Elementwise exponential function. |
cupy.expm1 |
Computes exp(x) - 1 elementwise. |
cupy.exp2 |
Elementwise exponentiation with base 2. |
cupy.log |
Elementwise natural logarithm function. |
cupy.log10 |
Elementwise common logarithm function. |
cupy.log2 |
Elementwise binary logarithm function. |
cupy.log1p |
Computes log(1 + x) elementwise. |
cupy.logaddexp |
Computes log(exp(x1) + exp(x2)) elementwise. |
cupy.logaddexp2 |
Computes log2(exp2(x1) + exp2(x2)) elementwise. |
Floating point manipulations¶
cupy.signbit |
Tests elementwise if the sign bit is set. |
cupy.copysign |
Returns the first argument with the sign bit of the second elementwise. |
cupy.ldexp |
Computes x1 * 2 ** x2 elementwise. |
cupy.frexp |
Decomposes each element to mantissa and two’s exponent. |
cupy.nextafter |
Computes the nearest neighbor float values towards the second argument. |
Arithmetic operations¶
cupy.negative |
Takes numerical negative elementwise. |
cupy.add |
Adds two arrays elementwise. |
cupy.subtract |
Subtracts arguments elementwise. |
cupy.multiply |
Multiplies two arrays elementwise. |
cupy.divide |
Elementwise true division. |
cupy.true_divide |
Elementwise true division. |
cupy.floor_divide |
Elementwise floor division. |
cupy.power |
Computes x1 ** x2 elementwise. |
cupy.fmod |
Computes the remainder of C division elementwise. |
cupy.mod |
Computes the remainder of Python division elementwise. |
cupy.remainder |
Computes the remainder of Python division elementwise. |
cupy.modf |
Extracts the fractional and integral parts of an array elementwise. |
cupy.reciprocal |
Computes 1 / x elementwise. |
Miscellaneous¶
cupy.clip |
Clips the values of an array to a given interval. |
cupy.sqrt |
Elementwise square root function. |
cupy.square |
Elementwise square function. |
cupy.absolute |
Elementwise absolute value function. |
cupy.sign |
Elementwise sign function. |
cupy.maximum |
Takes the maximum of two arrays elementwise. |
cupy.minimum |
Takes the minimum of two arrays elementwise. |
cupy.fmax |
Takes the maximum of two arrays elementwise. |
cupy.fmin |
Takes the minimum of two arrays elementwise. |
cupy.blackman |
Returns the Blackman window. |
cupy.hamming |
Returns the Hamming window. |
cupy.hanning |
Returns the Hanning window. |
Random Sampling (cupy.random)¶
CuPy’s random number generation routines are based on cuRAND.
They cover a small fraction of numpy.random.
The big difference between cupy.random and numpy.random is that cupy.random supports a dtype option for most functions.
This option lets us generate float32 values directly without any space overhead.
Sample random data¶
cupy.random.choice |
Returns an array of random values from a given 1-D array. |
cupy.random.rand |
Returns an array of uniform random values over the interval [0, 1) . |
cupy.random.randn |
Returns an array of standard normal random values. |
cupy.random.randint |
Returns a scalar or an array of integer values over [low, high) . |
cupy.random.random_integers |
Returns a scalar or an array of integer values over [low, high]. |
cupy.random.random_sample |
Returns an array of random values over the interval [0, 1) . |
cupy.random.random |
Returns an array of random values over the interval [0, 1) . |
cupy.random.ranf |
Returns an array of random values over the interval [0, 1) . |
cupy.random.sample |
Returns an array of random values over the interval [0, 1) . |
cupy.random.bytes |
Returns random bytes. |
Distributions¶
cupy.random.gumbel |
Returns an array of samples drawn from a Gumbel distribution. |
cupy.random.lognormal |
Returns an array of samples drawn from a log normal distribution. |
cupy.random.normal |
Returns an array of normally distributed samples. |
cupy.random.standard_normal |
Returns an array of samples drawn from the standard normal distribution. |
cupy.random.uniform |
Returns an array of uniformly-distributed samples over an interval. |
Random number generator¶
cupy.random.seed |
Resets the state of the random number generator with a seed. |
cupy.random.get_random_state |
Gets the state of the random number generator for the current device. |
cupy.random.RandomState |
Portable container of a pseudo-random number generator. |
Permutations¶
cupy.random.shuffle |
Shuffles an array. |
Sorting, Searching, and Counting¶
cupy.sort |
Returns a sorted copy of an array with a stable sorting algorithm. |
cupy.lexsort |
Perform an indirect sort using an array of keys. |
cupy.argsort |
Returns the indices that would sort an array with a stable sorting. |
cupy.argmax |
Returns the indices of the maximum along an axis. |
cupy.argmin |
Returns the indices of the minimum along an axis. |
cupy.partition |
Returns a partially sorted copy of an array. |
cupy.count_nonzero |
Counts the number of non-zero values in the array. |
cupy.nonzero |
Return the indices of the elements that are non-zero. |
cupy.flatnonzero |
Return indices that are non-zero in the flattened version of a. |
cupy.where |
Return elements, either from x or y, depending on condition. |
Statistics¶
Order statistics¶
cupy.amin |
Returns the minimum of an array or the minimum along an axis. |
cupy.amax |
Returns the maximum of an array or the maximum along an axis. |
cupy.nanmin |
Returns the minimum of an array along an axis ignoring NaN. |
cupy.nanmax |
Returns the maximum of an array along an axis ignoring NaN. |
Means and variances¶
cupy.mean |
Returns the arithmetic mean along an axis. |
cupy.var |
Returns the variance along an axis. |
cupy.std |
Returns the standard deviation along an axis. |
Histograms¶
cupy.bincount |
Count number of occurrences of each value in array of non-negative ints. |
External Functions¶
cupy.scatter_add |
Adds given values to specified elements of an array. |
Sparse matrix¶
CuPy supports sparse matrices using cuSPARSE. These matrices have the same interface as SciPy’s sparse matrices.
Sparse matrix classes¶
cupy.sparse.coo_matrix |
COOrdinate format sparse matrix. |
cupy.sparse.csr_matrix |
Compressed Sparse Row matrix. |
cupy.sparse.csc_matrix |
Compressed Sparse Column matrix. |
cupy.sparse.dia_matrix |
Sparse matrix with DIAgonal storage. |
cupy.sparse.spmatrix |
Base class of all sparse matrices. |
Functions¶
Building sparse matrices¶
cupy.sparse.eye |
Creates a sparse matrix with ones on diagonal. |
cupy.sparse.identity |
Creates an identity matrix in sparse format. |
Identifying sparse matrices¶
cupy.sparse.issparse |
Checks if a given matrix is a sparse matrix. |
cupy.sparse.isspmatrix |
Checks if a given matrix is a sparse matrix. |
cupy.sparse.isspmatrix_csc |
Checks if a given matrix is of CSC format. |
cupy.sparse.isspmatrix_csr |
Checks if a given matrix is of CSR format. |
cupy.sparse.isspmatrix_coo |
Checks if a given matrix is of COO format. |
cupy.sparse.isspmatrix_dia |
Checks if a given matrix is of DIA format. |
NumPy-CuPy Generic Code Support¶
cupy.get_array_module |
Returns the array module for arguments. |
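A typical use of cupy.get_array_module is to write one function that works on both NumPy and CuPy arrays. The sketch below is a minimal illustration: array_module and softplus are hypothetical helper names, and the try/except around the import is an assumption added so that the example also runs on a machine without CuPy.

```python
import numpy as np

try:
    import cupy  # optional: only needed for GPU arrays
except ImportError:
    cupy = None


def array_module(*args):
    """Return cupy if any argument is a CuPy array, otherwise numpy."""
    if cupy is None:
        return np
    return cupy.get_array_module(*args)


def softplus(x):
    """Compute log(1 + exp(x)) with whichever module owns x."""
    xp = array_module(x)
    return xp.log1p(xp.exp(x))


cpu_result = softplus(np.array([0.0, 1.0]))
print(type(cpu_result).__module__)  # numpy
```

Passing a cupy.ndarray to softplus would select cupy instead, so the same function body runs on the GPU without modification.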
Low-Level CUDA Support¶
Device management¶
cupy.cuda.Device |
Object that represents a CUDA device. |
Memory management¶
cupy.cuda.Memory |
Memory allocation on a CUDA device. |
cupy.cuda.MemoryPointer |
Pointer to a location in device memory. |
cupy.cuda.alloc |
Calls the current allocator. |
cupy.cuda.set_allocator |
Sets the current allocator. |
cupy.cuda.MemoryPool |
Memory pool for all devices on the machine. |
Memory hook¶
cupy.cuda.MemoryHook |
Base class of hooks for Memory allocations. |
cupy.cuda.memory_hooks.DebugPrintHook |
Memory hook that prints debug information. |
cupy.cuda.memory_hooks.LineProfileHook |
Code line CuPy memory profiler. |
Streams and events¶
cupy.cuda.Stream |
CUDA stream. |
cupy.cuda.Event |
CUDA event, a synchronization point of CUDA streams. |
cupy.cuda.get_elapsed_time |
Gets the elapsed time between two events. |
Profiler¶
cupy.cuda.profile |
Enables CUDA profiling within a with statement. |
cupy.cuda.profiler.initialize |
Initialize the CUDA profiler. |
cupy.cuda.profiler.start |
Enable profiling. |
cupy.cuda.profiler.stop |
Disable profiling. |
cupy.cuda.nvtx.Mark |
Marks an instantaneous event (marker) in the application. |
cupy.cuda.nvtx.MarkC |
Marks an instantaneous event (marker) in the application. |
cupy.cuda.nvtx.RangePush |
Starts a nested range. |
cupy.cuda.nvtx.RangePushC |
Starts a nested range. |
cupy.cuda.nvtx.RangePop |
Ends a nested range. |
Kernel binary memoization¶
cupy.memoize |
Makes a function memoizing the result for each argument and device. |
cupy.clear_memo |
Clears the memoized results for all functions decorated by memoize. |
Custom kernels¶
cupy.ElementwiseKernel |
User-defined elementwise kernel. |
cupy.ReductionKernel |
User-defined reduction kernel. |
Testing Modules¶
CuPy offers testing utilities to support unit testing.
They are under the cupy.testing namespace.
Standard Assertions¶
The assertions have the same names as NumPy’s.
The difference from NumPy is that they accept both numpy.ndarray and cupy.ndarray.
cupy.testing.assert_allclose |
Raises an AssertionError if objects are not equal up to desired tolerance. |
cupy.testing.assert_array_almost_equal |
Raises an AssertionError if objects are not equal up to desired precision. |
cupy.testing.assert_array_almost_equal_nulp |
Compare two arrays relatively to their spacing. |
cupy.testing.assert_array_max_ulp |
Check that all items of arrays differ in at most N Units in the Last Place. |
cupy.testing.assert_array_equal |
Raises an AssertionError if two array_like objects are not equal. |
cupy.testing.assert_array_list_equal |
Compares lists of arrays pairwise with assert_array_equal . |
cupy.testing.assert_array_less |
Raises an AssertionError if array_like objects are not ordered by less than. |
NumPy-CuPy Consistency Check¶
The following decorators are for testing consistency between CuPy functions and their NumPy counterparts.
cupy.testing.numpy_cupy_allclose |
Decorator that checks NumPy results and CuPy ones are close. |
cupy.testing.numpy_cupy_array_almost_equal |
Decorator that checks NumPy results and CuPy ones are almost equal. |
cupy.testing.numpy_cupy_array_almost_equal_nulp |
Decorator that checks results of NumPy and CuPy are equal w.r.t. spacing. |
cupy.testing.numpy_cupy_array_max_ulp |
Decorator that checks results of NumPy and CuPy are equal w.r.t. ulp. |
cupy.testing.numpy_cupy_array_equal |
Decorator that checks NumPy results and CuPy ones are equal. |
cupy.testing.numpy_cupy_array_list_equal |
Decorator that checks that the resulting lists of NumPy and CuPy are equal. |
cupy.testing.numpy_cupy_array_less |
Decorator that checks the CuPy result is less than NumPy result. |
cupy.testing.numpy_cupy_raises |
Decorator that checks that NumPy and CuPy throw the same errors. |
Parameterized dtype Test¶
The following decorators offer a standard way to parameterize tests with respect to a single dtype or a combination of dtypes.
cupy.testing.for_dtypes |
Decorator for parameterized dtype test. |
cupy.testing.for_all_dtypes |
Decorator that checks the fixture with all dtypes. |
cupy.testing.for_float_dtypes |
Decorator that checks the fixture with all float dtypes. |
cupy.testing.for_signed_dtypes |
Decorator that checks the fixture with signed dtypes. |
cupy.testing.for_unsigned_dtypes |
Decorator that checks the fixture with unsigned dtypes. |
cupy.testing.for_int_dtypes |
Decorator that checks the fixture with integer and optionally bool dtypes. |
cupy.testing.for_dtypes_combination |
Decorator that checks the fixture with a product set of dtypes. |
cupy.testing.for_all_dtypes_combination |
Decorator that checks the fixture with a product set of all dtypes. |
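cupy.testing.for_signed_dtypes_combination |
Decorator for parameterized test w.r.t. the product set of signed dtypes. |
cupy.testing.for_unsigned_dtypes_combination |
Decorator for parameterized test w.r.t. the product set of unsigned dtypes. |
cupy.testing.for_int_dtypes_combination |
Decorator for parameterized test w.r.t. the product set of int and boolean dtypes. |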
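To make the idea of a dtype-parameterized test concrete, here is a minimal pure-Python sketch of what such a decorator does: it calls the decorated test once per dtype and passes the dtype as a keyword argument. This is an illustration of the pattern only, not the code in cupy.testing; for_dtypes_sketch and check_something are hypothetical names, and dtypes are represented by plain strings so that the example runs without a GPU.

```python
import functools


def for_dtypes_sketch(dtypes, name='dtype'):
    """Run the decorated test once per dtype, passed as a keyword argument."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            for dtype in dtypes:
                kwargs[name] = dtype
                test_func(*args, **kwargs)
        return wrapper
    return decorator


seen = []


@for_dtypes_sketch(['float16', 'float32', 'float64'])
def check_something(dtype):
    # A real test would build arrays of this dtype and assert on the results.
    seen.append(dtype)


check_something()
print(seen)  # ['float16', 'float32', 'float64']
```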
Parameterized order Test¶
The following decorators offer the standard way to parameterize tests with orders.
cupy.testing.for_orders |
Decorator to parameterize tests with order. |
cupy.testing.for_CF_orders |
Decorator that checks the fixture with orders ‘C’ and ‘F’. |
Profiling¶
time range¶
cupy.prof.TimeRangeDecorator |
Decorator to mark function calls with range in NVIDIA profiler |
cupy.prof.time_range |
A context manager to describe the enclosed block as a nested range |
Environment variables¶
Here are the environment variables CuPy uses.
CUPY_CACHE_DIR |
Path to the directory to store kernel cache.
${HOME}/.cupy/kernel_cache is used by default.
See Overview for details. |
CUPY_CACHE_SAVE_CUDA_SOURCE |
If set to 1, CUDA source file will be saved along with compiled binary in the cache directory for debug purpose. It is disabled by default. Note: source file will not be saved if the compiled binary is already stored in the cache. |
CUPY_DUMP_CUDA_SOURCE_ON_ERROR |
If set to 1, when CUDA kernel compilation fails, CuPy dumps CUDA kernel code to standard error. It is disabled by default. |
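The rule for the cache directory can be written down in a few lines of Python. The sketch below only mirrors the behavior documented in the table (use CUPY_CACHE_DIR if set, otherwise ${HOME}/.cupy/kernel_cache); resolve_kernel_cache_dir is a hypothetical helper and not part of CuPy, and the cache path in the example is made up.

```python
import os


def resolve_kernel_cache_dir(environ=None):
    """Mirror the documented rule: CUPY_CACHE_DIR if set, else ~/.cupy/kernel_cache."""
    if environ is None:
        environ = os.environ
    default = os.path.join(os.path.expanduser('~'), '.cupy', 'kernel_cache')
    return environ.get('CUPY_CACHE_DIR', default)


# The debugging switches are plain strings. Exactly when CuPy reads them is an
# assumption here, so the safe option is to set them before launching the process,
# for example by passing this mapping as env= to subprocess.run.
debug_env = dict(os.environ)
debug_env['CUPY_CACHE_SAVE_CUDA_SOURCE'] = '1'
debug_env['CUPY_DUMP_CUDA_SOURCE_ON_ERROR'] = '1'

print(resolve_kernel_cache_dir({'CUPY_CACHE_DIR': '/tmp/my_cupy_cache'}))
```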
For install¶
These environment variables are only used during installation.
CUDA_PATH |
Path to the directory containing CUDA.
The parent of the directory containing nvcc is used as default.
When nvcc is not found, /usr/local/cuda is used.
See Install CuPy with CUDA for details. |
NVCC |
Define the compiler to use when compiling CUDA files. |
Difference between CuPy and NumPy¶
The interface of CuPy is designed to obey that of NumPy. However, there are some differences.
Cast behavior from float to integer¶
Some casting behaviors from float to integer are not defined in the C++ specification. Casting a negative float to an unsigned integer and casting infinity to an integer are two such examples. The behavior of NumPy depends on your CPU architecture. This is the result on an Intel CPU.
>>> np.array([-1], dtype=np.float32).astype(np.uint32)
array([4294967295], dtype=uint32)
>>> cupy.array([-1], dtype=np.float32).astype(np.uint32)
array([0], dtype=uint32)
>>> np.array([float('inf')], dtype=np.float32).astype(np.int32)
array([-2147483648], dtype=int32)
>>> cupy.array([float('inf')], dtype=np.float32).astype(np.int32)
array([2147483647], dtype=int32)
Random methods support dtype argument¶
NumPy’s random value generator does not support a dtype option and always returns a float64 value.
We support the option in CuPy because cuRAND, which CuPy uses, can generate both float32 and float64 values.
>>> np.random.randn(dtype=np.float32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: randn() got an unexpected keyword argument 'dtype'
>>> cupy.random.randn(dtype=np.float32)
array(0.10689262300729752, dtype=float32)
Out-of-bounds indices¶
By default, CuPy handles out-of-bounds indices differently from NumPy when using integer array indexing. NumPy raises an error, but CuPy wraps them around.
>>> x = np.array([0, 1, 2])
>>> x[[1, 3]] = 10
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 1 with size 3
>>> x = cupy.array([0, 1, 2])
>>> x[[1, 3]] = 10
>>> x
array([10, 10, 2])
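The CuPy result above is consistent with each index being reduced modulo the axis length, so index 3 on an axis of size 3 lands on position 0. The pure-Python sketch below emulates that rule on a list; treating the wrap-around as a modulo operation is an assumption drawn from the output shown, and setitem_wrapped is a hypothetical helper.

```python
def setitem_wrapped(values, indices, new_value):
    """Assign new_value at each index, wrapping indices modulo the length."""
    size = len(values)
    result = list(values)
    for index in indices:
        result[index % size] = new_value
    return result


print(setitem_wrapped([0, 1, 2], [1, 3], 10))  # [10, 10, 2]
```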
Duplicate values in indices¶
CuPy’s __setitem__ behaves differently from NumPy when integer arrays reference the same location multiple times.
In that case, the value that is actually stored is undefined.
Here is an example with CuPy.
>>> a = cupy.zeros((2,))
>>> i = cupy.arange(10000) % 2
>>> v = cupy.arange(10000).astype(np.float32)
>>> a[i] = v
>>> a
array([ 9150., 9151.])
NumPy stores the value corresponding to the last element among elements referencing duplicate locations.
>>> a_cpu = np.zeros((2,))
>>> i_cpu = np.arange(10000) % 2
>>> v_cpu = np.arange(10000).astype(np.float32)
>>> a_cpu[i_cpu] = v_cpu
>>> a_cpu
array([9998., 9999.])
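The NumPy result can be reproduced with a plain loop that applies the writes one at a time, in order, so the last write to each slot wins. The sketch below is only an emulation of that sequential behavior with Python lists (sequential_setitem is a hypothetical helper); a parallel execution gives no such ordering, so any of the candidate values for a slot may end up stored.

```python
def sequential_setitem(size, indices, values):
    """Apply writes one by one in order, so the last write to a slot wins."""
    result = [0.0] * size
    for index, value in zip(indices, values):
        result[index] = value
    return result


indices = [k % 2 for k in range(10000)]
values = [float(k) for k in range(10000)]
print(sequential_setitem(2, indices, values))  # [9998.0, 9999.0]
```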
Reduction methods return zero-dimensional array¶
NumPy’s reduction functions (e.g. numpy.sum()) return scalar values (e.g. numpy.float32).
However, their CuPy counterparts return zero-dimensional cupy.ndarrays.
That is because CuPy scalar values (e.g. cupy.float32) are aliases of NumPy scalar values and are allocated in CPU memory.
If these types were returned, synchronization between the GPU and the CPU would be required.
If you want to use scalar values, cast the returned arrays explicitly.
>>> type(np.sum(np.arange(3))) == np.int64
True
>>> type(cupy.sum(cupy.arange(3))) == cupy.core.core.ndarray
True
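One way to cast explicitly is to pass the reduction result to the built-in int() or float(). The sketch below assumes that these conversions are acceptable for your use case (with CuPy they copy the value to the host, which implies a synchronization); the fallback import of NumPy is an assumption added only so that the example runs on a machine without CuPy.

```python
try:
    import cupy as xp  # GPU arrays when CuPy is available
except ImportError:
    import numpy as xp  # CPU fallback so the sketch still runs

total = xp.sum(xp.arange(3))

# Built-in conversions turn the reduction result into a plain Python number.
as_int = int(total)
as_float = float(total)

print(as_int, as_float)  # 3 3.0
```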
Data types¶
The data type of a CuPy array cannot be non-numeric, such as strings or objects. See Overview for details.
Array creation from Python objects¶
Currently, cupy.array() and cupy.asarray() cannot create an array from a Python object containing CuPy arrays (e.g., a list of CuPy arrays).
Use cupy.stack() instead.
>>> data_cpu = [np.arange(10), np.arange(10)]
>>> np.asarray(data_cpu)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> data_gpu = [cupy.arange(10), cupy.arange(10)]
>>> cupy.asarray(data_gpu)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Unsupported dtype object
>>> cupy.stack(data_gpu)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Universal Functions only work with CuPy array or scalar¶
Unlike NumPy, universal functions in CuPy only work with CuPy arrays or scalars.
They do not accept other objects (e.g., lists or numpy.ndarray).
>>> np.power([np.arange(5)], 2)
array([[ 0, 1, 4, 9, 16]])
>>> cupy.power([cupy.arange(5)], 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Unsupported type <class 'list'>
API Compatibility Policy¶
This document expresses the design policy on the compatibility of CuPy APIs. The development team should obey this policy when deciding to add, extend, or change APIs and their behaviors.
This document is written for both users and developers. Users can decide how far their code should depend on CuPy’s implementation based on this document. Developers should read through this document before creating pull requests that change the interface. Note that this document may contain ambiguities about the level of supported compatibility.
Versioning and Backward Compatibilities¶
The updates of CuPy are classified into three levels: major, minor, and revision. These levels have distinct degrees of backward compatibility.
- A major update contains disruptive changes that break backward compatibility.
- A minor update contains additions and extensions to the APIs that keep the supported backward compatibility.
- A revision update contains improvements to the API implementations without changing any API specifications.
Note that we do not support full backward compatibility, which is almost infeasible for Python-based APIs, since there is no way to completely hide the implementation details.
Processes to Break Backward Compatibilities¶
Deprecation, Dropping, and Its Preparation¶
Any API may be deprecated in a minor update. In that case, a deprecation note is added to the API documentation, and the API implementation is changed to emit a deprecation warning (if possible). There should be another way to reimplement whatever was previously written with the deprecated API.
Any API may be marked as to be dropped in the future. In that case, the drop is stated in the documentation together with the major version number in which the API is planned to be dropped, and the API implementation is changed to emit a future warning (if possible).
The actual drop should be done through the following steps:
- Make the API deprecated. At this point, users should not need the deprecated API in their new application code.
- After that, mark the API as to be dropped in the future. This must be done in a different minor update from the one that introduced the deprecation.
- At the major version announced in the above update, drop the API.
Consequently, it takes at least two minor versions to drop any API after the first deprecation.
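The two warning stages can be sketched with the standard warnings module. This is a hedged illustration of the process described above, not code from CuPy: new_api, old_api_deprecated, and old_api_to_be_dropped are hypothetical names, and "vN" stands for whatever major version the documentation would announce.

```python
import warnings


def new_api(x):
    """Replacement that users should migrate to."""
    return x * 2


def old_api_deprecated(x):
    """Stage 1: the API is deprecated in some minor update."""
    warnings.warn('old_api is deprecated; use new_api instead',
                  DeprecationWarning, stacklevel=2)
    return new_api(x)


def old_api_to_be_dropped(x):
    """Stage 2: a later minor update announces the planned drop."""
    warnings.warn('old_api will be dropped in vN; use new_api instead',
                  FutureWarning, stacklevel=2)
    return new_api(x)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    results = [old_api_deprecated(21), old_api_to_be_dropped(21)]

categories = [w.category for w in caught]
print(results, [c.__name__ for c in categories])
```

In both stages the old entry point keeps working, which is what lets users migrate before the API is actually removed in the announced major version.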
API Changes and Its Preparation¶
Any API may be marked as to be changed in the future when the change is not backward compatible. In that case, the change is stated in the documentation together with the version number in which the API is planned to be changed, and the API implementation is changed to emit a future warning for the affected usages.
The actual change should be done in the following steps:
- Announce that the API will be changed in the future. At this point, the actual version of the change need not be accurate.
- After the announcement, mark the API as to be changed in the future, with the version number of the planned change. At this point, users should not use the marked API in their new application code.
- At the major update announced in the above update, change the API.
Supported Backward Compatibility¶
This section defines backward compatibilities that minor updates must maintain.
Documented Interface¶
CuPy has official API documentation. Many applications can be written based on the documented features. We support backward compatibility of documented features. In other words, code that relies only on documented features runs correctly with minor- and revision-updated versions.
Developers are encouraged to give objects that are implementation details names that make this apparent. For example, attributes outside of the documented APIs should have one or more leading underscores in their names.
Undocumented behaviors¶
Behaviors of the CuPy implementation that are not stated in the documentation are undefined. Undocumented behaviors are not guaranteed to be stable between different minor/revision versions.
A minor update may contain changes to undocumented behaviors. For example, suppose an API X is added in a minor update. In the previous version, attempts to use X cause AttributeError. This behavior is not stated in the documentation, so it is undefined. Thus, adding the API X in a minor version is permissible.
A revision update may also contain changes to undefined behaviors. A typical example is a bug fix. Another example is an improvement to the implementation, which may change internal object structures not shown in the documentation. As a consequence, even revision updates do not support compatibility of pickling, unless the full layout of pickled objects is clearly documented.
Documentation Error¶
Compatibility is basically determined based on the documentation, though the documentation sometimes contains errors. Assuming that the documentation always takes precedence over the implementation may make the APIs confusing. We may therefore fix documentation errors in any update, even if that breaks compatibility with respect to the documentation.
Note
Developers MUST NOT fix the documentation and the implementation of the same functionality at the same time in a revision update as a “bug fix”. Such a change completely breaks backward compatibility. If you want to fix bugs on both sides, first fix the documentation to match the implementation, and then start the API change procedure described above.
Object Attributes and Properties¶
Object attributes and properties are sometimes replaced by each other in minor updates. This does not break user code, unless the code depends on how the attributes and properties are implemented.
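A small sketch may help to show which user code survives such a replacement. The classes ArrayInfoV1 and ArrayInfoV2 and the function user_code below are hypothetical and unrelated to CuPy's real classes; they only contrast code that reads the documented name with code that inspects how the value is stored.

```python
class ArrayInfoV1:
    """Earlier version: size is a plain attribute."""

    def __init__(self, shape):
        self.shape = shape
        self.size = shape[0] * shape[1]


class ArrayInfoV2:
    """Later minor version: size becomes a read-only property."""

    def __init__(self, shape):
        self.shape = shape

    @property
    def size(self):
        return self.shape[0] * self.shape[1]


def user_code(info):
    # Relies only on the documented interface: reading info.size.
    return info.size


v1, v2 = ArrayInfoV1((2, 3)), ArrayInfoV2((2, 3))
print(user_code(v1), user_code(v2))  # 6 6

# Code that inspects how size is stored sees a difference and may break.
print('size' in vars(v1), 'size' in vars(v2))  # True False
```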
Functions and Methods¶
Methods may be replaced by callable attributes in minor updates, keeping the compatibility of parameters and return values. This does not break user code, unless the code depends on how the methods and callable attributes are implemented.
Exceptions and Warnings¶
The specifications of raised exceptions are considered part of standard backward compatibility. No exception is raised in future versions for correct usages that the documentation allows, unless the API change process is completed.
On the other hand, warnings may be added in any minor update for any API. This means that minor updates do not keep backward compatibility of warnings.
Installation Compatibility¶
The installation process is another concern of compatibilities. We support environmental compatibilities in the following ways.
- Any changes of dependent libraries that force modifications on the existing environments must be done in major updates.
Such changes include the following cases:
- dropping supported versions of dependent libraries (e.g. dropping cuDNN v2)
- adding new mandatory dependencies (e.g. adding h5py to setup_requires)
- Supporting optional packages/libraries may be done in minor updates (e.g. supporting h5py in optional features).
Note
The installation compatibility does not guarantee that all the features of CuPy run correctly on supported environments. CuPy may contain bugs that only occur in certain environments. Such bugs should be fixed in some update.
Contribution Guide¶
This is a guide for all contributions to CuPy. The development of CuPy takes place in the official repository on GitHub. Anyone who wants to register an issue or send a pull request should read through this document.
Classification of Contributions¶
There are several ways to contribute to the CuPy community:
- Registering an issue
- Sending a pull request (PR)
- Sending a question to CuPy User Group
- Writing a post about CuPy
This document mainly focuses on the first two, though other contributions are also appreciated.
Release and Milestone¶
We are using GitHub Flow as our basic working process. In particular, we are using the master branch for our development, and releases are made as tags.
Releases are classified into three groups: major, minor, and revision. This classification is based on the following criteria:
- Major update contains disruptive changes that break the backward compatibility.
- Minor update contains additions and extensions to the APIs keeping the supported backward compatibility.
- Revision update contains improvements on the API implementations without changing any API specification.
The release classification is reflected in the version number x.y.z, where x, y, and z correspond to major, minor, and revision updates, respectively.
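The mapping from version numbers to release groups can be sketched as a short function. It assumes that versions are written as three dot-separated integers, as in x.y.z; classify_update is a hypothetical helper and the version strings in the example are made up for illustration.

```python
def classify_update(old, new):
    """Return 'major', 'minor', or 'revision' for a move from old to new."""
    old_parts = tuple(int(part) for part in old.split('.'))
    new_parts = tuple(int(part) for part in new.split('.'))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError('expected versions of the form x.y.z')
    if new_parts <= old_parts:
        raise ValueError('new version must be greater than old version')
    levels = ('major', 'minor', 'revision')
    # The first component that differs decides the release group.
    for level, before, after in zip(levels, old_parts, new_parts):
        if before != after:
            return level


print(classify_update('2.0.0', '2.0.1'))  # revision
print(classify_update('2.0.1', '2.1.0'))  # minor
print(classify_update('2.1.0', '3.0.0'))  # major
```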
We set a milestone for each upcoming release. The milestone is named ‘vX.Y.Z’, where the version number represents a revision release at the outset. If at least one feature PR is merged in the period, we rename the milestone to represent a minor release (see the next section for the PR types).
See also API Compatibility Policy.
Issues and PRs¶
Issues and PRs are classified into the following categories:
- Bug: bug reports (issues) and bug fixes (PRs)
- Enhancement: implementation improvements without breaking the interface
- Feature: feature requests (issues) and their implementations (PRs)
- NoCompat: disrupts backward compatibility
- Test: test fixes and updates
- Document: document fixes and improvements
- Example: fixes and improvements on the examples
- Install: fixes installation script
- Contribution-Welcome: issues for which we welcome contributions (only issues are given this category)
- Other: other issues and PRs
Issues and PRs are labeled by these categories. This classification is often reflected in the corresponding release category: Feature issues/PRs are included in minor/major releases and NoCompat issues/PRs are included in major releases, while other issues/PRs can be included in any release, including revision ones.
When registering an issue, explain precisely what you want CuPy to be. Bug reports must include necessary and sufficient conditions to reproduce the bug. Feature requests must include what you want to do (and why, if needed). You can include your thoughts on how to realize it in the feature request, though the what part is the most important for discussion.
Warning
If you have a question on usages of CuPy, it is highly recommended to send a post to CuPy User Group instead of the issue tracker. The issue tracker is not a place to share knowledge on practices. We may redirect question issues to CuPy User Group.
If you can write code to fix an issue, send a PR to the master branch. Before writing your code for PRs, read through the Coding Guidelines. The description of any PR must contain a precise explanation of what you want to do and how; it is the first documentation of your code for developers and a very important part of your PR.
Once you send a PR, it is automatically tested on Travis CI for Linux and Mac OS X, and on AppVeyor for Windows. Your PR needs to pass at least the test for Linux on Travis CI. After the automatic test passes, some of the core developers will start reviewing your code. Note that this automatic PR test only includes CPU tests.
Note
We are also running continuous integration with GPU tests for the master branch. Since this service is running on our internal server, we do not use it for automatic PR tests to keep the server secure.
Even if your code is not complete, you can send a pull request as a work-in-progress PR by putting the [WIP] prefix in the PR title.
If you write a precise explanation of the PR, core developers and other contributors can join the discussion about how to proceed with the PR.
Coding Guidelines¶
We use PEP8 and the part of the OpenStack Style Guidelines related to general coding style as our basic style guidelines.
To check your code, use the autopep8 and flake8 commands; flake8 is installed with the hacking package:
$ pip install autopep8 hacking
$ autopep8 --global-config .pep8 path/to/your/code.py
$ flake8 path/to/your/code.py
To check Cython code, use the .flake8.cython configuration file:
$ flake8 --config=.flake8.cython path/to/your/cython/code.pyx
The autopep8 command can automatically correct Python code to conform to the PEP 8 style guide:
$ autopep8 --in-place --global-config .pep8 path/to/your/code.py
The flake8 command points out the parts of your code that do not obey our style guidelines.
Before sending a pull request, be sure to check that your code passes the flake8 check.
Note that the flake8 command is not perfect.
It does not check some of the style guidelines.
Here is an incomplete list of the rules that flake8 cannot check.
- Relative imports are prohibited. [H304]
- Importing non-module symbols is prohibited.
- Import statements must be organized into three parts: standard libraries, third-party libraries, and internal imports. [H306]
In addition, we restrict the usage of shortcut symbols in our code base.
They are symbols imported by packages and sub-packages of cupy.
For example, cupy.cuda.Device is a shortcut for cupy.cuda.device.Device.
It is not allowed to use such shortcuts in the cupy library implementation.
Note that you can still use them in the tests and examples directories.
Once you send a pull request, your coding style is automatically checked by Travis-CI. The reviewing process starts after the check passes.
CuPy is designed based on NumPy’s API design. CuPy’s source code and documents contain parts of the original NumPy ones. Please note the following when writing documentation.
- In order to identify overlapping parts, it is preferable to add a remark that the document is copied or altered from the original one. It is also preferable to briefly explain the specification of the function in a short paragraph, and refer to the corresponding function in NumPy so that users can read the detailed document. However, it is possible to include a complete copy of the document with such a remark if it cannot be summarized in this way.
- If a function in CuPy implements only a limited subset of the features of the original one, the document should explicitly describe only what is implemented.
Testing Guidelines¶
Testing is one of the most important parts of your code. You must test your code with unit tests following our testing guidelines.
Note that we use pytest and the mock package for testing, so install them before writing your code:
$ pip install pytest mock
In order to run unit tests at the repository root, you first have to build Cython files in place by running the following command:
$ pip install -e .
Note
When you modify *.pxd files, you must clean *.cpp and *.so files once with the following command before running pip install -e ., because Cython does not automatically rebuild those files nicely:
$ git clean -fdx
Note
It’s not officially supported, but you can use ccache to reduce compilation time. On Ubuntu 16.04, you can set up as follows:
$ sudo apt-get install ccache
$ export PATH=/usr/lib/ccache:$PATH
See ccache for details.
If you want to use ccache for nvcc, please install ccache v3.3 or later.
You also need to set the environment variable NVCC='ccache nvcc'.
Once Cython modules are built, you can run unit tests by running the following command at the repository root:
$ python -m pytest
CUDA must be installed to run unit tests.
Some GPU tests require cuDNN to run.
In order to skip unit tests that require cuDNN, specify the -m='not cudnn' option:
$ python -m pytest path/to/your/test.py -m='not cudnn'
Some GPU tests involve multiple GPUs.
If you want to run GPU tests with an insufficient number of GPUs, set CUPY_TEST_GPU_LIMIT to the number of available GPUs.
For example, if you have only one GPU, launch pytest with the following commands to skip multi-GPU tests:
$ export CUPY_TEST_GPU_LIMIT=1
$ python -m pytest path/to/gpu/test.py
Tests are put into the tests/cupy_tests and tests/install_tests directories.
These have the same structure as the cupy and install directories, respectively.
To enable the test runner to find test scripts correctly, we use a special naming convention for the test subdirectories and the test scripts.
- The name of each subdirectory of tests must end with the _tests suffix.
- The name of each test script must start with the test_ prefix.
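The naming convention is simple enough to check mechanically. The sketch below is a hypothetical helper, not part of the CuPy repository; it assumes paths are given relative to the repository root with '/' separators, and the example paths are illustrative only.

```python
def follows_test_naming(path):
    """Check a '/'-separated path against the tests naming convention."""
    parts = path.split('/')
    if len(parts) < 2 or parts[0] != 'tests':
        return False
    *subdirs, script = parts[1:]
    # Every subdirectory under tests must end with _tests.
    dirs_ok = all(name.endswith('_tests') for name in subdirs)
    # The script itself must start with test_.
    return dirs_ok and script.startswith('test_')


print(follows_test_naming('tests/cupy_tests/math_tests/test_sumprod.py'))  # True
print(follows_test_naming('tests/cupy_tests/math/test_sumprod.py'))        # False
print(follows_test_naming('tests/cupy_tests/math_tests/sumprod_test.py'))  # False
```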
Following this naming convention, you can run all the tests by running the following command at the repository root:
$ python -m pytest
Or you can also specify a root directory to search test scripts from:
$ python -m pytest tests/cupy_tests # to just run tests of CuPy
$ python -m pytest tests/install_tests # to just run tests of installation modules
If you modify code related to existing unit tests, you must run the appropriate commands.
There are many examples of unit tests under the tests directory.
They simply use the unittest package of the standard library.
Even if your patch includes GPU-related code, your tests should not fail without GPU capability.
Test functions that require CUDA must be tagged with the cupy.testing.attr.gpu decorator:
import unittest
from cupy.testing import attr
class TestMyFunc(unittest.TestCase):
    ...
    @attr.gpu
    def test_my_gpu_func(self):
        ...
The functions tagged with the gpu decorator are skipped if the CUPY_TEST_GPU_LIMIT=0 environment variable is set.
We also have the cupy.testing.attr.cudnn decorator to let pytest know that the test depends on cuDNN.
The test functions decorated with cudnn are skipped if -m='not cudnn' is given.
The test functions decorated with gpu must not depend on multiple GPUs.
To write tests for multiple GPUs, use the cupy.testing.attr.multi_gpu() decorator instead:
import unittest
from cupy.testing import attr
class TestMyFunc(unittest.TestCase):
    ...
    @attr.multi_gpu(2) # specify the number of required GPUs here
    def test_my_two_gpu_func(self):
        ...
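How CUPY_TEST_GPU_LIMIT interacts with the GPU decorators can be pictured with a small standalone sketch. It is not CuPy's implementation: requires_gpus and available_gpu_limit are hypothetical names, and the sketch only assumes that a test needing more GPUs than the limit allows is skipped, as described above.

```python
import os
import unittest


def available_gpu_limit(environ=None):
    """Return CUPY_TEST_GPU_LIMIT as an int, or None if it is unset."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUPY_TEST_GPU_LIMIT")
    return None if value is None else int(value)


def requires_gpus(required, environ=None):
    """Hypothetical decorator: skip a test that needs more GPUs than allowed."""
    limit = available_gpu_limit(environ)
    too_few = limit is not None and limit < required
    reason = "needs {} GPU(s), limit is {}".format(required, limit)
    return unittest.skipIf(too_few, reason)


class TestExample(unittest.TestCase):
    # The environ argument is passed explicitly only to keep the example
    # deterministic; normally the value would come from os.environ.
    @requires_gpus(2, environ={"CUPY_TEST_GPU_LIMIT": "1"})
    def test_two_gpus(self):
        pass  # skipped: two GPUs required, limit is one

    @requires_gpus(1, environ={"CUPY_TEST_GPU_LIMIT": "1"})
    def test_one_gpu(self):
        pass  # runs: one GPU required, limit is one
```

In this model a test tagged with gpu corresponds to requires_gpus(1), so a limit of 0 skips it, while multi_gpu(2) corresponds to requires_gpus(2).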
Once you send a pull request, Travis-CI automatically checks if your code meets our coding guidelines described above. Since Travis-CI does not support CUDA, we cannot run unit tests automatically. The reviewing process starts after the automatic check passes. Note that reviewers will test your code without the option to check CUDA-related code.
We use doctest as well. You can run the doctests by typing make doctest in the docs directory:
$ cd docs
$ make doctest
Installation Guide¶
Recommended Environments¶
We recommend these Linux distributions.
The following versions of Python can be used: 2.7.6+, 3.4.3+, 3.5.1+, and 3.6.0+.
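As a rough illustration, the version list above can be turned into a check like the following. The helper name is_supported_python is hypothetical, and the sketch reads the list literally, so minor versions that are not listed are reported as unsupported.

```python
import sys

# Minimum micro release for each minor version named in the list above.
MINIMUM_MICRO = {(2, 7): 6, (3, 4): 3, (3, 5): 1, (3, 6): 0}


def is_supported_python(version_info=None):
    """Return True if the given (or running) interpreter version is listed."""
    major, minor, micro = tuple(version_info or sys.version_info)[:3]
    minimum = MINIMUM_MICRO.get((major, minor))
    return minimum is not None and micro >= minimum


print(is_supported_python((3, 5, 1)))  # listed as 3.5.1+
print(is_supported_python((2, 7, 5)))  # older than 2.7.6
```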
Warning
If you are using certain versions of conda, it may fail to build CuPy with the error g++: error: unrecognized command line option ‘-R’.
This is due to a bug in conda (see conda/conda#6030 for details).
If you encounter this problem, please downgrade or upgrade conda.
Note
We test CuPy automatically with Jenkins, where all of the recommended environments above are tested. We cannot guarantee that CuPy works in other environments, including Windows and macOS, even if CuPy appears to run correctly.
CuPy uses a C++ compiler such as g++. You need to install one before installing CuPy. These are typical installation methods for each platform:
# Ubuntu 14.04
$ apt-get install g++
# CentOS 7
$ yum install gcc-c++
If you use an old version of setuptools, upgrade it:
$ pip install -U setuptools
Dependencies¶
Before installing CuPy, we recommend upgrading setuptools if you are using an old version:
$ pip install -U setuptools
The following Python packages are required to install CuPy. The latest version of each package will automatically be installed if missing.
In addition, you need to install CUDA. The following versions of CUDA can be used: 7.0, 7.5, 8.0 and 9.0.
Optional Libraries¶
The following libraries are optional dependencies. CuPy will enable these features only if they are installed.
Install CuPy¶
Install CuPy via pip¶
We recommend installing CuPy via pip:
$ pip install cupy
Note
All optional CUDA-related libraries (cuDNN and NCCL) need to be installed before installing CuPy. After you update these libraries, please reinstall CuPy so that it is compiled and linked against the newer versions.
Install CuPy from source¶
The tarball of the source tree is available via pip download cupy or from the release notes page.
You can install CuPy from the tarball:
$ pip install cupy-x.x.x.tar.gz
You can also install the development version of CuPy from a cloned Git repository:
$ git clone https://github.com/cupy/cupy.git
$ cd cupy
$ pip install .
When an error occurs…¶
Use the -vvvv option with the pip command.
It shows the full installation log, which may help you identify the problem:
$ pip install cupy -vvvv
Install CuPy with CUDA¶
You need to install the CUDA Toolkit before installing CuPy.
If you have CUDA in a default directory or have set CUDA_PATH correctly, the CuPy installer finds CUDA automatically:
$ pip install cupy
Note
The CuPy installer looks up the CUDA_PATH environment variable first.
If it is empty, the installer looks for the nvcc command in the PATH environment variable and uses its parent directory as the root directory of the CUDA installation.
If the nvcc command is also not found, the installer tries to use the default directory for Ubuntu, /usr/local/cuda.
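The lookup order in this note can be sketched as follows. This is not the installer's actual code: find_cuda_root is a hypothetical name, the sketch assumes Python 3.3 or later for shutil.which, and it interprets the "parent directory" of nvcc as the directory above the bin directory that contains it (for example /usr/local/cuda for /usr/local/cuda/bin/nvcc).

```python
import os
import shutil


def find_cuda_root(environ=None, which=shutil.which,
                   default="/usr/local/cuda"):
    """Return a CUDA root directory using the order described in the note."""
    environ = os.environ if environ is None else environ

    # 1. CUDA_PATH is used if it is set and non-empty.
    cuda_path = environ.get("CUDA_PATH", "")
    if cuda_path:
        return cuda_path

    # 2. Otherwise look for nvcc on PATH and go up from its bin directory.
    nvcc = which("nvcc", path=environ.get("PATH"))
    if nvcc:
        bin_dir = os.path.dirname(os.path.abspath(nvcc))
        return os.path.dirname(bin_dir)

    # 3. Fall back to the default directory.
    return default


print(find_cuda_root())
```

The environ and which parameters exist only so that the lookup can be exercised without changing the real environment.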
If you installed CUDA into a non-default directory, you need to specify the directory with the CUDA_PATH environment variable:
$ CUDA_PATH=/opt/nvidia/cuda pip install cupy
If you want to use a custom nvcc compiler (for example, to use ccache), please set the NVCC environment variable before installing CuPy:
export NVCC='ccache nvcc'
Warning
If you want to use sudo to install CuPy, note that the sudo command resets environment variables.
Please specify the CUDA_PATH environment variable inside the sudo command like this:
$ sudo CUDA_PATH=/opt/nvidia/cuda pip install cupy
Install CuPy with cuDNN and NCCL¶
cuDNN is a library for Deep Neural Networks that NVIDIA provides. NCCL is a library for collective multi-GPU communication. CuPy can use cuDNN and NCCL. If you want to enable these libraries, install them before installing CuPy. We recommend installing the developer libraries from the deb packages of cuDNN and NCCL.
If you want to install the tar-gz version of cuDNN, we recommend installing it into the CUDA directory.
For example, if you use Ubuntu Linux, copy the .h files to the include directory and the .so files to the lib64 directory:
$ cp /path/to/cudnn.h $CUDA_PATH/include
$ cp /path/to/libcudnn.so* $CUDA_PATH/lib64
The destination directories depend on your environment.
If you want to use cuDNN or NCCL installed in another directory, please set the CFLAGS, LDFLAGS and LD_LIBRARY_PATH environment variables before installing CuPy:
export CFLAGS=-I/path/to/cudnn/include
export LDFLAGS=-L/path/to/cudnn/lib
export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH
Note
Use full paths for the environment variables.
distutils, which is used in the setup script, does not expand the home directory mark ~.
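One way to avoid the ~ problem is to expand the path before the variables are set. The following is a minimal sketch under stated assumptions: cudnn_build_env is a hypothetical helper, and the include and lib subdirectory names simply follow the export example above.

```python
import os


def cudnn_build_env(cudnn_root, base_env=None):
    """Return a copy of the environment with full cuDNN paths filled in."""
    env = dict(os.environ if base_env is None else base_env)

    # Expand ~ and make the path absolute, since the setup script will not.
    root = os.path.abspath(os.path.expanduser(cudnn_root))
    include_dir = os.path.join(root, "include")
    lib_dir = os.path.join(root, "lib")

    # Like the export lines above, these replace any existing values.
    env["CFLAGS"] = "-I" + include_dir
    env["LDFLAGS"] = "-L" + lib_dir

    # Prepend so that an existing LD_LIBRARY_PATH is preserved.
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + old if old else "")
    return env


env = cudnn_build_env("~/cudnn")
print(env["CFLAGS"])
```

The returned dictionary could then be passed as the env argument of subprocess.run when invoking pip from a script.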
Install CuPy for developers¶
CuPy uses Cython (>=0.24).
Developers need to use Cython to regenerate C++ sources from pyx files.
We recommend using pip with the -e option for editable mode:
$ pip install -U cython
$ cd /path/to/cupy/source
$ pip install -e .
Users do not need to install Cython, because the distribution package of CuPy contains only the generated sources.
Uninstall CuPy¶
Use pip to uninstall CuPy:
$ pip uninstall cupy
Note
When you upgrade CuPy, pip sometimes installs the new version without removing the old one in site-packages.
In this case, pip uninstall only removes the latest one.
To ensure that CuPy is completely removed, run the above command repeatedly until pip returns an error.
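As a complementary check, you can ask Python whether the package can still be located after uninstalling. This is only a sketch, assuming Python 3.4 or later; is_importable is a hypothetical helper name, and it will also report True if you run it from a directory that contains a cupy folder, such as a source checkout.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can still locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("cupy"):
    print("cupy can still be found; another copy may remain in site-packages")
else:
    print("cupy is no longer importable")
```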
Reinstall CuPy¶
If you want to reinstall CuPy, please uninstall CuPy and then install it again.
We recommend using the --no-cache-dir option, as pip sometimes uses its cache:
$ pip uninstall cupy
$ pip install cupy --no-cache-dir
If you installed CuPy without CUDA and later want to use CUDA, please reinstall CuPy. You also need to reinstall CuPy when you upgrade CUDA.
Run CuPy with Docker¶
We provide an official Docker image. Use the nvidia-docker command to run the CuPy image with a GPU. You can log in to the environment with bash and run the Python interpreter:
$ nvidia-docker run -it cupy/cupy /bin/bash
Or run the interpreter directly:
$ nvidia-docker run -it cupy/cupy /usr/bin/python
FAQ¶
Warning message “cuDNN is not enabled” appears¶
This message means that CuPy was built without cuDNN.
If you don’t need cuDNN, ignore this message.
Otherwise, try installing CuPy with cuDNN again.
The -vvvv option of pip may help you find the cause.
See Install CuPy with cuDNN and NCCL.
Upgrade Guide¶
This is a list of changes introduced in each release that users should be aware of when migrating from older versions. Most changes are carefully designed not to break existing code; however, changes that may break it are highlighted with a box.
CuPy v2¶
Changed Behavior of count_nonzero Function¶
For performance reasons, cupy.count_nonzero() has been changed to return a zero-dimensional ndarray instead of an int when axis=None.
See the discussion in #154 for more details.
License¶
Copyright (c) 2015 Preferred Infrastructure, Inc.
Copyright (c) 2015 Preferred Networks, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
NumPy¶
CuPy is designed based on NumPy’s API. CuPy’s source code and documentation contain portions of the original NumPy source code and documentation.
Copyright (c) 2005-2016, NumPy Developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.