Low-level CUDA support

Device management

| cupy.cuda.Device | Object that represents a CUDA device. |
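A minimal usage sketch (assuming CuPy is installed and at least one CUDA GPU is visible): cupy.cuda.Device works as a context manager that directs allocations and kernels to a particular GPU.

```python
import cupy

# Run allocations and kernels on GPU 0 for the duration of the block.
with cupy.cuda.Device(0):
    x = cupy.arange(10)

# Device objects also expose attributes such as the compute capability.
dev = cupy.cuda.Device(0)
print(dev.compute_capability)  # e.g. '86' on an Ampere GPU
```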
Memory management

| cupy.get_default_memory_pool | Returns CuPy default memory pool for GPU memory. |
| cupy.get_default_pinned_memory_pool | Returns CuPy default memory pool for pinned memory. |
| cupy.cuda.Memory | Memory allocation on a CUDA device. |
| cupy.cuda.MemoryAsync | Asynchronous memory allocation on a CUDA device. |
| cupy.cuda.ManagedMemory | Managed memory (unified memory) allocation on a CUDA device. |
| cupy.cuda.UnownedMemory | CUDA memory that is not owned by CuPy. |
| cupy.cuda.PinnedMemory | Pinned memory allocation on the host. |
| cupy.cuda.MemoryPointer | Pointer to a location in device memory. |
| cupy.cuda.PinnedMemoryPointer | Pointer to a location in pinned memory. |
| cupy.cuda.malloc_managed | Allocates managed memory (unified memory). |
| cupy.cuda.malloc_async | (Experimental) Allocates memory from the Stream Ordered Memory Allocator. |
| cupy.cuda.alloc | Calls the current allocator for GPU memory. |
| cupy.cuda.alloc_pinned_memory | Calls the current allocator for pinned memory. |
| cupy.cuda.get_allocator | Returns the current allocator for GPU memory. |
| cupy.cuda.set_allocator | Sets the current allocator for GPU memory. |
| cupy.cuda.using_allocator | Sets a thread-local allocator for GPU memory inside a with block. |
| cupy.cuda.set_pinned_memory_allocator | Sets the current allocator for pinned memory. |
| cupy.cuda.MemoryPool | Memory pool for all GPU devices on the host. |
| cupy.cuda.MemoryAsyncPool | (Experimental) CUDA memory pool for all GPU devices on the host. |
| cupy.cuda.PinnedMemoryPool | Memory pool for pinned memory on the host. |
| cupy.cuda.PythonFunctionAllocator | Allocator with Python functions to perform memory allocation. |
| cupy.cuda.CFunctionAllocator | Allocator with C function pointers to allocation routines. |
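The sketch below shows the most common patterns: inspecting the default memory pools and, optionally, switching the device allocator to managed memory (a hedged example; it assumes default CuPy settings and a single GPU).

```python
import cupy

pool = cupy.get_default_memory_pool()
pinned_pool = cupy.get_default_pinned_memory_pool()

x = cupy.zeros((1024, 1024), dtype=cupy.float32)
print(pool.used_bytes(), pool.total_bytes())  # bytes in use / bytes held by the pool

del x
pool.free_all_blocks()         # return cached device memory to the driver
pinned_pool.free_all_blocks()  # return cached pinned host memory

# Optionally route all future allocations through managed (unified) memory.
cupy.cuda.set_allocator(cupy.cuda.MemoryPool(cupy.cuda.malloc_managed).malloc)
```

cupy.cuda.using_allocator applies the same kind of switch only within a with block, which is convenient for temporarily trying out an allocator.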
Memory hook

| cupy.cuda.MemoryHook | Base class of hooks for memory allocations. |
| cupy.cuda.memory_hooks.DebugPrintHook | Memory hook that prints debug information. |
| cupy.cuda.memory_hooks.LineProfileHook | Code-line CuPy memory profiler. |
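As a sketch (assuming a GPU is available), memory hooks are used as context managers: DebugPrintHook logs each allocation, while LineProfileHook attributes allocations to the Python lines that triggered them.

```python
import cupy
from cupy.cuda import memory_hooks

# Print a line for every device memory allocation made inside the block.
with memory_hooks.DebugPrintHook():
    a = cupy.arange(100)

# Attribute allocations to the code lines that made them.
hook = memory_hooks.LineProfileHook()
with hook:
    b = cupy.zeros((256, 256), dtype=cupy.float32)
hook.print_report()
```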
Streams and events

| cupy.cuda.Stream | CUDA stream. |
| cupy.cuda.ExternalStream | CUDA stream not managed by CuPy. |
| cupy.cuda.get_current_stream | Gets the current CUDA stream for the specified CUDA device. |
| cupy.cuda.Event | CUDA event, a synchronization point of CUDA streams. |
| cupy.cuda.get_elapsed_time | Gets the elapsed time between two events. |
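A minimal timing sketch (assuming a GPU is available): work is enqueued on a non-default stream and bracketed by two events, whose difference gives the elapsed GPU time in milliseconds.

```python
import cupy

stream = cupy.cuda.Stream(non_blocking=True)
start = cupy.cuda.Event()
stop = cupy.cuda.Event()

with stream:        # work enqueued in this block runs on `stream`
    start.record()
    x = cupy.random.random((1000, 1000))
    y = x @ x
    stop.record()

stop.synchronize()  # wait until the recorded work has finished
print(cupy.cuda.get_elapsed_time(start, stop), 'ms')
```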
Graphs

| cupy.cuda.Graph | The CUDA graph object. |
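Graphs are usually obtained through stream capture. The sketch below assumes a CuPy version that provides Stream.begin_capture() / Stream.end_capture() and Graph.launch(); the computation is run once beforehand so the memory pool already holds the needed blocks, since raw cudaMalloc calls are not allowed while a stream is being captured.

```python
import cupy

x = cupy.ones(1_000_000, dtype=cupy.float32)
y = 2 * x + 1  # warm-up run that pre-populates the memory pool

stream = cupy.cuda.Stream(non_blocking=True)
with stream:
    stream.begin_capture()
    y = 2 * x + 1                # recorded into the graph, not executed here
    graph = stream.end_capture()
    graph.launch()               # replay the captured kernels on the current stream
stream.synchronize()
```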
Texture and surface memory

| cupy.cuda.texture.ChannelFormatDescriptor | A class that holds the channel format description. |
| cupy.cuda.texture.CUDAarray | Allocates a CUDA array (cudaArray_t) that can be used as texture memory. |
| cupy.cuda.texture.ResourceDescriptor | A class that holds the resource description. |
| cupy.cuda.texture.TextureDescriptor | A class that holds the texture description. |
| cupy.cuda.texture.TextureObject | A class that holds a texture object. |
| cupy.cuda.texture.SurfaceObject | A class that holds a surface object. |
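A hedged sketch of building a 2D single-channel float texture: the runtime constants (cudaChannelFormatKindFloat and friends) come from cupy.cuda.runtime, and the resulting TextureObject is typically passed to a cupy.RawKernel that declares a cudaTextureObject_t parameter.

```python
import numpy as np
import cupy
from cupy.cuda import runtime, texture

height, width = 32, 64

# One 32-bit float channel.
ch = texture.ChannelFormatDescriptor(
    32, 0, 0, 0, runtime.cudaChannelFormatKindFloat)
arr = texture.CUDAarray(ch, width, height)
arr.copy_from(np.arange(height * width, dtype=np.float32).reshape(height, width))

res = texture.ResourceDescriptor(runtime.cudaResourceTypeArray, cuArr=arr)
tex = texture.TextureDescriptor(
    (runtime.cudaAddressModeClamp, runtime.cudaAddressModeClamp),
    runtime.cudaFilterModePoint,
    runtime.cudaReadModeElementType)
texobj = texture.TextureObject(res, tex)
# `texobj` can now be passed as an argument to a cupy.RawKernel that takes a
# cudaTextureObject_t parameter.
```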
NVTX

| cupy.cuda.nvtx.Mark | Marks an instantaneous event (marker) in the application. |
| cupy.cuda.nvtx.MarkC | Marks an instantaneous event (marker) in the application. |
| cupy.cuda.nvtx.RangePush | Starts a nested range. |
| cupy.cuda.nvtx.RangePushC | Starts a nested range. |
| cupy.cuda.nvtx.RangePop | Ends a nested range started by a RangePush*() call. |
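A short sketch; the ranges and markers only become visible when the program is run under a profiler such as Nsight Systems.

```python
import cupy
from cupy.cuda import nvtx

nvtx.RangePush('forward pass')   # opens a named range on the profiler timeline
x = cupy.random.random((1000, 1000))
y = x @ x
nvtx.RangePop()                  # closes the range opened by RangePush

nvtx.Mark('checkpoint reached')  # instantaneous marker
```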
NCCL

| cupy.cuda.nccl.NcclCommunicator | Initialize an NCCL communicator for one device controlled by one process. |
| cupy.cuda.nccl.get_version | Returns the runtime version of NCCL. |
| cupy.cuda.nccl.groupStart | Start a group of NCCL calls. |
| cupy.cuda.nccl.groupEnd | End a group of NCCL calls. |
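A rough sketch of the usual one-process-per-GPU pattern. The rank, the number of ranks, and the distribution of the unique id to the other processes (for example over MPI) are assumed to be handled by your launcher; the values below are placeholders.

```python
import cupy
from cupy.cuda import nccl

rank, nranks = 0, 2         # placeholder values supplied by your launcher
uid = nccl.get_unique_id()  # create on rank 0, then share it with every rank
comm = nccl.NcclCommunicator(nranks, uid, rank)

send = cupy.ones(1024, dtype=cupy.float32)
recv = cupy.empty_like(send)
stream = cupy.cuda.Stream.null
comm.allReduce(send.data.ptr, recv.data.ptr, send.size,
               nccl.NCCL_FLOAT32, nccl.NCCL_SUM, stream.ptr)
```

When a single process drives several communicators, the collective calls can be batched between groupStart() and groupEnd().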
Version

| cupy.cuda.runtime.runtimeGetVersion | Returns the version of the CUDA Runtime installed in the environment. |
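For example:

```python
from cupy.cuda import runtime

# Encoded as 1000 * major + 10 * minor, e.g. 11080 for CUDA 11.8.
print(runtime.runtimeGetVersion())
```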
Runtime API

CuPy wraps CUDA Runtime APIs to provide access to native CUDA operations. Please refer to the CUDA Runtime API documentation when using these functions.
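A few representative calls (a sketch; the wrappers mirror their CUDA Runtime counterparts, and the exact set available depends on the CuPy and CUDA versions):

```python
from cupy.cuda import runtime

print(runtime.getDeviceCount())      # number of visible CUDA devices
free, total = runtime.memGetInfo()   # free / total device memory in bytes
print(runtime.getDeviceProperties(0)['name'])
```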