Low-level CUDA support#

Device management#

cupy.cuda.Device([device])

Object that represents a CUDA device.

Memory management#

cupy.get_default_memory_pool()

Returns the CuPy default memory pool for GPU memory.

cupy.get_default_pinned_memory_pool()

Returns the CuPy default memory pool for pinned memory.

cupy.cuda.Memory(size_t size)

Memory allocation on a CUDA device.

cupy.cuda.MemoryAsync(size_t size, stream)

Asynchronous memory allocation on a CUDA device.

cupy.cuda.ManagedMemory(size_t size)

Managed memory (Unified memory) allocation on a CUDA device.

cupy.cuda.UnownedMemory(intptr_t ptr, ...)

CUDA memory that is not owned by CuPy.

cupy.cuda.PinnedMemory(size[, flags])

Pinned memory allocation on host.

cupy.cuda.MemoryPointer(BaseMemory mem, ...)

Pointer to a location in device memory.

cupy.cuda.PinnedMemoryPointer(mem, ...)

Pointer to pinned memory.

cupy.cuda.malloc_managed(size_t size)

Allocate managed memory (unified memory).

cupy.cuda.malloc_async(size_t size)

(Experimental) Allocate memory from Stream Ordered Memory Allocator.

cupy.cuda.alloc(size)

Calls the current allocator.

cupy.cuda.alloc_pinned_memory(size_t size)

Calls the current pinned memory allocator.

cupy.cuda.get_allocator()

Returns the current allocator for GPU memory.

cupy.cuda.set_allocator([allocator])

Sets the current allocator for GPU memory.

cupy.cuda.using_allocator([allocator])

Sets a thread-local allocator for GPU memory for the duration of a context manager.

cupy.cuda.set_pinned_memory_allocator([...])

Sets the current allocator for the pinned memory.

cupy.cuda.MemoryPool([allocator])

Memory pool for all GPU devices on the host.

cupy.cuda.MemoryAsyncPool([pool_handles])

(Experimental) CUDA memory pool for all GPU devices on the host.

cupy.cuda.PinnedMemoryPool([allocator])

Memory pool for pinned memory on the host.

cupy.cuda.PythonFunctionAllocator(...)

Allocator with python functions to perform memory allocation.

cupy.cuda.CFunctionAllocator(intptr_t param, ...)

Allocator with C function pointers to allocation routines.

Memory hook#

cupy.cuda.MemoryHook()

Base class of hooks for memory allocations.

cupy.cuda.memory_hooks.DebugPrintHook([...])

Memory hook that prints debug information.

cupy.cuda.memory_hooks.LineProfileHook([...])

Line-by-line CuPy memory profiler.

Streams and events#

cupy.cuda.Stream([null, non_blocking, ptds, ...])

CUDA stream.

cupy.cuda.ExternalStream(ptr[, device_id])

CUDA stream not managed by CuPy.

cupy.cuda.get_current_stream(int device_id=-1)

Gets the current CUDA stream for the specified CUDA device.

cupy.cuda.Event([block, disable_timing, ...])

CUDA event, a synchronization point of CUDA streams.

cupy.cuda.get_elapsed_time(start_event, ...)

Gets the elapsed time between two events.

Graphs#

cupy.cuda.Graph(*args, **kwargs)

The CUDA graph object.

Texture and surface memory#

cupy.cuda.texture.ChannelFormatDescriptor(...)

A class that holds the channel format description.

cupy.cuda.texture.CUDAarray(...)

Allocate a CUDA array (cudaArray_t) that can be used as texture memory.

cupy.cuda.texture.ResourceDescriptor(...)

A class that holds the resource description.

cupy.cuda.texture.TextureDescriptor([...])

A class that holds the texture description.

cupy.cuda.texture.TextureObject(...)

A class that holds a texture object.

cupy.cuda.texture.SurfaceObject(...)

A class that holds a surface object.

NVTX#

cupy.cuda.nvtx.Mark(message, int id_color=-1)

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.MarkC(message, uint32_t color=0)

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.RangePush(message, ...)

Starts a nested range.

cupy.cuda.nvtx.RangePushC(message, ...)

Starts a nested range.

cupy.cuda.nvtx.RangePop()

Ends a nested range started by a RangePush*() call.

NCCL#

cupy.cuda.nccl.NcclCommunicator(int ndev, ...)

Initialize an NCCL communicator for one device controlled by one process.

cupy.cuda.nccl.get_build_version()

Returns the version of NCCL that CuPy was built with.

cupy.cuda.nccl.get_version()

Returns the runtime version of NCCL.

cupy.cuda.nccl.get_unique_id()

Returns a unique ID used to initialize an NCCL communicator.

cupy.cuda.nccl.groupStart()

Start a group of NCCL calls.

cupy.cuda.nccl.groupEnd()

End a group of NCCL calls.

Version#

cupy.cuda.get_local_runtime_version()

Returns the version of the CUDA Runtime installed in the environment.

Runtime API#

CuPy wraps the CUDA Runtime API to provide native CUDA operations. Refer to the CUDA Runtime API documentation for the detailed semantics of these functions.

cupy.cuda.runtime.driverGetVersion()

cupy.cuda.runtime.runtimeGetVersion()

Returns the version of the CUDA Runtime statically linked to CuPy.

cupy.cuda.runtime.getDevice()

cupy.cuda.runtime.getDeviceProperties(int device)

cupy.cuda.runtime.deviceGetAttribute(...)

cupy.cuda.runtime.deviceGetByPCIBusId(...)

cupy.cuda.runtime.deviceGetPCIBusId(int device)

cupy.cuda.runtime.deviceGetDefaultMemPool(...)

Get the default mempool on the current device.

cupy.cuda.runtime.deviceGetMemPool(int device)

Get the current mempool on the current device.

cupy.cuda.runtime.deviceSetMemPool(...)

Set the current mempool on the current device to pool.

cupy.cuda.runtime.memPoolCreate(...)

cupy.cuda.runtime.memPoolDestroy(intptr_t pool)

cupy.cuda.runtime.memPoolTrimTo(...)

cupy.cuda.runtime.getDeviceCount()

cupy.cuda.runtime.setDevice(int device)

cupy.cuda.runtime.deviceSynchronize()

cupy.cuda.runtime.deviceCanAccessPeer(...)

cupy.cuda.runtime.deviceEnablePeerAccess(...)

cupy.cuda.runtime.deviceGetLimit(int limit)

cupy.cuda.runtime.deviceSetLimit(int limit, ...)

cupy.cuda.runtime.malloc(size_t size)

cupy.cuda.runtime.mallocManaged(size_t size, ...)

cupy.cuda.runtime.malloc3DArray(...)

cupy.cuda.runtime.mallocArray(...)

cupy.cuda.runtime.mallocAsync(size_t size, ...)

cupy.cuda.runtime.mallocFromPoolAsync(...)

cupy.cuda.runtime.hostAlloc(size_t size, ...)

cupy.cuda.runtime.hostRegister(intptr_t ptr, ...)

cupy.cuda.runtime.hostUnregister(intptr_t ptr)

cupy.cuda.runtime.free(intptr_t ptr)

cupy.cuda.runtime.freeHost(intptr_t ptr)

cupy.cuda.runtime.freeArray(intptr_t ptr)

cupy.cuda.runtime.freeAsync(intptr_t ptr, ...)

cupy.cuda.runtime.memGetInfo()

cupy.cuda.runtime.memcpy(intptr_t dst, ...)

cupy.cuda.runtime.memcpyAsync(intptr_t dst, ...)

cupy.cuda.runtime.memcpyPeer(intptr_t dst, ...)

cupy.cuda.runtime.memcpyPeerAsync(...)

cupy.cuda.runtime.memcpy2D(intptr_t dst, ...)

cupy.cuda.runtime.memcpy2DAsync(...)

cupy.cuda.runtime.memcpy2DFromArray(...)

cupy.cuda.runtime.memcpy2DFromArrayAsync(...)

cupy.cuda.runtime.memcpy2DToArray(...)

cupy.cuda.runtime.memcpy2DToArrayAsync(...)

cupy.cuda.runtime.memcpy3D(...)

cupy.cuda.runtime.memcpy3DAsync(...)

cupy.cuda.runtime.memset(intptr_t ptr, ...)

cupy.cuda.runtime.memsetAsync(intptr_t ptr, ...)

cupy.cuda.runtime.memPrefetchAsync(...)

cupy.cuda.runtime.memAdvise(intptr_t devPtr, ...)

cupy.cuda.runtime.pointerGetAttributes(...)

cupy.cuda.runtime.streamCreate()

cupy.cuda.runtime.streamCreateWithFlags(...)

cupy.cuda.runtime.streamCreateWithPriority(...)

cupy.cuda.runtime.streamDestroy(intptr_t stream)

cupy.cuda.runtime.streamSynchronize(...)

cupy.cuda.runtime.streamAddCallback(...)

cupy.cuda.runtime.streamQuery(intptr_t stream)

cupy.cuda.runtime.streamWaitEvent(...)

cupy.cuda.runtime.launchHostFunc(...)

cupy.cuda.runtime.eventCreate()

cupy.cuda.runtime.eventCreateWithFlags(...)

cupy.cuda.runtime.eventDestroy(intptr_t event)

cupy.cuda.runtime.eventElapsedTime(...)

cupy.cuda.runtime.eventQuery(intptr_t event)

cupy.cuda.runtime.eventRecord(...)

cupy.cuda.runtime.eventSynchronize(...)

cupy.cuda.runtime.ipcGetMemHandle(...)

cupy.cuda.runtime.ipcOpenMemHandle(...)

cupy.cuda.runtime.ipcCloseMemHandle(...)

cupy.cuda.runtime.ipcGetEventHandle(...)

cupy.cuda.runtime.ipcOpenEventHandle(...)

cupy.cuda.runtime.profilerStart()

Enable profiling.

cupy.cuda.runtime.profilerStop()

Disable profiling.