Low-Level CUDA Support

Device management

`cupy.cuda.Device`	Object that represents a CUDA device.

Memory management

`cupy.get_default_memory_pool`	Returns CuPy default memory pool for GPU memory.
`cupy.get_default_pinned_memory_pool`	Returns CuPy default memory pool for pinned memory.
`cupy.cuda.Memory`	Memory allocation on a CUDA device.
`cupy.cuda.UnownedMemory`	CUDA memory that is not owned by CuPy.
`cupy.cuda.PinnedMemory`	Pinned memory allocation on host.
`cupy.cuda.MemoryPointer`	Pointer to a location in device memory.
`cupy.cuda.PinnedMemoryPointer`	Pointer to pinned memory.
`cupy.cuda.alloc`	Calls the current allocator.
`cupy.cuda.alloc_pinned_memory`	Calls the current allocator for pinned memory.
`cupy.cuda.get_allocator`	Returns the current allocator for GPU memory.
`cupy.cuda.set_allocator`	Sets the current allocator for GPU memory.
`cupy.cuda.using_allocator`	Sets a thread-local allocator for GPU memory inside a context manager.
`cupy.cuda.set_pinned_memory_allocator`	Sets the current allocator for the pinned memory.
`cupy.cuda.MemoryPool`	Memory pool for all GPU devices on the host.
`cupy.cuda.PinnedMemoryPool`	Memory pool for pinned memory on the host.
`cupy.cuda.PythonFunctionAllocator`	Allocator that uses Python functions to perform memory allocation.
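
As a rough illustration of how the default memory pool listed above behaves, the following sketch allocates an array, reads the pool counters, and then releases the cached blocks. The helper name `pool_usage_after_alloc` is hypothetical and not part of CuPy. It assumes a CUDA-capable device is available and that the default pool is the active allocator.

```python
def pool_usage_after_alloc(n_floats=1024):
    """Allocate a float32 array and report default-pool usage in bytes.

    Hypothetical helper for illustration; requires CuPy and a CUDA device.
    """
    # Imported lazily so this file can be loaded on a machine without CuPy.
    import cupy as cp

    pool = cp.get_default_memory_pool()

    # With the default allocator, this array is served from the pool.
    x = cp.zeros(n_floats, dtype=cp.float32)
    used_while_alive = pool.used_bytes()

    # Dropping the array returns its block to the pool, not to the driver.
    del x

    # Release cached, unused blocks back to the device.
    pool.free_all_blocks()

    return used_while_alive, pool.total_bytes()
```

Calling `pool_usage_after_alloc()` returns a pair of byte counts: the bytes in use while the array was alive, and the bytes still held by the pool after `free_all_blocks()`. The second value can be nonzero if other arrays are still alive.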

Memory hook

`cupy.cuda.MemoryHook`	Base class of hooks for memory allocations.
`cupy.cuda.memory_hooks.DebugPrintHook`	Memory hook that prints debug information.
`cupy.cuda.memory_hooks.LineProfileHook`	Code line CuPy memory profiler.

Streams and events

`cupy.cuda.Stream`	CUDA stream.
`cupy.cuda.ExternalStream`	CUDA stream not managed by CuPy.
`cupy.cuda.get_current_stream`	Gets current CUDA stream.
`cupy.cuda.Event`	CUDA event, a synchronization point of CUDA streams.
`cupy.cuda.get_elapsed_time`	Gets the elapsed time between two events.
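
The stream and event classes above are commonly combined to time GPU work. The sketch below is one way to do that, under the assumption that a CUDA device is present. `time_on_stream` is a hypothetical helper name, and the workload inside it is arbitrary.

```python
def time_on_stream(n=1_000_000):
    """Time a small computation on a dedicated stream, in milliseconds.

    Hypothetical helper for illustration; requires CuPy and a CUDA device.
    """
    # Imported lazily so this file can be loaded on a machine without CuPy.
    import cupy as cp

    stream = cp.cuda.Stream(non_blocking=True)
    start = cp.cuda.Event()
    stop = cp.cuda.Event()

    # Inside the with block, `stream` is the current stream, so both the
    # kernels and the event records are enqueued on it.
    with stream:
        start.record()
        x = cp.arange(n, dtype=cp.float32)
        total = (x * 2).sum()
        stop.record()

    # Block the host until the stop event has been reached.
    stop.synchronize()

    return cp.cuda.get_elapsed_time(start, stop)
```

The returned value is the elapsed time in milliseconds between the two events.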

Texture and surface memory

`cupy.cuda.texture.ChannelFormatDescriptor`	A class that holds the channel format description.
`cupy.cuda.texture.CUDAarray`	Allocate a CUDA array (cudaArray_t) that can be used as texture memory.
`cupy.cuda.texture.ResourceDescriptor`	A class that holds the resource description.
`cupy.cuda.texture.TextureDescriptor`	A class that holds the texture description.
`cupy.cuda.texture.TextureObject`	A class that holds a texture object.
`cupy.cuda.texture.SurfaceObject`	A class that holds a surface object.
`cupy.cuda.texture.TextureReference`	A class that holds a texture reference.

Profiler

`cupy.cuda.profile`	Enables CUDA profiling within a with statement.
`cupy.cuda.profiler.initialize`	Initialize the CUDA profiler.
`cupy.cuda.profiler.start`	Enable profiling.
`cupy.cuda.profiler.stop`	Disable profiling.
`cupy.cuda.nvtx.Mark`	Marks an instantaneous event (marker) in the application.
`cupy.cuda.nvtx.MarkC`	Marks an instantaneous event (marker) in the application.
`cupy.cuda.nvtx.RangePush`	Starts a nested range.
`cupy.cuda.nvtx.RangePushC`	Starts a nested range.
`cupy.cuda.nvtx.RangePop`	Ends a nested range.

NCCL

`cupy.cuda.nccl.NcclCommunicator`	Initialize an NCCL communicator for one device controlled by one process.
`cupy.cuda.nccl.get_build_version`
`cupy.cuda.nccl.get_version`	Returns the runtime version of NCCL.
`cupy.cuda.nccl.get_unique_id`
`cupy.cuda.nccl.groupStart`	Start a group of NCCL calls.
`cupy.cuda.nccl.groupEnd`	End a group of NCCL calls.

Runtime API

CuPy wraps the CUDA Runtime APIs to provide native CUDA operations. Refer to the original CUDA Runtime API documentation for how to use these functions.

`cupy.cuda.runtime.driverGetVersion`
`cupy.cuda.runtime.runtimeGetVersion`
`cupy.cuda.runtime.getDevice`
`cupy.cuda.runtime.deviceGetAttribute`
`cupy.cuda.runtime.deviceGetByPCIBusId`
`cupy.cuda.runtime.deviceGetPCIBusId`
`cupy.cuda.runtime.getDeviceCount`
`cupy.cuda.runtime.setDevice`
`cupy.cuda.runtime.deviceSynchronize`
`cupy.cuda.runtime.deviceCanAccessPeer`
`cupy.cuda.runtime.deviceEnablePeerAccess`
`cupy.cuda.runtime.deviceGetLimit`
`cupy.cuda.runtime.deviceSetLimit`
`cupy.cuda.runtime.malloc`
`cupy.cuda.runtime.mallocManaged`
`cupy.cuda.runtime.malloc3DArray`
`cupy.cuda.runtime.mallocArray`
`cupy.cuda.runtime.hostAlloc`
`cupy.cuda.runtime.hostRegister`
`cupy.cuda.runtime.hostUnregister`
`cupy.cuda.runtime.free`
`cupy.cuda.runtime.freeHost`
`cupy.cuda.runtime.freeArray`
`cupy.cuda.runtime.memGetInfo`
`cupy.cuda.runtime.memcpy`
`cupy.cuda.runtime.memcpyAsync`
`cupy.cuda.runtime.memcpyPeer`
`cupy.cuda.runtime.memcpyPeerAsync`
`cupy.cuda.runtime.memcpy2D`
`cupy.cuda.runtime.memcpy2DAsync`
`cupy.cuda.runtime.memcpy2DFromArray`
`cupy.cuda.runtime.memcpy2DFromArrayAsync`
`cupy.cuda.runtime.memcpy2DToArray`
`cupy.cuda.runtime.memcpy2DToArrayAsync`
`cupy.cuda.runtime.memcpy3D`
`cupy.cuda.runtime.memcpy3DAsync`
`cupy.cuda.runtime.memset`
`cupy.cuda.runtime.memsetAsync`
`cupy.cuda.runtime.memPrefetchAsync`
`cupy.cuda.runtime.memAdvise`
`cupy.cuda.runtime.pointerGetAttributes`
`cupy.cuda.runtime.streamCreate`
`cupy.cuda.runtime.streamCreateWithFlags`
`cupy.cuda.runtime.streamDestroy`
`cupy.cuda.runtime.streamSynchronize`
`cupy.cuda.runtime.streamAddCallback`
`cupy.cuda.runtime.streamQuery`
`cupy.cuda.runtime.streamWaitEvent`
`cupy.cuda.runtime.eventCreate`
`cupy.cuda.runtime.eventCreateWithFlags`
`cupy.cuda.runtime.eventDestroy`
`cupy.cuda.runtime.eventElapsedTime`
`cupy.cuda.runtime.eventQuery`
`cupy.cuda.runtime.eventRecord`
`cupy.cuda.runtime.eventSynchronize`
`cupy.cuda.runtime.ipcGetMemHandle`
`cupy.cuda.runtime.ipcOpenMemHandle`
`cupy.cuda.runtime.ipcCloseMemHandle`
`cupy.cuda.runtime.ipcGetEventHandle`
`cupy.cuda.runtime.ipcOpenEventHandle`
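
As a small illustration of the runtime wrappers listed above, the sketch below queries version and memory information. The helper names `decode_cuda_version` and `runtime_summary` are hypothetical. The decoding relies on CUDA's convention of encoding versions as `1000 * major + 10 * minor`, and `runtime_summary` assumes a CUDA device is available.

```python
def decode_cuda_version(version):
    """Split a CUDA version integer (1000 * major + 10 * minor) into a pair."""
    return version // 1000, (version % 1000) // 10


def runtime_summary():
    """Collect basic facts through the thin runtime wrappers.

    Hypothetical helper for illustration; requires CuPy and a CUDA device.
    """
    # Imported lazily so this file can be loaded on a machine without CuPy.
    from cupy.cuda import runtime

    free_bytes, total_bytes = runtime.memGetInfo()
    return {
        "driver": decode_cuda_version(runtime.driverGetVersion()),
        "runtime": decode_cuda_version(runtime.runtimeGetVersion()),
        "device_count": runtime.getDeviceCount(),
        "current_device": runtime.getDevice(),
        "free_bytes": free_bytes,
        "total_bytes": total_bytes,
    }
```

These wrappers follow the C API closely, so argument meanings and error conditions are those described in the CUDA Runtime API documentation.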