Low-Level CUDA Support

Device management

cupy.cuda.Device Object that represents a CUDA device.
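
As a minimal sketch of how a Device object might be used, the snippet below selects a device with a with statement and reads a few of its properties. The helper name describe_device and the HAVE_GPU guard are illustrative assumptions, not part of CuPy; the guard only lets the snippet load on machines without CuPy or a GPU.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no usable CUDA driver
    cp = None
    HAVE_GPU = False


def describe_device(device_id=0):
    """Hypothetical helper: basic facts about one device, or None without a GPU."""
    if not HAVE_GPU:
        return None
    dev = cp.cuda.Device(device_id)
    # Device works as a context manager: it is the current device inside
    # the block, and the previous device is restored on exit.
    with dev:
        free_bytes, total_bytes = dev.mem_info
        return {
            "id": dev.id,
            "compute_capability": dev.compute_capability,
            "free_bytes": free_bytes,
            "total_bytes": total_bytes,
        }


print(describe_device())
```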

Memory management

cupy.get_default_memory_pool Returns CuPy default memory pool for GPU memory.
cupy.get_default_pinned_memory_pool Returns CuPy default memory pool for pinned memory.
cupy.cuda.Memory Memory allocation on a CUDA device.
cupy.cuda.UnownedMemory CUDA memory that is not owned by CuPy.
cupy.cuda.PinnedMemory Pinned memory allocation on host.
cupy.cuda.MemoryPointer Pointer to a location in device memory.
cupy.cuda.PinnedMemoryPointer Pointer to pinned memory.
cupy.cuda.alloc Calls the current allocator.
cupy.cuda.alloc_pinned_memory Calls the current allocator for pinned memory.
cupy.cuda.get_allocator Returns the current allocator for GPU memory.
cupy.cuda.set_allocator Sets the current allocator for GPU memory.
cupy.cuda.using_allocator Sets a thread-local allocator for GPU memory inside a context manager.
cupy.cuda.set_pinned_memory_allocator Sets the current allocator for the pinned memory.
cupy.cuda.MemoryPool Memory pool for all GPU devices on the host.
cupy.cuda.PinnedMemoryPool Memory pool for pinned memory on the host.
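
The allocator functions and pool classes above can be combined. The following is a rough sketch, assuming a single GPU, that routes allocations through a private MemoryPool with using_allocator and then inspects the pool. The helper name private_pool_demo and the HAVE_GPU guard are hypothetical and exist only for illustration.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no usable CUDA driver
    cp = None
    HAVE_GPU = False


def private_pool_demo(n=1024):
    """Hypothetical helper: allocate through a private pool, report its usage."""
    if not HAVE_GPU:
        return None
    pool = cp.cuda.MemoryPool()
    # Inside this block, allocations made by the current thread go
    # through pool.malloc instead of the default memory pool.
    with cp.cuda.using_allocator(pool.malloc):
        x = cp.zeros(n, dtype=cp.float32)
        used = pool.used_bytes()
    nbytes = x.nbytes
    del x                    # hand the block back to the pool
    pool.free_all_blocks()   # ask the pool to release its cached blocks
    return nbytes, used, pool.total_bytes()


print(private_pool_demo())
```

The pool may round allocation sizes up, so the reported usage is expected to be at least the array size rather than exactly equal to it.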

Memory hook

cupy.cuda.MemoryHook Base class of hooks for Memory allocations.
cupy.cuda.memory_hooks.DebugPrintHook Memory hook that prints debug information.
cupy.cuda.memory_hooks.LineProfileHook Line-by-line CuPy memory profiler.

Streams and events

cupy.cuda.Stream CUDA stream.
cupy.cuda.get_current_stream Gets current CUDA stream.
cupy.cuda.Event CUDA event, a synchronization point of CUDA streams.
cupy.cuda.get_elapsed_time Gets the elapsed time between two events.
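
One plausible way to use these together is to time work on a non-default stream with a pair of events. This is only a sketch: the helper name time_on_stream and the HAVE_GPU guard are assumptions made for the example, and the events are recorded on the stream explicitly rather than relying on a default.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no usable CUDA driver
    cp = None
    HAVE_GPU = False


def time_on_stream(n=1 << 20):
    """Hypothetical helper: time a small computation on its own stream."""
    if not HAVE_GPU:
        return None
    stream = cp.cuda.Stream(non_blocking=True)
    start, stop = cp.cuda.Event(), cp.cuda.Event()
    with stream:
        # Within the block, the stream is the current stream.
        on_stream = cp.cuda.get_current_stream().ptr == stream.ptr
        start.record(stream)
        x = cp.arange(n, dtype=cp.float32)
        y = x * 2
        stop.record(stream)
    stop.synchronize()  # wait until the stop event has been reached
    elapsed_ms = cp.cuda.get_elapsed_time(start, stop)
    return on_stream, elapsed_ms


print(time_on_stream())
```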

Texture memory

cupy.cuda.texture.ChannelFormatDescriptor A class that holds the channel format description.
cupy.cuda.texture.CUDAarray Allocate a CUDA array (cudaArray_t) that can be used as texture memory.
cupy.cuda.texture.ResourceDescriptor A class that holds the resource description.
cupy.cuda.texture.TextureDescriptor A class that holds the texture description.
cupy.cuda.texture.TextureObject A class that holds a texture object.
cupy.cuda.texture.TextureReference A class that holds a texture reference.

Profiler

cupy.cuda.profile Enables CUDA profiling for the duration of a with statement.
cupy.cuda.profiler.initialize Initialize the CUDA profiler.
cupy.cuda.profiler.start Enable profiling.
cupy.cuda.profiler.stop Disable profiling.
cupy.cuda.nvtx.Mark Marks an instantaneous event (marker) in the application.
cupy.cuda.nvtx.MarkC Marks an instantaneous event (marker) in the application.
cupy.cuda.nvtx.RangePush Starts a nested range.
cupy.cuda.nvtx.RangePushC Starts a nested range.
cupy.cuda.nvtx.RangePop Ends a nested range.
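
As an illustration of the NVTX range functions, the sketch below brackets a computation with RangePush and RangePop so that it would show up as a named range in an NVTX-aware profiler. The helper name annotated_work, the range label, and the HAVE_GPU guard are hypothetical; the snippet also assumes the installed CuPy build includes NVTX support.

```python
try:
    import cupy as cp
    from cupy.cuda import nvtx
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no usable CUDA driver
    cp = None
    nvtx = None
    HAVE_GPU = False


def annotated_work(n=1 << 16):
    """Hypothetical helper: sum 0..n-1 on the GPU inside an NVTX range."""
    if not HAVE_GPU:
        return None
    nvtx.RangePush("annotated_work")  # open a nested range
    try:
        x = cp.arange(n, dtype=cp.int64)
        result = int(x.sum())
    finally:
        nvtx.RangePop()  # always close the range, even on error
    return result


print(annotated_work())
```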

NCCL

cupy.cuda.nccl.NcclCommunicator Initialize an NCCL communicator for one device controlled by one process.
cupy.cuda.nccl.get_build_version
cupy.cuda.nccl.get_version Returns the runtime version of NCCL.
cupy.cuda.nccl.get_unique_id
cupy.cuda.nccl.groupStart Start a group of NCCL calls.
cupy.cuda.nccl.groupEnd End a group of NCCL calls.

Runtime API

CuPy wraps the CUDA Runtime API to expose native CUDA operations. Refer to the original CUDA Runtime API documentation for details on how to use these functions.

cupy.cuda.runtime.driverGetVersion
cupy.cuda.runtime.runtimeGetVersion
cupy.cuda.runtime.getDevice
cupy.cuda.runtime.deviceGetAttribute
cupy.cuda.runtime.deviceGetByPCIBusId
cupy.cuda.runtime.deviceGetPCIBusId
cupy.cuda.runtime.getDeviceCount
cupy.cuda.runtime.setDevice
cupy.cuda.runtime.deviceSynchronize
cupy.cuda.runtime.deviceCanAccessPeer
cupy.cuda.runtime.deviceEnablePeerAccess
cupy.cuda.runtime.malloc
cupy.cuda.runtime.mallocManaged
cupy.cuda.runtime.malloc3DArray
cupy.cuda.runtime.mallocArray
cupy.cuda.runtime.hostAlloc
cupy.cuda.runtime.hostRegister
cupy.cuda.runtime.hostUnregister
cupy.cuda.runtime.free
cupy.cuda.runtime.freeHost
cupy.cuda.runtime.freeArray
cupy.cuda.runtime.memGetInfo
cupy.cuda.runtime.memcpy
cupy.cuda.runtime.memcpyAsync
cupy.cuda.runtime.memcpyPeer
cupy.cuda.runtime.memcpyPeerAsync
cupy.cuda.runtime.memcpy2D
cupy.cuda.runtime.memcpy2DAsync
cupy.cuda.runtime.memcpy2DFromArray
cupy.cuda.runtime.memcpy2DFromArrayAsync
cupy.cuda.runtime.memcpy2DToArray
cupy.cuda.runtime.memcpy2DToArrayAsync
cupy.cuda.runtime.memcpy3D
cupy.cuda.runtime.memcpy3DAsync
cupy.cuda.runtime.memset
cupy.cuda.runtime.memsetAsync
cupy.cuda.runtime.memPrefetchAsync
cupy.cuda.runtime.memAdvise
cupy.cuda.runtime.pointerGetAttributes
cupy.cuda.runtime.streamCreate
cupy.cuda.runtime.streamCreateWithFlags
cupy.cuda.runtime.streamDestroy
cupy.cuda.runtime.streamSynchronize
cupy.cuda.runtime.streamAddCallback
cupy.cuda.runtime.streamQuery
cupy.cuda.runtime.streamWaitEvent
cupy.cuda.runtime.eventCreate
cupy.cuda.runtime.eventCreateWithFlags
cupy.cuda.runtime.eventDestroy
cupy.cuda.runtime.eventElapsedTime
cupy.cuda.runtime.eventQuery
cupy.cuda.runtime.eventRecord
cupy.cuda.runtime.eventSynchronize
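
Because these wrappers operate on raw integer pointers, a short sketch may help. The example below allocates device memory with malloc, fills it with memset, and copies it back to a host buffer with memcpy. The helper name memset_roundtrip and the HAVE_GPU guard are assumptions for illustration, and the host buffer is a plain ctypes array rather than anything CuPy-specific.

```python
import ctypes

try:
    from cupy.cuda import runtime
    HAVE_GPU = runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no usable CUDA driver
    runtime = None
    HAVE_GPU = False


def memset_roundtrip(nbytes=256, value=7):
    """Hypothetical helper: fill device memory and copy it back to the host."""
    if not HAVE_GPU:
        return None
    host = (ctypes.c_ubyte * nbytes)()  # zero-initialized host buffer
    ptr = runtime.malloc(nbytes)        # raw device pointer as an integer
    try:
        runtime.memset(ptr, value, nbytes)  # set every byte to `value`
        runtime.memcpy(ctypes.addressof(host), ptr, nbytes,
                       runtime.memcpyDeviceToHost)
    finally:
        runtime.free(ptr)  # raw allocations are not garbage-collected
    return bytes(host)


print(memset_roundtrip())
```

Memory obtained this way bypasses CuPy's memory pool, so it has to be released explicitly with free.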