Low-level CUDA support

Device management

cupy.cuda.Device([device])

Object that represents a CUDA device.

Memory management

cupy.get_default_memory_pool()

Returns the CuPy default memory pool for GPU memory.

cupy.get_default_pinned_memory_pool()

Returns the CuPy default memory pool for pinned memory.

cupy.cuda.Memory(size_t size)

Memory allocation on a CUDA device.

cupy.cuda.MemoryAsync(size_t size, stream)

Asynchronous memory allocation on a CUDA device.

cupy.cuda.ManagedMemory(size_t size)

Managed memory (Unified memory) allocation on a CUDA device.

cupy.cuda.UnownedMemory(intptr_t ptr, …)

CUDA memory that is not owned by CuPy.

cupy.cuda.PinnedMemory(size[, flags])

Pinned memory allocation on host.

cupy.cuda.MemoryPointer(BaseMemory mem, …)

Pointer to a location in device memory.

cupy.cuda.PinnedMemoryPointer(mem, …)

Pointer to pinned memory.

cupy.cuda.malloc_managed(size_t size)

Allocates managed memory (unified memory).

cupy.cuda.malloc_async(size_t size)

(Experimental) Allocates memory from the Stream Ordered Memory Allocator.

cupy.cuda.alloc(size)

Calls the current allocator.

cupy.cuda.alloc_pinned_memory(size_t size)

Calls the current allocator.

cupy.cuda.get_allocator()

Returns the current allocator for GPU memory.

cupy.cuda.set_allocator([allocator])

Sets the current allocator for GPU memory.

cupy.cuda.using_allocator([allocator])

Sets a thread-local allocator for GPU memory inside a with statement.

cupy.cuda.set_pinned_memory_allocator([…])

Sets the current allocator for the pinned memory.

cupy.cuda.MemoryPool([allocator])

Memory pool for all GPU devices on the host.

cupy.cuda.MemoryAsyncPool([pool_handles])

(Experimental) CUDA memory pool for all GPU devices on the host.

cupy.cuda.PinnedMemoryPool([allocator])

Memory pool for pinned memory on the host.

cupy.cuda.PythonFunctionAllocator(…)

Allocator using Python functions to perform memory allocation.

cupy.cuda.CFunctionAllocator(intptr_t param, …)

Allocator with C function pointers to allocation routines.

Memory hook

cupy.cuda.MemoryHook()

Base class of hooks for memory allocations.

cupy.cuda.memory_hooks.DebugPrintHook([…])

Memory hook that prints debug information.

cupy.cuda.memory_hooks.LineProfileHook([…])

Line-by-line CuPy memory profiler.

Streams and events

cupy.cuda.Stream([null, non_blocking, ptds])

CUDA stream.

cupy.cuda.ExternalStream(ptr[, device_id])

CUDA stream not managed by CuPy.

cupy.cuda.get_current_stream()

Gets the current CUDA stream.

cupy.cuda.Event([block, disable_timing, …])

CUDA event, a synchronization point of CUDA streams.

cupy.cuda.get_elapsed_time(start_event, …)

Gets the elapsed time between two events.

Texture and surface memory

cupy.cuda.texture.ChannelFormatDescriptor(…)

A class that holds the channel format description.

cupy.cuda.texture.CUDAarray(…)

Allocate a CUDA array (cudaArray_t) that can be used as texture memory.

cupy.cuda.texture.ResourceDescriptor(…)

A class that holds the resource description.

cupy.cuda.texture.TextureDescriptor([…])

A class that holds the texture description.

cupy.cuda.texture.TextureObject(…)

A class that holds a texture object.

cupy.cuda.texture.SurfaceObject(…)

A class that holds a surface object.

cupy.cuda.texture.TextureReference(…)

A class that holds a texture reference.

Profiler

cupy.cuda.profile()

Enables CUDA profiling within a with statement.

cupy.cuda.profiler.initialize(…)

Initialize the CUDA profiler.

cupy.cuda.profiler.start()

Enables profiling.

cupy.cuda.profiler.stop()

Disables profiling.

cupy.cuda.nvtx.Mark(message, int id_color=-1)

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.MarkC(message, uint32_t color=0)

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.RangePush(message, …)

Starts a nested range.

cupy.cuda.nvtx.RangePushC(message, …)

Starts a nested range.

cupy.cuda.nvtx.RangePop()

Ends a nested range.

NCCL

cupy.cuda.nccl.NcclCommunicator(int ndev, …)

Initialize an NCCL communicator for one device controlled by one process.

cupy.cuda.nccl.get_build_version()

cupy.cuda.nccl.get_version()

Returns the runtime version of NCCL.

cupy.cuda.nccl.get_unique_id()

cupy.cuda.nccl.groupStart()

Starts a group of NCCL calls.

cupy.cuda.nccl.groupEnd()

Ends a group of NCCL calls.

Runtime API

CuPy wraps the CUDA Runtime API to provide access to native CUDA operations. Refer to the CUDA Runtime API documentation for the semantics of each function.

cupy.cuda.runtime.driverGetVersion()

cupy.cuda.runtime.runtimeGetVersion()

cupy.cuda.runtime.getDevice()

cupy.cuda.runtime.getDeviceProperties(int device)

cupy.cuda.runtime.deviceGetAttribute(…)

cupy.cuda.runtime.deviceGetByPCIBusId(…)

cupy.cuda.runtime.deviceGetPCIBusId(int device)

cupy.cuda.runtime.deviceGetDefaultMemPool(…)

Get the default mempool on the current device.

cupy.cuda.runtime.deviceGetMemPool(int device)

Get the current mempool on the current device.

cupy.cuda.runtime.deviceSetMemPool(…)

Set the current mempool on the current device to pool.

cupy.cuda.runtime.memPoolTrimTo(…)

cupy.cuda.runtime.getDeviceCount()

cupy.cuda.runtime.setDevice(int device)

cupy.cuda.runtime.deviceSynchronize()

cupy.cuda.runtime.deviceCanAccessPeer(…)

cupy.cuda.runtime.deviceEnablePeerAccess(…)

cupy.cuda.runtime.deviceGetLimit(int limit)

cupy.cuda.runtime.deviceSetLimit(int limit, …)

cupy.cuda.runtime.malloc(size_t size)

cupy.cuda.runtime.mallocManaged(size_t size, …)

cupy.cuda.runtime.malloc3DArray(…)

cupy.cuda.runtime.mallocArray(…)

cupy.cuda.runtime.mallocAsync(size_t size, …)

cupy.cuda.runtime.hostAlloc(size_t size, …)

cupy.cuda.runtime.hostRegister(intptr_t ptr, …)

cupy.cuda.runtime.hostUnregister(intptr_t ptr)

cupy.cuda.runtime.free(intptr_t ptr)

cupy.cuda.runtime.freeHost(intptr_t ptr)

cupy.cuda.runtime.freeArray(intptr_t ptr)

cupy.cuda.runtime.freeAsync(intptr_t ptr, …)

cupy.cuda.runtime.memGetInfo()

cupy.cuda.runtime.memcpy(intptr_t dst, …)

cupy.cuda.runtime.memcpyAsync(intptr_t dst, …)

cupy.cuda.runtime.memcpyPeer(intptr_t dst, …)

cupy.cuda.runtime.memcpyPeerAsync(…)

cupy.cuda.runtime.memcpy2D(intptr_t dst, …)

cupy.cuda.runtime.memcpy2DAsync(…)

cupy.cuda.runtime.memcpy2DFromArray(…)

cupy.cuda.runtime.memcpy2DFromArrayAsync(…)

cupy.cuda.runtime.memcpy2DToArray(…)

cupy.cuda.runtime.memcpy2DToArrayAsync(…)

cupy.cuda.runtime.memcpy3D(…)

cupy.cuda.runtime.memcpy3DAsync(…)

cupy.cuda.runtime.memset(intptr_t ptr, …)

cupy.cuda.runtime.memsetAsync(intptr_t ptr, …)

cupy.cuda.runtime.memPrefetchAsync(…)

cupy.cuda.runtime.memAdvise(intptr_t devPtr, …)

cupy.cuda.runtime.pointerGetAttributes(…)

cupy.cuda.runtime.streamCreate()

cupy.cuda.runtime.streamCreateWithFlags(…)

cupy.cuda.runtime.streamDestroy(intptr_t stream)

cupy.cuda.runtime.streamSynchronize(…)

cupy.cuda.runtime.streamAddCallback(…)

cupy.cuda.runtime.streamQuery(intptr_t stream)

cupy.cuda.runtime.streamWaitEvent(…)

cupy.cuda.runtime.launchHostFunc(…)

cupy.cuda.runtime.eventCreate()

cupy.cuda.runtime.eventCreateWithFlags(…)

cupy.cuda.runtime.eventDestroy(intptr_t event)

cupy.cuda.runtime.eventElapsedTime(…)

cupy.cuda.runtime.eventQuery(intptr_t event)

cupy.cuda.runtime.eventRecord(…)

cupy.cuda.runtime.eventSynchronize(…)

cupy.cuda.runtime.ipcGetMemHandle(…)

cupy.cuda.runtime.ipcOpenMemHandle(…)

cupy.cuda.runtime.ipcCloseMemHandle(…)

cupy.cuda.runtime.ipcGetEventHandle(…)

cupy.cuda.runtime.ipcOpenEventHandle(…)