Custom kernels

`cupy.ElementwiseKernel`(in_params, ...[, ...])	User-defined elementwise kernel.
`cupy.ReductionKernel`(str in_params, ...[, ...])	User-defined reduction kernel.
`cupy.RawKernel`(str code, str name, ...[, jitify])	User-defined custom kernel.
`cupy.RawModule`(str code=None, *, ...[, ...])	User-defined custom module.
`cupy.fuse`(*args, **kwargs)	Decorator that fuses a function.
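As a minimal sketch of how the first three kernel classes above are typically used, the example below defines an elementwise kernel, a reduction kernel, and a raw CUDA kernel, then runs each on small arrays. The wrapper name `run_custom_kernels`, the kernel names, and the array sizes are illustrative choices, not part of the API; the sketch assumes a working CUDA device.

```python
def run_custom_kernels():
    # CuPy is imported inside the function so that merely loading this
    # snippet does not require a CUDA installation.
    import cupy as cp

    # Elementwise kernel: z = (x - y)^2, computed per element.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',   # input parameters
        'float32 z',              # output parameters
        'z = (x - y) * (x - y)',  # per-element operation (CUDA C++)
        'squared_diff')           # kernel name

    # Reduction kernel: L2 norm along an axis, generic in the dtype T.
    l2norm = cp.ReductionKernel(
        'T x',          # input parameters
        'T y',          # output parameters
        'x * x',        # map expression
        'a + b',        # reduce expression
        'y = sqrt(a)',  # post-reduction map
        '0',            # identity value
        'l2norm')       # kernel name

    # Raw kernel: hand-written CUDA C++ source, launched with an explicit
    # grid and block size. It has no bounds check, so the launch below
    # starts exactly one thread per element.
    add_kernel = cp.RawKernel(r'''
    extern "C" __global__
    void my_add(const float* x1, const float* x2, float* y) {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        y[tid] = x1[tid] + x2[tid];
    }
    ''', 'my_add')

    x = cp.arange(10, dtype=cp.float32).reshape(2, 5)
    y = cp.arange(5, dtype=cp.float32)
    z = squared_diff(x, y)      # y is broadcast over the rows of x
    norms = l2norm(x, axis=1)   # one norm per row

    a = cp.arange(25, dtype=cp.float32)
    total = cp.zeros(25, dtype=cp.float32)
    add_kernel((5,), (5,), (a, a, total))  # grid size, block size, arguments

    # Copy the results back to the host as plain Python lists.
    return z.tolist(), norms.tolist(), total.tolist()
```

Calling `run_custom_kernels()` compiles each kernel on first use. The elementwise and reduction kernels handle indexing and broadcasting themselves, whereas the raw kernel leaves the launch configuration entirely to the caller.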

JIT kernel definition

Supported Python built-in functions are: range(), len(), max(), and min().

Note

If loop unrolling is needed, use cupyx.jit.range() instead of the built-in range.

`cupyx.jit.rawkernel`(*[, mode, device])	A decorator that compiles a Python function into a CUDA kernel.
`cupyx.jit.threadIdx`	dim3 threadIdx
`cupyx.jit.blockDim`	dim3 blockDim
`cupyx.jit.blockIdx`	dim3 blockIdx
`cupyx.jit.gridDim`	dim3 gridDim
`cupyx.jit.grid`(ndim)	Compute the thread index in the grid.
`cupyx.jit.gridsize`(ndim)	Compute the grid size.
`cupyx.jit.laneid`()	Returns the lane ID of the calling thread, ranging in `[0, jit.warpsize)`.
`cupyx.jit.warpsize`	Returns the number of threads in a warp.
`cupyx.jit.range`(*args[, unroll])	Range with loop unrolling support.
`cupyx.jit.syncthreads`()	Calls `__syncthreads()`.
`cupyx.jit.syncwarp`(*[, mask])	Calls `__syncwarp()`.
`cupyx.jit.shfl_sync`(mask, var, val_id, *[, ...])	Calls the `__shfl_sync` function.
`cupyx.jit.shfl_up_sync`(mask, var, val_id, *)	Calls the `__shfl_up_sync` function.
`cupyx.jit.shfl_down_sync`(mask, var, val_id, *)	Calls the `__shfl_down_sync` function.
`cupyx.jit.shfl_xor_sync`(mask, var, val_id, *)	Calls the `__shfl_xor_sync` function.
`cupyx.jit.shared_memory`(dtype, size[, alignment])	Allocates shared memory and returns it as a 1-D array.
`cupyx.jit.atomic_add`(array, index, value[, ...])	Calls the `atomicAdd` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_sub`(array, index, value[, ...])	Calls the `atomicSub` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_exch`(array, index, value[, ...])	Calls the `atomicExch` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_min`(array, index, value[, ...])	Calls the `atomicMin` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_max`(array, index, value[, ...])	Calls the `atomicMax` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_inc`(array, index, value[, ...])	Calls the `atomicInc` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_dec`(array, index, value[, ...])	Calls the `atomicDec` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_cas`(array, index, value[, ...])	Calls the `atomicCAS` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_and`(array, index, value[, ...])	Calls the `atomicAnd` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_or`(array, index, value[, ...])	Calls the `atomicOr` function to operate atomically on `array[index]`.
`cupyx.jit.atomic_xor`(array, index, value[, ...])	Calls the `atomicXor` function to operate atomically on `array[index]`.
`cupyx.jit.cg.this_grid`()	Returns the current grid group (`_GridGroup`).
`cupyx.jit.cg.this_thread_block`()	Returns the current thread block group (`_ThreadBlockGroup`).
`cupyx.jit.cg.sync`(group)	Calls `cg::sync()`.
`cupyx.jit.cg.memcpy_async`(group, dst, ...[, ...])	Calls `cg::memcpy_async()`.
`cupyx.jit.cg.wait`(group)	Calls `cg::wait()`.
`cupyx.jit.cg.wait_prior`(group)	Calls `cg::wait_prior<N>()`.
`cupyx.jit._interface._JitRawKernel`(func, ...)	JIT CUDA kernel object.
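To illustrate how the entries above fit together, here is a small sketch of a JIT-compiled copy kernel that uses `cupyx.jit.grid` and `cupyx.jit.gridsize` in a grid-stride loop. The wrapper `run_jit_copy`, the array size, and the launch configuration are assumptions made for the example. Note that the JIT compiler needs access to the Python source of the kernel, so the code should live in a file or an IPython session rather than the plain interactive interpreter.

```python
def run_jit_copy(n=1 << 20):
    # Imported inside the function so that merely loading this snippet
    # does not require a CUDA installation.
    import cupy as cp
    from cupyx import jit

    @jit.rawkernel()
    def elementwise_copy(x, y, size):
        tid = jit.grid(1)       # global index of this thread
        ntid = jit.gridsize(1)  # total number of threads in the grid
        # Grid-stride loop: each thread handles every ntid-th element.
        for i in range(tid, size, ntid):
            y[i] = x[i]

    size = cp.uint32(n)
    x = cp.arange(n, dtype=cp.float32)
    y = cp.zeros(n, dtype=cp.float32)

    # Launch with 128 blocks of 256 threads: (grid, block, arguments).
    elementwise_copy((128,), (256,), (x, y, size))

    return bool(cp.array_equal(x, y))
```

Because of the grid-stride loop, the result does not depend on the grid covering the whole array, so the same kernel works for any `n` with this fixed launch configuration.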

Kernel binary memoization

`cupy.memoize`(bool for_each_device=False)	Makes a function memoize its result for each argument and device.
`cupy.clear_memo`()	Clears the memoized results for all functions decorated by memoize.
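The following sketch shows one way these two functions might be combined to cache kernel objects by argument. The names `run_memoize_demo`, `make_scale_kernel`, and `build_log` are hypothetical and exist only for this example; the default `for_each_device=False` is assumed, so results are keyed on the arguments alone.

```python
def run_memoize_demo():
    # Imported inside the function so that merely loading this snippet
    # does not require a CUDA installation.
    import cupy as cp

    build_log = []  # records every real (non-memoized) call

    @cp.memoize()
    def make_scale_kernel(factor):
        build_log.append(factor)
        return cp.ElementwiseKernel(
            'float32 x',            # input parameters
            'float32 y',            # output parameters
            f'y = x * {factor}',    # per-element operation
            f'scale_by_{factor}')   # kernel name

    first = make_scale_kernel(2)
    second = make_scale_kernel(2)   # same argument: served from the memo
    same_object = first is second
    builds_before_clear = len(build_log)

    cp.clear_memo()                 # drops results of all memoized functions
    make_scale_kernel(2)            # the function body runs again
    builds_after_clear = len(build_log)

    return same_object, builds_before_clear, builds_after_clear
```

Since `cupy.clear_memo()` applies to every function decorated by `memoize`, it is better suited to tests or explicit cache resets than to routine use in hot paths.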