Custom kernels

cupy.ElementwiseKernel(in_params, …[, …])

User-defined elementwise kernel.

cupy.ReductionKernel(unicode in_params, …)

User-defined reduction kernel.

cupy.RawKernel(unicode code, unicode name, …)

User-defined custom kernel.

cupy.RawModule(unicode code=None, *, …[, …])

User-defined custom module.

cupy.fuse(*args, **kwargs)

Decorator that fuses a function.

JIT kernel definition

Supported Python built-in functions include range(), len(), max(), and min().

cupyx.jit.rawkernel([mode])

A decorator that compiles a Python function into a CUDA kernel.

cupyx.jit.threadIdx

dim3 threadIdx

cupyx.jit.blockDim

dim3 blockDim

cupyx.jit.blockIdx

dim3 blockIdx

cupyx.jit.gridDim

dim3 gridDim

cupyx.jit.grid(ndim)

Compute the thread index in the grid.

cupyx.jit.gridsize(ndim)

Compute the grid size.

cupyx.jit.laneid()

Returns the lane ID of the calling thread, in the range [0, jit.warpsize).

cupyx.jit.warpsize

The number of threads in a warp.

cupyx.jit.syncthreads()

Calls __syncthreads().

cupyx.jit.syncwarp(*[, mask])

Calls __syncwarp().

cupyx.jit.shfl_sync(mask, var, val_id, *[, …])

Calls the __shfl_sync function.

cupyx.jit.shfl_up_sync(mask, var, val_id, *)

Calls the __shfl_up_sync function.

cupyx.jit.shfl_down_sync(mask, var, val_id, *)

Calls the __shfl_down_sync function.

cupyx.jit.shfl_xor_sync(mask, var, val_id, *)

Calls the __shfl_xor_sync function.

cupyx.jit.shared_memory(dtype, size)

Allocates shared memory and returns a 1-D array.

cupyx.jit.atomic_add(array, index, value[, …])

Calls the atomicAdd function to operate atomically on array[index].

cupyx.jit.atomic_sub(array, index, value[, …])

Calls the atomicSub function to operate atomically on array[index].

cupyx.jit.atomic_exch(array, index, value[, …])

Calls the atomicExch function to operate atomically on array[index].

cupyx.jit.atomic_min(array, index, value[, …])

Calls the atomicMin function to operate atomically on array[index].

cupyx.jit.atomic_max(array, index, value[, …])

Calls the atomicMax function to operate atomically on array[index].

cupyx.jit.atomic_inc(array, index, value[, …])

Calls the atomicInc function to operate atomically on array[index].

cupyx.jit.atomic_dec(array, index, value[, …])

Calls the atomicDec function to operate atomically on array[index].

cupyx.jit.atomic_cas(array, index, value[, …])

Calls the atomicCAS function to operate atomically on array[index].

cupyx.jit.atomic_and(array, index, value[, …])

Calls the atomicAnd function to operate atomically on array[index].

cupyx.jit.atomic_or(array, index, value[, …])

Calls the atomicOr function to operate atomically on array[index].

cupyx.jit.atomic_xor(array, index, value[, …])

Calls the atomicXor function to operate atomically on array[index].

cupyx.jit._interface._JitRawKernel(func, mode)

JIT CUDA kernel object.

Kernel binary memoization

cupy.memoize(bool for_each_device=False)

Decorator that makes a function memoize its result for each argument and device.

cupy.clear_memo()

Clears the memoized results for all functions decorated by memoize.