cupy.ElementwiseKernel

class cupy.ElementwiseKernel(in_params, out_params, operation, name='kernel', reduce_dims=True, preamble='', **kwargs)

User-defined elementwise kernel.

This class can be used to define an elementwise kernel with or without broadcasting.

The kernel is compiled when the __call__() method is invoked, and the compiled kernel is cached for each device. The compiled binary is also cached in a file under the $HOME/.cupy/kernel_cache/ directory with a hashed file name, so that other processes can reuse it.
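As a minimal sketch of the behaviour described above, the following defines a kernel that computes a squared difference and calls it on arrays of different shapes, so that broadcasting applies. The kernel name squared_diff, the helper run_squared_diff, and the array shapes are illustrative choices rather than part of the API. Running it assumes CuPy is installed and a CUDA device is available.

```python
# Kernel pieces for a "squared difference" example (all names are illustrative).
IN_PARAMS = "float32 x, float32 y"     # input argument list
OUT_PARAMS = "float32 z"               # output argument list
OPERATION = "z = (x - y) * (x - y)"    # loop body, written in CUDA C/C++
KERNEL_NAME = "squared_diff"           # descriptive name for profiler output


def run_squared_diff():
    """Build the kernel and apply it with broadcasting (needs a CUDA GPU)."""
    # Imported here so that this module can be loaded without CuPy installed.
    import cupy as cp

    squared_diff = cp.ElementwiseKernel(
        IN_PARAMS, OUT_PARAMS, OPERATION, KERNEL_NAME)

    x = cp.arange(10, dtype=cp.float32).reshape(2, 5)  # shape (2, 5)
    y = cp.arange(5, dtype=cp.float32)                 # shape (5,), broadcast over rows

    # Compilation happens on this call, not when the kernel object is created.
    # Row 0 of the result is all 0 and row 1 is all 25.
    return squared_diff(x, y)


if __name__ == "__main__":
    print(run_squared_diff())
```

Later calls with the same argument types on the same device should reuse the cached kernel instead of compiling again.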

Parameters:
  • in_params (str) – Input argument list.
  • out_params (str) – Output argument list.
  • operation (str) – The body of the loop, written in CUDA-C/C++.
  • name (str) – Name of the kernel function. Set it to a descriptive name so that the kernel is easy to identify in performance profiles.
  • reduce_dims (bool) – If False, the shapes of array arguments are kept within the kernel invocation. By default, the shapes are reduced (i.e., the arrays are reshaped, without copying, to the minimum number of dimensions), which may make the kernel faster by reducing the index calculations.
  • options (list) – Options passed to the nvcc command.
  • preamble (str) – Fragment of the CUDA-C/C++ code that is inserted at the top of the generated .cu file.
  • loop_prep (str) – Fragment of the CUDA-C/C++ code that is inserted at the top of the kernel function definition and above the for loop.
  • after_loop (str) – Fragment of the CUDA-C/C++ code that is inserted at the bottom of the kernel function definition.
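The optional code fragments can be combined as in the sketch below, which assumes a hypothetical device helper clamp01 defined in preamble and a constant declared in loop_prep. Because loop_prep does not appear in the class signature, it is passed here as a keyword argument. All names are illustrative, and the example needs CuPy and a CUDA device to run.

```python
# Helper inserted at the top of the generated source (hypothetical name).
PREAMBLE = """
__device__ float clamp01(float v) {
    return fminf(fmaxf(v, 0.0f), 1.0f);
}
"""

# Declared above the loop, so it is visible inside the loop body.
LOOP_PREP = "const float gain = 2.0f"

# Loop body: scale the input, then clamp it to [0, 1].
OPERATION = "y = clamp01(gain * x)"


def build_scaled_clamp():
    """Return a kernel computing clamp(2 * x, 0, 1) elementwise (needs a CUDA GPU)."""
    # Imported here so that this module can be loaded without CuPy installed.
    import cupy as cp

    return cp.ElementwiseKernel(
        "float32 x",          # in_params
        "float32 y",          # out_params
        OPERATION,
        name="scaled_clamp",  # descriptive name for profiler output
        preamble=PREAMBLE,
        loop_prep=LOOP_PREP,
    )


if __name__ == "__main__":
    import cupy as cp

    kernel = build_scaled_clamp()
    x = cp.array([-1.0, 0.25, 3.0], dtype=cp.float32)
    print(kernel(x))  # clamp(2 * x, 0, 1) for each element
```

Keeping the helper in preamble rather than in operation leaves the loop body short, which can make the kernel easier to read when the same helper is used more than once.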

Methods

Attributes

in_params
kwargs
name
nargs
nin
nout
operation
out_params
params
preamble
reduce_dims