cupy.ReductionKernel#

class cupy.ReductionKernel(unicode in_params, unicode out_params, map_expr, reduce_expr, post_map_expr, identity, name=u'reduce_kernel', reduce_type=None, reduce_dims=True, preamble=u'', options=())#

User-defined reduction kernel.

This class can be used to define a reduction kernel with or without broadcasting.

The kernel is compiled the first time the __call__() method is invoked, and the compiled kernel is cached for each device. The compiled binary is also cached in a file under the $HOME/.cupy/kernel_cache/ directory with a hashed file name, so other processes can reuse it.
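To see what has been cached on disk, the cache directory can simply be listed. The sketch below assumes the default location named above, and additionally assumes that the `CUPY_CACHE_DIR` environment variable (not mentioned on this page) overrides it. `kernel_cache_dir` is a hypothetical helper written for this illustration, not part of CuPy.

```python
import os


def kernel_cache_dir(environ=os.environ):
    """Return the directory where compiled kernels are expected to be cached.

    Assumption: CUPY_CACHE_DIR, when set, overrides the default location.
    """
    default = os.path.join(os.path.expanduser("~"), ".cupy", "kernel_cache")
    return environ.get("CUPY_CACHE_DIR", default)


cache_dir = kernel_cache_dir()
if os.path.isdir(cache_dir):
    # File names are hashes, so only the count is reported here.
    print(f"{len(os.listdir(cache_dir))} cached files in {cache_dir}")
else:
    print(f"No kernel cache found at {cache_dir}")
```

Deleting the files in this directory should only force a recompilation on the next call; the kernels themselves are regenerated from the expressions passed to the constructor.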

Parameters:
  • in_params (str) – Input argument list.

  • out_params (str) – Output argument list.

  • map_expr (str) – Mapping expression for input values.

  • reduce_expr (str) – Reduction expression.

  • post_map_expr (str) – Mapping expression for reduced values.

  • identity (str) – Identity value for starting the reduction.

  • name (str) – Name of the kernel function. Set it so that the kernel is easy to identify in performance profiling output.

  • reduce_type (str) – Type of values to be used for reduction. This type is used to store the special variables a.

  • reduce_dims (bool) – If True, input arrays are reshaped without copy to smaller dimensions for efficiency.

  • preamble (str) – Fragment of CUDA C/C++ code that is inserted at the top of the generated .cu file.

  • options (tuple of str) – Additional compilation options.
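As an illustration of how the constructor arguments fit together, the following sketch defines an L2-norm reduction. The function `l2norm_model` is a hypothetical plain-Python model of the map, reduce, and post-map stages, added only to make the roles of the expressions concrete; it is a sequential fold and is not how CuPy schedules the reduction on the device. The CuPy part runs only when CuPy and a CUDA device are available.

```python
import math

# Code fragments that define an L2-norm reduction.
IN_PARAMS = "T x"               # one input array of generic type T
OUT_PARAMS = "T y"              # one output array of the same type
MAP_EXPR = "x * x"              # applied to each input element
REDUCE_EXPR = "a + b"           # combines two mapped values a and b
POST_MAP_EXPR = "y = sqrt(a)"   # converts the reduced value a to the output
IDENTITY = "0"                  # starting value of the reduction


def l2norm_model(values):
    """Plain-Python model of the stages above (illustration only)."""
    a = 0.0                     # identity
    for x in values:
        b = x * x               # map_expr
        a = a + b               # reduce_expr
    return math.sqrt(a)         # post_map_expr


try:
    import cupy as cp
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is missing or no usable CUDA device was found
    cp, gpu_available = None, False

if gpu_available:
    l2norm_kernel = cp.ReductionKernel(
        IN_PARAMS, OUT_PARAMS, MAP_EXPR, REDUCE_EXPR, POST_MAP_EXPR,
        IDENTITY, name="l2norm")
    x = cp.arange(10, dtype=cp.float32).reshape(2, 5)
    print(l2norm_kernel(x, axis=1))  # one norm per row

print(l2norm_model([3.0, 4.0]))
```

Because the parameters use the generic type `T`, the same kernel object can be called with arrays of different dtypes.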

Methods

__call__()#

Compiles and invokes the reduction kernel.

The compilation runs only if the kernel is not cached. Note that kernels compiled for different argument dtypes, ndims, or axes are not interchangeable. This means that a single ReductionKernel object may be compiled into multiple kernel binaries.

Parameters:
  • args – Arguments of the kernel.

  • out (cupy.ndarray) – The output array. This can only be specified if args does not contain the output array.

  • axis (int or tuple of ints) – Axis or axes along which the reduction is performed.

  • keepdims (bool) – If True, the reduced axes are kept in the result as axes of length one.

  • stream (cupy.cuda.Stream, optional) – The CUDA stream to launch the kernel on. If not given, the current stream will be used.

Returns:

Arrays are returned according to the out_params argument of the __init__ method.
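The effect of `axis`, `keepdims`, and `out` on the result can be sketched as follows. `reduced_shape` is a hypothetical helper that models the output shape under the assumption that NumPy-style conventions apply (for example, that `axis=None` reduces over all axes); it is not part of CuPy. The kernel calls run only when CuPy and a CUDA device are available.

```python
def reduced_shape(shape, axis=None, keepdims=False):
    """Model of the output shape of a reduction (assumes NumPy conventions)."""
    ndim = len(shape)
    if axis is None:
        axes = set(range(ndim))            # reduce over every axis
    elif isinstance(axis, int):
        axes = {axis % ndim}               # normalize a negative axis
    else:
        axes = {ax % ndim for ax in axis}
    if keepdims:
        return tuple(1 if i in axes else n for i, n in enumerate(shape))
    return tuple(n for i, n in enumerate(shape) if i not in axes)


try:
    import cupy as cp
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is missing or no usable CUDA device was found
    cp, gpu_available = None, False

if gpu_available:
    sum_kernel = cp.ReductionKernel(
        "T x", "T y", "x", "a + b", "y = a", "0", name="my_sum")
    x = cp.arange(24, dtype=cp.float32).reshape(2, 3, 4)

    # Reduce over two axes and keep them as axes of length one.
    y = sum_kernel(x, axis=(0, 2), keepdims=True)
    print(y.shape, reduced_shape(x.shape, axis=(0, 2), keepdims=True))

    # Write the result into a preallocated output array.
    out = cp.empty((2, 4), dtype=cp.float32)
    sum_kernel(x, out=out, axis=1)
    print(out)

print(reduced_shape((2, 3, 4), axis=1))
```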

__eq__(value, /)#

Return self==value.

__ne__(value, /)#

Return self!=value.

__lt__(value, /)#

Return self<value.

__le__(value, /)#

Return self<=value.

__gt__(value, /)#

Return self>value.

__ge__(value, /)#

Return self>=value.

Attributes

cached_code#

Returns next(iter(self.cached_codes.values())).

This property is for debugging purposes. The return value is not guaranteed to remain backward compatible.

cached_codes#

Returns a dict that has input types as keys and codes as values.

This property is for debugging purposes. The return value is not guaranteed to remain backward compatible.
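A possible way to use these properties while debugging is to list one entry per compiled variant. The sketch below assumes that the dict values are source strings and that the dict stays empty until the kernel has been called at least once, in which case `cached_code` would have nothing to return. `summarize_cached_codes` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in so that the helper can be tried without a GPU.

```python
from types import SimpleNamespace


def summarize_cached_codes(kernel):
    """Return (key, number of source lines) for each cached variant."""
    return [(key, len(str(code).splitlines()))
            for key, code in kernel.cached_codes.items()]


# Stand-in object with the same attribute name, for use without a GPU.
fake_kernel = SimpleNamespace(
    cached_codes={"float32": "line 1\nline 2", "float64": "line 1"})
print(summarize_cached_codes(fake_kernel))

try:
    import cupy as cp
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is missing or no usable CUDA device was found
    cp, gpu_available = None, False

if gpu_available:
    kernel = cp.ReductionKernel(
        "T x", "T y", "x", "a + b", "y = a", "0", name="my_sum")
    # Calling with two dtypes is expected to produce separate variants.
    kernel(cp.ones(4, dtype=cp.float32))
    kernel(cp.ones(4, dtype=cp.float64))
    for key, n_lines in summarize_cached_codes(kernel):
        print(key, n_lines)
```

Since the format of the keys and codes is not guaranteed to be stable, output like this is best treated as a debugging aid rather than something to parse.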

identity#

Type: unicode

in_params#
map_expr#
name#
nargs#
nin#
nout#
options#
out_params#
params#
post_map_expr#
preamble#
reduce_dims#
reduce_expr#
reduce_type#