cupy.RawModule

class cupy.RawModule(unicode code=None, *, unicode path=None, tuple options=(), unicode backend=u'nvrtc', bool translate_cucomplex=False, bool enable_cooperative_groups=False, name_expressions=None, bool jitify=False)

User-defined custom module.

This class can either compile raw CUDA sources or load prebuilt CUDA modules (*.cubin, *.ptx). It is useful when several CUDA kernels in the same source need to be retrieved.

In the former case, the CUDA source code is compiled when any method is called. In the latter case, an existing CUDA binary (*.cubin) or a PTX file is loaded from the given path.

CUDA kernels in a RawModule can be retrieved by calling get_function(), which returns an instance of RawKernel. (As with RawKernel, the generated binary is cached.)
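As a minimal sketch of the workflow described above, the following compiles one source string containing two kernels and retrieves each of them with get_function(). The kernel names add_one and scale, the helper run_demo, and the launch configuration are illustrative assumptions, not part of the CuPy API. cupy is imported inside the function so that the snippet can be loaded on a machine without a CUDA device.

```python
# CUDA source holding two kernels. extern "C" keeps the names unmangled,
# so they can be looked up by their plain names.
SOURCE = r'''
extern "C" {

__global__ void add_one(float* x, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

__global__ void scale(float* x, float factor, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

}
'''


def run_demo(n=1024):
    """Compile SOURCE, fetch both kernels, and apply them to an array."""
    # Imported lazily (illustrative choice) so the definitions above can be
    # loaded without CuPy or a GPU being present.
    import cupy as cp
    import numpy as np

    mod = cp.RawModule(code=SOURCE)
    add_one = mod.get_function('add_one')  # a RawKernel
    scale = mod.get_function('scale')      # another RawKernel, same module

    x = cp.zeros(n, dtype=cp.float32)
    threads = 256
    blocks = (n + threads - 1) // threads

    # RawKernel call convention: kernel(grid, block, args).
    # NumPy scalars are used so the argument widths match the C signature.
    add_one((blocks,), (threads,), (x, np.int32(n)))
    scale((blocks,), (threads,), (x, np.float32(3.0), np.int32(n)))
    return x
```

On a machine with a CUDA device, calling run_demo() should return an array in which every element has been incremented and then scaled.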

Parameters:
  • code (str) – CUDA source code. Mutually exclusive with path.

  • path (str) – Path to cubin/ptx. Mutually exclusive with code.

  • options (tuple of str) – Compiler options passed to the backend (NVRTC or NVCC). For details, see https://docs.nvidia.com/cuda/nvrtc/index.html#group__options or https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#command-option-description.

  • backend (str) – Either nvrtc or nvcc. Defaults to nvrtc.

  • translate_cucomplex (bool) – Whether the CUDA source includes the header cuComplex.h or not. If set to True, any code that uses the functions from cuComplex.h will be translated to its Thrust counterpart. Defaults to False.

  • enable_cooperative_groups (bool) – Whether to enable cooperative groups in the CUDA source. If set to True, compile options are configured properly and the kernel is launched with cuLaunchCooperativeKernel so that cooperative groups can be used from the CUDA source. This feature is only supported in CUDA 9 or later.

  • name_expressions (sequence of str) – A sequence (e.g. list) of strings referring to the names of C++ global/template kernels. For example, name_expressions=['func1<int>', 'func1<double>', 'func2'] for the template kernel func1<T> and non-template kernel func2. Strings in this sequence must then be passed, one at a time, to get_function() to retrieve the corresponding kernel.

  • jitify (bool) – Whether or not to use Jitify to assist NVRTC to compile C++ kernels. Defaults to False.
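For the path case, a small helper might look like the sketch below. The helper name load_kernel and its arguments are hypothetical; only the path keyword and get_function() come from the documentation above.

```python
def load_kernel(module_path, kernel_name):
    """Load a prebuilt .cubin/.ptx file and return one kernel from it."""
    # Imported lazily (illustrative choice) so this helper can be defined
    # on a machine without CuPy or a GPU.
    import cupy as cp

    # `path` and `code` are mutually exclusive, so only `path` is given.
    mod = cp.RawModule(path=module_path)
    return mod.get_function(kernel_name)
```

A call would pass the location of an existing binary and the name of a kernel inside it. For the name to match as written, the kernel is assumed to have been declared extern "C" when the binary was built.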

Note

Starting with CuPy v13.0.0, RawModule by default compiles with the C++11 standard (-std=c++11) if it’s not specified in options.

Note

Each kernel in RawModule possesses independent function attributes.

Note

Before CuPy v8.0.0, compilation happened at initialization. Now, it happens the first time any object (kernel or pointer) is retrieved from the module.

Methods

compile(self, log_stream=None)

Compile the current module.

In general, you don’t have to call this method; kernels are compiled implicitly on the first call.

Parameters:

log_stream (object) – Pass either sys.stdout or a file object to which the compiler output will be written. Defaults to None.

Note

Calling compile() will reset the internal state of a RawKernel.
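One reason to call compile() explicitly is to see compiler output before any kernel is launched. The sketch below assumes a trivial kernel named fill and a helper named compile_eagerly, both of which are illustrative. It forwards a stream to compile() and then retrieves the kernel.

```python
import sys

SOURCE = r'''
extern "C" __global__ void fill(int* out, int value, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = value;
}
'''


def compile_eagerly(log_stream=None):
    """Compile SOURCE up front, writing compiler output to log_stream."""
    # Imported lazily (illustrative choice) so the snippet can be loaded
    # without CuPy or a GPU being present.
    import cupy as cp

    mod = cp.RawModule(code=SOURCE)
    # Explicit compilation; without this call the module would be compiled
    # implicitly on the first retrieval.
    mod.compile(log_stream=log_stream)
    return mod.get_function('fill')
```

Calling compile_eagerly(sys.stdout) on a CUDA-capable machine would send any compiler output to the terminal. A file object opened for writing can be passed instead.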

get_function(self, unicode name)

Retrieve a CUDA kernel by its name from the module.

Parameters:

name (str) – Name of the kernel function. For C++ global/template kernels, name refers to one of the name expressions specified when initializing the present RawModule instance.

Returns:

A RawKernel instance.

Return type:

RawKernel

Note

The following example shows how to retrieve one of the specialized C++ template kernels:

import cupy

code = r'''
template<typename T>
__global__ void func(T* in_arr) { /* do something */ }
'''

# name expressions for the template instantiations to compile
kers = ('func<int>', 'func<float>', 'func<double>')
mod = cupy.RawModule(code=code, options=('--std=c++11',),
                     name_expressions=kers)

# retrieve func<int>
ker_int = mod.get_function(kers[0])

See also

nvrtcAddNameExpression and nvrtcGetLoweredName in the "Accessing Lowered Names" section of the NVRTC documentation.

get_global(self, name)

Retrieve a pointer to a global symbol by its name from the module.

Parameters:

name (str) – Name of the global symbol.

Returns:

A handle to the global symbol.

Return type:

MemoryPointer

Note

This method can be used to access, for example, constant memory:

# to get a pointer to "arr" declared in the source like this:
# __constant__ float arr[10];
memptr = mod.get_global("arr")
# ...wrap it using cupy.ndarray with a known shape
arr_ndarray = cp.ndarray((10,), cp.float32, memptr)
# ...perform data transfer to initialize it
arr_ndarray[...] = cp.random.random((10,), dtype=cp.float32)
# ...and arr is ready to be accessed by RawKernels
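The fragment above assumes an existing module mod and the alias cp. A self-contained sketch of the same idea follows; the kernel read_constant and the helper run_constant_demo are illustrative names, and cupy is imported inside the function so the snippet can be loaded without a GPU.

```python
SOURCE = r'''
extern "C" {

__constant__ float arr[10];

__global__ void read_constant(float* out) {
    int i = threadIdx.x;
    if (i < 10) out[i] = arr[i] * 2.0f;
}

}
'''


def run_constant_demo():
    """Fill the __constant__ array from Python, then read it in a kernel."""
    # Imported lazily (illustrative choice) so the snippet can be loaded
    # without CuPy or a GPU being present.
    import cupy as cp

    mod = cp.RawModule(code=SOURCE)

    # Pointer to the constant-memory symbol, wrapped as an ndarray view.
    memptr = mod.get_global('arr')
    arr_view = cp.ndarray((10,), cp.float32, memptr)
    arr_view[...] = cp.arange(10, dtype=cp.float32)

    # Launch one block of 10 threads; each thread doubles one element.
    kernel = mod.get_function('read_constant')
    out = cp.empty(10, dtype=cp.float32)
    kernel((1,), (10,), (out,))
    return out
```

On a CUDA-capable machine, run_constant_demo() should return the values written into arr, each doubled by the kernel.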

__eq__(value, /)

Return self==value.

__ne__(value, /)

Return self!=value.

__lt__(value, /)

Return self<value.

__le__(value, /)

Return self<=value.

__gt__(value, /)

Return self>value.

__ge__(value, /)

Attributes

backend
code
enable_cooperative_groups
file_path
module
name_expressions
options