cupy.RawModule
- class cupy.RawModule(unicode code=None, *, unicode path=None, tuple options=(), unicode backend=u'nvrtc', bool translate_cucomplex=False, bool enable_cooperative_groups=False, name_expressions=None, bool jitify=False)
User-defined custom module.
This class can be used to either compile raw CUDA sources or load CUDA modules (*.cubin, *.ptx). This class is useful when a number of CUDA kernels in the same source need to be retrieved.
For the former case, the CUDA source code is compiled when any method is called. For the latter case, an existing CUDA binary (*.cubin) or a PTX file can be loaded by providing its path.
CUDA kernels in a RawModule can be retrieved by calling get_function(), which will return an instance of RawKernel. (As with RawKernel, the generated binary is also cached.)
- Parameters:
code (str) – CUDA source code. Mutually exclusive with path.
path (str) – Path to cubin/ptx. Mutually exclusive with code.
options (tuple of str) – Compiler options passed to the backend (NVRTC or NVCC). For details, see https://docs.nvidia.com/cuda/nvrtc/index.html#group__options or https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#command-option-description.
backend (str) – Either nvrtc or nvcc. Defaults to nvrtc.
translate_cucomplex (bool) – Whether the CUDA source includes the header cuComplex.h or not. If set to True, any code that uses the functions from cuComplex.h will be translated to its Thrust counterpart. Defaults to False.
enable_cooperative_groups (bool) – Whether to enable cooperative groups in the CUDA source. If set to True, compile options are configured properly and the kernel is launched with cuLaunchCooperativeKernel so that cooperative groups can be used from the CUDA source. This feature is only supported in CUDA 9 or later.
name_expressions (sequence of str) – A sequence (e.g. list) of strings referring to the names of C++ global/template kernels. For example, name_expressions=['func1<int>', 'func1<double>', 'func2'] for the template kernel func1<T> and non-template kernel func2. Strings in this sequence must then be passed, one at a time, to get_function() to retrieve the corresponding kernel.
jitify (bool) – Whether or not to use Jitify to assist NVRTC to compile C++ kernels. Defaults to False.
Note
Starting with CuPy v13.0.0, RawModule by default compiles with the C++11 standard (-std=c++11) if it's not specified in options.
Note
Each kernel in RawModule possesses independent function attributes.
Note
Before CuPy v8.0.0, compilation happened at initialization. Now it happens the first time any object (kernel or pointer) is retrieved from the module.
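The workflow described above (one source, several kernels, lazy compilation on first retrieval) can be sketched as follows. This is a minimal illustration rather than a canonical recipe: the kernel names scale and add_one, the helper launch_config, and the function run_example are made up for this example, and the kernels are wrapped in extern "C" on the assumption that no name_expressions are supplied.

```python
# CUDA source containing two kernels; both live in the same module.
# Kernel names and bodies are illustrative, not part of CuPy.
SOURCE = r'''
extern "C" {

__global__ void scale(float* x, float factor, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] *= factor;
    }
}

__global__ void add_one(float* x, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] += 1.0f;
    }
}

}
'''


def launch_config(n, threads_per_block=128):
    """Return (grid, block) tuples that cover n elements (hypothetical helper)."""
    blocks = (n + threads_per_block - 1) // threads_per_block
    return (blocks,), (threads_per_block,)


def run_example(n=1000):
    # Imported lazily so the helper above can be used without a GPU.
    import cupy as cp

    mod = cp.RawModule(code=SOURCE)        # nothing is compiled yet
    scale = mod.get_function('scale')      # first retrieval triggers compilation
    add_one = mod.get_function('add_one')  # second kernel from the same module

    x = cp.arange(n, dtype=cp.float32)
    grid, block = launch_config(n)
    # Scalars are passed as typed NumPy scalars to match the C signature.
    scale(grid, block, (x, cp.float32(2.0), cp.int32(n)))
    add_one(grid, block, (x, cp.int32(n)))
    return x  # each element i is scaled by 2, then incremented by 1
```

Calling run_example() requires a CUDA-capable device; launch_config can be checked on any machine.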
Methods
- compile(self, log_stream=None)
Compile the current module.
In general, you don’t have to call this method; kernels are compiled implicitly on the first call.
- Parameters:
log_stream (object) – Pass either sys.stdout or a file object to which the compiler output will be written. Defaults to None.
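If you want compiler diagnostics up front rather than at the first kernel retrieval, an explicit call might look like the sketch below. The kernel fill and the wrapper compile_with_log are hypothetical names, and the use of io.StringIO assumes the compiler log is written to the stream as text.

```python
import io

# Illustrative kernel; the name "fill" is not part of CuPy.
SOURCE = r'''
extern "C" __global__ void fill(int* out, int value, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] = value;
    }
}
'''


def compile_with_log(source=SOURCE):
    """Compile eagerly and return the module plus the captured compiler log."""
    import cupy as cp  # imported lazily; needs a CUDA-capable environment

    log = io.StringIO()            # sys.stdout or an open file also works
    mod = cp.RawModule(code=source)
    mod.compile(log_stream=log)    # compile now instead of on first retrieval
    return mod, log.getvalue()
```

The returned log string may well be empty when the source compiles without warnings.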
- get_function(self, unicode name)
Retrieve a CUDA kernel by its name from the module.
- Parameters:
name (str) – Name of the kernel function. For C++ global/template kernels, name refers to one of the name expressions specified when initializing the present RawModule instance.
- Returns:
A RawKernel instance.
- Return type:
RawKernel
Note
The following example shows how to retrieve one of the specialized C++ template kernels:
    code = r'''
    template<typename T>
    __global__ void func(T* in_arr) { /* do something */ }
    '''

    kers = ('func<int>', 'func<float>', 'func<double>')
    mod = cupy.RawModule(code=code, options=('--std=c++11',),
                         name_expressions=kers)

    # retrieve func<int>
    ker_int = mod.get_function(kers[0])
See also
nvrtcAddNameExpression and nvrtcGetLoweredName from Accessing Lowered Names of the NVRTC documentation.
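One way to pick the right template specialization at run time is to derive the name expression from the array's dtype. The sketch below assumes a small hand-written mapping (CTYPES) and two hypothetical helpers (name_expression and increment). The template kernel takes an extra length argument, which is an addition made for this illustration.

```python
# dtype name -> C++ type name (illustrative and deliberately not exhaustive)
CTYPES = {'int32': 'int', 'float32': 'float', 'float64': 'double'}

SOURCE = r'''
template<typename T>
__global__ void func(T* in_arr, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        in_arr[i] += static_cast<T>(1);
    }
}
'''


def name_expression(dtype_name, kernel='func'):
    """Build a name expression such as 'func<float>' (hypothetical helper)."""
    return f'{kernel}<{CTYPES[dtype_name]}>'


# Every specialization must be declared up front when the module is created.
NAME_EXPRESSIONS = tuple(name_expression(name) for name in CTYPES)


def increment(arr):
    """Add 1 to every element of a CuPy array using the matching specialization."""
    import cupy as cp  # imported lazily; needs a CUDA-capable environment

    mod = cp.RawModule(code=SOURCE, options=('-std=c++11',),
                       name_expressions=NAME_EXPRESSIONS)
    # The string passed here must be one of the declared name expressions.
    ker = mod.get_function(name_expression(arr.dtype.name))
    n = arr.size
    threads = 128
    blocks = (n + threads - 1) // threads
    ker((blocks,), (threads,), (arr, cp.int32(n)))
    return arr
```

Building the expressions from one table keeps the strings given to the constructor and to get_function() identical, which the lookup depends on.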
- get_global(self, name)
Retrieve a pointer to a global symbol by its name from the module.
- Parameters:
name (str) – Name of the global symbol.
- Returns:
A handle to the global symbol.
- Return type:
cupy.cuda.MemoryPointer
Note
This method can be used to access, for example, constant memory:
    # to get a pointer to "arr" declared in the source like this:
    # __constant__ float arr[10];
    memptr = mod.get_global("arr")
    # ...wrap it using cupy.ndarray with a known shape
    arr_ndarray = cp.ndarray((10,), cp.float32, memptr)
    # ...perform data transfer to initialize it
    arr_ndarray[...] = cp.random.random((10,), dtype=cp.float32)
    # ...and arr is ready to be accessed by RawKernels
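For context, a self-contained version of the snippet above, including a kernel that reads the constant array, could look like this. It is a sketch under stated assumptions: the kernel add_const, the constant N_CONST, and the function run_example are hypothetical names chosen for the illustration.

```python
N_CONST = 10  # must match the array length declared in the CUDA source

SOURCE = r'''
extern "C" {

__constant__ float arr[10];

__global__ void add_const(float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] += arr[i % 10];
    }
}

}
'''


def run_example(n=32):
    import cupy as cp  # imported lazily; needs a CUDA-capable environment

    mod = cp.RawModule(code=SOURCE)

    # Wrap the constant-memory symbol in an ndarray view and initialize it.
    memptr = mod.get_global('arr')
    arr_view = cp.ndarray((N_CONST,), cp.float32, memptr)
    arr_view[...] = cp.arange(N_CONST, dtype=cp.float32)

    # Launch a kernel from the same module that reads the constant array.
    out = cp.zeros(n, dtype=cp.float32)
    ker = mod.get_function('add_const')
    threads = 128
    blocks = (n + threads - 1) // threads
    ker((blocks,), (threads,), (out, cp.int32(n)))
    return out  # each element i receives arr[i % 10]
```

The symbol and the kernel come from the same RawModule instance, so the kernel reads the values written through the view.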
- __eq__(value, /)
Return self==value.
- __ne__(value, /)
Return self!=value.
- __lt__(value, /)
Return self<value.
- __le__(value, /)
Return self<=value.
- __gt__(value, /)
Return self>value.
- __ge__(value, /)
Return self>=value.
Attributes
- backend
- code
- enable_cooperative_groups
- file_path
- module
- name_expressions
- options