cupy.fft.config.set_cufft_callbacks
- class cupy.fft.config.set_cufft_callbacks(unicode cb_load=u'', unicode cb_store=u'', ndarray cb_load_aux_arr=None, *, ndarray cb_store_aux_arr=None)
A context manager for setting up load and/or store callbacks.
- Parameters
  - cb_load (str) – A string containing the device kernel for the load callback. It must define d_loadCallbackPtr.
  - cb_store (str) – A string containing the device kernel for the store callback. It must define d_storeCallbackPtr.
  - cb_load_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the load callback.
  - cb_store_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the store callback.
Note
Any FFT call made inside this context has the callbacks set up. An example of a load callback is shown below:

    import cupy as cp

    code = r'''
    __device__ cufftComplex CB_ConvertInputC(
        void *dataIn, size_t offset, void *callerInfo, void *sharedPtr)
    {
        // implementation
    }
    __device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInputC;
    '''

    with cp.fft.config.set_cufft_callbacks(cb_load=code):
        out_arr = cp.fft.fft(in_arr, ...)
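A store callback follows the same pattern. The sketch below is illustrative only: the names `CB_ScaleOutputC`, `store_code` and `scaled_fft` are hypothetical, and it assumes that the array given as `cb_store_aux_arr` reaches the kernel through the `callerInfo` pointer. It also assumes a single-precision complex (complex64) input, so that the transform matches the `cufftComplex` type used in the callback.

```python
# CUDA source for a store callback; it must define d_storeCallbackPtr.
store_code = r'''
__device__ void CB_ScaleOutputC(
    void *dataOut, size_t offset, cufftComplex element,
    void *callerInfo, void *sharedPtr)
{
    // Assumption: callerInfo points at the first element of cb_store_aux_arr.
    float scale = *((float*)callerInfo);
    element.x *= scale;
    element.y *= scale;
    ((cufftComplex*)dataOut)[offset] = element;
}
__device__ cufftCallbackStoreC d_storeCallbackPtr = CB_ScaleOutputC;
'''


def scaled_fft(in_arr, scale):
    """Hypothetical helper: FFT of a complex64 array, scaled on store."""
    # Imported lazily so this module can be loaded without a GPU present.
    import cupy as cp

    # One-element float32 array read by the callback on the device.
    aux = cp.asarray([scale], dtype=cp.float32)
    with cp.fft.config.set_cufft_callbacks(
            cb_store=store_code, cb_store_aux_arr=aux):
        return cp.fft.fft(in_arr)
```

Note that `cb_store_aux_arr` is keyword-only in the signature above, so it cannot be passed positionally.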
Note
Below are the runtime requirements for using this feature:
- cython >= 0.29.0
- A host compiler that supports C++11 and above; might need to set up the CXX environment variable.
- nvcc and the full CUDA Toolkit. Note that the cudatoolkit package from Conda-Forge is not enough, as it does not contain static libraries.
Note
Callbacks only work for transforms over contiguous axes; the behavior for non-contiguous transforms is in general undefined.
Warning
Using cuFFT callbacks requires compiling and loading a Python module at runtime, as well as static linking, for each distinct combination of transform and callback, so the first invocation for each combination will be very slow. This is a limitation of cuFFT, so use this feature only when the callback-enabled transform is known to be faster and can be reused often enough to amortize the cost.
Warning
The generated Python modules are cached by default in ~/.cupy/callback_cache for possible reuse (with the same set of load/store callbacks). Due to static linking, however, the files can be very large. The cache location can be changed by setting CUPY_CACHE_DIR.
See also
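Regarding the cache warning above, a quick way to keep an eye on disk usage is to sum the file sizes under the cache directory. This is a minimal sketch using only the standard library. The helper `cache_size_bytes` is hypothetical, and falling back from `CUPY_CACHE_DIR` to the documented default path is an assumption about how the two relate, not a description of CuPy's internals.

```python
import os
from pathlib import Path


def cache_size_bytes(cache_dir):
    """Return the total size in bytes of all regular files under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return 0  # nothing has been cached yet
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


# Assumption: use CUPY_CACHE_DIR if set, else the documented default location.
cache_dir = os.environ.get("CUPY_CACHE_DIR", "~/.cupy/callback_cache")
print(f"{cache_dir}: {cache_size_bytes(cache_dir) / 1e6:.1f} MB")
```

To relocate the cache, set `CUPY_CACHE_DIR` in the environment before starting the Python process.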
Methods
- __enter__(self)
- __exit__(self, exc_type, exc_value, traceback)
- __eq__(value, /)
Return self==value.
- __ne__(value, /)
Return self!=value.
- __lt__(value, /)
Return self<value.
- __le__(value, /)
Return self<=value.
- __gt__(value, /)
Return self>value.
- __ge__(value, /)
Return self>=value.