cupy.fft.config
The cupy.fft.config object provides configuration options for FFTs and
FFT plans.
- class config.set_cufft_callbacks(cb_load=None, cb_store=None, ndarray cb_load_aux_arr=None, *, ndarray cb_store_aux_arr=None, str cb_load_name=None, str cb_store_name=None, MemoryPointer cb_load_data=None, MemoryPointer cb_store_data=None, str cb_ver='legacy', tuple nvrtc_options=())
A context manager for setting up load and/or store callbacks. Any FFT call made inside this context will have the callbacks set up.
- Parameters:
cb_load (str) – A string containing the device kernel for the load callback. It must define d_loadCallbackPtr.
cb_store (str) – A string containing the device kernel for the store callback. It must define d_storeCallbackPtr.
cb_load_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the load callback. DEPRECATED.
cb_store_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the store callback. DEPRECATED.
cb_load_name (str) – A string containing the name of the device kernel for the load callback. If not defined, we attempt to infer it from the provided cb_load, but this may fail. Only needed when using cb_ver="jit".
cb_store_name (str) – A string containing the name of the device kernel for the store callback. If not defined, we attempt to infer it from the provided cb_store, but this may fail. Only needed when using cb_ver="jit".
cb_load_data (MemoryPointer, optional) – A memory chunk containing data to be used in the load callback.
cb_store_data (MemoryPointer, optional) – A memory chunk containing data to be used in the store callback.
cb_ver (str) – Which cuFFT callback support to use. The default is "legacy". Starting with CUDA 12.2, "jit" is also supported.
Note
Callbacks only work for transforms over contiguous axes; the behavior for non-contiguous transforms is in general undefined.
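As an illustration of the cb_load_data parameter described above, the following is a minimal sketch, not a definitive recipe. It assumes the default cb_ver="legacy" callback signature, a contiguous 1-D complex64 input, and a float32 weight array of the same length. The names CB_MultiplyByAux, LOAD_WITH_AUX and weighted_fft are hypothetical. It also assumes the pointer passed as cb_load_data reaches the device callback as callerInfo, which is the usual cuFFT convention.

```python
# Device source for a load callback that reads per-element weights from
# the user-supplied buffer (assumed to arrive as ``callerInfo``).
LOAD_WITH_AUX = r'''
__device__ cufftComplex CB_MultiplyByAux(
    void *dataIn, size_t offset, void *callerInfo, void *sharedPtr)
{
    // callerInfo is assumed to point at the buffer given via cb_load_data
    float *weights = (float*)callerInfo;
    cufftComplex x = ((cufftComplex*)dataIn)[offset];
    x.x *= weights[offset];
    x.y *= weights[offset];
    return x;
}
__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_MultiplyByAux;
'''


def weighted_fft(in_arr, weights):
    """FFT of ``in_arr * weights`` with the multiply fused into the load step.

    ``in_arr`` is assumed to be a contiguous 1-D complex64 cupy.ndarray and
    ``weights`` a float32 cupy.ndarray of the same length.
    """
    import cupy as cp  # imported lazily so the module loads without a GPU

    # ndarray.data is a cupy.cuda.MemoryPointer, the type cb_load_data expects.
    with cp.fft.config.set_cufft_callbacks(
            cb_load=LOAD_WITH_AUX, cb_load_data=weights.data):
        return cp.fft.fft(in_arr)
```

Keep the weights array alive for as long as the context is active, since only its memory pointer is handed over.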
Below is the documentation only applicable to the jit option.
TODO
Below is the documentation only applicable to the legacy option.
Note
An example for a load callback is shown below:
code = r'''
__device__ cufftComplex CB_ConvertInputC(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr)
{
    // implementation
}
__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInputC;
'''

with cp.fft.config.set_cufft_callbacks(cb_load=code):
    out_arr = cp.fft.fft(in_arr, ...)
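To make the skeleton above more concrete, here is a hedged sketch that fills in a load and a store callback for a single-precision complex-to-complex transform. The callback bodies and the names CB_ScaleInputC, CB_ScaleOutputC and fft_with_callbacks are illustrative assumptions, not part of CuPy. The load callback doubles each input element and the store callback halves each output element, so by linearity the result should agree with a plain cp.fft.fft up to rounding.

```python
# Load callback: read one element, scale it by 2, hand it to cuFFT.
LOAD_CALLBACK = r'''
__device__ cufftComplex CB_ScaleInputC(
    void *dataIn, size_t offset, void *callerInfo, void *sharedPtr)
{
    cufftComplex x = ((cufftComplex*)dataIn)[offset];
    x.x *= 2.0f;
    x.y *= 2.0f;
    return x;
}
__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ScaleInputC;
'''

# Store callback: scale each result by 0.5 before writing it out.
STORE_CALLBACK = r'''
__device__ void CB_ScaleOutputC(
    void *dataOut, size_t offset, cufftComplex element,
    void *callerInfo, void *sharedPtr)
{
    element.x *= 0.5f;
    element.y *= 0.5f;
    ((cufftComplex*)dataOut)[offset] = element;
}
__device__ cufftCallbackStoreC d_storeCallbackPtr = CB_ScaleOutputC;
'''


def fft_with_callbacks(in_arr):
    """Run a 1-D FFT with both callbacks active.

    ``in_arr`` is assumed to be a contiguous complex64 cupy.ndarray, to
    match the cufftComplex types used in the callbacks.
    """
    import cupy as cp  # imported lazily so the module loads without a GPU

    with cp.fft.config.set_cufft_callbacks(
            cb_load=LOAD_CALLBACK, cb_store=STORE_CALLBACK):
        return cp.fft.fft(in_arr)
```

The first call for a given transform and callback pair triggers compilation and linking, so expect it to be much slower than later calls.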
Note
Below are the runtime requirements for using this feature:
cython >= 0.29.0
A host compiler that supports C++11 and above; might need to set up the CXX environment variable.
nvcc and the full CUDA Toolkit. Note that the cudatoolkit package from Conda-Forge is not enough, as it does not contain static libraries.
Warning
Using cuFFT callbacks requires compiling and loading a Python module at runtime as well as static linking for each distinct transform and callback, so the first invocation for each combination will be very slow. This is a limitation of cuFFT, so use this feature only when the callback-enabled transform is known to be more performant and can be reused to amortize the cost.
Warning
The generated Python modules are cached by default in ~/.cupy/callback_cache for possible reuse (with the same set of load/store callbacks). Due to static linking, however, the file sizes can be excessive! The cache location can be changed by setting CUPY_CACHE_DIR.
See also
- __enter__(self)
- __exit__(self, exc_type, exc_value, traceback)
- config.set_cufft_gpus(gpus)
Set the GPUs to be used in multi-GPU FFT.
- Parameters:
gpus (int or list of int) – The number of GPUs or a list of GPUs to be used. In the former case, the first gpus GPUs will be used.
Warning
This API is currently experimental and may change in a future version.
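A minimal usage sketch for set_cufft_gpus follows. It assumes a machine with at least two visible GPUs. It also assumes that multi-GPU FFT is switched on through the cupy.fft.config.use_multi_gpus flag, which is not described in this section. The names GPU_IDS and fft_on_two_gpus are hypothetical.

```python
GPU_IDS = [0, 1]  # device IDs to use; passing the int 2 would select the first two GPUs


def fft_on_two_gpus(shape=(64, 64)):
    """Compute a batched 1-D complex FFT across the GPUs in ``GPU_IDS``."""
    import cupy as cp  # imported lazily so the module loads without a GPU

    # Assumption: multi-GPU execution must be enabled explicitly via this flag.
    cp.fft.config.use_multi_gpus = True
    cp.fft.config.set_cufft_gpus(GPU_IDS)

    a = cp.random.random(shape).astype(cp.complex64)  # resides on the current device
    return cp.fft.fft(a)  # transform along the last axis
```

Since the API is experimental, it is worth checking the supported transform types and sizes for the installed CuPy version before relying on this.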
See also