cupy.fft.config

The cupy.fft.config object provides configuration options for FFTs and FFT plans.

class config.set_cufft_callbacks(cb_load=None, cb_store=None, ndarray cb_load_aux_arr=None, *, ndarray cb_store_aux_arr=None, str cb_load_name=None, str cb_store_name=None, MemoryPointer cb_load_data=None, MemoryPointer cb_store_data=None, str cb_ver='legacy', tuple nvrtc_options=())

A context manager for setting up load and/or store callbacks. Any FFT call made within this context will have the callbacks set up.

Parameters:
  • cb_load (str) – A string containing the device kernel for the load callback. It must define d_loadCallbackPtr.

  • cb_store (str) – A string containing the device kernel for the store callback. It must define d_storeCallbackPtr.

  • cb_load_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the load callback. DEPRECATED.

  • cb_store_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the store callback. DEPRECATED.

  • cb_load_name (str) – The name of the device kernel used as the load callback. If not given, CuPy attempts to infer it from the provided cb_load, but this may fail. Only needed when using cb_ver="jit".

  • cb_store_name (str) – The name of the device kernel used as the store callback. If not given, CuPy attempts to infer it from the provided cb_store, but this may fail. Only needed when using cb_ver="jit".

  • cb_load_data (MemoryPointer, optional) – A memory chunk containing data to be used in the load callback.

  • cb_store_data (MemoryPointer, optional) – A memory chunk containing data to be used in the store callback.

  • cb_ver (str) – Which cuFFT callback support to use. The default is "legacy"; starting with CUDA 12.2, "jit" is also supported.
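To show how cb_store and cb_store_data fit together, the sketch below scales every output element by a factor kept in device memory. It assumes the default cb_ver="legacy" and, as we understand CuPy's implementation, that the pointer given as cb_store_data reaches the callback as its callerInfo argument. The names STORE_SCALE_SRC, CB_ScaleOutputC and fft_scaled_output are hypothetical. CuPy is imported inside the function so that the module loads without a GPU.

```python
# Hypothetical legacy store callback: multiplies each output element by a float
# read through callerInfo (assumption: CuPy forwards cb_store_data there).
STORE_SCALE_SRC = r'''
__device__ void CB_ScaleOutputC(
    void *dataOut,
    size_t offset,
    cufftComplex element,
    void *callerInfo,
    void *sharedPtr) {
  float s = *((float*)callerInfo);
  element.x *= s;
  element.y *= s;
  ((cufftComplex*)dataOut)[offset] = element;
}

__device__ cufftCallbackStoreC d_storeCallbackPtr = CB_ScaleOutputC;
'''


def fft_scaled_output(x, scale):
    """Single-precision C2C FFT whose output is multiplied by `scale`."""
    import cupy as cp  # needs a GPU plus nvcc, Cython and a host compiler

    a = cp.ascontiguousarray(cp.asarray(x, dtype=cp.complex64))
    # Only the pointer is handed to cuFFT, so keep `aux` alive during the call.
    aux = cp.asarray([scale], dtype=cp.float32)
    with cp.fft.config.set_cufft_callbacks(cb_store=STORE_SCALE_SRC,
                                           cb_store_data=aux.data):
        return cp.fft.fft(a)
```

If the assumption about callerInfo holds, the result should match cp.fft.fft(a) * scale up to float32 rounding.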

Note

Callbacks only work for transforms over contiguous axes; the behavior for non-contiguous transforms is in general undefined.
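One way to stay on the supported path is to move the axis you want to transform to the last position and make the array C-contiguous before entering the callback context. The helper below is a hypothetical sketch. It works with either NumPy or CuPy through the `xp` argument, and both modules provide moveaxis and ascontiguousarray.

```python
import numpy as np


def move_axis_last_contiguous(x, axis, xp=np):
    """Return a C-contiguous copy of `x` with `axis` moved to the last position.

    A 1-D transform along axis -1 of the result then runs over contiguous
    memory. `xp` is the array module: numpy here, cupy on the GPU.
    """
    return xp.ascontiguousarray(xp.moveaxis(x, axis, -1))
```

On the GPU, you would call move_axis_last_contiguous(a, 0, xp=cp), run cp.fft.fft(..., axis=-1) inside the callback context, and restore the original layout with cp.moveaxis if needed. The trade-off is an extra copy of the data.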

The documentation below applies only to the jit option.

TODO

The documentation below applies only to the legacy option.

Note

An example of a load callback is shown below:

code = r'''
__device__ cufftComplex CB_ConvertInputC(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr) {
  // implementation
}

__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInputC;
'''

with cp.fft.config.set_cufft_callbacks(cb_load=code):
    out_arr = cp.fft.fft(in_arr, ...)
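For reference, here is one way the elided "// implementation" could be filled in. This is a hypothetical load callback that multiplies each input element by 2.5 before the transform, using legacy mode and single-precision complex input. The names LOAD_SCALE_SRC, CB_ScaleInputC, fft_with_scaled_input and reference_fft_with_scaled_input are illustrative. The NumPy function computes the result the callback-enabled transform is expected to reproduce, up to float32 rounding.

```python
import numpy as np

# Hypothetical legacy load callback: reads element `offset` of the input
# buffer, scales it by 2.5 and returns it to cuFFT.
LOAD_SCALE_SRC = r'''
__device__ cufftComplex CB_ScaleInputC(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr) {
  cufftComplex x = ((cufftComplex*)dataIn)[offset];
  x.x *= 2.5f;
  x.y *= 2.5f;
  return x;
}

__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ScaleInputC;
'''


def fft_with_scaled_input(x):
    """GPU path: C2C FFT of 2.5 * x, with the scaling done in the load callback."""
    import cupy as cp  # needs a GPU plus nvcc, Cython and a host compiler

    a = cp.ascontiguousarray(cp.asarray(x, dtype=cp.complex64))
    with cp.fft.config.set_cufft_callbacks(cb_load=LOAD_SCALE_SRC):
        return cp.fft.fft(a)


def reference_fft_with_scaled_input(x):
    """CPU reference for the same computation."""
    return np.fft.fft(2.5 * np.asarray(x, dtype=np.complex64))
```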

Note

Below are the runtime requirements for using this feature:

  • cython >= 0.29.0

  • A host compiler that supports C++11 or later; you might need to set the CXX environment variable.

  • nvcc and the full CUDA Toolkit. Note that the cudatoolkit package from Conda-Forge is not enough, as it does not contain static libraries.

Warning

Using cuFFT callbacks requires compiling and loading a Python module at runtime, as well as static linking for each distinct transform and callback, so the first invocation of each combination will be very slow. This is a limitation of cuFFT, so use this feature only when the callback-enabled transform is known to be faster and can be reused enough to amortize the cost.

Warning

The generated Python modules are by default cached in ~/.cupy/callback_cache for possible reuse (with the same set of load/store callbacks). Due to static linking, however, the files can be very large. The cache location can be changed by setting the CUPY_CACHE_DIR environment variable.
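Given how large these files can get, checking the cache size now and then is worthwhile. The hypothetical helper below sums the sizes of all files under a directory and defaults to the location named in the warning. The comment shows relocating the cache via CUPY_CACHE_DIR. Setting the variable before importing cupy is assumed to be safe, and the layout CuPy uses inside that directory is not assumed here.

```python
from pathlib import Path

# Default location named in the warning. When CUPY_CACHE_DIR is set, the
# cache lives elsewhere. To relocate it, set the variable before CuPy reads it,
# e.g. before `import cupy` (hypothetical path):
#     os.environ["CUPY_CACHE_DIR"] = "/scratch/cupy_cache"
DEFAULT_CALLBACK_CACHE = "~/.cupy/callback_cache"


def directory_size_bytes(path=DEFAULT_CALLBACK_CACHE):
    """Total size in bytes of all regular files under `path` (0 if absent)."""
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
```

Because cached modules are only reused for the same set of load/store callbacks, deleting the directory is a blunt but simple way to reclaim space. Affected transforms will then recompile on their next first use.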

__enter__(self)
__exit__(self, exc_type, exc_value, traceback)
config.set_cufft_gpus(gpus)

Set the GPUs to be used in multi-GPU FFT.

Parameters:

gpus (int or list of int) – The number of GPUs or a list of GPUs to be used. In the former case, the first gpus GPUs are used.

Warning

This API is currently experimental and may change in a future version.
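The sketch below illustrates the two accepted forms of gpus and a batched 1-D transform spread across devices. It assumes that the multi-GPU path must also be enabled through the cupy.fft.config.use_multi_gpus flag, which we believe exists but which you should confirm for your CuPy version. The helper names resolve_gpu_ids and batched_fft_on_gpus are hypothetical, and the flag is restored afterwards because it is global.

```python
def resolve_gpu_ids(gpus):
    """Illustrate the meaning of `gpus`: an int n selects devices 0..n-1,
    a list gives the device IDs explicitly."""
    if isinstance(gpus, int):
        return list(range(gpus))
    return [int(g) for g in gpus]


def batched_fft_on_gpus(gpus, shape=(64, 1024)):
    """Batched 1-D C2C FFT split across several GPUs (experimental API)."""
    import cupy as cp  # intended for machines with two or more GPUs

    previous = cp.fft.config.use_multi_gpus
    cp.fft.config.use_multi_gpus = True  # assumption: required alongside set_cufft_gpus
    try:
        cp.fft.config.set_cufft_gpus(gpus)
        a = cp.random.random(shape).astype(cp.complex64)
        return cp.fft.fft(a)  # one transform per row, along the last axis
    finally:
        cp.fft.config.use_multi_gpus = previous
```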

config.get_plan_cache() → PlanCache

Get the per-thread, per-device plan cache, or create one if not found.

See also

PlanCache
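A typical use of the returned PlanCache is to cap how many plans it keeps and how much memory they may hold. The sketch below assumes the PlanCache methods set_size, set_memsize, get_curr_size and get_curr_memsize. The function name limit_plan_cache and its cache parameter, which lets a stub stand in for testing, are hypothetical.

```python
def limit_plan_cache(max_plans, max_bytes, cache=None):
    """Cap the current thread's, current device's plan cache and report usage.

    `cache` defaults to cupy.fft.config.get_plan_cache(); any object with the
    same methods can be passed instead.
    """
    if cache is None:
        import cupy as cp  # needs a GPU
        cache = cp.fft.config.get_plan_cache()
    cache.set_size(max_plans)     # maximum number of cached plans
    cache.set_memsize(max_bytes)  # memory limit for cached plans, in bytes
    return cache.get_curr_size(), cache.get_curr_memsize()
```

Because the cache is per thread and per device, a call affects only the cache of the thread and device that are current when it runs.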

config.show_plan_cache_info()

Show all of the plan caches’ info on this thread.

See also

PlanCache