class cupy.fft.config.set_cufft_callbacks(unicode cb_load=u'', unicode cb_store=u'', ndarray cb_load_aux_arr=None, *, ndarray cb_store_aux_arr=None)

A context manager for setting up load and/or store callbacks.

  • cb_load (str) – A string containing the device kernel for the load callback. It must define d_loadCallbackPtr.

  • cb_store (str) – A string containing the device kernel for the store callback. It must define d_storeCallbackPtr.

  • cb_load_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the load callback.

  • cb_store_aux_arr (cupy.ndarray, optional) – A CuPy array containing data to be used in the store callback.


Any FFT call made inside this context has the callbacks set up. An example of a load callback is shown below:

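import cupy as cp

# CUDA source for the load callback, passed to CuPy as a raw string.
code = r'''
__device__ cufftComplex CB_ConvertInputC(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr) {
  // implementation
}

// Required: the load callback source must define d_loadCallbackPtr.
__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInputC;
'''

# FFT calls made inside this block use the load callback.
with cp.fft.config.set_cufft_callbacks(cb_load=code):
    out_arr = cp.fft.fft(in_arr, ...)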
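
Because each callback string must define a specific pointer symbol, a cheap textual check before entering the context can catch a missing definition early. The sketch below is not part of CuPy: the helper name missing_callback_symbol and the REQUIRED_SYMBOLS table are hypothetical, and the check only looks for an assignment to the symbol, so it says nothing about whether the CUDA source compiles.

```python
import re

# Symbol names taken from the parameter descriptions above.
REQUIRED_SYMBOLS = {
    "cb_load": "d_loadCallbackPtr",
    "cb_store": "d_storeCallbackPtr",
}


def missing_callback_symbol(kind, code):
    """Return the required symbol name if `code` never assigns it, else None.

    `kind` is "cb_load" or "cb_store". This is a textual check only.
    """
    symbol = REQUIRED_SYMBOLS[kind]
    # Match the symbol as a whole word followed by an assignment.
    pattern = r"\b" + re.escape(symbol) + r"\s*="
    return None if re.search(pattern, code) else symbol
```

A caller could raise a ValueError when the helper returns a name, instead of paying for a compilation that is bound to fail.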


Below are the runtime requirements for using this feature:

  • cython >= 0.29.0

  • A host compiler that supports C++11 or later; you might need to set the CXX environment variable.

  • nvcc and the full CUDA Toolkit. Note that the cudatoolkit package from Conda-Forge is not enough, as it does not contain static libraries.
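
The requirements above can be partly checked from Python before the first callback-enabled transform. This is a best-effort sketch using only the standard library; the function names leading_version and check_callback_requirements are hypothetical, and the report cannot tell whether the CUDA Toolkit's static libraries are actually installed.

```python
import os
import re
import shutil
from importlib import metadata


def leading_version(text):
    """Turn '0.29.36' or '3.0.0a11' into a 3-tuple of leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = tuple(int(p) for p in match.group(0).split(".")) if match else ()
    # Pad short versions such as '0.29' so tuple comparison behaves.
    return parts + (0,) * (3 - len(parts))


def check_callback_requirements():
    """Collect a report on Cython, nvcc, and the CXX environment variable."""
    try:
        cython_version = metadata.version("cython")
    except metadata.PackageNotFoundError:
        cython_version = None
    return {
        # True only if Cython is installed and at least 0.29.0.
        "cython_ok": cython_version is not None
        and leading_version(cython_version) >= (0, 29, 0),
        # Path to nvcc on PATH, or None if it is not found.
        "nvcc_path": shutil.which("nvcc"),
        # Host compiler override, or None if CXX is unset.
        "cxx": os.environ.get("CXX"),
    }
```

A None value for nvcc_path is a strong hint that the full CUDA Toolkit is not on PATH; an unset CXX is only a problem if the default host compiler lacks C++11 support.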


Callbacks only work for transforms over contiguous axes; the behavior for non-contiguous transforms is in general undefined.
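
One way to guard against the undefined case is to check the transform axes before entering the context. The sketch below rests on an assumption the text does not spell out: that "contiguous axes" means the trailing axes of a C-contiguous array. The helper name transforms_trailing_axes is hypothetical.

```python
def transforms_trailing_axes(ndim, axes):
    """Return True if `axes` are exactly the last len(axes) axes.

    Assumption: "contiguous axes" is read here as the trailing axes of a
    C-contiguous array; this is an interpretation, not a documented rule.
    """
    for axis in axes:
        if not -ndim <= axis < ndim:
            raise ValueError(f"axis {axis} is out of range for ndim={ndim}")
    # Map negative axes to their non-negative equivalents and sort them.
    normalized = sorted(axis % ndim for axis in axes)
    return normalized == list(range(ndim - len(normalized), ndim))
```

Under the same assumption, the array itself should also be C-contiguous, which can be checked with in_arr.flags.c_contiguous.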


Using cuFFT callbacks requires compiling and loading a Python module at runtime, as well as static linking for each distinct transform and callback, so the first invocation for each combination will be very slow. This is a limitation of cuFFT, so use this feature only when the callback-enabled transform is known to be more performant and can be reused to amortize the cost.
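
Whether the one-time cost is amortized can be estimated by timing the first call separately from later ones. The helper below is a generic sketch, not a CuPy utility, and its name first_vs_later is hypothetical. For GPU work, the callable passed in should block until the transform has finished (for example by synchronizing the device inside it); otherwise the timings only measure kernel launch overhead.

```python
import time


def first_vs_later(fn, repeats=5):
    """Time the first call to `fn` separately from the mean of later calls.

    Returns (first_seconds, later_mean_seconds).
    """
    start = time.perf_counter()
    fn()  # Includes any one-time compile, link, and load cost.
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        fn()  # Should reuse whatever the first call built.
    later_mean = (time.perf_counter() - start) / repeats
    return first, later_mean
```

Comparing later_mean with the same measurement for a callback-free version of the transform shows how many reuses are needed before the callback pays for itself.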


The generated Python modules are cached by default in ~/.cupy/callback_cache for possible reuse (with the same set of load/store callbacks). Due to static linking, however, the file sizes can be excessive! The cache location can be changed by setting CUPY_CACHE_DIR.
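
Since the cached modules can grow large, it may help to check how much disk space the cache occupies. The following sketch uses only the standard library; the function name directory_size is hypothetical, and the default path is the one stated above. If CUPY_CACHE_DIR is set, pass the relocated directory instead; where exactly the callback modules land under it is not stated here.

```python
import os
from pathlib import Path


def directory_size(root):
    """Return (file_count, total_bytes) for all regular files under `root`.

    A missing directory is reported as (0, 0).
    """
    root = Path(root).expanduser()
    count = total = 0
    if not root.is_dir():
        return count, total
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file():
                count += 1
                total += path.stat().st_size
    return count, total
```

For example, directory_size("~/.cupy/callback_cache") reports the usage of the default cache location.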


__exit__(self, exc_type, exc_value, traceback)

__eq__(value, /)

Return self==value.

__ne__(value, /)

Return self!=value.

__lt__(value, /)

Return self<value.

__le__(value, /)

Return self<=value.

__gt__(value, /)

Return self>value.

__ge__(value, /)

Return self>=value.