cupy.cuda.MemoryAsyncPool

class cupy.cuda.MemoryAsyncPool(pool_handles='current')

(Experimental) CUDA memory pool for all GPU devices on the host.

A memory pool retains allocations even after they are freed by the user, so that the memory can be reused by later allocations. One instance of this class can be used for multiple devices. This class uses CUDA’s Stream Ordered Memory Allocator (supported on CUDA 11.2+). The simplest way to use this pool as CuPy’s default allocator is:

cupy.cuda.set_allocator(cupy.cuda.MemoryAsyncPool().malloc)

Using this feature requires CUDA >= 11.2 with a supported GPU and platform. If it is not supported, an error will be raised.

The current CuPy stream is used to allocate/free the memory.
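As a minimal sketch of the setup described above, the hypothetical helper below (install_async_pool is not part of CuPy) tries to create a MemoryAsyncPool and install it as the default allocator, and falls back to the regular cupy.cuda.MemoryPool when the async pool is unavailable. It assumes that the error mentioned above is raised when the pool is constructed, and it catches Exception broadly because the exact exception type is not stated here.

```python
def install_async_pool():
    """Install MemoryAsyncPool as CuPy's default allocator if possible.

    Returns the pool object whose malloc() was installed.
    """
    # Imported lazily so this module can be loaded without CuPy present.
    import cupy

    try:
        pool = cupy.cuda.MemoryAsyncPool()
    except Exception:
        # Assumption: unsupported CUDA versions, GPUs, or platforms fail
        # at construction time; the exception type is not documented
        # here, so this sketch catches broadly and uses the regular pool.
        pool = cupy.cuda.MemoryPool()

    cupy.cuda.set_allocator(pool.malloc)
    return pool
```

Calling install_async_pool() once at program start, before any arrays are created, should be enough for later allocations to go through the returned pool.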

Parameters

pool_handles (str or int) – A flag indicating which mempool to use. ‘default’ selects the device’s default mempool, ‘current’ selects the current mempool (which could be the default one), and an int representing a cudaMemPool_t created elsewhere selects an external mempool. A list of these flags is also accepted, in which case the list length must equal the total number of visible devices so that the mempool for each device can be set independently.
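The list form of pool_handles can be sketched as follows. build_pool_handles is a hypothetical helper, not a CuPy function: it produces one flag per visible device, defaulting to 'current', with optional per-device overrides.

```python
def build_pool_handles(n_devices, overrides=None):
    """Return a pool_handles list with exactly one entry per device.

    overrides maps a device index to 'default', 'current', or an int
    handle; devices without an override get 'current'.
    """
    overrides = overrides or {}
    for dev in overrides:
        if not 0 <= dev < n_devices:
            raise ValueError(f"device {dev} is outside 0..{n_devices - 1}")
    return [overrides.get(dev, "current") for dev in range(n_devices)]


# Usage (requires CuPy and a supported GPU and platform):
#   import cupy
#   n = cupy.cuda.runtime.getDeviceCount()  # number of visible devices
#   handles = build_pool_handles(n, {0: "default"})
#   pool = cupy.cuda.MemoryAsyncPool(handles)
```

Building the list from the device count keeps its length in step with the number of visible devices, which is the constraint stated above.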

Warning

This feature is currently experimental and subject to change.

Note

MemoryAsyncPool currently cannot work with memory hooks.

Methods

free_all_blocks(self, stream=None)

Releases free memory.

Parameters

stream (cupy.cuda.Stream) – Release memory freed on the given stream. If stream is None, the current stream is used.
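A small sketch of calling free_all_blocks() and observing its effect. release_cached_memory is a hypothetical helper; it only relies on the free_all_blocks() and total_bytes() methods documented on this page. How much memory is actually released depends on the CUDA runtime, so the returned number is informational only.

```python
def release_cached_memory(pool, stream=None):
    """Release free memory and return the drop in total_bytes().

    pool is expected to be a cupy.cuda.MemoryAsyncPool; stream may be a
    cupy.cuda.Stream, or None to use the current stream.
    """
    before = pool.total_bytes()
    pool.free_all_blocks(stream=stream)
    after = pool.total_bytes()
    return before - after


# Usage (requires CuPy and a supported GPU and platform):
#   import cupy
#   pool = cupy.cuda.MemoryAsyncPool()
#   released = release_cached_memory(pool)
```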

free_bytes(self) → size_t

Gets the total number of bytes acquired but not used by the pool.

Returns

The total number of bytes acquired but not used by the pool.

Return type

int

get_limit(self) → size_t

Gets the upper limit of memory allocation of the current device.

Returns

The number of bytes

Return type

int

Note

Unlike with MemoryPool, MemoryAsyncPool’s set_limit() method can only impose a soft limit. If other (non-CuPy) applications are also allocating memory from the same mempool, this limit may not be respected.

malloc(self, size_t size) → MemoryPointer

Allocate memory from the current device’s pool on the current stream.

This method can be used as a CuPy memory allocator. The simplest way to use a memory pool as the default allocator is the following code:

cupy.cuda.set_allocator(cupy.cuda.MemoryAsyncPool().malloc)

Parameters

size (int) – Size of the memory buffer to allocate in bytes.

Returns

Pointer to the allocated buffer.

Return type

MemoryPointer
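Because malloc() allocates on the current stream, one way to tie an allocation to a specific stream is to make that stream current for the duration of the call. The sketch below uses a hypothetical helper, malloc_on_stream, and relies on cupy.cuda.Stream being usable as a context manager.

```python
def malloc_on_stream(pool, nbytes, stream):
    """Allocate nbytes from pool while stream is the current stream."""
    # Entering the stream makes it the current CuPy stream inside the
    # block; the previous current stream is restored on exit.
    with stream:
        return pool.malloc(nbytes)


# Usage (requires CuPy and a supported GPU and platform):
#   import cupy
#   pool = cupy.cuda.MemoryAsyncPool()
#   stream = cupy.cuda.Stream(non_blocking=True)
#   memptr = malloc_on_stream(pool, 1024, stream)  # a MemoryPointer
```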

n_free_blocks(self) → size_t

set_limit(self, size=None, fraction=None)

Sets the upper limit of memory allocation of the current device.

When fraction is specified, its value will become a fraction of the amount of GPU memory that is available for allocation. For example, if you have a GPU with 2 GiB memory, you can either use set_limit(fraction=0.5) or set_limit(size=1024**3) to limit the memory size to 1 GiB.

size and fraction cannot be specified at the same time. If neither is specified, or if 0 is specified, the limit is disabled.

Note

Unlike with MemoryPool, MemoryAsyncPool’s set_limit() method can only impose a soft limit. If other (non-CuPy) applications are also allocating memory from the same mempool, this limit may not be respected. Internally, this limit is set via the cudaMemPoolAttrReleaseThreshold attribute.

Note

You can also set the limit using the CUPY_GPU_MEMORY_LIMIT environment variable; see Environment variables for details. The limit set by this method supersedes the value specified in the environment variable.

Also note that this method only changes the limit for the current device, whereas the environment variable sets the default limit for all devices.

Parameters
  • size (int) – Limit size in bytes.

  • fraction (float) – Fraction in the range of [0, 1].
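The size/fraction rules above can be sketched as follows. limit_in_bytes and apply_limit are hypothetical helpers, not CuPy functions: the first reproduces the 2 GiB arithmetic from the example above, and the second rejects the invalid combination before forwarding to set_limit() and reading the result back with get_limit().

```python
GIB = 1024 ** 3


def limit_in_bytes(capacity_bytes, fraction):
    """Convert a fraction in [0, 1] of a capacity into a byte count."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in the range [0, 1]")
    return int(capacity_bytes * fraction)


def apply_limit(pool, size=None, fraction=None):
    """Set a limit on pool for the current device and return it."""
    if size is not None and fraction is not None:
        raise ValueError("specify either size or fraction, not both")
    if fraction is not None:
        pool.set_limit(fraction=fraction)
    else:
        # size=None (or 0) disables the limit.
        pool.set_limit(size=size)
    return pool.get_limit()


# For a 2 GiB capacity, fraction=0.5 and size=1024**3 describe the same limit.
assert limit_in_bytes(2 * GIB, 0.5) == GIB
```

Since the limit is a soft one for MemoryAsyncPool, the value returned by apply_limit() reflects the configured limit and is not a guarantee about actual usage.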

total_bytes(self) → size_t

Gets the total number of bytes acquired by the pool.

Returns

The total number of bytes acquired by the pool.

Return type

int

used_bytes(self) → size_t

Gets the total number of bytes used by the pool.

Returns

The total number of bytes used by the pool.

Return type

int
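The three byte counters, used_bytes(), free_bytes(), and total_bytes(), can be read together to get a quick picture of the pool. pool_usage below is a hypothetical helper that only calls those documented methods; the utilization figure is an illustrative derived value, not something the pool reports itself.

```python
def pool_usage(pool):
    """Collect the pool's byte counters into a dictionary."""
    used = pool.used_bytes()
    total = pool.total_bytes()
    return {
        "used": used,
        "free": pool.free_bytes(),
        "total": total,
        # Share of acquired memory currently in use (0.0 for an empty pool).
        "utilization": used / total if total else 0.0,
    }


# Usage (requires CuPy and a supported GPU and platform):
#   import cupy
#   pool = cupy.cuda.MemoryAsyncPool()
#   print(pool_usage(pool))
```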

__eq__(value, /)

Return self==value.

__ne__(value, /)

Return self!=value.

__lt__(value, /)

Return self<value.

__le__(value, /)

Return self<=value.

__gt__(value, /)

Return self>value.

__ge__(value, /)

Return self>=value.