cupy.cuda.Stream#
- class cupy.cuda.Stream(null=False, non_blocking=False, ptds=False, priority=None)#
CUDA stream.
This class manages the CUDA stream handle in an RAII way, i.e., when a Stream instance is destroyed by the GC, its handle is also destroyed.
Note that if both null and ptds are False, a plain new stream is created.
- Parameters:
null (bool) – If True, the stream is a null stream (i.e. the default stream that synchronizes with all streams). Note that you can also use the Stream.null singleton object instead of creating a new null stream object.
ptds (bool) – If True and null is False, the per-thread default stream is used. Note that you can also use the Stream.ptds singleton object instead of creating a new per-thread default stream object.
non_blocking (bool) – If True and both null and ptds are False, the stream does not synchronize with the NULL stream.
priority (int) – Priority of the stream. Lower numbers represent higher priorities.
- Variables:
Stream.ptr (intptr_t) – Raw stream handle.
Stream.device_id (int) – The ID of the device that the stream was created on. The value -1 is used for the singleton stream objects.
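As a minimal sketch of the constructor and context-manager usage described above, the following queues work on a new non-blocking stream and waits for it to finish. The helper name `scaled_sum` and the `HAVE_GPU` guard are illustrative assumptions, not part of CuPy; the demo is skipped when CuPy or a CUDA device is unavailable.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except ImportError:  # CuPy is not installed; the demo below is skipped
    cp, HAVE_GPU = None, False


def scaled_sum(n):
    """Sum 2 * arange(n) on a dedicated non-blocking stream (illustrative helper)."""
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:                  # `stream` is the current stream inside this block
        x = cp.arange(n) * 2      # kernels are queued on `stream`
        total = x.sum()
    stream.synchronize()          # wait for all work queued on `stream`
    return int(total)             # copy the scalar result back to the host


# None when no GPU is available (hypothetical guard for this sketch).
result = scaled_sum(1000) if HAVE_GPU else None
```

Because the stream is non-blocking, it does not synchronize implicitly with the NULL stream, so the explicit `synchronize()` call comes before the result is read on the host.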
Methods
- __enter__(self)#
- __exit__(self, *args)#
- add_callback(self, callback, arg)#
Adds a callback that is called when all queued work is done.
- Parameters:
callback (function) – Callback function. It must take three arguments (Stream object, int error status, and user data object), and return nothing.
arg (object) – Argument to the callback.
Note
Whenever possible, use the launch_host_func() method instead of this one, because the underlying callback API may be deprecated and removed from CUDA at some point. Also note that this function does not support CUDA graph capture.
- begin_capture(self, mode=None)#
Begin stream capture to construct a CUDA graph.
A call to this function must be paired with a call to end_capture() to complete the capture.

    # create a non-blocking stream for the purpose of capturing
    s1 = cp.cuda.Stream(non_blocking=True)
    with s1:
        s1.begin_capture()
        # ... perform operations to construct a graph ...
        g = s1.end_capture()

    # the returned graph can be launched on any stream (including s1)
    g.launch(stream=s1)
    s1.synchronize()

    s2 = cp.cuda.Stream()
    with s2:
        g.launch()
    s2.synchronize()
- Parameters:
mode (int) – The stream capture mode. Default is streamCaptureModeRelaxed.
Note
During stream capture, synchronous device-host transfers are not allowed. This has a particular implication for CuPy APIs: functions that internally require a synchronous transfer do not work during capture and raise an exception. For further constraints of CUDA stream capture, please refer to the CUDA Programming Guide.
Note
Currently this capability is not supported on HIP.
See also
- end_capture(self)#
End stream capture and retrieve the constructed CUDA graph.
- Returns:
A CUDA graph object that encapsulates the captured work.
- Return type:
cupy.cuda.Graph
Note
Currently this capability is not supported on HIP.
See also
- classmethod from_external(cls, obj)#
Create a Stream from an external stream object via the CUDA stream protocol.
This method creates a CuPy Stream from a foreign stream object that implements the CUDA stream protocol (i.e., has a __cuda_stream__ method). The created Stream holds a reference to the foreign stream object to ensure it remains alive.
- Parameters:
obj – A stream-like object that implements the __cuda_stream__ method.
- Returns:
A CuPy Stream wrapping the external stream.
- Return type:
cupy.cuda.Stream
- Raises:
TypeError – If the object does not implement __cuda_stream__ or if __cuda_stream__ does not return a valid 2-tuple.
Note
This classmethod supersedes ExternalStream. Users are encouraged to use this method for interoperability with other libraries that support the CUDA stream protocol.
See also
Examples
>>> # Assuming torch_stream is a PyTorch CUDA stream
>>> cupy_stream = cupy.cuda.Stream.from_external(torch_stream)
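When no other library is at hand, the protocol can be exercised with a small stand-in object. The sketch below is an assumption-laden illustration: `ForeignStream` is a hypothetical class, and the `(0, handle)` return shape (protocol version, raw stream handle) is assumed from the 2-tuple requirement stated above. The `from_external` call is skipped if CuPy, a GPU, or the method itself is unavailable.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except ImportError:  # CuPy is not installed; the demo below is skipped
    cp, HAVE_GPU = None, False


class ForeignStream:
    """Hypothetical stand-in for another library's stream object."""

    def __init__(self, handle):
        self._handle = handle

    def __cuda_stream__(self):
        # Assumed protocol shape: (protocol version, raw stream handle).
        return (0, self._handle)


wrapped_ptr = None
if HAVE_GPU and hasattr(cp.cuda.Stream, "from_external"):
    owner = cp.cuda.Stream(non_blocking=True)    # owns the real CUDA handle
    foreign = ForeignStream(owner.ptr)           # exposes it via the protocol
    wrapped = cp.cuda.Stream.from_external(foreign)
    wrapped_ptr = wrapped.ptr                    # raw handle seen by CuPy
```

In this sketch `owner` must outlive `wrapped`, since the stand-in only carries the raw handle and does not own the stream.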
- is_capturing(self)#
Check if the stream is capturing.
- Returns:
If the capturing status is successfully queried, the returned value indicates the capturing status. An exception may be raised if such a query is illegal; please refer to the CUDA Programming Guide for details.
- Return type:
bool
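A small sketch of how the capturing status changes around a capture, assuming a CUDA (non-HIP) device. The helper name `capture_states` and the `HAVE_GPU` guard are illustrative and not part of CuPy.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except ImportError:  # CuPy is not installed; the demo below is skipped
    cp, HAVE_GPU = None, False


def capture_states():
    """Return is_capturing() before, during, and after a capture (illustrative)."""
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:
        before = stream.is_capturing()
        stream.begin_capture()
        try:
            during = stream.is_capturing()
        finally:
            stream.end_capture()      # always pair begin_capture with end_capture
        after = stream.is_capturing()
    return before, during, after


# Stream capture is not supported on HIP, so the demo is skipped there.
states = None
if HAVE_GPU and not cp.cuda.runtime.is_hip:
    states = capture_states()
```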
- launch_host_func(self, callback, arg)#
Launch a callback on host when all queued work is done.
- Parameters:
callback (function) – Callback function. It must take only one argument (user data object), and return nothing.
arg (object) – Argument to the callback.
See also
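As a hedged illustration of launch_host_func(), the sketch below queues a host callback behind some device work and checks that it ran after synchronization. The helper name `run_with_callback` and the `HAVE_GPU` guard are assumptions made for this example; the demo is skipped on HIP as a precaution.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except ImportError:  # CuPy is not installed; the demo below is skipped
    cp, HAVE_GPU = None, False


def run_with_callback(n):
    """Queue a host callback after device work on a stream (illustrative helper)."""
    log = []
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:
        x = cp.arange(n)
        x += 1                                        # device work queued first
        # The callback takes exactly one argument; here it appends to `log`.
        # Host callbacks should not call into CUDA themselves.
        stream.launch_host_func(log.append, "done")
    stream.synchronize()                              # callback has run by now
    return log


log = None
if HAVE_GPU and not cp.cuda.runtime.is_hip:
    log = run_with_callback(16)
```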
- record(self, event=None)#
Records an event on the stream.
- Parameters:
event (None or cupy.cuda.Event) – CUDA event. If None, then a new plain event is created and used.
- Returns:
The recorded event.
- Return type:
cupy.cuda.Event
See also
- synchronize(self)#
Waits for the stream to complete all queued work.
- use(self)#
Makes this stream current.
If you want to switch a stream temporarily, use the with statement.
- wait_event(self, event)#
Makes the stream wait for an event.
Work queued on this stream after this call will not start until the event has completed.
- Parameters:
event (cupy.cuda.Event) – CUDA event.
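record() and wait_event() can be combined to order work across two streams. The following is a minimal sketch under the assumption that a CUDA device is available; the helper name `produce_then_consume` and the `HAVE_GPU` guard are illustrative only.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except ImportError:  # CuPy is not installed; the demo below is skipped
    cp, HAVE_GPU = None, False


def produce_then_consume(n):
    """Order work on two streams with an event (illustrative helper)."""
    producer = cp.cuda.Stream(non_blocking=True)
    consumer = cp.cuda.Stream(non_blocking=True)
    with producer:
        x = cp.arange(n)
        x *= 2
        ready = producer.record()   # new event marking the end of producer work
    consumer.wait_event(ready)      # later consumer work waits for `ready`
    with consumer:
        total = x.sum()             # safe: runs after the producer has finished
    consumer.synchronize()
    return int(total)


# None when no GPU is available (hypothetical guard for this sketch).
total = produce_then_consume(1000) if HAVE_GPU else None
```

Only the consumer stream waits here; the host thread is not blocked until `consumer.synchronize()` is called.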
- __eq__(self, other)#
- __ne__(value, /)#
Return self!=value.
- __lt__(value, /)#
Return self<value.
- __le__(value, /)#
Return self<=value.
- __gt__(value, /)#
Return self>value.
- __ge__(value, /)#
Return self>=value.
Attributes
- done#
True if all work on this stream has been done.
- is_non_blocking#
True if the stream is non_blocking. False indicates the default stream creation flag.
- null = <Stream 0 (device -1)>#
- priority#
Query the priority of a stream.
- ptds = <Stream 2 (device -1)>#
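The attributes above can be inspected directly on a stream object. This is a small sketch, assuming a CUDA device is present; the helper name `stream_flags` and the `HAVE_GPU` guard are illustrative and not part of CuPy.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except ImportError:  # CuPy is not installed; the demo below is skipped
    cp, HAVE_GPU = None, False


def stream_flags():
    """Return (is_non_blocking, done) for a fresh, idle stream (illustrative)."""
    stream = cp.cuda.Stream(non_blocking=True)
    stream.synchronize()            # nothing is queued, so this returns at once
    return stream.is_non_blocking, stream.done


# None when no GPU is available (hypothetical guard for this sketch).
flags = stream_flags() if HAVE_GPU else None
```

Polling `done` is a non-blocking alternative to `synchronize()` when the host has other work to do in the meantime.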