IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt.
Version: Nightly

DeviceGraphBuilder

struct DeviceGraphBuilder

Builder for explicit device graph construction.

A DeviceGraphBuilder is obtained from DeviceContext.create_graph_builder(). Callers add kernel nodes via add_function() and then call instantiate() to produce a reusable DeviceGraph.

Example:

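from std.gpu.host import DeviceContext

def kernel(x: Int):
    print("Value:", x)

with DeviceContext() as ctx:
    # Compile the kernel once, up front.
    var compiled_fn = ctx.compile_function[kernel, kernel]()
    var builder = ctx.create_graph_builder()
    # Add a single kernel node that receives 42 as its argument.
    builder.add_function(compiled_fn, 42, grid_dim=1, block_dim=1)
    # instantiate() consumes the builder, so ownership is transferred with ^.
    var graph = builder^.instantiate()
    graph.replay()
    ctx.synchronize()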

Implemented traits

AnyType, ImplicitlyDestructible, Movable, _FunctionEnqueuer

comptime members

enqueue_fn_name

comptime enqueue_fn_name = StringSlice("AsyncRT_DeviceGraphBuilder_addFunctionDirect")

C runtime function name used by _FunctionEnqueuer to add a kernel node.

Methods

__init__

__init__(out self, *, copy: Self)

Creates a copy of an existing graph builder by incrementing its reference count.

Args:

  • copy (Self): The graph builder to copy.

__del__

__del__(deinit self)

Releases resources associated with this graph builder.

handle

handle(self) -> UnsafePointer[NoneType, MutExternalOrigin]

Gets the underlying C builder handle.

Returns:

UnsafePointer[NoneType, MutExternalOrigin]: The underlying C builder handle as an opaque pointer.

add_function

add_function[*Ts: DevicePassable](self, f: DeviceFunction[target=f.target, compile_options=f.compile_options, link_options=f.link_options, _ptxas_info_verbose=f._ptxas_info_verbose], *args: *Ts.values, *, grid_dim: Dim, block_dim: Dim, cluster_dim: OptionalReg[Dim] = None, shared_mem_bytes: OptionalReg[Int] = None, var attributes: List[LaunchAttribute] = List(__list_literal__=NoneType(None)), var constant_memory: List[ConstantMemoryMapping] = List(__list_literal__=NoneType(None)))

Adds a type-checked compiled kernel function as a node in this graph.

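Parameters:

  • *Ts (DevicePassable): Types of the arguments passed to the kernel.

Args:

  • f (DeviceFunction): The compiled kernel function to add as a graph node.
  • *args (*Ts.values): Arguments to pass to the kernel.
  • grid_dim (Dim): Dimensions of the compute grid.
  • block_dim (Dim): Dimensions of each thread block.
  • cluster_dim (OptionalReg[Dim]): Cluster dimensions (optional).
  • shared_mem_bytes (OptionalReg[Int]): Amount of dynamic shared memory per block.
  • attributes (List[LaunchAttribute]): Launch attributes.
  • constant_memory (List[ConstantMemoryMapping]): Constant memory mappings.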

Raises:

If adding the node fails.

add_function[FuncType: def() register_passable -> None, //, dump_asm: Variant[Bool, Path, StringSlice[StaticConstantOrigin], def() capturing -> Path] = False, dump_llvm: Variant[Bool, Path, StringSlice[StaticConstantOrigin], def() capturing -> Path] = False, _dump_sass: Variant[Bool, Path, StringSlice[StaticConstantOrigin], def() capturing -> Path] = False, _ptxas_info_verbose: Bool = False](self, func: FuncType, grid_dim: Dim, block_dim: Dim, cluster_dim: OptionalReg[Dim] = None, shared_mem_bytes: OptionalReg[Int] = None, var attributes: List[LaunchAttribute] = List(__list_literal__=NoneType(None)), var constant_memory: List[ConstantMemoryMapping] = List(__list_literal__=NoneType(None)))

Compiles and adds a capturing kernel closure as a node in this graph.

This overload is for kernels that capture variables from their enclosing scope using the {var} capture syntax. Compilation is performed automatically using the DeviceContext that created this builder, so no separate compile step is needed.

Example:

from std.gpu import global_idx
from std.gpu.host import DeviceContext

with DeviceContext() as ctx:
    var scale: Float32 = 2.0
    var buf = ctx.enqueue_create_buffer[DType.float32](256)
    var ptr = buf.unsafe_ptr()

    # The {var} clause lets the kernel capture `scale` and `ptr`
    # from the enclosing scope.
    def scale_kernel() {var}:
        var i = global_idx.x
        ptr[i] = Float32(i) * scale

    var builder = ctx.create_graph_builder()
    # No compile_function call is needed; the closure is compiled here.
    builder.add_function(scale_kernel, grid_dim=1, block_dim=256)
    var graph = builder^.instantiate()
    graph.replay()
    ctx.synchronize()

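Parameters:

  • FuncType (def() register_passable -> None): Type of the kernel closure, inferred from func.
  • dump_asm (Variant[Bool, Path, StringSlice[StaticConstantOrigin], def() capturing -> Path]): Pass True, a file path, or a function returning a file path to dump the compiled assembly.
  • dump_llvm (Variant[Bool, Path, StringSlice[StaticConstantOrigin], def() capturing -> Path]): Pass True, a file path, or a function returning a file path to dump the LLVM IR.
  • _dump_sass (Variant[Bool, Path, StringSlice[StaticConstantOrigin], def() capturing -> Path]): Pass True, a file path, or a function returning a file path to dump the SASS.
  • _ptxas_info_verbose (Bool): Pass True to print verbose PTX assembler output.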

Args:

  • func (FuncType): The capturing kernel closure to compile and add as a graph node.
  • grid_dim (Dim): Dimensions of the compute grid.
  • block_dim (Dim): Dimensions of each thread block.
  • cluster_dim (OptionalReg[Dim]): Cluster dimensions (optional).
  • shared_mem_bytes (OptionalReg[Int]): Amount of dynamic shared memory per block.
  • attributes (List[LaunchAttribute]): Launch attributes.
  • constant_memory (List[ConstantMemoryMapping]): Constant memory mappings.

Raises:

If adding the node fails.

add_copy

add_copy[dtype: DType](self, dst_buf: DeviceBuffer[dtype], src_buf: HostBuffer[dtype])

Adds a host-to-device memcpy node to the graph.

The number of bytes copied is determined by the size of the device buffer.

Parameters:

  • dtype (DType): Type of the data being copied.

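Args:

  • dst_buf (DeviceBuffer[dtype]): Destination device buffer.
  • src_buf (HostBuffer[dtype]): Source host buffer.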

Raises:

If adding the node fails.

add_copy[dtype: DType](self, dst_buf: HostBuffer[dtype], src_buf: DeviceBuffer[dtype])

Adds a device-to-host memcpy node to the graph.

The number of bytes copied is determined by the size of the device buffer.

Parameters:

  • dtype (DType): Type of the data being copied.

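Args:

  • dst_buf (HostBuffer[dtype]): Destination host buffer.
  • src_buf (DeviceBuffer[dtype]): Source device buffer.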

Raises:

If adding the node fails.

add_copy[dtype: DType](self, dst_buf: DeviceBuffer[dtype], src_buf: DeviceBuffer[dtype])

Adds a device-to-device memcpy node to the graph.

Both buffers must belong to the same context as this builder; cross-context copies are not supported in graphs. The number of bytes copied is determined by the size of the source buffer.

Parameters:

  • dtype (DType): Type of the data being copied.

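Args:

  • dst_buf (DeviceBuffer[dtype]): Destination device buffer.
  • src_buf (DeviceBuffer[dtype]): Source device buffer.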

Raises:

If adding the node fails.
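
The three add_copy overloads can be combined into a round trip through the device. The following is a minimal sketch, not a definitive usage pattern. It assumes that DeviceContext provides enqueue_create_host_buffer() and that HostBuffer supports element indexing (neither is documented on this page), and that nodes run in the order in which they were added (this page does not state the ordering rules). All buffers have the same length so that every copy moves the same number of bytes.

```mojo
from std.gpu.host import DeviceContext

def main() raises:
    with DeviceContext() as ctx:
        var n = 256
        # Assumption: enqueue_create_host_buffer() comes from DeviceContext
        # and is not described on this page.
        var host_in = ctx.enqueue_create_host_buffer[DType.float32](n)
        var host_out = ctx.enqueue_create_host_buffer[DType.float32](n)
        var dev_a = ctx.enqueue_create_buffer[DType.float32](n)
        var dev_b = ctx.enqueue_create_buffer[DType.float32](n)
        # Wait for the allocations before writing from the host.
        ctx.synchronize()

        for i in range(n):
            host_in[i] = Float32(i)

        var builder = ctx.create_graph_builder()
        builder.add_copy(dev_a, host_in)   # host-to-device
        builder.add_copy(dev_b, dev_a)     # device-to-device, same context
        builder.add_copy(host_out, dev_b)  # device-to-host
        var graph = builder^.instantiate()

        graph.replay()
        ctx.synchronize()
        # Assumption: nodes ran in insertion order, so host_out mirrors host_in.
        print(host_out[0], host_out[n - 1])
```

The overload is selected from the buffer types alone, so the dtype parameter does not have to be written out.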

add_memset

add_memset[dtype: DType](self, dst: DeviceBuffer[dtype], val: Scalar[dtype])

Adds a memset node to the graph that sets all elements of dst to val.

Parameters:

  • dtype (DType): Type of the data stored in the buffer.

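Args:

  • dst (DeviceBuffer[dtype]): Destination device buffer to fill.
  • val (Scalar[dtype]): Value written to every element of dst.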

Raises:

If adding the node fails. The underlying graph APIs cannot express an 8-byte memset whose high and low 32-bit halves differ as a single node, so such patterns raise an error.
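
As an illustration, a memset node can be followed by a device-to-host copy to read the filled values back. This is a hedged sketch: enqueue_create_host_buffer() and HostBuffer indexing are assumed from the DeviceContext API rather than taken from this page, and the copy is assumed to run after the memset. A 4-byte Float32 element is used, which avoids the 8-byte restriction described above.

```mojo
from std.gpu.host import DeviceContext

def main() raises:
    with DeviceContext() as ctx:
        var dev = ctx.enqueue_create_buffer[DType.float32](16)
        # Assumption: this host-buffer constructor is not documented here.
        var host = ctx.enqueue_create_host_buffer[DType.float32](16)

        var builder = ctx.create_graph_builder()
        # Fill every element of `dev` with 1.5.
        builder.add_memset(dev, Float32(1.5))
        # Copy the filled buffer back so the host can inspect it.
        builder.add_copy(host, dev)
        var graph = builder^.instantiate()

        graph.replay()
        ctx.synchronize()
        print(host[0])
```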

instantiate

instantiate(var self) -> DeviceGraph

Instantiates the constructed graph into an executable device graph.

Finalizes the graph construction and produces a DeviceGraph that can be replayed multiple times.

Returns:

DeviceGraph: The instantiated device graph.

Raises:

If instantiation fails.
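
Because the resulting DeviceGraph is reusable, the cost of building and instantiating can be paid once and the graph replayed as often as needed. The sketch below reuses the kernel from the example at the top of this page. The argument value 7 and the replay count of 3 are arbitrary choices for illustration.

```mojo
from std.gpu.host import DeviceContext

def kernel(x: Int):
    print("Value:", x)

def main() raises:
    with DeviceContext() as ctx:
        var compiled_fn = ctx.compile_function[kernel, kernel]()
        var builder = ctx.create_graph_builder()
        builder.add_function(compiled_fn, 7, grid_dim=1, block_dim=1)
        # The builder is consumed here and cannot be used afterwards.
        var graph = builder^.instantiate()

        # Replay the same instantiated graph several times.
        for _ in range(3):
            graph.replay()
        ctx.synchronize()
```

Since instantiate() takes self as var, the builder must be transferred with ^. Adding more nodes requires a new builder from create_graph_builder().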