Version: Nightly

cvt_pk_fp8_f32_raw

cvt_pk_fp8_f32_raw[dtype: DType](src: SIMD[DType.float32, 4]) -> SIMD[dtype, 4]

Packs four f32 values into four fp8 values using two chained v_cvt_pk_fp8_f32 instructions.

Unlike SIMD.cast[fp8](), this function bypasses the clamp and NaN-scrub wrapper (v_med3_f32 + v_cmp_u_f32 + v_cndmask_b32) that the compiler's pop.cast lowering emits on AMDGPU. The caller is responsible for ensuring that every input lies within the representable range of the destination FP8 type: the hardware instruction does NOT saturate finite out-of-range values. NaN and Inf inputs produce implementation-defined FP8 outputs.
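
The range precondition above can be modeled on the host. The following is a minimal Python sketch, not part of the Mojo API: the helper name in_fp8_range and the FP8_MAX_FINITE table are hypothetical, and the limits are the largest finite magnitudes of the two formats (448 for float8_e4m3fn, 57344 for float8_e5m2). It only checks the condition the caller is expected to guarantee; it does not model what the hardware returns when the condition is violated.

```python
import math

# Largest finite magnitude of each destination format named on this page.
# (Hypothetical lookup table, used only for this illustration.)
FP8_MAX_FINITE = {
    "float8_e4m3fn": 448.0,
    "float8_e5m2": 57344.0,
}


def in_fp8_range(values, dtype="float8_e4m3fn"):
    """Return True if every value is finite and within the finite range of dtype.

    This mirrors the precondition the caller must establish before using the
    raw conversion: no NaN, no Inf, and no magnitude above the format's maximum.
    """
    limit = FP8_MAX_FINITE[dtype]
    return all(math.isfinite(v) and abs(v) <= limit for v in values)
```

A check like this is useful in tests or debug builds to validate that a kernel's inputs really are bounded; in the hot path, the point of the raw conversion is that no such check or clamp runs at all.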

Notes:

Only supported on AMD CDNA4+ GPUs.
Maps to two v_cvt_pk_fp8_f32 instructions for float8_e4m3fn, or two v_cvt_pk_bf8_f32 instructions for float8_e5m2.
Use only when the input domain is provably bounded, for example softmax output, where values lie in (0, 1].
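
To illustrate why softmax output is a safe input domain, here is a small Python sketch (the softmax helper is written for this example and is not a library function). A numerically stable softmax divides non-negative terms by their sum, so every output is finite and has magnitude at most 1, well inside the finite range of either FP8 format. Very small outputs may still round toward zero when converted, which is ordinary underflow rather than an out-of-range input.

```python
import math


def softmax(xs):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # each term is in (0, 1], or 0.0 on underflow
    total = sum(exps)                     # at least 1.0, since exp(m - m) == 1.0
    return [e / total for e in exps]


# Four logits with a wide spread; the outputs are still bounded by 1.
probs = softmax([-3.0, 0.5, 2.0, 10.0])
```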

Parameters:

dtype (DType): The FP8 destination type, float8_e4m3fn or float8_e5m2.

Args:

src (SIMD[DType.float32, 4]): Four f32 values to pack.

Returns:

SIMD[dtype, 4]: SIMD of 4 fp8 values, bitcast from the packed i32 result.
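
The packed result can be pictured as four FP8 bytes in one 32-bit word. The Python sketch below models only that layout; pack_fp8x4 and unpack_fp8x4 are hypothetical helpers, and the ordering is an assumption: the first conversion is taken to fill the low 16 bits with elements 0 and 1, the second to fill the high 16 bits with elements 2 and 3, and the bitcast is taken to read bytes in little-endian order. It operates on already-encoded FP8 bit patterns and does not perform the f32 to fp8 rounding.

```python
import struct


def pack_fp8x4(b0, b1, b2, b3):
    """Pack four FP8 bit patterns (each 0..255) into one 32-bit word.

    Assumed layout: elements 0 and 1 occupy the low half-word,
    elements 2 and 3 occupy the high half-word.
    """
    low = b0 | (b1 << 8)    # result of the first packed conversion
    high = b2 | (b3 << 8)   # result of the second packed conversion
    return low | (high << 16)


def unpack_fp8x4(word):
    """Reinterpret a 32-bit word as four bytes, little-endian, like a bitcast."""
    return tuple(struct.pack("<I", word))


# float8_e4m3fn bit patterns for 1.0, 2.0, 3.0 and 4.0.
word = pack_fp8x4(0x38, 0x40, 0x44, 0x48)
lanes = unpack_fp8x4(word)
```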