cvt_pk_fp8_f32_raw
cvt_pk_fp8_f32_raw[dtype: DType](src: SIMD[DType.float32, 4]) -> SIMD[dtype, 4]
Packs 4 f32 into 4 fp8 via 2 chained v_cvt_pk_fp8_f32 ops.
Unlike SIMD.cast[fp8](), this bypasses the compiler's clamp + NaN
scrub wrapper (v_med3_f32 + v_cmp_u_f32 + v_cndmask_b32) that the
pop.cast lowering emits on AMDGPU. The caller is responsible for
ensuring inputs are in the FP8 representable range; finite
out-of-range values are NOT saturated by the hardware instruction.
NaN/Inf inputs produce implementation-defined FP8 outputs.
Notes:
- Only supported on AMD CDNA4+ GPUs.
- Maps to two v_cvt_pk_fp8_f32 (or .pk.bf8.f32) instructions.
- Use only when the input domain is provably bounded (e.g. softmax output, where values are in (0, 1]).
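
Because the caller owns the range guarantee, it can help to validate representative inputs on the host before switching a kernel to the raw conversion. The following is a minimal Python sketch of such a check, not part of the Mojo API. The helper name fits_fp8_range and the table FP8_MAX_FINITE are hypothetical; the limits 448 and 57344 are the largest finite magnitudes of float8_e4m3fn and float8_e5m2. The check is deliberately conservative: it rejects anything above the largest finite value rather than modeling the hardware's rounding.

```python
import math

# Largest finite magnitude of each FP8 destination type.
# (Hypothetical helper table; the names mirror the dtype names used above.)
FP8_MAX_FINITE = {
    "float8_e4m3fn": 448.0,
    "float8_e5m2": 57344.0,
}


def fits_fp8_range(values, dtype_name):
    """Return True if every value is finite and within the dtype's finite range.

    This is a conservative host-side check, not a model of the instruction.
    """
    limit = FP8_MAX_FINITE[dtype_name]
    return all(math.isfinite(v) and abs(v) <= limit for v in values)


# Softmax-like values lie in (0, 1], so they pass for either dtype.
print(fits_fp8_range([0.7, 0.2, 0.09, 0.01], "float8_e4m3fn"))   # True
# 1000.0 exceeds the e4m3fn limit but fits e5m2.
print(fits_fp8_range([1000.0, 0.5, 0.25, 0.125], "float8_e4m3fn"))  # False
print(fits_fp8_range([1000.0, 0.5, 0.25, 0.125], "float8_e5m2"))    # True
```

If a check like this can fail for real inputs, SIMD.cast is the safer choice, since it keeps the clamp and NaN handling that this function bypasses.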
Parameters:
- dtype (DType): The FP8 destination type, float8_e4m3fn or float8_e5m2.
Args:
- src (SIMD[DType.float32, 4]): Four f32 values to pack.
Returns:
SIMD[dtype, 4]: SIMD of 4 fp8 values, bitcast from the packed i32 result.
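
To illustrate what "bitcast from the packed i32 result" means, here is a small Python model of the data layout only. It assumes each packed convert writes two FP8 bytes into one 16-bit half of a 32-bit word while preserving the other half, and that lane 0 lands in the lowest byte; both the lane order and the helper names (pack_pair, pack_four, bitcast_to_lanes) are assumptions made for this sketch. The FP8 bytes are supplied directly rather than computed from f32 values, so no rounding behavior is modeled.

```python
import struct


def pack_pair(word, byte_a, byte_b, high_half):
    """Write two FP8 bytes into one 16-bit half of a 32-bit word.

    The other half of the word is preserved, which is what lets two
    packed converts be chained into a single 32-bit result.
    """
    pair = (byte_a & 0xFF) | ((byte_b & 0xFF) << 8)
    if high_half:
        return (word & 0x0000FFFF) | (pair << 16)
    return (word & 0xFFFF0000) | pair


def pack_four(fp8_bytes):
    """Chain two pair-writes: lanes 0-1 into the low half, lanes 2-3 into the high half."""
    b0, b1, b2, b3 = fp8_bytes
    word = pack_pair(0, b0, b1, high_half=False)
    return pack_pair(word, b2, b3, high_half=True)


def bitcast_to_lanes(word):
    """Reinterpret the 32-bit word as four bytes, lowest byte first."""
    return list(struct.pack("<I", word))


# float8_e4m3fn encodings of 1.0, 0.5, 0.25 and 0.125.
codes = [0x38, 0x30, 0x28, 0x20]
word = pack_four(codes)
print(hex(word))               # 0x20283038
print(bitcast_to_lanes(word))  # [56, 48, 40, 32]
```

The round trip returns the original four bytes in lane order, which is the sense in which the SIMD[dtype, 4] result is a reinterpretation of the packed word rather than a second conversion.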