Commit c252e0c

metal : optimize multi-sequence FA vec kernel (#13493)
* batched-bench : fix pp batch contents
* metal : optimize multi-sequence FA vec kernel

ggml-ci
1 parent 4f711af commit c252e0c

1 file changed: +5 -0 lines changed
ggml/src/ggml-metal/ggml-metal.metal

@@ -3887,6 +3887,11 @@ kernel void kernel_flash_attn_ext_vec(
                     sm[tiisg] = pm[ic + tiisg];
                 }
 
+                // skip -INF blocks
+                if (simd_max(sm[tiisg]) == -INFINITY) {
+                    continue;
+                }
+
                 // Q*K^T
                 {
                     // each simdgroup processes 1 query and NE (NW/NL) head elements
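
The effect of the added check can be illustrated with a small CPU-side model. This is a sketch under stated assumptions, not the Metal kernel: it assumes `sm` holds the additive attention-mask values for the current block of keys, that a fully masked block (every entry `-INFINITY`) contributes `exp(-inf) = 0` to the softmax, and that such blocks are common in multi-sequence batches where a query only attends to keys of its own sequence. The names `attend` and `AttnResult`, the scalar head size, and the block size parameter are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of one query attending over a KV cache in blocks.
// Head size is 1 to keep the sketch short.
struct AttnResult {
    float out;        // attention output for the query
    int   blocks_run; // number of blocks that were actually processed
};

AttnResult attend(float q,
                  const std::vector<float> & keys,
                  const std::vector<float> & values,
                  const std::vector<float> & mask, // 0 or -INFINITY per key
                  std::size_t block,
                  bool skip_masked) {
    float M   = -INFINITY; // running maximum of the scores (online softmax)
    float S   = 0.0f;      // running sum of exp(score - M)
    float acc = 0.0f;      // running weighted sum of values
    int blocks_run = 0;

    for (std::size_t ic = 0; ic < keys.size(); ic += block) {
        const std::size_t end = std::min(ic + block, keys.size());

        // Analogue of simd_max(sm[tiisg]): maximum mask value in this block.
        float mmax = -INFINITY;
        for (std::size_t i = ic; i < end; ++i) {
            mmax = std::max(mmax, mask[i]);
        }

        // skip -INF blocks: every key is masked, so nothing can contribute
        if (skip_masked && mmax == -INFINITY) {
            continue;
        }

        ++blocks_run;
        for (std::size_t i = ic; i < end; ++i) {
            const float s = q * keys[i] + mask[i]; // Q*K^T plus mask
            if (s == -INFINITY) {
                continue; // masked key: its softmax weight is exactly 0
            }
            const float m_new = std::max(M, s);
            const float scale = std::exp(M - m_new); // rescale old state
            const float p     = std::exp(s - m_new);
            acc = acc * scale + p * values[i];
            S   = S * scale + p;
            M   = m_new;
        }
    }

    return { S > 0.0f ? acc / S : 0.0f, blocks_run };
}
```

In this model, calling `attend` with `skip_masked` set to `true` or `false` yields the same output, while the skipping variant processes fewer blocks. The saving in the real kernel would come from not executing the `Q*K^T` and subsequent steps for such blocks.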
