Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout #13240

FullstackSensei · 2025-05-01T18:37:51Z

Name and Version

```shell
./llama.cpp/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
Device 0: Tesla P40, compute capability 6.1, VMM: yes
Device 1: Tesla P40, compute capability 6.1, VMM: yes
Device 2: Tesla P40, compute capability 6.1, VMM: yes
Device 3: Tesla P40, compute capability 6.1, VMM: yes
version: 34 (fc727bc)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
```

Operating systems

Linux

GGML backends

CUDA

Hardware

Dual Xeon E5-2699v4 + Quad Nvidia P40

Models

unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/UD-Q4_K_XL

Problem description & steps to reproduce

Unsloth's Llama 4 Scout Q4_K_XL quant runs fine with the following command:

```shell
    --model /models/Llama-4-Scout/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf \
    --threads 40 \
    --ctx-size 32768 \
    --n-gpu-layers 99 \
    --device CUDA1,CUDA2,CUDA3 --tensor-split 0,1,1,1 \
    -fa --cache-type-k q8_0 --cache-type-v q8_0 \
    --seed 3407 \
    --prio 3 \
    --temp 0.6 \
    --min-p 0.01 \
    --top-p 0.9 \
    -no-cnv \
    --prompt "<|header_start|>user<|header_end|>\n\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|eot|><|header_start|>assistant<|header_end|>\n\n"
```

However, if I add `-sm row`, a GGML_ASSERT fails and llama-cli becomes unresponsive until I force it to exit with Ctrl-C.

### First Bad Commit

_No response_

### Relevant log output

```shell
./llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1177: GGML_ASSERT(src0_dd_i != nullptr) failed
```


JohannesGaessler · 2025-05-01T18:41:05Z

`-sm row` is not supported for MoE models; the error message is unhelpful, though.

FullstackSensei added the bug-unconfirmed label May 1, 2025
