
Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout #13240

FullstackSensei opened this issue May 1, 2025 · 1 comment

@FullstackSensei

### Name and Version

```shell
./llama.cpp/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
  Device 0: Tesla P40, compute capability 6.1, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: yes
  Device 2: Tesla P40, compute capability 6.1, VMM: yes
  Device 3: Tesla P40, compute capability 6.1, VMM: yes
version: 34 (fc727bc)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
```

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

Dual Xeon E5-2699v4 + Quad Nvidia P40

### Models

unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/UD-Q4_K_XL

### Problem description & steps to reproduce

Unsloth's Llama 4 Scout Q4_K_XL runs fine with the following command:

```shell
./llama.cpp/llama-cli \
    --model /models/Llama-4-Scout/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf \
    --threads 40 \
    --ctx-size 32768 \
    --n-gpu-layers 99 \
    --device CUDA1,CUDA2,CUDA3 --tensor-split 0,1,1,1 \
    -fa --cache-type-k q8_0 --cache-type-v q8_0 \
    --seed 3407 \
    --prio 3 \
    --temp 0.6 \
    --min-p 0.01 \
    --top-p 0.9 \
    -no-cnv \
    --prompt "<|header_start|>user<|header_end|>\n\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|eot|><|header_start|>assistant<|header_end|>\n\n"
```

However, if I add `-sm row`, a `GGML_ASSERT` fails and llama-cli becomes unresponsive until I force it to exit with Ctrl-C.

### First Bad Commit

_No response_

### Relevant log output

```shell
./llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1177: GGML_ASSERT(src0_dd_i != nullptr) failed
```
@JohannesGaessler
Collaborator

`-sm row` is not supported for MoE models; the error message is unhelpful, though.
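Because the reporter's command without `-sm` runs fine, and `layer` is llama.cpp's default split mode, one stopgap until the error message improves is to keep MoE models off row split in whatever script launches llama-cli. The POSIX sh function below is a hypothetical sketch, not part of llama.cpp: `sanitize_split_mode` and its first argument `is_moe` are made-up names, and the caller must decide whether the model is MoE (the sketch does not read the GGUF file). It rewrites `-sm row` or `--split-mode row` to `layer` when `is_moe` is `1`, leaves every other argument untouched, and prints one argument per line.

```shell
# Hypothetical helper, not part of llama.cpp.
# Usage: sanitize_split_mode IS_MOE ARG...
# IS_MOE=1 means the caller has determined the model is MoE; this function
# does not inspect the model file itself.
sanitize_split_mode() {
  is_moe="$1"   # POSIX sh has no `local`, so this variable is global
  shift
  while [ "$#" -gt 0 ]; do
    # A split-mode flag followed by its value: handle both together.
    if { [ "$1" = "-sm" ] || [ "$1" = "--split-mode" ]; } && [ "$#" -ge 2 ]; then
      if [ "$is_moe" = "1" ] && [ "$2" = "row" ]; then
        # Row split is reported as unsupported for MoE, so fall back to layer.
        printf '%s\n' "$1" layer
      else
        printf '%s\n' "$1" "$2"
      fi
      shift 2
    else
      # Any other argument passes through unchanged.
      printf '%s\n' "$1"
      shift
    fi
  done
}
```

In bash, the output can be read back into an array with `mapfile -t args < <(sanitize_split_mode 1 --n-gpu-layers 99 -sm row)` and passed on as `./llama.cpp/llama-cli "${args[@]}"`. One argument per line keeps values containing spaces intact, as long as no argument contains a literal newline.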
