Eval bug: Qwen3 30B A3B is slow with CUDA #13211
Comments
Checked performance with a single RTX 3090 and got 90 t/s for the Q4 quant. I suspect it's an issue with MoE and the multi-GPU setup.
I'm trying to use
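To test the first comment's suspicion that the multi-GPU MoE split is the bottleneck, a single-GPU run can be compared against a multi-GPU one. The sketch below is only an illustration: the model path, GPU indices, and token count are my assumptions, not values from this issue.

```
# Hypothetical isolation sketch -- model path and GPU indices are assumptions, not taken from the report.
# Pin the run to a single GPU to rule out the multi-GPU expert/layer split:
CUDA_VISIBLE_DEVICES=0 ./llama-bench -m Qwen3-30B-A3B-Q6_K.gguf -ngl 99 -n 128

# Same benchmark across two GPUs with the default layer split, for comparison:
CUDA_VISIBLE_DEVICES=0,1 ./llama-bench -m Qwen3-30B-A3B-Q6_K.gguf -ngl 99 -n 128 -sm layer
```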
Name and Version
Operating systems
Windows
GGML backends
CUDA
Hardware
Models
Qwen3-30B-A3B-Q6_K from https://huggingface.co/unsloth/Qwen3-30B-A3B-128K-GGUF
Problem description & steps to reproduce
Using the CUDA backend I get only 40-50 t/s generation speed.
Here are the parameters:
With the Vulkan backend I'm getting 80-90 t/s generation speed with:
However, with a batch size above 384 I get an error about an incorrect size and a BSOD with video memory issues, which never happens with CUDA. I've tested the VRAM with memtest_vulkan-v0.5.0 and everything was fine.
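Since the exact command lines were not captured above, the following is only a rough repro sketch; the model path, context size, and prompt are my assumptions, and only the 384 batch-size threshold comes from the report.

```
# Hypothetical repro sketch -- flags and values are assumptions, not the reporter's actual parameters.
# CUDA build (generation speed reported at ~40-50 t/s):
./llama-cli -m Qwen3-30B-A3B-Q6_K.gguf -ngl 99 -c 8192 -p "Hello" -n 256

# Vulkan build (~80-90 t/s reported, but batch sizes above 384 reportedly trigger size errors and a BSOD):
./llama-cli -m Qwen3-30B-A3B-Q6_K.gguf -ngl 99 -c 8192 -b 512 -ub 512 -p "Hello" -n 256
```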
First Bad Commit
No response
Relevant log output