Misc. bug: the output file of llama-quantize is not gguf format #13258

Open
samsosu opened this issue May 2, 2025 · 0 comments

Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA A10, compute capability 8.6, VMM: yes
version: 5225 (a0f7016)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-quantize

Command line

cd build/bin
./llama-quantize /mnt/data/train_output/Qwen2.5-32B-f16.gguf /mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf Q4_K_M

Problem description & steps to reproduce

The quantization process finished successfully, but the output file cannot be loaded by the following command:
./llama-cli -m /mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf -n 128 --color -ngl 35

The error output looks like this:
gguf_init_from_file_impl: invalid magic characters: '', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf

llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf'
main: error: unable to load model

I also inspected the header data of this GGUF file and found that there is no GGUF magic; the file begins with a long run of zero bytes.
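For reference, a valid GGUF file must begin with the four magic bytes b"GGUF". The check described above can be reproduced with a small generic script (this is not part of llama.cpp; the path is the one from this report):

```python
def has_gguf_magic(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        # A well-formed GGUF file starts with the ASCII bytes 'G','G','U','F'.
        return f.read(4) == b"GGUF"

if __name__ == "__main__":
    # Path taken from the report above; adjust as needed.
    print(has_gguf_magic("/mnt/data/train_output/Qwen2.5-32B-Q4_K_M.gguf"))
```

A file that prints False here (for example, one beginning with zero bytes, as observed) will be rejected by gguf_init_from_file_impl with the "invalid magic characters" error shown above.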

I also checked the source code of quantize.cpp; I could not find any code there that writes the GGUF format header.

First Bad Commit

No response

Relevant log output
