Quantizing an LLM to GGML or GGUF Format: A Comprehensive Guide #4068
-
Unless you actually want to do the conversion yourself, the easiest thing is to find a pre-converted model in the correct format. TheBloke has a massive number of models published in various formats: https://huggingface.co/TheBloke

This repo currently uses the GGUF format; GGML was the previous format. The LLM project you linked still uses GGML (though they're working on GGUF support).

This isn't going to be anything like a comprehensive guide, more like a very brief overview, but hopefully it still helps you a bit. If you want to quantize your own model to GGUF format (I'm assuming it's a LLaMA-type model), the basic workflow is to convert the Hugging Face checkpoint to an unquantized GGUF file and then quantize that file, as in the sketch below.
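A minimal sketch of those two steps, assuming a llama.cpp checkout at `./llama.cpp` (with the `quantize` tool built) and a LLaMA-type Hugging Face model in `./models/my-model`; the paths and the `q4_K_M` quant type are illustrative choices, not the only options:

```python
# Sketch of the llama.cpp GGUF conversion + quantization workflow.
# Assumes a llama.cpp checkout in ./llama.cpp and an HF model directory
# in ./models/my-model. Paths and the q4_K_M preset are illustrative.
import subprocess

# Step 1: convert the Hugging Face checkpoint to an unquantized (f16) GGUF file.
subprocess.run(
    ["python", "llama.cpp/convert.py", "models/my-model",
     "--outtype", "f16", "--outfile", "models/my-model-f16.gguf"],
    check=True,
)

# Step 2: quantize the f16 GGUF down to 4-bit with the quantize tool.
subprocess.run(
    ["llama.cpp/quantize", "models/my-model-f16.gguf",
     "models/my-model-q4_K_M.gguf", "q4_K_M"],
    check=True,
)
```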
After writing all this I realized you may be asking about the algorithm itself rather than how to perform quantization with existing tools? If so, the rough idea is below.
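A simplified Python sketch of block-wise 4-bit quantization in the spirit of GGML's q4_0. This is not bit-exact with ggml (the real format packs two 4-bit values per byte and stores the per-block scale as fp16), but it shows the core idea: one scale per block of 32 weights, chosen from the block's absolute maximum:

```python
# Simplified block-wise 4-bit quantization, in the spirit of GGML's q4_0.
# Not bit-exact with ggml; for illustration only.
import numpy as np

BLOCK = 32  # ggml's q4_0 quantizes weights in blocks of 32

def quantize_q4(x: np.ndarray):
    """Quantize a 1-D float array to 4-bit ints plus one scale per block."""
    x = x.reshape(-1, BLOCK)
    # One scale per block, chosen so the largest magnitude maps to the
    # edge of the signed 4-bit range [-8, 7].
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate floats: x_hat = q * scale."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(64).astype(np.float32)
q, s = quantize_q4(w)
w_hat = dequantize_q4(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

The quantization error comes entirely from rounding within each block, which is why outlier weights in a block hurt: they inflate the scale and coarsen the grid for everything else in that block.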
-
Hi, I fine-tuned the mistral-7b model for my question-answering task (loaded in 4-bit and trained with LoRA/QLoRA), roughly as in the sketch below.
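For reference, a minimal sketch of that kind of QLoRA setup, assuming the Hugging Face transformers / peft / bitsandbytes stack; the model name, LoRA rank, and target modules here are illustrative choices, not anything specified in this thread:

```python
# Sketch of a QLoRA setup: load Mistral-7B in 4-bit and attach LoRA adapters.
# Requires transformers, peft, bitsandbytes, and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 is the QLoRA paper's data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```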
-
Is GGML a model-saving format like GGUF, or not? I found one blog saying that it is a library. Does anyone know of a good blog or article they can share with me?
-
I would like to know the details of how to quantize an LLM to GGML or GGUF format. Currently, I have found only a few references that describe this, e.g. https://github.com/rustformers/llm/blob/main/crates/ggml/README.md. In contrast, GPTQ has a reference paper that explains the details.
So, is there any website or blog that describes the GGML quantization technique?
Thank you for your help. :)