llama : Support llama 4 text-only #12791
Merged

Commits (21)
79ebef8 llama4 conversion (ngxson)
b19dbd0 initial support, no chat template (ngxson)
f6d8e75 clean up a bit (ngxson)
1fb1888 fix tokenizer conversion (ngxson)
869d7d9 correct hparams (ngxson)
6ceae82 try this (ngxson)
7cfc237 fix shexp (ngxson)
edbaaf4 ffn_inp_normed (ngxson)
a518c11 chat template (ngxson)
46fe5cb clean up model conversion (ngxson)
ab91ab2 add_bos (ngxson)
f9c788d add scale_before_ffn (ngxson)
e4012e6 fix order (ngxson)
2a9b29a Merge branch 'master' into xsn/llama4 (ngxson)
ee06e9b weight_before_ffn (ngxson)
f8f1bd4 llm_graph_input_attn_temp (ngxson)
e6a2809 add chunk attn mask (ngxson)
af1968c build_inp_attn_scale() (ngxson)
09eba6a add comment about ggml_repeat (ngxson)
b28cd9c clarify comments (ngxson)
d3e67f9 fix build (ngxson)
@@ -0,0 +1,112 @@
ied 4 ½ months
__ggml_vocab_test__
Führer
__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__

__ggml_vocab_test__
Hello world
__ggml_vocab_test__
Hello world
__ggml_vocab_test__
Hello World
__ggml_vocab_test__
Hello World
__ggml_vocab_test__
Hello World!
__ggml_vocab_test__
Hello, world!
__ggml_vocab_test__
Hello, world!
__ggml_vocab_test__
this is 🦙.cpp
__ggml_vocab_test__
w048 7tuijk dsdfhu
__ggml_vocab_test__
нещо на Български
__ggml_vocab_test__
កាន់តែពិសេសអាចខលចេញ
__ggml_vocab_test__
🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)
__ggml_vocab_test__
Hello
__ggml_vocab_test__
Hello
__ggml_vocab_test__
Hello
__ggml_vocab_test__
Hello
__ggml_vocab_test__
Hello
__ggml_vocab_test__
Hello
Hello
__ggml_vocab_test__
(
__ggml_vocab_test__

=
__ggml_vocab_test__
' era
__ggml_vocab_test__
Hello, y'all! How are you 😁 ?我想在apple工作1314151天~
__ggml_vocab_test__
!!!!!!
__ggml_vocab_test__
3
__ggml_vocab_test__
33
__ggml_vocab_test__
333
__ggml_vocab_test__
3333
__ggml_vocab_test__
33333
__ggml_vocab_test__
333333
__ggml_vocab_test__
3333333
__ggml_vocab_test__
33333333
__ggml_vocab_test__
333333333
__ggml_vocab_test__
Cửa Việt
__ggml_vocab_test__
discards
__ggml_vocab_test__

🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ 🦙🦙 3 33 333 3333 33333 333333 3333333 33333333 3.3 3..3 3...3 កាន់តែពិសេសអាច😁 ?我想在apple工作1314151天~ ------======= нещо на Български ''''''```````""""......!!!!!!?????? I've been 'told he's there, 'RE you sure? 'M not sure I'll make it, 'D you like some tea? We'Ve a'lL
__ggml_vocab_test__
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@@ -0,0 +1,46 @@
1190 220 32 220 18215 7112
50 16800 258

220 | ||
256 | ||
277 | ||
197 | ||
198 | ||
368 | ||
2946 | ||
3271 | ||
19873 3817 | ||
39715 3817 | ||
19873 7353 | ||
39715 7353 | ||
39715 7353 13 | ||
19873 24 3817 13 | ||
39715 24 3817 13 | ||
544 373 9522 112 247 26 36315 | ||
99 39923 220 35 9607 21498 21470 3679 9433 | ||
1595 7653 633 79829 34051 1636 | ||
8755 102595 115960 21125 148305 96819 102816 39048 14105 22528 160234 | ||
114590 222 330 14879 21 51358 127 12817 93293 117 24204 330 68239 881 120327 170428 21 89101 330 7384 88230 511 947 1492 3742 7233 21 | ||
19873 | ||
39715 | ||
220 39715 | ||
256 39715 | ||
277 39715 | ||
277 39715 198 277 39715 | ||
330 | ||
198 319 | ||
19 7359 | ||
19873 24 386 87799 13 2403 583 650 51358 223 1663 155736 1522 42056 7544 13336 28785 29 4412 20645 | ||
17931 4959 | ||
31 | ||
1922 | ||
12325 | ||
12325 31 | ||
12325 1922 | ||
12325 12325 | ||
12325 12325 31 | ||
12325 12325 1922 | ||
12325 12325 12325 | ||
47 19811 12077 | ||
3260 3579 | ||
198 7283 51499 191231 20192 3271 3322 9287 2143 17860 114590 222 330 14879 21 51358 127 12817 93293 117 24204 330 68239 881 120327 170428 21 89101 9522 112 247 172394 247 220 31 220 1922 220 12325 220 12325 31 220 12325 1922 220 12325 12325 220 12325 12325 31 220 12325 12325 1922 220 31 26 31 220 31 396 31 220 31 1043 31 117131 102595 115960 21125 148305 96819 102816 80883 223 1663 155736 1522 42056 7544 13336 28785 29 4412 20645 79745 150278 117079 633 79829 34051 1636 25611 41990 109428 1488 91054 24072 17931 4959 29795 9296 16517 1806 481 96 1386 36633 1609 24 481 1109 650 5074 43 481 57 702 5074 27088 2170 536 24 481 48 650 1933 1696 30262 43 1665 19 32818 262 27236 56
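For reference, these two files form an input/expected-output pair: the `.inp` file holds test strings separated by the `__ggml_vocab_test__` marker, and the `.out` file holds one line of space-separated token IDs per test. A minimal parser sketch (illustrative only, not the actual llama.cpp test harness):

```python
def parse_vocab_tests(inp_text: str, out_text: str):
    # Test inputs are delimited by the __ggml_vocab_test__ marker line;
    # splitting on the full delimiter preserves whitespace-only tests.
    parts = inp_text.split("\n__ggml_vocab_test__\n")
    if parts and parts[-1] == "":
        parts = parts[:-1]  # drop the remainder after the final marker
    # One line of space-separated token IDs per test input.
    outputs = [[int(tok) for tok in line.split()]
               for line in out_text.splitlines()]
    return list(zip(parts, outputs))

# Token IDs taken from the expected-output file above.
cases = parse_vocab_tests(
    "Hello world\n__ggml_vocab_test__\nHello World\n__ggml_vocab_test__\n",
    "19873 3817\n19873 7353\n",
)
```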
Lazy evaluation doesn't support splitting yet, so this will always evaluate eagerly (and therefore use more RAM than ideal during conversion). This may or may not explain the conversion slowness others are noticing.

This can be fixed in `gguf-py/gguf/lazy.py` by handling tuples of tensors as output values. I have the necessary changes somewhere; I'll open a PR once I find them. (EDIT: see #12809)
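The tuple-output limitation can be sketched in a few lines (hypothetical names; this is not the actual gguf-py lazy-tensor code): a lazy wrapper records a thunk instead of computing, and an op like split that returns several tensors has to wrap each element in its own lazy thunk, otherwise the whole result is forced eagerly.

```python
# Minimal sketch of lazy evaluation with tuple outputs; class and method
# names are illustrative, not the actual gguf-py implementation.
class LazyTensor:
    def __init__(self, thunk):
        self._thunk = thunk  # zero-arg callable producing the real value

    def map(self, fn):
        # Record fn for later; nothing is computed here.
        return LazyTensor(lambda: fn(self._thunk()))

    def split(self, n):
        # A tuple-returning op must wrap each element lazily; evaluating
        # self._thunk() here directly would force eager evaluation.
        return tuple(LazyTensor(lambda i=i: self._thunk()[i]) for i in range(n))

    def materialize(self):
        return self._thunk()

t = LazyTensor(lambda: [10, 20, 30])
parts = [p.materialize() for p in t.split(3)]                   # [10, 20, 30]
doubled = t.map(lambda xs: [x * 2 for x in xs]).materialize()   # [20, 40, 60]
```

A real implementation would also cache the materialized value so the thunk runs only once.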
Off-topic question: is it possible to somehow extend `LazyTorchTensor` to load a tensor remotely? FYI, the Hugging Face backend supports byte ranges, so one idea would be to read the tensors one by one entirely into RAM, without having to download them to disk.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I never thought of it, but yes, technically this should totally be possible. Lazy tensors only need the name, shape, and type of each tensor to build the fake tensors, plus a way to turn the fake tensors back into real ones.

The hardest part wouldn't necessarily be the lazy tensors themselves, but how the remote paths would be specified, how they would interact with the default output path and output file name, and how the tensors would be enumerated and the config file and tokenizer fetched. There's a lot of tokenizer-related code that assumes local files.
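As a concrete illustration of the byte-range idea: the safetensors format stores an 8-byte little-endian header length followed by a JSON header listing each tensor's dtype, shape, and data offsets, which is exactly the metadata a fake/lazy tensor needs. A sketch, where `read_range(offset, length)` is a hypothetical stand-in for an HTTP Range request against the remote file:

```python
import json
import struct

def read_safetensors_header(read_range):
    """read_range(offset, length) -> bytes; remotely this would be an
    HTTP Range request against the .safetensors file."""
    # First 8 bytes: little-endian u64 length of the JSON header.
    (header_len,) = struct.unpack("<Q", read_range(0, 8))
    header = json.loads(read_range(8, header_len))
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    # name -> (dtype, shape, data_offsets); the offsets allow fetching each
    # tensor's payload later, one at a time, also via Range requests.
    return {name: (t["dtype"], t["shape"], t["data_offsets"])
            for name, t in header.items()}

# Demo with an in-memory blob standing in for the remote file.
hj = json.dumps({"tok_embd.weight": {"dtype": "F32", "shape": [2, 2],
                                     "data_offsets": [0, 16]}}).encode()
blob = struct.pack("<Q", len(hj)) + hj + b"\x00" * 16
info = read_safetensors_header(lambda off, n: blob[off:off + n])
```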
I think we can rely on `AutoTokenizer.from_pretrained`, which downloads the tokenizer files to a temporary directory. I'll have a look at how it works. Alternatively, we can rely on `huggingface_hub.snapshot_download()`, which accepts file-name patterns (so, for example, we can skip downloading the safetensors files).

In my case, loading safetensors remotely would be very useful. I couldn't test the 409B Maverick model because it requires 1.5 TB in total to store both the HF model and the GGUF, but an HF Space provides at most 1 TB of storage.
@ngxson I just tested Maverick. Sadly, BF16 conversion doesn't work. It's because the model sets
"interleave_moe_layer_step": 2,
so every 2nd (odd) layer is MoE, whilst the rest use a dense FFN. Error:
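The layer layout this config value implies can be sketched as follows (an illustration of the semantics, not llama.cpp code): with `interleave_moe_layer_step = 2`, every second layer carries the MoE experts and the remaining layers use a dense FFN.

```python
def moe_layer_indices(n_layers: int, step: int) -> list[int]:
    # With interleave_moe_layer_step = step, layers step-1, 2*step-1, ...
    # (0-indexed) are MoE layers; the rest use a dense FFN.
    return [i for i in range(n_layers) if (i + 1) % step == 0]

moe_8_2 = moe_layer_indices(8, 2)   # odd-indexed layers are MoE
moe_4_1 = moe_layer_indices(4, 1)   # step 1: every layer is MoE
```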
I just downloaded unsloth/Llama-4-Maverick-17B-128E-Instruct from HF. Everything looks good so far (on a machine with 2 TB RAM and a 13 TB SSD). It took me about 50 minutes to reach this point.
Leaving this comment here for visibility: we discussed via DM, and it turns out Daniel was using the wrong directory 😂

+1 reason to support converting HF --> GGUF without downloading to disk
This feature will be 🔥
Got past the
ValueError: Duplicated tensor name 'blk.1.ffn_down_exps.weight'
error, thanks! Now I wait!
Update: More Information on Llama 4 Maverick
🛠️ Model Conversion
root@bb22ebf4525a:/ws# time python convert_hf_to_gguf.py model_path
...
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/model_path/840ed22d9bc7731246bc119cca026a48a0ff8ec6-128x17B-840ed22d9bc7731246bc119cca026a48a0ff8ec6-F16.gguf: n_tensors = 531, total_size = 801.5G
...
real 302m33.600s
user 285m44.179s
sys 113m0.490s
🔧 Quantization
🧪 Tested with MUSA backend: