Eval bug: b4882 broke t5 #12435
It's extremely unlikely to be that commit; however, maybe e0dbec0. Did you bisect this, or just test b4880 vs b4882? What's your command line, BTW?
These changes in the release most likely broke t5:

commit e0dbec0 `llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)`

I don't use any example besides the server, which I patched to support t5, but the bug can be seen by starting the CLI (which I don't really know how to use, but it seemed to be cranking out the same gibberish I see in my server):

```
llama-cli -m /data3hd/models/madlad400-7b-mt.Q6_K.gguf --color -n -1 --multiline-input --interactive-first -ngl 65 -c 512 -ctk f16 -ctv f16 -b 512 -ub 512 -n 512 --keep 0 --temp 0.0 --dynatemp-range 0.0 --dynatemp-exp 1.0 --top-k 40 --top-p 0.95 --typical 1.0 --min-p 0.00 --repeat-last-n 64 --repeat-penalty 1.0 --presence-penalty 0.0 --frequency-penalty 0.0 --mirostat 0 --mirostat-lr 0.1 --mirostat-ent 5.0 -p "" --in-prefix "" --in-suffix ""
```

EDIT:

```
llama-cli -m /data3hd/models/madlad400-7b-mt.Q6_K.gguf --color -n -1 -ngl 65 -c 512 -ctk f16 -ctv f16 -b 512 -ub 512 -n 512 --keep 0 --temp 0.0 --dynatemp-range 0.0 --dynatemp-exp 1.0 --top-k 40 --top-p 0.95 --typical 1.0 --min-p 0.00 --repeat-last-n 64 --repeat-penalty 1.0 --presence-penalty 0.0 --frequency-penalty 0.0 --mirostat 0 --mirostat-lr 0.1 --mirostat-ent 5.0 -p "<2de> Today it rains" --in-prefix "" --in-suffix ""
```

b4880 and below will correctly output:

```
Heute regnet es [end of text]
```
I will push another PR for this now.
The latest Docker version I can run with Q4_K_S on a GTX 1070 is b4823; after that, something broke and it just keeps restarting when it warms up.
FYI, I do not know what's wrong, but I can't seem to be able to run nomic embedding either. Is that a separate issue?
b4880 was OK for me on the GTX 1070. I can't speak for anything above b4823 up to b4880. b4882 and above do not work.
I believe the embedding-model and t5 problems are related. Unique to t5 is the encoder part, which computes embeddings on the prompt to send to the decoder.
IDK, I went all the way back to b4764 (3 weeks ago) without success, using Q4_K_S and Q4_K_L.
t5 was certainly working over that release range for me, since I regress it fairly often (mainly when I have to rebase my server patches; b4882 was a significant rebase due to API changes related to the KV cache). I can't speak for embedding models as I don't use them, but I think embedding models work very similarly to the t5 encoder (i.e. non-causal processing of the whole prompt at once).
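The causal vs. non-causal distinction above can be illustrated with a minimal sketch in plain Python (this is not llama.cpp code, just the masking idea): a decoder applies a lower-triangular attention mask so each token sees only earlier positions, while an encoder (t5's encoder, or an embedding model) attends over the entire prompt at once.

```python
def causal_mask(n: int) -> list[list[int]]:
    """Decoder-style mask: token i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def full_mask(n: int) -> list[list[int]]:
    """Encoder-style (non-causal) mask: every token attends to every position."""
    return [[1] * n for _ in range(n)]

# For a 3-token prompt:
print(causal_mask(3))  # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(full_mask(3))    # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

A regression in how the non-causal (full) mask path is built would hit both the t5 encoder and embedding models while leaving pure decoder-only models untouched, which matches the symptoms reported in this thread.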
t5 is operational again as of release b4919. Thanks to @ggerganov and @fairydreaming for the extremely rapid debugging of this issue.
Sorry to bump this, guys; I'll open a new one, but just to confirm: the dumb question is, is this a different issue? Thanks again, and sorry.
@steampunque, what NVIDIA and CUDA driver versions are you using? I'm trying to debug why it does not work on my end.
Name and Version
version: 4882 (be7c303)
built with cc (GCC) 11.2.0 for x86_64-slackware-linux
Operating systems
Linux
GGML backends
CUDA
Hardware
GTX 1070
Models
madlad400 7b q6_k
Problem description & steps to reproduce
Gibberish now comes out of the model after the b4882 commit.
First Bad Commit
b4882
Relevant log output