Replies: 1 comment 1 reply
You can use:

```sh
# before
llama-server -c 1024 ...

# after
llama-server -c 8192 -np 8 ...
```

`-np 8` creates 8 parallel slots, and the context is split evenly between them, so `-c 8192` keeps 1024 tokens of context per slot. The server will automatically assign each request to the slot whose cached prefix matches it best. However, generation speed is currently impacted negatively due to some implementation limitations.
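As a rough illustration of the client side, here is a minimal sketch that rotates through a few prompt prefixes against that server, assuming the default host and port and the standard `/completion` endpoint; the prefix and suffix strings are placeholders:

```sh
# Hypothetical example: rotate through a few prompt prefixes, each followed by
# a different suffix. With -np 8 the server keeps several independent KV-cache
# slots and routes each request to the slot whose cached prefix matches best.
for prefix in "PREFIX_A" "PREFIX_B" "PREFIX_C"; do
  curl -s http://localhost:8080/completion \
    -H "Content-Type: application/json" \
    -d '{
          "prompt": "'"$prefix"' <your always-different suffix here>",
          "n_predict": 64,
          "cache_prompt": true
        }'
done
```

Because each request sets `cache_prompt`, the slot that served a given prefix keeps it in its KV cache, so later requests with the same prefix should, in principle, only need to evaluate the new suffix tokens.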
Hey!
Is it possible to have multiple caches of prompt prefixes?
In my case, I will have about 8 prompt prefixes that rotate all the time, which makes `cache_prompt` mostly useless. Is there a way to cache 8 variations of the prompt prefixes, while still allowing me to inject suffixes that will always be different and are not expected to be cached?
Many thanks!