-
Notifications
You must be signed in to change notification settings - Fork 11.6k
Prefilling assistant message in openai compatible API #13174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Just a heads-up that this is potentially a very breaking change, especially because this is an OpenAI compatible API but this is not OpenAI's behavior. The main situation I can think of is if someone wants to generate a new assistant message after the last one - i.e for ChatML they want the I'd suggest we add this to #9291 at a minimum. |
This adds support for prefilling assistant response (or its thought process) using the OpenAI compatible API.
The feature is used for example by Claude.
It can be tested using open-webui or with the following curl command:
Example advanced scenario: time limit for the thinking process
</think>
to its partial response