You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As reasoning models are becoming mainstream, we start to see some pattern:
Most models use <think>, <reasoning>, etc, basically a set of known tokens now
The "reasoning budget" can technically be supported by any models, not just Qwen, by keeping track of number of tokens between <think> and </think>
"no think" is just a reasoning budget == 0
So I'm thinking about accepting an object like this for each request:
"reasoning": {
"budget": -1, // number of reasoning tokens budget
default: -1 (inf) ; 0 for no think
"format": "", // equivalent of --reasoning-format
if set to "deepseek", reasoning will be returned in "message.reasoning_content"
if set to "hide", it will be completely hidden
default: "none", return the reasoning with the message as normal
}
The reasoning format "hide" can be implemented via #13214 ; the "deepseek" format current only supported for non-stream, but I think we can modify a bit to support this.
For the budget, we don't yet have the logic to handle it.
The text was updated successfully, but these errors were encountered:
Another interesting option, maybe expose reasoning_effort as a Jinja templating variable? Could be used in Qwen3 where low could fill the <think> block ahead of time, and be OpenAI compatible.
Feature Description
As reasoning models are becoming mainstream, we start to see some pattern:
<think>
,<reasoning>
, etc, basically a set of known tokens now<think>
and</think>
So I'm thinking about accepting an object like this for each request:
The reasoning format "hide" can be implemented via #13214 ; the "deepseek" format current only supported for non-stream, but I think we can modify a bit to support this.
For the budget, we don't yet have the logic to handle it.
The text was updated successfully, but these errors were encountered: