Feature Request: add per-request "reasoning" options in llama-server #13272

Open
ngxson opened this issue May 2, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@ngxson (Collaborator) commented May 2, 2025

Feature Description

As reasoning models become mainstream, we are starting to see some patterns:

  • Most models use <think>, <reasoning>, etc., which is by now essentially a known set of tokens
  • A "reasoning budget" can technically be supported by any model, not just Qwen, by tracking the number of tokens between <think> and </think>
  • "no think" is just a reasoning budget of 0

So I'm thinking about accepting an object like this for each request:

"reasoning": {
    "budget": -1, // number of reasoning tokens budget
                     default: -1 (inf) ; 0 for no think
    "format": "", // equivalent of --reasoning-format
                     if set to "deepseek", reasoning will be returned in "message.reasoning_content"
                     if set to "hide", it will be completely hidden
                     default: "none", return the reasoning with the message as normal
}
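To make the defaults concrete, here is a minimal sketch of how a server could parse and validate this object. This is not actual llama-server code; the function name and the returned dict shape are illustrative assumptions based on the proposal above.

```python
def parse_reasoning_options(request: dict) -> dict:
    """Parse the proposed per-request "reasoning" object, applying the
    defaults described above (budget -1 = unlimited, format "none")."""
    opts = request.get("reasoning", {})
    budget = opts.get("budget", -1)   # -1 = infinite, 0 = no think
    fmt = opts.get("format", "none")  # "none" | "deepseek" | "hide"
    if not isinstance(budget, int) or budget < -1:
        raise ValueError("reasoning.budget must be an integer >= -1")
    if fmt not in ("none", "deepseek", "hide"):
        raise ValueError(f"unknown reasoning.format: {fmt!r}")
    return {"budget": budget, "format": fmt}
```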

The reasoning format "hide" can be implemented via #13214; the "deepseek" format is currently only supported for non-streaming responses, but I think we can modify it a bit to support streaming too.

For the budget, we don't yet have the logic to handle it.
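For illustration, a rough sketch of what that budget logic could look like, operating on a token stream. This is a simplification and an assumption on my part: tokens are plain strings here, and overflow reasoning tokens are filtered after the fact, whereas the real server would work on token ids and would inject </think> into the sequence so the model stops reasoning immediately.

```python
def enforce_reasoning_budget(tokens, budget):
    """Keep at most `budget` tokens between <think> and </think>.
    budget == -1 means unlimited; budget == 0 means no reasoning at all.
    Once the budget is exhausted, a closing </think> is emitted and any
    remaining reasoning tokens are dropped."""
    out, state, used = [], "answer", 0
    for tok in tokens:
        if tok == "<think>":
            out.append(tok)
            if budget == 0:
                out.append("</think>")
                state = "truncated"
            else:
                state = "think"
        elif tok == "</think>":
            if state == "think":
                out.append(tok)
            state = "answer"
        elif state == "think":
            out.append(tok)
            used += 1
            if budget != -1 and used >= budget:
                out.append("</think>")
                state = "truncated"
        elif state == "answer":
            out.append(tok)
        # state == "truncated": drop overflow reasoning tokens
    return out
```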

@ngxson ngxson added the enhancement New feature or request label May 2, 2025
@GreenCappuccino

Another interesting option: maybe expose reasoning_effort as a Jinja templating variable? It could be used with Qwen3, where "low" could pre-fill the <think> block ahead of time, and it would be OpenAI-compatible.
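To sketch the idea: a hypothetical chat-template fragment (the template syntax and control tokens are illustrative, not from any shipped Qwen3 template) could branch on such a variable and pre-fill an empty <think> block so the model skips reasoning:

```
{# Hypothetical: reasoning_effort exposed by the server to the template #}
{%- if reasoning_effort is defined and reasoning_effort == "low" -%}
<|im_start|>assistant
<think>

</think>
{%- else -%}
<|im_start|>assistant
{%- endif -%}
```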
