Attempt to resolve conflicts for PR https://github.com/ggml-org/llama.cpp/pull/12379 #3

cgruver · 2025-04-30T20:24:29Z

@ochafik I have attempted to resolve the conflicts so that your PR can be merged into upstream.

PTAL.

Apologies if I inadvertently reverted anything. I tested this on an Intel 155H with ARC GPU and the granite3.3:2b model.

It seems to work OK. But, I'm a bit of a noob... ;-)

* add depth param
* update llama-bench README and add depth param
* llama-bench: default params for depth arg for faster execution
* Update examples/llama-bench/README.md

  Co-authored-by: Johannes Gäßler <[email protected]>

* fix buffer print ub
* use user provided args
* remove extra whitespaces

---------

Co-authored-by: Johannes Gäßler <[email protected]>

* fix(rpc): Improve input validation and error handling

  The `rpc-server` was vulnerable to Denial of Service attacks via several RPC commands (`SET_TENSOR`, `GRAPH_COMPUTE`, etc.). Malformed messages could trigger failed assertions (e.g., invalid `ggml_type`) or out-of-bounds reads/writes leading to `GGML_ABORT` calls, crashing the server process.

  This PR introduces robust input validation and replaces `abort()` calls with graceful error handling:

  - **Type Validation:** `deserialize_tensor` now checks if the `tensor->type` is within the valid `GGML_TYPE_COUNT` range *before* calling `ggml_new_tensor_4d`. Returns `nullptr` on invalid type.
  - **Bounds Checks:** Replaced `GGML_ABORT` in `set_tensor`, `set_tensor_hash`, and `get_tensor` handlers with error logging and returning `false` when data/offset parameters are out of buffer bounds.
  - **Size Checks:** Added safe arithmetic checks (for overflow) in `graph_compute` when calculating required message sizes based on client-provided `n_nodes` and `n_tensors`. Returns early if the reported sizes conflict with the actual message size or would lead to overflow.
  - **Error Propagation:**
    - `create_node` now checks for `nullptr` return values from `deserialize_tensor` and its recursive calls, propagating `nullptr` upwards on failure. Uses `find` instead of `at` for safer map access.
    - `copy_tensor` now checks for `nullptr` from `deserialize_tensor` and sets the response status to failure if deserialization or bounds checks fail.
    - `graph_compute` now checks for `nullptr` return from `create_node` and returns failure status correctly. The final return value now reflects the actual computation status.

  These changes improve the RPC server's resilience against malformed client requests, preventing crashes and ensuring errors are handled more gracefully.

  Signed-off-by: Ville Vesilehto <[email protected]>

* refactor(rpc): address pr comments

  removed comments and unnecessary returns

  Signed-off-by: Ville Vesilehto <[email protected]>

* refactor(rpc): ambiguous nullptr from create_node

  rpc_server::create_node could previously return nullptr if the input ID was 0 (valid) or if an internal error (deserialization, recursion failure) occurred (invalid). This ambiguity made error handling difficult for the caller (`graph_compute`).

  This commit clarifies the meaning of nullptr:

  - `graph_compute` now checks if the input 'id' was non-zero when `create_node` returns nullptr, correctly identifying failures versus intentional null links.
  - `create_node` avoids recursive calls for zero IDs and propagates nullptr unambiguously on failure during recursion.

  Signed-off-by: Ville Vesilehto <[email protected]>

* refactor(rpc): initial zero check in create_node

  The caller (`graph_compute`) already checks `id != 0` when handling a `nullptr` return from `create_node`, correctly distinguishing intentional null links from actual errors. This makes the initial `if (id == 0)` check redundant.

  Also removes the log message when a tensor ID is not found in the provided map which was added in this branch.

  Signed-off-by: Ville Vesilehto <[email protected]>

* fix(rpc): Handle get_alloc_size failure in server

  Check the return value of `server.get_alloc_size` in the RPC server loop. If the call fails, return early to close the connection.

  Signed-off-by: Ville Vesilehto <[email protected]>

* refactor(rpc): input size validation in graph_compute

  Removes detailed, step-by-step size calculations and overflow checks in favor of simpler direct comparisons, assuming 64-bit overflow is unlikely.

  Signed-off-by: Ville Vesilehto <[email protected]>

* refactor(rpc): remove extra status code setting

  Removes the explicit setting of `response.result = GGML_STATUS_FAILED` when `create_node` returns `nullptr` within `graph_compute`. Primary signal is the `false` return value in case of failure.

  Signed-off-by: Ville Vesilehto <[email protected]>

* refactor(rpc): remove redundant check for tensor->type

  Breaks CI on ubuntu-cpu-make. Tensor type is uint32_t, thus the check is not needed.

  Signed-off-by: Ville Vesilehto <[email protected]>

---------

Signed-off-by: Ville Vesilehto <[email protected]>
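
For readers unfamiliar with the pattern described in the rpc commits above, here is a minimal sketch of a bounds check that reports failure to the caller instead of aborting. It is not the code from the PR: `range_in_buffer` and `set_bytes` are hypothetical names, and a plain `std::vector` stands in for the server-side buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper (not taken from the PR): true when the byte range
// [offset, offset + size) fits inside a buffer of buf_size bytes.
// The subtraction form avoids computing offset + size, which could wrap.
static bool range_in_buffer(std::uint64_t buf_size, std::uint64_t offset, std::uint64_t size) {
    return offset <= buf_size && size <= buf_size - offset;
}

// Hypothetical handler: logs and returns false on bad client input
// instead of aborting the whole process.
static bool set_bytes(std::vector<std::uint8_t> & buf, std::uint64_t offset,
                      const std::uint8_t * data, std::uint64_t size) {
    if (!range_in_buffer(buf.size(), offset, size)) {
        std::fprintf(stderr, "set_bytes: offset=%llu size=%llu out of bounds (buffer is %llu bytes)\n",
                     (unsigned long long) offset, (unsigned long long) size,
                     (unsigned long long) buf.size());
        return false;
    }
    // Invariant established by the check above, so the sum cannot wrap here.
    assert(offset + size <= buf.size());
    if (size > 0) {
        std::memcpy(buf.data() + offset, data, size);
    }
    return true;
}
```

A caller would treat a `false` return as "reject this request" and keep serving, which is roughly the behaviour the commit messages describe for malformed requests.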

ggml-org#12466)

* Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture

  - Adds MoE-based embedding model supporting multilingual embeddings.
  - Selects architecture variant based on hyperparameter detection (MoE layers).
  - Removes unnecessary subclass initialization checks for clarity.

  https://www.nomic.ai/blog/posts/nomic-embed-text-v2

  Co-authored-by: Jared Van Bortel <[email protected]>

* fix tokenizer
* don't rename this tensor

---------

Co-authored-by: Jared Van Bortel <[email protected]>

* llava : add clip_n_output_tokens, deprecate clip_n_patches * mtmd : add qwen2vl and qwen2.5vl * decode_embd_batch::set_position_... * working version * deprecate llama-qwen2vl-cli * correct order W, H of clip_embd_nbytes_by_img * edit existing line in hot topics

…org#13174)

* Prefilling assistant message in openai compatible API
* fixed indentation
* fixed code convention
* simplify method usage
* no more than one assistant message at end of messages
* merge checks into prefill code
* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>

ochafik added 30 commits March 12, 2025 23:51

add common_regex w/ support for partial final matches

16c9c63

add common_json w/ support for truncated json healing

6dcff43

renaming: string_find_partial_stop (moved to common.cpp)

a95fe78

add common_chat_msg_diff

ce2f593

partial common_chat_parse

cd3681d

refactor parser w/ optionals

9462365

server: wire chat diffs in stream mode

6ed8a8f

fix trigger of thinking models (must happen after thoughts are closed)

eaeed7d

nits + docs

d6e680a

fix functionary v3.2 raw python!

64ea080

rename: common_chat_syntax (now contains format)

c46d4da

rm common_regex.at_start

4358d5d

Merge remote-tracking branch 'origin/master' into tool-diffs

f477288

fix gcc compilation

e0202b3

fix unreachable code warning after [[noreturn]] annotation

f840e3a

fix / refactor test-regex-partial

af7391e

fix test-chat

449917b

rm spaces

b428b5c

fix command r7b partial parsing (lacked args path)

668fc90

Update test_chat_completion.py

b48ab23

refactor + test chat parser (try_consume_json_with_dumped_args, literal based thinking tags parsing)

aefc8a4

return partial msg from server

22428a4

refactor partial json

5b9c5a4

don't return empty <think></think>

3fbe84f

test_tool_call: allow comment lines in now-multiline code strings (for functionary v3.2)

d4cb7fe

accommodate yet another deepseek r1 distill fantasy syntax (<｜tool▁calls｜>)

31f5eb2

rm space

bddc65a

nit: fix python type

ea3bf03

refactor test-chat-parser

f3bfbc6

fix QwQ 32B tool call parsing after thoughts (hermes2)

bb7b9fe

ngxson and others added 29 commits April 28, 2025 12:18

clip : refactor set input for cgraph + fix qwen2.5vl input (ggml-org#13136)

5fa9e63

* clip : refactor set input for cgraph * more strict assert * minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere * split qwen2 and qwen2.5 code blocks * minor style fix

llama : (mrope) allow using normal 1D position for text token (ggml-org#13138)

d2b2031

* llama : (mrope) use normal position for text token * rm n_pos_per_embd from llm_graph_input_attn_temp

context : do not clear output buffer on reserve (ggml-org#13152)

fb0471d

Co-authored-by: pockers21 <[email protected]>

mtmd : fix glm-edge redundant token count (ggml-org#13139)

4e87962

* mtmd : fix glm-edge redundant token count * fix chat template * temporary disable GLMEdge test chat tmpl

clip : fix model size display (ggml-org#13153)

eaea325

llama-graph : fix text position for mrope (ggml-org#13159)

b6ce743

* llama-graph : fix text position for mrope * fix typo * explicitly set 4th dim in the loop

llama : set qwen3 model type sizes (ggml-org#13175)

e98b369

llama : llm_type order by size (ggml-org#13177)

7d3af70

CUDA: fix non-cont. inputs for batched mat mul (ggml-org#13155)

cdf7658

llama-bench: fixed size of fields to correctly map to values (ggml-org#13183)

5a63980

sampling : when top-k <= 0 -> noop (ggml-org#13173)

d9d398f

ggml-ci

scripts: n_depth for compare-llama-bench [no ci] (ggml-org#13201)

19e899c

rpc : fix cache directory initialization (ggml-org#13188)

a0f7016

Signed-off-by: xiaofei <[email protected]>

docker : do not build tests (ggml-org#13204)

da84c04

* docker : do not build tests * include "ggml-cpu.h"

arg : allow using -hf offline (ggml-org#13202)

5933e6f

* arg : allow using -hf offline * add more comments in code [no ci]

feat(ggml-cpu): enable z17 compile (ggml-org#13182)

44cd8d9

z17 compilation requires GCC 15.1.0 and onwards Signed-off-by: Aaron Teo <[email protected]>

convert : correct typo image_mean --> image_std (ggml-org#13208)

07c2e2f

ggml : fix ppc64le build (ggml-org#13176)

4163137

Build fails with compilation error on power pc. This patch fixes the same. Tested with unit tests run via --build <build_dir> && cd <build_dir> && make test Signed-off-by: Shalini Salomi Bodapati <[email protected]>

vulkan: use uint array index to avoid glslang bug (ggml-org#13193)

e5007a5

common : add -jf / --json-schema-file flag (ggml-org#12011)

3b127c7

llava : remove duplicate include (ggml-org#13207)

ceda28e

convert : improve model arch handling (ggml-org#13122)

3e168be

* convert : improve model arch handling * use AutoConfig * rm trust_remote_code * Update convert_hf_to_gguf.py * fix self.block_count for vision * fix NomicBertModel

Merge branch 'master' into re-base

177f614

fix missing extract_reasoning bool in chat.h

870cdce

Signed-off-by: Charro Gruver <[email protected]>

cgruver mentioned this pull request Apr 30, 2025

server: streaming of tool calls and thoughts when --jinja is on ggml-org/llama.cpp#12379

Draft

10 tasks
