Misc. bug: Server does not always cancel requests for disconnected connections #13262

Open
CyberShadow opened this issue May 2, 2025 · 0 comments

Name and Version

$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 0 (unknown)
built with gcc (GCC) 13.3.0 for x86_64-unknown-linux-gnu

(Actually version 5161)

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m gemma-3-27b-pt-q4_0.gguf -ngl 9999 --host 127.0.0.1 --port 8000 --threads-http 1

# ...

curl -v --request POST \
    --url http://127.0.0.1:8000/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Five, Four, Three, Two, One, '$RANDOM'\n\n\n\nThe countdown","n_predict": 256, "n_probs":10, "temperature":0,"stream":true}'

Problem description & steps to reproduce

It looks like the server sometimes keeps generating responses for queued HTTP requests whose client has since disconnected.

I can reproduce the problem as follows:

  1. Start the server
  2. Start the curl command above. The key aspect is that the request must be long-running (i.e. n_predict is high).
  3. While it's still running, in another terminal, start and then immediately cancel (with Ctrl+C) the same command a few times, in quick succession.
  4. Start the curl command once more.
  5. Cancel the original curl command in step 2.

Expected behavior: The server should immediately start replying to the request from step 4.
Actual behavior: The server appears to hang, because it is pointlessly generating replies for the requests that were canceled in step 3.
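
For convenience, the manual steps above can also be scripted. The sketch below is only an approximation of the same sequence: it assumes the server is already running on 127.0.0.1:8000 as shown under "Command line", uses plain kill (SIGTERM) in place of Ctrl+C (either way curl aborts and closes the connection), and the sleep durations and loop count are arbitrary.

#!/usr/bin/env bash
# Rough automation of the manual reproduction steps above (sketch only).
# Assumes llama-server is already running on 127.0.0.1:8000 as shown under "Command line".

url=http://127.0.0.1:8000/completion

body() {
  # Same request body as in "Command line"; $RANDOM keeps each prompt unique.
  echo '{"prompt": "Five, Four, Three, Two, One, '$RANDOM'\n\n\n\nThe countdown","n_predict": 256, "n_probs": 10, "temperature": 0, "stream": true}'
}

# Step 2: start a long-running streaming request in the background.
curl -s --no-buffer -X POST "$url" -H "Content-Type: application/json" --data "$(body)" > /dev/null &
long_pid=$!
sleep 1

# Step 3: start and almost immediately cancel the same request a few times in quick
# succession; kill (SIGTERM) stands in for Ctrl+C here.
for _ in 1 2 3; do
  curl -s --no-buffer -X POST "$url" -H "Content-Type: application/json" --data "$(body)" > /dev/null &
  sleep 0.2
  kill "$!" 2> /dev/null
done

# Step 4: issue the request once more and see how long it takes for output to start.
time curl -s --no-buffer -X POST "$url" -H "Content-Type: application/json" --data "$(body)" | head -c 200
echo

# Step 5: cancel the original request from step 2.
kill "$long_pid" 2> /dev/null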

I tried to force the server to handle one request at a time with the --threads-http 1 option, but it doesn't seem to make a difference.
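
As a side note, one way to confirm that the server is still busy with the canceled requests while it appears to hang is the slots monitoring endpoint; depending on the build, it may need to be enabled by starting the server with --slots.

# Inspect the slot state while the request from step 4 appears to hang.
# (The endpoint may need to be enabled with --slots when starting the server.)
curl -s http://127.0.0.1:8000/slots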

First Bad Commit

This seems to be a regression, but it was introduced about a year ago, so the exact change which introduced it is probably not relevant.

Relevant log output
