It looks like the server sometimes tries to generate responses for HTTP requests that were queued but whose client has since disconnected.
I can reproduce the problem as follows:
1. Start the server.
2. Start the curl command from the Command line section. The key aspect is that it must be long-running (i.e. n_predict is high).
3. While it's still running, in another terminal, start and then immediately cancel (with Ctrl+C) the same command a few times, in quick succession.
4. Start the curl command once more.
5. Cancel the original curl command from step 2.
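The steps above could be scripted roughly as follows. This is only a sketch, not the exact command from the report: the server address, the /completion endpoint, the prompt, and the n_predict value are assumptions, and the helper names are made up for illustration. It expects the server from step 1 to be running already and curl to be on the PATH.

```python
import json
import signal
import subprocess
import time

# Assumptions (not taken from the report): server address, endpoint, prompt.
SERVER_URL = "http://localhost:8080/completion"


def build_curl_args(url=SERVER_URL, prompt="Write a long story.", n_predict=4096):
    """Return a curl argv for one long-running completion request."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict})
    return ["curl", "--silent", "--no-buffer", "--request", "POST", url,
            "--header", "Content-Type: application/json", "--data", body]


def start_request(stdout=subprocess.DEVNULL):
    """Start one curl request in the background and return its process."""
    return subprocess.Popen(build_curl_args(), stdout=stdout)


def cancel(proc):
    """Interrupt a running curl with SIGINT, the signal Ctrl+C sends."""
    proc.send_signal(signal.SIGINT)
    proc.wait()


def reproduce(cancelled_requests=3, settle_seconds=0.2):
    """Run steps 2 to 5; the server (step 1) must already be up."""
    original = start_request()             # step 2: long-running request
    time.sleep(settle_seconds)             # let the server start working on it
    for _ in range(cancelled_requests):    # step 3: start and cancel at once
        cancel(start_request())
    final = start_request(stdout=None)     # step 4: output goes to the terminal
    cancel(original)                       # step 5: free the busy slot
    return final
```

Calling `reproduce()` returns the process for the step 4 request. If the problem is present, its output should only appear after the server has worked through the canceled requests from step 3.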
Expected behavior: The server should start to immediately reply to the command from step 4.
Actual behavior: The server seems to hang, because it is pointlessly generating replies to the canceled commands in step 3.
I tried to force the server to handle one request at a time with the --threads-http 1 option, but it doesn't seem to make a difference.
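To make the expected behavior concrete, here is a minimal sketch of a queue that skips requests whose client has already gone away. It is not how llama-server is implemented; the `Task` type, the `cancelled` flag, and `next_live_task` are hypothetical names used only to illustrate the idea.

```python
import threading
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    prompt: str
    # Hypothetical flag: the HTTP layer would set it when the client disconnects.
    cancelled: threading.Event = field(default_factory=threading.Event)


def next_live_task(queue):
    """Pop queued tasks until one is found whose client is still connected."""
    while queue:
        task = queue.popleft()
        if not task.cancelled.is_set():
            return task
        # Client is gone: drop the task instead of generating a reply for it.
    return None


# Usage: three canceled requests sit in front of one live request.
pending = deque(Task(f"canceled {i}") for i in range(3))
for task in pending:
    task.cancelled.set()
pending.append(Task("still waiting"))
picked = next_live_task(pending)
```

With a check like this at dequeue time, the canceled requests from step 3 would be discarded and the request from step 4 would be picked up as soon as a slot is free.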
First Bad Commit
This seems to be a regression, but it was introduced about a year ago, so the exact change which introduced it is probably not relevant.
Relevant log output
Name and Version
(Actually version 5161)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line