[GH-ISSUE #2443] Bug introduced by PR#2399: ollama always starts a new chat with empty user prompt #1425

Closed
opened 2026-04-12 11:18:12 -05:00 by GiteaMirror · 0 comments

Originally created by @hyjwei on GitHub (Feb 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2443

Originally assigned to: @jmorganca on GitHub.

When I load a model with `ollama run model`, ollama used to load the model and then stop to wait for my prompt. The prompt for user input appeared very quickly.

But after commit a0a199b108, when I run `ollama run model`, ollama loads the model and then immediately starts a chat with the system prompt and an empty user prompt (because I haven't given any input to ollama yet). Ollama generates a response to this chat, but the entire response is discarded by the client. The client shows nothing and only prompts for user input after the generation is complete. If this generation takes a long time, the ollama client also waits a long time.
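
For illustration, the request the client effectively ends up issuing can be approximated from outside the CLI. The following is a minimal sketch, not the actual client code: it assumes a local server on the default port 11434 and the `phi` model, and posts a chat with an empty user message to `/api/chat`, which makes the server generate a response to nothing:

```python
# Minimal sketch (not the ollama client code): approximate the empty-prompt
# chat request that the CLI appears to send after a0a199b108.
# Assumptions: local server on the default port 11434, model "phi".
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/chat",
    data=json.dumps({
        "model": "phi",
        "messages": [{"role": "user", "content": ""}],  # empty user prompt
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The server streams a generated answer even though the user said nothing.
# In the interactive client this output is thrown away, so the user only
# experiences the delay.
with urllib.request.urlopen(req) as resp:
    for line in resp:
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
```

Against a server started with `OLLAMA_DEBUG=1`, this produces the same kind of `sampled token` entries shown in the logs below.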

## How to replicate the issue

Run `ollama serve` with `OLLAMA_DEBUG=1`, then run `ollama run` to load any model, for example `phi`, and check the server debug log.

### Before this commit (a0a199b108), the log looks like:

```
time=2024-02-10T21:23:31.237-05:00 level=INFO source=dyn_ext_server.go:156 msg="Starting llama main loop"
[1707618211] llama server main loop starting
[1707618211] all slots are idle and system prompt is empty, clear the KV cache
[GIN] 2024/02/10 - 21:23:31 | 200 |  494.310637ms |       127.0.0.1 | POST     "/api/chat"
```

### After this commit (a0a199b108), the log looks like:

```
time=2024-02-10T21:34:09.610-05:00 level=INFO source=dyn_ext_server.go:156 msg="Starting llama main loop"
[1707618849] llama server main loop starting
[1707618849] all slots are idle and system prompt is empty, clear the KV cache
time=2024-02-10T21:34:09.610-05:00 level=DEBUG source=routes.go:1165 msg="chat handler" prompt="System: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful answers to the user's questions.\nUser: \nAssistant:"
[1707618849] slot 0 is processing [task id: 0]
[1707618849] slot 0 : in cache: 0 tokens | to process: 32 tokens
[1707618849] slot 0 : kv cache rm - [0, end)
[1707618849] sampled token: 15286: ' Assistant'
[1707618849] sampled token:    25: ':'
[1707618849]
[1707618849] print_timings: prompt eval time =     128.28 ms /    32 tokens (    4.01 ms per token,   249.46 tokens per second)
[1707618849] print_timings:        eval time =      21.62 ms /     2 runs   (   10.81 ms per token,    92.49 tokens per second)
[1707618849] print_timings:       total time =     149.90 ms
[1707618849] slot 0 released (34 tokens in cache)
[1707618849] next result cancel on stop
[1707618849] next result removing waiting task ID: 0
[GIN] 2024/02/10 - 21:34:09 | 200 |   638.90862ms |       127.0.0.1 | POST     "/api/chat"
```

You can find the generated tokens in the `sampled token` messages. Although this particular run generated only two tokens, usually many more tokens are generated here.
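
A quick way to see how much work the server did on the discarded request is to count those lines. This is just a sketch, assuming the server's debug output has been saved to a file (the path `server.log` is hypothetical):

```python
# Sketch: count how many tokens the server sampled for the discarded request.
# Assumes the debug output was redirected to "server.log" (hypothetical path).
with open("server.log") as f:
    sampled = [line for line in f if "sampled token" in line]
print(f"{len(sampled)} tokens were generated for the empty prompt")
```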

To confirm the difference, I compiled two binaries, one with and one without a0a199b108, and compared the results.

GiteaMirror added the bug label 2026-04-12 11:18:12 -05:00
Reference: github-starred/ollama#1425