[GH-ISSUE #10236] Send Multiple Chat Requests to Ollama Server Does Not Work #68775

Closed
opened 2026-05-04 15:09:28 -05:00 by GiteaMirror · 4 comments

Originally created by @JasonHonKL on GitHub (Apr 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10236

What is the issue?

(screenshot attached: https://github.com/user-attachments/assets/952eecac-6942-4ea4-a7a5-6af26c8fb84e)

When I send 5 POST requests to Ollama, the first request works and the others don't. I am currently using a MacBook with an M1 chip. This may be a CPU-specific issue, since using the GPU seems to work. To reproduce the bug, send five "hello world" curl requests to the deepseek-r1:8b model at the same time. Thanks a lot!

Relevant log output

[GIN] 2025/04/12 - 02:29:08 | 200 |         1m36s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/12 - 02:29:08 | 500 |         1m36s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/12 - 02:29:08 | 500 |         1m36s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/12 - 02:29:08 | 500 |         1m36s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/12 - 02:29:08 | 500 |         1m36s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/12 - 02:29:08 | 500 |         1m36s |       127.0.0.1 | POST     "/api/chat"

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

latest (git repo main)

GiteaMirror added the bug label 2026-05-04 15:09:28 -05:00

@rick-github commented on GitHub (Apr 11, 2025):

Full server log will make it easier to diagnose. Can you show the prompt and the mechanism for submitting simultaneous prompts?


@JasonHonKL commented on GitHub (Apr 12, 2025):

@rick-github Sorry, I forgot to post my server log. Here it is:

panic unable to load model: [File Directory of the model]
goroutine 5 [running]:
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0x1400014a2d0, {0x1b, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0x14000047c00, 0x0}, ...)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:851 +0x2c8
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:966 +0xa50
time=2025-03-13T16:36:19.825+08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"
time=2025-03-13T16:36:19.831+08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade"
[GIN] 2025/03/13 - 16:36:19 | 500 |  646.450041ms |       127.0.0.1 | POST     "/api/generate"

However, the first request works, i.e. the model exists and responds for the first request.
Here's the prompt:

	prompt := "Please summarize the following content of the file. Highlight the main ideas, key points, and any important details. Here is the content: \n" + fileContent + "\nMake sure to provide a concise yet comprehensive summary. Ensure the agent response is in JSON format with the summary as \"summary\":summary."

I use goroutines to send 5 requests to the Ollama server via the /api/chat endpoint. Thanks a lot.
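For context, the client pattern described above looks roughly like the sketch below (the function names, the non-streaming request shape, and the 5-way fan-out are illustrative, not the reporter's actual code). Note that nothing here bounds concurrency, so all five requests reach the server at once:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// chatOnce sends one non-streaming request to the /api/chat endpoint
// at baseURL and returns the HTTP status code (or -1 is never returned
// here; transport errors are surfaced via err).
func chatOnce(baseURL, model, prompt string) (int, error) {
	body, err := json.Marshal(map[string]any{
		"model":  model,
		"stream": false,
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
	})
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(baseURL+"/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// fanOut fires n requests concurrently with no limit on how many are
// in flight -- the unbounded goroutine pattern described in this issue.
// A status of -1 marks a request that failed at the transport level.
func fanOut(baseURL, model, prompt string, n int) []int {
	codes := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			code, err := chatOnce(baseURL, model, prompt)
			if err != nil {
				code = -1
			}
			codes[i] = code
		}(i)
	}
	wg.Wait()
	return codes
}

func main() {
	// Mirrors the report: five simultaneous prompts to one local model.
	codes := fanOut("http://127.0.0.1:11434", "deepseek-r1:8b", "hello world", 5)
	fmt.Println(codes)
}
```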


@rick-github commented on GitHub (Apr 12, 2025):

Full server log will make it easier to diagnose.


@JasonHonKL commented on GitHub (Apr 13, 2025):

Thanks, I just solved it. I forgot to set a maximum number of workers for my old computer. Sorry for the bother.

Reference: github-starred/ollama#68775