[GH-ISSUE #7617] llama3.2-vision #30620

Closed
opened 2026-04-22 10:26:50 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @tonilampela on GitHub (Nov 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7617

What is the issue?

Running the llama3.2-vision 11b model currently seems to throw "request timed out" errors.

BoltAI 1.26.1
ollama 0.4.1

In ollama console output I can see:

```
[GIN] 2024/11/11 - 10:17:38 | 500 |          1m0s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2024-11-11T10:17:38.719+02:00 level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
```

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.4.1

GiteaMirror added the bug label 2026-04-22 10:26:50 -05:00

@rick-github commented on GitHub (Nov 11, 2024):

Your client has a 1 minute timeout.
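The fix here is purely client-side: allow a longer HTTP timeout than the 60 s that produced the 500s above. A minimal sketch of what a client request to Ollama's OpenAI-compatible endpoint looks like; the helper name and the 300 s value are illustrative assumptions, not BoltAI's actual setting:

```python
# Hedged sketch: vision models can take well over a minute to load and
# respond, so any client timeout should comfortably exceed 60 s.

def build_chat_request(model: str, prompt: str, timeout_s: float = 300.0):
    """Return (url, payload, timeout) for Ollama's OpenAI-compatible chat endpoint."""
    url = "http://localhost:11434/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload, timeout_s

url, payload, timeout_s = build_chat_request("llama3.2-vision", "Describe this image.")
# With the `requests` library this would be sent as, e.g.:
#   requests.post(url, json=payload, timeout=timeout_s)
# The key point is simply that timeout_s is far larger than the 60 s default.
print(timeout_s)
```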


@tonilampela commented on GitHub (Nov 11, 2024):

> Your client has a 1 minute timeout.

Yeah, I found the timeout adjustment in the advanced settings and that seemed to sort it. Thanks!


@maxi1134 commented on GitHub (Nov 25, 2024):

I too get `msg="mllama doesn't support parallel requests yet"`.

Yet I see no 500 error indicating a timeout.

Could it be something else?

I am on v0.4.4.

```
equests yet"
Nov 25 14:53:48 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:53:48 | 200 |   2.12196848s |    192.168.0.15 | POST     "/api/generate"
Nov 25 14:54:16 machinelearning ollama[586]: time=2024-11-25T14:54:16.635Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:54:27 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:54:27 | 200 | 10.976419175s |    192.168.0.15 | POST     "/api/generate"
Nov 25 14:54:28 machinelearning ollama[586]: time=2024-11-25T14:54:28.514Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:54:29 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:54:29 | 200 |  526.881149ms |    192.168.0.15 | POST     "/api/generate"
Nov 25 14:55:22 machinelearning ollama[586]: time=2024-11-25T14:55:22.022Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:55:33 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:55:33 | 200 | 11.142031778s |    192.168.0.15 | POST     "/api/generate"
Nov 25 14:55:33 machinelearning ollama[586]: time=2024-11-25T14:55:33.167Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:55:34 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:55:34 | 200 |  1.591731104s |    192.168.0.15 | POST     "/api/generate"
Nov 25 14:58:26 machinelearning ollama[586]: time=2024-11-25T14:58:26.031Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:58:26 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:58:26 | 200 |   18.326325ms |             ::1 | POST     "/api/generate"
Nov 25 14:59:01 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:59:01 | 200 |      39.301µs |       127.0.0.1 | HEAD     "/"
Nov 25 14:59:01 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:59:01 | 200 |   18.716591ms |       127.0.0.1 | POST     "/api/show"
Nov 25 14:59:02 machinelearning ollama[586]: time=2024-11-25T14:59:02.002Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:59:02 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:59:02 | 200 |   18.164592ms |       127.0.0.1 | POST     "/api/generate"
Nov 25 14:59:03 machinelearning ollama[586]: time=2024-11-25T14:59:03.818Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:59:04 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:59:04 | 200 |  513.220707ms |       127.0.0.1 | POST     "/api/chat"
Nov 25 14:59:13 machinelearning ollama[586]: time=2024-11-25T14:59:13.420Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:59:13 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:59:13 | 200 |  480.104801ms |       127.0.0.1 | POST     "/api/chat"
Nov 25 14:59:25 machinelearning ollama[586]: time=2024-11-25T14:59:25.949Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 14:59:26 machinelearning ollama[586]: [GIN] 2024/11/25 - 14:59:26 | 200 |  374.918587ms |       127.0.0.1 | POST     "/api/chat"
Nov 25 15:01:16 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:01:16 | 200 |       17.31µs |       127.0.0.1 | HEAD     "/"
Nov 25 15:01:16 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:01:16 | 200 |   18.172272ms |       127.0.0.1 | POST     "/api/show"
Nov 25 15:01:16 machinelearning ollama[586]: time=2024-11-25T15:01:16.199Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 15:01:16 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:01:16 | 200 |   17.658543ms |       127.0.0.1 | POST     "/api/generate"
Nov 25 15:01:17 machinelearning ollama[586]: time=2024-11-25T15:01:17.739Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 15:01:18 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:01:18 | 200 |  395.252708ms |       127.0.0.1 | POST     "/api/chat"
Nov 25 15:01:34 machinelearning ollama[586]: time=2024-11-25T15:01:34.190Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 15:01:40 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:01:40 | 200 |  6.206634214s |       127.0.0.1 | POST     "/api/chat"
Nov 25 15:02:10 machinelearning ollama[586]: time=2024-11-25T15:02:10.264Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 15:02:34 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:02:34 | 200 | 24.365532524s |       127.0.0.1 | POST     "/api/chat"
Nov 25 15:06:00 machinelearning ollama[586]: time=2024-11-25T15:06:00.997Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 15:06:09 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:06:09 | 200 |  8.635675577s |    192.168.0.15 | POST     "/api/generate"
Nov 25 15:06:09 machinelearning ollama[586]: time=2024-11-25T15:06:09.634Z level=WARN source=sched.go:137 msg="mllama doesn't support parallel requests yet"
Nov 25 15:06:10 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:06:10 | 200 |   503.75888ms |    192.168.0.15 | POST     "/api/generate"
Nov 25 15:08:37 machinelearning ollama[586]: [GIN] 2024/11/25 - 15:08:37 | 200 |       26.17µs |       127.0.0.1 | GET      "/api/version"
```


@rick-github commented on GitHub (Nov 25, 2024):

`msg="mllama doesn't support parallel requests yet"` has nothing to do with timeouts; it's just a warning saying that `OLLAMA_NUM_PARALLEL` is going to be treated as if it were set to 1.
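For readers seeing this on a Linux server (as in the journal logs above), the variable is typically set in the systemd unit. A hedged config fragment, assuming the default `ollama` service name; since mllama only runs one request at a time anyway, explicitly setting the value to 1 should also silence the warning:

```shell
# Illustrative systemd override (not from this thread):
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_NUM_PARALLEL=1"
sudo systemctl restart ollama
```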


@pickmao commented on GitHub (Jan 4, 2025):

Same here; just set the env var `OLLAMA_NUM_PARALLEL=3`.

Reference: github-starred/ollama#30620