[GH-ISSUE #5759] service hang after some requests to /api/embeddings #50097

Closed
opened 2026-04-28 14:05:27 -05:00 by GiteaMirror · 4 comments

Originally created by @JerryKwan on GitHub (Jul 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5759

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

The service seems to hang after some requests to /api/embeddings and needs to be restarted to recover.
Here are some logs:

```
[GIN] 2024/07/18 - 00:52:55 | 200 |  2.824880868s |   10.255.56.113 | POST     "/api/embeddings"
time=2024-07-18T00:52:55.388Z level=INFO source=routes.go:298 msg="embedding generation failed: do embedding request: Post \"http://127.0.0.1:35303/embedding\": context canceled"
[GIN] 2024/07/18 - 00:52:55 | 500 |   257.27018ms |   10.255.56.113 | POST     "/api/embeddings"
cuda driver library failed to get device context 800time=2024-07-18T00:57:55.395Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:55.649Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:55.898Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:56.148Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:56.399Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:56.649Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:56.899Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:57.148Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:57.399Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:57.649Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:57.898Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:58.148Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:58.398Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:58.648Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:58.899Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:59.148Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:59.398Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:59.649Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:57:59.898Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
cuda driver library failed to get device context 800time=2024-07-18T00:58:00.149Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
time=2024-07-18T00:58:00.396Z level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.006997071 model=/root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
cuda driver library failed to get device context 800time=2024-07-18T00:58:00.398Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
time=2024-07-18T00:58:00.646Z level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.257512846 model=/root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
cuda driver library failed to get device context 800time=2024-07-18T00:58:00.648Z level=WARN source=gpu.go:399 msg="error looking up nvidia GPU memory"
time=2024-07-18T00:58:00.895Z level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.506938263 model=/root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
```
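
For anyone trying to reproduce, here is a minimal sketch. Everything in it is an assumption rather than the reporter's actual setup: a local Ollama server on the default port 11434, a hypothetical embedding model name (`nomic-embed-text`), and an aggressive client-side timeout chosen so that some requests get dropped mid-flight, which would produce the same "context canceled" error visible in the log above.

```python
#!/usr/bin/env python3
"""Hypothetical repro sketch: hammer /api/embeddings with short client-side
timeouts so some requests are abandoned while the server is still working,
mirroring the "context canceled" error in the log. Stdlib only."""
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://127.0.0.1:11434/api/embeddings"  # default port (assumed)
MODEL = "nomic-embed-text"  # hypothetical; any pulled embedding model works

def embed(i: int) -> str:
    body = json.dumps({"model": MODEL, "prompt": f"test prompt {i}"}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        # The log shows ~2.8 s per request, so a 1 s timeout should make the
        # client hang up early and cancel the server-side request context.
        with urllib.request.urlopen(req, timeout=1.0) as resp:
            return f"{i}: HTTP {resp.status}"
    except Exception as e:
        return f"{i}: {e}"

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        for line in pool.map(embed, range(100)):
            print(line)
```

If the hang reproduces, the server should stop answering new requests after the cancelled ones, matching the behavior described above.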

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.2.5

GiteaMirror added the needs more info and bug labels 2026-04-28 14:05:28 -05:00

@royjhan commented on GitHub (Jul 19, 2024):

Can you provide steps to reproduce?


@LaansDole commented on GitHub (Jul 29, 2024):

I believe I have encountered the same error message as @JerryKwan, even though my initial setup works as expected. After a while it reports something like "cannot find cuda driver" followed by the errors above, yet Ollama keeps running despite all that. The API I am using is `/api/generate`.

![Screenshot 2024-07-29 090615](https://github.com/user-attachments/assets/ff498f07-e308-4434-ba0e-7a0805151f06)
![Screenshot 2024-07-29 090710](https://github.com/user-attachments/assets/9c925c84-8e12-414c-ac7e-6c1d75e3e93d)
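
To help compare the two reports, here is a similar sketch against `/api/generate`, under the same assumptions as the embeddings sketch above (local server on the default port; the model name is hypothetical). Run it periodically to see when responses stop coming back after the CUDA errors start:

```python
import json
import urllib.request

URL = "http://127.0.0.1:11434/api/generate"  # default port (assumed)
MODEL = "llama3"  # hypothetical; substitute whatever model is actually loaded

body = json.dumps({"model": MODEL, "prompt": "ping", "stream": False}).encode()
req = urllib.request.Request(
    URL, data=body, headers={"Content-Type": "application/json"}
)
# A generous timeout distinguishes "slow" from "hung": a timeout here after
# the CUDA warnings appear suggests the same stall as the embeddings case.
with urllib.request.urlopen(req, timeout=30) as resp:
    print(json.loads(resp.read())["response"])
```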


@dhiltgen commented on GitHub (Oct 24, 2024):

Are you still experiencing ollama failing to talk to the GPU after a while?


@dhiltgen commented on GitHub (Oct 24, 2024):

Actually, looking more closely at the log, this is a dup of #6928
