[GH-ISSUE #13402] Embedding will not work with some models #34610

Closed
opened 2026-04-22 18:19:44 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @alienatorZ on GitHub (Dec 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13402

Originally assigned to: @jessegross on GitHub.

What is the issue?

I am running Ubuntu 22.04 with Ollama 0.13.3-rc0.

With models like gpt-oss and qwen3 30ba3b, embedding requests fail with:

time=2025-12-10T01:05:16.820Z level=INFO source=server.go:1332 msg="llama runner started in 0.55 seconds"
panic: caching disabled but unable to fit entire input in a batch

goroutine 8 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc00021d0e0, {0x0, {0x5d17aac04d30, 0xc000690100}, {0x5d17aac0f1a0, 0xc000011308}, {0xc0001e9908, 0x200, 0x25f}, {{0x5d17aac0f1a0, ...}, ...}, ...})
github.com/ollama/ollama/runner/ollamarunner/runner.go:707 +0x1ac5
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00021d0e0, {0x5d17aabfa2a0, 0xc00036e0a0})
github.com/ollama/ollama/runner/ollamarunner/runner.go:460 +0x30b
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
github.com/ollama/ollama/runner/ollamarunner/runner.go:1412 +0x4c9
[GIN] 2025/12/10 - 01:05:16 | 400 | 662.671261ms | 127.0.0.1 | POST "/v1/embeddings"
ggml_backend_cuda_device_get_memory device GPU-3db830a0-9e60-5a10-12d2-610d17f21eda utilizing NVML memory reporting free: 9562619904 total: 25769803776
ggml_backend_cuda_device_get_memory device GPU-9d2aa8d9-e445-799f-a8e6-8613d292be49 utilizing NVML memory reporting free: 9516482560 total: 25769803776
time=2025-12-10T01:10:16.523Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39367"
time=2025-12-10T01:10:16.902Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 45263"
time=2025-12-10T01:10:17.386Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41443"
time=2025-12-10T01:10:17.637Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40973"
time=2025-12-10T01:10:17.887Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41359"
time=2025-12-10T01:10:18.137Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40929"
time=2025-12-10T01:10:18.387Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44135"
time=2025-12-10T01:10:18.637Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40651"
time=2025-12-10T01:10:18.887Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44351"
time=2025-12-10T01:10:19.137Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 37923"
time=2025-12-10T01:10:19.387Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 46833"
time=2025-12-10T01:10:19.637Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42845"
time=2025-12-10T01:10:19.887Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40909"
time=2025-12-10T01:10:20.137Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39489"
time=2025-12-10T01:10:20.387Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41233"
time=2025-12-10T01:10:20.637Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40639"
time=2025-12-10T01:10:20.887Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41111"
time=2025-12-10T01:10:21.137Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43361"
time=2025-12-10T01:10:21.387Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40575"
time=2025-12-10T01:10:21.636Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 37059"
time=2025-12-10T01:10:21.887Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41903"
time=2025-12-10T01:10:22.137Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 37109"
time=2025-12-10T01:10:22.138Z level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2025-12-10T01:10:22.138Z level=WARN source=runner.go:356 msg="unable to refresh free memory, using old values"
[GIN] 2025/12/10 - 02:11:11 | 200 | 40.105µs | 127.0.0.1 | GET "/api/version"
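For reference, the panic above was triggered by a standard OpenAI-compatible `POST /v1/embeddings` request (visible in the GIN log line returning 400). A minimal sketch of such a request payload follows; the model name and input text are placeholders taken from the report, not a confirmed reproduction case:

```python
import json

# Hypothetical payload for POST http://127.0.0.1:11434/v1/embeddings,
# following the OpenAI-compatible embeddings API that Ollama exposes.
# Model name and input are illustrative placeholders.
payload = {
    "model": "gpt-oss",
    "input": "an input text long enough to exceed the runner's batch size",
}

body = json.dumps(payload)
print(body)
```

Sending this payload to `/v1/embeddings` while one of the affected models is loaded should, per the log, crash the runner with `panic: caching disabled but unable to fit entire input in a batch` rather than returning a clean error.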

Relevant log output


OS

Ubuntu 22.04

GPU

Nvidia Tesla P40 X2

CPU

No response

Ollama version

0.13.3-rc0

GiteaMirror added the bug label 2026-04-22 18:19:44 -05:00
Reference: github-starred/ollama#34610