[GH-ISSUE #12633] Qwen3-embedding:0.6b does not work on v0.12.5, but works on previous versions #8385

Closed
opened 2026-04-12 21:01:57 -05:00 by GiteaMirror · 2 comments

Originally created by @Kemo-is-kiwi on GitHub (Oct 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12633

What is the issue?

Issue

qwen3-embedding:0.6b model crashes immediately when using /api/embed on Ollama v0.12.5 (Windows, CPU backend).

Worked correctly on previous versions (e.g. v0.12.2 / v0.12.3).

Environment

Ollama version: v0.12.5
OS: Windows 11 (x64)
Hardware: CPU only (no GPU detected)
Model: qwen3-embedding:0.6b
Command: POST /api/embed


Repro Steps

ollama pull qwen3-embedding:0.6b
ollama serve

Then in Python:

import requests

payload = {"model": "qwen3-embedding:0.6b", "input": "t"}
print(requests.post("http://127.0.0.1:11434/api/embed", json=payload).json())

Relevant log output

Observed Log Output

print_info: model type = 0.6B
print_info: model params = 595.78 M
print_info: general.name = Qwen3 Embedding 0.6b
print_info: n_ctx_orig_yarn = 32768
print_info: rope type = 2
print_info: rope scaling = linear
print_info: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized

llama_context: CPU compute buffer size = 1104.01 MiB
llama_context: graph nodes = 1127
llama_context: graph splits = 1

time=2025-10-15T04:20:21.574+03:00 level=INFO source=server.go:1309 msg="llama runner started in 1.15 seconds"
time=2025-10-15T04:20:21.575+03:00 level=DEBUG source=sched.go:493 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3-embedding:0.6b runner.vram="0 B" runner.model=C:\Users\user\.ollama\models\blobs\sha256-06507c7b42688469c4e7298b0a1e16deff06caf291cf0a5b278c308249c3e439 runner.num_ctx=42000

C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:4663:
GGML_ASSERT(i01 >= 0 && i01 < ne01) failed

[GIN] 2025/10/15 - 04:20:21 | 500 | 1.8646282s | 127.0.0.1 | POST "/api/embed"

time=2025-10-15T04:20:22.045+03:00 level=ERROR source=server.go:426 msg="llama runner terminated" error="exit status 0xc0000409"


---

Error from Python client

ollama._types.ResponseError: do embedding request: Post "http://127.0.0.1:63883/embedding":
read tcp 127.0.0.1:63899->127.0.0.1:63883: wsarecv:
An existing connection was forcibly closed by the remote host. (status code: 500)

OS

Windows

GPU

Other

CPU

Intel

Ollama version

v0.12.5

GiteaMirror added the bug label 2026-04-12 21:01:57 -05:00

@rick-github commented on GitHub (Oct 15, 2025):

#12014


@jmorganca commented on GitHub (Oct 17, 2025):

This should be fixed in 0.12.6. I'm so sorry for the issue in 0.12.5. Let me know if you're still seeing issues.


Reference: github-starred/ollama#8385