Mirror of https://github.com/ollama/ollama.git · synced 2026-05-06 08:02:14 -05:00
Closed · opened 2026-05-04 20:07:30 -05:00 by GiteaMirror · 15 comments
Originally created by @liangstein on GitHub (Aug 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12014
Originally assigned to: @mxyng on GitHub.
What is the issue?
Each time I use qwen3-embedding-4b, there is an error:
time=2025-08-21T14:35:05.168-04:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /zfspool/ollama_models/blobs/sha256-b60ae5ce2dd6a0b77f82cadf21def1f310a3e10cde380ad0081b07a9d416949d --port 43943"
time=2025-08-21T14:35:05.169-04:00 level=INFO source=server.go:488 msg="system memory" total="125.0 GiB" free="114.0 GiB" free_swap="4.0 GiB"
time=2025-08-21T14:35:05.169-04:00 level=INFO source=server.go:531 msg=offload library=cpu layers.requested=0 layers.model=37 layers.offload=0 layers.split=[] memory.available="[114.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="4.7 GiB" memory.required.partial="0 B" memory.required.kv="576.0 MiB" memory.required.allocations="[118.3 MiB]" memory.weights.total="4.0 GiB" memory.weights.repeating="3.6 GiB" memory.weights.nonrepeating="393.4 MiB" memory.graph.full="384.0 MiB" memory.graph.partial="384.0 MiB"
time=2025-08-21T14:35:05.177-04:00 level=INFO source=runner.go:864 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon PRO W7700, gfx1101 (0x1101), VMM: no, Wave Size: 32, ID: GPU-fab9c6b417b1ab5e
load_backend: loaded ROCm backend from /usr/local/lib/ollama/libggml-hip.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2025-08-21T14:35:07.265-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-08-21T14:35:07.266-04:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:43943"
time=2025-08-21T14:35:07.276-04:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:20 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon PRO W7700) - 15160 MiB free
time=2025-08-21T14:35:07.276-04:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
time=2025-08-21T14:35:07.277-04:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /zfspool/ollama_models/blobs/sha256-b60ae5ce2dd6a0b77f82cadf21def1f310a3e10cde380ad0081b07a9d416949d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 4B
llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv 4: general.size_label str = 4B
llama_model_loader: - kv 5: general.license str = apache-2.0
llama_model_loader: - kv 6: general.base_model.count u32 = 1
llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 4B Base
llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-4B-...
llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv 11: qwen3.block_count u32 = 36
llama_model_loader: - kv 12: qwen3.context_length u32 = 40960
llama_model_loader: - kv 13: qwen3.embedding_length u32 = 2560
llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 97
.....
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
llama_context: CPU output buffer size = 0.59 MiB
llama_kv_cache_unified: CPU KV buffer size = 576.00 MiB
llama_kv_cache_unified: size = 576.00 MiB ( 4096 cells, 36 layers, 1/1 seqs), K (f16): 288.00 MiB, V (f16): 288.00 MiB
llama_context: ROCm0 compute buffer size = 694.64 MiB
llama_context: ROCm_Host compute buffer size = 17.01 MiB
llama_context: graph nodes = 1411
llama_context: graph splits = 472 (with bs=512), 73 (with bs=1)
time=2025-08-25T21:32:43.610-04:00 level=INFO source=server.go:1269 msg="llama runner started in 6.87 seconds"
time=2025-08-25T21:32:43.610-04:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
time=2025-08-25T21:32:43.610-04:00 level=INFO source=server.go:1231 msg="waiting for llama runner to start responding"
time=2025-08-25T21:32:43.610-04:00 level=INFO source=server.go:1269 msg="llama runner started in 6.87 seconds"
//ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:5280: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
[New LWP 3505330]
[New LWP 3505331]
[New LWP 3505332]
[New LWP 3505333]
[New LWP 3505334]
[New LWP 3505335]
[New LWP 3505336]
[New LWP 3505337]
[New LWP 3505338]
[New LWP 3505339]
[New LWP 3505342]
[New LWP 3505350]
[New LWP 3506719]
[New LWP 3506720]
[New LWP 3506721]
[New LWP 3506722]
[New LWP 3506723]
[New LWP 3506724]
[New LWP 3506725]
[New LWP 3506726]
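For reference, the assertion that fires (`GGML_ASSERT(i01 >= 0 && i01 < ne01)` in `ggml-cpu/ops.cpp`) is a row-index bounds check: some index `i01` falls outside `[0, ne01)` — for example a token id larger than the number of rows in a lookup tensor — and the runner aborts. A toy Python sketch of the invariant (not the actual ggml code; names mirror ggml's tensor-dimension naming):

```python
def get_row(table, i01):
    """Toy model of the failing invariant: a row lookup must stay in bounds."""
    ne01 = len(table)  # number of rows in the tensor (ne01 in ggml's naming)
    assert 0 <= i01 < ne01, "GGML_ASSERT(i01 >= 0 && i01 < ne01) failed"
    return table[i01]

rows = [[0.1, 0.2], [0.3, 0.4]]
get_row(rows, 1)  # in bounds, returns the second row
# get_row(rows, 5) would abort, which is what the crash above corresponds to
```

In the real crash the offending index comes from model data rather than user code, which is why the same request works on some builds and aborts on others.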
Relevant log output
OS
No response
GPU
No response
CPU
No response
Ollama version
No response
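Since the reporter's client code is not included, a single request against Ollama's embeddings endpoint is enough to exercise this path. A minimal sketch, assuming a default local server and the `/api/embed` route (the model tag and host here are illustrative):

```python
import json
import urllib.request

def build_embed_request(model: str, text: str,
                        host: str = "http://localhost:11434"):
    """Build (but do not send) the POST that asks Ollama for an embedding."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embed_request("qwen3-embedding:4b", "hello world")
# urllib.request.urlopen(req) would send it; on affected builds the runner
# crashes with the assertion above instead of returning embeddings
```

On a working build the response body carries the embedding vectors; on the affected versions reported below, the runner process dies mid-request.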
@rick-github commented on GitHub (Aug 21, 2025):
There are no errors in this log. Please post the full log.
@cornfusing commented on GitHub (Aug 22, 2025):
I encountered the same issue using 0.11.5 and 0.11.6. With 0.11.4 it works without problems.
OS: Ubuntu 24.04 for Ollama; Windows 11 for Python
CPU: Intel i9-13900K
Ollama version: 0.11.6
Docker logs at the end.
Using python:
Output:
Comparison nomic-embed-text:
Output:
Full docker log
```
time=2025-08-22T08:23:49.512Z level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-08-22T08:23:49.513Z level=INFO source=images.go:477 msg="total blobs: 20"
time=2025-08-22T08:23:49.513Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-08-22T08:23:49.513Z level=INFO source=routes.go:1371 msg="Listening on [::]:11434 (version 0.11.6)"
time=2025-08-22T08:23:49.514Z level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2025-08-22T08:23:49.514Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-08-22T08:23:49.515Z level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-08-22T08:23:49.515Z level=DEBUG source=gpu.go:503 msg="Searching for GPU library" name=libcuda.so*
time=2025-08-22T08:23:49.515Z level=DEBUG source=gpu.go:527 msg="gpu library search" globs="[/usr/lib/ollama/libcuda.so* /usr/local/nvidia/lib/libcuda.so* /usr/local/nvidia/lib64/libcuda.so*
/usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:560 msg="discovered GPU libraries" paths=[] time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:503 msg="Searching for GPU library" name=libcudart.so* time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:527 msg="gpu library search" globs="[/usr/lib/ollama/libcudart.so* /usr/local/nvidia/lib/libcudart.so* /usr/local/nvidia/lib64/libcudart.so* /usr/lib/ollama/cuda_v*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:560 msg="discovered GPU libraries" paths=[/usr/lib/ollama/libcudart.so.12.8.90] cudaSetDevice err: 35 time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:576 msg="Unable to load cudart library /usr/lib/ollama/libcudart.so.12.8.90: your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2025-08-22T08:23:49.516Z level=DEBUG source=amd_linux.go:422 msg="amdgpu driver not detected /sys/module/amdgpu" time=2025-08-22T08:23:49.516Z level=INFO source=gpu.go:379 msg="no compatible GPUs were discovered" time=2025-08-22T08:23:49.516Z level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.2 GiB" available="29.3 GiB" time=2025-08-22T08:25:20.929Z level=DEBUG source=gpu.go:393 msg="updating system memory data" before.total="31.2 GiB" before.free="29.3 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="28.6 GiB" now.free_swap="8.0 GiB" time=2025-08-22T08:25:20.929Z level=DEBUG source=sched.go:188 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2025-08-22T08:25:20.936Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32 time=2025-08-22T08:25:20.936Z level=DEBUG source=sched.go:208 msg="loading first model" model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = qwen3 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 8B llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding llama_model_loader: - kv 4: general.size_label str = 8B llama_model_loader: - kv 5: general.license str = apache-2.0 llama_model_loader: - kv 6: general.base_model.count u32 = 1 llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 8B Base llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-8B-... llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme... llama_model_loader: - kv 11: qwen3.block_count u32 = 36 llama_model_loader: - kv 12: qwen3.context_length u32 = 40960 llama_model_loader: - kv 13: qwen3.embedding_length u32 = 4096 llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 12288 llama_model_loader: - kv 15: qwen3.attention.head_count u32 = 32 llama_model_loader: - kv 16: qwen3.attention.head_count_kv u32 = 8 llama_model_loader: - kv 17: qwen3.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 19: qwen3.attention.key_length u32 = 128 llama_model_loader: - kv 20: qwen3.attention.value_length u32 = 128 llama_model_loader: - kv 21: qwen3.pooling_type u32 = 3 llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,151665] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
llama_model_loader: - kv  27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv  28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv  29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv  30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv  31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv  32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv  33: tokenizer.chat_template str = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  34: general.quantization_version u32 = 2
llama_model_loader: - kv  35: general.file_type u32 = 7
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q8_0: 253 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 7.49 GiB (8.50 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 7.57 B
print_info: general.name = Qwen3 Embedding 8B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-08-22T08:25:21.072Z level=DEBUG source=gpu.go:393 msg="updating system memory data" before.total="31.2 GiB" before.free="28.6 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="28.5 GiB" now.free_swap="8.0 GiB"
time=2025-08-22T08:25:21.081Z level=INFO source=server.go:383 msg="starting runner" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb --port 41285"
time=2025-08-22T08:25:21.081Z level=DEBUG source=server.go:384 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/ollama OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/lib/ollama
time=2025-08-22T08:25:21.081Z level=DEBUG source=gpu.go:393 msg="updating system memory data" before.total="31.2 GiB" before.free="28.5 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="28.5 GiB" now.free_swap="8.0 GiB"
time=2025-08-22T08:25:21.081Z level=INFO source=server.go:488 msg="system memory" total="31.2 GiB" free="28.5 GiB" free_swap="8.0 GiB"
time=2025-08-22T08:25:21.081Z level=DEBUG source=memory.go:177 msg=evaluating library=cpu gpu_count=1 available="[28.6 GiB]"
time=2025-08-22T08:25:21.081Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-08-22T08:25:21.081Z level=DEBUG source=memory.go:177 msg=evaluating library=cpu gpu_count=1 available="[28.6 GiB]"
time=2025-08-22T08:25:21.081Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-08-22T08:25:21.082Z level=INFO source=server.go:531 msg=offload library=cpu layers.requested=-1 layers.model=37 layers.offload=0 layers.split=[] memory.available="[28.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.6 GiB" memory.required.partial="0 B" memory.required.kv="576.0 MiB" memory.required.allocations="[8.6 GiB]" memory.weights.total="7.5 GiB" memory.weights.repeating="6.9 GiB" memory.weights.nonrepeating="629.5 MiB" memory.graph.full="384.0 MiB" memory.graph.partial="384.0 MiB"
time=2025-08-22T08:25:21.129Z level=INFO source=runner.go:864 msg="starting go runner"
time=2025-08-22T08:25:21.129Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
time=2025-08-22T08:25:21.132Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-08-22T08:25:21.133Z level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:41285"
time=2025-08-22T08:25:21.134Z level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-08-22T08:25:21.135Z level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
time=2025-08-22T08:25:21.135Z level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = qwen3
llama_model_loader: - kv   1: general.type str = model
llama_model_loader: - kv   2: general.name str = Qwen3 Embedding 8B
llama_model_loader: - kv   3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv   4: general.size_label str = 8B
llama_model_loader: - kv   5: general.license str = apache-2.0
llama_model_loader: - kv   6: general.base_model.count u32 = 1
llama_model_loader: - kv   7: general.base_model.0.name str = Qwen3 8B Base
llama_model_loader: - kv   8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv   9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-8B-...
llama_model_loader: - kv  10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv  11: qwen3.block_count u32 = 36
llama_model_loader: - kv  12: qwen3.context_length u32 = 40960
llama_model_loader: - kv  13: qwen3.embedding_length u32 = 4096
llama_model_loader: - kv  14: qwen3.feed_forward_length u32 = 12288
llama_model_loader: - kv  15: qwen3.attention.head_count u32 = 32
llama_model_loader: - kv  16: qwen3.attention.head_count_kv u32 = 8
llama_model_loader: - kv  17: qwen3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv  18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv  19: qwen3.attention.key_length u32 = 128
llama_model_loader: - kv  20: qwen3.attention.value_length u32 = 128
llama_model_loader: - kv  21: qwen3.pooling_type u32 = 3
llama_model_loader: - kv  22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv  23: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv  24: tokenizer.ggml.tokens arr[str,151665] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv  28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv  29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv  30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv  31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv  32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv  33: tokenizer.chat_template str = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  34: general.quantization_version u32 = 2
llama_model_loader: - kv  35: general.file_type u32 = 7
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q8_0: 253 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 7.49 GiB (8.50 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 0
print_info: n_ctx_train = 40960
print_info: n_embd = 4096
print_info: n_layer = 36
print_info: n_head = 32
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 4
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 12288
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 3
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 40960
print_info: rope_finetuned = unknown
print_info: model type = 8B
print_info: model params = 7.57 B
print_info: general.name = Qwen3 Embedding 8B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer   0 assigned to device CPU, is_swa = 0
load_tensors: layer   1 assigned to device CPU, is_swa = 0
load_tensors: layer   2 assigned to device CPU, is_swa = 0
load_tensors: layer   3 assigned to device CPU, is_swa = 0
load_tensors: layer   4 assigned to device CPU, is_swa = 0
load_tensors: layer   5 assigned to device CPU, is_swa = 0
load_tensors: layer   6 assigned to device CPU, is_swa = 0
load_tensors: layer   7 assigned to device CPU, is_swa = 0
load_tensors: layer   8 assigned to device CPU, is_swa = 0
load_tensors: layer   9 assigned to device CPU, is_swa = 0
load_tensors: layer  10 assigned to device CPU, is_swa = 0
load_tensors: layer  11 assigned to device CPU, is_swa = 0
load_tensors: layer  12 assigned to device CPU, is_swa = 0
load_tensors: layer  13 assigned to device CPU, is_swa = 0
load_tensors: layer  14 assigned to device CPU, is_swa = 0
load_tensors: layer  15 assigned to device CPU, is_swa = 0
load_tensors: layer  16 assigned to device CPU, is_swa = 0
load_tensors: layer  17 assigned to device CPU, is_swa = 0
load_tensors: layer  18 assigned to device CPU, is_swa = 0
load_tensors: layer  19 assigned to device CPU, is_swa = 0
load_tensors: layer  20 assigned to device CPU, is_swa = 0
load_tensors: layer  21 assigned to device CPU, is_swa = 0
load_tensors: layer  22 assigned to device CPU, is_swa = 0
load_tensors: layer  23 assigned to device CPU, is_swa = 0
load_tensors: layer  24 assigned to device CPU, is_swa = 0
load_tensors: layer  25 assigned to device CPU, is_swa = 0
load_tensors: layer  26 assigned to device CPU, is_swa = 0
load_tensors: layer  27 assigned to device CPU, is_swa = 0
load_tensors: layer  28 assigned to device CPU, is_swa = 0
load_tensors: layer  29 assigned to device CPU, is_swa = 0
load_tensors: layer  30 assigned to device CPU, is_swa = 0
load_tensors: layer  31 assigned to device CPU, is_swa = 0
load_tensors: layer  32 assigned to device CPU, is_swa = 0
load_tensors: layer  33 assigned to device CPU, is_swa = 0
load_tensors: layer  34 assigned to device CPU, is_swa = 0
load_tensors: layer  35 assigned to device CPU, is_swa = 0
load_tensors: layer  36 assigned to device CPU, is_swa = 0
load_tensors: CPU model buffer size = 7668.64 MiB
load_all_data: no device found for buffer type CPU for async uploads
time=2025-08-22T08:25:21.637Z level=DEBUG source=server.go:1278 msg="model load progress 0.18"
time=2025-08-22T08:25:21.888Z level=DEBUG source=server.go:1278 msg="model load progress 0.30"
time=2025-08-22T08:25:22.139Z level=DEBUG source=server.go:1278 msg="model load progress 0.43"
time=2025-08-22T08:25:22.390Z level=DEBUG source=server.go:1278 msg="model load progress 0.52"
time=2025-08-22T08:25:22.640Z level=DEBUG source=server.go:1278 msg="model load progress 0.60"
time=2025-08-22T08:25:22.891Z level=DEBUG source=server.go:1278 msg="model load progress 0.67"
time=2025-08-22T08:25:23.142Z level=DEBUG source=server.go:1278 msg="model load progress 0.78"
time=2025-08-22T08:25:23.393Z level=DEBUG source=server.go:1278 msg="model load progress 0.87"
time=2025-08-22T08:25:23.644Z level=DEBUG source=server.go:1278 msg="model load progress 0.97"
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context: CPU output buffer size = 0.59 MiB
create_memory: n_ctx = 4096 (padded)
llama_kv_cache_unified: layer   0: dev = CPU
llama_kv_cache_unified: layer   1: dev = CPU
llama_kv_cache_unified: layer   2: dev = CPU
llama_kv_cache_unified: layer   3: dev = CPU
llama_kv_cache_unified: layer   4: dev = CPU
llama_kv_cache_unified: layer   5: dev = CPU
llama_kv_cache_unified: layer   6: dev = CPU
llama_kv_cache_unified: layer   7: dev = CPU
llama_kv_cache_unified: layer   8: dev = CPU
llama_kv_cache_unified: layer   9: dev = CPU
llama_kv_cache_unified: layer  10: dev = CPU
llama_kv_cache_unified: layer  11: dev = CPU
llama_kv_cache_unified: layer  12: dev = CPU
llama_kv_cache_unified: layer  13: dev = CPU
llama_kv_cache_unified: layer  14: dev = CPU
llama_kv_cache_unified: layer  15: dev = CPU
llama_kv_cache_unified: layer  16: dev = CPU
llama_kv_cache_unified: layer  17: dev = CPU
llama_kv_cache_unified: layer  18: dev = CPU
llama_kv_cache_unified: layer  19: dev = CPU
llama_kv_cache_unified: layer  20: dev = CPU
llama_kv_cache_unified: layer  21: dev = CPU
llama_kv_cache_unified: layer  22: dev = CPU
llama_kv_cache_unified: layer  23: dev = CPU
llama_kv_cache_unified: layer  24: dev = CPU
llama_kv_cache_unified: layer  25: dev = CPU
llama_kv_cache_unified: layer  26: dev = CPU
llama_kv_cache_unified: layer  27: dev = CPU
llama_kv_cache_unified: layer  28: dev = CPU
llama_kv_cache_unified: layer  29: dev = CPU
llama_kv_cache_unified: layer  30: dev = CPU
llama_kv_cache_unified: layer  31: dev = CPU
llama_kv_cache_unified: layer  32: dev = CPU
llama_kv_cache_unified: layer  33: dev = CPU
llama_kv_cache_unified: layer  34: dev = CPU
llama_kv_cache_unified: layer  35: dev = CPU
llama_kv_cache_unified: CPU KV buffer size = 576.00 MiB
llama_kv_cache_unified: size = 576.00 MiB ( 4096 cells, 36 layers, 1/1 seqs), K (f16): 288.00 MiB, V (f16): 288.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 1
llama_context: max_nodes = 3184
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: CPU compute buffer size = 316.23 MiB
llama_context: graph nodes = 1411
llama_context: graph splits = 1
time=2025-08-22T08:25:23.894Z level=INFO source=server.go:1272 msg="llama runner started in 2.81 seconds"
time=2025-08-22T08:25:23.894Z level=INFO source=sched.go:473 msg="loaded runners" count=1
time=2025-08-22T08:25:23.894Z level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
time=2025-08-22T08:25:23.895Z level=INFO source=server.go:1272 msg="llama runner started in 2.81 seconds"
time=2025-08-22T08:25:23.895Z level=DEBUG source=sched.go:485 msg="finished setting up" runner.name=hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0 runner.inference=cpu runner.devices=1 runner.size="8.6 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=21 runner.model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb runner.num_ctx=4096
time=2025-08-22T08:25:23.903Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-22T08:25:23.904Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=2 used=0 remaining=2
//ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:5280: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
/usr/lib/ollama/libggml-base.so(+0x151a8)[0x7f59980c31a8]
/usr/lib/ollama/libggml-base.so(ggml_print_backtrace+0x1e6)[0x7f59980c3576]
/usr/lib/ollama/libggml-base.so(ggml_abort+0x11d)[0x7f59980c36fd]
/usr/lib/ollama/libggml-cpu-alderlake.so(+0x5d79e)[0x7f599371b79e]
/usr/lib/ollama/libggml-cpu-alderlake.so(+0x11c51)[0x7f59936cfc51]
/usr/lib/ollama/libggml-cpu-alderlake.so(ggml_graph_compute+0xdc)[0x7f59936d205c]
/usr/lib/ollama/libggml-cpu-alderlake.so(+0x144a0)[0x7f59936d24a0]
/usr/bin/ollama(+0x109bf75)[0x55dc081f1f75]
/usr/bin/ollama(+0x110f7a1)[0x55dc082657a1]
/usr/bin/ollama(+0x110fac2)[0x55dc08265ac2]
/usr/bin/ollama(+0x1115da7)[0x55dc0826bda7]
/usr/bin/ollama(+0x1116c5c)[0x55dc0826cc5c]
/usr/bin/ollama(+0x1034d21)[0x55dc0818ad21]
/usr/bin/ollama(+0x360c21)[0x55dc074b6c21]
SIGABRT: abort
PC=0x7f59e201ab2c m=0 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 16 gp=0xc000102a80 m=0 mp=0x55dc091f7d20 [syscall]:
runtime.cgocall(0x55dc0818ace0, 0xc0000bdbd8)
runtime/cgocall.go:167 +0x4b fp=0xc0000bdbb0 sp=0xc0000bdb78 pc=0x55dc074ac3eb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7f598800a0d0, {0x2, 0x7f59880c7110, 0x0, 0x7f59880cbb40, 0x7f5988007150, 0x7f598a6e66d0, 0x7f5988010af0})
_cgo_gotypes.go:668 +0x4a fp=0xc0000bdbd8 sp=0xc0000bdbb0 pc=0x55dc0785b90a
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
github.com/ollama/ollama/llama/llama.go:150
github.com/ollama/ollama/llama.(*Context).Decode(0xc00068d1a0?, 0x1?)
github.com/ollama/ollama/llama/llama.go:150 +0xed fp=0xc0000bdcc0 sp=0xc0000bdbd8 pc=0x55dc0785e6ed
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004e12c0, 0xc000692640, 0xc0004adf28)
github.com/ollama/ollama/runner/llamarunner/runner.go:441 +0x209 fp=0xc0000bdee8 sp=0xc0000bdcc0 pc=0x55dc07924f29
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004e12c0, {0x55dc089553e0, 0xc0003a4730})
github.com/ollama/ollama/runner/llamarunner/runner.go:346 +0x1d5 fp=0xc0000bdfb8 sp=0xc0000bdee8 pc=0x55dc07924bb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x28 fp=0xc0000bdfe0 sp=0xc0000bdfb8 pc=0x55dc07929908
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000bdfe8 sp=0xc0000bdfe0 pc=0x55dc074b6fa1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x4c5
goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00058f790 sp=0xc00058f770 pc=0x55dc074af86e
runtime.netpollblock(0xc00058f7e0?, 0x7448666?, 0xdc?)
runtime/netpoll.go:575 +0xf7 fp=0xc00058f7c8 sp=0xc00058f790 pc=0x55dc07474357
internal/poll.runtime_pollWait(0x7f599acaceb0, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc00058f7e8 sp=0xc00058f7c8 pc=0x55dc074aea85
internal/poll.(*pollDesc).wait(0xc000685200?, 0x900000036?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00058f810 sp=0xc00058f7e8 pc=0x55dc07535ec7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000685200)
internal/poll/fd_unix.go:620 +0x295 fp=0xc00058f8b8 sp=0xc00058f810 pc=0x55dc0753b295
net.(*netFD).accept(0xc000685200)
net/fd_unix.go:172 +0x29 fp=0xc00058f970 sp=0xc00058f8b8 pc=0x55dc075ae249
net.(*TCPListener).accept(0xc00044d100)
net/tcpsock_posix.go:159 +0x1b fp=0xc00058f9c0 sp=0xc00058f970 pc=0x55dc075c3bfb
net.(*TCPListener).Accept(0xc00044d100)
net/tcpsock.go:380 +0x30 fp=0xc00058f9f0 sp=0xc00058f9c0 pc=0x55dc075c2ab0
net/http.(*onceCloseListener).Accept(0xc0004e83f0?)
<autogenerated>:1 +0x24 fp=0xc00058fa08 sp=0xc00058f9f0 pc=0x55dc077da204
net/http.(*Server).Serve(0xc00004f800, {0x55dc08952f28, 0xc00044d100})
net/http/server.go:3424 +0x30c fp=0xc00058fb38 sp=0xc00058fa08 pc=0x55dc077b1acc
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034260, 0x4, 0x4})
github.com/ollama/ollama/runner/llamarunner/runner.go:901 +0x8f5 fp=0xc00058fd08 sp=0xc00058fb38 pc=0x55dc07929695
github.com/ollama/ollama/runner.Execute({0xc000034250?, 0x0?, 0x0?})
github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc00058fd30 sp=0xc00058fd08 pc=0x55dc079b3c34
github.com/ollama/ollama/cmd.NewCLI.func2(0xc00004f500?, {0x55dc0846e081?, 0x4?, 0x55dc0846e085?})
github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc00058fd58 sp=0xc00058fd30 pc=0x55dc08118ce5
github.com/spf13/cobra.(*Command).execute(0xc0004eaf08, {0xc00044cf40, 0x4, 0x4})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00058fe78 sp=0xc00058fd58 pc=0x55dc0762789c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0006a1508)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00058ff30 sp=0xc00058fe78 pc=0x55dc076280e5
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00058ff50 sp=0xc00058ff30 pc=0x55dc081197cd
runtime.main()
runtime/proc.go:283 +0x29d fp=0xc00058ffe0 sp=0xc00058ff50 pc=0x55dc0747b9dd
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00058ffe8 sp=0xc00058ffe0 pc=0x55dc074b6fa1
goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000aafa8 sp=0xc0000aaf88 pc=0x55dc074af86e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.forcegchelper()
runtime/proc.go:348 +0xb8 fp=0xc0000aafe0 sp=0xc0000aafa8 pc=0x55dc0747bd18
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000aafe8 sp=0xc0000aafe0 pc=0x55dc074b6fa1
created by runtime.init.7 in goroutine 1
runtime/proc.go:336 +0x1a
goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000ab780 sp=0xc0000ab760 pc=0x55dc074af86e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.bgsweep(0xc0000d6000)
runtime/mgcsweep.go:316 +0xdf fp=0xc0000ab7c8 sp=0xc0000ab780 pc=0x55dc074664bf
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc0000ab7e0 sp=0xc0000ab7c8 pc=0x55dc0745a8a5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ab7e8 sp=0xc0000ab7e0 pc=0x55dc074b6fa1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x55dc08631af8?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000abf78 sp=0xc0000abf58 pc=0x55dc074af86e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.(*scavengerState).park(0x55dc091f4f00)
runtime/mgcscavenge.go:425 +0x49 fp=0xc0000abfa8 sp=0xc0000abf78 pc=0x55dc07463f09
runtime.bgscavenge(0xc0000d6000)
runtime/mgcscavenge.go:658 +0x59 fp=0xc0000abfc8 sp=0xc0000abfa8 pc=0x55dc07464499
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc0000abfe0 sp=0xc0000abfc8 pc=0x55dc0745a845
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000abfe8 sp=0xc0000abfe0 pc=0x55dc074b6fa1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc0000aa688?)
runtime/proc.go:435 +0xce fp=0xc0000aa630 sp=0xc0000aa610 pc=0x55dc074af86e
runtime.runfinq()
runtime/mfinal.go:196 +0x107 fp=0xc0000aa7e0 sp=0xc0000aa630 pc=0x55dc07459867
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000aa7e8 sp=0xc0000aa7e0 pc=0x55dc074b6fa1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:166 +0x3d
goroutine 6 gp=0xc0001fa8c0 m=nil [chan receive]:
runtime.gopark(0xc00025f540?, 0xc000118018?, 0x60?, 0xc7?, 0x55dc07594e88?)
runtime/proc.go:435 +0xce fp=0xc0000ac718 sp=0xc0000ac6f8 pc=0x55dc074af86e
runtime.chanrecv(0xc0000e2310, 0x0, 0x1)
runtime/chan.go:664 +0x445 fp=0xc0000ac790 sp=0xc0000ac718 pc=0x55dc0744b245
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:506 +0x12 fp=0xc0000ac7b8 sp=0xc0000ac790 pc=0x55dc0744add2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1799 +0x2f fp=0xc0000ac7e0 sp=0xc0000ac7b8 pc=0x55dc0745da4f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ac7e8 sp=0xc0000ac7e0 pc=0x55dc074b6fa1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1794 +0x85
goroutine 7 gp=0xc0001fac40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000acf38 sp=0xc0000acf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000acfc8 sp=0xc0000acf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000acfe0 sp=0xc0000acfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000acfe8 sp=0xc0000acfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 18 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000a6738 sp=0xc0000a6718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000a67c8 sp=0xc0000a6738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000a67e0 sp=0xc0000a67c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a67e8 sp=0xc0000a67e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 34 gp=0xc000502380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000518738 sp=0xc000518718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005187c8 sp=0xc000518738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005187e0 sp=0xc0005187c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005187e8 sp=0xc0005187e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 8 gp=0xc0001fae00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000ad738 sp=0xc0000ad718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000ad7c8 sp=0xc0000ad738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000ad7e0 sp=0xc0000ad7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ad7e8 sp=0xc0000ad7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 50 gp=0xc000584000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000514738 sp=0xc000514718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005147c8 sp=0xc000514738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005147e0 sp=0xc0005147c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005147e8 sp=0xc0005147e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 51 gp=0xc0005841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000514f38 sp=0xc000514f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000514fc8 sp=0xc000514f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000514fe0 sp=0xc000514fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000514fe8 sp=0xc000514fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 35 gp=0xc000502540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000518f38 sp=0xc000518f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000518fc8 sp=0xc000518f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000518fe0 sp=0xc000518fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000518fe8 sp=0xc000518fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 9 gp=0xc0001fafc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000adf38 sp=0xc0000adf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000adfc8 sp=0xc0000adf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000adfe0 sp=0xc0000adfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000adfe8 sp=0xc0000adfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 52 gp=0xc000584380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000515738 sp=0xc000515718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005157c8 sp=0xc000515738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005157e0 sp=0xc0005157c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005157e8 sp=0xc0005157e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 53 gp=0xc000584540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000515f38 sp=0xc000515f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000515fc8 sp=0xc000515f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000515fe0 sp=0xc000515fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000515fe8 sp=0xc000515fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 54 gp=0xc000584700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000516738 sp=0xc000516718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005167c8 sp=0xc000516738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005167e0 sp=0xc0005167c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005167e8 sp=0xc0005167e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 36 gp=0xc000502700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000519738 sp=0xc000519718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005197c8 sp=0xc000519738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005197e0 sp=0xc0005197c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005197e8 sp=0xc0005197e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 10 gp=0xc0001fb180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004aa738 sp=0xc0004aa718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004aa7c8 sp=0xc0004aa738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004aa7e0 sp=0xc0004aa7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004aa7e8 sp=0xc0004aa7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 55 gp=0xc0005848c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000516f38 sp=0xc000516f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000516fc8 sp=0xc000516f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000516fe0 sp=0xc000516fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000516fe8 sp=0xc000516fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 37 gp=0xc0005028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000519f38 sp=0xc000519f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000519fc8 sp=0xc000519f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000519fe0 sp=0xc000519fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000519fe8 sp=0xc000519fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 11 gp=0xc0001fb340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004aaf38 sp=0xc0004aaf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004aafc8 sp=0xc0004aaf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004aafe0 sp=0xc0004aafc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004aafe8 sp=0xc0004aafe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 56 gp=0xc000584a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000517738 sp=0xc000517718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005177c8 sp=0xc000517738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005177e0 sp=0xc0005177c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005177e8 sp=0xc0005177e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 38 gp=0xc000502a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051a738 sp=0xc00051a718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051a7c8 sp=0xc00051a738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051a7e0 sp=0xc00051a7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051a7e8 sp=0xc00051a7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 12 gp=0xc0001fb500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004ab738 sp=0xc0004ab718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004ab7c8 sp=0xc0004ab738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004ab7e0 sp=0xc0004ab7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004ab7e8 sp=0xc0004ab7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 57 gp=0xc000584c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000517f38 sp=0xc000517f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000517fc8 sp=0xc000517f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000517fe0 sp=0xc000517fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000517fe8 sp=0xc000517fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 39 gp=0xc000502c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051af38 sp=0xc00051af18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051afc8 sp=0xc00051af38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051afe0 sp=0xc00051afc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051afe8 sp=0xc00051afe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 40 gp=0xc000502e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051b738 sp=0xc00051b718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051b7c8 sp=0xc00051b738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051b7e0 sp=0xc00051b7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051b7e8 sp=0xc00051b7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 13 gp=0xc0001fb6c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004abf38 sp=0xc0004abf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004abfc8 sp=0xc0004abf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004abfe0 sp=0xc0004abfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004abfe8 sp=0xc0004abfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 58 gp=0xc000584e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a6738 sp=0xc0004a6718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a67c8 sp=0xc0004a6738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a67e0 sp=0xc0004a67c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a67e8 sp=0xc0004a67e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 41 gp=0xc000502fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051bf38 sp=0xc00051bf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051bfc8 sp=0xc00051bf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051bfe0 sp=0xc00051bfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051bfe8 sp=0xc00051bfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 14 gp=0xc0001fb880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004ac738 sp=0xc0004ac718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004ac7c8 sp=0xc0004ac738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004ac7e0 sp=0xc0004ac7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004ac7e8 sp=0xc0004ac7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 15 gp=0xc0001fba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004acf38 sp=0xc0004acf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004acfc8 sp=0xc0004acf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004acfe0 sp=0xc0004acfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004acfe8 sp=0xc0004acfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 59 gp=0xc000584fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a6f38 sp=0xc0004a6f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a6fc8 sp=0xc0004a6f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a6fe0 sp=0xc0004a6fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a6fe8 sp=0xc0004a6fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 60 gp=0xc000585180 m=nil [GC worker (idle)]:
runtime.gopark(0x52663eb4c0ff?, 0x1?, 0x56?, 0xd7?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a7738 sp=0xc0004a7718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a77c8 sp=0xc0004a7738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a77e0 sp=0xc0004a77c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a77e8 sp=0xc0004a77e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 61 gp=0xc000585340 m=nil [GC worker (idle)]:
runtime.gopark(0x55dc092a4bc0?, 0x1?, 0x5c?, 0xa1?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a7f38 sp=0xc0004a7f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a7fc8 sp=0xc0004a7f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a7fe0 sp=0xc0004a7fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a7fe8 sp=0xc0004a7fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 62 gp=0xc000585500 m=nil [GC worker (idle)]:
runtime.gopark(0x55dc092a4bc0?, 0x1?, 0x3b?, 0xc0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a8738 sp=0xc0004a8718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a87c8 sp=0xc0004a8738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a87e0 sp=0xc0004a87c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a87e8 sp=0xc0004a87e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 63 gp=0xc0005856c0 m=nil [GC worker (idle)]:
runtime.gopark(0x55dc092a4bc0?, 0x1?, 0x7e?, 0xe3?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a8f38 sp=0xc0004a8f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a8fc8 sp=0xc0004a8f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a8fe0 sp=0xc0004a8fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a8fe8 sp=0xc0004a8fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 66 gp=0xc000102c40 m=nil [chan receive]:
runtime.gopark(0x55dc074b4fb4?, 0xc0000478d8?, 0xf0?, 0xec?, 0xc0000478c0?)
runtime/proc.go:435 +0xce fp=0xc0000478a0 sp=0xc000047880 pc=0x55dc074af86e
runtime.chanrecv(0xc0003b3960, 0xc000047a70, 0x1)
runtime/chan.go:664 +0x445 fp=0xc000047918 sp=0xc0000478a0 pc=0x55dc0744b245
runtime.chanrecv1(0xc00031e5a0?, 0xc00044d500?)
runtime/chan.go:506 +0x12 fp=0xc000047940 sp=0xc000047918 pc=0x55dc0744add2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0004e12c0, {0x55dc08953108, 0xc00069ed20}, 0xc0004d3e00)
github.com/ollama/ollama/runner/llamarunner/runner.go:712 +0x697 fp=0xc000047ac0 sp=0xc000047940 pc=0x55dc07927657
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x55dc08953108?, 0xc00069ed20?}, 0xc000047b40?)
:1 +0x36 fp=0xc000047af0 sp=0xc000047ac0 pc=0x55dc07929c96
net/http.HandlerFunc.ServeHTTP(0xc00053f2c0?, {0x55dc08953108?, 0xc00069ed20?}, 0xc000047b60?)
net/http/server.go:2294 +0x29 fp=0xc000047b18 sp=0xc000047af0 pc=0x55dc077ae109
net/http.(*ServeMux).ServeHTTP(0x55dc07453d85?, {0x55dc08953108, 0xc00069ed20}, 0xc0004d3e00)
net/http/server.go:2822 +0x1c4 fp=0xc000047b68 sp=0xc000047b18 pc=0x55dc077b0004
net/http.serverHandler.ServeHTTP({0x55dc0894f790?}, {0x55dc08953108?, 0xc00069ed20?}, 0x1?)
net/http/server.go:3301 +0x8e fp=0xc000047b98 sp=0xc000047b68 pc=0x55dc077cda8e
net/http.(*conn).serve(0xc0004e83f0, {0x55dc089553a8, 0xc000255e60})
net/http/server.go:2102 +0x625 fp=0xc000047fb8 sp=0xc000047b98 pc=0x55dc077ac605
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x55dc077b1ec8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x55dc074b6fa1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3454 +0x485
goroutine 76 gp=0xc000102e00 m=nil [IO wait]:
runtime.gopark(0x55dc09189600?, 0xc0000a6e38?, 0x38?, 0x6e?, 0xb?)
runtime/proc.go:435 +0xce fp=0xc0000a6dd8 sp=0xc0000a6db8 pc=0x55dc074af86e
runtime.netpollblock(0x55dc074d2bd8?, 0x7448666?, 0xdc?)
runtime/netpoll.go:575 +0xf7 fp=0xc0000a6e10 sp=0xc0000a6dd8 pc=0x55dc07474357
internal/poll.runtime_pollWait(0x7f599acacd98, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc0000a6e30 sp=0xc0000a6e10 pc=0x55dc074aea85
internal/poll.(*pollDesc).wait(0xc000685280?, 0xc0002ac101?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000a6e58 sp=0xc0000a6e30 pc=0x55dc07535ec7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000685280, {0xc0002ac101, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x27a fp=0xc0000a6ef0 sp=0xc0000a6e58 pc=0x55dc075371ba
net.(*netFD).Read(0xc000685280, {0xc0002ac101?, 0xc00044d1d8?, 0xc0000a6f70?})
net/fd_posix.go:55 +0x25 fp=0xc0000a6f38 sp=0xc0000a6ef0 pc=0x55dc075ac2a5
net.(*conn).Read(0xc0000ae920, {0xc0002ac101?, 0x0?, 0x0?})
net/net.go:194 +0x45 fp=0xc0000a6f80 sp=0xc0000a6f38 pc=0x55dc075ba665
net/http.(*connReader).backgroundRead(0xc0002ac0f0)
net/http/server.go:690 +0x37 fp=0xc0000a6fc8 sp=0xc0000a6f80 pc=0x55dc077a64d7
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc0000a6fe0 sp=0xc0000a6fc8 pc=0x55dc077a6405
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a6fe8 sp=0xc0000a6fe0 pc=0x55dc074b6fa1
created by net/http.(*connReader).startBackgroundRead in goroutine 66
net/http/server.go:686 +0xb6
rax 0x0
rbx 0x15
rcx 0x7f59e201ab2c
rdx 0x6
rdi 0x15
rsi 0x15
rbp 0x7fffeeab7a70
rsp 0x7fffeeab7a30
r8 0x0
r9 0x7
r10 0x8
r11 0x246
r12 0x6
r13 0x7f5993766ca0
r14 0x16
r15 0x7f59887324a0
rip 0x7f59e201ab2c
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
[GIN] 2025/08/22 - 08:25:24 | 500 | 3.236966618s | 192.168.127.1 | POST "/api/embed"
time=2025-08-22T08:25:24.131Z level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
time=2025-08-22T08:25:24.132Z level=DEBUG source=sched.go:493 msg="context for request finished"
time=2025-08-22T08:25:24.132Z level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0 runner.inference=cpu runner.devices=1 runner.size="8.6 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=21 runner.model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb runner.num_ctx=4096 duration=5m0s
time=2025-08-22T08:25:24.132Z level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0 runner.inference=cpu runner.devices=1 runner.size="8.6 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=21 runner.model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb runner.num_ctx=4096 refCount=0
@liangstein commented on GitHub (Aug 26, 2025):
I have updated the error log in this thread. I'm using ollama 0.11.7
@really-hzy commented on GitHub (Sep 2, 2025):
+1. This has existed from 0.11.5 through 0.11.8. There is no problem with CUDA; it reliably appears when running on the CPU.
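For anyone trying to reproduce: the crash in the trace above happens inside the `/api/embed` handler (`llamarunner.(*Server).embeddings`), so a single embedding request against a CPU-only server triggers it. A minimal sketch of that request, with the model name taken from the log above (the host/port and input string are just placeholders):

```python
import json

# Hypothetical repro payload for POST http://127.0.0.1:11434/api/embed
# (model name copied from the log above; input text is arbitrary).
payload = json.dumps({
    "model": "hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0",
    "input": "hello world",
})
print(payload)

# To send it, e.g.:
#   requests.post("http://127.0.0.1:11434/api/embed", data=payload)
# On an affected CPU-only build this returns a 500 and the runner
# exits with status 2, matching the GIN log line above.
```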
@really-hzy commented on GitHub (Sep 5, 2025):
time=2025-09-05T14:32:27.553+08:00 level=INFO source=routes.go:1331 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\huangzy\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-09-05T14:32:27.581+08:00 level=INFO source=images.go:477 msg="total blobs: 221"
time=2025-09-05T14:32:27.590+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-09-05T14:32:27.597+08:00 level=INFO source=routes.go:1384 msg="Listening on 127.0.0.1:11434 (version 0.11.10)"
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=24 efficiency=16 threads=32
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=gpu.go:512 msg="Searching for GPU library" name=nvml.dll
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=gpu.go:536 msg="gpu library search" globs="[C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama\nvml.dll E:\Program Files\Microsoft Visual Studio\2022\Community\VC\Redist\MSVC\14.42.34433\x64\Microsoft.VC143.CRT\nvml.dll E:\VulkanSDK\1.4.321.1\Bin\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\libnvvp\nvml.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll C:\Program Files\YunShu\utils\nvml.dll C:\Windows\system32\nvml.dll C:\Windows\nvml.dll C:\Windows\System32\Wbem\nvml.dll C:\Windows\System32\WindowsPowerShell\v1.0\nvml.dll C:\Windows\System32\OpenSSH\nvml.dll E:\Program Files\CMake\bin\nvml.dll C:\Users\huangzy\.local\bin\nvml.dll C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvml.dll C:\Users\huangzy\.dotnet\tools\nvml.dll C:\Users\huangzy\AppData\Roaming\npm\nvml.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll C:\ProgramData\chocolatey\bin\nvml.dll E:\Program Files\Git\cmd\nvml.dll C:\Program Files\Go\bin\nvml.dll C:\TDM-GCC-64\bin\nvml.dll C:\Users\huangzy\AppData\Local\nvm\nvml.dll C:\nvm4w\nodejs\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp\nvml.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2025.1.0\nvml.dll C:\Program Files\NVIDIA Corporation\NVIDIA app\NvDLISR\nvml.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvml.dll C:\Program Files\Docker\Docker\resources\bin\nvml.dll E:\software\cwrsync_6.4.4_x64_free\bin\nvml.dll C:\Program Files\dotnet\nvml.dll E:\Windows Kits\10\Windows Performance Toolkit\nvml.dll E:\Program Files\TortoiseGit\bin\nvml.dll E:\software\iMyFone Nut Studio\.nodejs\nvml.dll C:\Users\huangzy\.local\bin\nvml.dll 
C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvml.dll C:\Users\huangzy\.dotnet\tools\nvml.dll E:\Users\huangzy\AppData\Local\Programs\Microsoft VS Code\bin\nvml.dll E:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll C:\Users\huangzy\go\bin\nvml.dll C:\Users\huangzy\AppData\Local\Programs\Ollama\nvml.dll C:\Users\huangzy\.lmstudio\bin\nvml.dll C:\Users\huangzy\AppData\Local\nvm\nvml.dll C:\nvm4w\nodejs\nvml.dll C:\Users\huangzy\.dotnet\tools\nvml.dll C:\Users\huangzy\AppData\Roaming\npm\nvml.dll c:\Windows\System32\nvml.dll]"
time=2025-09-05T14:32:27.599+08:00 level=DEBUG source=gpu.go:540 msg="skipping PhysX cuda library path" path="C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvml.dll"
time=2025-09-05T14:32:27.599+08:00 level=DEBUG source=gpu.go:569 msg="discovered GPU libraries" paths="[C:\Windows\system32\nvml.dll c:\Windows\System32\nvml.dll]"
time=2025-09-05T14:32:27.610+08:00 level=DEBUG source=gpu.go:111 msg="nvidia-ml loaded" library=C:\Windows\system32\nvml.dll
time=2025-09-05T14:32:27.610+08:00 level=DEBUG source=gpu.go:512 msg="Searching for GPU library" name=nvcuda.dll
time=2025-09-05T14:32:27.610+08:00 level=DEBUG source=gpu.go:536 msg="gpu library search" globs="[C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama\nvcuda.dll E:\Program Files\Microsoft Visual Studio\2022\Community\VC\Redist\MSVC\14.42.34433\x64\Microsoft.VC143.CRT\nvcuda.dll E:\VulkanSDK\1.4.321.1\Bin\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\libnvvp\nvcuda.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll C:\Program Files\YunShu\utils\nvcuda.dll C:\Windows\system32\nvcuda.dll C:\Windows\nvcuda.dll C:\Windows\System32\Wbem\nvcuda.dll C:\Windows\System32\WindowsPowerShell\v1.0\nvcuda.dll C:\Windows\System32\OpenSSH\nvcuda.dll E:\Program Files\CMake\bin\nvcuda.dll C:\Users\huangzy\.local\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvcuda.dll C:\Users\huangzy\.dotnet\tools\nvcuda.dll C:\Users\huangzy\AppData\Roaming\npm\nvcuda.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll C:\ProgramData\chocolatey\bin\nvcuda.dll E:\Program Files\Git\cmd\nvcuda.dll C:\Program Files\Go\bin\nvcuda.dll C:\TDM-GCC-64\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\nvm\nvcuda.dll C:\nvm4w\nodejs\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp\nvcuda.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2025.1.0\nvcuda.dll C:\Program Files\NVIDIA Corporation\NVIDIA app\NvDLISR\nvcuda.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvcuda.dll C:\Program Files\Docker\Docker\resources\bin\nvcuda.dll E:\software\cwrsync_6.4.4_x64_free\bin\nvcuda.dll C:\Program Files\dotnet\nvcuda.dll E:\Windows Kits\10\Windows Performance Toolkit\nvcuda.dll E:\Program Files\TortoiseGit\bin\nvcuda.dll E:\software\iMyFone Nut 
Studio\.nodejs\nvcuda.dll C:\Users\huangzy\.local\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvcuda.dll C:\Users\huangzy\.dotnet\tools\nvcuda.dll E:\Users\huangzy\AppData\Local\Programs\Microsoft VS Code\bin\nvcuda.dll E:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll C:\Users\huangzy\go\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\Programs\Ollama\nvcuda.dll C:\Users\huangzy\.lmstudio\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\nvm\nvcuda.dll C:\nvm4w\nodejs\nvcuda.dll C:\Users\huangzy\.dotnet\tools\nvcuda.dll C:\Users\huangzy\AppData\Roaming\npm\nvcuda.dll c:\windows\system\nvcuda.dll]"
time=2025-09-05T14:32:27.611+08:00 level=DEBUG source=gpu.go:540 msg="skipping PhysX cuda library path" path="C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvcuda.dll"
time=2025-09-05T14:32:27.612+08:00 level=DEBUG source=gpu.go:569 msg="discovered GPU libraries" paths=[C:\Windows\system32\nvcuda.dll]
initializing C:\Windows\system32\nvcuda.dll
dlsym: cuInit - 00007FF81CDB5F80
dlsym: cuDriverGetVersion - 00007FF81CDB6020
dlsym: cuDeviceGetCount - 00007FF81CDB6816
dlsym: cuDeviceGet - 00007FF81CDB6810
dlsym: cuDeviceGetAttribute - 00007FF81CDB6170
dlsym: cuDeviceGetUuid - 00007FF81CDB6822
dlsym: cuDeviceGetName - 00007FF81CDB681C
dlsym: cuCtxCreate_v3 - 00007FF81CDB6894
dlsym: cuMemGetInfo_v2 - 00007FF81CDB6996
dlsym: cuCtxDestroy - 00007FF81CDB68A6
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-09-05T14:32:27.624+08:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=C:\Windows\system32\nvcuda.dll
[GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f] CUDA totalMem 24563mb
[GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f] CUDA freeMem 23042mb
[GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f] Compute Capability 8.9
time=2025-09-05T14:32:27.730+08:00 level=DEBUG source=amd_windows.go:34 msg="unable to load amdhip64_6.dll, please make sure to upgrade to the latest amd driver: The specified module could not be found."
releasing cuda driver library
releasing nvml library
time=2025-09-05T14:32:27.731+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090 D" total="24.0 GiB" available="22.5 GiB"
time=2025-09-05T14:32:43.612+08:00 level=DEBUG source=gpu.go:402 msg="updating system memory data" before.total="63.8 GiB" before.free="37.6 GiB" before.free_swap="26.8 GiB" now.total="63.8 GiB" now.free="37.5 GiB" now.free_swap="26.6 GiB"
time=2025-09-05T14:32:43.634+08:00 level=DEBUG source=gpu.go:452 msg="updating cuda memory data" gpu=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f name="NVIDIA GeForce RTX 4090 D" overhead="0 B" before.total="24.0 GiB" before.free="22.5 GiB" now.total="24.0 GiB" now.free="21.5 GiB" now.used="2.4 GiB"
releasing nvml library
time=2025-09-05T14:32:43.636+08:00 level=DEBUG source=sched.go:188 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-09-05T14:32:43.650+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=general.alignment default=32
time=2025-09-05T14:32:43.650+08:00 level=DEBUG source=sched.go:208 msg="loading first model" model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 4B
llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv 4: general.size_label str = 4B
llama_model_loader: - kv 5: general.license str = apache-2.0
llama_model_loader: - kv 6: general.base_model.count u32 = 1
llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 4B Base
llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-4B-...
llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv 11: qwen3.block_count u32 = 36
llama_model_loader: - kv 12: qwen3.context_length u32 = 40960
llama_model_loader: - kv 13: qwen3.embedding_length u32 = 2560
llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 9728
llama_model_loader: - kv 15: qwen3.attention.head_count u32 = 32
llama_model_loader: - kv 16: qwen3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 17: qwen3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 19: qwen3.attention.key_length u32 = 128
llama_model_loader: - kv 20: qwen3.attention.value_length u32 = 128
llama_model_loader: - kv 21: qwen3.pooling_type u32 = 3
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,151665] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 34: general.quantization_version u32 = 2
llama_model_loader: - kv 35: general.file_type u32 = 15
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q4_K: 216 tensors
llama_model_loader: - type q6_K: 37 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 2.32 GiB (4.95 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 4.02 B
print_info: general.name = Qwen3 Embedding 4B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-09-05T14:32:43.906+08:00 level=DEBUG source=gpu.go:402 msg="updating system memory data" before.total="63.8 GiB" before.free="37.5 GiB" before.free_swap="26.6 GiB" now.total="63.8 GiB" now.free="37.4 GiB" now.free_swap="26.6 GiB"
time=2025-09-05T14:32:43.925+08:00 level=DEBUG source=gpu.go:452 msg="updating cuda memory data" gpu=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f name="NVIDIA GeForce RTX 4090 D" overhead="0 B" before.total="24.0 GiB" before.free="21.5 GiB" now.total="24.0 GiB" now.free="21.5 GiB" now.used="2.4 GiB"
releasing nvml library
time=2025-09-05T14:32:43.935+08:00 level=INFO source=server.go:398 msg="starting runner" cmd="C:\Users\huangzy\AppData\Local\Programs\Ollama\ollama.exe runner --model C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b --port 53597"
time=2025-09-05T14:32:43.936+08:00 level=DEBUG source=server.go:399 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8" CUDA_PATH_V11_3="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3" CUDA_PATH_V12_8="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8" OLLAMA_DEBUG=2 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_ORIGINS=* PATH="C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama;E:\Program Files\Microsoft Visual Studio\2022\Community\VC\Redist\MSVC\14.42.34433\x64\Microsoft.VC143.CRT;E:\VulkanSDK\1.4.321.1\Bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\libnvvp;e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;C:\Program Files\YunShu\utils;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;E:\Program Files\CMake\bin;C:\Users\huangzy\.local\bin;C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps;C:\Users\huangzy\.dotnet\tools;C:\Users\huangzy\AppData\Roaming\npm;e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;C:\ProgramData\chocolatey\bin;E:\Program Files\Git\cmd;C:\Program Files\Go\bin;C:\TDM-GCC-64\bin;C:\Users\huangzy\AppData\Local\nvm;C:\nvm4w\nodejs;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp;C:\Program Files\NVIDIA Corporation\Nsight Compute 2025.1.0\;C:\Program Files\NVIDIA Corporation\NVIDIA app\NvDLISR;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Docker\Docker\resources\bin;E:\software\cwrsync_6.4.4_x64_free\bin;C:\Program Files\dotnet\;E:\Windows Kits\10\Windows Performance Toolkit\;E:\Program Files\TortoiseGit\bin;E:\software\iMyFone Nut Studio\.nodejs\;C:\Users\huangzy\.local\bin;C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps;C:\Users\huangzy\.dotnet\tools;E:\Users\huangzy\AppData\Local\Programs\Microsoft VS Code\bin;E:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;C:\Users\huangzy\go\bin;C:\Users\huangzy\AppData\Local\Programs\Ollama;C:\Users\huangzy\.lmstudio\bin;C:\Users\huangzy\AppData\Local\nvm;C:\nvm4w\nodejs;C:\Users\huangzy\.dotnet\tools;C:\Users\huangzy\AppData\Roaming\npm;C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama" OLLAMA_LIBRARY_PATH=C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama
time=2025-09-05T14:32:43.938+08:00 level=DEBUG source=gpu.go:402 msg="updating system memory data" before.total="63.8 GiB" before.free="37.4 GiB" before.free_swap="26.6 GiB" now.total="63.8 GiB" now.free="37.4 GiB" now.free_swap="26.6 GiB"
time=2025-09-05T14:32:43.955+08:00 level=DEBUG source=gpu.go:452 msg="updating cuda memory data" gpu=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f name="NVIDIA GeForce RTX 4090 D" overhead="0 B" before.total="24.0 GiB" before.free="21.5 GiB" now.total="24.0 GiB" now.free="21.5 GiB" now.used="2.4 GiB"
releasing nvml library
time=2025-09-05T14:32:43.956+08:00 level=INFO source=server.go:503 msg="system memory" total="63.8 GiB" free="37.4 GiB" free_swap="26.6 GiB"
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=memory.go:181 msg=evaluating library=cuda gpu_count=1 available="[21.5 GiB]"
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-09-05T14:32:43.957+08:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b library=cuda parallel=1 required="3.8 GiB" gpus=1
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=memory.go:181 msg=evaluating library=cuda gpu_count=1 available="[21.5 GiB]"
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-09-05T14:32:43.957+08:00 level=INFO source=server.go:543 msg=offload library=cuda layers.requested=-1 layers.model=37 layers.offload=37 layers.split=[37] memory.available="[21.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.8 GiB" memory.required.partial="3.8 GiB" memory.required.kv="576.0 MiB" memory.required.allocations="[3.8 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="2.0 GiB" memory.weights.nonrepeating="303.8 MiB" memory.graph.full="384.0 MiB" memory.graph.partial="384.0 MiB"
time=2025-09-05T14:32:43.975+08:00 level=INFO source=runner.go:864 msg="starting go runner"
time=2025-09-05T14:32:43.980+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2025-09-05T14:32:43.993+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-09-05T14:32:43.994+08:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:53597"
time=2025-09-05T14:32:44.003+08:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-09-05T14:32:44.004+08:00 level=INFO source=server.go:1250 msg="waiting for llama runner to start responding"
time=2025-09-05T14:32:44.004+08:00 level=INFO source=server.go:1284 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 4B
llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv 4: general.size_label str = 4B
llama_model_loader: - kv 5: general.license str = apache-2.0
llama_model_loader: - kv 6: general.base_model.count u32 = 1
llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 4B Base
llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-4B-...
llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv 11: qwen3.block_count u32 = 36
llama_model_loader: - kv 12: qwen3.context_length u32 = 40960
llama_model_loader: - kv 13: qwen3.embedding_length u32 = 2560
llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 9728
llama_model_loader: - kv 15: qwen3.attention.head_count u32 = 32
llama_model_loader: - kv 16: qwen3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 17: qwen3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 19: qwen3.attention.key_length u32 = 128
llama_model_loader: - kv 20: qwen3.attention.value_length u32 = 128
llama_model_loader: - kv 21: qwen3.pooling_type u32 = 3
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,151665] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 34: general.quantization_version u32 = 2
llama_model_loader: - kv 35: general.file_type u32 = 15
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q4_K: 216 tensors
llama_model_loader: - type q6_K: 37 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 2.32 GiB (4.95 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 0
print_info: n_ctx_train = 40960
print_info: n_embd = 2560
print_info: n_layer = 36
print_info: n_head = 32
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 4
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 9728
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 3
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 40960
print_info: rope_finetuned = unknown
print_info: model type = 4B
print_info: model params = 4.02 B
print_info: general.name = Qwen3 Embedding 4B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer 0 assigned to device CPU, is_swa = 0
load_tensors: layer 1 assigned to device CPU, is_swa = 0
load_tensors: layer 2 assigned to device CPU, is_swa = 0
load_tensors: layer 3 assigned to device CPU, is_swa = 0
load_tensors: layer 4 assigned to device CPU, is_swa = 0
load_tensors: layer 5 assigned to device CPU, is_swa = 0
load_tensors: layer 6 assigned to device CPU, is_swa = 0
load_tensors: layer 7 assigned to device CPU, is_swa = 0
load_tensors: layer 8 assigned to device CPU, is_swa = 0
load_tensors: layer 9 assigned to device CPU, is_swa = 0
load_tensors: layer 10 assigned to device CPU, is_swa = 0
load_tensors: layer 11 assigned to device CPU, is_swa = 0
load_tensors: layer 12 assigned to device CPU, is_swa = 0
load_tensors: layer 13 assigned to device CPU, is_swa = 0
load_tensors: layer 14 assigned to device CPU, is_swa = 0
load_tensors: layer 15 assigned to device CPU, is_swa = 0
load_tensors: layer 16 assigned to device CPU, is_swa = 0
load_tensors: layer 17 assigned to device CPU, is_swa = 0
load_tensors: layer 18 assigned to device CPU, is_swa = 0
load_tensors: layer 19 assigned to device CPU, is_swa = 0
load_tensors: layer 20 assigned to device CPU, is_swa = 0
load_tensors: layer 21 assigned to device CPU, is_swa = 0
load_tensors: layer 22 assigned to device CPU, is_swa = 0
load_tensors: layer 23 assigned to device CPU, is_swa = 0
load_tensors: layer 24 assigned to device CPU, is_swa = 0
load_tensors: layer 25 assigned to device CPU, is_swa = 0
load_tensors: layer 26 assigned to device CPU, is_swa = 0
load_tensors: layer 27 assigned to device CPU, is_swa = 0
load_tensors: layer 28 assigned to device CPU, is_swa = 0
load_tensors: layer 29 assigned to device CPU, is_swa = 0
load_tensors: layer 30 assigned to device CPU, is_swa = 0
load_tensors: layer 31 assigned to device CPU, is_swa = 0
load_tensors: layer 32 assigned to device CPU, is_swa = 0
load_tensors: layer 33 assigned to device CPU, is_swa = 0
load_tensors: layer 34 assigned to device CPU, is_swa = 0
load_tensors: layer 35 assigned to device CPU, is_swa = 0
load_tensors: layer 36 assigned to device CPU, is_swa = 0
load_tensors: CPU model buffer size = 2375.37 MiB
load_all_data: no device found for buffer type CPU for async uploads
time=2025-09-05T14:32:44.256+08:00 level=DEBUG source=server.go:1294 msg="model load progress 0.21"
time=2025-09-05T14:32:44.508+08:00 level=DEBUG source=server.go:1294 msg="model load progress 0.64"
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context: CPU output buffer size = 0.59 MiB
create_memory: n_ctx = 4096 (padded)
llama_kv_cache_unified: layer 0: dev = CPU
llama_kv_cache_unified: layer 1: dev = CPU
llama_kv_cache_unified: layer 2: dev = CPU
llama_kv_cache_unified: layer 3: dev = CPU
llama_kv_cache_unified: layer 4: dev = CPU
llama_kv_cache_unified: layer 5: dev = CPU
llama_kv_cache_unified: layer 6: dev = CPU
llama_kv_cache_unified: layer 7: dev = CPU
llama_kv_cache_unified: layer 8: dev = CPU
llama_kv_cache_unified: layer 9: dev = CPU
llama_kv_cache_unified: layer 10: dev = CPU
llama_kv_cache_unified: layer 11: dev = CPU
llama_kv_cache_unified: layer 12: dev = CPU
llama_kv_cache_unified: layer 13: dev = CPU
llama_kv_cache_unified: layer 14: dev = CPU
llama_kv_cache_unified: layer 15: dev = CPU
llama_kv_cache_unified: layer 16: dev = CPU
llama_kv_cache_unified: layer 17: dev = CPU
llama_kv_cache_unified: layer 18: dev = CPU
llama_kv_cache_unified: layer 19: dev = CPU
llama_kv_cache_unified: layer 20: dev = CPU
llama_kv_cache_unified: layer 21: dev = CPU
llama_kv_cache_unified: layer 22: dev = CPU
llama_kv_cache_unified: layer 23: dev = CPU
llama_kv_cache_unified: layer 24: dev = CPU
llama_kv_cache_unified: layer 25: dev = CPU
llama_kv_cache_unified: layer 26: dev = CPU
llama_kv_cache_unified: layer 27: dev = CPU
llama_kv_cache_unified: layer 28: dev = CPU
llama_kv_cache_unified: layer 29: dev = CPU
llama_kv_cache_unified: layer 30: dev = CPU
llama_kv_cache_unified: layer 31: dev = CPU
llama_kv_cache_unified: layer 32: dev = CPU
llama_kv_cache_unified: layer 33: dev = CPU
llama_kv_cache_unified: layer 34: dev = CPU
llama_kv_cache_unified: layer 35: dev = CPU
llama_kv_cache_unified: CPU KV buffer size = 576.00 MiB
time=2025-09-05T14:32:44.759+08:00 level=DEBUG source=server.go:1294 msg="model load progress 1.00"
llama_kv_cache_unified: size = 576.00 MiB ( 4096 cells, 36 layers, 1/1 seqs), K (f16): 288.00 MiB, V (f16): 288.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 1
llama_context: max_nodes = 3184
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: CPU compute buffer size = 308.23 MiB
llama_context: graph nodes = 1411
llama_context: graph splits = 1
time=2025-09-05T14:32:45.010+08:00 level=INFO source=server.go:1288 msg="llama runner started in 1.07 seconds"
time=2025-09-05T14:32:45.010+08:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
time=2025-09-05T14:32:45.010+08:00 level=INFO source=server.go:1250 msg="waiting for llama runner to start responding"
time=2025-09-05T14:32:45.010+08:00 level=INFO source=server.go:1288 msg="llama runner started in 1.08 seconds"
time=2025-09-05T14:32:45.010+08:00 level=DEBUG source=sched.go:485 msg="finished setting up" runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M runner.inference=cuda runner.devices=1 runner.size="3.8 GiB" runner.vram="3.8 GiB" runner.parallel=1 runner.pid=645292 runner.model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b runner.num_ctx=4096
time=2025-09-05T14:32:45.026+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=general.alignment default=32
time=2025-09-05T14:32:45.027+08:00 level=TRACE source=server.go:1554 msg="embedding request" input="Why is the sky blue?"
time=2025-09-05T14:32:45.027+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=7 used=0 remaining=7
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:5280: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
[GIN] 2025/09/05 - 14:32:45 | 500 | 1.6630946s | 127.0.0.1 | POST "/api/embed"
time=2025-09-05T14:32:45.186+08:00 level=DEBUG source=sched.go:493 msg="context for request finished"
time=2025-09-05T14:32:45.186+08:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M runner.inference=cuda runner.devices=1 runner.size="3.8 GiB" runner.vram="3.8 GiB" runner.parallel=1 runner.pid=645292 runner.model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b runner.num_ctx=4096 duration=5m0s
time=2025-09-05T14:32:45.186+08:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M runner.inference=cuda runner.devices=1 runner.size="3.8 GiB" runner.vram="3.8 GiB" runner.parallel=1 runner.pid=645292 runner.model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b runner.num_ctx=4096 refCount=0
time=2025-09-05T14:32:45.273+08:00 level=ERROR source=server.go:424 msg="llama runner terminated" error="exit status 0xc0000409"
Full log with disabled GPU.
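For anyone trying to reproduce: the trace above shows the crash is triggered by a plain POST to /api/embed. A minimal sketch of the request body follows, assuming a local server; the model tag and input text are taken from the log (runner.name and the "embedding request" trace line), while the default port 11434 in the comment is an assumption, not from this log.

```python
import json

# Reproduction payload for the failing /api/embed call.
# Model tag and input mirror the log above:
#   runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M
#   msg="embedding request" input="Why is the sky blue?"
payload = {
    "model": "hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M",
    "input": "Why is the sky blue?",
}

body = json.dumps(payload)
print(body)

# Against a local server (default port assumed):
#   curl http://127.0.0.1:11434/api/embed -d '<body above>'
# On affected builds this returned HTTP 500 after the GGML assert killed the runner.
```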
@Muzixin commented on GitHub (Sep 12, 2025):
Same issue here.
@pdevine commented on GitHub (Sep 17, 2025):
#12301 implements qwen3-embedding on the Ollama engine (instead of the legacy llama.cpp engine).
@ubaldus commented on GitHub (Oct 13, 2025):
Hi, the error is still present in version 0.12.5 when using the CPU only.
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 32768
llama_context: n_ctx_per_seq = 32768
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: CPU output buffer size = 0.58 MiB
llama_kv_cache: CPU KV buffer size = 3584.00 MiB
llama_kv_cache: size = 3584.00 MiB ( 32768 cells, 28 layers, 1/1 seqs), K (f16): 1792.00 MiB, V (f16): 1792.00 MiB
llama_context: CPU compute buffer size = 1104.01 MiB
llama_context: graph nodes = 1127
llama_context: graph splits = 1
ops.cpp:4663: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
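Both traces die on the same bounds check: `i01` is a row index into a tensor whose row count is `ne01`, so the assert fires when an op is asked to read a row outside the source tensor. The specific op is not named in the log; as a rough Python analogue of the invariant (illustrative only, names chosen to mirror the assert), a row-gather looks like:

```python
# Illustrative analogue of GGML_ASSERT(i01 >= 0 && i01 < ne01):
# a row-gather reads rows of `src` selected by `ids`, and the assert
# fires when a selected index falls outside [0, ne01).
def gather_rows(src, ids):
    ne01 = len(src)  # number of rows in the source tensor
    for i01 in ids:
        if not (0 <= i01 < ne01):
            raise AssertionError(f"i01 = {i01} out of range [0, {ne01})")
    return [src[i01] for i01 in ids]

print(gather_rows([[1.0], [2.0], [3.0]], [2, 0]))  # valid: [[3.0], [1.0]]
```

An out-of-range index here (e.g. a bad token id feeding an embedding lookup) is exactly the condition the CPU backend traps on.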
@pdevine commented on GitHub (Oct 13, 2025):
OK, I can confirm it's happening on CPU on Windows. It works fine on GPU with CUDA on Windows, and is fine on both CPU and GPU on macOS.
@rick-github commented on GitHub (Oct 13, 2025):
Fails on Linux/CPU too.
@pdevine commented on GitHub (Oct 13, 2025):
My guess is x86 vs ARM. There's a GGML version bump coming, so I'll test with that and see if it was fixed upstream.
@kitche1985 commented on GitHub (Oct 15, 2025):
Fails on macOS GPU (Metal) too.
@pdevine commented on GitHub (Oct 15, 2025):
@kitche1985 I've tried reproducing it on metal both on the GPU and CPU and not seen any issues. What are you seeing?
@rasheduzzaman-brur commented on GitHub (Oct 16, 2025):
I am facing the same issue in the updated version: [ERROR] Error processing rashed.txt: Error raised by inference API HTTP code: 500, {"error":"do embedding request: Post "http://127.0.0.1:40825/embedding": EOF"}
@jmorganca commented on GitHub (Oct 27, 2025):
Hi all, this should be fixed now as of 0.12.6. Let me know if you're still seeing the issue