[GH-ISSUE #12058] MMAP failed - Failed to load model #8008

Closed
opened 2026-04-12 20:12:55 -05:00 by GiteaMirror · 8 comments

Originally created by @LaCocoRoco on GitHub (Aug 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12058

What is the issue?

For several weeks, I have been unable to use Qwen models. I reinstalled Ollama, reverting to a January version, suspecting an issue with the software. I removed and re-pulled the models, but consistently receive the following error:

ollama@tensor:~$ ollama list
NAME                 ID              SIZE      MODIFIED       
qwen2.5-coder:7b     dae161e27b0e    4.7 GB    3 minutes ago     
qwen2.5-coder:32b    b92d6a0bd47e    19 GB     11 minutes ago    
gemma3:27b           a418f5838eaf    17 GB     3 hours ago       
gpt-oss:20b          aa4295ac10c3    13 GB     3 hours ago       
deepseek-r1:32b      38056bbcbb2d    19 GB     3 hours ago       
gemma3:12b           f4031aab637d    8.1 GB    3 hours ago       
ollama@tensor:~$ ollama run gemma3:27b
>>> /bye
ollama@tensor:~$ ollama run qwen2.5-coder:7b
Error: llama runner process has terminated: error loading model: mmap failed: No such device
llama_model_load_from_file_impl: failed to load model

As demonstrated, models like Gemma3:27b function correctly.
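One detail worth noting: "No such device" is the message for errno ENODEV, which `mmap` returns when the filesystem holding the file does not support memory mapping. The logs in this thread show the failing Qwen load running with `mmap = true`, while the working Gemma3 load was started with `--ollama-engine` and `UseMmap:false`. The sketch below is one way to check whether a given file can be memory-mapped at all. It assumes Python 3 is available on the host, and `probe_mmap` is a hypothetical helper, not part of Ollama or llama.cpp. It is a diagnostic aid only and does not by itself establish the root cause.

```python
import errno
import mmap
import os
import tempfile


def probe_mmap(path: str) -> str:
    """Try to map `path` read-only; return "ok" or a short failure description.

    Hypothetical helper for illustration; not part of Ollama or llama.cpp.
    """
    with open(path, "rb") as f:
        try:
            # Length 0 maps the whole file (the file must not be empty).
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ):
                return "ok"
        except OSError as exc:
            # errno.ENODEV is the code whose message reads "No such device".
            name = errno.errorcode.get(exc.errno, "unknown errno")
            return f"failed: {name} ({exc.strerror})"


if __name__ == "__main__":
    # Self-check against a small temporary file; replace the path with a
    # model blob to probe the filesystem that actually holds the models.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"probe")
    try:
        print(probe_mmap(tmp.name))
    finally:
        os.remove(tmp.name)
```

Running it against a blob under `/home/ollama/.ollama/models/blobs/` as the `ollama` user would show whether the failure can be reproduced outside the runner.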

Relevant log output

Aug 24 09:36:08 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 24 09:36:08 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba
Aug 24 09:36:08 tensor ollama[5671]: goroutine 54 [running]:
Aug 24 09:36:08 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0002f6500, {0x25, 0x0, 0x1, {0xc0001cd208, 0x1, 0x1}, 0xc000502cd0, 0x0}, {0x7ffe0e254d54, ...}, ...)
Aug 24 09:36:08 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 24 09:36:08 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 51
Aug 24 09:36:08 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.357+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.397+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 24 09:36:08 tensor ollama[5671]: [GIN] 2025/08/24 - 09:36:08 | 500 |  769.338871ms |       127.0.0.1 | POST     "/api/generate"
Aug 24 09:37:20 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:20 | 200 |   19.020204ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 09:37:20 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:20 | 200 |      27.749µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 09:37:21 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:21 | 200 |      56.493µs |       127.0.0.1 | GET      "/api/version"
Aug 24 09:37:42 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:42 | 200 |   4.36063473s |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:37:43 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:43 | 200 |  448.384271ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:37:44 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:44 | 200 |  1.045604379s |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:38:30 tensor ollama[5671]: [GIN] 2025/08/24 - 09:38:30 | 200 |  3.194981361s |       127.0.0.1 | POST     "/api/chat"

OS

Ubuntu 24.04.3 LTS

GPU

RTX3090

CPU

12th Gen Intel(R) Core(TM) i7-12700

Ollama version

0.11.6

GiteaMirror added the bug label 2026-04-12 20:12:55 -05:00

@rick-github commented on GitHub (Aug 24, 2025):

A full log may provide more context.

What is the output of the following commands?

ls -l /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba
sha256sum /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba

@LaCocoRoco commented on GitHub (Aug 24, 2025):

ls -l /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba

-rw-r--r-- 1 ollama openai 1929903072 Aug 24 09:36 /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba

sha256sum /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba

4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba
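The `sha256sum` digest above matches the hex string in the blob's filename, which suggests the file content is intact and the problem lies elsewhere. For checking several blobs, the same comparison can be sketched in Python. This assumes the `sha256-<hex digest>` naming seen in the paths in this thread, and `blob_matches_name` is a hypothetical helper, not an Ollama API.

```python
import hashlib
import os


def blob_matches_name(path: str) -> bool:
    """Return True if the file's SHA-256 equals the digest in its filename.

    Assumes the blob is named "sha256-<hex digest>", as in the paths above.
    Hypothetical helper for illustration; requires Python 3.9+ (removeprefix).
    """
    expected = os.path.basename(path).removeprefix("sha256-")
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so multi-gigabyte blobs are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

A `True` result for every blob would make a corrupted download an unlikely explanation for the load failure.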

Aug 23 07:37:25 tensor systemd[1]: Started ollama.service - Ollama Service.
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.578+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.660+02:00 level=INFO source=images.go:477 msg="total blobs: 24"
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.677+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.700+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)"
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.703+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.993+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.4 GiB"
Aug 23 07:37:45 tensor ollama[1276]: [GIN] 2025/08/23 - 07:37:45 | 200 |   25.073075ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 07:37:45 tensor ollama[1276]: [GIN] 2025/08/23 - 07:37:45 | 200 |     736.727µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 07:37:46 tensor ollama[1276]: [GIN] 2025/08/23 - 07:37:46 | 200 |       48.98µs |       127.0.0.1 | GET      "/api/version"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.006+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 33291"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.019+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.019+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:33291"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.051+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.8 GiB" free_swap="8.0 GiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.052+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.053+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.055+02:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.120+02:00 level=INFO source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37
Aug 23 07:55:28 tensor ollama[1276]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 23 07:55:28 tensor ollama[1276]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 07:55:28 tensor ollama[1276]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 07:55:28 tensor ollama[1276]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 07:55:28 tensor ollama[1276]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 07:55:28 tensor ollama[1276]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.342+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 MiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 07:55:49 tensor ollama[1276]: time=2025-08-23T07:55:49.955+02:00 level=INFO source=server.go:1272 msg="llama runner started in 21.95 seconds"
Aug 23 07:55:53 tensor ollama[1276]: [GIN] 2025/08/23 - 07:55:53 | 200 | 26.117570848s |       127.0.0.1 | POST     "/api/chat"
Aug 23 07:55:54 tensor ollama[1276]: [GIN] 2025/08/23 - 07:55:54 | 200 |  645.325173ms |       127.0.0.1 | POST     "/api/chat"
Aug 23 07:55:55 tensor ollama[1276]: [GIN] 2025/08/23 - 07:55:55 | 200 |  1.332291122s |       127.0.0.1 | POST     "/api/chat"
Aug 23 07:57:29 tensor ollama[1276]: [GIN] 2025/08/23 - 07:57:29 | 200 | 11.295562813s |       127.0.0.1 | POST     "/api/chat"
Aug 23 07:58:33 tensor ollama[1276]: [GIN] 2025/08/23 - 07:58:33 | 200 |  3.546104655s |       127.0.0.1 | POST     "/api/chat"
Aug 23 07:59:49 tensor ollama[1276]: [GIN] 2025/08/23 - 07:59:49 | 200 |  3.972724828s |       127.0.0.1 | POST     "/api/chat"
Aug 23 08:01:11 tensor ollama[1276]: [GIN] 2025/08/23 - 08:01:11 | 200 |  3.705968817s |       127.0.0.1 | POST     "/api/chat"
Aug 23 08:12:29 tensor ollama[1276]: [GIN] 2025/08/23 - 08:12:29 | 200 |   25.267944ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 08:12:29 tensor ollama[1276]: [GIN] 2025/08/23 - 08:12:29 | 200 |      53.196µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 08:23:35 tensor systemd[1]: Stopping ollama.service - Ollama Service...
Aug 23 08:23:35 tensor systemd[1]: ollama.service: Deactivated successfully.
Aug 23 08:23:35 tensor systemd[1]: Stopped ollama.service - Ollama Service.
Aug 23 08:23:35 tensor systemd[1]: ollama.service: Consumed 46.189s CPU time, 1.8G memory peak, 0B memory swap peak.
Aug 23 08:23:35 tensor systemd[1]: Started ollama.service - Ollama Service.
Aug 23 08:23:35 tensor ollama[3460]: time=2025-08-23T08:23:35.876+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Aug 23 08:23:35 tensor systemd[1]: Stopping ollama.service - Ollama Service...
Aug 23 08:23:35 tensor systemd[1]: ollama.service: Deactivated successfully.
Aug 23 08:23:35 tensor systemd[1]: Stopped ollama.service - Ollama Service.
Aug 23 08:23:35 tensor systemd[1]: Started ollama.service - Ollama Service.
Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.902+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.918+02:00 level=INFO source=images.go:477 msg="total blobs: 24"
Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.925+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.933+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)"
Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.933+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Aug 23 08:23:36 tensor ollama[3475]: time=2025-08-23T08:23:36.115+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.2 GiB"
Aug 23 09:20:11 tensor ollama[3475]: [GIN] 2025/08/23 - 09:20:11 | 200 |   12.128463ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 09:20:11 tensor ollama[3475]: [GIN] 2025/08/23 - 09:20:11 | 200 |      73.607µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 09:33:02 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:02 | 200 |    6.800455ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 09:33:02 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:02 | 200 |      28.446µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 09:33:02 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:02 | 200 |      63.496µs |       127.0.0.1 | GET      "/api/version"
Aug 23 09:33:03 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:03 | 200 |    6.310948ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 09:33:03 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:03 | 200 |       14.95µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 09:36:45 tensor ollama[3475]: [GIN] 2025/08/23 - 09:36:45 | 200 |      21.781µs |       127.0.0.1 | HEAD     "/"
Aug 23 09:36:45 tensor ollama[3475]: [GIN] 2025/08/23 - 09:36:45 | 200 |  798.861016ms |       127.0.0.1 | POST     "/api/pull"
Aug 23 09:50:27 tensor ollama[3475]: [GIN] 2025/08/23 - 09:50:27 | 200 |      22.983µs |       127.0.0.1 | HEAD     "/"
Aug 23 09:50:28 tensor ollama[3475]: [GIN] 2025/08/23 - 09:50:28 | 200 |  815.541413ms |       127.0.0.1 | POST     "/api/pull"
Aug 23 10:01:55 tensor systemd[1]: Stopping ollama.service - Ollama Service...
Aug 23 10:01:55 tensor systemd[1]: ollama.service: Deactivated successfully.
Aug 23 10:01:55 tensor systemd[1]: Stopped ollama.service - Ollama Service.
Aug 23 10:01:55 tensor systemd[1]: Started ollama.service - Ollama Service.
Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.185+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.198+02:00 level=INFO source=images.go:477 msg="total blobs: 24"
Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.206+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.212+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)"
Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.212+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.283+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.2 GiB"
Aug 23 10:02:16 tensor systemd[1]: Stopping ollama.service - Ollama Service...
Aug 23 10:02:16 tensor systemd[1]: ollama.service: Deactivated successfully.
Aug 23 10:02:16 tensor systemd[1]: Stopped ollama.service - Ollama Service.
Aug 23 10:02:16 tensor systemd[1]: Started ollama.service - Ollama Service.
Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.800+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.818+02:00 level=INFO source=images.go:477 msg="total blobs: 24"
Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.824+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.830+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)"
Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.830+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.911+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.2 GiB"
Aug 23 10:03:46 tensor ollama[5671]: [GIN] 2025/08/23 - 10:03:46 | 200 |       34.61µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:03:47 tensor ollama[5671]: time=2025-08-23T10:03:47.106+02:00 level=INFO source=download.go:177 msg="downloading ac3d1ba8aa77 in 20 1 GB part(s)"
Aug 23 10:08:02 tensor ollama[5671]: time=2025-08-23T10:08:02.524+02:00 level=INFO source=download.go:177 msg="downloading 832dd9e00a68 in 1 11 KB part(s)"
Aug 23 10:08:03 tensor ollama[5671]: time=2025-08-23T10:08:03.866+02:00 level=INFO source=download.go:177 msg="downloading f0676bd3c336 in 1 488 B part(s)"
Aug 23 10:08:38 tensor ollama[5671]: [GIN] 2025/08/23 - 10:08:38 | 200 |         4m51s |       127.0.0.1 | POST     "/api/pull"
Aug 23 10:10:27 tensor ollama[5671]: [GIN] 2025/08/23 - 10:10:27 | 200 |      36.552µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:10:27 tensor ollama[5671]: [GIN] 2025/08/23 - 10:10:27 | 200 |   55.457418ms |       127.0.0.1 | POST     "/api/show"
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 (version GGUF V3 (latest))
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 32B Instruct
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 32B
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = apache-2.0
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 32B
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  14:                          qwen2.block_count u32              = 64
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 5120
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 27648
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 40
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 8
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  22:                          general.file_type u32              = 15
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  33:               general.quantization_version u32              = 2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type  f32:  321 tensors
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q4_K:  385 tensors
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q6_K:   65 tensors
Aug 23 10:10:28 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:10:28 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 23 10:10:28 tensor ollama[5671]: print_info: file size   = 18.48 GiB (4.85 BPW)
Aug 23 10:10:28 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 23 10:10:28 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:10:28 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:10:28 tensor ollama[5671]: print_info: arch             = qwen2
Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab_only       = 1
Aug 23 10:10:28 tensor ollama[5671]: print_info: model type       = ?B
Aug 23 10:10:28 tensor ollama[5671]: print_info: model params     = 32.76 B
Aug 23 10:10:28 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 32B Instruct
Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_vocab          = 152064
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 23 10:10:28 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:10:28 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.388+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 --port 45561"
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.402+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.437+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.5 GiB" free_swap="8.0 GiB"
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.438+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 library=cuda parallel=1 required="20.2 GiB" gpus=1
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.438+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=65 layers.offload=65 layers.split=[65] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.2 GiB" memory.required.partial="20.2 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[20.2 GiB]" memory.weights.total="18.1 GiB" memory.weights.repeating="17.5 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="348.0 MiB" memory.graph.partial="916.1 MiB"
Aug 23 10:10:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 23 10:10:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:10:28 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:10:28 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:10:28 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:10:28 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.478+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.478+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:45561"
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.481+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:65[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:65(0..64)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 23 10:10:28 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23734 MiB free
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.517+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.517+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 (version GGUF V3 (latest))
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 32B Instruct
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 32B
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = apache-2.0
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 32B
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  14:                          qwen2.block_count u32              = 64
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 5120
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 27648
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 40
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 8
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  22:                          general.file_type u32              = 15
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv  33:               general.quantization_version u32              = 2
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type  f32:  321 tensors
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q4_K:  385 tensors
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q6_K:   65 tensors
Aug 23 10:10:28 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:10:28 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 23 10:10:28 tensor ollama[5671]: print_info: file size   = 18.48 GiB (4.85 BPW)
Aug 23 10:10:28 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 23 10:10:28 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 23 10:10:28 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:10:28 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:10:28 tensor ollama[5671]: print_info: arch             = qwen2
Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab_only       = 0
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_ctx_train      = 32768
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd           = 5120
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_layer          = 64
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_head           = 40
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_head_kv        = 8
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_rot            = 128
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_swa            = 0
Aug 23 10:10:28 tensor ollama[5671]: print_info: is_swa_any       = 0
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_head_k    = 128
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_head_v    = 128
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_gqa            = 5
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_k_gqa     = 1024
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_v_gqa     = 1024
Aug 23 10:10:28 tensor ollama[5671]: print_info: f_norm_eps       = 0.0e+00
Aug 23 10:10:28 tensor ollama[5671]: print_info: f_norm_rms_eps   = 1.0e-06
Aug 23 10:10:28 tensor ollama[5671]: print_info: f_clamp_kqv      = 0.0e+00
Aug 23 10:10:28 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00
Aug 23 10:10:28 tensor ollama[5671]: print_info: f_logit_scale    = 0.0e+00
Aug 23 10:10:28 tensor ollama[5671]: print_info: f_attn_scale     = 0.0e+00
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_ff             = 27648
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_expert         = 0
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_expert_used    = 0
Aug 23 10:10:28 tensor ollama[5671]: print_info: causal attn      = 1
Aug 23 10:10:28 tensor ollama[5671]: print_info: pooling type     = -1
Aug 23 10:10:28 tensor ollama[5671]: print_info: rope type        = 2
Aug 23 10:10:28 tensor ollama[5671]: print_info: rope scaling     = linear
Aug 23 10:10:28 tensor ollama[5671]: print_info: freq_base_train  = 1000000.0
Aug 23 10:10:28 tensor ollama[5671]: print_info: freq_scale_train = 1
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_ctx_orig_yarn  = 32768
Aug 23 10:10:28 tensor ollama[5671]: print_info: rope_finetuned   = unknown
Aug 23 10:10:28 tensor ollama[5671]: print_info: model type       = 32B
Aug 23 10:10:28 tensor ollama[5671]: print_info: model params     = 32.76 B
Aug 23 10:10:28 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 32B Instruct
Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_vocab          = 152064
Aug 23 10:10:28 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 23 10:10:28 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 23 10:10:28 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:10:28 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 23 10:10:28 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 23 10:10:28 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 23 10:10:28 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9
Aug 23 10:10:28 tensor ollama[5671]: goroutine 14 [running]:
Aug 23 10:10:28 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc00047b2c0, {0x41, 0x0, 0x1, {0xc0003ac0d8, 0x1, 0x1}, 0xc0005a34d0, 0x0}, {0x7ffd22a8bd54, ...}, ...)
Aug 23 10:10:28 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 23 10:10:28 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 12
Aug 23 10:10:28 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.743+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.768+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 23 10:10:28 tensor ollama[5671]: [GIN] 2025/08/23 - 10:10:28 | 500 |  803.675254ms |       127.0.0.1 | POST     "/api/generate"
Aug 23 10:15:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:15:13 | 200 |     119.388µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:15:14 tensor ollama[5671]: time=2025-08-23T10:15:14.086+02:00 level=INFO source=download.go:177 msg="downloading 60e05f210007 in 16 292 MB part(s)"
Aug 23 10:16:16 tensor ollama[5671]: time=2025-08-23T10:16:16.415+02:00 level=INFO source=download.go:177 msg="downloading d9bb33f27869 in 1 487 B part(s)"
Aug 23 10:16:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:16:23 | 200 |         1m10s |       127.0.0.1 | POST     "/api/pull"
Aug 23 10:17:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:23 | 200 |      39.642µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:17:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:23 | 200 |   30.982614ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 10:17:41 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:41 | 200 |      29.386µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:17:41 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:41 | 200 |   14.797788ms |       127.0.0.1 | POST     "/api/generate"
Aug 23 10:17:42 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:42 | 200 |   66.935873ms |       127.0.0.1 | DELETE   "/api/delete"
Aug 23 10:17:53 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:53 | 200 |      26.657µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:17:53 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:53 | 200 |   12.036165ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 10:18:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:13 | 200 |      19.668µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:18:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:13 | 200 |   23.803691ms |       127.0.0.1 | POST     "/api/generate"
Aug 23 10:18:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:13 | 200 |   49.584618ms |       127.0.0.1 | DELETE   "/api/delete"
Aug 23 10:18:17 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:17 | 200 |      20.163µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:18:17 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:17 | 200 |   10.459848ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 10:18:38 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:38 | 200 |      27.028µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:18:38 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:38 | 200 |   63.247975ms |       127.0.0.1 | POST     "/api/show"
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 7B
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = apache-2.0
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  22:                          general.file_type u32              = 15
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv  33:               general.quantization_version u32              = 2
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - type  f32:  141 tensors
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - type q4_K:  169 tensors
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - type q6_K:   29 tensors
Aug 23 10:18:38 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:18:38 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 23 10:18:38 tensor ollama[5671]: print_info: file size   = 4.36 GiB (4.91 BPW)
Aug 23 10:18:38 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:18:38 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 23 10:18:38 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 23 10:18:38 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 23 10:18:38 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 23 10:18:38 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 23 10:18:38 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:18:38 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:18:38 tensor ollama[5671]: print_info: arch             = qwen2
Aug 23 10:18:38 tensor ollama[5671]: print_info: vocab_only       = 1
Aug 23 10:18:38 tensor ollama[5671]: print_info: model type       = ?B
Aug 23 10:18:38 tensor ollama[5671]: print_info: model params     = 7.62 B
Aug 23 10:18:38 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 7B Instruct
Aug 23 10:18:38 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 23 10:18:38 tensor ollama[5671]: print_info: n_vocab          = 152064
Aug 23 10:18:38 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 23 10:18:38 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 23 10:18:38 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:18:38 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors
Aug 23 10:18:38 tensor ollama[5671]: time=2025-08-23T10:18:38.959+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --port 33097"
Aug 23 10:18:38 tensor ollama[5671]: time=2025-08-23T10:18:38.969+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.006+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.5 GiB" free_swap="8.0 GiB"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.006+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 library=cuda parallel=1 required="5.2 GiB" gpus=1
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.006+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split=[29] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.2 GiB" memory.required.partial="5.2 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[5.2 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="304.0 MiB" memory.graph.partial="730.4 MiB"
Aug 23 10:18:39 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 23 10:18:39 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:18:39 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:18:39 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:18:39 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:18:39 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.023+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.023+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:33097"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.028+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:29[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 23 10:18:39 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23734 MiB free
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.058+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.059+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 7B
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = apache-2.0
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  22:                          general.file_type u32              = 15
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv  33:               general.quantization_version u32              = 2
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - type  f32:  141 tensors
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - type q4_K:  169 tensors
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - type q6_K:   29 tensors
Aug 23 10:18:39 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:18:39 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 23 10:18:39 tensor ollama[5671]: print_info: file size   = 4.36 GiB (4.91 BPW)
Aug 23 10:18:39 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:18:39 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 23 10:18:39 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 23 10:18:39 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 23 10:18:39 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 23 10:18:39 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 23 10:18:39 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:18:39 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:18:39 tensor ollama[5671]: print_info: arch             = qwen2
Aug 23 10:18:39 tensor ollama[5671]: print_info: vocab_only       = 0
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_ctx_train      = 32768
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd           = 3584
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_layer          = 28
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_head           = 28
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_head_kv        = 4
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_rot            = 128
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_swa            = 0
Aug 23 10:18:39 tensor ollama[5671]: print_info: is_swa_any       = 0
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_head_k    = 128
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_head_v    = 128
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_gqa            = 7
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_k_gqa     = 512
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_v_gqa     = 512
Aug 23 10:18:39 tensor ollama[5671]: print_info: f_norm_eps       = 0.0e+00
Aug 23 10:18:39 tensor ollama[5671]: print_info: f_norm_rms_eps   = 1.0e-06
Aug 23 10:18:39 tensor ollama[5671]: print_info: f_clamp_kqv      = 0.0e+00
Aug 23 10:18:39 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00
Aug 23 10:18:39 tensor ollama[5671]: print_info: f_logit_scale    = 0.0e+00
Aug 23 10:18:39 tensor ollama[5671]: print_info: f_attn_scale     = 0.0e+00
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_ff             = 18944
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_expert         = 0
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_expert_used    = 0
Aug 23 10:18:39 tensor ollama[5671]: print_info: causal attn      = 1
Aug 23 10:18:39 tensor ollama[5671]: print_info: pooling type     = -1
Aug 23 10:18:39 tensor ollama[5671]: print_info: rope type        = 2
Aug 23 10:18:39 tensor ollama[5671]: print_info: rope scaling     = linear
Aug 23 10:18:39 tensor ollama[5671]: print_info: freq_base_train  = 1000000.0
Aug 23 10:18:39 tensor ollama[5671]: print_info: freq_scale_train = 1
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_ctx_orig_yarn  = 32768
Aug 23 10:18:39 tensor ollama[5671]: print_info: rope_finetuned   = unknown
Aug 23 10:18:39 tensor ollama[5671]: print_info: model type       = 7B
Aug 23 10:18:39 tensor ollama[5671]: print_info: model params     = 7.62 B
Aug 23 10:18:39 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 7B Instruct
Aug 23 10:18:39 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_vocab          = 152064
Aug 23 10:18:39 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 23 10:18:39 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 23 10:18:39 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:18:39 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 23 10:18:39 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 23 10:18:39 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 23 10:18:39 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463
Aug 23 10:18:39 tensor ollama[5671]: goroutine 54 [running]:
Aug 23 10:18:39 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc00047c500, {0x1d, 0x0, 0x1, {0xc0001cd228, 0x1, 0x1}, 0xc000042ab0, 0x0}, {0x7ffc51cf7d54, ...}, ...)
Aug 23 10:18:39 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 23 10:18:39 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 51
Aug 23 10:18:39 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.258+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.309+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 23 10:18:39 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:39 | 500 |  778.796128ms |       127.0.0.1 | POST     "/api/generate"
Aug 23 10:18:49 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:49 | 200 |      23.243µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:18:49 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:49 | 404 |    7.871792ms |       127.0.0.1 | POST     "/api/show"
Aug 23 10:18:49 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:49 | 200 |  471.111409ms |       127.0.0.1 | POST     "/api/pull"
Aug 23 10:18:59 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:59 | 200 |      23.642µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:18:59 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:59 | 200 |  124.849451ms |       127.0.0.1 | POST     "/api/show"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.603+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 39627"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.612+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.612+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:39627"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.648+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.5 GiB" free_swap="8.0 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.650+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.651+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.651+02:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.707+02:00 level=INFO source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37
Aug 23 10:18:59 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 23 10:18:59 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:18:59 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:18:59 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:18:59 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:18:59 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.747+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 MiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.981+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:19:22 tensor ollama[5671]: time=2025-08-23T10:19:22.049+02:00 level=INFO source=server.go:1272 msg="llama runner started in 22.45 seconds"
Aug 23 10:19:22 tensor ollama[5671]: [GIN] 2025/08/23 - 10:19:22 | 200 | 22.846337214s |       127.0.0.1 | POST     "/api/generate"
Aug 23 10:20:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:13 | 200 |       23.01µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:20:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:13 | 200 |   21.898271ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 10:20:19 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:19 | 200 |       21.75µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:20:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:20 | 200 |    109.8799ms |       127.0.0.1 | POST     "/api/show"
Aug 23 10:20:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:20 | 200 |  145.193826ms |       127.0.0.1 | POST     "/api/generate"
Aug 23 10:20:28 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:28 | 200 |      24.753µs |       127.0.0.1 | HEAD     "/"
Aug 23 10:20:28 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:28 | 200 |   60.812606ms |       127.0.0.1 | POST     "/api/show"
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.405+02:00 level=INFO source=sched.go:540 msg="updated VRAM based on existing loaded models" gpu=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda total="23.7 GiB" available="4.4 GiB"
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 7B
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = apache-2.0
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  22:                          general.file_type u32              = 15
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv  33:               general.quantization_version u32              = 2
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - type  f32:  141 tensors
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - type q4_K:  169 tensors
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - type q6_K:   29 tensors
Aug 23 10:20:28 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:20:28 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 23 10:20:28 tensor ollama[5671]: print_info: file size   = 4.36 GiB (4.91 BPW)
Aug 23 10:20:28 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:20:28 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 23 10:20:28 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 23 10:20:28 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 23 10:20:28 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 23 10:20:28 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 23 10:20:28 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:20:28 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:20:28 tensor ollama[5671]: print_info: arch             = qwen2
Aug 23 10:20:28 tensor ollama[5671]: print_info: vocab_only       = 1
Aug 23 10:20:28 tensor ollama[5671]: print_info: model type       = ?B
Aug 23 10:20:28 tensor ollama[5671]: print_info: model params     = 7.62 B
Aug 23 10:20:28 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 7B Instruct
Aug 23 10:20:28 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 23 10:20:28 tensor ollama[5671]: print_info: n_vocab          = 152064
Aug 23 10:20:28 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 23 10:20:28 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:20:28 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.663+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --port 33743"
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.674+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.711+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="58.0 GiB" free_swap="8.0 GiB"
Aug 23 10:20:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 23 10:20:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:20:28 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:20:28 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:20:28 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:20:28 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.735+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.736+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:33743"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.380+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.3 GiB" free_swap="8.0 GiB"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.380+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 library=cuda parallel=1 required="5.2 GiB" gpus=1
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.381+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split=[29] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.2 GiB" memory.required.partial="5.2 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[5.2 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="304.0 MiB" memory.graph.partial="730.4 MiB"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.382+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:29[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 23 10:20:29 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23734 MiB free
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.415+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.416+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 7B
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = apache-2.0
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  22:                          general.file_type u32              = 15
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv  33:               general.quantization_version u32              = 2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - type  f32:  141 tensors
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - type q4_K:  169 tensors
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - type q6_K:   29 tensors
Aug 23 10:20:29 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:20:29 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 23 10:20:29 tensor ollama[5671]: print_info: file size   = 4.36 GiB (4.91 BPW)
Aug 23 10:20:29 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:20:29 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 23 10:20:29 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 23 10:20:29 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 23 10:20:29 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 23 10:20:29 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 23 10:20:29 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:20:29 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:20:29 tensor ollama[5671]: print_info: arch             = qwen2
Aug 23 10:20:29 tensor ollama[5671]: print_info: vocab_only       = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_ctx_train      = 32768
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd           = 3584
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_layer          = 28
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_head           = 28
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_head_kv        = 4
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_rot            = 128
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_swa            = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: is_swa_any       = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_head_k    = 128
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_head_v    = 128
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_gqa            = 7
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_k_gqa     = 512
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_v_gqa     = 512
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_norm_eps       = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_norm_rms_eps   = 1.0e-06
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_clamp_kqv      = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_logit_scale    = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_attn_scale     = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_ff             = 18944
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_expert         = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_expert_used    = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: causal attn      = 1
Aug 23 10:20:29 tensor ollama[5671]: print_info: pooling type     = -1
Aug 23 10:20:29 tensor ollama[5671]: print_info: rope type        = 2
Aug 23 10:20:29 tensor ollama[5671]: print_info: rope scaling     = linear
Aug 23 10:20:29 tensor ollama[5671]: print_info: freq_base_train  = 1000000.0
Aug 23 10:20:29 tensor ollama[5671]: print_info: freq_scale_train = 1
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_ctx_orig_yarn  = 32768
Aug 23 10:20:29 tensor ollama[5671]: print_info: rope_finetuned   = unknown
Aug 23 10:20:29 tensor ollama[5671]: print_info: model type       = 7B
Aug 23 10:20:29 tensor ollama[5671]: print_info: model params     = 7.62 B
Aug 23 10:20:29 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 7B Instruct
Aug 23 10:20:29 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_vocab          = 152064
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 23 10:20:29 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:20:29 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 23 10:20:29 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 23 10:20:29 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 23 10:20:29 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463
Aug 23 10:20:29 tensor ollama[5671]: goroutine 8 [running]:
Aug 23 10:20:29 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0004bc280, {0x1d, 0x0, 0x1, {0xc00070fd38, 0x1, 0x1}, 0xc00059b8b0, 0x0}, {0x7fff97038d54, ...}, ...)
Aug 23 10:20:29 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 23 10:20:29 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 24
Aug 23 10:20:29 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.614+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.667+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 23 10:20:29 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:29 | 500 |  1.434612558s |       127.0.0.1 | POST     "/api/generate"
Aug 23 10:27:03 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:03 | 200 |   23.940335ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 10:27:03 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:03 | 200 |     125.544µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.463+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 41273"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.473+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.473+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:41273"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.505+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.3 GiB" free_swap="8.0 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.507+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.508+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[22.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.508+02:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.563+02:00 level=INFO source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37
Aug 23 10:27:11 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 23 10:27:11 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:27:11 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:27:11 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:27:11 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:27:11 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.605+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 MiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:27:20 tensor ollama[5671]: time=2025-08-23T10:27:20.608+02:00 level=WARN source=server.go:1241 msg="client connection closed before server finished loading, aborting load"
Aug 23 10:27:20 tensor ollama[5671]: time=2025-08-23T10:27:20.608+02:00 level=ERROR source=sched.go:479 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled"
Aug 23 10:27:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:20 | 499 |  9.550694558s |       127.0.0.1 | POST     "/api/chat"
Aug 23 10:27:25 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:25 | 200 |   18.753421ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 10:27:25 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:25 | 200 |      30.961µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.527+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 34953"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.536+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.536+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:34953"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.575+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.3 GiB" free_swap="8.0 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.576+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.577+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[22.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.578+02:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.633+02:00 level=INFO source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37
Aug 23 10:27:32 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 23 10:27:32 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:27:32 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:27:32 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:27:32 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:27:32 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.672+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 MiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.897+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:27:55 tensor ollama[5671]: time=2025-08-23T10:27:55.215+02:00 level=INFO source=server.go:1272 msg="llama runner started in 22.69 seconds"
Aug 23 10:28:19 tensor ollama[5671]: [GIN] 2025/08/23 - 10:28:19 | 200 | 47.448851492s |       127.0.0.1 | POST     "/api/chat"
Aug 23 10:28:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:28:20 | 200 |  892.609664ms |       127.0.0.1 | POST     "/api/chat"
Aug 23 10:28:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:28:23 | 200 |   2.51022638s |       127.0.0.1 | POST     "/api/chat"
Aug 23 13:04:23 tensor ollama[5671]: [GIN] 2025/08/23 - 13:04:23 | 200 |    14.92006ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 13:04:23 tensor ollama[5671]: [GIN] 2025/08/23 - 13:04:23 | 200 |      39.308µs |       127.0.0.1 | GET      "/api/ps"
Aug 23 13:04:24 tensor ollama[5671]: [GIN] 2025/08/23 - 13:04:24 | 200 |      32.957µs |       127.0.0.1 | GET      "/api/version"
Aug 23 13:06:48 tensor ollama[5671]: [GIN] 2025/08/23 - 13:06:48 | 200 | 18.866063628s |       127.0.0.1 | POST     "/api/chat"
Aug 23 13:06:48 tensor ollama[5671]: [GIN] 2025/08/23 - 13:06:48 | 200 |  532.297205ms |       127.0.0.1 | POST     "/api/chat"
Aug 23 13:06:50 tensor ollama[5671]: [GIN] 2025/08/23 - 13:06:50 | 200 |  1.714018055s |       127.0.0.1 | POST     "/api/chat"
Aug 23 13:09:55 tensor ollama[5671]: [GIN] 2025/08/23 - 13:09:55 | 200 |   10.199521ms |       127.0.0.1 | GET      "/api/tags"
Aug 23 13:09:55 tensor ollama[5671]: [GIN] 2025/08/23 - 13:09:55 | 200 |      30.906µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 09:29:45 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:45 | 200 |   16.127291ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 09:29:45 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:45 | 200 |       57.46µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 09:29:46 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:46 | 200 |     103.244µs |       127.0.0.1 | GET      "/api/version"
Aug 24 09:29:48 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:48 | 200 |   312.37447ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:29:48 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:48 | 200 |  313.113207ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:29:49 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:49 | 200 |  900.474615ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:34:11 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:11 | 200 |   15.258462ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 09:34:11 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:11 | 200 |      29.577µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 09:34:12 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:12 | 200 |      35.595µs |       127.0.0.1 | GET      "/api/version"
Aug 24 09:34:27 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:27 | 200 |   11.542712ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 09:34:27 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:27 | 200 |      30.414µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 09:34:28 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:28 | 200 |      40.637µs |       127.0.0.1 | GET      "/api/version"
Aug 24 09:35:33 tensor ollama[5671]: [GIN] 2025/08/24 - 09:35:33 | 200 |      29.911µs |       127.0.0.1 | HEAD     "/"
Aug 24 09:35:33 tensor ollama[5671]: [GIN] 2025/08/24 - 09:35:33 | 404 |    7.397409ms |       127.0.0.1 | POST     "/api/show"
Aug 24 09:35:34 tensor ollama[5671]: time=2025-08-24T09:35:34.315+02:00 level=INFO source=download.go:177 msg="downloading 4a188102020e in 16 120 MB part(s)"
Aug 24 09:36:02 tensor ollama[5671]: time=2025-08-24T09:36:02.676+02:00 level=INFO source=download.go:177 msg="downloading 45fc3ea7579a in 1 7.4 KB part(s)"
Aug 24 09:36:04 tensor ollama[5671]: time=2025-08-24T09:36:04.045+02:00 level=INFO source=download.go:177 msg="downloading bb967eff3bda in 1 487 B part(s)"
Aug 24 09:36:07 tensor ollama[5671]: [GIN] 2025/08/24 - 09:36:07 | 200 | 34.110004115s |       127.0.0.1 | POST     "/api/pull"
Aug 24 09:36:07 tensor ollama[5671]: [GIN] 2025/08/24 - 09:36:07 | 200 |   53.954475ms |       127.0.0.1 | POST     "/api/show"
Aug 24 09:36:07 tensor ollama[5671]: time=2025-08-24T09:36:07.791+02:00 level=INFO source=sched.go:540 msg="updated VRAM based on existing loaded models" gpu=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda total="23.7 GiB" available="4.4 GiB"
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest))
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 3B Instruct
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 3B
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = other
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.name str              = qwen-research
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   8:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv   9:                   general.base_model.count u32              = 1
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  10:                  general.base_model.0.name str              = Qwen2.5 Coder 3B
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  11:          general.base_model.0.organization str              = Qwen
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  12:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  13:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  14:                          general.languages arr[str,1]       = ["en"]
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  15:                          qwen2.block_count u32              = 36
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  16:                       qwen2.context_length u32              = 32768
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  17:                     qwen2.embedding_length u32              = 2048
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  18:                  qwen2.feed_forward_length u32              = 11008
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  19:                 qwen2.attention.head_count u32              = 16
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  20:              qwen2.attention.head_count_kv u32              = 2
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  21:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  22:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  23:                          general.file_type u32              = 15
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = qwen2
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  30:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv  34:               general.quantization_version u32              = 2
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - type  f32:  181 tensors
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - type q4_K:  216 tensors
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - type q6_K:   37 tensors
Aug 24 09:36:07 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 24 09:36:07 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 24 09:36:07 tensor ollama[5671]: print_info: file size   = 1.79 GiB (4.99 BPW)
Aug 24 09:36:07 tensor ollama[5671]: load: printing all EOG tokens:
Aug 24 09:36:07 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 24 09:36:07 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 24 09:36:07 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 24 09:36:07 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 24 09:36:07 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 24 09:36:07 tensor ollama[5671]: load: special tokens cache size = 22
Aug 24 09:36:07 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 24 09:36:07 tensor ollama[5671]: print_info: arch             = qwen2
Aug 24 09:36:07 tensor ollama[5671]: print_info: vocab_only       = 1
Aug 24 09:36:07 tensor ollama[5671]: print_info: model type       = ?B
Aug 24 09:36:07 tensor ollama[5671]: print_info: model params     = 3.09 B
Aug 24 09:36:07 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 3B Instruct
Aug 24 09:36:07 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 24 09:36:07 tensor ollama[5671]: print_info: n_vocab          = 151936
Aug 24 09:36:07 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 24 09:36:07 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 24 09:36:07 tensor ollama[5671]: print_info: max token length = 256
Aug 24 09:36:07 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.043+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba --port 35345"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.052+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.091+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="56.9 GiB" free_swap="8.0 GiB"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.092+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba library=cuda parallel=1 required="2.7 GiB" gpus=1
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.092+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=37 layers.offload=37 layers.split=[37] memory.available="[4.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.7 GiB" memory.required.partial="2.7 GiB" memory.required.kv="144.0 MiB" memory.required.allocations="[2.7 GiB]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="243.4 MiB" memory.graph.full="300.8 MiB" memory.graph.partial="544.2 MiB"
Aug 24 09:36:08 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 24 09:36:08 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 24 09:36:08 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 24 09:36:08 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 24 09:36:08 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 24 09:36:08 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.109+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.109+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:35345"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.114+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 4715 MiB free
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.146+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.147+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest))
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 3B Instruct
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 3B
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = other
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.name str              = qwen-research
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   8:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv   9:                   general.base_model.count u32              = 1
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  10:                  general.base_model.0.name str              = Qwen2.5 Coder 3B
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  11:          general.base_model.0.organization str              = Qwen
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  12:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  13:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  14:                          general.languages arr[str,1]       = ["en"]
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  15:                          qwen2.block_count u32              = 36
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  16:                       qwen2.context_length u32              = 32768
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  17:                     qwen2.embedding_length u32              = 2048
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  18:                  qwen2.feed_forward_length u32              = 11008
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  19:                 qwen2.attention.head_count u32              = 16
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  20:              qwen2.attention.head_count_kv u32              = 2
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  21:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  22:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  23:                          general.file_type u32              = 15
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = qwen2
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  30:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv  34:               general.quantization_version u32              = 2
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - type  f32:  181 tensors
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - type q4_K:  216 tensors
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - type q6_K:   37 tensors
Aug 24 09:36:08 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 24 09:36:08 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 24 09:36:08 tensor ollama[5671]: print_info: file size   = 1.79 GiB (4.99 BPW)
Aug 24 09:36:08 tensor ollama[5671]: load: printing all EOG tokens:
Aug 24 09:36:08 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 24 09:36:08 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 24 09:36:08 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 24 09:36:08 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 24 09:36:08 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 24 09:36:08 tensor ollama[5671]: load: special tokens cache size = 22
Aug 24 09:36:08 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 24 09:36:08 tensor ollama[5671]: print_info: arch             = qwen2
Aug 24 09:36:08 tensor ollama[5671]: print_info: vocab_only       = 0
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_ctx_train      = 32768
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd           = 2048
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_layer          = 36
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_head           = 16
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_head_kv        = 2
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_rot            = 128
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_swa            = 0
Aug 24 09:36:08 tensor ollama[5671]: print_info: is_swa_any       = 0
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_head_k    = 128
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_head_v    = 128
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_gqa            = 8
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_k_gqa     = 256
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_v_gqa     = 256
Aug 24 09:36:08 tensor ollama[5671]: print_info: f_norm_eps       = 0.0e+00
Aug 24 09:36:08 tensor ollama[5671]: print_info: f_norm_rms_eps   = 1.0e-06
Aug 24 09:36:08 tensor ollama[5671]: print_info: f_clamp_kqv      = 0.0e+00
Aug 24 09:36:08 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00
Aug 24 09:36:08 tensor ollama[5671]: print_info: f_logit_scale    = 0.0e+00
Aug 24 09:36:08 tensor ollama[5671]: print_info: f_attn_scale     = 0.0e+00
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_ff             = 11008
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_expert         = 0
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_expert_used    = 0
Aug 24 09:36:08 tensor ollama[5671]: print_info: causal attn      = 1
Aug 24 09:36:08 tensor ollama[5671]: print_info: pooling type     = -1
Aug 24 09:36:08 tensor ollama[5671]: print_info: rope type        = 2
Aug 24 09:36:08 tensor ollama[5671]: print_info: rope scaling     = linear
Aug 24 09:36:08 tensor ollama[5671]: print_info: freq_base_train  = 1000000.0
Aug 24 09:36:08 tensor ollama[5671]: print_info: freq_scale_train = 1
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_ctx_orig_yarn  = 32768
Aug 24 09:36:08 tensor ollama[5671]: print_info: rope_finetuned   = unknown
Aug 24 09:36:08 tensor ollama[5671]: print_info: model type       = 3B
Aug 24 09:36:08 tensor ollama[5671]: print_info: model params     = 3.09 B
Aug 24 09:36:08 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 3B Instruct
Aug 24 09:36:08 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_vocab          = 151936
Aug 24 09:36:08 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 24 09:36:08 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 24 09:36:08 tensor ollama[5671]: print_info: max token length = 256
Aug 24 09:36:08 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 24 09:36:08 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba
Aug 24 09:36:08 tensor ollama[5671]: goroutine 54 [running]:
Aug 24 09:36:08 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0002f6500, {0x25, 0x0, 0x1, {0xc0001cd208, 0x1, 0x1}, 0xc000502cd0, 0x0}, {0x7ffe0e254d54, ...}, ...)
Aug 24 09:36:08 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 24 09:36:08 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 51
Aug 24 09:36:08 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.357+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.397+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 24 09:36:08 tensor ollama[5671]: [GIN] 2025/08/24 - 09:36:08 | 500 |  769.338871ms |       127.0.0.1 | POST     "/api/generate"
Aug 24 09:37:20 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:20 | 200 |   19.020204ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 09:37:20 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:20 | 200 |      27.749µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 09:37:21 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:21 | 200 |      56.493µs |       127.0.0.1 | GET      "/api/version"
Aug 24 09:37:42 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:42 | 200 |   4.36063473s |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:37:43 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:43 | 200 |  448.384271ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:37:44 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:44 | 200 |  1.045604379s |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:38:30 tensor ollama[5671]: [GIN] 2025/08/24 - 09:38:30 | 200 |  3.194981361s |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:41:19 tensor ollama[5671]: [GIN] 2025/08/24 - 09:41:19 | 200 |  3.217782328s |       127.0.0.1 | POST     "/api/chat"
Aug 24 09:42:56 tensor ollama[5671]: [GIN] 2025/08/24 - 09:42:56 | 200 |      41.686µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:23:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:14 | 200 |   12.156087ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:23:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:14 | 200 |      34.087µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:23:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:14 | 200 |      34.367µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:23:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:20 | 200 |  1.141412557s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:23:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:20 | 200 |  308.162597ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:23:21 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:21 | 200 |  940.241778ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:23:46 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:46 | 200 |  1.552475356s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:24:17 tensor ollama[5671]: [GIN] 2025/08/24 - 13:24:17 | 200 |   11.202262ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:24:17 tensor ollama[5671]: [GIN] 2025/08/24 - 13:24:17 | 200 |      72.504µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:26:04 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:04 | 200 |    12.27506ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:26:04 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:04 | 200 |      49.924µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:26:05 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:05 | 200 |   19.065232ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:26:05 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:05 | 200 |      30.361µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:26:05 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:05 | 200 |       36.62µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:26:07 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:07 | 200 |      33.405µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:26:11 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:11 | 200 |      35.863µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:26:13 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:13 | 200 |  1.105016887s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:26:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:14 | 200 |  329.014848ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:26:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:14 | 200 |  941.665313ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:26:19 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:19 | 200 |       35.37µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:26:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:20 | 200 |      33.177µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:26:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:24 | 200 |      29.952µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:26:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:26 | 200 |   11.000159ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:26:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:26 | 200 |      29.142µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:26:36 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:36 | 200 |    12.59344ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:26:36 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:36 | 200 |       30.42µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:26:39 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:39 | 200 |      31.166µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:26:42 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:42 | 200 |  1.050458592s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:26:43 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:43 | 200 |  338.196284ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:26:44 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:44 | 200 |  864.298854ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:26:52 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:52 | 200 |  843.821677ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:27:15 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:15 | 200 |   1.36406631s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:27:27 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:27 | 200 |  1.506646124s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:27:49 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:49 | 200 |    11.89996ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:27:49 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:49 | 200 |      33.547µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:28:00 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:00 | 200 |      39.705µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:28:10 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:10 | 200 |  1.514598358s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:11 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:11 | 200 |   481.75731ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:12 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:12 | 200 |  1.219557438s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:15 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:15 | 200 |   19.193942ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:28:15 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:15 | 200 |      87.819µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:28:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:24 | 200 |    8.827246ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:28:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:24 | 200 |      21.359µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:28:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:25 | 200 |   11.170147ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:28:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:25 | 200 |      31.662µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:28:27 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:27 | 200 |      35.987µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:28:28 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:28 | 200 |       36.86µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:28:32 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:32 | 200 |       39.54µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:28:34 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:34 | 200 |      33.855µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:28:37 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:37 | 200 |      31.254µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:28:41 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:41 | 200 |  2.071607151s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:41 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:41 | 200 |  415.928147ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:42 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:42 | 200 |  1.051054961s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:57 | 200 |  2.413244904s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:57 | 200 |  338.352268ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:58 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:58 | 200 |  879.287437ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:28:59 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:59 | 200 |      36.756µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:29:04 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:04 | 200 |      34.321µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:29:08 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:08 | 200 |   12.256472ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:29:08 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:08 | 200 |      33.353µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:29:13 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:13 | 200 |      31.844µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:29:43 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:43 | 200 |  1.528696642s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:29:53 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:53 | 200 |  1.132260114s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:29:55 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:55 | 200 |  2.349533427s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:29:56 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:56 | 200 |  519.970128ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:29:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:57 | 200 |  805.813547ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:31:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:20 | 200 |   12.815312ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:31:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:20 | 200 |      38.961µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:31:53 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:53 | 200 |   13.795763ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:31:53 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:53 | 200 |      37.274µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:31:54 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:54 | 200 |   14.327849ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:31:54 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:54 | 200 |      25.079µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:32:39 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:39 | 200 |   12.446897ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:32:39 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:39 | 200 |      29.867µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:32:40 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:40 | 200 |   11.573829ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:32:40 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:40 | 200 |      28.836µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:33:58 tensor ollama[5671]: [GIN] 2025/08/24 - 13:33:58 | 200 |      34.315µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:34:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:34:57 | 200 |  2.433509594s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:34:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:34:57 | 200 |  581.535511ms |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:34:58 tensor ollama[5671]: [GIN] 2025/08/24 - 13:34:58 | 200 |  1.166023038s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:35:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:35:26 | 200 |   12.633019ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:35:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:35:26 | 200 |      22.481µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:35:37 tensor ollama[5671]: [GIN] 2025/08/24 - 13:35:37 | 200 |   10.342028ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:35:37 tensor ollama[5671]: [GIN] 2025/08/24 - 13:35:37 | 200 |       34.32µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:38:35 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:35 | 200 |   15.966833ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:38:35 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:35 | 200 |      37.939µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:38:38 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:38 | 200 |   16.321552ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:38:38 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:38 | 200 |      57.795µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:38:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:57 | 200 |   11.743555ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:38:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:57 | 200 |      31.249µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:39:03 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:03 | 200 |      34.231µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:39:22 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:22 | 200 |   13.511931ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:39:22 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:22 | 200 |       32.03µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:39:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:26 | 200 |      34.621µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:39:45 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:45 | 200 |  3.187895355s |       127.0.0.1 | POST     "/api/chat"
Aug 24 13:40:07 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:07 | 200 |    7.228029ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:40:07 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:07 | 200 |      23.113µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:40:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:26 | 200 |   11.993859ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:40:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:26 | 200 |      38.602µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:42:16 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:16 | 200 |   11.183632ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:42:16 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:16 | 200 |      32.363µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 13:42:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:24 | 200 |      37.845µs |       127.0.0.1 | GET      "/api/version"
Aug 24 13:42:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:25 | 200 |      9.6161ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 13:42:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:25 | 200 |       42.35µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 14:04:31 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:31 | 200 |   11.985424ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 14:04:31 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:31 | 200 |      60.806µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 14:04:34 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:34 | 200 |      46.013µs |       127.0.0.1 | GET      "/api/version"
Aug 24 14:04:40 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:40 | 200 |       35.65µs |       127.0.0.1 | GET      "/api/version"
Aug 24 14:07:50 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:50 | 200 |   25.432778ms |       127.0.0.1 | GET      "/api/tags"
Aug 24 14:07:50 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:50 | 200 |      76.826µs |       127.0.0.1 | GET      "/api/ps"
Aug 24 14:07:50 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:50 | 200 |      51.468µs |       127.0.0.1 | GET      "/api/version"
Aug 24 14:07:52 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:52 | 200 |      49.281µs |       127.0.0.1 | GET      "/api/version"
Aug 24 14:08:39 tensor ollama[5671]: [GIN] 2025/08/24 - 14:08:39 | 200 |  4.387403229s |       127.0.0.1 | POST     "/api/chat"
Aug 24 14:09:08 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:08 | 200 |      40.839µs |       127.0.0.1 | HEAD     "/"
Aug 24 14:09:08 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:08 | 404 |   21.351646ms |       127.0.0.1 | POST     "/api/show"
Aug 24 14:09:09 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:09 | 200 |  504.262345ms |       127.0.0.1 | POST     "/api/pull"
Aug 24 14:09:22 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:22 | 200 |      34.843µs |       127.0.0.1 | HEAD     "/"
Aug 24 14:09:22 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:22 | 404 |   19.725564ms |       127.0.0.1 | POST     "/api/show"
Aug 24 14:09:22 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:22 | 200 |    429.8143ms |       127.0.0.1 | POST     "/api/pull"
Aug 24 14:09:30 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:30 | 200 |      30.258µs |       127.0.0.1 | HEAD     "/"
Aug 24 14:09:30 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:30 | 200 |   95.776399ms |       127.0.0.1 | POST     "/api/show"
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.370+02:00 level=INFO source=sched.go:540 msg="updated VRAM based on existing loaded models" gpu=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda total="23.7 GiB" available="4.4 GiB"
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest))
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 3B Instruct
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 3B
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = other
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.name str              = qwen-research
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   8:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   9:                   general.base_model.count u32              = 1
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  10:                  general.base_model.0.name str              = Qwen2.5 Coder 3B
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  11:          general.base_model.0.organization str              = Qwen
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  12:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  13:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  14:                          general.languages arr[str,1]       = ["en"]
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  15:                          qwen2.block_count u32              = 36
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  16:                       qwen2.context_length u32              = 32768
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  17:                     qwen2.embedding_length u32              = 2048
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  18:                  qwen2.feed_forward_length u32              = 11008
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  19:                 qwen2.attention.head_count u32              = 16
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  20:              qwen2.attention.head_count_kv u32              = 2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  21:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  22:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  23:                          general.file_type u32              = 15
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = qwen2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  30:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  34:               general.quantization_version u32              = 2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - type  f32:  181 tensors
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - type q4_K:  216 tensors
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - type q6_K:   37 tensors
Aug 24 14:09:30 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 24 14:09:30 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 24 14:09:30 tensor ollama[5671]: print_info: file size   = 1.79 GiB (4.99 BPW)
Aug 24 14:09:30 tensor ollama[5671]: load: printing all EOG tokens:
Aug 24 14:09:30 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 24 14:09:30 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 24 14:09:30 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 24 14:09:30 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 24 14:09:30 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 24 14:09:30 tensor ollama[5671]: load: special tokens cache size = 22
Aug 24 14:09:30 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 24 14:09:30 tensor ollama[5671]: print_info: arch             = qwen2
Aug 24 14:09:30 tensor ollama[5671]: print_info: vocab_only       = 1
Aug 24 14:09:30 tensor ollama[5671]: print_info: model type       = ?B
Aug 24 14:09:30 tensor ollama[5671]: print_info: model params     = 3.09 B
Aug 24 14:09:30 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 3B Instruct
Aug 24 14:09:30 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 24 14:09:30 tensor ollama[5671]: print_info: n_vocab          = 151936
Aug 24 14:09:30 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 24 14:09:30 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 24 14:09:30 tensor ollama[5671]: print_info: max token length = 256
Aug 24 14:09:30 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.718+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba --port 35749"
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.739+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.796+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="56.6 GiB" free_swap="8.0 GiB"
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.796+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba library=cuda parallel=1 required="2.7 GiB" gpus=1
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.797+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=37 layers.offload=37 layers.split=[37] memory.available="[4.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.7 GiB" memory.required.partial="2.7 GiB" memory.required.kv="144.0 MiB" memory.required.allocations="[2.7 GiB]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="243.4 MiB" memory.graph.full="300.8 MiB" memory.graph.partial="544.2 MiB"
Aug 24 14:09:30 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 24 14:09:30 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 24 14:09:30 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 24 14:09:30 tensor ollama[5671]:   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 24 14:09:30 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 24 14:09:30 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.843+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.844+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:35749"
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.851+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 24 14:09:30 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 4711 MiB free
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.896+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.896+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest))
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   1:                               general.type str              = model
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 3B Instruct
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   5:                         general.size_label str              = 3B
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   6:                            general.license str              = other
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   7:                       general.license.name str              = qwen-research
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   8:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv   9:                   general.base_model.count u32              = 1
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  10:                  general.base_model.0.name str              = Qwen2.5 Coder 3B
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  11:          general.base_model.0.organization str              = Qwen
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  12:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  13:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  14:                          general.languages arr[str,1]       = ["en"]
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  15:                          qwen2.block_count u32              = 36
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  16:                       qwen2.context_length u32              = 32768
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  17:                     qwen2.embedding_length u32              = 2048
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  18:                  qwen2.feed_forward_length u32              = 11008
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  19:                 qwen2.attention.head_count u32              = 16
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  20:              qwen2.attention.head_count_kv u32              = 2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  21:                       qwen2.rope.freq_base f32              = 1000000.000000
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  22:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  23:                          general.file_type u32              = 15
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = qwen2
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 151645
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv  30:            tokenizer.ggml.padding_token_id u32              = 151643
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 151643
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv  34:               general.quantization_version u32              = 2
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - type  f32:  181 tensors
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - type q4_K:  216 tensors
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - type q6_K:   37 tensors
Aug 24 14:09:31 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 24 14:09:31 tensor ollama[5671]: print_info: file type   = Q4_K - Medium
Aug 24 14:09:31 tensor ollama[5671]: print_info: file size   = 1.79 GiB (4.99 BPW)
Aug 24 14:09:31 tensor ollama[5671]: load: printing all EOG tokens:
Aug 24 14:09:31 tensor ollama[5671]: load:   - 151643 ('<|endoftext|>')
Aug 24 14:09:31 tensor ollama[5671]: load:   - 151645 ('<|im_end|>')
Aug 24 14:09:31 tensor ollama[5671]: load:   - 151662 ('<|fim_pad|>')
Aug 24 14:09:31 tensor ollama[5671]: load:   - 151663 ('<|repo_name|>')
Aug 24 14:09:31 tensor ollama[5671]: load:   - 151664 ('<|file_sep|>')
Aug 24 14:09:31 tensor ollama[5671]: load: special tokens cache size = 22
Aug 24 14:09:31 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 24 14:09:31 tensor ollama[5671]: print_info: arch             = qwen2
Aug 24 14:09:31 tensor ollama[5671]: print_info: vocab_only       = 0
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_ctx_train      = 32768
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd           = 2048
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_layer          = 36
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_head           = 16
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_head_kv        = 2
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_rot            = 128
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_swa            = 0
Aug 24 14:09:31 tensor ollama[5671]: print_info: is_swa_any       = 0
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_head_k    = 128
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_head_v    = 128
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_gqa            = 8
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_k_gqa     = 256
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_v_gqa     = 256
Aug 24 14:09:31 tensor ollama[5671]: print_info: f_norm_eps       = 0.0e+00
Aug 24 14:09:31 tensor ollama[5671]: print_info: f_norm_rms_eps   = 1.0e-06
Aug 24 14:09:31 tensor ollama[5671]: print_info: f_clamp_kqv      = 0.0e+00
Aug 24 14:09:31 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00
Aug 24 14:09:31 tensor ollama[5671]: print_info: f_logit_scale    = 0.0e+00
Aug 24 14:09:31 tensor ollama[5671]: print_info: f_attn_scale     = 0.0e+00
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_ff             = 11008
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_expert         = 0
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_expert_used    = 0
Aug 24 14:09:31 tensor ollama[5671]: print_info: causal attn      = 1
Aug 24 14:09:31 tensor ollama[5671]: print_info: pooling type     = -1
Aug 24 14:09:31 tensor ollama[5671]: print_info: rope type        = 2
Aug 24 14:09:31 tensor ollama[5671]: print_info: rope scaling     = linear
Aug 24 14:09:31 tensor ollama[5671]: print_info: freq_base_train  = 1000000.0
Aug 24 14:09:31 tensor ollama[5671]: print_info: freq_scale_train = 1
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_ctx_orig_yarn  = 32768
Aug 24 14:09:31 tensor ollama[5671]: print_info: rope_finetuned   = unknown
Aug 24 14:09:31 tensor ollama[5671]: print_info: model type       = 3B
Aug 24 14:09:31 tensor ollama[5671]: print_info: model params     = 3.09 B
Aug 24 14:09:31 tensor ollama[5671]: print_info: general.name     = Qwen2.5 Coder 3B Instruct
Aug 24 14:09:31 tensor ollama[5671]: print_info: vocab type       = BPE
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_vocab          = 151936
Aug 24 14:09:31 tensor ollama[5671]: print_info: n_merges         = 151387
Aug 24 14:09:31 tensor ollama[5671]: print_info: BOS token        = 151643 '<|endoftext|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: EOS token        = 151645 '<|im_end|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: EOT token        = 151645 '<|im_end|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: PAD token        = 151643 '<|endoftext|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: LF token         = 198 'Ċ'
Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM REP token    = 151663 '<|repo_name|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token        = 151643 '<|endoftext|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token        = 151645 '<|im_end|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token        = 151662 '<|fim_pad|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token        = 151663 '<|repo_name|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token        = 151664 '<|file_sep|>'
Aug 24 14:09:31 tensor ollama[5671]: print_info: max token length = 256
Aug 24 14:09:31 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 24 14:09:31 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 24 14:09:31 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 24 14:09:31 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba
Aug 24 14:09:31 tensor ollama[5671]: goroutine 38 [running]:
Aug 24 14:09:31 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0000f9220, {0x25, 0x0, 0x1, {0xc0003bfa28, 0x1, 0x1}, 0xc000112680, 0x0}, {0x7ffe8415fd54, ...}, ...)
Aug 24 14:09:31 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 24 14:09:31 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 6
Aug 24 14:09:31 tensor ollama[5671]:         github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 24 14:09:31 tensor ollama[5671]: time=2025-08-24T14:09:31.238+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 24 14:09:31 tensor ollama[5671]: time=2025-08-24T14:09:31.398+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 24 14:09:31 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:31 | 500 |  1.293548336s |       127.0.0.1 | POST     "/api/generate"
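The `mmap failed: No such device` message in the log above is the text for `ENODEV`, which `mmap(2)` documents as meaning that the filesystem holding the file does not support memory mapping. The following is a minimal sketch, not part of Ollama, that tries a read-only mapping of a file and reports the error name, so the failure can be checked outside the runner. The helper name `try_mmap` is hypothetical, and the sketch assumes it is run as the same user and on the same mount that the Ollama service uses.

```python
import errno
import mmap
import os


def try_mmap(path):
    """Try a read-only memory mapping of `path`.

    Returns (ok, message). This is a hypothetical diagnostic helper, not
    Ollama code; it only checks whether the filesystem permits mmap.
    """
    try:
        with open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            if size == 0:
                # Python refuses to map an empty file, so report it explicitly.
                return False, "file is empty; nothing to map"
            # Length 0 means "map the whole file".
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                m[0]  # touch the first byte to confirm the mapping is readable
        return True, f"mmap ok ({size} bytes)"
    except OSError as e:
        # errno.errorcode maps numeric codes to names such as "ENODEV".
        name = errno.errorcode.get(e.errno, "?")
        return False, f"mmap failed: {name} ({e.strerror})"
```

Calling `try_mmap("/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba")` on the affected host would show whether the same `ENODEV` appears without the runner involved. If it does, the storage backing the blobs directory is the likely place to look rather than the model file itself.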
<!-- gh-comment-id:3218063155 --> @LaCocoRoco commented on GitHub (Aug 24, 2025):

```ls -l /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba```
-rw-r--r-- 1 ollama openai 1929903072 Aug 24 09:36 /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba

```sha256sum /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba```
4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba

```
Aug 23 07:37:25 tensor systemd[1]: Started ollama.service - Ollama Service.
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.578+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.660+02:00 level=INFO source=images.go:477 msg="total blobs: 24"
Aug 23 07:37:25 tensor 
ollama[1276]: time=2025-08-23T07:37:25.677+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0" Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.700+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)" Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.703+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" Aug 23 07:37:25 tensor ollama[1276]: time=2025-08-23T07:37:25.993+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.4 GiB" Aug 23 07:37:45 tensor ollama[1276]: [GIN] 2025/08/23 - 07:37:45 | 200 | 25.073075ms | 127.0.0.1 | GET "/api/tags" Aug 23 07:37:45 tensor ollama[1276]: [GIN] 2025/08/23 - 07:37:45 | 200 | 736.727µs | 127.0.0.1 | GET "/api/ps" Aug 23 07:37:46 tensor ollama[1276]: [GIN] 2025/08/23 - 07:37:46 | 200 | 48.98µs | 127.0.0.1 | GET "/api/version" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.006+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 33291" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.019+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.019+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:33291" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.051+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.8 GiB" free_swap="8.0 GiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.052+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" 
model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1 Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.053+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.055+02:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.120+02:00 level=INFO source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37 Aug 23 07:55:28 tensor ollama[1276]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Aug 23 07:55:28 tensor ollama[1276]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Aug 23 07:55:28 tensor ollama[1276]: ggml_cuda_init: found 1 CUDA devices: Aug 23 07:55:28 tensor ollama[1276]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Aug 23 07:55:28 tensor ollama[1276]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so Aug 23 07:55:28 tensor ollama[1276]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so Aug 23 07:55:28 
tensor ollama[1276]: time=2025-08-23T07:55:28.342+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 MiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB" Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1 Aug 23 07:55:28 tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding" Aug 23 07:55:28 
tensor ollama[1276]: time=2025-08-23T07:55:28.617+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model" Aug 23 07:55:49 tensor ollama[1276]: time=2025-08-23T07:55:49.955+02:00 level=INFO source=server.go:1272 msg="llama runner started in 21.95 seconds" Aug 23 07:55:53 tensor ollama[1276]: [GIN] 2025/08/23 - 07:55:53 | 200 | 26.117570848s | 127.0.0.1 | POST "/api/chat" Aug 23 07:55:54 tensor ollama[1276]: [GIN] 2025/08/23 - 07:55:54 | 200 | 645.325173ms | 127.0.0.1 | POST "/api/chat" Aug 23 07:55:55 tensor ollama[1276]: [GIN] 2025/08/23 - 07:55:55 | 200 | 1.332291122s | 127.0.0.1 | POST "/api/chat" Aug 23 07:57:29 tensor ollama[1276]: [GIN] 2025/08/23 - 07:57:29 | 200 | 11.295562813s | 127.0.0.1 | POST "/api/chat" Aug 23 07:58:33 tensor ollama[1276]: [GIN] 2025/08/23 - 07:58:33 | 200 | 3.546104655s | 127.0.0.1 | POST "/api/chat" Aug 23 07:59:49 tensor ollama[1276]: [GIN] 2025/08/23 - 07:59:49 | 200 | 3.972724828s | 127.0.0.1 | POST "/api/chat" Aug 23 08:01:11 tensor ollama[1276]: [GIN] 2025/08/23 - 08:01:11 | 200 | 3.705968817s | 127.0.0.1 | POST "/api/chat" Aug 23 08:12:29 tensor ollama[1276]: [GIN] 2025/08/23 - 08:12:29 | 200 | 25.267944ms | 127.0.0.1 | GET "/api/tags" Aug 23 08:12:29 tensor ollama[1276]: [GIN] 2025/08/23 - 08:12:29 | 200 | 53.196µs | 127.0.0.1 | GET "/api/ps" Aug 23 08:23:35 tensor systemd[1]: Stopping ollama.service - Ollama Service... Aug 23 08:23:35 tensor systemd[1]: ollama.service: Deactivated successfully. Aug 23 08:23:35 tensor systemd[1]: Stopped ollama.service - Ollama Service. Aug 23 08:23:35 tensor systemd[1]: ollama.service: Consumed 46.189s CPU time, 1.8G memory peak, 0B memory swap peak. Aug 23 08:23:35 tensor systemd[1]: Started ollama.service - Ollama Service. 
Aug 23 08:23:35 tensor ollama[3460]: time=2025-08-23T08:23:35.876+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" Aug 23 08:23:35 tensor systemd[1]: Stopping ollama.service - Ollama Service... Aug 23 08:23:35 tensor systemd[1]: ollama.service: Deactivated successfully. Aug 23 08:23:35 tensor systemd[1]: Stopped ollama.service - Ollama Service. Aug 23 08:23:35 tensor systemd[1]: Started ollama.service - Ollama Service. 
Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.902+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.918+02:00 level=INFO source=images.go:477 msg="total blobs: 24" Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.925+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0" Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.933+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)" Aug 23 08:23:35 tensor ollama[3475]: time=2025-08-23T08:23:35.933+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" Aug 23 08:23:36 tensor ollama[3475]: time=2025-08-23T08:23:36.115+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.2 GiB" Aug 23 09:20:11 tensor ollama[3475]: 
[GIN] 2025/08/23 - 09:20:11 | 200 | 12.128463ms | 127.0.0.1 | GET "/api/tags" Aug 23 09:20:11 tensor ollama[3475]: [GIN] 2025/08/23 - 09:20:11 | 200 | 73.607µs | 127.0.0.1 | GET "/api/ps" Aug 23 09:33:02 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:02 | 200 | 6.800455ms | 127.0.0.1 | GET "/api/tags" Aug 23 09:33:02 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:02 | 200 | 28.446µs | 127.0.0.1 | GET "/api/ps" Aug 23 09:33:02 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:02 | 200 | 63.496µs | 127.0.0.1 | GET "/api/version" Aug 23 09:33:03 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:03 | 200 | 6.310948ms | 127.0.0.1 | GET "/api/tags" Aug 23 09:33:03 tensor ollama[3475]: [GIN] 2025/08/23 - 09:33:03 | 200 | 14.95µs | 127.0.0.1 | GET "/api/ps" Aug 23 09:36:45 tensor ollama[3475]: [GIN] 2025/08/23 - 09:36:45 | 200 | 21.781µs | 127.0.0.1 | HEAD "/" Aug 23 09:36:45 tensor ollama[3475]: [GIN] 2025/08/23 - 09:36:45 | 200 | 798.861016ms | 127.0.0.1 | POST "/api/pull" Aug 23 09:50:27 tensor ollama[3475]: [GIN] 2025/08/23 - 09:50:27 | 200 | 22.983µs | 127.0.0.1 | HEAD "/" Aug 23 09:50:28 tensor ollama[3475]: [GIN] 2025/08/23 - 09:50:28 | 200 | 815.541413ms | 127.0.0.1 | POST "/api/pull" Aug 23 10:01:55 tensor systemd[1]: Stopping ollama.service - Ollama Service... Aug 23 10:01:55 tensor systemd[1]: ollama.service: Deactivated successfully. Aug 23 10:01:55 tensor systemd[1]: Stopped ollama.service - Ollama Service. Aug 23 10:01:55 tensor systemd[1]: Started ollama.service - Ollama Service. 
Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.185+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.198+02:00 level=INFO source=images.go:477 msg="total blobs: 24" Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.206+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0" Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.212+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)" Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.212+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" Aug 23 10:01:55 tensor ollama[5576]: time=2025-08-23T10:01:55.283+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.2 GiB" Aug 23 10:02:16 tensor systemd[1]: 
Stopping ollama.service - Ollama Service... Aug 23 10:02:16 tensor systemd[1]: ollama.service: Deactivated successfully. Aug 23 10:02:16 tensor systemd[1]: Stopped ollama.service - Ollama Service. Aug 23 10:02:16 tensor systemd[1]: Started ollama.service - Ollama Service. Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.800+02:00 level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.818+02:00 level=INFO source=images.go:477 msg="total blobs: 24" Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.824+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0" Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.830+02:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:11434 (version 0.11.6)" Aug 23 10:02:16 tensor ollama[5671]: time=2025-08-23T10:02:16.830+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" Aug 23 10:02:16 tensor ollama[5671]: 
time=2025-08-23T10:02:16.911+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.2 GiB" Aug 23 10:03:46 tensor ollama[5671]: [GIN] 2025/08/23 - 10:03:46 | 200 | 34.61µs | 127.0.0.1 | HEAD "/" Aug 23 10:03:47 tensor ollama[5671]: time=2025-08-23T10:03:47.106+02:00 level=INFO source=download.go:177 msg="downloading ac3d1ba8aa77 in 20 1 GB part(s)" Aug 23 10:08:02 tensor ollama[5671]: time=2025-08-23T10:08:02.524+02:00 level=INFO source=download.go:177 msg="downloading 832dd9e00a68 in 1 11 KB part(s)" Aug 23 10:08:03 tensor ollama[5671]: time=2025-08-23T10:08:03.866+02:00 level=INFO source=download.go:177 msg="downloading f0676bd3c336 in 1 488 B part(s)" Aug 23 10:08:38 tensor ollama[5671]: [GIN] 2025/08/23 - 10:08:38 | 200 | 4m51s | 127.0.0.1 | POST "/api/pull" Aug 23 10:10:27 tensor ollama[5671]: [GIN] 2025/08/23 - 10:10:27 | 200 | 36.552µs | 127.0.0.1 | HEAD "/" Aug 23 10:10:27 tensor ollama[5671]: [GIN] 2025/08/23 - 10:10:27 | 200 | 55.457418ms | 127.0.0.1 | POST "/api/show" Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 (version GGUF V3 (latest)) Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 32B Instruct Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 32B Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = apache-2.0 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 8: general.base_model.count u32 = 1 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 32B Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 22: general.file_type u32 = 15 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 33: general.quantization_version u32 = 2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type f32: 321 tensors Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q4_K: 385 tensors Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q6_K: 65 tensors Aug 23 10:10:28 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 23 10:10:28 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 23 10:10:28 tensor ollama[5671]: print_info: file size = 18.48 GiB (4.85 BPW) Aug 23 10:10:28 tensor ollama[5671]: load: printing all EOG tokens: Aug 23 10:10:28 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 23 10:10:28 tensor ollama[5671]: load: special tokens cache size = 22 Aug 23 10:10:28 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 23 10:10:28 tensor ollama[5671]: print_info: arch = qwen2 Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab_only = 1 Aug 23 10:10:28 tensor ollama[5671]: print_info: model type = ?B Aug 23 10:10:28 tensor ollama[5671]: print_info: model params = 32.76 B Aug 23 10:10:28 
tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 32B Instruct Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab type = BPE Aug 23 10:10:28 tensor ollama[5671]: print_info: n_vocab = 152064 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_merges = 151387 Aug 23 10:10:28 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: max token length = 256 Aug 23 10:10:28 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.388+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model 
/home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 --port 45561" Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.402+02:00 level=INFO source=runner.go:864 msg="starting go runner" Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.437+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.5 GiB" free_swap="8.0 GiB" Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.438+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 library=cuda parallel=1 required="20.2 GiB" gpus=1 Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.438+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=65 layers.offload=65 layers.split=[65] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.2 GiB" memory.required.partial="20.2 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[20.2 GiB]" memory.weights.total="18.1 GiB" memory.weights.repeating="17.5 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="348.0 MiB" memory.graph.partial="916.1 MiB" Aug 23 10:10:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Aug 23 10:10:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Aug 23 10:10:28 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices: Aug 23 10:10:28 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Aug 23 10:10:28 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so Aug 23 10:10:28 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so Aug 23 10:10:28 tensor ollama[5671]: 
time=2025-08-23T10:10:28.478+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.478+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:45561" Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.481+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:65[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:65(0..64)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" Aug 23 10:10:28 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23734 MiB free Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.517+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding" Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.517+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model" Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 (version GGUF V3 (latest)) Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 32B Instruct Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 32B Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = apache-2.0 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 8: general.base_model.count u32 = 1 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 32B Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 22: general.file_type u32 = 15 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - kv 33: general.quantization_version u32 = 2 Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type f32: 321 tensors Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q4_K: 385 tensors Aug 23 10:10:28 tensor ollama[5671]: llama_model_loader: - type q6_K: 65 tensors Aug 23 10:10:28 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 23 10:10:28 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 23 10:10:28 tensor ollama[5671]: print_info: file size = 18.48 GiB (4.85 BPW) Aug 23 10:10:28 tensor ollama[5671]: load: printing all EOG tokens: Aug 23 10:10:28 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 23 10:10:28 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 23 10:10:28 tensor ollama[5671]: load: special tokens cache size = 22 Aug 23 10:10:28 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 23 10:10:28 tensor ollama[5671]: print_info: arch = qwen2 Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab_only = 0 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_ctx_train = 32768 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd = 5120 Aug 23 10:10:28 tensor 
ollama[5671]: print_info: n_layer = 64 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_head = 40 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_head_kv = 8 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_rot = 128 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_swa = 0 Aug 23 10:10:28 tensor ollama[5671]: print_info: is_swa_any = 0 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_head_k = 128 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_head_v = 128 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_gqa = 5 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_k_gqa = 1024 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_embd_v_gqa = 1024 Aug 23 10:10:28 tensor ollama[5671]: print_info: f_norm_eps = 0.0e+00 Aug 23 10:10:28 tensor ollama[5671]: print_info: f_norm_rms_eps = 1.0e-06 Aug 23 10:10:28 tensor ollama[5671]: print_info: f_clamp_kqv = 0.0e+00 Aug 23 10:10:28 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00 Aug 23 10:10:28 tensor ollama[5671]: print_info: f_logit_scale = 0.0e+00 Aug 23 10:10:28 tensor ollama[5671]: print_info: f_attn_scale = 0.0e+00 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_ff = 27648 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_expert = 0 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_expert_used = 0 Aug 23 10:10:28 tensor ollama[5671]: print_info: causal attn = 1 Aug 23 10:10:28 tensor ollama[5671]: print_info: pooling type = -1 Aug 23 10:10:28 tensor ollama[5671]: print_info: rope type = 2 Aug 23 10:10:28 tensor ollama[5671]: print_info: rope scaling = linear Aug 23 10:10:28 tensor ollama[5671]: print_info: freq_base_train = 1000000.0 Aug 23 10:10:28 tensor ollama[5671]: print_info: freq_scale_train = 1 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_ctx_orig_yarn = 32768 Aug 23 10:10:28 tensor ollama[5671]: print_info: rope_finetuned = unknown Aug 23 10:10:28 tensor ollama[5671]: print_info: model type = 32B Aug 23 10:10:28 tensor ollama[5671]: print_info: model params = 
32.76 B Aug 23 10:10:28 tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 32B Instruct Aug 23 10:10:28 tensor ollama[5671]: print_info: vocab type = BPE Aug 23 10:10:28 tensor ollama[5671]: print_info: n_vocab = 152064 Aug 23 10:10:28 tensor ollama[5671]: print_info: n_merges = 151387 Aug 23 10:10:28 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 23 10:10:28 tensor ollama[5671]: print_info: max token length = 256 Aug 23 10:10:28 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... 
(mmap = true)
Aug 23 10:10:28 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 23 10:10:28 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 23 10:10:28 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9
Aug 23 10:10:28 tensor ollama[5671]: goroutine 14 [running]:
Aug 23 10:10:28 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc00047b2c0, {0x41, 0x0, 0x1, {0xc0003ac0d8, 0x1, 0x1}, 0xc0005a34d0, 0x0}, {0x7ffd22a8bd54, ...}, ...)
Aug 23 10:10:28 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 23 10:10:28 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 12
Aug 23 10:10:28 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.743+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 23 10:10:28 tensor ollama[5671]: time=2025-08-23T10:10:28.768+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 23 10:10:28 tensor ollama[5671]: [GIN] 2025/08/23 - 10:10:28 | 500 | 803.675254ms | 127.0.0.1 | POST "/api/generate"
Aug 23 10:15:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:15:13 | 200 | 119.388µs | 127.0.0.1 | HEAD "/"
Aug 23 10:15:14 tensor ollama[5671]: time=2025-08-23T10:15:14.086+02:00 level=INFO source=download.go:177 msg="downloading 60e05f210007 in 16 292 MB part(s)"
Aug 23 10:16:16 tensor ollama[5671]: time=2025-08-23T10:16:16.415+02:00 level=INFO source=download.go:177 msg="downloading d9bb33f27869 in 1 487 B part(s)"
Aug 23 10:16:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:16:23 | 200 | 1m10s | 127.0.0.1 | POST "/api/pull"
Aug 23 10:17:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:23 | 200 | 39.642µs | 127.0.0.1 | HEAD "/"
Aug 23 10:17:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:23 | 200 | 30.982614ms | 127.0.0.1 | GET "/api/tags"
Aug 23 10:17:41 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:41 | 200 | 29.386µs | 127.0.0.1 | HEAD "/"
Aug 23 10:17:41 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:41 | 200 | 14.797788ms | 127.0.0.1 | POST "/api/generate"
Aug 23 10:17:42 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:42 | 200 | 66.935873ms | 127.0.0.1 | DELETE "/api/delete"
Aug 23 10:17:53 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:53 | 200 | 26.657µs | 127.0.0.1 | HEAD "/"
Aug 23 10:17:53 tensor ollama[5671]: [GIN] 2025/08/23 - 10:17:53 | 200 | 12.036165ms | 127.0.0.1 | GET "/api/tags"
Aug 23 10:18:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:13 | 200 | 19.668µs | 127.0.0.1 | HEAD "/"
Aug 23 10:18:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:13 | 200 | 23.803691ms | 127.0.0.1 | POST "/api/generate"
Aug 23 10:18:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:13 | 200 | 49.584618ms | 127.0.0.1 | DELETE "/api/delete"
Aug 23 10:18:17 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:17 | 200 | 20.163µs | 127.0.0.1 | HEAD "/"
Aug 23 10:18:17 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:17 | 200 | 10.459848ms | 127.0.0.1 | GET "/api/tags"
Aug 23 10:18:38 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:38 | 200 | 27.028µs | 127.0.0.1 | HEAD "/"
Aug 23 10:18:38 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:38 | 200 | 63.247975ms | 127.0.0.1 | POST "/api/show"
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3
(latest)) Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 7B Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = apache-2.0 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 8: general.base_model.count u32 = 1 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 14: qwen2.block_count u32 = 28 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 22: general.file_type u32 = 15 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - kv 33: general.quantization_version u32 = 2 Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - type f32: 141 tensors Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - type q4_K: 169 tensors Aug 23 10:18:38 tensor ollama[5671]: llama_model_loader: - type q6_K: 29 tensors Aug 23 10:18:38 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 23 10:18:38 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 23 10:18:38 tensor ollama[5671]: print_info: file size = 4.36 GiB (4.91 BPW) Aug 23 10:18:38 tensor ollama[5671]: load: printing all EOG tokens: Aug 23 10:18:38 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 23 10:18:38 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 23 10:18:38 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 23 10:18:38 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 23 10:18:38 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 23 10:18:38 tensor ollama[5671]: load: special tokens cache size = 22 Aug 23 10:18:38 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 23 10:18:38 tensor ollama[5671]: print_info: arch = qwen2 Aug 23 10:18:38 tensor ollama[5671]: print_info: vocab_only = 1 Aug 23 10:18:38 tensor ollama[5671]: print_info: model type = ?B Aug 23 10:18:38 tensor ollama[5671]: print_info: model params = 7.62 B Aug 23 10:18:38 
tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 7B Instruct Aug 23 10:18:38 tensor ollama[5671]: print_info: vocab type = BPE Aug 23 10:18:38 tensor ollama[5671]: print_info: n_vocab = 152064 Aug 23 10:18:38 tensor ollama[5671]: print_info: n_merges = 151387 Aug 23 10:18:38 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 23 10:18:38 tensor ollama[5671]: print_info: max token length = 256 Aug 23 10:18:38 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors Aug 23 10:18:38 tensor ollama[5671]: time=2025-08-23T10:18:38.959+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model 
/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --port 33097"
Aug 23 10:18:38 tensor ollama[5671]: time=2025-08-23T10:18:38.969+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.006+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.5 GiB" free_swap="8.0 GiB"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.006+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 library=cuda parallel=1 required="5.2 GiB" gpus=1
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.006+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split=[29] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.2 GiB" memory.required.partial="5.2 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[5.2 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="304.0 MiB" memory.graph.partial="730.4 MiB"
Aug 23 10:18:39 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 23 10:18:39 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:18:39 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:18:39 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:18:39 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:18:39 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.023+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.023+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:33097"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.028+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:29[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 23 10:18:39 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23734 MiB free
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.058+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.059+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 7B Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = apache-2.0 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 8: general.base_model.count u32 = 1 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 14: qwen2.block_count u32 = 28 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 22: general.file_type u32 = 15 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - kv 33: general.quantization_version u32 = 2 Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - type f32: 141 tensors Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - type q4_K: 169 tensors Aug 23 10:18:39 tensor ollama[5671]: llama_model_loader: - type q6_K: 29 tensors Aug 23 10:18:39 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 23 10:18:39 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 23 10:18:39 tensor ollama[5671]: print_info: file size = 4.36 GiB (4.91 BPW) Aug 23 10:18:39 tensor ollama[5671]: load: printing all EOG tokens: Aug 23 10:18:39 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 23 10:18:39 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 23 10:18:39 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 23 10:18:39 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 23 10:18:39 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 23 10:18:39 tensor ollama[5671]: load: special tokens cache size = 22 Aug 23 10:18:39 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 23 10:18:39 tensor ollama[5671]: print_info: arch = qwen2 Aug 23 10:18:39 tensor ollama[5671]: print_info: vocab_only = 0 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_ctx_train = 32768 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd = 3584 Aug 23 10:18:39 tensor 
ollama[5671]: print_info: n_layer = 28 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_head = 28 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_head_kv = 4 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_rot = 128 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_swa = 0 Aug 23 10:18:39 tensor ollama[5671]: print_info: is_swa_any = 0 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_head_k = 128 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_head_v = 128 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_gqa = 7 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_k_gqa = 512 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_embd_v_gqa = 512 Aug 23 10:18:39 tensor ollama[5671]: print_info: f_norm_eps = 0.0e+00 Aug 23 10:18:39 tensor ollama[5671]: print_info: f_norm_rms_eps = 1.0e-06 Aug 23 10:18:39 tensor ollama[5671]: print_info: f_clamp_kqv = 0.0e+00 Aug 23 10:18:39 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00 Aug 23 10:18:39 tensor ollama[5671]: print_info: f_logit_scale = 0.0e+00 Aug 23 10:18:39 tensor ollama[5671]: print_info: f_attn_scale = 0.0e+00 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_ff = 18944 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_expert = 0 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_expert_used = 0 Aug 23 10:18:39 tensor ollama[5671]: print_info: causal attn = 1 Aug 23 10:18:39 tensor ollama[5671]: print_info: pooling type = -1 Aug 23 10:18:39 tensor ollama[5671]: print_info: rope type = 2 Aug 23 10:18:39 tensor ollama[5671]: print_info: rope scaling = linear Aug 23 10:18:39 tensor ollama[5671]: print_info: freq_base_train = 1000000.0 Aug 23 10:18:39 tensor ollama[5671]: print_info: freq_scale_train = 1 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_ctx_orig_yarn = 32768 Aug 23 10:18:39 tensor ollama[5671]: print_info: rope_finetuned = unknown Aug 23 10:18:39 tensor ollama[5671]: print_info: model type = 7B Aug 23 10:18:39 tensor ollama[5671]: print_info: model params = 
7.62 B Aug 23 10:18:39 tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 7B Instruct Aug 23 10:18:39 tensor ollama[5671]: print_info: vocab type = BPE Aug 23 10:18:39 tensor ollama[5671]: print_info: n_vocab = 152064 Aug 23 10:18:39 tensor ollama[5671]: print_info: n_merges = 151387 Aug 23 10:18:39 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 23 10:18:39 tensor ollama[5671]: print_info: max token length = 256 Aug 23 10:18:39 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... 
(mmap = true)
Aug 23 10:18:39 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 23 10:18:39 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 23 10:18:39 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463
Aug 23 10:18:39 tensor ollama[5671]: goroutine 54 [running]:
Aug 23 10:18:39 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc00047c500, {0x1d, 0x0, 0x1, {0xc0001cd228, 0x1, 0x1}, 0xc000042ab0, 0x0}, {0x7ffc51cf7d54, ...}, ...)
Aug 23 10:18:39 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 23 10:18:39 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 51
Aug 23 10:18:39 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.258+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 23 10:18:39 tensor ollama[5671]: time=2025-08-23T10:18:39.309+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 23 10:18:39 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:39 | 500 | 778.796128ms | 127.0.0.1 | POST "/api/generate"
Aug 23 10:18:49 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:49 | 200 | 23.243µs | 127.0.0.1 | HEAD "/"
Aug 23 10:18:49 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:49 | 404 | 7.871792ms | 127.0.0.1 | POST "/api/show"
Aug 23 10:18:49 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:49 | 200 | 471.111409ms | 127.0.0.1 | POST "/api/pull"
Aug 23 10:18:59 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:59 | 200 | 23.642µs | 127.0.0.1 | HEAD "/"
Aug 23 10:18:59 tensor ollama[5671]: [GIN] 2025/08/23 - 10:18:59 | 200 | 124.849451ms | 127.0.0.1 | POST "/api/show"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.603+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 39627"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.612+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.612+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:39627"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.648+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.5 GiB" free_swap="8.0 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.650+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.651+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.651+02:00
level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.707+02:00 level=INFO source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37 Aug 23 10:18:59 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Aug 23 10:18:59 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Aug 23 10:18:59 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices: Aug 23 10:18:59 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Aug 23 10:18:59 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so Aug 23 10:18:59 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.747+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU" Aug 23 10:18:59 tensor ollama[5671]: 
time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 MiB" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1 Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.980+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding" Aug 23 10:18:59 tensor ollama[5671]: time=2025-08-23T10:18:59.981+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model" Aug 23 10:19:22 tensor ollama[5671]: time=2025-08-23T10:19:22.049+02:00 level=INFO source=server.go:1272 msg="llama runner started in 22.45 seconds" Aug 23 10:19:22 tensor ollama[5671]: [GIN] 2025/08/23 - 10:19:22 | 200 | 22.846337214s | 127.0.0.1 | POST "/api/generate" Aug 23 10:20:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:13 | 200 | 23.01µs | 127.0.0.1 | HEAD "/" Aug 23 10:20:13 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:13 | 200 | 21.898271ms | 127.0.0.1 | GET "/api/tags" Aug 23 10:20:19 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:19 | 200 | 21.75µs | 127.0.0.1 | HEAD "/" Aug 23 10:20:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:20 | 200 
| 109.8799ms | 127.0.0.1 | POST "/api/show" Aug 23 10:20:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:20 | 200 | 145.193826ms | 127.0.0.1 | POST "/api/generate" Aug 23 10:20:28 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:28 | 200 | 24.753µs | 127.0.0.1 | HEAD "/" Aug 23 10:20:28 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:28 | 200 | 60.812606ms | 127.0.0.1 | POST "/api/show" Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.405+02:00 level=INFO source=sched.go:540 msg="updated VRAM based on existing loaded models" gpu=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda total="23.7 GiB" available="4.4 GiB" Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest)) Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 7B Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = apache-2.0 Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... 
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 14: qwen2.block_count u32 = 28
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 22: general.file_type u32 = 15
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - type f32: 141 tensors
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - type q4_K: 169 tensors
Aug 23 10:20:28 tensor ollama[5671]: llama_model_loader: - type q6_K: 29 tensors
Aug 23 10:20:28 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:20:28 tensor ollama[5671]: print_info: file type = Q4_K - Medium
Aug 23 10:20:28 tensor ollama[5671]: print_info: file size = 4.36 GiB (4.91 BPW)
Aug 23 10:20:28 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:20:28 tensor ollama[5671]: load: - 151643 ('<|endoftext|>')
Aug 23 10:20:28 tensor ollama[5671]: load: - 151645 ('<|im_end|>')
Aug 23 10:20:28 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>')
Aug 23 10:20:28 tensor ollama[5671]: load: - 151663 ('<|repo_name|>')
Aug 23 10:20:28 tensor ollama[5671]: load: - 151664 ('<|file_sep|>')
Aug 23 10:20:28 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:20:28 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:20:28 tensor ollama[5671]: print_info: arch = qwen2
Aug 23 10:20:28 tensor ollama[5671]: print_info: vocab_only = 1
Aug 23 10:20:28 tensor ollama[5671]: print_info: model type = ?B
Aug 23 10:20:28 tensor ollama[5671]: print_info: model params = 7.62 B
Aug 23 10:20:28 tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 7B Instruct
Aug 23 10:20:28 tensor ollama[5671]: print_info: vocab type = BPE
Aug 23 10:20:28 tensor ollama[5671]: print_info: n_vocab = 152064
Aug 23 10:20:28 tensor ollama[5671]: print_info: n_merges = 151387
Aug 23 10:20:28 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: LF token = 198 'Ċ'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>'
Aug 23 10:20:28 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:20:28 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.663+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --port 33743"
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.674+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.711+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="58.0 GiB" free_swap="8.0 GiB"
Aug 23 10:20:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 23 10:20:28 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:20:28 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:20:28 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:20:28 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:20:28 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.735+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:20:28 tensor ollama[5671]: time=2025-08-23T10:20:28.736+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:33743"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.380+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.3 GiB" free_swap="8.0 GiB"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.380+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 library=cuda parallel=1 required="5.2 GiB" gpus=1
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.381+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split=[29] memory.available="[23.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.2 GiB" memory.required.partial="5.2 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[5.2 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="304.0 MiB" memory.graph.partial="730.4 MiB"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.382+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:29[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 23 10:20:29 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23734 MiB free
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.415+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.416+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 7B
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = apache-2.0
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 14: qwen2.block_count u32 = 28
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 22: general.file_type u32 = 15
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - type f32: 141 tensors
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - type q4_K: 169 tensors
Aug 23 10:20:29 tensor ollama[5671]: llama_model_loader: - type q6_K: 29 tensors
Aug 23 10:20:29 tensor ollama[5671]: print_info: file format = GGUF V3 (latest)
Aug 23 10:20:29 tensor ollama[5671]: print_info: file type = Q4_K - Medium
Aug 23 10:20:29 tensor ollama[5671]: print_info: file size = 4.36 GiB (4.91 BPW)
Aug 23 10:20:29 tensor ollama[5671]: load: printing all EOG tokens:
Aug 23 10:20:29 tensor ollama[5671]: load: - 151643 ('<|endoftext|>')
Aug 23 10:20:29 tensor ollama[5671]: load: - 151645 ('<|im_end|>')
Aug 23 10:20:29 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>')
Aug 23 10:20:29 tensor ollama[5671]: load: - 151663 ('<|repo_name|>')
Aug 23 10:20:29 tensor ollama[5671]: load: - 151664 ('<|file_sep|>')
Aug 23 10:20:29 tensor ollama[5671]: load: special tokens cache size = 22
Aug 23 10:20:29 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB
Aug 23 10:20:29 tensor ollama[5671]: print_info: arch = qwen2
Aug 23 10:20:29 tensor ollama[5671]: print_info: vocab_only = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_ctx_train = 32768
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd = 3584
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_layer = 28
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_head = 28
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_head_kv = 4
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_rot = 128
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_swa = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: is_swa_any = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_head_k = 128
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_head_v = 128
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_gqa = 7
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_k_gqa = 512
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_embd_v_gqa = 512
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_norm_eps = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_norm_rms_eps = 1.0e-06
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_clamp_kqv = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_logit_scale = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: f_attn_scale = 0.0e+00
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_ff = 18944
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_expert = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_expert_used = 0
Aug 23 10:20:29 tensor ollama[5671]: print_info: causal attn = 1
Aug 23 10:20:29 tensor ollama[5671]: print_info: pooling type = -1
Aug 23 10:20:29 tensor ollama[5671]: print_info: rope type = 2
Aug 23 10:20:29 tensor ollama[5671]: print_info: rope scaling = linear
Aug 23 10:20:29 tensor ollama[5671]: print_info: freq_base_train = 1000000.0
Aug 23 10:20:29 tensor ollama[5671]: print_info: freq_scale_train = 1
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_ctx_orig_yarn = 32768
Aug 23 10:20:29 tensor ollama[5671]: print_info: rope_finetuned = unknown
Aug 23 10:20:29 tensor ollama[5671]: print_info: model type = 7B
Aug 23 10:20:29 tensor ollama[5671]: print_info: model params = 7.62 B
Aug 23 10:20:29 tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 7B Instruct
Aug 23 10:20:29 tensor ollama[5671]: print_info: vocab type = BPE
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_vocab = 152064
Aug 23 10:20:29 tensor ollama[5671]: print_info: n_merges = 151387
Aug 23 10:20:29 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: LF token = 198 'Ċ'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>'
Aug 23 10:20:29 tensor ollama[5671]: print_info: max token length = 256
Aug 23 10:20:29 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 23 10:20:29 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 23 10:20:29 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 23 10:20:29 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463
Aug 23 10:20:29 tensor ollama[5671]: goroutine 8 [running]:
Aug 23 10:20:29 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0004bc280, {0x1d, 0x0, 0x1, {0xc00070fd38, 0x1, 0x1}, 0xc00059b8b0, 0x0}, {0x7fff97038d54, ...}, ...)
Aug 23 10:20:29 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 23 10:20:29 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 24
Aug 23 10:20:29 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.614+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 23 10:20:29 tensor ollama[5671]: time=2025-08-23T10:20:29.667+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 23 10:20:29 tensor ollama[5671]: [GIN] 2025/08/23 - 10:20:29 | 500 | 1.434612558s | 127.0.0.1 | POST "/api/generate"
Aug 23 10:27:03 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:03 | 200 | 23.940335ms | 127.0.0.1 | GET "/api/tags"
Aug 23 10:27:03 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:03 | 200 | 125.544µs | 127.0.0.1 | GET "/api/ps"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.463+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 41273"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.473+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.473+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:41273"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.505+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.3 GiB" free_swap="8.0 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.507+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.508+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[22.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.508+02:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.563+02:00 level=INFO source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37
Aug 23 10:27:11 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 23 10:27:11 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 23 10:27:11 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 23 10:27:11 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 23 10:27:11 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 23 10:27:11 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.605+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 MiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 23 10:27:11 tensor ollama[5671]: time=2025-08-23T10:27:11.831+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 23 10:27:20 tensor ollama[5671]: time=2025-08-23T10:27:20.608+02:00 level=WARN source=server.go:1241 msg="client connection closed before server finished loading, aborting load"
Aug 23 10:27:20 tensor ollama[5671]: time=2025-08-23T10:27:20.608+02:00 level=ERROR source=sched.go:479 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled"
Aug 23 10:27:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:20 | 499 | 9.550694558s | 127.0.0.1 | POST "/api/chat"
Aug 23 10:27:25 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:25 | 200 | 18.753421ms | 127.0.0.1 | GET "/api/tags"
Aug 23 10:27:25 tensor ollama[5671]: [GIN] 2025/08/23 - 10:27:25 | 200 | 30.961µs | 127.0.0.1 | GET "/api/ps"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.527+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --port 34953"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.536+02:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.536+02:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:34953"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.575+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="59.3 GiB" free_swap="8.0 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.576+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 library=cuda parallel=1 required="19.3 GiB" gpus=1
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.577+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=[63] memory.available="[22.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.3 GiB" memory.required.partial="19.3 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.3 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.578+02:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:63[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:63(0..62)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.633+02:00 level=INFO
source=ggml.go:130 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=37 Aug 23 10:27:32 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Aug 23 10:27:32 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Aug 23 10:27:32 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices: Aug 23 10:27:32 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Aug 23 10:27:32 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so Aug 23 10:27:32 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.672+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=ggml.go:486 msg="offloading 62 repeating layers to GPU" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=ggml.go:497 msg="offloaded 63/63 layers to GPU" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="16.2 GiB" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="944.0 
MiB" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="1.1 GiB" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.5 MiB" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=backend.go:342 msg="total memory" size="19.3 GiB" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=sched.go:473 msg="loaded runners" count=1 Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.893+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding" Aug 23 10:27:32 tensor ollama[5671]: time=2025-08-23T10:27:32.897+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model" Aug 23 10:27:55 tensor ollama[5671]: time=2025-08-23T10:27:55.215+02:00 level=INFO source=server.go:1272 msg="llama runner started in 22.69 seconds" Aug 23 10:28:19 tensor ollama[5671]: [GIN] 2025/08/23 - 10:28:19 | 200 | 47.448851492s | 127.0.0.1 | POST "/api/chat" Aug 23 10:28:20 tensor ollama[5671]: [GIN] 2025/08/23 - 10:28:20 | 200 | 892.609664ms | 127.0.0.1 | POST "/api/chat" Aug 23 10:28:23 tensor ollama[5671]: [GIN] 2025/08/23 - 10:28:23 | 200 | 2.51022638s | 127.0.0.1 | POST "/api/chat" Aug 23 13:04:23 tensor ollama[5671]: [GIN] 2025/08/23 - 13:04:23 | 200 | 14.92006ms | 127.0.0.1 | GET "/api/tags" Aug 23 13:04:23 tensor ollama[5671]: [GIN] 2025/08/23 - 13:04:23 | 200 | 39.308µs | 127.0.0.1 | GET "/api/ps" Aug 23 13:04:24 tensor ollama[5671]: [GIN] 2025/08/23 - 13:04:24 | 200 | 32.957µs | 127.0.0.1 | GET "/api/version" Aug 23 13:06:48 tensor ollama[5671]: [GIN] 2025/08/23 - 13:06:48 | 200 | 18.866063628s | 127.0.0.1 | POST "/api/chat" Aug 23 13:06:48 tensor ollama[5671]: [GIN] 2025/08/23 - 13:06:48 | 200 | 532.297205ms | 127.0.0.1 | POST "/api/chat" Aug 
23 13:06:50 tensor ollama[5671]: [GIN] 2025/08/23 - 13:06:50 | 200 | 1.714018055s | 127.0.0.1 | POST "/api/chat" Aug 23 13:09:55 tensor ollama[5671]: [GIN] 2025/08/23 - 13:09:55 | 200 | 10.199521ms | 127.0.0.1 | GET "/api/tags" Aug 23 13:09:55 tensor ollama[5671]: [GIN] 2025/08/23 - 13:09:55 | 200 | 30.906µs | 127.0.0.1 | GET "/api/ps" Aug 24 09:29:45 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:45 | 200 | 16.127291ms | 127.0.0.1 | GET "/api/tags" Aug 24 09:29:45 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:45 | 200 | 57.46µs | 127.0.0.1 | GET "/api/ps" Aug 24 09:29:46 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:46 | 200 | 103.244µs | 127.0.0.1 | GET "/api/version" Aug 24 09:29:48 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:48 | 200 | 312.37447ms | 127.0.0.1 | POST "/api/chat" Aug 24 09:29:48 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:48 | 200 | 313.113207ms | 127.0.0.1 | POST "/api/chat" Aug 24 09:29:49 tensor ollama[5671]: [GIN] 2025/08/24 - 09:29:49 | 200 | 900.474615ms | 127.0.0.1 | POST "/api/chat" Aug 24 09:34:11 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:11 | 200 | 15.258462ms | 127.0.0.1 | GET "/api/tags" Aug 24 09:34:11 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:11 | 200 | 29.577µs | 127.0.0.1 | GET "/api/ps" Aug 24 09:34:12 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:12 | 200 | 35.595µs | 127.0.0.1 | GET "/api/version" Aug 24 09:34:27 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:27 | 200 | 11.542712ms | 127.0.0.1 | GET "/api/tags" Aug 24 09:34:27 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:27 | 200 | 30.414µs | 127.0.0.1 | GET "/api/ps" Aug 24 09:34:28 tensor ollama[5671]: [GIN] 2025/08/24 - 09:34:28 | 200 | 40.637µs | 127.0.0.1 | GET "/api/version" Aug 24 09:35:33 tensor ollama[5671]: [GIN] 2025/08/24 - 09:35:33 | 200 | 29.911µs | 127.0.0.1 | HEAD "/" Aug 24 09:35:33 tensor ollama[5671]: [GIN] 2025/08/24 - 09:35:33 | 404 | 7.397409ms | 127.0.0.1 | POST "/api/show" Aug 24 09:35:34 tensor ollama[5671]: time=2025-08-24T09:35:34.315+02:00 
level=INFO source=download.go:177 msg="downloading 4a188102020e in 16 120 MB part(s)" Aug 24 09:36:02 tensor ollama[5671]: time=2025-08-24T09:36:02.676+02:00 level=INFO source=download.go:177 msg="downloading 45fc3ea7579a in 1 7.4 KB part(s)" Aug 24 09:36:04 tensor ollama[5671]: time=2025-08-24T09:36:04.045+02:00 level=INFO source=download.go:177 msg="downloading bb967eff3bda in 1 487 B part(s)" Aug 24 09:36:07 tensor ollama[5671]: [GIN] 2025/08/24 - 09:36:07 | 200 | 34.110004115s | 127.0.0.1 | POST "/api/pull" Aug 24 09:36:07 tensor ollama[5671]: [GIN] 2025/08/24 - 09:36:07 | 200 | 53.954475ms | 127.0.0.1 | POST "/api/show" Aug 24 09:36:07 tensor ollama[5671]: time=2025-08-24T09:36:07.791+02:00 level=INFO source=sched.go:540 msg="updated VRAM based on existing loaded models" gpu=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda total="23.7 GiB" available="4.4 GiB" Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest)) Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 3B Instruct Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 3B Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = other Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.name str = qwen-research Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.count u32 = 1 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.name str = Qwen2.5 Coder 3B Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.organization str = Qwen Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 13: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 14: general.languages arr[str,1] = ["en"] Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.block_count u32 = 36 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.context_length u32 = 32768 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.embedding_length u32 = 2048 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.feed_forward_length u32 = 11008 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count u32 = 16 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.attention.head_count_kv u32 = 2 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.rope.freq_base f32 = 1000000.000000 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 22: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 23: general.file_type u32 = 15 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.model str = gpt2 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.pre str = qwen2 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 151645 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 151643 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.bos_token_id u32 = 151643 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - kv 34: general.quantization_version u32 = 2 Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - type f32: 181 tensors Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - type q4_K: 216 tensors Aug 24 09:36:07 tensor ollama[5671]: llama_model_loader: - type q6_K: 37 tensors Aug 24 09:36:07 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 24 09:36:07 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 24 09:36:07 tensor ollama[5671]: print_info: file size = 1.79 GiB (4.99 BPW) Aug 24 09:36:07 tensor ollama[5671]: load: printing all EOG tokens: Aug 24 09:36:07 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 24 09:36:07 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 24 09:36:07 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 24 09:36:07 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 24 09:36:07 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 24 09:36:07 tensor ollama[5671]: load: special tokens cache size = 22 Aug 24 09:36:07 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 24 09:36:07 tensor ollama[5671]: print_info: arch = qwen2 Aug 24 09:36:07 tensor ollama[5671]: print_info: vocab_only = 1 Aug 24 09:36:07 tensor ollama[5671]: print_info: model type = ?B Aug 24 09:36:07 tensor ollama[5671]: print_info: model params = 3.09 B Aug 24 09:36:07 
tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 3B Instruct Aug 24 09:36:07 tensor ollama[5671]: print_info: vocab type = BPE Aug 24 09:36:07 tensor ollama[5671]: print_info: n_vocab = 151936 Aug 24 09:36:07 tensor ollama[5671]: print_info: n_merges = 151387 Aug 24 09:36:07 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 24 09:36:07 tensor ollama[5671]: print_info: max token length = 256 Aug 24 09:36:07 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.043+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model 
/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba --port 35345"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.052+02:00 level=INFO source=runner.go:864 msg="starting go runner"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.091+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="56.9 GiB" free_swap="8.0 GiB"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.092+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba library=cuda parallel=1 required="2.7 GiB" gpus=1
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.092+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=37 layers.offload=37 layers.split=[37] memory.available="[4.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.7 GiB" memory.required.partial="2.7 GiB" memory.required.kv="144.0 MiB" memory.required.allocations="[2.7 GiB]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="243.4 MiB" memory.graph.full="300.8 MiB" memory.graph.partial="544.2 MiB"
Aug 24 09:36:08 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 24 09:36:08 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 24 09:36:08 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices:
Aug 24 09:36:08 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c
Aug 24 09:36:08 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Aug 24 09:36:08 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.109+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.109+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:35345"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.114+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 4715 MiB free
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.146+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.147+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest))
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 3B Instruct Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 3B Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = other Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.name str = qwen-research Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.count u32 = 1 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.name str = Qwen2.5 Coder 3B Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.organization str = Qwen Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 13: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 14: general.languages arr[str,1] = ["en"] Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.block_count u32 = 36 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.context_length u32 = 32768 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.embedding_length u32 = 2048 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.feed_forward_length u32 = 11008 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count u32 = 16 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.attention.head_count_kv u32 = 2 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.rope.freq_base f32 = 1000000.000000 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 22: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 23: general.file_type u32 = 15 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.model str = gpt2 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.pre str = qwen2 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 151645 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 151643 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.bos_token_id u32 = 151643 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - kv 34: general.quantization_version u32 = 2 Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - type f32: 181 tensors Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - type q4_K: 216 tensors Aug 24 09:36:08 tensor ollama[5671]: llama_model_loader: - type q6_K: 37 tensors Aug 24 09:36:08 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 24 09:36:08 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 24 09:36:08 tensor ollama[5671]: print_info: file size = 1.79 GiB (4.99 BPW) Aug 24 09:36:08 tensor ollama[5671]: load: printing all EOG tokens: Aug 24 09:36:08 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 24 09:36:08 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 24 09:36:08 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 24 09:36:08 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 24 09:36:08 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 24 09:36:08 tensor ollama[5671]: load: special tokens cache size = 22 Aug 24 09:36:08 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 24 09:36:08 tensor ollama[5671]: print_info: arch = qwen2 Aug 24 09:36:08 tensor ollama[5671]: print_info: vocab_only = 0 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_ctx_train = 32768 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd = 2048 Aug 24 09:36:08 tensor 
ollama[5671]: print_info: n_layer = 36 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_head = 16 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_head_kv = 2 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_rot = 128 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_swa = 0 Aug 24 09:36:08 tensor ollama[5671]: print_info: is_swa_any = 0 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_head_k = 128 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_head_v = 128 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_gqa = 8 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_k_gqa = 256 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_embd_v_gqa = 256 Aug 24 09:36:08 tensor ollama[5671]: print_info: f_norm_eps = 0.0e+00 Aug 24 09:36:08 tensor ollama[5671]: print_info: f_norm_rms_eps = 1.0e-06 Aug 24 09:36:08 tensor ollama[5671]: print_info: f_clamp_kqv = 0.0e+00 Aug 24 09:36:08 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00 Aug 24 09:36:08 tensor ollama[5671]: print_info: f_logit_scale = 0.0e+00 Aug 24 09:36:08 tensor ollama[5671]: print_info: f_attn_scale = 0.0e+00 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_ff = 11008 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_expert = 0 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_expert_used = 0 Aug 24 09:36:08 tensor ollama[5671]: print_info: causal attn = 1 Aug 24 09:36:08 tensor ollama[5671]: print_info: pooling type = -1 Aug 24 09:36:08 tensor ollama[5671]: print_info: rope type = 2 Aug 24 09:36:08 tensor ollama[5671]: print_info: rope scaling = linear Aug 24 09:36:08 tensor ollama[5671]: print_info: freq_base_train = 1000000.0 Aug 24 09:36:08 tensor ollama[5671]: print_info: freq_scale_train = 1 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_ctx_orig_yarn = 32768 Aug 24 09:36:08 tensor ollama[5671]: print_info: rope_finetuned = unknown Aug 24 09:36:08 tensor ollama[5671]: print_info: model type = 3B Aug 24 09:36:08 tensor ollama[5671]: print_info: model params = 
3.09 B Aug 24 09:36:08 tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 3B Instruct Aug 24 09:36:08 tensor ollama[5671]: print_info: vocab type = BPE Aug 24 09:36:08 tensor ollama[5671]: print_info: n_vocab = 151936 Aug 24 09:36:08 tensor ollama[5671]: print_info: n_merges = 151387 Aug 24 09:36:08 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 24 09:36:08 tensor ollama[5671]: print_info: max token length = 256 Aug 24 09:36:08 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... 
(mmap = true)
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device
Aug 24 09:36:08 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model
Aug 24 09:36:08 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba
Aug 24 09:36:08 tensor ollama[5671]: goroutine 54 [running]:
Aug 24 09:36:08 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0002f6500, {0x25, 0x0, 0x1, {0xc0001cd208, 0x1, 0x1}, 0xc000502cd0, 0x0}, {0x7ffe0e254d54, ...}, ...)
Aug 24 09:36:08 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f
Aug 24 09:36:08 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 51
Aug 24 09:36:08 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.357+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
Aug 24 09:36:08 tensor ollama[5671]: time=2025-08-24T09:36:08.397+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model"
Aug 24 09:36:08 tensor ollama[5671]: [GIN] 2025/08/24 - 09:36:08 | 500 | 769.338871ms | 127.0.0.1 | POST "/api/generate"
Aug 24 09:37:20 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:20 | 200 | 19.020204ms | 127.0.0.1 | GET "/api/tags"
Aug 24 09:37:20 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:20 | 200 | 27.749µs | 127.0.0.1 | GET "/api/ps"
Aug 24 09:37:21 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:21 | 200 | 56.493µs | 127.0.0.1 | GET "/api/version"
Aug 24 09:37:42 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:42 | 200 | 4.36063473s | 127.0.0.1 | POST "/api/chat"
Aug 24 09:37:43 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:43 | 200 | 448.384271ms | 127.0.0.1 | POST "/api/chat"
Aug 24 09:37:44 tensor ollama[5671]: [GIN] 2025/08/24 - 09:37:44 | 200 | 1.045604379s | 127.0.0.1 | POST "/api/chat"
Aug 24 09:38:30 tensor ollama[5671]: [GIN] 2025/08/24 - 09:38:30 | 200 | 3.194981361s | 127.0.0.1 | POST "/api/chat"
Aug 24 09:41:19 tensor ollama[5671]: [GIN] 2025/08/24 - 09:41:19 | 200 | 3.217782328s | 127.0.0.1 | POST "/api/chat"
Aug 24 09:42:56 tensor ollama[5671]: [GIN] 2025/08/24 - 09:42:56 | 200 | 41.686µs | 127.0.0.1 | GET "/api/version"
Aug 24 13:23:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:14 | 200 | 12.156087ms | 127.0.0.1 | GET "/api/tags"
Aug 24 13:23:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:14 | 200 | 34.087µs | 127.0.0.1 | GET "/api/ps"
Aug 24 13:23:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:14 | 200 | 34.367µs | 127.0.0.1 | GET "/api/version"
Aug 24 13:23:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:20 | 200 | 1.141412557s | 127.0.0.1 | POST "/api/chat"
Aug 24 13:23:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:20 | 200 | 308.162597ms | 127.0.0.1 | POST "/api/chat"
Aug 24 13:23:21 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:21 | 200 | 940.241778ms | 127.0.0.1 | POST "/api/chat"
Aug 24 13:23:46 tensor ollama[5671]: [GIN] 2025/08/24 - 13:23:46 | 200 | 1.552475356s | 127.0.0.1 | POST "/api/chat"
Aug 24 13:24:17 tensor ollama[5671]: [GIN] 2025/08/24 - 13:24:17 | 200 | 11.202262ms | 127.0.0.1 | GET "/api/tags"
Aug 24 13:24:17 tensor ollama[5671]: [GIN] 2025/08/24 - 13:24:17 | 200 | 72.504µs | 127.0.0.1 | GET "/api/ps"
Aug 24 13:26:04 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:04 | 200 | 12.27506ms | 127.0.0.1 | GET "/api/tags"
Aug 24 13:26:04 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:04 | 200 | 49.924µs | 127.0.0.1 | GET "/api/ps"
Aug 24 13:26:05 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:05 | 
200 | 19.065232ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:26:05 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:05 | 200 | 30.361µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:26:05 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:05 | 200 | 36.62µs | 127.0.0.1 | GET "/api/version" Aug 24 13:26:07 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:07 | 200 | 33.405µs | 127.0.0.1 | GET "/api/version" Aug 24 13:26:11 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:11 | 200 | 35.863µs | 127.0.0.1 | GET "/api/version" Aug 24 13:26:13 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:13 | 200 | 1.105016887s | 127.0.0.1 | POST "/api/chat" Aug 24 13:26:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:14 | 200 | 329.014848ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:26:14 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:14 | 200 | 941.665313ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:26:19 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:19 | 200 | 35.37µs | 127.0.0.1 | GET "/api/version" Aug 24 13:26:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:20 | 200 | 33.177µs | 127.0.0.1 | GET "/api/version" Aug 24 13:26:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:24 | 200 | 29.952µs | 127.0.0.1 | GET "/api/version" Aug 24 13:26:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:26 | 200 | 11.000159ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:26:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:26 | 200 | 29.142µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:26:36 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:36 | 200 | 12.59344ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:26:36 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:36 | 200 | 30.42µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:26:39 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:39 | 200 | 31.166µs | 127.0.0.1 | GET "/api/version" Aug 24 13:26:42 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:42 | 200 | 1.050458592s | 127.0.0.1 | POST "/api/chat" Aug 24 13:26:43 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:43 | 200 | 338.196284ms | 127.0.0.1 | POST "/api/chat" Aug 
24 13:26:44 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:44 | 200 | 864.298854ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:26:52 tensor ollama[5671]: [GIN] 2025/08/24 - 13:26:52 | 200 | 843.821677ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:27:15 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:15 | 200 | 1.36406631s | 127.0.0.1 | POST "/api/chat" Aug 24 13:27:27 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:27 | 200 | 1.506646124s | 127.0.0.1 | POST "/api/chat" Aug 24 13:27:49 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:49 | 200 | 11.89996ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:27:49 tensor ollama[5671]: [GIN] 2025/08/24 - 13:27:49 | 200 | 33.547µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:28:00 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:00 | 200 | 39.705µs | 127.0.0.1 | GET "/api/version" Aug 24 13:28:10 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:10 | 200 | 1.514598358s | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:11 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:11 | 200 | 481.75731ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:12 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:12 | 200 | 1.219557438s | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:15 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:15 | 200 | 19.193942ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:28:15 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:15 | 200 | 87.819µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:28:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:24 | 200 | 8.827246ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:28:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:24 | 200 | 21.359µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:28:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:25 | 200 | 11.170147ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:28:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:25 | 200 | 31.662µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:28:27 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:27 | 200 | 35.987µs | 127.0.0.1 | GET "/api/version" Aug 24 13:28:28 tensor ollama[5671]: [GIN] 2025/08/24 - 
13:28:28 | 200 | 36.86µs | 127.0.0.1 | GET "/api/version" Aug 24 13:28:32 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:32 | 200 | 39.54µs | 127.0.0.1 | GET "/api/version" Aug 24 13:28:34 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:34 | 200 | 33.855µs | 127.0.0.1 | GET "/api/version" Aug 24 13:28:37 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:37 | 200 | 31.254µs | 127.0.0.1 | GET "/api/version" Aug 24 13:28:41 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:41 | 200 | 2.071607151s | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:41 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:41 | 200 | 415.928147ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:42 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:42 | 200 | 1.051054961s | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:57 | 200 | 2.413244904s | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:57 | 200 | 338.352268ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:58 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:58 | 200 | 879.287437ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:28:59 tensor ollama[5671]: [GIN] 2025/08/24 - 13:28:59 | 200 | 36.756µs | 127.0.0.1 | GET "/api/version" Aug 24 13:29:04 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:04 | 200 | 34.321µs | 127.0.0.1 | GET "/api/version" Aug 24 13:29:08 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:08 | 200 | 12.256472ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:29:08 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:08 | 200 | 33.353µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:29:13 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:13 | 200 | 31.844µs | 127.0.0.1 | GET "/api/version" Aug 24 13:29:43 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:43 | 200 | 1.528696642s | 127.0.0.1 | POST "/api/chat" Aug 24 13:29:53 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:53 | 200 | 1.132260114s | 127.0.0.1 | POST "/api/chat" Aug 24 13:29:55 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:55 | 200 | 2.349533427s | 
127.0.0.1 | POST "/api/chat" Aug 24 13:29:56 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:56 | 200 | 519.970128ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:29:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:29:57 | 200 | 805.813547ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:31:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:20 | 200 | 12.815312ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:31:20 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:20 | 200 | 38.961µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:31:53 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:53 | 200 | 13.795763ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:31:53 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:53 | 200 | 37.274µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:31:54 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:54 | 200 | 14.327849ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:31:54 tensor ollama[5671]: [GIN] 2025/08/24 - 13:31:54 | 200 | 25.079µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:32:39 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:39 | 200 | 12.446897ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:32:39 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:39 | 200 | 29.867µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:32:40 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:40 | 200 | 11.573829ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:32:40 tensor ollama[5671]: [GIN] 2025/08/24 - 13:32:40 | 200 | 28.836µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:33:58 tensor ollama[5671]: [GIN] 2025/08/24 - 13:33:58 | 200 | 34.315µs | 127.0.0.1 | GET "/api/version" Aug 24 13:34:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:34:57 | 200 | 2.433509594s | 127.0.0.1 | POST "/api/chat" Aug 24 13:34:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:34:57 | 200 | 581.535511ms | 127.0.0.1 | POST "/api/chat" Aug 24 13:34:58 tensor ollama[5671]: [GIN] 2025/08/24 - 13:34:58 | 200 | 1.166023038s | 127.0.0.1 | POST "/api/chat" Aug 24 13:35:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:35:26 | 200 | 12.633019ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:35:26 tensor 
ollama[5671]: [GIN] 2025/08/24 - 13:35:26 | 200 | 22.481µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:35:37 tensor ollama[5671]: [GIN] 2025/08/24 - 13:35:37 | 200 | 10.342028ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:35:37 tensor ollama[5671]: [GIN] 2025/08/24 - 13:35:37 | 200 | 34.32µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:38:35 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:35 | 200 | 15.966833ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:38:35 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:35 | 200 | 37.939µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:38:38 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:38 | 200 | 16.321552ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:38:38 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:38 | 200 | 57.795µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:38:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:57 | 200 | 11.743555ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:38:57 tensor ollama[5671]: [GIN] 2025/08/24 - 13:38:57 | 200 | 31.249µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:39:03 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:03 | 200 | 34.231µs | 127.0.0.1 | GET "/api/version" Aug 24 13:39:22 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:22 | 200 | 13.511931ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:39:22 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:22 | 200 | 32.03µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:39:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:26 | 200 | 34.621µs | 127.0.0.1 | GET "/api/version" Aug 24 13:39:45 tensor ollama[5671]: [GIN] 2025/08/24 - 13:39:45 | 200 | 3.187895355s | 127.0.0.1 | POST "/api/chat" Aug 24 13:40:07 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:07 | 200 | 7.228029ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:40:07 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:07 | 200 | 23.113µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:40:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:26 | 200 | 11.993859ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:40:26 tensor ollama[5671]: [GIN] 2025/08/24 - 13:40:26 | 200 | 38.602µs | 127.0.0.1 | GET 
"/api/ps" Aug 24 13:42:16 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:16 | 200 | 11.183632ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:42:16 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:16 | 200 | 32.363µs | 127.0.0.1 | GET "/api/ps" Aug 24 13:42:24 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:24 | 200 | 37.845µs | 127.0.0.1 | GET "/api/version" Aug 24 13:42:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:25 | 200 | 9.6161ms | 127.0.0.1 | GET "/api/tags" Aug 24 13:42:25 tensor ollama[5671]: [GIN] 2025/08/24 - 13:42:25 | 200 | 42.35µs | 127.0.0.1 | GET "/api/ps" Aug 24 14:04:31 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:31 | 200 | 11.985424ms | 127.0.0.1 | GET "/api/tags" Aug 24 14:04:31 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:31 | 200 | 60.806µs | 127.0.0.1 | GET "/api/ps" Aug 24 14:04:34 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:34 | 200 | 46.013µs | 127.0.0.1 | GET "/api/version" Aug 24 14:04:40 tensor ollama[5671]: [GIN] 2025/08/24 - 14:04:40 | 200 | 35.65µs | 127.0.0.1 | GET "/api/version" Aug 24 14:07:50 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:50 | 200 | 25.432778ms | 127.0.0.1 | GET "/api/tags" Aug 24 14:07:50 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:50 | 200 | 76.826µs | 127.0.0.1 | GET "/api/ps" Aug 24 14:07:50 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:50 | 200 | 51.468µs | 127.0.0.1 | GET "/api/version" Aug 24 14:07:52 tensor ollama[5671]: [GIN] 2025/08/24 - 14:07:52 | 200 | 49.281µs | 127.0.0.1 | GET "/api/version" Aug 24 14:08:39 tensor ollama[5671]: [GIN] 2025/08/24 - 14:08:39 | 200 | 4.387403229s | 127.0.0.1 | POST "/api/chat" Aug 24 14:09:08 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:08 | 200 | 40.839µs | 127.0.0.1 | HEAD "/" Aug 24 14:09:08 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:08 | 404 | 21.351646ms | 127.0.0.1 | POST "/api/show" Aug 24 14:09:09 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:09 | 200 | 504.262345ms | 127.0.0.1 | POST "/api/pull" Aug 24 14:09:22 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:22 
| 200 | 34.843µs | 127.0.0.1 | HEAD "/" Aug 24 14:09:22 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:22 | 404 | 19.725564ms | 127.0.0.1 | POST "/api/show" Aug 24 14:09:22 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:22 | 200 | 429.8143ms | 127.0.0.1 | POST "/api/pull" Aug 24 14:09:30 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:30 | 200 | 30.258µs | 127.0.0.1 | HEAD "/" Aug 24 14:09:30 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:30 | 200 | 95.776399ms | 127.0.0.1 | POST "/api/show" Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.370+02:00 level=INFO source=sched.go:540 msg="updated VRAM based on existing loaded models" gpu=GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c library=cuda total="23.7 GiB" available="4.4 GiB" Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest)) Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 3B Instruct Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 3B Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = other Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.name str = qwen-research Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.count u32 = 1 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.name str = Qwen2.5 Coder 3B Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.organization str = Qwen Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 13: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 14: general.languages arr[str,1] = ["en"] Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.block_count u32 = 36 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.context_length u32 = 32768 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.embedding_length u32 = 2048 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.feed_forward_length u32 = 11008 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count u32 = 16 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.attention.head_count_kv u32 = 2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.rope.freq_base f32 = 1000000.000000 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 22: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 23: general.file_type u32 = 15 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.model str = gpt2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.pre str = qwen2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 151645 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 151643 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.bos_token_id u32 = 151643 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 34: general.quantization_version u32 = 2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - type f32: 181 tensors Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - type q4_K: 216 tensors Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - type q6_K: 37 tensors Aug 24 14:09:30 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 24 14:09:30 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 24 14:09:30 tensor ollama[5671]: print_info: file size = 1.79 GiB (4.99 BPW) Aug 24 14:09:30 tensor ollama[5671]: load: printing all EOG tokens: Aug 24 14:09:30 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 24 14:09:30 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 24 14:09:30 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 24 14:09:30 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 24 14:09:30 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 24 14:09:30 tensor ollama[5671]: load: special tokens cache size = 22 Aug 24 14:09:30 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 24 14:09:30 tensor ollama[5671]: print_info: arch = qwen2 Aug 24 14:09:30 tensor ollama[5671]: print_info: vocab_only = 1 Aug 24 14:09:30 tensor ollama[5671]: print_info: model type = ?B Aug 24 14:09:30 tensor ollama[5671]: print_info: model params = 3.09 B Aug 24 14:09:30 
tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 3B Instruct Aug 24 14:09:30 tensor ollama[5671]: print_info: vocab type = BPE Aug 24 14:09:30 tensor ollama[5671]: print_info: n_vocab = 151936 Aug 24 14:09:30 tensor ollama[5671]: print_info: n_merges = 151387 Aug 24 14:09:30 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 24 14:09:30 tensor ollama[5671]: print_info: max token length = 256 Aug 24 14:09:30 tensor ollama[5671]: llama_model_load: vocab only - skipping tensors Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.718+02:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model 
/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba --port 35749" Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.739+02:00 level=INFO source=runner.go:864 msg="starting go runner" Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.796+02:00 level=INFO source=server.go:488 msg="system memory" total="62.8 GiB" free="56.6 GiB" free_swap="8.0 GiB" Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.796+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba library=cuda parallel=1 required="2.7 GiB" gpus=1 Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.797+02:00 level=INFO source=server.go:531 msg=offload library=cuda layers.requested=-1 layers.model=37 layers.offload=37 layers.split=[37] memory.available="[4.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.7 GiB" memory.required.partial="2.7 GiB" memory.required.kv="144.0 MiB" memory.required.allocations="[2.7 GiB]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="243.4 MiB" memory.graph.full="300.8 MiB" memory.graph.partial="544.2 MiB" Aug 24 14:09:30 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Aug 24 14:09:30 tensor ollama[5671]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Aug 24 14:09:30 tensor ollama[5671]: ggml_cuda_init: found 1 CUDA devices: Aug 24 14:09:30 tensor ollama[5671]: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Aug 24 14:09:30 tensor ollama[5671]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so Aug 24 14:09:30 tensor ollama[5671]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so Aug 24 14:09:30 tensor ollama[5671]: 
time=2025-08-24T14:09:30.843+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.844+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:35749" Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.851+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-c56c7710-41ab-216b-6adc-e6e5a05b0d3c Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" Aug 24 14:09:30 tensor ollama[5671]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 4711 MiB free Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.896+02:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding" Aug 24 14:09:30 tensor ollama[5671]: time=2025-08-24T14:09:30.896+02:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model" Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: loaded meta data with 35 key-value pairs and 434 tensors from /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba (version GGUF V3 (latest)) Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 0: general.architecture str = qwen2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 1: general.type str = model Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 3B Instruct Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 3: general.finetune str = Instruct Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 5: general.size_label str = 3B Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 6: general.license str = other Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 7: general.license.name str = qwen-research Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 9: general.base_model.count u32 = 1 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 10: general.base_model.0.name str = Qwen2.5 Coder 3B Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 11: general.base_model.0.organization str = Qwen Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 13: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 14: general.languages arr[str,1] = ["en"] Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 15: qwen2.block_count u32 = 36 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 16: qwen2.context_length u32 = 32768 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 17: qwen2.embedding_length u32 = 2048 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 18: qwen2.feed_forward_length u32 = 11008 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 19: qwen2.attention.head_count u32 = 16 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 20: qwen2.attention.head_count_kv u32 = 2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 21: qwen2.rope.freq_base f32 = 1000000.000000 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 22: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 23: general.file_type u32 = 15 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 24: tokenizer.ggml.model str = gpt2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 25: tokenizer.ggml.pre str = qwen2 Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 26: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ... Aug 24 14:09:30 tensor ollama[5671]: llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv 28: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 151645 Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 151643 Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv 31: tokenizer.ggml.bos_token_id u32 = 151643 Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - kv 34: general.quantization_version u32 = 2 Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - type f32: 181 tensors Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - type q4_K: 216 tensors Aug 24 14:09:31 tensor ollama[5671]: llama_model_loader: - type q6_K: 37 tensors Aug 24 14:09:31 tensor ollama[5671]: print_info: file format = GGUF V3 (latest) Aug 24 14:09:31 tensor ollama[5671]: print_info: file type = Q4_K - Medium Aug 24 14:09:31 tensor ollama[5671]: print_info: file size = 1.79 GiB (4.99 BPW) Aug 24 14:09:31 tensor ollama[5671]: load: printing all EOG tokens: Aug 24 14:09:31 tensor ollama[5671]: load: - 151643 ('<|endoftext|>') Aug 24 14:09:31 tensor ollama[5671]: load: - 151645 ('<|im_end|>') Aug 24 14:09:31 tensor ollama[5671]: load: - 151662 ('<|fim_pad|>') Aug 24 14:09:31 tensor ollama[5671]: load: - 151663 ('<|repo_name|>') Aug 24 14:09:31 tensor ollama[5671]: load: - 151664 ('<|file_sep|>') Aug 24 14:09:31 tensor ollama[5671]: load: special tokens cache size = 22 Aug 24 14:09:31 tensor ollama[5671]: load: token to piece cache size = 0.9310 MB Aug 24 14:09:31 tensor ollama[5671]: print_info: arch = qwen2 Aug 24 14:09:31 tensor ollama[5671]: print_info: vocab_only = 0 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_ctx_train = 32768 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd = 2048 Aug 24 14:09:31 tensor 
ollama[5671]: print_info: n_layer = 36 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_head = 16 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_head_kv = 2 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_rot = 128 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_swa = 0 Aug 24 14:09:31 tensor ollama[5671]: print_info: is_swa_any = 0 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_head_k = 128 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_head_v = 128 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_gqa = 8 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_k_gqa = 256 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_embd_v_gqa = 256 Aug 24 14:09:31 tensor ollama[5671]: print_info: f_norm_eps = 0.0e+00 Aug 24 14:09:31 tensor ollama[5671]: print_info: f_norm_rms_eps = 1.0e-06 Aug 24 14:09:31 tensor ollama[5671]: print_info: f_clamp_kqv = 0.0e+00 Aug 24 14:09:31 tensor ollama[5671]: print_info: f_max_alibi_bias = 0.0e+00 Aug 24 14:09:31 tensor ollama[5671]: print_info: f_logit_scale = 0.0e+00 Aug 24 14:09:31 tensor ollama[5671]: print_info: f_attn_scale = 0.0e+00 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_ff = 11008 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_expert = 0 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_expert_used = 0 Aug 24 14:09:31 tensor ollama[5671]: print_info: causal attn = 1 Aug 24 14:09:31 tensor ollama[5671]: print_info: pooling type = -1 Aug 24 14:09:31 tensor ollama[5671]: print_info: rope type = 2 Aug 24 14:09:31 tensor ollama[5671]: print_info: rope scaling = linear Aug 24 14:09:31 tensor ollama[5671]: print_info: freq_base_train = 1000000.0 Aug 24 14:09:31 tensor ollama[5671]: print_info: freq_scale_train = 1 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_ctx_orig_yarn = 32768 Aug 24 14:09:31 tensor ollama[5671]: print_info: rope_finetuned = unknown Aug 24 14:09:31 tensor ollama[5671]: print_info: model type = 3B Aug 24 14:09:31 tensor ollama[5671]: print_info: model params = 
3.09 B Aug 24 14:09:31 tensor ollama[5671]: print_info: general.name = Qwen2.5 Coder 3B Instruct Aug 24 14:09:31 tensor ollama[5671]: print_info: vocab type = BPE Aug 24 14:09:31 tensor ollama[5671]: print_info: n_vocab = 151936 Aug 24 14:09:31 tensor ollama[5671]: print_info: n_merges = 151387 Aug 24 14:09:31 tensor ollama[5671]: print_info: BOS token = 151643 '<|endoftext|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: EOS token = 151645 '<|im_end|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: EOT token = 151645 '<|im_end|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: PAD token = 151643 '<|endoftext|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: LF token = 198 'Ċ' Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM MID token = 151660 '<|fim_middle|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM REP token = 151663 '<|repo_name|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: FIM SEP token = 151664 '<|file_sep|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token = 151643 '<|endoftext|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token = 151645 '<|im_end|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token = 151662 '<|fim_pad|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token = 151663 '<|repo_name|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: EOG token = 151664 '<|file_sep|>' Aug 24 14:09:31 tensor ollama[5671]: print_info: max token length = 256 Aug 24 14:09:31 tensor ollama[5671]: load_tensors: loading model tensors, this can take a while... 
(mmap = true) Aug 24 14:09:31 tensor ollama[5671]: llama_model_load: error loading model: mmap failed: No such device Aug 24 14:09:31 tensor ollama[5671]: llama_model_load_from_file_impl: failed to load model Aug 24 14:09:31 tensor ollama[5671]: panic: unable to load model: /home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba Aug 24 14:09:31 tensor ollama[5671]: goroutine 38 [running]: Aug 24 14:09:31 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0000f9220, {0x25, 0x0, 0x1, {0xc0003bfa28, 0x1, 0x1}, 0xc000112680, 0x0}, {0x7ffe8415fd54, ...}, ...) Aug 24 14:09:31 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:747 +0x35f Aug 24 14:09:31 tensor ollama[5671]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 6 Aug 24 14:09:31 tensor ollama[5671]: github.com/ollama/ollama/runner/llamarunner/runner.go:833 +0x7ce Aug 24 14:09:31 tensor ollama[5671]: time=2025-08-24T14:09:31.238+02:00 level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2" Aug 24 14:09:31 tensor ollama[5671]: time=2025-08-24T14:09:31.398+02:00 level=INFO source=sched.go:441 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba error="llama runner process has terminated: error loading model: mmap failed: No such device\nllama_model_load_from_file_impl: failed to load model" Aug 24 14:09:31 tensor ollama[5671]: [GIN] 2025/08/24 - 14:09:31 | 500 | 1.293548336s | 127.0.0.1 | POST "/api/generate" ```

@rick-github commented on GitHub (Aug 24, 2025):

Please wrap the log in a markdown code block so it's easier to read.


@rick-github commented on GitHub (Aug 24, 2025):

What's the output of:

```
grep "$(df /home/ollama/.ollama/models/blobs | tail -1 | cut -d' ' -f1)" /proc/mounts
```
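
For readers unfamiliar with the pipeline above: `df` reports the filesystem that holds the blobs directory, `tail` and `cut` keep only its source name, and `grep` looks that name up in /proc/mounts to reveal the filesystem type and mount options. The same lookup can be sketched in Python. This is only an illustration, not anything ollama ships; the helper names `parse_mounts` and `mount_for` are made up for this example, and the parser ignores the octal escapes (such as `\040` for a space) that /proc/mounts uses in paths.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (source, mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[0], fields[1], fields[2], fields[3]))
    return entries


def mount_for(path, entries):
    """Return the entry whose mount point is the longest path prefix of `path`.

    `path` should already be absolute and symlink-free (see os.path.realpath).
    On ties the later entry wins, since later mounts shadow earlier ones.
    """
    best = None
    for entry in entries:
        mount_point = entry[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[1]):
                best = entry
    return best
```

To use it on a live system, read /proc/mounts, resolve the directory with `os.path.realpath` (the blobs directory in this thread evidently resolves to a location under a different mount), and pass both to `mount_for`. A filesystem type such as `virtiofs` in the result is the hint to look at the share's configuration.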

@LaCocoRoco commented on GitHub (Aug 24, 2025):

openai /mnt/openai virtiofs rw,relatime 0 0

This is a directory mounted from the host system.
The Ubuntu system is a VM in Proxmox, and /mnt/openai comes from the host machine.
I remounted virtiofs a while ago. I will test some different settings. I probably misconfigured something.


@LaCocoRoco commented on GitHub (Aug 24, 2025):

> What's the output of:
>
> ```
> grep "$(df /home/ollama/.ollama/models/blobs | tail -1 | cut -d' ' -f1)" /proc/mounts
> ```

You are my hero. Caching on the virtiofs share was disabled. Now it works.
Can you explain how you got to this conclusion?

@rick-github commented on GitHub (Aug 24, 2025):

The problem is that ollama is trying to mmap the weights file and the filesystem doesn't support that operation.

You can either configure virtiofs with [`--allow-mmap`](https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md#:~:text=Default%3A%20auto.-,%2D%2Dallow%2Dmmap,-For%20shared%20directories), or modify the model to not use mmap:

```
echo FROM qwen2.5-coder:7b > Modelfile
echo PARAMETER use_mmap false >> Modelfile
ollama create qwen2.5-coder:7b-nommap
```

> Can you explain how you got to this conclusion?

ENODEV (No such device) means that the mmap operation is not supported on the file. Since `ls` and `sha256sum` showed that it was a file and had the right contents, the next guess was that it had something to do with the filesystem. `df` gave the name of the filesystem, and looking it up in /proc/mounts showed it was a virtual file system, not a real one.
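
To check whether a given file can be memory-mapped at all, independently of ollama, a small probe along the following lines may help. It is a minimal sketch, assuming a Unix system where Python's `mmap.ACCESS_READ` requests a shared read-only mapping; the function name `probe_mmap` is made up for this example. On a filesystem that rejects the mapping it is expected to report `ENODEV`, matching the "No such device" message in the log, though this has not been verified against the setup in this thread.

```python
import errno
import mmap
import os


def probe_mmap(path):
    """Try a read-only mmap of `path`.

    Returns None if the mapping succeeds, otherwise the symbolic errno
    name (for example "ENODEV") of the failure. Note that an empty file
    raises ValueError instead, because Python refuses to map zero bytes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        mapping.close()
        return None
    finally:
        os.close(fd)
```

Calling `probe_mmap` on one of the blob files under the models directory and again on a file on a local disk gives a quick comparison: `None` in both cases suggests mmap is not the problem, while an errno name for the blob alone points at the filesystem it lives on.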

@LaCocoRoco commented on GitHub (Aug 24, 2025):

> The problem is that ollama is trying to mmap the weights file and the filesystem doesn't support that operation.
>
> You can either configure virtiofs with [`--allow-mmap`](https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md#:~:text=Default%3A%20auto.-,%2D%2Dallow%2Dmmap,-For%20shared%20directories), or modify the model to not use mmap:
>
> ```
> echo FROM qwen2.5-coder:7b > Modelfile
> echo PARAMETER use_mmap false >> Modelfile
> ollama create qwen2.5-coder:7b-nommap
> ```
>
> > Can you explain how you got to this conclusion?
>
> ENODEV (No such device) means that the mmap operation is not supported on the file. Since `ls` and `sha256sum` showed that it was a file and had the right contents, the next guess was that it had something to do with the filesystem. `df` gave the name of the filesystem, and looking it up in /proc/mounts showed it was a virtual file system, not a real one.

Ah, very good explanation. Thank you very much for the help and your time!
Reference: github-starred/ollama#8008