[GH-ISSUE #14207] Crash in Vulkan backend on Intel Arc when using Ollama through VS Code Continue (Exception 0xe06d7363 during llama_decode) #55769

Closed
opened 2026-04-29 09:42:50 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @Josejsi on GitHub (Feb 11, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14207

What is the issue?

Summary

Ollama crashes when running Qwen2.5‑7B (Q4_K) on an Intel Arc GPU only when the model is invoked through the VS Code Continue extension.
Running the model directly via:
ollama run qwen2.5:7b
works normally and does NOT crash.
However, when Continue sends requests to Ollama (streaming, incremental decoding, multiple small batches), the runner crashes during llama_decode with:
Exception 0xe06d7363 … signal arrived during external code execution wsarecv: An existing connection was forcibly closed by the remote host
This terminates the runner process and Ollama returns a 500 error.

Why this matters

The crash appears only when Continue triggers Ollama’s decoding pipeline.
This suggests a bug in one of:

  • the Vulkan backend under high‑frequency decode calls
  • batch handling
  • streaming token generation
  • synchronization between CPU and GPU during incremental decoding

The model loads perfectly: all 29 layers offload to the GPU, Flash Attention is enabled, and inference begins normally. The crash happens only during decoding, not during loading.

Hardware

  • GPU: Intel Arc 140V (16 GB shared memory)
  • Driver: Intel 32.0.101.8425
  • Vulkan API: 1.4.333
  • CPU: Intel Lunar Lake
  • RAM: 32 GB
  • OS: Windows 11

Ollama Version

Ollama 0.15.6

Model

Qwen2.5-7B

Steps to Reproduce

  1. Install Ollama 0.15.6 on Windows.
  2. Install the Continue extension in VS Code.
  3. Configure Continue to use Ollama as backend.
  4. Ask Continue any question that triggers streaming generation.
  5. The model loads successfully.
  6. Crash occurs during the first decoding steps.

Running the same prompt directly in the terminal does not crash.
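For anyone reproducing this outside VS Code, the steps above can be approximated with a short script that sends many small streaming requests to Ollama's documented /api/generate endpoint. This is only a sketch: the helper names are mine, and it assumes the default port and model tag.

```python
# Sketch: mimic Continue's traffic pattern (many small, rapid streaming
# requests) instead of one long terminal generation.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port

def build_payload(prompt: str) -> dict:
    # Continue streams responses; num_predict keeps each request short,
    # which is the pattern that triggers the crash here.
    return {
        "model": "qwen2.5:7b",
        "prompt": prompt,
        "stream": True,
        "options": {"num_predict": 32},
    }

def stream_once(prompt: str) -> int:
    """Send one streaming request; return the number of chunks received."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    chunks = 0
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line while streaming
            chunks += 1
            if json.loads(line).get("done"):
                break
    return chunks

def hammer(n: int = 10) -> None:
    # Rapid-fire small requests: the closest analogue to Continue's load.
    for i in range(n):
        stream_once(f"Question {i}: answer in one short sentence.")
```

Calling `hammer()` while watching the server log should exercise the same high-frequency `llama_decode` path that Continue does; a single long `ollama run` generation does not.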

Relevant Log Snippet

Exception 0xe06d7363 0x19930520 0x1de98f9550 0x7ffa3647a80a
signal arrived during external code execution
llama_decode → crash
wsarecv: An existing connection was forcibly closed by the remote host

(0xe06d7363 is the Windows SEH code for a C++ exception, so llama_decode is throwing an uncaught native exception; the wsarecv message is most likely a secondary effect of the connection being torn down when the runner dies.)

Full log (including Vulkan memory reports, layer offloading, and crash stack trace) is available here:

Intel GPU Community Issue Tracker report:
https://github.com/IGCIT/Intel-GPU-Community-Issue-Tracker-IGCIT/issues/1356#issuecomment-3879936651

Additional Notes

  • The crash does NOT occur when running the model directly via terminal.
  • The crash happens consistently when Continue sends multiple small decode requests.
  • Reducing batch size or GPU layers reduces frequency but does not eliminate the crash.
  • Disabling Flash Attention also reduces frequency but does not fully fix it.
  • This may be:
    • a Vulkan backend bug in GGML/Ollama,
    • an Intel Arc Vulkan driver issue,
    • or an interaction between both.
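The batch-size and GPU-layer mitigations above map onto Ollama's standard request options; a sketch, with illustrative values (Flash Attention is toggled separately, via `OLLAMA_FLASH_ATTENTION=0` in the server's environment before `ollama serve`):

```python
# Sketch of the mitigations above as Ollama request options
# (standard option names num_batch / num_gpu; values are illustrative).
def mitigation_options(batch_size: int = 128, gpu_layers: int = 20) -> dict:
    """Options dict for /api/generate or /api/chat requests."""
    return {
        "num_batch": batch_size,  # smaller decode batches (log shows default 512)
        "num_gpu": gpu_layers,    # offload fewer of the 29 layers to the GPU
    }
```

With these applied the crash becomes less frequent but still occurs, which is why this looks like a backend or driver bug rather than a resource limit.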

Expected Behavior

The model should run inference through Continue without crashing the runner.

Actual Behavior

Runner process crashes during llama_decode, causing Ollama to return a 500 error.

Additional Context

Other users with Intel Arc GPUs have reported similar Vulkan crashes under high‑frequency decode workloads in GGML-based projects.
Please let me know if you need additional logs, environment variables, or a reproducible test environment.

GiteaMirror added the bug label 2026-04-29 09:42:50 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 11, 2026):

Server logs (https://docs.ollama.com/troubleshooting) will aid in debugging.

Author
Owner

@Josejsi commented on GitHub (Feb 11, 2026):

Full Ollama Server Log:
time=2026-02-11T21:37:37.754+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 50677"
time=2026-02-11T21:37:38.299+01:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-11T21:37:38.299+01:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-02-11T21:37:38.299+01:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=4 threads=8
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 4.36 GiB (4.91 BPW) 
load: printing all EOG tokens:
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-02-11T21:37:38.846+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\josej\\.ollama\\models\\blobs\\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --port 50687"
time=2026-02-11T21:37:38.857+01:00 level=INFO source=sched.go:463 msg="system memory" total="31.5 GiB" free="17.7 GiB" free_swap="19.5 GiB"
time=2026-02-11T21:37:38.857+01:00 level=INFO source=sched.go:470 msg="gpu memory" id=8680a064-0400-0000-0002-000000000000 library=Vulkan available="16.5 GiB" free="16.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-11T21:37:38.857+01:00 level=INFO source=server.go:498 msg="loading model" "model layers"=29 requested=-1
time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="4.1 GiB"
time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="448.0 MiB"
time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="478.0 MiB"
time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:272 msg="total memory" size="5.0 GiB"
time=2026-02-11T21:37:38.888+01:00 level=INFO source=runner.go:965 msg="starting go runner"
load_backend: loaded CPU backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 140V GPU (16GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan\ggml-vulkan.dll
time=2026-02-11T21:37:38.958+01:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2026-02-11T21:37:38.959+01:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50687"
time=2026-02-11T21:37:39.016+01:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:8192 KvCacheType: NumThreads:4 GPULayers:29[ID:8680a064-0400-0000-0002-000000000000 Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1269903360.00 bytes (1.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18172369387 total: 19442272747
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
time=2026-02-11T21:37:39.181+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
time=2026-02-11T21:37:39.182+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1269903360.00 bytes (1.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18172369387 total: 19442272747
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1269903360.00 bytes (1.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18172369387 total: 19442272747
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Arc(TM) 140V GPU (16GB)) (unknown id) - 17330 MiB free
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 4.36 GiB (4.91 BPW) 
load: printing all EOG tokens:
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 3584
print_info: n_embd_inp       = 3584
print_info: n_layer          = 28
print_info: n_head           = 28
print_info: n_head_kv        = 4
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 7
print_info: n_embd_k_gqa     = 512
print_info: n_embd_v_gqa     = 512
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 18944
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = -1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 7B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1274810368.00 bytes (1.19 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18167462379 total: 19442272747
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 2319007744.00 bytes (2.16 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 17123265003 total: 19442272747
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors:      Vulkan0 model buffer size =  4168.09 MiB
load_tensors:  Vulkan_Host model buffer size =   292.36 MiB
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 5961330688.00 bytes (5.55 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 13480942059 total: 19442272747
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 8192
llama_context: n_ctx_seq     = 8192
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context: Vulkan_Host  output buffer size =     0.59 MiB
llama_kv_cache:    Vulkan0 KV buffer size =   448.00 MiB
llama_kv_cache: size =  448.00 MiB (  8192 cells,  28 layers,  1/1 seqs), K (f16):  224.00 MiB, V (f16):  224.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:    Vulkan0 compute buffer size =   304.00 MiB
llama_context: Vulkan_Host compute buffer size =    23.01 MiB
llama_context: graph nodes  = 959
llama_context: graph splits = 2
time=2026-02-11T21:37:41.963+01:00 level=INFO source=server.go:1388 msg="llama runner started in 3.11 seconds"
time=2026-02-11T21:37:41.963+01:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-11T21:37:41.963+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-11T21:37:41.964+01:00 level=INFO source=server.go:1388 msg="llama runner started in 3.11 seconds"
[GIN] 2026/02/11 - 21:37:53 | 200 |     14.9029ms |       127.0.0.1 | GET      "/api/tags"
Exception 0xe06d7363 0x19930520 0xe88d8f9510 0x7fff2d51a80a
PC=0x7fff2d51a80a
signal arrived during external code execution

runtime.cgocall(0x7ff7b4e4a690, 0xc000489b88)
	runtime/cgocall.go:167 +0x3e fp=0xc000489b60 sp=0xc000489af8 pc=0x7ff7b403243e
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x200c717ecf0, {0x199, 0x200fc8a5c00, 0x0, 0x200fc8a43d0, 0x200fc8a6410, 0x200c87fc0a0, 0x200c4071180})
	_cgo_gotypes.go:677 +0x50 fp=0xc000489b88 sp=0xc000489b60 pc=0x7ff7b44637d0
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0xc000489da0?, 0x1?)
	github.com/ollama/ollama/llama/llama.go:173 +0xed fp=0xc000489c70 sp=0xc000489b88 pc=0x7ff7b4466d0d
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0000d6be0, 0xc00033ec80, 0xc000489f28)
	github.com/ollama/ollama/runner/llamarunner/runner.go:494 +0x250 fp=0xc000489ee8 sp=0xc000489c70 pc=0x7ff7b451cad0
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0000d6be0, {0x7ff7b5697e30, 0xc00033e550})
	github.com/ollama/ollama/runner/llamarunner/runner.go:387 +0x1d5 fp=0xc000489fb8 sp=0xc000489ee8 pc=0x7ff7b451c715
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x28 fp=0xc000489fe0 sp=0xc000489fb8 pc=0x7ff7b4521ae8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000489fe8 sp=0xc000489fe0 pc=0x7ff7b403d8e1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]:
runtime.gopark(0x7ff7b403f0e0?, 0x7ff7b609ce00?, 0xa0?, 0x71?, 0xc00069724c?)
	runtime/proc.go:435 +0xce fp=0xc00021f630 sp=0xc00021f610 pc=0x7ff7b403598e
runtime.netpollblock(0x414?, 0xb3fd0406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00021f668 sp=0xc00021f630 pc=0x7ff7b3ffbdf7
internal/poll.runtime_pollWait(0x200fde7a910, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00021f688 sp=0xc00021f668 pc=0x7ff7b4034b25
internal/poll.(*pollDesc).wait(0x7ff7b40ca7b3?, 0x0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00021f6b0 sp=0xc00021f688 pc=0x7ff7b40cbda7
internal/poll.execIO(0xc0006971a0, 0xc00021f758)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc00021f728 sp=0xc00021f6b0 pc=0x7ff7b40cd205
internal/poll.(*FD).acceptOne(0xc000697188, 0x2c8, {0xc0005b81e0?, 0xc00021f7b8?, 0x7ff7b40d4ec5?}, 0xc00021f7ec?)
	internal/poll/fd_windows.go:946 +0x65 fp=0xc00021f788 sp=0xc00021f728 pc=0x7ff7b40d1785
internal/poll.(*FD).Accept(0xc000697188, 0xc00021f938)
	internal/poll/fd_windows.go:980 +0x1b6 fp=0xc00021f840 sp=0xc00021f788 pc=0x7ff7b40d1ab6
net.(*netFD).accept(0xc000697188)
	net/fd_windows.go:182 +0x4b fp=0xc00021f958 sp=0xc00021f840 pc=0x7ff7b414330b
net.(*TCPListener).accept(0xc0003d4f40)
	net/tcpsock_posix.go:159 +0x1b fp=0xc00021f9a8 sp=0xc00021f958 pc=0x7ff7b41598bb
net.(*TCPListener).Accept(0xc0003d4f40)
	net/tcpsock.go:380 +0x30 fp=0xc00021f9d8 sp=0xc00021f9a8 pc=0x7ff7b4158670
net/http.(*onceCloseListener).Accept(0xc00059c480?)
	<autogenerated>:1 +0x24 fp=0xc00021f9f0 sp=0xc00021f9d8 pc=0x7ff7b4371d84
net/http.(*Server).Serve(0xc000143900, {0x7ff7b5695770, 0xc0003d4f40})
	net/http/server.go:3424 +0x30c fp=0xc00021fb20 sp=0xc00021f9f0 pc=0x7ff7b434964c
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000a2020, 0x4, 0x6})
	github.com/ollama/ollama/runner/llamarunner/runner.go:1002 +0x8f5 fp=0xc00021fcf0 sp=0xc00021fb20 pc=0x7ff7b4521875
github.com/ollama/ollama/runner.Execute({0xc0000a2010?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x145 fp=0xc00021fd30 sp=0xc00021fcf0 pc=0x7ff7b45f3c65
github.com/ollama/ollama/cmd.NewCLI.func3(0xc000143500?, {0x7ff7b548d93d?, 0x4?, 0x7ff7b548d941?})
	github.com/ollama/ollama/cmd/cmd.go:1979 +0x45 fp=0xc00021fd58 sp=0xc00021fd30 pc=0x7ff7b4ddad25
github.com/spf13/cobra.(*Command).execute(0xc00002d508, {0xc0003d4cc0, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00021fe78 sp=0xc00021fd58 pc=0x7ff7b41be4dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc000640f08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00021ff30 sp=0xc00021fe78 pc=0x7ff7b41bed25
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00021ff50 sp=0xc00021ff30 pc=0x7ff7b4ddb80d
runtime.main()
	runtime/proc.go:283 +0x27d fp=0xc00021ffe0 sp=0xc00021ff50 pc=0x7ff7b4004ddd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x7ff7b403d8e1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00006ffa8 sp=0xc00006ff88 pc=0x7ff7b403598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc00006ffe0 sp=0xc00006ffa8 pc=0x7ff7b40050f8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x7ff7b403d8e1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000071f80 sp=0xc000071f60 pc=0x7ff7b403598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc00007e000)
	runtime/mgcsweep.go:316 +0xdf fp=0xc000071fc8 sp=0xc000071f80 pc=0x7ff7b3fedebf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x7ff7b3fe2285
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x7ff7b403d8e1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff7b5680ce8?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x7ff7b403598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff7b60c42a0)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x7ff7b3feb909
runtime.bgscavenge(0xc00007e000)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x7ff7b3febe99
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x7ff7b3fe2225
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x7ff7b403d8e1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000087e30 sp=0xc000087e10 pc=0x7ff7b403598e
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc000087fe0 sp=0xc000087e30 pc=0x7ff7b3fe1207
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x7ff7b403d8e1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc0000d7a40?, 0xc000510018?, 0x60?, 0x3f?, 0x7ff7b412c008?)
	runtime/proc.go:435 +0xce fp=0xc000073f18 sp=0xc000073ef8 pc=0x7ff7b403598e
runtime.chanrecv(0xc00003a460, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc000073f90 sp=0xc000073f18 pc=0x7ff7b3fd2d45
runtime.chanrecv1(0x7ff7b4004f40?, 0xc000073f76?)
	runtime/chan.go:506 +0x12 fp=0xc000073fb8 sp=0xc000073f90 pc=0x7ff7b3fd28d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc000073fe0 sp=0xc000073fb8 pc=0x7ff7b3fe54af
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x7ff7b403d8e1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0003da8c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc0002081c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000213f38 sp=0xc000213f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000213fc8 sp=0xc000213f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000213fe0 sp=0xc000213fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000213fe8 sp=0xc000213fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc000208380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000215f38 sp=0xc000215f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000215fc8 sp=0xc000215f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000215fe0 sp=0xc000215fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000215fe8 sp=0xc000215fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000208540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00020ff38 sp=0xc00020ff18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc00020ffc8 sp=0xc00020ff38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00020ffe0 sp=0xc00020ffc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00020ffe8 sp=0xc00020ffe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000208700 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x3?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000211f38 sp=0xc000211f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000211fc8 sp=0xc000211f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000211fe0 sp=0xc000211fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000211fe8 sp=0xc000211fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000484000 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x3?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00048bf38 sp=0xc00048bf18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc00048bfc8 sp=0xc00048bf38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00048bfe0 sp=0xc00048bfc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00048bfe8 sp=0xc00048bfe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc0004841c0 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00048df38 sp=0xc00048df18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc00048dfc8 sp=0xc00048df38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00048dfe0 sp=0xc00048dfc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00048dfe8 sp=0xc00048dfe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc0002088c0 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000487f38 sp=0xc000487f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000487fc8 sp=0xc000487f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000487fe0 sp=0xc000487fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000487fe8 sp=0xc000487fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc000484700 m=nil [select]:
runtime.gopark(0xc000049a78?, 0x2?, 0xa?, 0x0?, 0xc0000498cc?)
	runtime/proc.go:435 +0xce fp=0xc000049700 sp=0xc0000496e0 pc=0x7ff7b403598e
runtime.selectgo(0xc000049a78, 0xc0000498c8, 0xb99?, 0x0, 0x1?, 0x1)
	runtime/select.go:351 +0x837 fp=0xc000049838 sp=0xc000049700 pc=0x7ff7b4016437
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0000d6be0, {0x7ff7b5695920, 0xc0000007e0}, 0xc000192640)
	github.com/ollama/ollama/runner/llamarunner/runner.go:716 +0xbe5 fp=0xc000049ac0 sp=0xc000049838 pc=0x7ff7b451ea85
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x7ff7b5695920?, 0xc0000007e0?}, 0xc000049b40?)
	<autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x7ff7b4521ef6
net/http.HandlerFunc.ServeHTTP(0xc000626f00?, {0x7ff7b5695920?, 0xc0000007e0?}, 0xc000049b60?)
	net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x7ff7b4345c89
net/http.(*ServeMux).ServeHTTP(0x7ff7b3fdb785?, {0x7ff7b5695920, 0xc0000007e0}, 0xc000192640)
	net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x7ff7b4347b84
net/http.serverHandler.ServeHTTP({0x7ff7b5691d70?}, {0x7ff7b5695920?, 0xc0000007e0?}, 0x1?)
	net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x7ff7b436560e
net/http.(*conn).serve(0xc00059c480, {0x7ff7b5697df8, 0xc000451b00})
	net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x7ff7b4344185
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x7ff7b4349a48
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x7ff7b403d8e1
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485

goroutine 27 gp=0xc0004848c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc000697420?, 0xc8?, 0x74?, 0xc0006974cc?)
	runtime/proc.go:435 +0xce fp=0xc00049fd58 sp=0xc00049fd38 pc=0x7ff7b403598e
runtime.netpollblock(0x41c?, 0xb3fd0406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00049fd90 sp=0xc00049fd58 pc=0x7ff7b3ffbdf7
internal/poll.runtime_pollWait(0x200fde7a7f8, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00049fdb0 sp=0xc00049fd90 pc=0x7ff7b4034b25
internal/poll.(*pollDesc).wait(0x41c?, 0x72?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00049fdd8 sp=0xc00049fdb0 pc=0x7ff7b40cbda7
internal/poll.execIO(0xc000697420, 0x7ff7b550c9c8)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc00049fe50 sp=0xc00049fdd8 pc=0x7ff7b40cd205
internal/poll.(*FD).Read(0xc000697408, {0xc000671211, 0x1, 0x1})
	internal/poll/fd_windows.go:438 +0x29b fp=0xc00049fef0 sp=0xc00049fe50 pc=0x7ff7b40cdedb
net.(*netFD).Read(0xc000697408, {0xc000671211?, 0xc00061d658?, 0xc00049ff70?})
	net/fd_posix.go:55 +0x25 fp=0xc00049ff38 sp=0xc00049fef0 pc=0x7ff7b41411e5
net.(*conn).Read(0xc000076550, {0xc000671211?, 0x0?, 0x0?})
	net/net.go:194 +0x45 fp=0xc00049ff80 sp=0xc00049ff38 pc=0x7ff7b4150905
net/http.(*connReader).backgroundRead(0xc000671200)
	net/http/server.go:690 +0x37 fp=0xc00049ffc8 sp=0xc00049ff80 pc=0x7ff7b433e057
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc00049ffe0 sp=0xc00049ffc8 pc=0x7ff7b433df85
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00049ffe8 sp=0xc00049ffe0 pc=0x7ff7b403d8e1
created by net/http.(*connReader).startBackgroundRead in goroutine 9
	net/http/server.go:686 +0xb6
rax     0xfffffffe
rbx     0xe88d8f9498
rcx     0x0
rdx     0xe88d8f8de0
rdi     0xe06d7363
rsi     0x1
rbp     0x4
rsp     0xe88d8f9370
r8      0x1
r9      0xe06d7363
r10     0x200cc405fd0
r11     0x200fc7c0000
r12     0x0
r13     0x200c93b2b30
r14     0x2000
r15     0x0
rip     0x7fff2d51a80a
rflags  0x202
cs      0x33
fs      0x53
gs      0x2b
time=2026-02-11T21:38:19.806+01:00 level=ERROR source=server.go:1610 msg="post predict" error="Post \"http://127.0.0.1:50687/completion\": read tcp 127.0.0.1:50698->127.0.0.1:50687: wsarecv: An existing connection was forcibly closed by the remote host."
[GIN] 2026/02/11 - 21:38:19 | 500 |   42.2943218s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/02/11 - 21:38:24 | 200 |      2.8609ms |       127.0.0.1 | GET      "/api/tags"
<!-- gh-comment-id:3887076028 --> @Josejsi commented on GitHub (Feb 11, 2026): <details> <summary>Click to view full Ollama Server Log</summary> ```text time=2026-02-11T21:37:37.754+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 50677" time=2026-02-11T21:37:38.299+01:00 level=INFO source=cpu_windows.go:148 msg=packages count=1 time=2026-02-11T21:37:38.299+01:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1 time=2026-02-11T21:37:38.299+01:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=4 threads=8 llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5 llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-7... 
llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-7B llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"] llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... 
llama_model_loader: - kv 33: general.quantization_version u32 = 2 llama_model_loader: - type f32: 141 tensors llama_model_loader: - type q4_K: 169 tensors llama_model_loader: - type q6_K: 29 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 4.36 GiB (4.91 BPW) load: printing all EOG tokens: load: - 151643 ('<|endoftext|>') load: - 151645 ('<|im_end|>') load: - 151662 ('<|fim_pad|>') load: - 151663 ('<|repo_name|>') load: - 151664 ('<|file_sep|>') load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 1 print_info: no_alloc = 0 print_info: model type = ?B print_info: model params = 7.62 B print_info: general.name = Qwen2.5 7B Instruct print_info: vocab type = BPE print_info: n_vocab = 152064 print_info: n_merges = 151387 print_info: BOS token = 151643 '<|endoftext|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' print_info: PAD token = 151643 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2026-02-11T21:37:38.846+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\josej\\.ollama\\models\\blobs\\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --port 50687" 
time=2026-02-11T21:37:38.857+01:00 level=INFO source=sched.go:463 msg="system memory" total="31.5 GiB" free="17.7 GiB" free_swap="19.5 GiB" time=2026-02-11T21:37:38.857+01:00 level=INFO source=sched.go:470 msg="gpu memory" id=8680a064-0400-0000-0002-000000000000 library=Vulkan available="16.5 GiB" free="16.9 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-02-11T21:37:38.857+01:00 level=INFO source=server.go:498 msg="loading model" "model layers"=29 requested=-1 time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="4.1 GiB" time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="448.0 MiB" time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="478.0 MiB" time=2026-02-11T21:37:38.858+01:00 level=INFO source=device.go:272 msg="total memory" size="5.0 GiB" time=2026-02-11T21:37:38.888+01:00 level=INFO source=runner.go:965 msg="starting go runner" load_backend: loaded CPU backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Intel(R) Arc(TM) 140V GPU (16GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat load_backend: loaded Vulkan backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan\ggml-vulkan.dll time=2026-02-11T21:37:38.958+01:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2026-02-11T21:37:38.959+01:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50687" time=2026-02-11T21:37:39.016+01:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:8192 KvCacheType: 
NumThreads:4 GPULayers:29[ID:8680a064-0400-0000-0002-000000000000 Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000 ggml_backend_vk_get_device_memory called: luid 0x000000000000e046 ggml_dxgi_pdh_init called DXGI + PDH Initialized. Getting GPU free memory info [DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB [DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1269903360.00 bytes (1.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB) ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18172369387 total: 19442272747 ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000 ggml_backend_vk_get_device_memory called: luid 0x000000000000e046 ggml_dxgi_pdh_init called time=2026-02-11T21:37:39.181+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding" DXGI + PDH Initialized. Getting GPU free memory info [DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB [DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB time=2026-02-11T21:37:39.182+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model" Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. 
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1269903360.00 bytes (1.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB) ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18172369387 total: 19442272747 ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000 ggml_backend_vk_get_device_memory called: luid 0x000000000000e046 ggml_dxgi_pdh_init called DXGI + PDH Initialized. Getting GPU free memory info [DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB [DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1269903360.00 bytes (1.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB) ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18172369387 total: 19442272747 llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Arc(TM) 140V GPU (16GB)) (unknown id) - 17330 MiB free llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5 llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-7... llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-7B llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"] llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... llama_model_loader: - kv 33: general.quantization_version u32 = 2 llama_model_loader: - type f32: 141 tensors llama_model_loader: - type q4_K: 169 tensors llama_model_loader: - type q6_K: 29 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 4.36 GiB (4.91 BPW) load: printing all EOG tokens: load: - 151643 ('<|endoftext|>') load: - 151645 ('<|im_end|>') load: - 151662 ('<|fim_pad|>') load: - 151663 ('<|repo_name|>') load: - 151664 ('<|file_sep|>') load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 32768 print_info: n_embd = 3584 print_info: n_embd_inp = 3584 print_info: n_layer = 28 print_info: n_head = 28 print_info: n_head_kv = 4 print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 7 print_info: n_embd_k_gqa = 512 print_info: n_embd_v_gqa = 512 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-06 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 18944 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = -1 print_info: rope type = 2 print_info: rope 
scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 32768
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: model type = 7B
print_info: model params = 7.62 B
print_info: general.name = Qwen2.5 7B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected.
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1274810368.00 bytes (1.19 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18167462379 total: 19442272747
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected.
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 2319007744.00 bytes (2.16 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 17123265003 total: 19442272747
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: Vulkan0 model buffer size = 4168.09 MiB
load_tensors: Vulkan_Host model buffer size = 292.36 MiB
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e046
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) 140V GPU (16GB), LUID: 0x000000000000E046, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E468, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) 140V GPU (16GB)) with LUID 0x000000000000e046 detected.
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 5961330688.00 bytes (5.55 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 13480942059 total: 19442272747
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 8192
llama_context: n_ctx_seq = 8192
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context: Vulkan_Host output buffer size = 0.59 MiB
llama_kv_cache: Vulkan0 KV buffer size = 448.00 MiB
llama_kv_cache: size = 448.00 MiB ( 8192 cells, 28 layers, 1/1 seqs), K (f16): 224.00 MiB, V (f16): 224.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context: Vulkan0 compute buffer size = 304.00 MiB
llama_context: Vulkan_Host compute buffer size = 23.01 MiB
llama_context: graph nodes = 959
llama_context: graph splits = 2
time=2026-02-11T21:37:41.963+01:00 level=INFO source=server.go:1388 msg="llama runner started in 3.11 seconds"
time=2026-02-11T21:37:41.963+01:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-11T21:37:41.963+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-11T21:37:41.964+01:00 level=INFO source=server.go:1388 msg="llama runner started in 3.11 seconds"
[GIN] 2026/02/11 - 21:37:53 | 200 | 14.9029ms | 127.0.0.1 | GET "/api/tags"
Exception 0xe06d7363 0x19930520 0xe88d8f9510 0x7fff2d51a80a
PC=0x7fff2d51a80a
signal arrived during external code execution

runtime.cgocall(0x7ff7b4e4a690, 0xc000489b88)
	runtime/cgocall.go:167 +0x3e fp=0xc000489b60 sp=0xc000489af8 pc=0x7ff7b403243e
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x200c717ecf0, {0x199, 0x200fc8a5c00, 0x0, 0x200fc8a43d0, 0x200fc8a6410, 0x200c87fc0a0, 0x200c4071180})
	_cgo_gotypes.go:677 +0x50 fp=0xc000489b88 sp=0xc000489b60 pc=0x7ff7b44637d0
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0xc000489da0?, 0x1?)
	github.com/ollama/ollama/llama/llama.go:173 +0xed fp=0xc000489c70 sp=0xc000489b88 pc=0x7ff7b4466d0d
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0000d6be0, 0xc00033ec80, 0xc000489f28)
	github.com/ollama/ollama/runner/llamarunner/runner.go:494 +0x250 fp=0xc000489ee8 sp=0xc000489c70 pc=0x7ff7b451cad0
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0000d6be0, {0x7ff7b5697e30, 0xc00033e550})
	github.com/ollama/ollama/runner/llamarunner/runner.go:387 +0x1d5 fp=0xc000489fb8 sp=0xc000489ee8 pc=0x7ff7b451c715
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x28 fp=0xc000489fe0 sp=0xc000489fb8 pc=0x7ff7b4521ae8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000489fe8 sp=0xc000489fe0 pc=0x7ff7b403d8e1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]:
runtime.gopark(0x7ff7b403f0e0?, 0x7ff7b609ce00?, 0xa0?, 0x71?, 0xc00069724c?)
	runtime/proc.go:435 +0xce fp=0xc00021f630 sp=0xc00021f610 pc=0x7ff7b403598e
runtime.netpollblock(0x414?, 0xb3fd0406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00021f668 sp=0xc00021f630 pc=0x7ff7b3ffbdf7
internal/poll.runtime_pollWait(0x200fde7a910, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00021f688 sp=0xc00021f668 pc=0x7ff7b4034b25
internal/poll.(*pollDesc).wait(0x7ff7b40ca7b3?, 0x0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00021f6b0 sp=0xc00021f688 pc=0x7ff7b40cbda7
internal/poll.execIO(0xc0006971a0, 0xc00021f758)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc00021f728 sp=0xc00021f6b0 pc=0x7ff7b40cd205
internal/poll.(*FD).acceptOne(0xc000697188, 0x2c8, {0xc0005b81e0?, 0xc00021f7b8?, 0x7ff7b40d4ec5?}, 0xc00021f7ec?)
	internal/poll/fd_windows.go:946 +0x65 fp=0xc00021f788 sp=0xc00021f728 pc=0x7ff7b40d1785
internal/poll.(*FD).Accept(0xc000697188, 0xc00021f938)
	internal/poll/fd_windows.go:980 +0x1b6 fp=0xc00021f840 sp=0xc00021f788 pc=0x7ff7b40d1ab6
net.(*netFD).accept(0xc000697188)
	net/fd_windows.go:182 +0x4b fp=0xc00021f958 sp=0xc00021f840 pc=0x7ff7b414330b
net.(*TCPListener).accept(0xc0003d4f40)
	net/tcpsock_posix.go:159 +0x1b fp=0xc00021f9a8 sp=0xc00021f958 pc=0x7ff7b41598bb
net.(*TCPListener).Accept(0xc0003d4f40)
	net/tcpsock.go:380 +0x30 fp=0xc00021f9d8 sp=0xc00021f9a8 pc=0x7ff7b4158670
net/http.(*onceCloseListener).Accept(0xc00059c480?)
	<autogenerated>:1 +0x24 fp=0xc00021f9f0 sp=0xc00021f9d8 pc=0x7ff7b4371d84
net/http.(*Server).Serve(0xc000143900, {0x7ff7b5695770, 0xc0003d4f40})
	net/http/server.go:3424 +0x30c fp=0xc00021fb20 sp=0xc00021f9f0 pc=0x7ff7b434964c
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000a2020, 0x4, 0x6})
	github.com/ollama/ollama/runner/llamarunner/runner.go:1002 +0x8f5 fp=0xc00021fcf0 sp=0xc00021fb20 pc=0x7ff7b4521875
github.com/ollama/ollama/runner.Execute({0xc0000a2010?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x145 fp=0xc00021fd30 sp=0xc00021fcf0 pc=0x7ff7b45f3c65
github.com/ollama/ollama/cmd.NewCLI.func3(0xc000143500?, {0x7ff7b548d93d?, 0x4?, 0x7ff7b548d941?})
	github.com/ollama/ollama/cmd/cmd.go:1979 +0x45 fp=0xc00021fd58 sp=0xc00021fd30 pc=0x7ff7b4ddad25
github.com/spf13/cobra.(*Command).execute(0xc00002d508, {0xc0003d4cc0, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00021fe78 sp=0xc00021fd58 pc=0x7ff7b41be4dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc000640f08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00021ff30 sp=0xc00021fe78 pc=0x7ff7b41bed25
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00021ff50 sp=0xc00021ff30 pc=0x7ff7b4ddb80d
runtime.main()
	runtime/proc.go:283 +0x27d fp=0xc00021ffe0 sp=0xc00021ff50 pc=0x7ff7b4004ddd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x7ff7b403d8e1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00006ffa8 sp=0xc00006ff88 pc=0x7ff7b403598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc00006ffe0 sp=0xc00006ffa8 pc=0x7ff7b40050f8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x7ff7b403d8e1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000071f80 sp=0xc000071f60 pc=0x7ff7b403598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc00007e000)
	runtime/mgcsweep.go:316 +0xdf fp=0xc000071fc8 sp=0xc000071f80 pc=0x7ff7b3fedebf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x7ff7b3fe2285
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x7ff7b403d8e1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff7b5680ce8?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x7ff7b403598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff7b60c42a0)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x7ff7b3feb909
runtime.bgscavenge(0xc00007e000)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x7ff7b3febe99
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x7ff7b3fe2225
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x7ff7b403d8e1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000087e30 sp=0xc000087e10 pc=0x7ff7b403598e
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc000087fe0 sp=0xc000087e30 pc=0x7ff7b3fe1207
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x7ff7b403d8e1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc0000d7a40?, 0xc000510018?, 0x60?, 0x3f?, 0x7ff7b412c008?)
	runtime/proc.go:435 +0xce fp=0xc000073f18 sp=0xc000073ef8 pc=0x7ff7b403598e
runtime.chanrecv(0xc00003a460, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc000073f90 sp=0xc000073f18 pc=0x7ff7b3fd2d45
runtime.chanrecv1(0x7ff7b4004f40?, 0xc000073f76?)
	runtime/chan.go:506 +0x12 fp=0xc000073fb8 sp=0xc000073f90 pc=0x7ff7b3fd28d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc000073fe0 sp=0xc000073fb8 pc=0x7ff7b3fe54af
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x7ff7b403d8e1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0003da8c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc0002081c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000213f38 sp=0xc000213f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000213fc8 sp=0xc000213f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000213fe0 sp=0xc000213fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000213fe8 sp=0xc000213fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc000208380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000215f38 sp=0xc000215f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000215fc8 sp=0xc000215f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000215fe0 sp=0xc000215fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000215fe8 sp=0xc000215fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000208540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00020ff38 sp=0xc00020ff18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc00020ffc8 sp=0xc00020ff38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00020ffe0 sp=0xc00020ffc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00020ffe8 sp=0xc00020ffe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000208700 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x3?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000211f38 sp=0xc000211f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000211fc8 sp=0xc000211f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000211fe0 sp=0xc000211fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000211fe8 sp=0xc000211fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000484000 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x3?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00048bf38 sp=0xc00048bf18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc00048bfc8 sp=0xc00048bf38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00048bfe0 sp=0xc00048bfc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00048bfe8 sp=0xc00048bfe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc0004841c0 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00048df38 sp=0xc00048df18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc00048dfc8 sp=0xc00048df38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00048dfe0 sp=0xc00048dfc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00048dfe8 sp=0xc00048dfe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc0002088c0 m=nil [GC worker (idle)]:
runtime.gopark(0xabfeed425c?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000487f38 sp=0xc000487f18 pc=0x7ff7b403598e
runtime.gcBgMarkWorker(0xc00003ba40)
	runtime/mgc.go:1423 +0xe9 fp=0xc000487fc8 sp=0xc000487f38 pc=0x7ff7b3fe47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000487fe0 sp=0xc000487fc8 pc=0x7ff7b3fe4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000487fe8 sp=0xc000487fe0 pc=0x7ff7b403d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc000484700 m=nil [select]:
runtime.gopark(0xc000049a78?, 0x2?, 0xa?, 0x0?, 0xc0000498cc?)
	runtime/proc.go:435 +0xce fp=0xc000049700 sp=0xc0000496e0 pc=0x7ff7b403598e
runtime.selectgo(0xc000049a78, 0xc0000498c8, 0xb99?, 0x0, 0x1?, 0x1)
	runtime/select.go:351 +0x837 fp=0xc000049838 sp=0xc000049700 pc=0x7ff7b4016437
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0000d6be0, {0x7ff7b5695920, 0xc0000007e0}, 0xc000192640)
	github.com/ollama/ollama/runner/llamarunner/runner.go:716 +0xbe5 fp=0xc000049ac0 sp=0xc000049838 pc=0x7ff7b451ea85
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x7ff7b5695920?, 0xc0000007e0?}, 0xc000049b40?)
	<autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x7ff7b4521ef6
net/http.HandlerFunc.ServeHTTP(0xc000626f00?, {0x7ff7b5695920?, 0xc0000007e0?}, 0xc000049b60?)
	net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x7ff7b4345c89
net/http.(*ServeMux).ServeHTTP(0x7ff7b3fdb785?, {0x7ff7b5695920, 0xc0000007e0}, 0xc000192640)
	net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x7ff7b4347b84
net/http.serverHandler.ServeHTTP({0x7ff7b5691d70?}, {0x7ff7b5695920?, 0xc0000007e0?}, 0x1?)
	net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x7ff7b436560e
net/http.(*conn).serve(0xc00059c480, {0x7ff7b5697df8, 0xc000451b00})
	net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x7ff7b4344185
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x7ff7b4349a48
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x7ff7b403d8e1
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485

goroutine 27 gp=0xc0004848c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc000697420?, 0xc8?, 0x74?, 0xc0006974cc?)
	runtime/proc.go:435 +0xce fp=0xc00049fd58 sp=0xc00049fd38 pc=0x7ff7b403598e
runtime.netpollblock(0x41c?, 0xb3fd0406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00049fd90 sp=0xc00049fd58 pc=0x7ff7b3ffbdf7
internal/poll.runtime_pollWait(0x200fde7a7f8, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00049fdb0 sp=0xc00049fd90 pc=0x7ff7b4034b25
internal/poll.(*pollDesc).wait(0x41c?, 0x72?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00049fdd8 sp=0xc00049fdb0 pc=0x7ff7b40cbda7
internal/poll.execIO(0xc000697420, 0x7ff7b550c9c8)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc00049fe50 sp=0xc00049fdd8 pc=0x7ff7b40cd205
internal/poll.(*FD).Read(0xc000697408, {0xc000671211, 0x1, 0x1})
	internal/poll/fd_windows.go:438 +0x29b fp=0xc00049fef0 sp=0xc00049fe50 pc=0x7ff7b40cdedb
net.(*netFD).Read(0xc000697408, {0xc000671211?, 0xc00061d658?, 0xc00049ff70?})
	net/fd_posix.go:55 +0x25 fp=0xc00049ff38 sp=0xc00049fef0 pc=0x7ff7b41411e5
net.(*conn).Read(0xc000076550, {0xc000671211?, 0x0?, 0x0?})
	net/net.go:194 +0x45 fp=0xc00049ff80 sp=0xc00049ff38 pc=0x7ff7b4150905
net/http.(*connReader).backgroundRead(0xc000671200)
	net/http/server.go:690 +0x37 fp=0xc00049ffc8 sp=0xc00049ff80 pc=0x7ff7b433e057
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc00049ffe0 sp=0xc00049ffc8 pc=0x7ff7b433df85
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00049ffe8 sp=0xc00049ffe0 pc=0x7ff7b403d8e1
created by net/http.(*connReader).startBackgroundRead in goroutine 9
	net/http/server.go:686 +0xb6

rax     0xfffffffe
rbx     0xe88d8f9498
rcx     0x0
rdx     0xe88d8f8de0
rdi     0xe06d7363
rsi     0x1
rbp     0x4
rsp     0xe88d8f9370
r8      0x1
r9      0xe06d7363
r10     0x200cc405fd0
r11     0x200fc7c0000
r12     0x0
r13     0x200c93b2b30
r14     0x2000
r15     0x0
rip     0x7fff2d51a80a
rflags  0x202
cs      0x33
fs      0x53
gs      0x2b

time=2026-02-11T21:38:19.806+01:00 level=ERROR source=server.go:1610 msg="post predict" error="Post \"http://127.0.0.1:50687/completion\": read tcp 127.0.0.1:50698->127.0.0.1:50687: wsarecv: An existing connection was forcibly closed by the remote host."
[GIN] 2026/02/11 - 21:38:19 | 500 | 42.2943218s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/02/11 - 21:38:24 | 200 | 2.8609ms | 127.0.0.1 | GET "/api/tags"
```
</details>

@Josejsi commented on GitHub (Feb 12, 2026):

Update: Tested on v0.16.1 - Issue Persists

The issue remains even after updating to version 0.16.1. Ollama correctly detects the Intel Arc 140V and loads the model into VRAM (approx. 5GB for Qwen2.5 7B), but crashes during inference.

Key Technical Details from Logs:

  • Backend: Vulkan (ggml-vulkan.dll).
  • Error Code: Exception 0xe06d7363 at rip 0x7ff88155a80a.
  • Hardware: Intel(R) Arc(TM) 140V GPU (16GB).
  • Symptom: The runner subprocess terminates abruptly when POST /api/chat is called.

Full debug logs show that ggml_vulkan is properly initialized but fails during the actual compute execution.

Click to view full Ollama Debug Log
PS C:\Windows\system32\WindowsPowerShell\v1.0> $env:OLLAMA_VULKAN
1
PS C:\Windows\system32\WindowsPowerShell\v1.0> ollama serve
time=2026-02-13T09:13:05.588+01:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\josej\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES:]"
time=2026-02-13T09:13:05.605+01:00 level=INFO source=images.go:473 msg="total blobs: 10"
time=2026-02-13T09:13:05.606+01:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-13T09:13:05.608+01:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.16.1)"
time=2026-02-13T09:13:05.608+01:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2026-02-13T09:13:05.611+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-13T09:13:05.635+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51404"
time=2026-02-13T09:13:05.637+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12
time=2026-02-13T09:13:05.784+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=169.127ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=map[]
time=2026-02-13T09:13:05.785+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51411"
time=2026-02-13T09:13:05.786+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-02-13T09:13:05.962+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=177.9347ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[]
time=2026-02-13T09:13:05.963+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51417"
time=2026-02-13T09:13:05.964+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2026-02-13T09:13:06.083+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=120.352ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[]
time=2026-02-13T09:13:06.084+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51423"
time=2026-02-13T09:13:06.084+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan
time=2026-02-13T09:13:06.564+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=480.5851ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan]" extra_envs=map[]
time=2026-02-13T09:13:06.564+01:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
time=2026-02-13T09:13:06.564+01:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=954.3483ms
time=2026-02-13T09:13:06.564+01:00 level=INFO source=types.go:42 msg="inference compute" id=8680a064-0400-0000-0002-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Arc(TM) 140V GPU (16GB)" libdirs=ollama,vulkan driver=0.0 pci_id="" type=iGPU total="18.1 GiB" available="17.0 GiB"
time=2026-02-13T09:13:06.564+01:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="18.1 GiB" default_num_ctx=4096
[GIN] 2026/02/13 - 09:13:52 | 404 |       1.641ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-13T09:13:52.294+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/02/13 - 09:13:52 | 200 |    125.6741ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-13T09:13:52.298+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/02/13 - 09:13:52 | 200 |     129.444ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/02/13 - 09:13:56 | 404 |      1.6212ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-13T09:13:56.562+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
time=2026-02-13T09:13:56.563+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/02/13 - 09:13:56 | 200 |    124.8013ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/02/13 - 09:13:56 | 200 |    125.5029ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-13T09:14:00.815+01:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-02-13T09:14:00.815+01:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-02-13T09:14:00.830+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60299"
time=2026-02-13T09:14:00.830+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan
time=2026-02-13T09:14:01.346+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=530.3492ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan]" extra_envs=map[]
time=2026-02-13T09:14:01.346+01:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=530.8737ms
time=2026-02-13T09:14:01.346+01:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-13T09:14:01.346+01:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-02-13T09:14:01.347+01:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=4 threads=8
time=2026-02-13T09:14:01.347+01:00 level=DEBUG source=sched.go:195 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-02-13T09:14:01.362+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
time=2026-02-13T09:14:01.362+01:00 level=DEBUG source=sched.go:231 msg="loading first model" model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 4.36 GiB (4.91 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-02-13T09:14:01.704+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\josej\\.ollama\\models\\blobs\\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --port 60306"
time=2026-02-13T09:14:01.705+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan
time=2026-02-13T09:14:01.743+01:00 level=INFO source=sched.go:463 msg="system memory" total="31.5 GiB" free="15.6 GiB" free_swap="15.9 GiB"
time=2026-02-13T09:14:01.751+01:00 level=INFO source=sched.go:470 msg="gpu memory" id=8680a064-0400-0000-0002-000000000000 library=Vulkan available="16.4 GiB" free="16.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-13T09:14:01.751+01:00 level=INFO source=server.go:498 msg="loading model" "model layers"=29 requested=-1
time=2026-02-13T09:14:01.752+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=qwen2.attention.key_length default=128
time=2026-02-13T09:14:01.752+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=qwen2.attention.value_length default=128
time=2026-02-13T09:14:01.753+01:00 level=DEBUG source=ggml.go:635 msg="default cache size estimate" "attention MiB"=448 "attention bytes"=469762048 "recurrent MiB"=0 "recurrent bytes"=0
time=2026-02-13T09:14:01.755+01:00 level=DEBUG source=server.go:976 msg="available gpu" id=8680a064-0400-0000-0002-000000000000 library=Vulkan "available layer vram"="16.3 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
time=2026-02-13T09:14:01.756+01:00 level=DEBUG source=server.go:976 msg="available gpu" id=8680a064-0400-0000-0002-000000000000 library=Vulkan "available layer vram"="15.5 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="730.4 MiB"
time=2026-02-13T09:14:01.756+01:00 level=DEBUG source=server.go:669 msg=memory estimate.Vulkan0.ID=8680a064-0400-0000-0002-000000000000 estimate.Vulkan0.Weights="[149112832 149112832 149112832 131135488 131135488 149112832 131135488 148639744 131135488 131608576 131135488 131608576 148639744 131135488 149112832 131135488 131135488 149112832 131135488 131135488 149112832 131135488 131135488 149112832 149112832 149112832 149112832 149112832 447082496]" estimate.Vulkan0.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" estimate.Vulkan0.Graph=501221376
time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="4.1 GiB"
time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="448.0 MiB"
time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="478.0 MiB"
time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:272 msg="total memory" size="5.0 GiB"
time=2026-02-13T09:14:01.786+01:00 level=INFO source=runner.go:965 msg="starting go runner"
time=2026-02-13T09:14:01.799+01:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2026-02-13T09:14:01.813+01:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 140V GPU (16GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan\ggml-vulkan.dll
time=2026-02-13T09:14:01.852+01:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2026-02-13T09:14:01.854+01:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:60306"
time=2026-02-13T09:14:01.857+01:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:8192 KvCacheType: NumThreads:4 GPULayers:29[ID:8680a064-0400-0000-0002-000000000000 Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1340088320.00 bytes (1.25 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18102184427 total: 19442272747
time=2026-02-13T09:14:02.045+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
time=2026-02-13T09:14:02.045+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1298079744.00 bytes (1.21 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18144193003 total: 19442272747
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1298079744.00 bytes (1.21 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18144193003 total: 19442272747
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Arc(TM) 140V GPU (16GB)) (unknown id) - 17303 MiB free
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 4.36 GiB (4.91 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 3584
print_info: n_embd_inp       = 3584
print_info: n_layer          = 28
print_info: n_head           = 28
print_info: n_head_kv        = 4
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 7
print_info: n_embd_k_gqa     = 512
print_info: n_embd_v_gqa     = 512
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 18944
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = -1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 7B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer   0 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   1 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   2 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   3 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   4 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   5 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   6 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   7 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   8 assigned to device Vulkan0, is_swa = 0
load_tensors: layer   9 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  10 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  11 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  12 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  13 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  14 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  15 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  16 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  17 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  18 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  19 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  20 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  21 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  22 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  23 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  24 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  25 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  26 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  27 assigned to device Vulkan0, is_swa = 0
load_tensors: layer  28 assigned to device Vulkan0, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor output_norm.weight
create_tensor: loading tensor output.weight
create_tensor: loading tensor blk.0.attn_norm.weight
create_tensor: loading tensor blk.0.attn_q.weight
create_tensor: loading tensor blk.0.attn_k.weight
create_tensor: loading tensor blk.0.attn_v.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_q.bias
create_tensor: loading tensor blk.0.attn_k.bias
create_tensor: loading tensor blk.0.attn_v.bias
create_tensor: loading tensor blk.0.ffn_norm.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.1.attn_norm.weight
create_tensor: loading tensor blk.1.attn_q.weight
create_tensor: loading tensor blk.1.attn_k.weight
create_tensor: loading tensor blk.1.attn_v.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_q.bias
create_tensor: loading tensor blk.1.attn_k.bias
create_tensor: loading tensor blk.1.attn_v.bias
create_tensor: loading tensor blk.1.ffn_norm.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.2.attn_norm.weight
create_tensor: loading tensor blk.2.attn_q.weight
create_tensor: loading tensor blk.2.attn_k.weight
create_tensor: loading tensor blk.2.attn_v.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_q.bias
create_tensor: loading tensor blk.2.attn_k.bias
create_tensor: loading tensor blk.2.attn_v.bias
create_tensor: loading tensor blk.2.ffn_norm.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.3.attn_norm.weight
create_tensor: loading tensor blk.3.attn_q.weight
create_tensor: loading tensor blk.3.attn_k.weight
create_tensor: loading tensor blk.3.attn_v.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_q.bias
create_tensor: loading tensor blk.3.attn_k.bias
create_tensor: loading tensor blk.3.attn_v.bias
create_tensor: loading tensor blk.3.ffn_norm.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.4.attn_norm.weight
create_tensor: loading tensor blk.4.attn_q.weight
create_tensor: loading tensor blk.4.attn_k.weight
create_tensor: loading tensor blk.4.attn_v.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_q.bias
create_tensor: loading tensor blk.4.attn_k.bias
create_tensor: loading tensor blk.4.attn_v.bias
create_tensor: loading tensor blk.4.ffn_norm.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.5.attn_norm.weight
create_tensor: loading tensor blk.5.attn_q.weight
create_tensor: loading tensor blk.5.attn_k.weight
create_tensor: loading tensor blk.5.attn_v.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_q.bias
create_tensor: loading tensor blk.5.attn_k.bias
create_tensor: loading tensor blk.5.attn_v.bias
create_tensor: loading tensor blk.5.ffn_norm.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.6.attn_norm.weight
create_tensor: loading tensor blk.6.attn_q.weight
create_tensor: loading tensor blk.6.attn_k.weight
create_tensor: loading tensor blk.6.attn_v.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_q.bias
create_tensor: loading tensor blk.6.attn_k.bias
create_tensor: loading tensor blk.6.attn_v.bias
create_tensor: loading tensor blk.6.ffn_norm.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.7.attn_norm.weight
create_tensor: loading tensor blk.7.attn_q.weight
create_tensor: loading tensor blk.7.attn_k.weight
create_tensor: loading tensor blk.7.attn_v.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_q.bias
create_tensor: loading tensor blk.7.attn_k.bias
create_tensor: loading tensor blk.7.attn_v.bias
create_tensor: loading tensor blk.7.ffn_norm.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.8.attn_norm.weight
create_tensor: loading tensor blk.8.attn_q.weight
create_tensor: loading tensor blk.8.attn_k.weight
create_tensor: loading tensor blk.8.attn_v.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_q.bias
create_tensor: loading tensor blk.8.attn_k.bias
create_tensor: loading tensor blk.8.attn_v.bias
create_tensor: loading tensor blk.8.ffn_norm.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.9.attn_norm.weight
create_tensor: loading tensor blk.9.attn_q.weight
create_tensor: loading tensor blk.9.attn_k.weight
create_tensor: loading tensor blk.9.attn_v.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_q.bias
create_tensor: loading tensor blk.9.attn_k.bias
create_tensor: loading tensor blk.9.attn_v.bias
create_tensor: loading tensor blk.9.ffn_norm.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.10.attn_norm.weight
create_tensor: loading tensor blk.10.attn_q.weight
create_tensor: loading tensor blk.10.attn_k.weight
create_tensor: loading tensor blk.10.attn_v.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_q.bias
create_tensor: loading tensor blk.10.attn_k.bias
create_tensor: loading tensor blk.10.attn_v.bias
create_tensor: loading tensor blk.10.ffn_norm.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.11.attn_norm.weight
create_tensor: loading tensor blk.11.attn_q.weight
create_tensor: loading tensor blk.11.attn_k.weight
create_tensor: loading tensor blk.11.attn_v.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_q.bias
create_tensor: loading tensor blk.11.attn_k.bias
create_tensor: loading tensor blk.11.attn_v.bias
create_tensor: loading tensor blk.11.ffn_norm.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.12.attn_norm.weight
create_tensor: loading tensor blk.12.attn_q.weight
create_tensor: loading tensor blk.12.attn_k.weight
create_tensor: loading tensor blk.12.attn_v.weight
create_tensor: loading tensor blk.12.attn_output.weight
create_tensor: loading tensor blk.12.attn_q.bias
create_tensor: loading tensor blk.12.attn_k.bias
create_tensor: loading tensor blk.12.attn_v.bias
create_tensor: loading tensor blk.12.ffn_norm.weight
create_tensor: loading tensor blk.12.ffn_gate.weight
create_tensor: loading tensor blk.12.ffn_down.weight
create_tensor: loading tensor blk.12.ffn_up.weight
create_tensor: loading tensor blk.13.attn_norm.weight
create_tensor: loading tensor blk.13.attn_q.weight
create_tensor: loading tensor blk.13.attn_k.weight
create_tensor: loading tensor blk.13.attn_v.weight
create_tensor: loading tensor blk.13.attn_output.weight
create_tensor: loading tensor blk.13.attn_q.bias
create_tensor: loading tensor blk.13.attn_k.bias
create_tensor: loading tensor blk.13.attn_v.bias
create_tensor: loading tensor blk.13.ffn_norm.weight
create_tensor: loading tensor blk.13.ffn_gate.weight
create_tensor: loading tensor blk.13.ffn_down.weight
create_tensor: loading tensor blk.13.ffn_up.weight
create_tensor: loading tensor blk.14.attn_norm.weight
create_tensor: loading tensor blk.14.attn_q.weight
create_tensor: loading tensor blk.14.attn_k.weight
create_tensor: loading tensor blk.14.attn_v.weight
create_tensor: loading tensor blk.14.attn_output.weight
create_tensor: loading tensor blk.14.attn_q.bias
create_tensor: loading tensor blk.14.attn_k.bias
create_tensor: loading tensor blk.14.attn_v.bias
create_tensor: loading tensor blk.14.ffn_norm.weight
create_tensor: loading tensor blk.14.ffn_gate.weight
create_tensor: loading tensor blk.14.ffn_down.weight
create_tensor: loading tensor blk.14.ffn_up.weight
create_tensor: loading tensor blk.15.attn_norm.weight
create_tensor: loading tensor blk.15.attn_q.weight
create_tensor: loading tensor blk.15.attn_k.weight
create_tensor: loading tensor blk.15.attn_v.weight
create_tensor: loading tensor blk.15.attn_output.weight
create_tensor: loading tensor blk.15.attn_q.bias
create_tensor: loading tensor blk.15.attn_k.bias
create_tensor: loading tensor blk.15.attn_v.bias
create_tensor: loading tensor blk.15.ffn_norm.weight
create_tensor: loading tensor blk.15.ffn_gate.weight
create_tensor: loading tensor blk.15.ffn_down.weight
create_tensor: loading tensor blk.15.ffn_up.weight
create_tensor: loading tensor blk.16.attn_norm.weight
create_tensor: loading tensor blk.16.attn_q.weight
create_tensor: loading tensor blk.16.attn_k.weight
create_tensor: loading tensor blk.16.attn_v.weight
create_tensor: loading tensor blk.16.attn_output.weight
create_tensor: loading tensor blk.16.attn_q.bias
create_tensor: loading tensor blk.16.attn_k.bias
create_tensor: loading tensor blk.16.attn_v.bias
create_tensor: loading tensor blk.16.ffn_norm.weight
create_tensor: loading tensor blk.16.ffn_gate.weight
create_tensor: loading tensor blk.16.ffn_down.weight
create_tensor: loading tensor blk.16.ffn_up.weight
create_tensor: loading tensor blk.17.attn_norm.weight
create_tensor: loading tensor blk.17.attn_q.weight
create_tensor: loading tensor blk.17.attn_k.weight
create_tensor: loading tensor blk.17.attn_v.weight
create_tensor: loading tensor blk.17.attn_output.weight
create_tensor: loading tensor blk.17.attn_q.bias
create_tensor: loading tensor blk.17.attn_k.bias
create_tensor: loading tensor blk.17.attn_v.bias
create_tensor: loading tensor blk.17.ffn_norm.weight
create_tensor: loading tensor blk.17.ffn_gate.weight
create_tensor: loading tensor blk.17.ffn_down.weight
create_tensor: loading tensor blk.17.ffn_up.weight
create_tensor: loading tensor blk.18.attn_norm.weight
create_tensor: loading tensor blk.18.attn_q.weight
create_tensor: loading tensor blk.18.attn_k.weight
create_tensor: loading tensor blk.18.attn_v.weight
create_tensor: loading tensor blk.18.attn_output.weight
create_tensor: loading tensor blk.18.attn_q.bias
create_tensor: loading tensor blk.18.attn_k.bias
create_tensor: loading tensor blk.18.attn_v.bias
create_tensor: loading tensor blk.18.ffn_norm.weight
create_tensor: loading tensor blk.18.ffn_gate.weight
create_tensor: loading tensor blk.18.ffn_down.weight
create_tensor: loading tensor blk.18.ffn_up.weight
create_tensor: loading tensor blk.19.attn_norm.weight
create_tensor: loading tensor blk.19.attn_q.weight
create_tensor: loading tensor blk.19.attn_k.weight
create_tensor: loading tensor blk.19.attn_v.weight
create_tensor: loading tensor blk.19.attn_output.weight
create_tensor: loading tensor blk.19.attn_q.bias
create_tensor: loading tensor blk.19.attn_k.bias
create_tensor: loading tensor blk.19.attn_v.bias
create_tensor: loading tensor blk.19.ffn_norm.weight
create_tensor: loading tensor blk.19.ffn_gate.weight
create_tensor: loading tensor blk.19.ffn_down.weight
create_tensor: loading tensor blk.19.ffn_up.weight
create_tensor: loading tensor blk.20.attn_norm.weight
create_tensor: loading tensor blk.20.attn_q.weight
create_tensor: loading tensor blk.20.attn_k.weight
create_tensor: loading tensor blk.20.attn_v.weight
create_tensor: loading tensor blk.20.attn_output.weight
create_tensor: loading tensor blk.20.attn_q.bias
create_tensor: loading tensor blk.20.attn_k.bias
create_tensor: loading tensor blk.20.attn_v.bias
create_tensor: loading tensor blk.20.ffn_norm.weight
create_tensor: loading tensor blk.20.ffn_gate.weight
create_tensor: loading tensor blk.20.ffn_down.weight
create_tensor: loading tensor blk.20.ffn_up.weight
create_tensor: loading tensor blk.21.attn_norm.weight
create_tensor: loading tensor blk.21.attn_q.weight
create_tensor: loading tensor blk.21.attn_k.weight
create_tensor: loading tensor blk.21.attn_v.weight
create_tensor: loading tensor blk.21.attn_output.weight
create_tensor: loading tensor blk.21.attn_q.bias
create_tensor: loading tensor blk.21.attn_k.bias
create_tensor: loading tensor blk.21.attn_v.bias
create_tensor: loading tensor blk.21.ffn_norm.weight
create_tensor: loading tensor blk.21.ffn_gate.weight
create_tensor: loading tensor blk.21.ffn_down.weight
create_tensor: loading tensor blk.21.ffn_up.weight
create_tensor: loading tensor blk.22.attn_norm.weight
create_tensor: loading tensor blk.22.attn_q.weight
create_tensor: loading tensor blk.22.attn_k.weight
create_tensor: loading tensor blk.22.attn_v.weight
create_tensor: loading tensor blk.22.attn_output.weight
create_tensor: loading tensor blk.22.attn_q.bias
create_tensor: loading tensor blk.22.attn_k.bias
create_tensor: loading tensor blk.22.attn_v.bias
create_tensor: loading tensor blk.22.ffn_norm.weight
create_tensor: loading tensor blk.22.ffn_gate.weight
create_tensor: loading tensor blk.22.ffn_down.weight
create_tensor: loading tensor blk.22.ffn_up.weight
create_tensor: loading tensor blk.23.attn_norm.weight
create_tensor: loading tensor blk.23.attn_q.weight
create_tensor: loading tensor blk.23.attn_k.weight
create_tensor: loading tensor blk.23.attn_v.weight
create_tensor: loading tensor blk.23.attn_output.weight
create_tensor: loading tensor blk.23.attn_q.bias
create_tensor: loading tensor blk.23.attn_k.bias
create_tensor: loading tensor blk.23.attn_v.bias
create_tensor: loading tensor blk.23.ffn_norm.weight
create_tensor: loading tensor blk.23.ffn_gate.weight
create_tensor: loading tensor blk.23.ffn_down.weight
create_tensor: loading tensor blk.23.ffn_up.weight
create_tensor: loading tensor blk.24.attn_norm.weight
create_tensor: loading tensor blk.24.attn_q.weight
create_tensor: loading tensor blk.24.attn_k.weight
create_tensor: loading tensor blk.24.attn_v.weight
create_tensor: loading tensor blk.24.attn_output.weight
create_tensor: loading tensor blk.24.attn_q.bias
create_tensor: loading tensor blk.24.attn_k.bias
create_tensor: loading tensor blk.24.attn_v.bias
create_tensor: loading tensor blk.24.ffn_norm.weight
create_tensor: loading tensor blk.24.ffn_gate.weight
create_tensor: loading tensor blk.24.ffn_down.weight
create_tensor: loading tensor blk.24.ffn_up.weight
create_tensor: loading tensor blk.25.attn_norm.weight
create_tensor: loading tensor blk.25.attn_q.weight
create_tensor: loading tensor blk.25.attn_k.weight
create_tensor: loading tensor blk.25.attn_v.weight
create_tensor: loading tensor blk.25.attn_output.weight
create_tensor: loading tensor blk.25.attn_q.bias
create_tensor: loading tensor blk.25.attn_k.bias
create_tensor: loading tensor blk.25.attn_v.bias
create_tensor: loading tensor blk.25.ffn_norm.weight
create_tensor: loading tensor blk.25.ffn_gate.weight
create_tensor: loading tensor blk.25.ffn_down.weight
create_tensor: loading tensor blk.25.ffn_up.weight
create_tensor: loading tensor blk.26.attn_norm.weight
create_tensor: loading tensor blk.26.attn_q.weight
create_tensor: loading tensor blk.26.attn_k.weight
create_tensor: loading tensor blk.26.attn_v.weight
create_tensor: loading tensor blk.26.attn_output.weight
create_tensor: loading tensor blk.26.attn_q.bias
create_tensor: loading tensor blk.26.attn_k.bias
create_tensor: loading tensor blk.26.attn_v.bias
create_tensor: loading tensor blk.26.ffn_norm.weight
create_tensor: loading tensor blk.26.ffn_gate.weight
create_tensor: loading tensor blk.26.ffn_down.weight
create_tensor: loading tensor blk.26.ffn_up.weight
create_tensor: loading tensor blk.27.attn_norm.weight
create_tensor: loading tensor blk.27.attn_q.weight
create_tensor: loading tensor blk.27.attn_k.weight
create_tensor: loading tensor blk.27.attn_v.weight
create_tensor: loading tensor blk.27.attn_output.weight
create_tensor: loading tensor blk.27.attn_q.bias
create_tensor: loading tensor blk.27.attn_k.bias
create_tensor: loading tensor blk.27.attn_v.bias
create_tensor: loading tensor blk.27.ffn_norm.weight
create_tensor: loading tensor blk.27.ffn_gate.weight
create_tensor: loading tensor blk.27.ffn_down.weight
create_tensor: loading tensor blk.27.ffn_up.weight
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1296371712.00 bytes (1.21 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18145901035 total: 19442272747
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 2340569088.00 bytes (2.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 17101703659 total: 19442272747
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors:      Vulkan0 model buffer size =  4168.09 MiB
load_tensors:  Vulkan_Host model buffer size =   292.36 MiB
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 5983940608.00 bytes (5.57 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 13458332139 total: 19442272747
load_all_data: device Vulkan0 does not support async, host buffers or events
time=2026-02-13T09:14:03.300+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.00"
time=2026-02-13T09:14:03.802+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.11"
time=2026-02-13T09:14:04.053+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.16"
time=2026-02-13T09:14:04.304+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.22"
time=2026-02-13T09:14:04.555+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.26"
time=2026-02-13T09:14:04.807+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.32"
time=2026-02-13T09:14:05.058+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.36"
time=2026-02-13T09:14:05.309+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.40"
time=2026-02-13T09:14:05.560+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.44"
time=2026-02-13T09:14:05.813+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.47"
time=2026-02-13T09:14:06.068+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.51"
time=2026-02-13T09:14:06.319+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.55"
time=2026-02-13T09:14:06.570+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.60"
time=2026-02-13T09:14:06.821+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.63"
time=2026-02-13T09:14:07.072+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.68"
time=2026-02-13T09:14:07.323+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.72"
time=2026-02-13T09:14:07.574+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.77"
time=2026-02-13T09:14:07.825+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.81"
time=2026-02-13T09:14:08.075+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.86"
time=2026-02-13T09:14:08.328+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.91"
load_all_data: buffer type Vulkan_Host is not the default buffer type for device Vulkan0 for async uploads
time=2026-02-13T09:14:08.579+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.93"
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 8192
llama_context: n_ctx_seq     = 8192
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context: Vulkan_Host  output buffer size =     0.59 MiB
llama_kv_cache: layer   0: dev = Vulkan0
llama_kv_cache: layer   1: dev = Vulkan0
llama_kv_cache: layer   2: dev = Vulkan0
llama_kv_cache: layer   3: dev = Vulkan0
llama_kv_cache: layer   4: dev = Vulkan0
llama_kv_cache: layer   5: dev = Vulkan0
llama_kv_cache: layer   6: dev = Vulkan0
llama_kv_cache: layer   7: dev = Vulkan0
llama_kv_cache: layer   8: dev = Vulkan0
llama_kv_cache: layer   9: dev = Vulkan0
llama_kv_cache: layer  10: dev = Vulkan0
llama_kv_cache: layer  11: dev = Vulkan0
llama_kv_cache: layer  12: dev = Vulkan0
llama_kv_cache: layer  13: dev = Vulkan0
llama_kv_cache: layer  14: dev = Vulkan0
llama_kv_cache: layer  15: dev = Vulkan0
llama_kv_cache: layer  16: dev = Vulkan0
llama_kv_cache: layer  17: dev = Vulkan0
llama_kv_cache: layer  18: dev = Vulkan0
llama_kv_cache: layer  19: dev = Vulkan0
llama_kv_cache: layer  20: dev = Vulkan0
llama_kv_cache: layer  21: dev = Vulkan0
llama_kv_cache: layer  22: dev = Vulkan0
llama_kv_cache: layer  23: dev = Vulkan0
llama_kv_cache: layer  24: dev = Vulkan0
llama_kv_cache: layer  25: dev = Vulkan0
llama_kv_cache: layer  26: dev = Vulkan0
llama_kv_cache: layer  27: dev = Vulkan0
llama_kv_cache:    Vulkan0 KV buffer size =   448.00 MiB
time=2026-02-13T09:14:08.831+01:00 level=DEBUG source=server.go:1394 msg="model load progress 1.00"
llama_kv_cache: size =  448.00 MiB (  8192 cells,  28 layers,  1/1 seqs), K (f16):  224.00 MiB, V (f16):  224.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 2
llama_context: max_nodes = 2712
llama_context: reserving full memory module
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
llama_context: Flash Attention was auto, set to enabled
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
time=2026-02-13T09:14:09.082+01:00 level=DEBUG source=server.go:1397 msg="model load completed, waiting for server to become available" status="llm server loading model"
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
llama_context:    Vulkan0 compute buffer size =   304.00 MiB
llama_context: Vulkan_Host compute buffer size =    23.01 MiB
llama_context: graph nodes  = 959
llama_context: graph splits = 2
time=2026-02-13T09:14:09.333+01:00 level=INFO source=server.go:1388 msg="llama runner started in 7.59 seconds"
time=2026-02-13T09:14:09.333+01:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-13T09:14:09.333+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-13T09:14:09.334+01:00 level=INFO source=server.go:1388 msg="llama runner started in 7.59 seconds"
time=2026-02-13T09:14:09.334+01:00 level=DEBUG source=sched.go:549 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen2.5:7b runner.inference="[{ID:8680a064-0400-0000-0002-000000000000 Library:Vulkan}]" runner.size="5.0 GiB" runner.vram="5.0 GiB" runner.parallel=1 runner.pid=14564 runner.model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 runner.num_ctx=8192
time=2026-02-13T09:14:09.352+01:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=13350 format=""
time=2026-02-13T09:14:09.370+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=2968 used=0 remaining=2968
Exception 0xe06d7363 0x19930520 0x970f0f93c0 0x7ff88155a80a
PC=0x7ff88155a80a
signal arrived during external code execution

runtime.cgocall(0x7ff6bb438c50, 0xc000209b88)
        runtime/cgocall.go:167 +0x3e fp=0xc000209b60 sp=0xc000209af8 pc=0x7ff6ba5b243e
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x227f16fa740, {0x200, 0x227ef86fa00, 0x0, 0x227ef870210, 0x227ef96e350, 0x227ef998eb0, 0x227ef882e10})
        _cgo_gotypes.go:677 +0x50 fp=0xc000209b88 sp=0xc000209b60 pc=0x7ff6baa56a70
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
        github.com/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0xc000209da0?, 0x1?)
        github.com/ollama/ollama/llama/llama.go:173 +0xed fp=0xc000209c70 sp=0xc000209b88 pc=0x7ff6baa59fad
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004b2280, 0xc00039c320, 0xc000209f28)
        github.com/ollama/ollama/runner/llamarunner/runner.go:494 +0x250 fp=0xc000209ee8 sp=0xc000209c70 pc=0x7ff6bab06ff0
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004b2280, {0x7ff6bbca8b70, 0xc0000d2b90})
        github.com/ollama/ollama/runner/llamarunner/runner.go:387 +0x1d5 fp=0xc000209fb8 sp=0xc000209ee8 pc=0x7ff6bab06c35
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x28 fp=0xc000209fe0 sp=0xc000209fb8 pc=0x7ff6bab0c008
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000209fe8 sp=0xc000209fe0 pc=0x7ff6ba5bd9a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]:
runtime.gopark(0x7ff6ba5bf1a0?, 0x7ff6bc73aa40?, 0x20?, 0x60?, 0xc0004e60cc?)
        runtime/proc.go:435 +0xce fp=0xc0003e5630 sp=0xc0003e5610 pc=0x7ff6ba5b598e
runtime.netpollblock(0x43c?, 0xba550406?, 0xf6?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0003e5668 sp=0xc0003e5630 pc=0x7ff6ba57bdf7
internal/poll.runtime_pollWait(0x227ed360190, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc0003e5688 sp=0xc0003e5668 pc=0x7ff6ba5b4b25
internal/poll.(*pollDesc).wait(0x7ff6ba64a8f3?, 0x0?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003e56b0 sp=0xc0003e5688 pc=0x7ff6ba64bee7
internal/poll.execIO(0xc0004e6020, 0xc00047f758)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc0003e5728 sp=0xc0003e56b0 pc=0x7ff6ba64d345
internal/poll.(*FD).acceptOne(0xc0004e6008, 0x448, {0xc0004de1e0?, 0xc00047f7b8?, 0x7ff6ba655005?}, 0xc00047f7ec?)
        internal/poll/fd_windows.go:946 +0x65 fp=0xc0003e5788 sp=0xc0003e5728 pc=0x7ff6ba6518c5
internal/poll.(*FD).Accept(0xc0004e6008, 0xc0003e5938)
        internal/poll/fd_windows.go:980 +0x1b6 fp=0xc0003e5840 sp=0xc0003e5788 pc=0x7ff6ba651bf6
net.(*netFD).accept(0xc0004e6008)
        net/fd_windows.go:182 +0x4b fp=0xc0003e5958 sp=0xc0003e5840 pc=0x7ff6ba6c344b
net.(*TCPListener).accept(0xc000051c40)
        net/tcpsock_posix.go:159 +0x1b fp=0xc0003e59a8 sp=0xc0003e5958 pc=0x7ff6ba6d99fb
net.(*TCPListener).Accept(0xc000051c40)
        net/tcpsock.go:380 +0x30 fp=0xc0003e59d8 sp=0xc0003e59a8 pc=0x7ff6ba6d87b0
net/http.(*onceCloseListener).Accept(0xc0004b61b0?)
        <autogenerated>:1 +0x24 fp=0xc0003e59f0 sp=0xc0003e59d8 pc=0x7ff6ba8f1ec4
net/http.(*Server).Serve(0xc0004b0700, {0x7ff6bbca6310, 0xc000051c40})
        net/http/server.go:3424 +0x30c fp=0xc0003e5b20 sp=0xc0003e59f0 pc=0x7ff6ba8c978c
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000a4020, 0x4, 0x6})
        github.com/ollama/ollama/runner/llamarunner/runner.go:1002 +0x8f5 fp=0xc0003e5cf0 sp=0xc0003e5b20 pc=0x7ff6bab0bd95
github.com/ollama/ollama/runner.Execute({0xc0000a4010?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:25 +0x1a5 fp=0xc0003e5d30 sp=0xc0003e5cf0 pc=0x7ff6babde6a5
github.com/ollama/ollama/cmd.NewCLI.func3(0xc0004b0200?, {0x7ff6bba8c247?, 0x4?, 0x7ff6bba8c24b?})
        github.com/ollama/ollama/cmd/cmd.go:2237 +0x45 fp=0xc0003e5d58 sp=0xc0003e5d30 pc=0x7ff6bb3c8045
github.com/spf13/cobra.(*Command).execute(0xc000236008, {0xc000051100, 0x4, 0x4})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0003e5e78 sp=0xc0003e5d58 pc=0x7ff6ba73e61c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000467208)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0003e5f30 sp=0xc0003e5e78 pc=0x7ff6ba73ee65
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0003e5f50 sp=0xc0003e5f30 pc=0x7ff6bb3c9dcd
runtime.main()
        runtime/proc.go:283 +0x27d fp=0xc0003e5fe0 sp=0xc0003e5f50 pc=0x7ff6ba584ddd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0003e5fe8 sp=0xc0003e5fe0 pc=0x7ff6ba5bd9a1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00006ffa8 sp=0xc00006ff88 pc=0x7ff6ba5b598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0xc00006ffe0 sp=0xc00006ffa8 pc=0x7ff6ba5850f8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x7ff6ba5bd9a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000071f80 sp=0xc000071f60 pc=0x7ff6ba5b598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0xc00007e000)
        runtime/mgcsweep.go:316 +0xdf fp=0xc000071fc8 sp=0xc000071f80 pc=0x7ff6ba56debf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x7ff6ba562285
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x7ff6ba5bd9a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff6bbc90f60?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x7ff6ba5b598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff6bc764640)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x7ff6ba56b909
runtime.bgscavenge(0xc00007e000)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x7ff6ba56be99
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x7ff6ba562225
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x7ff6ba5bd9a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x7ff6bbb0c910?, 0x0?, 0xc0?, 0x1000000010?)
        runtime/proc.go:435 +0xce fp=0xc000087e30 sp=0xc000087e10 pc=0x7ff6ba5b598e
runtime.runfinq()
        runtime/mfinal.go:196 +0x107 fp=0xc000087fe0 sp=0xc000087e30 pc=0x7ff6ba561207
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x7ff6ba5bd9a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc000171680?, 0xc0003a6030?, 0x60?, 0x3f?, 0x7ff6ba6ac148?)
        runtime/proc.go:435 +0xce fp=0xc000073f18 sp=0xc000073ef8 pc=0x7ff6ba5b598e
runtime.chanrecv(0xc00008a3f0, 0x0, 0x1)
        runtime/chan.go:664 +0x445 fp=0xc000073f90 sp=0xc000073f18 pc=0x7ff6ba552d45
runtime.chanrecv1(0x7ff6ba584f40?, 0xc000073f76?)
        runtime/chan.go:506 +0x12 fp=0xc000073fb8 sp=0xc000073f90 pc=0x7ff6ba5528d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x2f fp=0xc000073fe0 sp=0xc000073fb8 pc=0x7ff6ba5654af
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x7ff6ba5bd9a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0003da380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc0004861c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000491f38 sp=0xc000491f18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc000491fc8 sp=0xc000491f38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000491fe0 sp=0xc000491fc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000491fe8 sp=0xc000491fe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc000486380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000493f38 sp=0xc000493f18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc000493fc8 sp=0xc000493f38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000493fe0 sp=0xc000493fc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000493fe8 sp=0xc000493fe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000486540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00048df38 sp=0xc00048df18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc00048dfc8 sp=0xc00048df38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00048dfe0 sp=0xc00048dfc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00048dfe8 sp=0xc00048dfe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000486700 m=nil [GC worker (idle)]:
runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00048ff38 sp=0xc00048ff18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc00048ffc8 sp=0xc00048ff38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00048ffe0 sp=0xc00048ffc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00048ffe8 sp=0xc00048ffe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000206000 m=nil [GC worker (idle)]:
runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00020df38 sp=0xc00020df18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc00020dfc8 sp=0xc00020df38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00020dfe0 sp=0xc00020dfc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00020dfe8 sp=0xc00020dfe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0003da540 m=nil [GC worker (idle)]:
runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc0002061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00020ff38 sp=0xc00020ff18 pc=0x7ff6ba5b598e
runtime.gcBgMarkWorker(0xc00008bb90)
        runtime/mgc.go:1423 +0xe9 fp=0xc00020ffc8 sp=0xc00020ff38 pc=0x7ff6ba5647a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00020ffe0 sp=0xc00020ffc8 pc=0x7ff6ba564685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00020ffe8 sp=0xc00020ffe0 pc=0x7ff6ba5bd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 23 gp=0xc000206540 m=nil [select]:
runtime.gopark(0xc000049a78?, 0x2?, 0xa?, 0x0?, 0xc0000498cc?)
        runtime/proc.go:435 +0xce fp=0xc000049700 sp=0xc0000496e0 pc=0x7ff6ba5b598e
runtime.selectgo(0xc000049a78, 0xc0000498c8, 0xb98?, 0x0, 0x1?, 0x1)
        runtime/select.go:351 +0x837 fp=0xc000049838 sp=0xc000049700 pc=0x7ff6ba596437
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0004b2280, {0x7ff6bbca64c0, 0xc000001500}, 0xc0002cf900)
        github.com/ollama/ollama/runner/llamarunner/runner.go:716 +0xbe5 fp=0xc000049ac0 sp=0xc000049838 pc=0x7ff6bab08fa5
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x7ff6bbca64c0?, 0xc000001500?}, 0xc000049b40?)
        <autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x7ff6bab0c416
net/http.HandlerFunc.ServeHTTP(0xc0000c6000?, {0x7ff6bbca64c0?, 0xc000001500?}, 0xc000049b60?)
        net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x7ff6ba8c5dc9
net/http.(*ServeMux).ServeHTTP(0x7ff6ba55b785?, {0x7ff6bbca64c0, 0xc000001500}, 0xc0002cf900)
        net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x7ff6ba8c7cc4
net/http.serverHandler.ServeHTTP({0x7ff6bbca2730?}, {0x7ff6bbca64c0?, 0xc000001500?}, 0x1?)
        net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x7ff6ba8e574e
net/http.(*conn).serve(0xc0004b61b0, {0x7ff6bbca8b38, 0xc0001c73b0})
        net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x7ff6ba8c42c5
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x7ff6ba8c9b88
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x7ff6ba5bd9a1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3454 +0x485

goroutine 71 gp=0xc000506380 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc0004e62a0?, 0x48?, 0x63?, 0xc0004e634c?)
        runtime/proc.go:435 +0xce fp=0xc000251d58 sp=0xc000251d38 pc=0x7ff6ba5b598e
runtime.netpollblock(0x444?, 0xba550406?, 0xf6?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000251d90 sp=0xc000251d58 pc=0x7ff6ba57bdf7
internal/poll.runtime_pollWait(0x227ed360078, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000251db0 sp=0xc000251d90 pc=0x7ff6ba5b4b25
internal/poll.(*pollDesc).wait(0x444?, 0x72?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000251dd8 sp=0xc000251db0 pc=0x7ff6ba64bee7
internal/poll.execIO(0xc0004e62a0, 0x7ff6bbb0d0a8)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc000251e50 sp=0xc000251dd8 pc=0x7ff6ba64d345
internal/poll.(*FD).Read(0xc0004e6288, {0xc0003240a1, 0x1, 0x1})
        internal/poll/fd_windows.go:438 +0x29b fp=0xc000251ef0 sp=0xc000251e50 pc=0x7ff6ba64e01b
net.(*netFD).Read(0xc0004e6288, {0xc0003240a1?, 0xc0002ae058?, 0xc000251f70?})
        net/fd_posix.go:55 +0x25 fp=0xc000251f38 sp=0xc000251ef0 pc=0x7ff6ba6c1325
net.(*conn).Read(0xc000076240, {0xc0003240a1?, 0x0?, 0x0?})
        net/net.go:194 +0x45 fp=0xc000251f80 sp=0xc000251f38 pc=0x7ff6ba6d0a45
net/http.(*connReader).backgroundRead(0xc000324090)
        net/http/server.go:690 +0x37 fp=0xc000251fc8 sp=0xc000251f80 pc=0x7ff6ba8be197
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x25 fp=0xc000251fe0 sp=0xc000251fc8 pc=0x7ff6ba8be0c5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000251fe8 sp=0xc000251fe0 pc=0x7ff6ba5bd9a1
created by net/http.(*connReader).startBackgroundRead in goroutine 23
        net/http/server.go:686 +0xb6
rax     0xfffffffe
rbx     0x970f0f9348
rcx     0x0
rdx     0x970f0f8c90
rdi     0xe06d7363
rsi     0x1
rbp     0x4
rsp     0x970f0f9220
r8      0x1
r9      0xe06d7363
r10     0x7ff88135984c
r11     0x970f0f8cf0
r12     0x0
r13     0x227f6d53a70
r14     0x200
r15     0x0
rip     0x7ff88155a80a
rflags  0x202
cs      0x33
fs      0x53
gs      0x2b
time=2026-02-13T09:14:37.205+01:00 level=ERROR source=server.go:1610 msg="post predict" error="Post \"http://127.0.0.1:60306/completion\": read tcp 127.0.0.1:60321->127.0.0.1:60306: wsarecv: An existing connection was forcibly closed by the remote host."
[GIN] 2026/02/13 - 09:14:37 | 500 |   36.4781606s |       127.0.0.1 | POST     "/api/chat"
time=2026-02-13T09:14:37.206+01:00 level=DEBUG source=sched.go:557 msg="context for request finished"
time=2026-02-13T09:14:37.206+01:00 level=DEBUG source=sched.go:310 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen2.5:7b runner.inference="[{ID:8680a064-0400-0000-0002-000000000000 Library:Vulkan}]" runner.size="5.0 GiB" runner.vram="5.0 GiB" runner.parallel=1 runner.pid=14564 runner.model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 runner.num_ctx=8192 duration=30m0s
time=2026-02-13T09:14:37.206+01:00 level=DEBUG source=sched.go:328 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen2.5:7b runner.inference="[{ID:8680a064-0400-0000-0002-000000000000 Library:Vulkan}]" runner.size="5.0 GiB" runner.vram="5.0 GiB" runner.parallel=1 runner.pid=14564 runner.model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 runner.num_ctx=8192 refCount=0
<!-- gh-comment-id:3893202174 -->
@Josejsi commented on GitHub (Feb 12, 2026):

**Update: Tested on v0.16.1 - Issue Persists**

The issue remains even after updating to version 0.16.1. Ollama correctly detects the Intel Arc 140V and loads the model into VRAM (approx. 5 GB for Qwen2.5 7B), but crashes during inference.

Key Technical Details from Logs:

- Backend: Vulkan (`ggml-vulkan.dll`)
- Error Code: Exception `0xe06d7363` at `rip 0x7ff88155a80a`
- Hardware: Intel(R) Arc(TM) 140V GPU (16GB)
- Symptom: The runner subprocess terminates abruptly when `POST /api/chat` is called.

Full debug logs show that ggml_vulkan initializes properly but fails during the actual compute execution.

<details>
<summary>Click to view full Ollama Debug Log</summary>

```text
PS C:\Windows\system32\WindowsPowerShell\v1.0> $env:OLLAMA_VULKAN
1
PS C:\Windows\system32\WindowsPowerShell\v1.0> ollama serve
time=2026-02-13T09:13:05.588+01:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\josej\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES:]" 
time=2026-02-13T09:13:05.605+01:00 level=INFO source=images.go:473 msg="total blobs: 10" time=2026-02-13T09:13:05.606+01:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0" time=2026-02-13T09:13:05.608+01:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.16.1)" time=2026-02-13T09:13:05.608+01:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler" time=2026-02-13T09:13:05.611+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..." time=2026-02-13T09:13:05.635+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51404" time=2026-02-13T09:13:05.637+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 time=2026-02-13T09:13:05.784+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=169.127ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=map[] time=2026-02-13T09:13:05.785+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51411" 
time=2026-02-13T09:13:05.786+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 time=2026-02-13T09:13:05.962+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=177.9347ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[] time=2026-02-13T09:13:05.963+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51417" time=2026-02-13T09:13:05.964+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" 
OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\rocm time=2026-02-13T09:13:06.083+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=120.352ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[] time=2026-02-13T09:13:06.084+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 51423" time=2026-02-13T09:13:06.084+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan time=2026-02-13T09:13:06.564+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=480.5851ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan]" extra_envs=map[] time=2026-02-13T09:13:06.564+01:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1 time=2026-02-13T09:13:06.564+01:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=954.3483ms time=2026-02-13T09:13:06.564+01:00 
level=INFO source=types.go:42 msg="inference compute" id=8680a064-0400-0000-0002-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Arc(TM) 140V GPU (16GB)" libdirs=ollama,vulkan driver=0.0 pci_id="" type=iGPU total="18.1 GiB" available="17.0 GiB" time=2026-02-13T09:13:06.564+01:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="18.1 GiB" default_num_ctx=4096 [GIN] 2026/02/13 - 09:13:52 | 404 | 1.641ms | 127.0.0.1 | POST "/api/show" time=2026-02-13T09:13:52.294+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32 [GIN] 2026/02/13 - 09:13:52 | 200 | 125.6741ms | 127.0.0.1 | POST "/api/show" time=2026-02-13T09:13:52.298+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32 [GIN] 2026/02/13 - 09:13:52 | 200 | 129.444ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/02/13 - 09:13:56 | 404 | 1.6212ms | 127.0.0.1 | POST "/api/show" time=2026-02-13T09:13:56.562+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32 time=2026-02-13T09:13:56.563+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32 [GIN] 2026/02/13 - 09:13:56 | 200 | 124.8013ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/02/13 - 09:13:56 | 200 | 125.5029ms | 127.0.0.1 | POST "/api/show" time=2026-02-13T09:14:00.815+01:00 level=DEBUG source=runner.go:264 msg="refreshing free memory" time=2026-02-13T09:14:00.815+01:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery" time=2026-02-13T09:14:00.830+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60299" time=2026-02-13T09:14:00.830+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 
PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan time=2026-02-13T09:14:01.346+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=530.3492ms OLLAMA_LIBRARY_PATH="[C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan]" extra_envs=map[] time=2026-02-13T09:14:01.346+01:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=530.8737ms time=2026-02-13T09:14:01.346+01:00 level=INFO source=cpu_windows.go:148 msg=packages count=1 time=2026-02-13T09:14:01.346+01:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1 time=2026-02-13T09:14:01.347+01:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=4 threads=8 time=2026-02-13T09:14:01.347+01:00 level=DEBUG source=sched.go:195 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2026-02-13T09:14:01.362+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32 time=2026-02-13T09:14:01.362+01:00 level=DEBUG source=sched.go:231 msg="loading first model" model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from 
C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5 llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-7... llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-7B llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"] llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", 
"#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... llama_model_loader: - kv 33: general.quantization_version u32 = 2 llama_model_loader: - type f32: 141 tensors llama_model_loader: - type q4_K: 169 tensors llama_model_loader: - type q6_K: 29 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 4.36 GiB (4.91 BPW) init_tokenizer: initializing tokenizer for type 2 load: control token: 151659 '<|fim_prefix|>' is not marked as EOG load: control token: 151656 '<|video_pad|>' is not marked as EOG load: control token: 151655 '<|image_pad|>' is not marked as EOG load: control token: 151653 '<|vision_end|>' is not marked as EOG load: control token: 151652 '<|vision_start|>' is not marked as EOG load: control token: 151651 '<|quad_end|>' is not marked as EOG load: control token: 151649 '<|box_end|>' is not marked as EOG load: control token: 151648 '<|box_start|>' is not marked as EOG load: control token: 151646 '<|object_ref_start|>' is not marked as EOG load: control token: 151644 '<|im_start|>' is not marked as EOG load: control token: 151661 '<|fim_suffix|>' is not marked as EOG load: control token: 151647 '<|object_ref_end|>' is not marked as EOG load: control token: 151660 '<|fim_middle|>' is not marked as EOG load: control token: 151654 '<|vision_pad|>' is not marked as EOG load: control token: 151650 '<|quad_start|>' is not marked as EOG 
load: printing all EOG tokens: load: - 151643 ('<|endoftext|>') load: - 151645 ('<|im_end|>') load: - 151662 ('<|fim_pad|>') load: - 151663 ('<|repo_name|>') load: - 151664 ('<|file_sep|>') load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 1 print_info: no_alloc = 0 print_info: model type = ?B print_info: model params = 7.62 B print_info: general.name = Qwen2.5 7B Instruct print_info: vocab type = BPE print_info: n_vocab = 152064 print_info: n_merges = 151387 print_info: BOS token = 151643 '<|endoftext|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' print_info: PAD token = 151643 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2026-02-13T09:14:01.704+01:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\josej\\.ollama\\models\\blobs\\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --port 60306" time=2026-02-13T09:14:01.705+01:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 
PATH="C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\vulkan;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files\\Autofirma\\Autofirma;C:\\Program Files\\dotnet\\;C:\\Users\\josej\\AppData\\Local\\Microsoft\\WindowsApps;;C:\\Users\\josej\\AppData\\Local\\Programs\\Ollama;C:\\Users\\josej\\AppData\\Local\\Programs\\Microsoft VS Code\\bin" OLLAMA_LIBRARY_PATH=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan time=2026-02-13T09:14:01.743+01:00 level=INFO source=sched.go:463 msg="system memory" total="31.5 GiB" free="15.6 GiB" free_swap="15.9 GiB" time=2026-02-13T09:14:01.751+01:00 level=INFO source=sched.go:470 msg="gpu memory" id=8680a064-0400-0000-0002-000000000000 library=Vulkan available="16.4 GiB" free="16.9 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-02-13T09:14:01.751+01:00 level=INFO source=server.go:498 msg="loading model" "model layers"=29 requested=-1 time=2026-02-13T09:14:01.752+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=qwen2.attention.key_length default=128 time=2026-02-13T09:14:01.752+01:00 level=DEBUG source=ggml.go:300 msg="key with type not found" key=qwen2.attention.value_length default=128 time=2026-02-13T09:14:01.753+01:00 level=DEBUG source=ggml.go:635 msg="default cache size estimate" "attention MiB"=448 "attention bytes"=469762048 "recurrent MiB"=0 "recurrent bytes"=0 time=2026-02-13T09:14:01.755+01:00 level=DEBUG source=server.go:976 msg="available gpu" id=8680a064-0400-0000-0002-000000000000 library=Vulkan "available layer vram"="16.3 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" time=2026-02-13T09:14:01.756+01:00 level=DEBUG source=server.go:976 msg="available gpu" id=8680a064-0400-0000-0002-000000000000 library=Vulkan "available 
layer vram"="15.5 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="730.4 MiB" time=2026-02-13T09:14:01.756+01:00 level=DEBUG source=server.go:669 msg=memory estimate.Vulkan0.ID=8680a064-0400-0000-0002-000000000000 estimate.Vulkan0.Weights="[149112832 149112832 149112832 131135488 131135488 149112832 131135488 148639744 131135488 131608576 131135488 131608576 148639744 131135488 149112832 131135488 131135488 149112832 131135488 131135488 149112832 131135488 131135488 149112832 149112832 149112832 149112832 149112832 447082496]" estimate.Vulkan0.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" estimate.Vulkan0.Graph=501221376 time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="4.1 GiB" time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="448.0 MiB" time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="478.0 MiB" time=2026-02-13T09:14:01.756+01:00 level=INFO source=device.go:272 msg="total memory" size="5.0 GiB" time=2026-02-13T09:14:01.786+01:00 level=INFO source=runner.go:965 msg="starting go runner" time=2026-02-13T09:14:01.799+01:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama load_backend: loaded CPU backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll time=2026-02-13T09:14:01.813+01:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Intel(R) Arc(TM) 140V GPU (16GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp 
size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat load_backend: loaded Vulkan backend from C:\Users\josej\AppData\Local\Programs\Ollama\lib\ollama\vulkan\ggml-vulkan.dll time=2026-02-13T09:14:01.852+01:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2026-02-13T09:14:01.854+01:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:60306" time=2026-02-13T09:14:01.857+01:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:8192 KvCacheType: NumThreads:4 GPULayers:29[ID:8680a064-0400-0000-0002-000000000000 Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000 ggml_backend_vk_get_device_memory called: luid 0x000000000000e406 ggml_dxgi_pdh_init called DXGI + PDH Initialized. Getting GPU free memory info [DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB [DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. 
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1340088320.00 bytes (1.25 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB) ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18102184427 total: 19442272747 time=2026-02-13T09:14:02.045+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding" ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000 ggml_backend_vk_get_device_memory called: luid 0x000000000000e406 ggml_dxgi_pdh_init called DXGI + PDH Initialized. Getting GPU free memory info time=2026-02-13T09:14:02.045+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model" [DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB [DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1298079744.00 bytes (1.21 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB) ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18144193003 total: 19442272747 ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000 ggml_backend_vk_get_device_memory called: luid 0x000000000000e406 ggml_dxgi_pdh_init called DXGI + PDH Initialized. Getting GPU free memory info [DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB [DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected. 
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1298079744.00 bytes (1.21 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB) ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18144193003 total: 19442272747 llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Arc(TM) 140V GPU (16GB)) (unknown id) - 17303 MiB free llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5 llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-7... 
llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-7B llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"] llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... 
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q4_K: 169 tensors
llama_model_loader: - type q6_K: 29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.36 GiB (4.91 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 32768
print_info: n_embd = 3584
print_info: n_embd_inp = 3584
print_info: n_layer = 28
print_info: n_head = 28
print_info: n_head_kv = 4
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 7
print_info: n_embd_k_gqa = 512
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 18944
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = -1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 32768
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: model type = 7B
print_info: model params = 7.62 B
print_info: general.name = Qwen2.5 7B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer 0 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 1 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 2 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 3 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 4 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 5 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 6 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 7 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 8 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 9 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 10 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 11 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 12 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 13 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 14 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 15 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 16 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 17 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 18 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 19 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 20 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 21 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 22 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 23 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 24 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 25 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 26 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 27 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 28 assigned to device Vulkan0, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor output_norm.weight
create_tensor: loading tensor output.weight
create_tensor: loading tensor blk.0.attn_norm.weight
create_tensor: loading tensor blk.0.attn_q.weight
create_tensor: loading tensor blk.0.attn_k.weight
create_tensor: loading tensor blk.0.attn_v.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_q.bias
create_tensor: loading tensor blk.0.attn_k.bias
create_tensor: loading tensor blk.0.attn_v.bias
create_tensor: loading tensor blk.0.ffn_norm.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.1.attn_norm.weight
create_tensor: loading tensor blk.1.attn_q.weight
create_tensor: loading tensor blk.1.attn_k.weight
create_tensor: loading tensor blk.1.attn_v.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_q.bias
create_tensor: loading tensor blk.1.attn_k.bias
create_tensor: loading tensor blk.1.attn_v.bias
create_tensor: loading tensor blk.1.ffn_norm.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.2.attn_norm.weight
create_tensor: loading tensor blk.2.attn_q.weight
create_tensor: loading tensor blk.2.attn_k.weight
create_tensor: loading tensor blk.2.attn_v.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_q.bias
create_tensor: loading tensor blk.2.attn_k.bias
create_tensor: loading tensor blk.2.attn_v.bias
create_tensor: loading tensor blk.2.ffn_norm.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.3.attn_norm.weight
create_tensor: loading tensor blk.3.attn_q.weight
create_tensor: loading tensor blk.3.attn_k.weight
create_tensor: loading tensor blk.3.attn_v.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_q.bias
create_tensor: loading tensor blk.3.attn_k.bias
create_tensor: loading tensor blk.3.attn_v.bias
create_tensor: loading tensor blk.3.ffn_norm.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.4.attn_norm.weight
create_tensor: loading tensor blk.4.attn_q.weight
create_tensor: loading tensor blk.4.attn_k.weight
create_tensor: loading tensor blk.4.attn_v.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_q.bias
create_tensor: loading tensor blk.4.attn_k.bias
create_tensor: loading tensor blk.4.attn_v.bias
create_tensor: loading tensor blk.4.ffn_norm.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.5.attn_norm.weight
create_tensor: loading tensor blk.5.attn_q.weight
create_tensor: loading tensor blk.5.attn_k.weight
create_tensor: loading tensor blk.5.attn_v.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_q.bias
create_tensor: loading tensor blk.5.attn_k.bias
create_tensor: loading tensor blk.5.attn_v.bias
create_tensor: loading tensor blk.5.ffn_norm.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.6.attn_norm.weight
create_tensor: loading tensor blk.6.attn_q.weight
create_tensor: loading tensor blk.6.attn_k.weight
create_tensor: loading tensor blk.6.attn_v.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_q.bias
create_tensor: loading tensor blk.6.attn_k.bias
create_tensor: loading tensor blk.6.attn_v.bias
create_tensor: loading tensor blk.6.ffn_norm.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.7.attn_norm.weight
create_tensor: loading tensor blk.7.attn_q.weight
create_tensor: loading tensor blk.7.attn_k.weight
create_tensor: loading tensor blk.7.attn_v.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_q.bias
create_tensor: loading tensor blk.7.attn_k.bias
create_tensor: loading tensor blk.7.attn_v.bias
create_tensor: loading tensor blk.7.ffn_norm.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.8.attn_norm.weight
create_tensor: loading tensor blk.8.attn_q.weight
create_tensor: loading tensor blk.8.attn_k.weight
create_tensor: loading tensor blk.8.attn_v.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_q.bias
create_tensor: loading tensor blk.8.attn_k.bias
create_tensor: loading tensor blk.8.attn_v.bias
create_tensor: loading tensor blk.8.ffn_norm.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.9.attn_norm.weight
create_tensor: loading tensor blk.9.attn_q.weight
create_tensor: loading tensor blk.9.attn_k.weight
create_tensor: loading tensor blk.9.attn_v.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_q.bias
create_tensor: loading tensor blk.9.attn_k.bias
create_tensor: loading tensor blk.9.attn_v.bias
create_tensor: loading tensor blk.9.ffn_norm.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.10.attn_norm.weight
create_tensor: loading tensor blk.10.attn_q.weight
create_tensor: loading tensor blk.10.attn_k.weight
create_tensor: loading tensor blk.10.attn_v.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_q.bias
create_tensor: loading tensor blk.10.attn_k.bias
create_tensor: loading tensor blk.10.attn_v.bias
create_tensor: loading tensor blk.10.ffn_norm.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.11.attn_norm.weight
create_tensor: loading tensor blk.11.attn_q.weight
create_tensor: loading tensor blk.11.attn_k.weight
create_tensor: loading tensor blk.11.attn_v.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_q.bias
create_tensor: loading tensor blk.11.attn_k.bias
create_tensor: loading tensor blk.11.attn_v.bias
create_tensor: loading tensor blk.11.ffn_norm.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.12.attn_norm.weight
create_tensor: loading tensor blk.12.attn_q.weight
create_tensor: loading tensor blk.12.attn_k.weight
create_tensor: loading tensor blk.12.attn_v.weight
create_tensor: loading tensor blk.12.attn_output.weight
create_tensor: loading tensor blk.12.attn_q.bias
create_tensor: loading tensor blk.12.attn_k.bias
create_tensor: loading tensor blk.12.attn_v.bias
create_tensor: loading tensor blk.12.ffn_norm.weight
create_tensor: loading tensor blk.12.ffn_gate.weight
create_tensor: loading tensor blk.12.ffn_down.weight
create_tensor: loading tensor blk.12.ffn_up.weight
create_tensor: loading tensor blk.13.attn_norm.weight
create_tensor: loading tensor blk.13.attn_q.weight
create_tensor: loading tensor blk.13.attn_k.weight
create_tensor: loading tensor blk.13.attn_v.weight
create_tensor: loading tensor blk.13.attn_output.weight
create_tensor: loading tensor blk.13.attn_q.bias
create_tensor: loading tensor blk.13.attn_k.bias
create_tensor: loading tensor blk.13.attn_v.bias
create_tensor: loading tensor blk.13.ffn_norm.weight
create_tensor: loading tensor blk.13.ffn_gate.weight
create_tensor: loading tensor blk.13.ffn_down.weight
create_tensor: loading tensor blk.13.ffn_up.weight
create_tensor: loading tensor blk.14.attn_norm.weight
create_tensor: loading tensor blk.14.attn_q.weight
create_tensor: loading tensor blk.14.attn_k.weight
create_tensor: loading tensor blk.14.attn_v.weight
create_tensor: loading tensor blk.14.attn_output.weight
create_tensor: loading tensor blk.14.attn_q.bias
create_tensor: loading tensor blk.14.attn_k.bias
create_tensor: loading tensor blk.14.attn_v.bias
create_tensor: loading tensor blk.14.ffn_norm.weight
create_tensor: loading tensor blk.14.ffn_gate.weight
create_tensor: loading tensor blk.14.ffn_down.weight
create_tensor: loading tensor blk.14.ffn_up.weight
create_tensor: loading tensor blk.15.attn_norm.weight
create_tensor: loading tensor blk.15.attn_q.weight
create_tensor: loading tensor blk.15.attn_k.weight
create_tensor: loading tensor blk.15.attn_v.weight
create_tensor: loading tensor blk.15.attn_output.weight
create_tensor: loading tensor blk.15.attn_q.bias
create_tensor: loading tensor blk.15.attn_k.bias
create_tensor: loading tensor blk.15.attn_v.bias
create_tensor: loading tensor blk.15.ffn_norm.weight
create_tensor: loading tensor blk.15.ffn_gate.weight
create_tensor: loading tensor blk.15.ffn_down.weight
create_tensor: loading tensor blk.15.ffn_up.weight
create_tensor: loading tensor blk.16.attn_norm.weight
create_tensor: loading tensor blk.16.attn_q.weight
create_tensor: loading tensor blk.16.attn_k.weight
create_tensor: loading tensor blk.16.attn_v.weight
create_tensor: loading tensor blk.16.attn_output.weight
create_tensor: loading tensor blk.16.attn_q.bias
create_tensor: loading tensor blk.16.attn_k.bias
create_tensor: loading tensor blk.16.attn_v.bias
create_tensor: loading tensor blk.16.ffn_norm.weight
create_tensor: loading tensor blk.16.ffn_gate.weight
create_tensor: loading tensor blk.16.ffn_down.weight
create_tensor: loading tensor blk.16.ffn_up.weight
create_tensor: loading tensor blk.17.attn_norm.weight
create_tensor: loading tensor blk.17.attn_q.weight
create_tensor: loading tensor blk.17.attn_k.weight
create_tensor: loading tensor blk.17.attn_v.weight
create_tensor: loading tensor blk.17.attn_output.weight
create_tensor: loading tensor blk.17.attn_q.bias
create_tensor: loading tensor blk.17.attn_k.bias
create_tensor: loading tensor blk.17.attn_v.bias
create_tensor: loading tensor blk.17.ffn_norm.weight
create_tensor: loading tensor blk.17.ffn_gate.weight
create_tensor: loading tensor blk.17.ffn_down.weight
create_tensor: loading tensor blk.17.ffn_up.weight
create_tensor: loading tensor blk.18.attn_norm.weight
create_tensor: loading tensor blk.18.attn_q.weight
create_tensor: loading tensor blk.18.attn_k.weight
create_tensor: loading tensor blk.18.attn_v.weight
create_tensor: loading tensor blk.18.attn_output.weight
create_tensor: loading tensor blk.18.attn_q.bias
create_tensor: loading tensor blk.18.attn_k.bias
create_tensor: loading tensor blk.18.attn_v.bias
create_tensor: loading tensor blk.18.ffn_norm.weight
create_tensor: loading tensor blk.18.ffn_gate.weight
create_tensor: loading tensor blk.18.ffn_down.weight
create_tensor: loading tensor blk.18.ffn_up.weight
create_tensor: loading tensor blk.19.attn_norm.weight
create_tensor: loading tensor blk.19.attn_q.weight
create_tensor: loading tensor blk.19.attn_k.weight
create_tensor: loading tensor blk.19.attn_v.weight
create_tensor: loading tensor blk.19.attn_output.weight
create_tensor: loading tensor blk.19.attn_q.bias
create_tensor: loading tensor blk.19.attn_k.bias
create_tensor: loading tensor blk.19.attn_v.bias
create_tensor: loading tensor blk.19.ffn_norm.weight
create_tensor: loading tensor blk.19.ffn_gate.weight
create_tensor: loading tensor blk.19.ffn_down.weight
create_tensor: loading tensor blk.19.ffn_up.weight
create_tensor: loading tensor blk.20.attn_norm.weight
create_tensor: loading tensor blk.20.attn_q.weight
create_tensor: loading tensor blk.20.attn_k.weight
create_tensor: loading tensor blk.20.attn_v.weight
create_tensor: loading tensor blk.20.attn_output.weight
create_tensor: loading tensor blk.20.attn_q.bias
create_tensor: loading tensor blk.20.attn_k.bias
create_tensor: loading tensor blk.20.attn_v.bias
create_tensor: loading tensor blk.20.ffn_norm.weight
create_tensor: loading tensor blk.20.ffn_gate.weight
create_tensor: loading tensor blk.20.ffn_down.weight
create_tensor: loading tensor blk.20.ffn_up.weight
create_tensor: loading tensor blk.21.attn_norm.weight
create_tensor: loading tensor blk.21.attn_q.weight
create_tensor: loading tensor blk.21.attn_k.weight
create_tensor: loading tensor blk.21.attn_v.weight
create_tensor: loading tensor blk.21.attn_output.weight
create_tensor: loading tensor blk.21.attn_q.bias
create_tensor: loading tensor blk.21.attn_k.bias
create_tensor: loading tensor blk.21.attn_v.bias
create_tensor: loading tensor blk.21.ffn_norm.weight
create_tensor: loading tensor blk.21.ffn_gate.weight
create_tensor: loading tensor blk.21.ffn_down.weight
create_tensor: loading tensor blk.21.ffn_up.weight
create_tensor: loading tensor blk.22.attn_norm.weight
create_tensor: loading tensor blk.22.attn_q.weight
create_tensor: loading tensor blk.22.attn_k.weight
create_tensor: loading tensor blk.22.attn_v.weight
create_tensor: loading tensor blk.22.attn_output.weight
create_tensor: loading tensor blk.22.attn_q.bias
create_tensor: loading tensor blk.22.attn_k.bias
create_tensor: loading tensor blk.22.attn_v.bias
create_tensor: loading tensor blk.22.ffn_norm.weight
create_tensor: loading tensor blk.22.ffn_gate.weight
create_tensor: loading tensor blk.22.ffn_down.weight
create_tensor: loading tensor blk.22.ffn_up.weight
create_tensor: loading tensor blk.23.attn_norm.weight
create_tensor: loading tensor blk.23.attn_q.weight
create_tensor: loading tensor blk.23.attn_k.weight
create_tensor: loading tensor blk.23.attn_v.weight
create_tensor: loading tensor blk.23.attn_output.weight
create_tensor: loading tensor blk.23.attn_q.bias
create_tensor: loading tensor blk.23.attn_k.bias
create_tensor: loading tensor blk.23.attn_v.bias
create_tensor: loading tensor blk.23.ffn_norm.weight
create_tensor: loading tensor blk.23.ffn_gate.weight
create_tensor: loading tensor blk.23.ffn_down.weight
create_tensor: loading tensor blk.23.ffn_up.weight
create_tensor: loading tensor blk.24.attn_norm.weight
create_tensor: loading tensor blk.24.attn_q.weight
create_tensor: loading tensor blk.24.attn_k.weight
create_tensor: loading tensor blk.24.attn_v.weight
create_tensor: loading tensor blk.24.attn_output.weight
create_tensor: loading tensor blk.24.attn_q.bias
create_tensor: loading tensor blk.24.attn_k.bias
create_tensor: loading tensor blk.24.attn_v.bias
create_tensor: loading tensor blk.24.ffn_norm.weight
create_tensor: loading tensor blk.24.ffn_gate.weight
create_tensor: loading tensor blk.24.ffn_down.weight
create_tensor: loading tensor blk.24.ffn_up.weight
create_tensor: loading tensor blk.25.attn_norm.weight
create_tensor: loading tensor blk.25.attn_q.weight
create_tensor: loading tensor blk.25.attn_k.weight
create_tensor: loading tensor blk.25.attn_v.weight
create_tensor: loading tensor blk.25.attn_output.weight
create_tensor: loading tensor blk.25.attn_q.bias
create_tensor: loading tensor blk.25.attn_k.bias
create_tensor: loading tensor blk.25.attn_v.bias
create_tensor: loading tensor blk.25.ffn_norm.weight
create_tensor: loading tensor blk.25.ffn_gate.weight
create_tensor: loading tensor blk.25.ffn_down.weight
create_tensor: loading tensor blk.25.ffn_up.weight
create_tensor: loading tensor blk.26.attn_norm.weight
create_tensor: loading tensor blk.26.attn_q.weight
create_tensor: loading tensor blk.26.attn_k.weight
create_tensor: loading tensor blk.26.attn_v.weight
create_tensor: loading tensor blk.26.attn_output.weight
create_tensor: loading tensor blk.26.attn_q.bias
create_tensor: loading tensor blk.26.attn_k.bias
create_tensor: loading tensor blk.26.attn_v.bias
create_tensor: loading tensor blk.26.ffn_norm.weight
create_tensor: loading tensor blk.26.ffn_gate.weight
create_tensor: loading tensor blk.26.ffn_down.weight
create_tensor: loading tensor blk.26.ffn_up.weight
create_tensor: loading tensor blk.27.attn_norm.weight
create_tensor: loading tensor blk.27.attn_q.weight
create_tensor: loading tensor blk.27.attn_k.weight
create_tensor: loading tensor blk.27.attn_v.weight
create_tensor: loading tensor blk.27.attn_output.weight
create_tensor: loading tensor blk.27.attn_q.bias
create_tensor: loading tensor blk.27.attn_k.bias
create_tensor: loading tensor blk.27.attn_v.bias
create_tensor: loading tensor blk.27.ffn_norm.weight
create_tensor: loading tensor blk.27.ffn_gate.weight
create_tensor: loading tensor blk.27.ffn_down.weight
create_tensor: loading tensor blk.27.ffn_up.weight
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected.
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 1296371712.00 bytes (1.21 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 18145901035 total: 19442272747
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected.
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 2340569088.00 bytes (2.18 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 17101703659 total: 19442272747
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: Vulkan0 model buffer size = 4168.09 MiB
load_tensors: Vulkan_Host model buffer size = 292.36 MiB
ggml_backend_vk_get_device_memory called: uuid 8680a064-0400-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x000000000000e406
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: Intel(R) Arc(TM) Graphics, LUID: 0x000000000000E406, Dedicated: 0.12 GB, Shared: 17.98 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x000000000000E828, Dedicated: 0.00 GB, Shared: 17.98 GB
Integrated GPU (Intel(R) Arc(TM) Graphics) with LUID 0x000000000000e406 detected.
Shared Total: 19308055019.00 bytes (17.98 GB), Shared Usage: 5983940608.00 bytes (5.57 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 13458332139 total: 19442272747
load_all_data: device Vulkan0 does not support async, host buffers or events
time=2026-02-13T09:14:03.300+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.00"
time=2026-02-13T09:14:03.802+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.11"
time=2026-02-13T09:14:04.053+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.16"
time=2026-02-13T09:14:04.304+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.22"
time=2026-02-13T09:14:04.555+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.26"
time=2026-02-13T09:14:04.807+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.32"
time=2026-02-13T09:14:05.058+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.36"
time=2026-02-13T09:14:05.309+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.40"
time=2026-02-13T09:14:05.560+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.44"
time=2026-02-13T09:14:05.813+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.47"
time=2026-02-13T09:14:06.068+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.51"
time=2026-02-13T09:14:06.319+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.55"
time=2026-02-13T09:14:06.570+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.60"
time=2026-02-13T09:14:06.821+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.63"
time=2026-02-13T09:14:07.072+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.68"
time=2026-02-13T09:14:07.323+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.72"
time=2026-02-13T09:14:07.574+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.77"
time=2026-02-13T09:14:07.825+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.81"
time=2026-02-13T09:14:08.075+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.86"
time=2026-02-13T09:14:08.328+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.91"
load_all_data: buffer type Vulkan_Host is not the default buffer type for device Vulkan0 for async uploads
time=2026-02-13T09:14:08.579+01:00 level=DEBUG source=server.go:1394 msg="model load progress 0.93"
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 8192
llama_context: n_ctx_seq = 8192
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context: Vulkan_Host output buffer size = 0.59 MiB
llama_kv_cache: layer 0: dev = Vulkan0
llama_kv_cache: layer 1: dev = Vulkan0
llama_kv_cache: layer 2: dev = Vulkan0
llama_kv_cache: layer 3: dev = Vulkan0
llama_kv_cache: layer 4: dev = Vulkan0
llama_kv_cache: layer 5: dev = Vulkan0
llama_kv_cache: layer 6: dev = Vulkan0
llama_kv_cache: layer 7: dev = Vulkan0
llama_kv_cache: layer 8: dev = Vulkan0
llama_kv_cache: layer 9: dev = Vulkan0
llama_kv_cache: layer 10: dev = Vulkan0
llama_kv_cache: layer 11: dev = Vulkan0
llama_kv_cache: layer 12: dev = Vulkan0
llama_kv_cache: layer 13: dev = Vulkan0
llama_kv_cache: layer 14: dev = Vulkan0
llama_kv_cache: layer 15: dev = Vulkan0
llama_kv_cache: layer 16: dev = Vulkan0
llama_kv_cache: layer 17: dev = Vulkan0
llama_kv_cache: layer 18: dev = Vulkan0
llama_kv_cache: layer 19: dev = Vulkan0
llama_kv_cache: layer 20: dev = Vulkan0
llama_kv_cache: layer 21: dev = Vulkan0
llama_kv_cache: layer 22: dev = Vulkan0
llama_kv_cache: layer 23: dev = Vulkan0
llama_kv_cache: layer 24: dev = Vulkan0
llama_kv_cache: layer 25: dev = Vulkan0
llama_kv_cache: layer 26: dev = Vulkan0
llama_kv_cache: layer 27: dev = Vulkan0
llama_kv_cache: Vulkan0 KV buffer size = 448.00 MiB
time=2026-02-13T09:14:08.831+01:00 level=DEBUG source=server.go:1394 msg="model load progress 1.00"
llama_kv_cache: size = 448.00 MiB ( 8192 cells, 28 layers, 1/1 seqs), K (f16): 224.00 MiB, V (f16): 224.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 2
llama_context: max_nodes = 2712
llama_context: reserving full memory module
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
llama_context: Flash Attention was auto, set to enabled
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
time=2026-02-13T09:14:09.082+01:00 level=DEBUG source=server.go:1397 msg="model load completed, waiting for server to become available" status="llm server loading model"
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Vulkan0 compute buffer size = 304.00 MiB
llama_context: Vulkan_Host compute buffer size = 23.01 MiB
llama_context: graph nodes = 959
llama_context: graph splits = 2
time=2026-02-13T09:14:09.333+01:00 level=INFO source=server.go:1388 msg="llama runner started in 7.59 seconds"
time=2026-02-13T09:14:09.333+01:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
time=2026-02-13T09:14:09.333+01:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-13T09:14:09.334+01:00 level=INFO source=server.go:1388 msg="llama runner started in 7.59 seconds"
time=2026-02-13T09:14:09.334+01:00 level=DEBUG source=sched.go:549 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen2.5:7b runner.inference="[{ID:8680a064-0400-0000-0002-000000000000 Library:Vulkan}]" runner.size="5.0 GiB" runner.vram="5.0 GiB" runner.parallel=1 runner.pid=14564 runner.model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 runner.num_ctx=8192
time=2026-02-13T09:14:09.352+01:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=13350 format=""
time=2026-02-13T09:14:09.370+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=2968 used=0 remaining=2968
Exception 0xe06d7363 0x19930520 0x970f0f93c0 0x7ff88155a80a
PC=0x7ff88155a80a
signal arrived during external code execution

runtime.cgocall(0x7ff6bb438c50, 0xc000209b88)
	runtime/cgocall.go:167 +0x3e fp=0xc000209b60 sp=0xc000209af8 pc=0x7ff6ba5b243e
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x227f16fa740, {0x200, 0x227ef86fa00, 0x0, 0x227ef870210, 0x227ef96e350, 0x227ef998eb0, 0x227ef882e10})
	_cgo_gotypes.go:677 +0x50 fp=0xc000209b88 sp=0xc000209b60 pc=0x7ff6baa56a70
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0xc000209da0?, 0x1?)
github.com/ollama/ollama/llama/llama.go:173 +0xed fp=0xc000209c70 sp=0xc000209b88 pc=0x7ff6baa59fad github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004b2280, 0xc00039c320, 0xc000209f28) github.com/ollama/ollama/runner/llamarunner/runner.go:494 +0x250 fp=0xc000209ee8 sp=0xc000209c70 pc=0x7ff6bab06ff0 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004b2280, {0x7ff6bbca8b70, 0xc0000d2b90}) github.com/ollama/ollama/runner/llamarunner/runner.go:387 +0x1d5 fp=0xc000209fb8 sp=0xc000209ee8 pc=0x7ff6bab06c35 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x28 fp=0xc000209fe0 sp=0xc000209fb8 pc=0x7ff6bab0c008 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000209fe8 sp=0xc000209fe0 pc=0x7ff6ba5bd9a1 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5 goroutine 1 gp=0xc0000021c0 m=nil [IO wait]: runtime.gopark(0x7ff6ba5bf1a0?, 0x7ff6bc73aa40?, 0x20?, 0x60?, 0xc0004e60cc?) runtime/proc.go:435 +0xce fp=0xc0003e5630 sp=0xc0003e5610 pc=0x7ff6ba5b598e runtime.netpollblock(0x43c?, 0xba550406?, 0xf6?) runtime/netpoll.go:575 +0xf7 fp=0xc0003e5668 sp=0xc0003e5630 pc=0x7ff6ba57bdf7 internal/poll.runtime_pollWait(0x227ed360190, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0003e5688 sp=0xc0003e5668 pc=0x7ff6ba5b4b25 internal/poll.(*pollDesc).wait(0x7ff6ba64a8f3?, 0x0?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003e56b0 sp=0xc0003e5688 pc=0x7ff6ba64bee7 internal/poll.execIO(0xc0004e6020, 0xc00047f758) internal/poll/fd_windows.go:177 +0x105 fp=0xc0003e5728 sp=0xc0003e56b0 pc=0x7ff6ba64d345 internal/poll.(*FD).acceptOne(0xc0004e6008, 0x448, {0xc0004de1e0?, 0xc00047f7b8?, 0x7ff6ba655005?}, 0xc00047f7ec?) 
internal/poll/fd_windows.go:946 +0x65 fp=0xc0003e5788 sp=0xc0003e5728 pc=0x7ff6ba6518c5 internal/poll.(*FD).Accept(0xc0004e6008, 0xc0003e5938) internal/poll/fd_windows.go:980 +0x1b6 fp=0xc0003e5840 sp=0xc0003e5788 pc=0x7ff6ba651bf6 net.(*netFD).accept(0xc0004e6008) net/fd_windows.go:182 +0x4b fp=0xc0003e5958 sp=0xc0003e5840 pc=0x7ff6ba6c344b net.(*TCPListener).accept(0xc000051c40) net/tcpsock_posix.go:159 +0x1b fp=0xc0003e59a8 sp=0xc0003e5958 pc=0x7ff6ba6d99fb net.(*TCPListener).Accept(0xc000051c40) net/tcpsock.go:380 +0x30 fp=0xc0003e59d8 sp=0xc0003e59a8 pc=0x7ff6ba6d87b0 net/http.(*onceCloseListener).Accept(0xc0004b61b0?) <autogenerated>:1 +0x24 fp=0xc0003e59f0 sp=0xc0003e59d8 pc=0x7ff6ba8f1ec4 net/http.(*Server).Serve(0xc0004b0700, {0x7ff6bbca6310, 0xc000051c40}) net/http/server.go:3424 +0x30c fp=0xc0003e5b20 sp=0xc0003e59f0 pc=0x7ff6ba8c978c github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000a4020, 0x4, 0x6}) github.com/ollama/ollama/runner/llamarunner/runner.go:1002 +0x8f5 fp=0xc0003e5cf0 sp=0xc0003e5b20 pc=0x7ff6bab0bd95 github.com/ollama/ollama/runner.Execute({0xc0000a4010?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:25 +0x1a5 fp=0xc0003e5d30 sp=0xc0003e5cf0 pc=0x7ff6babde6a5 github.com/ollama/ollama/cmd.NewCLI.func3(0xc0004b0200?, {0x7ff6bba8c247?, 0x4?, 0x7ff6bba8c24b?}) github.com/ollama/ollama/cmd/cmd.go:2237 +0x45 fp=0xc0003e5d58 sp=0xc0003e5d30 pc=0x7ff6bb3c8045 github.com/spf13/cobra.(*Command).execute(0xc000236008, {0xc000051100, 0x4, 0x4}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0003e5e78 sp=0xc0003e5d58 pc=0x7ff6ba73e61c github.com/spf13/cobra.(*Command).ExecuteC(0xc000467208) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0003e5f30 sp=0xc0003e5e78 pc=0x7ff6ba73ee65 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) 
github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0003e5f50 sp=0xc0003e5f30 pc=0x7ff6bb3c9dcd runtime.main() runtime/proc.go:283 +0x27d fp=0xc0003e5fe0 sp=0xc0003e5f50 pc=0x7ff6ba584ddd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0003e5fe8 sp=0xc0003e5fe0 pc=0x7ff6ba5bd9a1 goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00006ffa8 sp=0xc00006ff88 pc=0x7ff6ba5b598e runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc00006ffe0 sp=0xc00006ffa8 pc=0x7ff6ba5850f8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x7ff6ba5bd9a1 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000071f80 sp=0xc000071f60 pc=0x7ff6ba5b598e runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc00007e000) runtime/mgcsweep.go:316 +0xdf fp=0xc000071fc8 sp=0xc000071f80 pc=0x7ff6ba56debf runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x7ff6ba562285 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x7ff6ba5bd9a1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x7ff6bbc90f60?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x7ff6ba5b598e runtime.goparkunlock(...) 
runtime/proc.go:441 runtime.(*scavengerState).park(0x7ff6bc764640) runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x7ff6ba56b909 runtime.bgscavenge(0xc00007e000) runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x7ff6ba56be99 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x7ff6ba562225 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x7ff6ba5bd9a1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003340 m=nil [finalizer wait]: runtime.gopark(0x0?, 0x7ff6bbb0c910?, 0x0?, 0xc0?, 0x1000000010?) runtime/proc.go:435 +0xce fp=0xc000087e30 sp=0xc000087e10 pc=0x7ff6ba5b598e runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc000087fe0 sp=0xc000087e30 pc=0x7ff6ba561207 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x7ff6ba5bd9a1 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc000003dc0 m=nil [chan receive]: runtime.gopark(0xc000171680?, 0xc0003a6030?, 0x60?, 0x3f?, 0x7ff6ba6ac148?) runtime/proc.go:435 +0xce fp=0xc000073f18 sp=0xc000073ef8 pc=0x7ff6ba5b598e runtime.chanrecv(0xc00008a3f0, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000073f90 sp=0xc000073f18 pc=0x7ff6ba552d45 runtime.chanrecv1(0x7ff6ba584f40?, 0xc000073f76?) runtime/chan.go:506 +0x12 fp=0xc000073fb8 sp=0xc000073f90 pc=0x7ff6ba5528d2 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc000073fe0 sp=0xc000073fb8 pc=0x7ff6ba5654af runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x7ff6ba5bd9a1 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc0003da380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc0004861c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000491f38 sp=0xc000491f18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc000491fc8 sp=0xc000491f38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000491fe0 sp=0xc000491fc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000491fe8 sp=0xc000491fe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc000486380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000493f38 sp=0xc000493f18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc000493fc8 sp=0xc000493f38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000493fe0 sp=0xc000493fc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000493fe8 sp=0xc000493fe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000486540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc00048df38 sp=0xc00048df18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc00048dfc8 sp=0xc00048df38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00048dfe0 sp=0xc00048dfc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00048dfe8 sp=0xc00048dfe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 21 gp=0xc000486700 m=nil [GC worker (idle)]: runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00048ff38 sp=0xc00048ff18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc00048ffc8 sp=0xc00048ff38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00048ffe0 sp=0xc00048ffc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00048ffe8 sp=0xc00048ffe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000206000 m=nil [GC worker (idle)]: runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00020df38 sp=0xc00020df18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc00020dfc8 sp=0xc00020df38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00020dfe0 sp=0xc00020dfc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00020dfe8 sp=0xc00020dfe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 8 gp=0xc0003da540 m=nil [GC worker (idle)]: runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 35 gp=0xc0002061c0 m=nil [GC worker (idle)]: runtime.gopark(0x3d8afc587c48?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00020ff38 sp=0xc00020ff18 pc=0x7ff6ba5b598e runtime.gcBgMarkWorker(0xc00008bb90) runtime/mgc.go:1423 +0xe9 fp=0xc00020ffc8 sp=0xc00020ff38 pc=0x7ff6ba5647a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00020ffe0 sp=0xc00020ffc8 pc=0x7ff6ba564685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00020ffe8 sp=0xc00020ffe0 pc=0x7ff6ba5bd9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 23 gp=0xc000206540 m=nil [select]: runtime.gopark(0xc000049a78?, 0x2?, 0xa?, 0x0?, 0xc0000498cc?) runtime/proc.go:435 +0xce fp=0xc000049700 sp=0xc0000496e0 pc=0x7ff6ba5b598e runtime.selectgo(0xc000049a78, 0xc0000498c8, 0xb98?, 0x0, 0x1?, 0x1) runtime/select.go:351 +0x837 fp=0xc000049838 sp=0xc000049700 pc=0x7ff6ba596437 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0004b2280, {0x7ff6bbca64c0, 0xc000001500}, 0xc0002cf900) github.com/ollama/ollama/runner/llamarunner/runner.go:716 +0xbe5 fp=0xc000049ac0 sp=0xc000049838 pc=0x7ff6bab08fa5 github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x7ff6bbca64c0?, 0xc000001500?}, 0xc000049b40?) <autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x7ff6bab0c416 net/http.HandlerFunc.ServeHTTP(0xc0000c6000?, {0x7ff6bbca64c0?, 0xc000001500?}, 0xc000049b60?) 
net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x7ff6ba8c5dc9 net/http.(*ServeMux).ServeHTTP(0x7ff6ba55b785?, {0x7ff6bbca64c0, 0xc000001500}, 0xc0002cf900) net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x7ff6ba8c7cc4 net/http.serverHandler.ServeHTTP({0x7ff6bbca2730?}, {0x7ff6bbca64c0?, 0xc000001500?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x7ff6ba8e574e net/http.(*conn).serve(0xc0004b61b0, {0x7ff6bbca8b38, 0xc0001c73b0}) net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x7ff6ba8c42c5 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x7ff6ba8c9b88 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x7ff6ba5bd9a1 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 71 gp=0xc000506380 m=nil [IO wait]: runtime.gopark(0x0?, 0xc0004e62a0?, 0x48?, 0x63?, 0xc0004e634c?) runtime/proc.go:435 +0xce fp=0xc000251d58 sp=0xc000251d38 pc=0x7ff6ba5b598e runtime.netpollblock(0x444?, 0xba550406?, 0xf6?) 
runtime/netpoll.go:575 +0xf7 fp=0xc000251d90 sp=0xc000251d58 pc=0x7ff6ba57bdf7 internal/poll.runtime_pollWait(0x227ed360078, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc000251db0 sp=0xc000251d90 pc=0x7ff6ba5b4b25 internal/poll.(*pollDesc).wait(0x444?, 0x72?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000251dd8 sp=0xc000251db0 pc=0x7ff6ba64bee7 internal/poll.execIO(0xc0004e62a0, 0x7ff6bbb0d0a8) internal/poll/fd_windows.go:177 +0x105 fp=0xc000251e50 sp=0xc000251dd8 pc=0x7ff6ba64d345 internal/poll.(*FD).Read(0xc0004e6288, {0xc0003240a1, 0x1, 0x1}) internal/poll/fd_windows.go:438 +0x29b fp=0xc000251ef0 sp=0xc000251e50 pc=0x7ff6ba64e01b net.(*netFD).Read(0xc0004e6288, {0xc0003240a1?, 0xc0002ae058?, 0xc000251f70?}) net/fd_posix.go:55 +0x25 fp=0xc000251f38 sp=0xc000251ef0 pc=0x7ff6ba6c1325 net.(*conn).Read(0xc000076240, {0xc0003240a1?, 0x0?, 0x0?}) net/net.go:194 +0x45 fp=0xc000251f80 sp=0xc000251f38 pc=0x7ff6ba6d0a45 net/http.(*connReader).backgroundRead(0xc000324090) net/http/server.go:690 +0x37 fp=0xc000251fc8 sp=0xc000251f80 pc=0x7ff6ba8be197 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc000251fe0 sp=0xc000251fc8 pc=0x7ff6ba8be0c5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000251fe8 sp=0xc000251fe0 pc=0x7ff6ba5bd9a1 created by net/http.(*connReader).startBackgroundRead in goroutine 23 net/http/server.go:686 +0xb6 rax 0xfffffffe rbx 0x970f0f9348 rcx 0x0 rdx 0x970f0f8c90 rdi 0xe06d7363 rsi 0x1 rbp 0x4 rsp 0x970f0f9220 r8 0x1 r9 0xe06d7363 r10 0x7ff88135984c r11 0x970f0f8cf0 r12 0x0 r13 0x227f6d53a70 r14 0x200 r15 0x0 rip 0x7ff88155a80a rflags 0x202 cs 0x33 fs 0x53 gs 0x2b time=2026-02-13T09:14:37.205+01:00 level=ERROR source=server.go:1610 msg="post predict" error="Post \"http://127.0.0.1:60306/completion\": read tcp 127.0.0.1:60321->127.0.0.1:60306: wsarecv: An existing connection was forcibly closed by the remote host." 
[GIN] 2026/02/13 - 09:14:37 | 500 | 36.4781606s | 127.0.0.1 | POST "/api/chat" time=2026-02-13T09:14:37.206+01:00 level=DEBUG source=sched.go:557 msg="context for request finished" time=2026-02-13T09:14:37.206+01:00 level=DEBUG source=sched.go:310 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen2.5:7b runner.inference="[{ID:8680a064-0400-0000-0002-000000000000 Library:Vulkan}]" runner.size="5.0 GiB" runner.vram="5.0 GiB" runner.parallel=1 runner.pid=14564 runner.model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 runner.num_ctx=8192 duration=30m0s time=2026-02-13T09:14:37.206+01:00 level=DEBUG source=sched.go:328 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen2.5:7b runner.inference="[{ID:8680a064-0400-0000-0002-000000000000 Library:Vulkan}]" runner.size="5.0 GiB" runner.vram="5.0 GiB" runner.parallel=1 runner.pid=14564 runner.model=C:\Users\josej\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 runner.num_ctx=8192 refCount=0 ``` </details>
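Since the crash only reproduces with streaming clients like Continue, it can help to rule the extension out with a minimal stand-alone client. The sketch below uses Ollama's documented `/api/chat` streaming endpoint (one JSON object per response line, final chunk carrying `"done": true`); the model name and prompt are placeholders, and a local Ollama instance must be running for the network call to do anything.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"  # default Ollama endpoint

def build_chat_request(model: str, prompt: str) -> bytes:
    """Build the streaming /api/chat payload that Continue-style clients send."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return json.dumps(payload).encode("utf-8")

def stream_chat(model: str, prompt: str) -> str:
    """POST the request and concatenate the streamed NDJSON chunks."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    pieces = []
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line while streaming
            chunk = json.loads(line)
            pieces.append(chunk.get("message", {}).get("content", ""))
            if chunk.get("done"):
                break
    return "".join(pieces)

if __name__ == "__main__":
    # Reproduces the streaming decode path; the server crash (if any)
    # surfaces here as a dropped connection / HTTP 500.
    print(stream_chat("qwen2.5:7b", "Explain what llama_decode does."))
```

If this script triggers the same `wsarecv` failure, the bug is independent of Continue and purely in the streaming decode path.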

@Josejsi commented on GitHub (Feb 15, 2026):

**Update: Migrated to LM Studio for better hardware acceleration**

I've decided to move away from Ollama for running LLMs and switched to LM Studio, as it handles my hardware much more efficiently.

Current Setup & Performance:

- Hardware: Running on an Intel Arc 140V.
- Model: DeepSeek-R1 14B (Distill) with a full 48-layer GPU offload.
- Stability: Unlike my previous experience, VRAM management is now rock solid (~13 GB/16 GB usage), allowing for a lag-free experience in VS Code.
- Architecture: I'm now using LM Studio as the primary local server for Chat and Edit roles, while keeping Ollama strictly for embeddings (nomic-embed-text) to handle codebase indexing.
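For anyone wanting to reproduce this split setup, a sketch of the corresponding Continue configuration might look like the following. This assumes Continue's `config.json` format and LM Studio's default local port (1234); the provider/field names can differ between Continue versions, and the model identifiers are placeholders.

```json
{
  "models": [
    {
      "title": "LM Studio (chat/edit)",
      "provider": "lmstudio",
      "model": "deepseek-r1-distill-qwen-14b",
      "apiBase": "http://localhost:1234/v1"
    }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```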

Reference: github-starred/ollama#55769