[GH-ISSUE #12887] Ollama runs Gemma3 perfectly but not llama3.2:1b #8541

Closed
opened 2026-04-12 21:14:51 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @FadyAboujaoude on GitHub (Oct 31, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12887

What is the issue?

Full transparency: I am running an old laptop (i7-6700HQ, 16 GB RAM, Nvidia GTX 960M).

I am unable to run specific models (llama3.2:1b; deepseek-r1:1.5b).
I thought it was a memory limitation, but I am able to run larger models like gemma3:latest.

I tried deleting and re-downloading the models, but no luck.
I tried using 0 GPUs and only the CPU, but no luck (OLLAMA_NUM_GPU=0 ollama run llama3.2:1b).
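(Editor's aside, not from the original report: `OLLAMA_NUM_GPU` does not appear among Ollama's documented server environment variables, so it may be silently ignored — the offload log further down still shows 14 layers going to the GPU. A sketch of two commonly used ways to force CPU-only execution; the exact behavior should be verified against the installed Ollama version.)

```
# Sketch only — not executed here:
# 1) Hide all CUDA devices from the server process before it starts:
#      CUDA_VISIBLE_DEVICES="" ollama serve
# 2) Per session, inside the REPL opened by `ollama run llama3.2:1b`:
#      /set parameter num_gpu 0     # request zero offloaded layers
```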

I tried using Claude and ChatGPT to help, and they believe it might be an instruction-set compatibility issue with my old CPU (something to do with AVX2).

Is there a way to confirm/fix this?
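(Editor's aside: one way to check which SIMD instruction sets the CPU actually exposes on Linux is to read the kernel's flags in /proc/cpuinfo — a hedged check; the flag names come from the kernel, not from Ollama. Note that the runner log below already reports CPU.0.AVX2=1, and an i7-6700HQ is a Skylake part, which supports AVX2.)

```shell
# List the SIMD-related CPU flags the kernel reports (Linux only).
# A Skylake i7-6700HQ is expected to show avx, avx2, f16c, and fma.
grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | tr ' ' '\n' | grep -E '^(avx|avx2|f16c|fma)$' | sort -u
```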

Relevant log output

test@test-N501VW:~$ ollama list
NAME                ID              SIZE      MODIFIED     
llama3.1:8b         46e0c10c039e    4.9 GB    17 hours ago    
gemma:latest        a72c7f4d0a15    5.0 GB    18 hours ago    
llama3.2:1b         baf6a787fdff    1.3 GB    20 hours ago    
deepseek-r1:1.5b    e0979632db5a    1.1 GB    20 hours ago    
gemma3:latest       a2af6cc3eb7f    3.3 GB    20 hours ago    
gemma3:270m         e7d36fb2c3b3    291 MB    21 hours ago    
test@test-N501VW:~$ ollama run gemma3
>>> test
Okay! This is a test response. 

Is there anything specific you wanted me to do or any particular test you had in mind? For example, did you want me to:

*   Answer a question?
*   Generate some text?
*   Perform a calculation?

Let me know!

>>> 
test@test-N501VW:~$ ollama run llama3.2:1b
Error: 500 Internal Server Error: llama runner process has terminated: exit status 2
test@test-N501VW:~$ OLLAMA_NUM_GPU=0 ollama run llama3.2:1b
Error: 500 Internal Server Error: llama runner process has terminated: exit status 2


Oct 31 10:46:28 test-N501VW ollama[498989]: [GIN] 2025/10/31 - 10:46:28 | 200 |      73.944µs |       127.0.0.1 | HEAD     "/"
Oct 31 10:46:28 test-N501VW ollama[498989]: [GIN] 2025/10/31 - 10:46:28 | 200 |   73.804673ms |       127.0.0.1 | POST     "/api/show"
Oct 31 10:46:28 test-N501VW ollama[498989]: time=2025-10-31T10:46:28.835-05:00 level=INFO source=server.go:385 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39707"
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   1:                               general.type str              = model
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   5:                         general.size_label str              = 1B
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   8:                          llama.block_count u32              = 16
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  18:                          general.file_type u32              = 7
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  29:               general.quantization_version u32              = 2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type  f32:   34 tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type q8_0:  113 tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file format = GGUF V3 (latest)
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file type   = Q8_0
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file size   = 1.22 GiB (8.50 BPW)
Oct 31 10:46:29 test-N501VW ollama[498989]: load: printing all EOG tokens:
Oct 31 10:46:29 test-N501VW ollama[498989]: load:   - 128001 ('<|end_of_text|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load:   - 128008 ('<|eom_id|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load:   - 128009 ('<|eot_id|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load: special tokens cache size = 256
Oct 31 10:46:29 test-N501VW ollama[498989]: load: token to piece cache size = 0.7999 MB
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: arch             = llama
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: vocab_only       = 1
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: model type       = ?B
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: model params     = 1.24 B
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: general.name     = Llama 3.2 1B Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: vocab type       = BPE
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: n_vocab          = 128256
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: n_merges         = 280147
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: BOS token        = 128000 '<|begin_of_text|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOS token        = 128009 '<|eot_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOT token        = 128009 '<|eot_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOM token        = 128008 '<|eom_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: LF token         = 198 'Ċ'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOG token        = 128001 '<|end_of_text|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOG token        = 128008 '<|eom_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOG token        = 128009 '<|eot_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: max token length = 256
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_load: vocab only - skipping tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.434-05:00 level=INFO source=server.go:385 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --port 32957"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.434-05:00 level=INFO source=server.go:455 msg="system memory" total="15.5 GiB" free="8.1 GiB" free_swap="1.7 GiB"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.435-05:00 level=INFO source=server.go:507 msg=offload library=CUDA layers.requested=-1 layers.model=17 layers.offload=14 layers.split=[14] memory.available="[1.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.3 GiB" memory.required.partial="1.9 GiB" memory.required.kv="128.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="986.2 MiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="280.0 MiB" memory.graph.partial="464.0 MiB"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.450-05:00 level=INFO source=runner.go:910 msg="starting go runner"
Oct 31 10:46:29 test-N501VW ollama[498989]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_cuda_init: found 1 CUDA devices:
Oct 31 10:46:29 test-N501VW ollama[498989]:   Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-7f4c4e77-8139-54fd-7463-1a4a9b316065
Oct 31 10:46:29 test-N501VW ollama[498989]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.554-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.554-05:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:32957"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.564-05:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:4 GPULayers:14[ID:GPU-7f4c4e77-8139-54fd-7463-1a4a9b316065 Layers:14(2..15)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.565-05:00 level=INFO source=server.go:1236 msg="waiting for llama runner to start responding"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.565-05:00 level=INFO source=server.go:1270 msg="waiting for server to become available" status="llm server loading model"
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_backend_cuda_device_get_memory device GPU-7f4c4e77-8139-54fd-7463-1a4a9b316065 utilizing NVML memory reporting free: 2085814272 total: 2147483648
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce GTX 960M) (0000:01:00.0) - 1989 MiB free
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   1:                               general.type str              = model
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   5:                         general.size_label str              = 1B
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   8:                          llama.block_count u32              = 16
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  18:                          general.file_type u32              = 7
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  29:               general.quantization_version u32              = 2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type  f32:   34 tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type q8_0:  113 tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file format = GGUF V3 (latest)
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file type   = Q8_0
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file size   = 1.22 GiB (8.50 BPW)
Oct 31 10:46:29 test-N501VW ollama[498989]: load: printing all EOG tokens:
Oct 31 10:46:29 test-N501VW ollama[498989]: load:   - 128001 ('<|end_of_text|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load:   - 128008 ('<|eom_id|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load:   - 128009 ('<|eot_id|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load: special tokens cache size = 256
Oct 31 10:46:30 test-N501VW ollama[498989]: load: token to piece cache size = 0.7999 MB
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: arch             = llama
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: vocab_only       = 0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_ctx_train      = 131072
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd           = 2048
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_layer          = 16
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_head           = 32
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_head_kv        = 8
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_rot            = 64
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_swa            = 0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: is_swa_any       = 0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_head_k    = 64
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_head_v    = 64
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_gqa            = 4
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_k_gqa     = 512
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_v_gqa     = 512
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_norm_eps       = 0.0e+00
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_norm_rms_eps   = 1.0e-05
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_clamp_kqv      = 0.0e+00
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_max_alibi_bias = 0.0e+00
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_logit_scale    = 0.0e+00
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_attn_scale     = 0.0e+00
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_ff             = 8192
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_expert         = 0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_expert_used    = 0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: causal attn      = 1
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: pooling type     = 0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: rope type        = 0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: rope scaling     = linear
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: freq_base_train  = 500000.0
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: freq_scale_train = 1
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_ctx_orig_yarn  = 131072
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: rope_finetuned   = unknown
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: model type       = 1B
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: model params     = 1.24 B
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: general.name     = Llama 3.2 1B Instruct
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: vocab type       = BPE
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_vocab          = 128256
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_merges         = 280147
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: BOS token        = 128000 '<|begin_of_text|>'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOS token        = 128009 '<|eot_id|>'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOT token        = 128009 '<|eot_id|>'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOM token        = 128008 '<|eom_id|>'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: LF token         = 198 'Ċ'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOG token        = 128001 '<|end_of_text|>'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOG token        = 128008 '<|eom_id|>'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOG token        = 128009 '<|eot_id|>'
Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: max token length = 256
Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: offloading 14 repeating layers to GPU
Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: offloaded 14/17 layers to GPU
Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors:        CUDA0 model buffer size =   862.97 MiB
Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors:   CPU_Mapped model buffer size =  1252.41 MiB
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_init_from_model: model default pooling_type is [0], but [-1] was specified
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: constructing llama_context
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_seq_max     = 1
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ctx         = 4096
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ctx_per_seq = 4096
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_batch       = 512
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ubatch      = 512
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: causal_attn   = 1
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: flash_attn    = disabled
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: kv_unified    = false
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: freq_base     = 500000.0
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: freq_scale    = 1
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context:        CPU  output buffer size =     0.50 MiB
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_kv_cache:      CUDA0 KV buffer size =   112.00 MiB
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_kv_cache:        CPU KV buffer size =    16.00 MiB
Oct 31 10:46:30 test-N501VW ollama[498989]: llama_kv_cache: size =  128.00 MiB (  4096 cells,  16 layers,  1/1 seqs), K (f16):   64.00 MiB, V (f16):   64.00 MiB
Oct 31 10:46:30 test-N501VW ollama[498989]: graph_reserve: failed to allocate compute buffers
Oct 31 10:46:30 test-N501VW ollama[498989]: SIGSEGV: segmentation violation
Oct 31 10:46:30 test-N501VW ollama[498989]: PC=0x71fa9b10cec2 m=0 sigcode=1 addr=0x59120c3cb488
Oct 31 10:46:30 test-N501VW ollama[498989]: signal arrived during cgo execution
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 14 gp=0xc000103a40 m=0 mp=0x59149f033e20 [syscall]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.cgocall(0x59149df78c00, 0xc0000f1bf8)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/cgocall.go:167 +0x4b fp=0xc0000f1bd0 sp=0xc0000f1b98 pc=0x59149d2685cb
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama._Cfunc_llama_init_from_model(0x5914abfb3990, {0x1000, 0x200, 0x200, 0x1, 0x4, 0x4, 0xffffffff, 0xffffffff, 0xffffffff, ...})
Oct 31 10:46:30 test-N501VW ollama[498989]:         _cgo_gotypes.go:749 +0x4e fp=0xc0000f1bf8 sp=0xc0000f1bd0 pc=0x59149d61f4ee
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama.NewContextWithModel.func1(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/llama/llama.go:280
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama.NewContextWithModel(0xc0000120b8, {{0x1000, 0x200, 0x200, 0x1, 0x4, 0x4, 0xffffffff, 0xffffffff, 0xffffffff, ...}})
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/llama/llama.go:280 +0x158 fp=0xc0000f1d98 sp=0xc0000f1bf8 pc=0x59149d6232b8
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc000351e00, {0xe, 0x0, 0x1, {0xc00047bbb4, 0x1, 0x1}, 0xc0005034b0, 0x0}, {0x7fff91a29d29, ...}, ...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/llamarunner/runner.go:797 +0x198 fp=0xc0000f1ee0 sp=0xc0000f1d98 pc=0x59149d6dfdb8
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/llamarunner/runner.go:879 +0x175 fp=0xc0000f1fe0 sp=0xc0000f1ee0 pc=0x59149d6e0e55
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc0000f1fe8 sp=0xc0000f1fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 11
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/llamarunner/runner.go:879 +0x7ce
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 1 gp=0xc000002380 m=nil [IO wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc00012d790 sp=0xc00012d770 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.netpollblock(0xc00012d7e0?, 0x9d204946?, 0x14?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/netpoll.go:575 +0xf7 fp=0xc00012d7c8 sp=0xc00012d790 pc=0x59149d230537
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.runtime_pollWait(0x71fb63c97eb0, 0x72)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/netpoll.go:351 +0x85 fp=0xc00012d7e8 sp=0xc00012d7c8 pc=0x59149d26ac65
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).wait(0xc0001ccd80?, 0x900000036?, 0x0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00012d810 sp=0xc00012d7e8 pc=0x59149d2f1ba7
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).waitRead(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         internal/poll/fd_poll_runtime.go:89
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*FD).Accept(0xc0001ccd80)
Oct 31 10:46:30 test-N501VW ollama[498989]:         internal/poll/fd_unix.go:620 +0x295 fp=0xc00012d8b8 sp=0xc00012d810 pc=0x59149d2f6f75
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*netFD).accept(0xc0001ccd80)
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/fd_unix.go:172 +0x29 fp=0xc00012d970 sp=0xc00012d8b8 pc=0x59149d369e49
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*TCPListener).accept(0xc000410a00)
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/tcpsock_posix.go:159 +0x1b fp=0xc00012d9c0 sp=0xc00012d970 pc=0x59149d37f7fb
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*TCPListener).Accept(0xc000410a00)
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/tcpsock.go:380 +0x30 fp=0xc00012d9f0 sp=0xc00012d9c0 pc=0x59149d37e6b0
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*onceCloseListener).Accept(0xc0004c23f0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         <autogenerated>:1 +0x24 fp=0xc00012da08 sp=0xc00012d9f0 pc=0x59149d595e04
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*Server).Serve(0xc0001f3800, {0x59149e776fe8, 0xc000410a00})
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/http/server.go:3424 +0x30c fp=0xc00012db38 sp=0xc00012da08 pc=0x59149d56d6cc
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034260, 0x4, 0x4})
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x8f5 fp=0xc00012dd08 sp=0xc00012db38 pc=0x59149d6e1815
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner.Execute({0xc000034250?, 0x0?, 0x0?})
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc00012dd30 sp=0xc00012dd08 pc=0x59149d77fe54
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001f3400?, {0x59149e2800aa?, 0x4?, 0x59149e2800ae?})
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/cmd/cmd.go:1769 +0x45 fp=0xc00012dd58 sp=0xc00012dd30 pc=0x59149df08f45
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).execute(0xc0004c5508, {0xc000410800, 0x4, 0x4})
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00012de78 sp=0xc00012dd58 pc=0x59149d3e349c
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).ExecuteC(0xc0004ae908)
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00012df30 sp=0xc00012de78 pc=0x59149d3e3ce5
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).Execute(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/spf13/cobra@v1.7.0/command.go:992
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).ExecuteContext(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/spf13/cobra@v1.7.0/command.go:985
Oct 31 10:46:30 test-N501VW ollama[498989]: main.main()
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00012df50 sp=0xc00012df30 pc=0x59149df09a0d
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.main()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:283 +0x29d fp=0xc00012dfe0 sp=0xc00012df50 pc=0x59149d237bbd
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc00012dfe8 sp=0xc00012dfe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000070fa8 sp=0xc000070f88 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.forcegchelper()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:348 +0xb8 fp=0xc000070fe0 sp=0xc000070fa8 pc=0x59149d237ef8
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.init.7 in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:336 +0x1a
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000071780 sp=0xc000071760 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.bgsweep(0xc00009c000)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgcsweep.go:316 +0xdf fp=0xc0000717c8 sp=0xc000071780 pc=0x59149d22269f
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcenable.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:204 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x59149d216a85
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcenable in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:204 +0x66
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x10000?, 0x59149e448950?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000071f78 sp=0xc000071f58 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.(*scavengerState).park(0x59149f031000)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgcscavenge.go:425 +0x49 fp=0xc000071fa8 sp=0xc000071f78 pc=0x59149d2200e9
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.bgscavenge(0xc00009c000)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgcscavenge.go:658 +0x59 fp=0xc000071fc8 sp=0xc000071fa8 pc=0x59149d220679
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcenable.gowrap2()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:205 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x59149d216a25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcenable in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:205 +0xa5
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000070688?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000070630 sp=0xc000070610 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.runfinq()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mfinal.go:196 +0x107 fp=0xc0000707e0 sp=0xc000070630 pc=0x59149d215a47
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.createfing in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mfinal.go:166 +0x3d
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 6 gp=0xc0001d08c0 m=nil [chan receive]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0xc0002255e0?, 0xc000590018?, 0x60?, 0x27?, 0x59149d350a88?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000072718 sp=0xc0000726f8 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.chanrecv(0xc0000a6310, 0x0, 0x1)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/chan.go:664 +0x445 fp=0xc000072790 sp=0xc000072718 pc=0x59149d207525
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.chanrecv1(0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/chan.go:506 +0x12 fp=0xc0000727b8 sp=0xc000072790 pc=0x59149d2070b2
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1796
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1799 +0x2f fp=0xc0000727e0 sp=0xc0000727b8 pc=0x59149d219c2f
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by unique.runtime_registerUniqueMapCleanup in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1794 +0x85
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 7 gp=0xc0001d0e00 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000072f38 sp=0xc000072f18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc000072fc8 sp=0xc000072f38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc000072fe0 sp=0xc000072fc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc00006c738 sp=0xc00006c718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc00006c7c8 sp=0xc00006c738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc00006c7e0 sp=0xc00006c7c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc00006cf38 sp=0xc00006cf18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc00006cfc8 sp=0xc00006cf38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc00006cfe0 sp=0xc00006cfc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e91526?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 8 gp=0xc0001d0fc0 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e4d11f?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000073738 sp=0xc000073718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc0000737c8 sp=0xc000073738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc0000737e0 sp=0xc0000737c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc0000737e8 sp=0xc0000737e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 9 gp=0xc0001d1180 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e91cd5?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000073f38 sp=0xc000073f18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc000073fc8 sp=0xc000073f38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc000073fe0 sp=0xc000073fc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e4b182?, 0x3?, 0x58?, 0x9?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc00006d738 sp=0xc00006d718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc00006d7c8 sp=0xc00006d738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc00006d7e0 sp=0xc00006d7c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc00006d7e8 sp=0xc00006d7e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e9141c?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 10 gp=0xc000103500 m=nil [sync.WaitGroup.Wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x60?, 0x40?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc00011ce20 sp=0xc00011ce00 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.semacquire1(0xc000351e20, 0x0, 0x1, 0x0, 0x18)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/sema.go:188 +0x229 fp=0xc00011ce88 sp=0xc00011ce20 pc=0x59149d24b189
Oct 31 10:46:30 test-N501VW ollama[498989]: sync.runtime_SemacquireWaitGroup(0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/sema.go:110 +0x25 fp=0xc00011cec0 sp=0xc00011ce88 pc=0x59149d26d385
Oct 31 10:46:30 test-N501VW ollama[498989]: sync.(*WaitGroup).Wait(0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         sync/waitgroup.go:118 +0x48 fp=0xc00011cee8 sp=0xc00011cec0 pc=0x59149d27e9e8
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc000351e00, {0x59149e779590, 0xc00050e690})
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/llamarunner/runner.go:334 +0x4b fp=0xc00011cfb8 sp=0xc00011cee8 pc=0x59149d6dc98b
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x28 fp=0xc00011cfe0 sp=0xc00011cfb8 pc=0x59149d6e1a88
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc00011cfe8 sp=0xc00011cfe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x4c5
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 11 gp=0xc0001036c0 m=nil [IO wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x71fb63c537a8?, 0xc0001cce00?, 0x70?, 0x79?, 0xb?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/proc.go:435 +0xce fp=0xc000047948 sp=0xc000047928 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.netpollblock(0x59149d28e8b8?, 0x9d204946?, 0x14?)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/netpoll.go:575 +0xf7 fp=0xc000047980 sp=0xc000047948 pc=0x59149d230537
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.runtime_pollWait(0x71fb63c97d98, 0x72)
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/netpoll.go:351 +0x85 fp=0xc0000479a0 sp=0xc000047980 pc=0x59149d26ac65
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).wait(0xc0001cce00?, 0xc0004d7000?, 0x0)
Oct 31 10:46:30 test-N501VW ollama[498989]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000479c8 sp=0xc0000479a0 pc=0x59149d2f1ba7
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).waitRead(...)
Oct 31 10:46:30 test-N501VW ollama[498989]:         internal/poll/fd_poll_runtime.go:89
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*FD).Read(0xc0001cce00, {0xc0004d7000, 0x1000, 0x1000})
Oct 31 10:46:30 test-N501VW ollama[498989]:         internal/poll/fd_unix.go:165 +0x27a fp=0xc000047a60 sp=0xc0000479c8 pc=0x59149d2f2e9a
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*netFD).Read(0xc0001cce00, {0xc0004d7000?, 0xc000047ad0?, 0x59149d2f2065?})
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/fd_posix.go:55 +0x25 fp=0xc000047aa8 sp=0xc000047a60 pc=0x59149d367ea5
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*conn).Read(0xc00013e528, {0xc0004d7000?, 0x0?, 0x0?})
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/net.go:194 +0x45 fp=0xc000047af0 sp=0xc000047aa8 pc=0x59149d376265
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*connReader).Read(0xc0004842d0, {0xc0004d7000, 0x1000, 0x1000})
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/http/server.go:798 +0x159 fp=0xc000047b40 sp=0xc000047af0 pc=0x59149d562579
Oct 31 10:46:30 test-N501VW ollama[498989]: bufio.(*Reader).fill(0xc000110660)
Oct 31 10:46:30 test-N501VW ollama[498989]:         bufio/bufio.go:113 +0x103 fp=0xc000047b78 sp=0xc000047b40 pc=0x59149d38da03
Oct 31 10:46:30 test-N501VW ollama[498989]: bufio.(*Reader).Peek(0xc000110660, 0x4)
Oct 31 10:46:30 test-N501VW ollama[498989]:         bufio/bufio.go:152 +0x53 fp=0xc000047b98 sp=0xc000047b78 pc=0x59149d38db33
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*conn).serve(0xc0004c23f0, {0x59149e779558, 0xc000484120})
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/http/server.go:2137 +0x785 fp=0xc000047fb8 sp=0xc000047b98 pc=0x59149d568365
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*Server).Serve.gowrap3()
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x59149d56dac8
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by net/http.(*Server).Serve in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]:         net/http/server.go:3454 +0x485
Oct 31 10:46:30 test-N501VW ollama[498989]: rax    0x59120c3cb488
Oct 31 10:46:30 test-N501VW ollama[498989]: rbx    0x1
Oct 31 10:46:30 test-N501VW ollama[498989]: rcx    0x5914aba53c50
Oct 31 10:46:30 test-N501VW ollama[498989]: rdx    0xffffffffac081e70
Oct 31 10:46:30 test-N501VW ollama[498989]: rdi    0x5914ac084fb0
Oct 31 10:46:30 test-N501VW ollama[498989]: rsi    0x0
Oct 31 10:46:30 test-N501VW ollama[498989]: rbp    0x5914ac07f810
Oct 31 10:46:30 test-N501VW ollama[498989]: rsp    0x7fff91a285d8
Oct 31 10:46:30 test-N501VW ollama[498989]: r8     0x5914af442
Oct 31 10:46:30 test-N501VW ollama[498989]: r9     0x7
Oct 31 10:46:30 test-N501VW ollama[498989]: r10    0x5914af442240
Oct 31 10:46:30 test-N501VW ollama[498989]: r11    0xada15c26cda4350d
Oct 31 10:46:30 test-N501VW ollama[498989]: r12    0x5914ac07f810
Oct 31 10:46:30 test-N501VW ollama[498989]: r13    0x0
Oct 31 10:46:30 test-N501VW ollama[498989]: r14    0x0
Oct 31 10:46:30 test-N501VW ollama[498989]: r15    0x5914abfb3990
Oct 31 10:46:30 test-N501VW ollama[498989]: rip    0x71fa9b10cec2
Oct 31 10:46:30 test-N501VW ollama[498989]: rflags 0x10202
Oct 31 10:46:30 test-N501VW ollama[498989]: cs     0x33
Oct 31 10:46:30 test-N501VW ollama[498989]: fs     0x0
Oct 31 10:46:30 test-N501VW ollama[498989]: gs     0x0
Oct 31 10:46:30 test-N501VW ollama[498989]: time=2025-10-31T10:46:30.569-05:00 level=INFO source=sched.go:446 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 error="llama runner process has terminated: exit status 2"
Oct 31 10:46:30 test-N501VW ollama[498989]: [GIN] 2025/10/31 - 10:46:30 | 500 |  1.960116545s |       127.0.0.1 | POST     "/api/generate"


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.12.7
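One detail worth noting: the runner's `msg=system` line in the log above reports `CPU.0.AVX2=1`, so the i7-6700HQ (a Skylake part) does expose AVX2 and a missing AVX2 instruction set is unlikely to be the whole story. As a sanity check, here is a minimal sketch (assuming a Linux host, where the kernel exposes CPU feature flags in `/proc/cpuinfo`) that prints which of the relevant SIMD extensions the CPU advertises:

```shell
# Check which SIMD extensions this CPU advertises.
# Linux exposes the feature flags in /proc/cpuinfo; each flag is a
# whitespace-separated word on the "flags" line, so grep -w matches
# "avx" and "avx2" independently.
cpuinfo=/proc/cpuinfo
if [ -r "$cpuinfo" ]; then
    for flag in avx avx2 f16c fma; do
        if grep -qm1 -w "$flag" "$cpuinfo"; then
            echo "$flag: supported"
        else
            echo "$flag: missing"
        fi
    done
else
    echo "no $cpuinfo on this system"
fi
```

On an i7-6700HQ this should list `avx2: supported`, matching what the runner itself logs; if it does, the crash is more likely elsewhere (e.g. in the CUDA path or the runner process) than in CPU instruction compatibility.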

Oct 31 10:46:28 test-N501VW ollama[498989]: [GIN] 2025/10/31 - 10:46:28 | 200 |   73.804673ms |       127.0.0.1 | POST     "/api/show"
Oct 31 10:46:28 test-N501VW ollama[498989]: time=2025-10-31T10:46:28.835-05:00 level=INFO source=server.go:385 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39707"
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   0: general.architecture str = llama
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   1: general.type str = model
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   2: general.name str = Llama 3.2 1B Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   3: general.finetune str = Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   4: general.basename str = Llama-3.2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   5: general.size_label str = 1B
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   8: llama.block_count u32 = 16
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   9: llama.context_length u32 = 131072
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  10: llama.embedding_length u32 = 2048
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  11: llama.feed_forward_length u32 = 8192
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  12: llama.attention.head_count u32 = 32
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  13: llama.attention.head_count_kv u32 = 8
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  14: llama.rope.freq_base f32 = 500000.000000
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  16: llama.attention.key_length u32 = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  17: llama.attention.value_length u32 = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  18: general.file_type u32 = 7
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  19: llama.vocab_size u32 = 128256
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  20: llama.rope.dimension_count u32 = 64
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  21: tokenizer.ggml.model str = gpt2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  22: tokenizer.ggml.pre str = llama-bpe
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  26: tokenizer.ggml.bos_token_id u32 = 128000
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  27: tokenizer.ggml.eos_token_id u32 = 128009
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv  29: general.quantization_version u32 = 2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type  f32: 34 tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type q8_0: 113 tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file format = GGUF V3 (latest)
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file type = Q8_0
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file size = 1.22 GiB (8.50 BPW)
Oct 31 10:46:29 test-N501VW ollama[498989]: load: printing all EOG tokens:
Oct 31 10:46:29 test-N501VW ollama[498989]: load: - 128001 ('<|end_of_text|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load: - 128008 ('<|eom_id|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load: - 128009 ('<|eot_id|>')
Oct 31 10:46:29 test-N501VW ollama[498989]: load: special tokens cache size = 256
Oct 31 10:46:29 test-N501VW ollama[498989]: load: token to piece cache size = 0.7999 MB
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: arch = llama
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: vocab_only = 1
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: model type = ?B
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: model params = 1.24 B
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: general.name = Llama 3.2 1B Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: vocab type = BPE
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: n_vocab = 128256
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: n_merges = 280147
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: BOS token = 128000 '<|begin_of_text|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOS token = 128009 '<|eot_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOT token = 128009 '<|eot_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOM token = 128008 '<|eom_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: LF token = 198 'Ċ'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOG token = 128001 '<|end_of_text|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOG token = 128008 '<|eom_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: EOG token = 128009 '<|eot_id|>'
Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: max token length = 256
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_load: vocab only - skipping tensors
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.434-05:00 level=INFO source=server.go:385 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --port 32957"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.434-05:00 level=INFO source=server.go:455 msg="system memory" total="15.5 GiB" free="8.1 GiB" free_swap="1.7 GiB"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.435-05:00 level=INFO source=server.go:507 msg=offload library=CUDA layers.requested=-1 layers.model=17 layers.offload=14 layers.split=[14] memory.available="[1.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.3 GiB" memory.required.partial="1.9 GiB" memory.required.kv="128.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="986.2 MiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="280.0 MiB" memory.graph.partial="464.0 MiB"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.450-05:00 level=INFO source=runner.go:910 msg="starting go runner"
Oct 31 10:46:29 test-N501VW ollama[498989]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_cuda_init: found 1 CUDA devices:
Oct 31 10:46:29 test-N501VW ollama[498989]:   Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-7f4c4e77-8139-54fd-7463-1a4a9b316065
Oct 31 10:46:29 test-N501VW ollama[498989]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.554-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.554-05:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:32957"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.564-05:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:4 GPULayers:14[ID:GPU-7f4c4e77-8139-54fd-7463-1a4a9b316065 Layers:14(2..15)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.565-05:00 level=INFO source=server.go:1236 msg="waiting for llama runner to start responding"
Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.565-05:00 level=INFO source=server.go:1270 msg="waiting for server to become available" status="llm server loading model"
Oct 31 10:46:29 test-N501VW ollama[498989]: ggml_backend_cuda_device_get_memory device GPU-7f4c4e77-8139-54fd-7463-1a4a9b316065 utilizing NVML memory reporting free: 2085814272 total: 2147483648
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce GTX 960M) (0000:01:00.0) - 1989 MiB free
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   0: general.architecture str = llama
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   1: general.type str = model
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   2: general.name str = Llama 3.2 1B Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   3: general.finetune str = Instruct
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   4: general.basename str = Llama-3.2
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   5: general.size_label str = 1B
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv   7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 8: llama.block_count u32 = 16 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 9: llama.context_length u32 = 131072 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 10: llama.embedding_length u32 = 2048 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 12: llama.attention.head_count u32 = 32 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 16: llama.attention.key_length u32 = 64 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 17: llama.attention.value_length u32 = 64 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 18: general.file_type u32 = 7 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 19: llama.vocab_size u32 = 128256 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 64 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - kv 29: general.quantization_version u32 = 2 Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type f32: 34 tensors Oct 31 10:46:29 test-N501VW ollama[498989]: llama_model_loader: - type q8_0: 113 tensors Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file format = GGUF V3 (latest) Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file type = Q8_0 Oct 31 10:46:29 test-N501VW ollama[498989]: print_info: file size = 1.22 GiB (8.50 BPW) Oct 31 10:46:29 test-N501VW ollama[498989]: load: printing all EOG tokens: Oct 31 10:46:29 test-N501VW ollama[498989]: load: - 128001 ('<|end_of_text|>') Oct 31 10:46:29 test-N501VW ollama[498989]: load: - 128008 ('<|eom_id|>') Oct 31 10:46:29 test-N501VW ollama[498989]: load: - 128009 ('<|eot_id|>') Oct 31 10:46:29 test-N501VW ollama[498989]: load: special tokens cache size = 256 Oct 31 10:46:30 test-N501VW ollama[498989]: load: token to piece cache size = 0.7999 MB Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: arch = llama Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: vocab_only = 0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_ctx_train = 131072 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd = 2048 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_layer = 16 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_head = 32 Oct 31 10:46:30 test-N501VW ollama[498989]: 
print_info: n_head_kv = 8 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_rot = 64 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_swa = 0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: is_swa_any = 0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_head_k = 64 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_head_v = 64 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_gqa = 4 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_k_gqa = 512 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_embd_v_gqa = 512 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_norm_eps = 0.0e+00 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_norm_rms_eps = 1.0e-05 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_clamp_kqv = 0.0e+00 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_max_alibi_bias = 0.0e+00 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_logit_scale = 0.0e+00 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: f_attn_scale = 0.0e+00 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_ff = 8192 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_expert = 0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_expert_used = 0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: causal attn = 1 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: pooling type = 0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: rope type = 0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: rope scaling = linear Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: freq_base_train = 500000.0 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: freq_scale_train = 1 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_ctx_orig_yarn = 131072 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: rope_finetuned = unknown Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: model type = 1B Oct 31 10:46:30 test-N501VW 
ollama[498989]: print_info: model params = 1.24 B Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: general.name = Llama 3.2 1B Instruct Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: vocab type = BPE Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_vocab = 128256 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: n_merges = 280147 Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: BOS token = 128000 '<|begin_of_text|>' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOS token = 128009 '<|eot_id|>' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOT token = 128009 '<|eot_id|>' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOM token = 128008 '<|eom_id|>' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: LF token = 198 'Ċ' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOG token = 128001 '<|end_of_text|>' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOG token = 128008 '<|eom_id|>' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: EOG token = 128009 '<|eot_id|>' Oct 31 10:46:30 test-N501VW ollama[498989]: print_info: max token length = 256 Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: loading model tensors, this can take a while... 
(mmap = true) Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: offloading 14 repeating layers to GPU Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: offloaded 14/17 layers to GPU Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: CUDA0 model buffer size = 862.97 MiB Oct 31 10:46:30 test-N501VW ollama[498989]: load_tensors: CPU_Mapped model buffer size = 1252.41 MiB Oct 31 10:46:30 test-N501VW ollama[498989]: llama_init_from_model: model default pooling_type is [0], but [-1] was specified Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: constructing llama_context Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_seq_max = 1 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ctx = 4096 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ctx_per_seq = 4096 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_batch = 512 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ubatch = 512 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: causal_attn = 1 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: flash_attn = disabled Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: kv_unified = false Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: freq_base = 500000.0 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: freq_scale = 1 Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized Oct 31 10:46:30 test-N501VW ollama[498989]: llama_context: CPU output buffer size = 0.50 MiB Oct 31 10:46:30 test-N501VW ollama[498989]: llama_kv_cache: CUDA0 KV buffer size = 112.00 MiB Oct 31 10:46:30 test-N501VW ollama[498989]: llama_kv_cache: CPU KV buffer size = 16.00 MiB Oct 31 10:46:30 test-N501VW ollama[498989]: llama_kv_cache: size = 128.00 MiB ( 4096 cells, 16 layers, 1/1 seqs), K (f16): 64.00 MiB, V (f16): 64.00 MiB Oct 31 10:46:30 test-N501VW ollama[498989]: 
graph_reserve: failed to allocate compute buffers Oct 31 10:46:30 test-N501VW ollama[498989]: SIGSEGV: segmentation violation Oct 31 10:46:30 test-N501VW ollama[498989]: PC=0x71fa9b10cec2 m=0 sigcode=1 addr=0x59120c3cb488 Oct 31 10:46:30 test-N501VW ollama[498989]: signal arrived during cgo execution Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 14 gp=0xc000103a40 m=0 mp=0x59149f033e20 [syscall]: Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.cgocall(0x59149df78c00, 0xc0000f1bf8) Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/cgocall.go:167 +0x4b fp=0xc0000f1bd0 sp=0xc0000f1b98 pc=0x59149d2685cb Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama._Cfunc_llama_init_from_model(0x5914abfb3990, {0x1000, 0x200, 0x200, 0x1, 0x4, 0x4, 0xffffffff, 0xffffffff, 0xffffffff, ...}) Oct 31 10:46:30 test-N501VW ollama[498989]: _cgo_gotypes.go:749 +0x4e fp=0xc0000f1bf8 sp=0xc0000f1bd0 pc=0x59149d61f4ee Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama.NewContextWithModel.func1(...) Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama/llama.go:280 Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama.NewContextWithModel(0xc0000120b8, {{0x1000, 0x200, 0x200, 0x1, 0x4, 0x4, 0xffffffff, 0xffffffff, 0xffffffff, ...}}) Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/llama/llama.go:280 +0x158 fp=0xc0000f1d98 sp=0xc0000f1bf8 pc=0x59149d6232b8 Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc000351e00, {0xe, 0x0, 0x1, {0xc00047bbb4, 0x1, 0x1}, 0xc0005034b0, 0x0}, {0x7fff91a29d29, ...}, ...) 
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner/runner.go:797 +0x198 fp=0xc0000f1ee0 sp=0xc0000f1d98 pc=0x59149d6dfdb8
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner/runner.go:879 +0x175 fp=0xc0000f1fe0 sp=0xc0000f1ee0 pc=0x59149d6e0e55
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc0000f1fe8 sp=0xc0000f1fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 11
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner/runner.go:879 +0x7ce
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 1 gp=0xc000002380 m=nil [IO wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc00012d790 sp=0xc00012d770 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.netpollblock(0xc00012d7e0?, 0x9d204946?, 0x14?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/netpoll.go:575 +0xf7 fp=0xc00012d7c8 sp=0xc00012d790 pc=0x59149d230537
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.runtime_pollWait(0x71fb63c97eb0, 0x72)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/netpoll.go:351 +0x85 fp=0xc00012d7e8 sp=0xc00012d7c8 pc=0x59149d26ac65
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).wait(0xc0001ccd80?, 0x900000036?, 0x0)
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00012d810 sp=0xc00012d7e8 pc=0x59149d2f1ba7
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).waitRead(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll/fd_poll_runtime.go:89
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*FD).Accept(0xc0001ccd80)
Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll/fd_unix.go:620 +0x295 fp=0xc00012d8b8 sp=0xc00012d810 pc=0x59149d2f6f75
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*netFD).accept(0xc0001ccd80)
Oct 31 10:46:30 test-N501VW ollama[498989]: net/fd_unix.go:172 +0x29 fp=0xc00012d970 sp=0xc00012d8b8 pc=0x59149d369e49
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*TCPListener).accept(0xc000410a00)
Oct 31 10:46:30 test-N501VW ollama[498989]: net/tcpsock_posix.go:159 +0x1b fp=0xc00012d9c0 sp=0xc00012d970 pc=0x59149d37f7fb
Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*TCPListener).Accept(0xc000410a00)
Oct 31 10:46:30 test-N501VW ollama[498989]: net/tcpsock.go:380 +0x30 fp=0xc00012d9f0 sp=0xc00012d9c0 pc=0x59149d37e6b0
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*onceCloseListener).Accept(0xc0004c23f0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: <autogenerated>:1 +0x24 fp=0xc00012da08 sp=0xc00012d9f0 pc=0x59149d595e04
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*Server).Serve(0xc0001f3800, {0x59149e776fe8, 0xc000410a00})
Oct 31 10:46:30 test-N501VW ollama[498989]: net/http/server.go:3424 +0x30c fp=0xc00012db38 sp=0xc00012da08 pc=0x59149d56d6cc
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034260, 0x4, 0x4})
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x8f5 fp=0xc00012dd08 sp=0xc00012db38 pc=0x59149d6e1815
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner.Execute({0xc000034250?, 0x0?, 0x0?})
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc00012dd30 sp=0xc00012dd08 pc=0x59149d77fe54
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001f3400?, {0x59149e2800aa?, 0x4?, 0x59149e2800ae?})
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/cmd/cmd.go:1769 +0x45 fp=0xc00012dd58 sp=0xc00012dd30 pc=0x59149df08f45
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).execute(0xc0004c5508, {0xc000410800, 0x4, 0x4})
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00012de78 sp=0xc00012dd58 pc=0x59149d3e349c
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).ExecuteC(0xc0004ae908)
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00012df30 sp=0xc00012de78 pc=0x59149d3e3ce5
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).Execute(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra@v1.7.0/command.go:992
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra.(*Command).ExecuteContext(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/spf13/cobra@v1.7.0/command.go:985
Oct 31 10:46:30 test-N501VW ollama[498989]: main.main()
Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00012df50 sp=0xc00012df30 pc=0x59149df09a0d
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.main()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:283 +0x29d fp=0xc00012dfe0 sp=0xc00012df50 pc=0x59149d237bbd
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc00012dfe8 sp=0xc00012dfe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000070fa8 sp=0xc000070f88 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.forcegchelper()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:348 +0xb8 fp=0xc000070fe0 sp=0xc000070fa8 pc=0x59149d237ef8
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.init.7 in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:336 +0x1a
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000071780 sp=0xc000071760 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.bgsweep(0xc00009c000)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgcsweep.go:316 +0xdf fp=0xc0000717c8 sp=0xc000071780 pc=0x59149d22269f
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcenable.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:204 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x59149d216a85
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcenable in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:204 +0x66
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x10000?, 0x59149e448950?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000071f78 sp=0xc000071f58 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.(*scavengerState).park(0x59149f031000)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgcscavenge.go:425 +0x49 fp=0xc000071fa8 sp=0xc000071f78 pc=0x59149d2200e9
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.bgscavenge(0xc00009c000)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgcscavenge.go:658 +0x59 fp=0xc000071fc8 sp=0xc000071fa8 pc=0x59149d220679
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcenable.gowrap2()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:205 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x59149d216a25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcenable in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:205 +0xa5
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000070688?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000070630 sp=0xc000070610 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.runfinq()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mfinal.go:196 +0x107 fp=0xc0000707e0 sp=0xc000070630 pc=0x59149d215a47
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.createfing in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mfinal.go:166 +0x3d
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 6 gp=0xc0001d08c0 m=nil [chan receive]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0xc0002255e0?, 0xc000590018?, 0x60?, 0x27?, 0x59149d350a88?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000072718 sp=0xc0000726f8 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.chanrecv(0xc0000a6310, 0x0, 0x1)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/chan.go:664 +0x445 fp=0xc000072790 sp=0xc000072718 pc=0x59149d207525
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.chanrecv1(0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/chan.go:506 +0x12 fp=0xc0000727b8 sp=0xc000072790 pc=0x59149d2070b2
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1796
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1799 +0x2f fp=0xc0000727e0 sp=0xc0000727b8 pc=0x59149d219c2f
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by unique.runtime_registerUniqueMapCleanup in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1794 +0x85
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 7 gp=0xc0001d0e00 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000072f38 sp=0xc000072f18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc000072fc8 sp=0xc000072f38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc000072fe0 sp=0xc000072fc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc00006c738 sp=0xc00006c718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc00006c7c8 sp=0xc00006c738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc00006c7e0 sp=0xc00006c7c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc00006cf38 sp=0xc00006cf18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc00006cfc8 sp=0xc00006cf38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc00006cfe0 sp=0xc00006cfc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e91526?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 8 gp=0xc0001d0fc0 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e4d11f?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000073738 sp=0xc000073718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc0000737c8 sp=0xc000073738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc0000737e0 sp=0xc0000737c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc0000737e8 sp=0xc0000737e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 9 gp=0xc0001d1180 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e91cd5?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000073f38 sp=0xc000073f18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc000073fc8 sp=0xc000073f38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc000073fe0 sp=0xc000073fc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e4b182?, 0x3?, 0x58?, 0x9?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc00006d738 sp=0xc00006d718 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc00006d7c8 sp=0xc00006d738 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc00006d7e0 sp=0xc00006d7c8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc00006d7e8 sp=0xc00006d7e0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x179b3e7e9141c?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkWorker(0xc0000a78f0)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x59149d218f49
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gcBgMarkStartWorkers.gowrap1()
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x59149d218e25
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({})
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x59149d2730a1
Oct 31 10:46:30 test-N501VW ollama[498989]: created by runtime.gcBgMarkStartWorkers in goroutine 1
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/mgc.go:1339 +0x105
Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 10 gp=0xc000103500 m=nil [sync.WaitGroup.Wait]:
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x0?, 0x0?, 0x60?, 0x40?, 0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc00011ce20 sp=0xc00011ce00 pc=0x59149d26ba4e
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goparkunlock(...)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:441
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.semacquire1(0xc000351e20, 0x0, 0x1, 0x0, 0x18)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/sema.go:188 +0x229 fp=0xc00011ce88 sp=0xc00011ce20 pc=0x59149d24b189
Oct 31 10:46:30 test-N501VW ollama[498989]: sync.runtime_SemacquireWaitGroup(0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/sema.go:110 +0x25 fp=0xc00011cec0 sp=0xc00011ce88 pc=0x59149d26d385
Oct 31 10:46:30 test-N501VW ollama[498989]: sync.(*WaitGroup).Wait(0x0?)
Oct 31 10:46:30 test-N501VW ollama[498989]: sync/waitgroup.go:118 +0x48 fp=0xc00011cee8 sp=0xc00011cec0 pc=0x59149d27e9e8 Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc000351e00, {0x59149e779590, 0xc00050e690}) Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner/runner.go:334 +0x4b fp=0xc00011cfb8 sp=0xc00011cee8 pc=0x59149d6dc98b Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x28 fp=0xc00011cfe0 sp=0xc00011cfb8 pc=0x59149d6e1a88 Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({}) Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc00011cfe8 sp=0xc00011cfe0 pc=0x59149d2730a1 Oct 31 10:46:30 test-N501VW ollama[498989]: created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 Oct 31 10:46:30 test-N501VW ollama[498989]: github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x4c5 Oct 31 10:46:30 test-N501VW ollama[498989]: goroutine 11 gp=0xc0001036c0 m=nil [IO wait]: Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.gopark(0x71fb63c537a8?, 0xc0001cce00?, 0x70?, 0x79?, 0xb?) Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/proc.go:435 +0xce fp=0xc000047948 sp=0xc000047928 pc=0x59149d26ba4e Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.netpollblock(0x59149d28e8b8?, 0x9d204946?, 0x14?) 
Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/netpoll.go:575 +0xf7 fp=0xc000047980 sp=0xc000047948 pc=0x59149d230537 Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.runtime_pollWait(0x71fb63c97d98, 0x72) Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/netpoll.go:351 +0x85 fp=0xc0000479a0 sp=0xc000047980 pc=0x59149d26ac65 Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).wait(0xc0001cce00?, 0xc0004d7000?, 0x0) Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000479c8 sp=0xc0000479a0 pc=0x59149d2f1ba7 Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*pollDesc).waitRead(...) Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll/fd_poll_runtime.go:89 Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll.(*FD).Read(0xc0001cce00, {0xc0004d7000, 0x1000, 0x1000}) Oct 31 10:46:30 test-N501VW ollama[498989]: internal/poll/fd_unix.go:165 +0x27a fp=0xc000047a60 sp=0xc0000479c8 pc=0x59149d2f2e9a Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*netFD).Read(0xc0001cce00, {0xc0004d7000?, 0xc000047ad0?, 0x59149d2f2065?}) Oct 31 10:46:30 test-N501VW ollama[498989]: net/fd_posix.go:55 +0x25 fp=0xc000047aa8 sp=0xc000047a60 pc=0x59149d367ea5 Oct 31 10:46:30 test-N501VW ollama[498989]: net.(*conn).Read(0xc00013e528, {0xc0004d7000?, 0x0?, 0x0?}) Oct 31 10:46:30 test-N501VW ollama[498989]: net/net.go:194 +0x45 fp=0xc000047af0 sp=0xc000047aa8 pc=0x59149d376265 Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*connReader).Read(0xc0004842d0, {0xc0004d7000, 0x1000, 0x1000}) Oct 31 10:46:30 test-N501VW ollama[498989]: net/http/server.go:798 +0x159 fp=0xc000047b40 sp=0xc000047af0 pc=0x59149d562579 Oct 31 10:46:30 test-N501VW ollama[498989]: bufio.(*Reader).fill(0xc000110660) Oct 31 10:46:30 test-N501VW ollama[498989]: bufio/bufio.go:113 +0x103 fp=0xc000047b78 sp=0xc000047b40 pc=0x59149d38da03 Oct 31 10:46:30 test-N501VW ollama[498989]: bufio.(*Reader).Peek(0xc000110660, 0x4) Oct 31 
10:46:30 test-N501VW ollama[498989]: bufio/bufio.go:152 +0x53 fp=0xc000047b98 sp=0xc000047b78 pc=0x59149d38db33 Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*conn).serve(0xc0004c23f0, {0x59149e779558, 0xc000484120}) Oct 31 10:46:30 test-N501VW ollama[498989]: net/http/server.go:2137 +0x785 fp=0xc000047fb8 sp=0xc000047b98 pc=0x59149d568365 Oct 31 10:46:30 test-N501VW ollama[498989]: net/http.(*Server).Serve.gowrap3() Oct 31 10:46:30 test-N501VW ollama[498989]: net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x59149d56dac8 Oct 31 10:46:30 test-N501VW ollama[498989]: runtime.goexit({}) Oct 31 10:46:30 test-N501VW ollama[498989]: runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x59149d2730a1 Oct 31 10:46:30 test-N501VW ollama[498989]: created by net/http.(*Server).Serve in goroutine 1 Oct 31 10:46:30 test-N501VW ollama[498989]: net/http/server.go:3454 +0x485 Oct 31 10:46:30 test-N501VW ollama[498989]: rax 0x59120c3cb488 Oct 31 10:46:30 test-N501VW ollama[498989]: rbx 0x1 Oct 31 10:46:30 test-N501VW ollama[498989]: rcx 0x5914aba53c50 Oct 31 10:46:30 test-N501VW ollama[498989]: rdx 0xffffffffac081e70 Oct 31 10:46:30 test-N501VW ollama[498989]: rdi 0x5914ac084fb0 Oct 31 10:46:30 test-N501VW ollama[498989]: rsi 0x0 Oct 31 10:46:30 test-N501VW ollama[498989]: rbp 0x5914ac07f810 Oct 31 10:46:30 test-N501VW ollama[498989]: rsp 0x7fff91a285d8 Oct 31 10:46:30 test-N501VW ollama[498989]: r8 0x5914af442 Oct 31 10:46:30 test-N501VW ollama[498989]: r9 0x7 Oct 31 10:46:30 test-N501VW ollama[498989]: r10 0x5914af442240 Oct 31 10:46:30 test-N501VW ollama[498989]: r11 0xada15c26cda4350d Oct 31 10:46:30 test-N501VW ollama[498989]: r12 0x5914ac07f810 Oct 31 10:46:30 test-N501VW ollama[498989]: r13 0x0 Oct 31 10:46:30 test-N501VW ollama[498989]: r14 0x0 Oct 31 10:46:30 test-N501VW ollama[498989]: r15 0x5914abfb3990 Oct 31 10:46:30 test-N501VW ollama[498989]: rip 0x71fa9b10cec2 Oct 31 10:46:30 test-N501VW ollama[498989]: rflags 0x10202 Oct 31 
10:46:30 test-N501VW ollama[498989]: cs 0x33 Oct 31 10:46:30 test-N501VW ollama[498989]: fs 0x0 Oct 31 10:46:30 test-N501VW ollama[498989]: gs 0x0 Oct 31 10:46:30 test-N501VW ollama[498989]: time=2025-10-31T10:46:30.569-05:00 level=INFO source=sched.go:446 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 error="llama runner process has terminated: exit status 2" Oct 31 10:46:30 test-N501VW ollama[498989]: [GIN] 2025/10/31 - 10:46:30 | 500 | 1.960116545s | 127.0.0.1 | POST "/api/generate" ``` ### OS Linux ### GPU Nvidia ### CPU Intel ### Ollama version 0.12.7
GiteaMirror added the bug label 2026-04-12 21:14:51 -05:00

@rick-github commented on GitHub (Oct 31, 2025):

The [server log](https://docs.ollama.com/troubleshooting) will help in debugging.

OLLAMA_NUM_GPU is not an ollama configuration variable. Since you can't load the model to set the layer count, the easiest thing to do is create a copy of the model:

$ echo FROM llama3.2:1b > Modelfile
$ echo PARAMETER num_gpu 0 >> Modelfile
$ ollama create llama3.2:1b-nogpu
$ ollama run llama3.2:1b-nogpu ''
$ ollama ps
NAME                 ID              SIZE      PROCESSOR    CONTEXT    UNTIL   
llama3.2:1b-nogpu    97b384972d45    1.4 GB    100% CPU     4096       Forever    

@FadyAboujaoude commented on GitHub (Oct 31, 2025):

Hi Rick
thank you for the quick reply
I added the server log to the issue above

is there anything specific I should look for?
I found a failure message but I can't interpret it.


@rick-github commented on GitHub (Oct 31, 2025):

Oct 31 10:46:30 test-N501VW ollama[498989]: llama_kv_cache: size =  128.00 MiB (  4096 cells,  16 layers,  1/1 seqs), K (f16):   64.00 MiB, V (f16):   64.00 MiB
Oct 31 10:46:30 test-N501VW ollama[498989]: graph_reserve: failed to allocate compute buffers

Failed due to lack of memory.

Oct 31 10:46:29 test-N501VW ollama[498989]: time=2025-10-31T10:46:29.435-05:00 level=INFO source=server.go:507
 msg=offload library=CUDA layers.requested=-1 layers.model=17 layers.offload=14 layers.split=[14] memory.available="[1.9 GiB]"
 memory.gpu_overhead="0 B" memory.required.full="2.3 GiB" memory.required.partial="1.9 GiB" memory.required.kv="128.0 MiB"
 memory.required.allocations="[1.9 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="986.2 MiB"
 memory.weights.nonrepeating="266.2 MiB" memory.graph.full="280.0 MiB" memory.graph.partial="464.0 MiB"

The ollama server saw 1.9 GiB free and allocated all 1.9 GiB to hold 14 layers. That left no margin for error, and the runner paid the price by running out of memory. See [here](https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288) for some ways to mitigate this.
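
As one illustration of leaving headroom (a hedged sketch, not from the thread: it assumes the server is listening on the default port 11434 and the stock model name), the layer count can also be lowered per request through the API's `num_gpu` option, without creating a Modelfile copy:

```shell
# Offload only 10 of the model's 17 layers to the GPU, leaving VRAM
# headroom for the KV cache and compute buffers.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "test",
  "options": { "num_gpu": 10 }
}'
```

If the load succeeds, `ollama ps` should then show a CPU/GPU split rather than the 500 error.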

What version of ollama are you running (ollama -v)? There's been work on improving the memory allocation, if you are not using a recent release it may pay to update.


@FadyAboujaoude commented on GitHub (Oct 31, 2025):

test@test-N501VW:~$ ollama --version
ollama version is 0.12.7

I will try the mitigation steps

thank you!


@FadyAboujaoude commented on GitHub (Oct 31, 2025):

Just an update:
I ran your code for no-GPU usage and it was successful.

$ echo FROM llama3.2:1b > Modelfile
$ echo PARAMETER num_gpu 0 >> Modelfile
$ ollama create llama3.2:1b-nogpu
$ ollama run llama3.2:1b-nogpu ''
$ ollama ps

(I was unsure where to change the variables on the config.go file you mentioned in the other post)

thank you!


@jessegross commented on GitHub (Oct 31, 2025):

The reason for the difference in behavior between models is that gemma3 runs on the Ollama engine, whereas llama and qwen2 (as used by deepseek-r1:1.5b) default to running on the old engine. The Ollama engine has better memory management and should allow you to run the model at least partially on the GPU.

In the case of those two model architectures, you can force them to run on the Ollama engine by setting OLLAMA_NEW_ENGINE=1. We are working to migrate more models over so that this is the default.
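
For a systemd-managed Linux install like the reporter's (a sketch assuming the stock `ollama.service` unit), the variable can be set persistently with a service override, following the usual pattern for Ollama server environment variables:

```shell
# Add OLLAMA_NEW_ENGINE=1 to the ollama service environment.
sudo systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_NEW_ENGINE=1"
# Then reload and restart so the setting takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

For a quick one-off test, `OLLAMA_NEW_ENGINE=1 ollama serve` in a foreground shell achieves the same thing; note the variable must be set on the server, not on the `ollama run` client.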

Reference: github-starred/ollama#8541