[GH-ISSUE #8936] v0.5.8 a lot slower than v0.5.7 #31558

Closed
opened 2026-04-22 12:06:21 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @ghost on GitHub (Feb 8, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8936

What is the issue?

I was hoping to take advantage of AVX512 on a Xeon W-2235, yet I'm finding that v0.5.8 is a lot slower than v0.5.7. Specs are 128GB DDR4 in 4 channels (60GB/sec bandwidth). No GPU (runs headless).

For llama3.3:70B:
v0.5.7: response 1.15 tokens/sec, prompt 2.18 tokens/sec
v0.5.8: response 0.34 tokens/sec, prompt 0.35 tokens/sec
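Rates like these can be recomputed from the counters Ollama returns on a non-streaming `/api/generate` call. A minimal sketch, assuming a local server on the default port and that the `llama3.3:70b` tag is already pulled; the field names are the documented eval counters, with durations reported in nanoseconds:

```python
import json
import urllib.request

# Ask the local Ollama server for a short completion (non-streaming, so the
# final JSON object carries the timing counters).
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps({
        "model": "llama3.3:70b",          # model under test (assumed pulled)
        "prompt": "Why is the sky blue?",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    r = json.load(resp)

# Durations are in nanoseconds; prompt_eval_* may be absent if the prompt
# was served from cache.
print("prompt tokens/sec:  ", r["prompt_eval_count"] / (r["prompt_eval_duration"] / 1e9))
print("response tokens/sec:", r["eval_count"] / (r["eval_duration"] / 1e9))
```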

Relevant log output

v0.5.8:
2025/02/08 10:31:40 routes.go:1186: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\XXX\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-02-08T10:31:40.944+11:00 level=INFO source=images.go:432 msg="total blobs: 35"
time=2025-02-08T10:31:40.945+11:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-02-08T10:31:40.946+11:00 level=INFO source=routes.go:1237 msg="Listening on 127.0.0.1:11434 (version 0.5.8-rc11)"
time=2025-02-08T10:31:40.946+11:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-02-08T10:31:40.947+11:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-02-08T10:31:40.947+11:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=6 efficiency=0 threads=12
time=2025-02-08T10:31:40.979+11:00 level=INFO source=gpu.go:636 msg="Unable to load NVML management library C:\\Windows\\system32\\nvml.dll: nvml vram init failure: 4"
time=2025-02-08T10:31:40.984+11:00 level=INFO source=gpu.go:636 msg="Unable to load NVML management library C:\\Windows\\System32\\nvml.dll: nvml vram init failure: 4"
time=2025-02-08T10:31:40.985+11:00 level=INFO source=gpu.go:636 msg="Unable to load NVML management library c:\\Windows\\System32\\nvml.dll: nvml vram init failure: 4"
time=2025-02-08T10:31:41.002+11:00 level=INFO source=gpu.go:602 msg="no nvidia devices detected by library C:\\Windows\\system32\\nvcuda.dll"
time=2025-02-08T10:31:41.024+11:00 level=INFO source=gpu.go:602 msg="no nvidia devices detected by library C:\\Windows\\System32\\nvcuda.dll"
time=2025-02-08T10:31:43.139+11:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-02-08T10:31:43.139+11:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="127.7 GiB" available="116.6 GiB"
time=2025-02-08T10:33:28.451+11:00 level=INFO source=server.go:100 msg="system memory" total="127.7 GiB" free="117.6 GiB" free_swap="135.0 GiB"
time=2025-02-08T10:33:28.455+11:00 level=INFO source=memory.go:356 msg="offload to cpu" layers.requested=-1 layers.model=81 layers.offload=0 layers.split="" memory.available="[117.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="43.2 GiB" memory.required.partial="0 B" memory.required.kv="2.5 GiB" memory.required.allocations="[43.2 GiB]" memory.weights.total="40.7 GiB" memory.weights.repeating="39.9 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB"
time=2025-02-08T10:33:28.481+11:00 level=INFO source=server.go:381 msg="starting llama server" cmd="C:\\Users\\XXX\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\XXX\\.ollama\\models\\blobs\\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --ctx-size 8192 --batch-size 512 --threads 6 --no-mmap --parallel 4 --port 55254"
time=2025-02-08T10:33:28.493+11:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-02-08T10:33:28.493+11:00 level=INFO source=server.go:558 msg="waiting for llama runner to start responding"
time=2025-02-08T10:33:28.494+11:00 level=INFO source=server.go:592 msg="waiting for server to become available" status="llm server error"
time=2025-02-08T10:33:28.549+11:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-02-08T10:33:28.550+11:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(clang)" threads=6
time=2025-02-08T10:33:28.551+11:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:55254"
time=2025-02-08T10:33:28.746+11:00 level=INFO source=server.go:592 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from C:\Users\XXX\.ollama\models\blobs\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 8192
llm_load_print_meta: n_layer          = 80
llm_load_print_meta: n_head           = 64
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 28672
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 70B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 70.55 B
llm_load_print_meta: model size       = 39.59 GiB (4.82 BPW) 
llm_load_print_meta: general.name     = Llama 3.1 70B Instruct 2024 12
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors:          CPU model buffer size = 40543.11 MiB
llama_new_context_with_model: n_seq_max     = 4
llama_new_context_with_model: n_ctx         = 8192
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch       = 2048
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 500000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 80, can_shift = 1
llama_kv_cache_init:        CPU KV buffer size =  2560.00 MiB
llama_new_context_with_model: KV self size  = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     2.08 MiB
llama_new_context_with_model:        CPU compute buffer size =  1104.01 MiB
llama_new_context_with_model: graph nodes  = 2566
llama_new_context_with_model: graph splits = 1
time=2025-02-08T10:33:49.811+11:00 level=INFO source=server.go:597 msg="llama runner started in 21.32 seconds"
[GIN] 2025/02/08 - 10:41:57 | 200 |         8m29s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/02/08 - 10:47:40 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:47:56 | 200 |       528.7µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:48:11 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:48:21 | 200 |       525.7µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:48:32 | 200 |       527.4µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:48:43 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:48:59 | 500 |          7m1s |       127.0.0.1 | POST     "/api/chat"

v0.5.7:
2025/02/08 10:51:02 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\XXX\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-02-08T10:51:02.938+11:00 level=INFO source=images.go:432 msg="total blobs: 35"
time=2025-02-08T10:51:02.942+11:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-02-08T10:51:02.948+11:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
time=2025-02-08T10:51:02.949+11:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cuda_v12_avx rocm_avx cpu cpu_avx cpu_avx2 cuda_v11_avx]"
time=2025-02-08T10:51:02.951+11:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-02-08T10:51:02.951+11:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-02-08T10:51:02.951+11:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=6 efficiency=0 threads=12
time=2025-02-08T10:51:02.973+11:00 level=INFO source=gpu.go:654 msg="Unable to load NVML management library C:\\Windows\\system32\\nvml.dll: nvml vram init failure: 4"
time=2025-02-08T10:51:02.982+11:00 level=INFO source=gpu.go:654 msg="Unable to load NVML management library C:\\Windows\\System32\\nvml.dll: nvml vram init failure: 4"
time=2025-02-08T10:51:02.983+11:00 level=INFO source=gpu.go:654 msg="Unable to load NVML management library c:\\Windows\\System32\\nvml.dll: nvml vram init failure: 4"
time=2025-02-08T10:51:02.996+11:00 level=INFO source=gpu.go:620 msg="no nvidia devices detected by library C:\\Windows\\system32\\nvcuda.dll"
time=2025-02-08T10:51:03.005+11:00 level=INFO source=gpu.go:620 msg="no nvidia devices detected by library C:\\Windows\\System32\\nvcuda.dll"
time=2025-02-08T10:51:03.140+11:00 level=INFO source=gpu.go:392 msg="no compatible GPUs were discovered"
time=2025-02-08T10:51:03.140+11:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="127.7 GiB" available="116.5 GiB"
[GIN] 2025/02/08 - 10:53:14 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
time=2025-02-08T10:53:17.676+11:00 level=INFO source=server.go:104 msg="system memory" total="127.7 GiB" free="117.5 GiB" free_swap="135.0 GiB"
time=2025-02-08T10:53:17.677+11:00 level=INFO source=memory.go:356 msg="offload to cpu" layers.requested=-1 layers.model=81 layers.offload=0 layers.split="" memory.available="[117.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="43.2 GiB" memory.required.partial="0 B" memory.required.kv="2.5 GiB" memory.required.allocations="[43.2 GiB]" memory.weights.total="40.7 GiB" memory.weights.repeating="39.9 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB"
time=2025-02-08T10:53:17.694+11:00 level=INFO source=server.go:376 msg="starting llama server" cmd="C:\\Users\\XXX\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\runners\\cpu_avx2\\ollama_llama_server.exe runner --model C:\\Users\\XXX\\.ollama\\models\\blobs\\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --ctx-size 8192 --batch-size 512 --threads 6 --no-mmap --parallel 4 --port 55303"
time=2025-02-08T10:53:19.124+11:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-02-08T10:53:19.124+11:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
time=2025-02-08T10:53:19.125+11:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2025-02-08T10:53:19.159+11:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-02-08T10:53:19.163+11:00 level=INFO source=runner.go:937 msg=system info="CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(clang)" threads=6
time=2025-02-08T10:53:19.164+11:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:55303"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from C:\Users\XXX\.ollama\models\blobs\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
time=2025-02-08T10:53:19.379+11:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 8192
llm_load_print_meta: n_layer          = 80
llm_load_print_meta: n_head           = 64
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 28672
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 70B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 70.55 B
llm_load_print_meta: model size       = 39.59 GiB (4.82 BPW) 
llm_load_print_meta: general.name     = Llama 3.1 70B Instruct 2024 12
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors:          CPU model buffer size = 40543.11 MiB
llama_new_context_with_model: n_seq_max     = 4
llama_new_context_with_model: n_ctx         = 8192
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch       = 2048
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 500000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 80, can_shift = 1
llama_kv_cache_init:        CPU KV buffer size =  2560.00 MiB
llama_new_context_with_model: KV self size  = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     2.08 MiB
llama_new_context_with_model:        CPU compute buffer size =  1104.01 MiB
llama_new_context_with_model: graph nodes  = 2566
llama_new_context_with_model: graph splits = 1
time=2025-02-08T10:53:35.180+11:00 level=INFO source=server.go:594 msg="llama runner started in 16.06 seconds"
[GIN] 2025/02/08 - 10:55:28 | 200 |         2m11s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/02/08 - 10:55:46 | 200 |       529.3µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:55:53 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:55:57 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:56:02 | 200 |       518.5µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:56:07 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 10:58:31 | 200 |          3m2s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/02/08 - 11:02:18 | 200 |         3m47s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/02/08 - 12:10:28 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:31 | 200 |       612.2µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:35 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:37 | 200 |       566.6µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:40 | 200 |       587.2µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:41 | 200 |       574.3µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:43 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:44 | 200 |       575.1µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:46 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:10:48 | 200 |       584.3µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/02/08 - 12:16:36 | 200 |       591.6µs |       127.0.0.1 | GET      "/api/version"

OS

Windows

GPU

No response

CPU

Intel

Ollama version

0.5.7 and 0.5.8

GiteaMirror added the bug label 2026-04-22 12:06:21 -05:00
Author
Owner

@jmorganca commented on GitHub (Feb 8, 2025):

Sorry about that and thanks for reporting. Does this directory exist for you? C:\Users\<your username>\AppData\Local\Programs\Ollama\lib\ollama?
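A quick way to answer that, as a minimal sketch assuming the default per-user Windows install location mentioned above:

```python
import os
from pathlib import Path

# Default per-user install location on Windows; adjust if Ollama was
# installed somewhere else.
lib = Path(os.environ["LOCALAPPDATA"]) / "Programs" / "Ollama" / "lib" / "ollama"
print(lib, "exists:", lib.exists())
if lib.exists():
    for entry in sorted(lib.iterdir()):
        print(" ", entry.name)
```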

Author
Owner

@ghost commented on GitHub (Feb 8, 2025):

> Sorry about that and thanks for reporting. Does this directory exist for you? C:\Users\<your username>\AppData\Local\Programs\Ollama\lib\ollama?
Yes it does. I've re-installed v0.5.7 over v0.5.8 and v0.5.7 shows the below:

Volume in drive C has no label.
Volume Serial Number is B2B6-AE26

Directory of C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama

08/02/2025 12:32 PM <DIR> .
08/02/2025 10:49 AM <DIR> ..
16/01/2025 05:33 PM 15,296 api-ms-win-crt-convert-l1-1-0.dll
16/01/2025 05:33 PM 11,704 api-ms-win-crt-environment-l1-1-0.dll
16/01/2025 05:33 PM 13,248 api-ms-win-crt-filesystem-l1-1-0.dll
16/01/2025 05:33 PM 12,216 api-ms-win-crt-heap-l1-1-0.dll
16/01/2025 05:33 PM 11,712 api-ms-win-crt-locale-l1-1-0.dll
16/01/2025 05:33 PM 20,416 api-ms-win-crt-math-l1-1-0.dll
16/01/2025 05:33 PM 15,800 api-ms-win-crt-runtime-l1-1-0.dll
16/01/2025 05:33 PM 17,336 api-ms-win-crt-stdio-l1-1-0.dll
16/01/2025 05:33 PM 17,344 api-ms-win-crt-string-l1-1-0.dll
16/01/2025 05:33 PM 13,760 api-ms-win-crt-time-l1-1-0.dll
16/01/2025 05:33 PM 113,725,888 cublas64_11.dll
16/01/2025 05:33 PM 100,196,800 cublas64_12.dll
16/01/2025 05:33 PM 241,664,960 cublasLt64_11.dll
16/01/2025 05:33 PM 472,810,936 cublasLt64_12.dll
16/01/2025 05:33 PM 434,112 cudart32_110.dll
16/01/2025 05:33 PM 505,792 cudart64_110.dll
16/01/2025 05:33 PM 562,112 cudart64_12.dll
16/01/2025 05:33 PM 661,432 hipblas.dll
16/01/2025 05:33 PM 565,176 msvcp140.dll
16/01/2025 05:33 PM 22,976 msvcp140_1.dll
16/01/2025 05:33 PM 184,768 msvcp140_2.dll
16/01/2025 05:33 PM 55,232 msvcp140_atomic_wait.dll
16/01/2025 05:33 PM 19,392 msvcp140_codecvt_ids.dll
08/02/2025 12:32 PM 0 output.txt
08/02/2025 10:49 AM <DIR> rocblas
16/01/2025 05:33 PM 414,733,240 rocblas.dll
08/02/2025 10:50 AM <DIR> runners
16/01/2025 05:33 PM 96,192 vcruntime140.dll
16/01/2025 05:33 PM 36,288 vcruntime140_1.dll
27 File(s) 1,346,424,128 bytes
4 Dir(s) 1,373,859,561,472 bytes free

Author
Owner

@jmorganca commented on GitHub (Feb 8, 2025):

Hi @vic9827 did you install via the zip file directly or by re-running the 0.5.8-rc11 installer? Thanks so much again

@ghost commented on GitHub (Feb 8, 2025):

v.0.5.8:
Directory of C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama

08/02/2025 12:41 PM <DIR> .
08/02/2025 12:40 PM <DIR> ..
07/02/2025 08:44 PM 15,296 api-ms-win-crt-convert-l1-1-0.dll
07/02/2025 08:44 PM 11,704 api-ms-win-crt-environment-l1-1-0.dll
07/02/2025 08:44 PM 13,248 api-ms-win-crt-filesystem-l1-1-0.dll
07/02/2025 08:44 PM 12,224 api-ms-win-crt-heap-l1-1-0.dll
07/02/2025 08:44 PM 11,712 api-ms-win-crt-locale-l1-1-0.dll
07/02/2025 08:44 PM 20,408 api-ms-win-crt-math-l1-1-0.dll
07/02/2025 08:44 PM 15,800 api-ms-win-crt-runtime-l1-1-0.dll
07/02/2025 08:44 PM 17,336 api-ms-win-crt-stdio-l1-1-0.dll
07/02/2025 08:44 PM 17,344 api-ms-win-crt-string-l1-1-0.dll
07/02/2025 08:44 PM 13,760 api-ms-win-crt-time-l1-1-0.dll
08/02/2025 12:40 PM <DIR> cuda_v11
08/02/2025 12:41 PM <DIR> cuda_v12
07/02/2025 08:44 PM 409,024 ggml-base.dll
07/02/2025 08:44 PM 431,032 ggml-cpu-alderlake.dll
07/02/2025 08:44 PM 431,040 ggml-cpu-haswell.dll
07/02/2025 08:44 PM 542,144 ggml-cpu-icelake.dll
07/02/2025 08:44 PM 397,240 ggml-cpu-sandybridge.dll
07/02/2025 08:44 PM 543,168 ggml-cpu-skylakex.dll
07/02/2025 08:44 PM 619,456 msvcp140.dll
07/02/2025 08:44 PM 22,976 msvcp140_1.dll
07/02/2025 08:44 PM 197,568 msvcp140_2.dll
08/02/2025 12:41 PM <DIR> rocm
07/02/2025 08:44 PM 77,248 vcruntime140.dll
20 File(s) 3,819,728 bytes
5 Dir(s) 1,373,661,609,984 bytes free

v0.5.8 doesn't have a runners folder?
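
(For context: v0.5.8 appears to have replaced v0.5.7's per-variant runners directory (cpu_avx, cpu_avx2, ...) with the ggml-cpu-*.dll backend libraries listed above, which the single runner scans for and loads at startup – that is what the "ggml backend load all from path" lines in the debug log further below correspond to. A quick way to list which CPU variants an install shipped with, assuming the default install path shown above:

dir "C:\Users\<your username>\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-*.dll"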

v0.5.7:
Directory of C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\runners

08/02/2025 10:50 AM <DIR> .
08/02/2025 12:36 PM <DIR> ..
08/02/2025 10:50 AM <DIR> cpu_avx
08/02/2025 10:50 AM <DIR> cpu_avx2
08/02/2025 10:50 AM <DIR> cuda_v11_avx
08/02/2025 10:50 AM <DIR> cuda_v12_avx
08/02/2025 10:51 AM <DIR> rocm_avx
0 File(s) 0 bytes
7 Dir(s) 1,373,858,648,064 bytes free

@ghost commented on GitHub (Feb 8, 2025):

> Hi @vic9827 did you install via the zip file directly or by re-running the 0.5.8-rc11 installer? Thanks so much again

Reran the installer. The installer was the one signed on Saturday, 8 February 2025 7:56:27 AM.
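
As a side note, an installer's signing timestamp can be checked from PowerShell; a minimal sketch, assuming the downloaded file is named OllamaSetup.exe:

Get-AuthenticodeSignature .\OllamaSetup.exe | Format-List Status, SignerCertificate, TimeStamperCertificate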

@jmorganca commented on GitHub (Feb 8, 2025):

Thanks @vic9827. A few more things:

  1. It seems like `C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama` has the right data in it now – possible to try running a model again?
  2. What does `ollama -v` show for you?
  3. Would it be possible to run `OLLAMA_DEBUG=1 ollama serve` in another terminal window (you may have to stop the app while doing this)? This should give us logs with more info (make sure to review them first since there will be some paths)

Thanks so much
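
One note on step 3: on Windows the inline `OLLAMA_DEBUG=1 ollama serve` form won't set the variable, so it needs to be set before launching the server; a minimal sketch, assuming the tray app has been quit beforehand:

rem cmd.exe
set OLLAMA_DEBUG=1
ollama serve

# PowerShell
$env:OLLAMA_DEBUG="1"
ollama serve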

@ghost commented on GitHub (Feb 8, 2025):

C:\>ollama -v
ollama version is 0.5.8-rc11

server.log of v.0.5.8:

2025/02/08 12:58:52 routes.go:1186: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\XXX\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-02-08T12:58:52.848+11:00 level=INFO source=images.go:432 msg="total blobs: 35"
time=2025-02-08T12:58:52.849+11:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-02-08T12:58:52.850+11:00 level=INFO source=routes.go:1237 msg="Listening on 127.0.0.1:11434 (version 0.5.8-rc11)"
time=2025-02-08T12:58:52.850+11:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2025-02-08T12:58:52.850+11:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-02-08T12:58:52.850+11:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-02-08T12:58:52.850+11:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=6 efficiency=0 threads=12
time=2025-02-08T12:58:52.852+11:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-02-08T12:58:52.852+11:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=nvml.dll
time=2025-02-08T12:58:52.852+11:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\nvml.dll C:\Windows\system32\nvml.dll C:\Windows\nvml.dll C:\Windows\System32\Wbem\nvml.dll C:\Windows\System32\WindowsPowerShell\v1.0\nvml.dll C:\Windows\System32\OpenSSH\nvml.dll C:\Program Files\Docker\Docker\resources\bin\nvml.dll C:\Users\XXX\AppData\Local\Microsoft\WindowsApps\nvml.dll C:\Users\XXX\AppData\Local\Programs\Ollama\nvml.dll C:\Users\XXX\AppData\Local\Programs\Ollama\nvml.dll c:\Windows\System32\nvml.dll]"
time=2025-02-08T12:58:52.853+11:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[C:\Windows\system32\nvml.dll c:\Windows\System32\nvml.dll]"
nvmlInit_v2 err: 4
time=2025-02-08T12:58:52.891+11:00 level=INFO source=gpu.go:636 msg="Unable to load NVML management library C:\Windows\system32\nvml.dll: nvml vram init failure: 4"
nvmlInit_v2 err: 4
time=2025-02-08T12:58:52.893+11:00 level=INFO source=gpu.go:636 msg="Unable to load NVML management library c:\Windows\System32\nvml.dll: nvml vram init failure: 4"
time=2025-02-08T12:58:52.893+11:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=nvcuda.dll
time=2025-02-08T12:58:52.894+11:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\nvcuda.dll C:\Windows\system32\nvcuda.dll C:\Windows\nvcuda.dll C:\Windows\System32\Wbem\nvcuda.dll C:\Windows\System32\WindowsPowerShell\v1.0\nvcuda.dll C:\Windows\System32\OpenSSH\nvcuda.dll C:\Program Files\Docker\Docker\resources\bin\nvcuda.dll C:\Users\XXX\AppData\Local\Microsoft\WindowsApps\nvcuda.dll C:\Users\XXX\AppData\Local\Programs\Ollama\nvcuda.dll C:\Users\XXX\AppData\Local\Programs\Ollama\nvcuda.dll c:\windows\system32\nvcuda.dll]"
time=2025-02-08T12:58:52.895+11:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[C:\Windows\system32\nvcuda.dll]
initializing C:\Windows\system32\nvcuda.dll
dlsym: cuInit - 00007FF8EA530240
dlsym: cuDriverGetVersion - 00007FF8EA5302E0
dlsym: cuDeviceGetCount - 00007FF8EA530AD6
dlsym: cuDeviceGet - 00007FF8EA530AD0
dlsym: cuDeviceGetAttribute - 00007FF8EA530430
dlsym: cuDeviceGetUuid - 00007FF8EA530AE2
dlsym: cuDeviceGetName - 00007FF8EA530ADC
dlsym: cuCtxCreate_v3 - 00007FF8EA530B54
dlsym: cuMemGetInfo_v2 - 00007FF8EA530C56
dlsym: cuCtxDestroy - 00007FF8EA530B66
calling cuInit
cuInit err: 100
time=2025-02-08T12:58:52.941+11:00 level=INFO source=gpu.go:602 msg="no nvidia devices detected by library C:\Windows\system32\nvcuda.dll"
time=2025-02-08T12:58:52.941+11:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=cudart64_*.dll
time=2025-02-08T12:58:52.943+11:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\cudart64_*.dll C:\Windows\system32\cudart64_*.dll C:\Windows\cudart64_*.dll C:\Windows\System32\Wbem\cudart64_*.dll C:\Windows\System32\WindowsPowerShell\v1.0\cudart64_*.dll C:\Windows\System32\OpenSSH\cudart64_*.dll C:\Program Files\Docker\Docker\resources\bin\cudart64_*.dll C:\Users\XXX\AppData\Local\Microsoft\WindowsApps\cudart64_*.dll C:\Users\XXX\AppData\Local\Programs\Ollama\cudart64_*.dll C:\Users\XXX\AppData\Local\Programs\Ollama\cudart64_*.dll C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\cuda_v*\cudart64_*.dll c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v*\bin\cudart64_*.dll]"
time=2025-02-08T12:58:52.954+11:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\cuda_v11\cudart64_110.dll C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\cudart64_12.dll]"
cudaSetDevice err: 100
time=2025-02-08T12:58:52.972+11:00 level=DEBUG source=gpu.go:574 msg="Unable to load cudart library C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\cuda_v11\cudart64_110.dll: cudart init failure: 100"
cudaSetDevice err: 100
time=2025-02-08T12:58:52.974+11:00 level=DEBUG source=gpu.go:574 msg="Unable to load cudart library C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\cudart64_12.dll: cudart init failure: 100"
time=2025-02-08T12:58:52.985+11:00 level=DEBUG source=amd_windows.go:34 msg="unable to load amdhip64_6.dll, please make sure to upgrade to the latest amd driver: The specified module could not be found."
time=2025-02-08T12:58:52.985+11:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-02-08T12:58:52.985+11:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="127.7 GiB" available="117.3 GiB"
time=2025-02-08T12:59:11.771+11:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="117.3 GiB" before.free_swap="134.8 GiB" now.total="127.7 GiB" now.free="117.3 GiB" now.free_swap="134.7 GiB"
time=2025-02-08T12:59:11.771+11:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x7ff6b63595a0 gpu_count=1
time=2025-02-08T12:59:11.859+11:00 level=DEBUG source=sched.go:211 msg="cpu mode with first model, loading"
time=2025-02-08T12:59:11.859+11:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="117.3 GiB" before.free_swap="134.7 GiB" now.total="127.7 GiB" now.free="117.3 GiB" now.free_swap="134.7 GiB"
time=2025-02-08T12:59:11.860+11:00 level=INFO source=server.go:100 msg="system memory" total="127.7 GiB" free="117.3 GiB" free_swap="134.7 GiB"
time=2025-02-08T12:59:11.860+11:00 level=DEBUG source=memory.go:107 msg=evaluating library=cpu gpu_count=1 available="[117.3 GiB]"
time=2025-02-08T12:59:11.861+11:00 level=INFO source=memory.go:356 msg="offload to cpu" layers.requested=-1 layers.model=81 layers.offload=0 layers.split="" memory.available="[117.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="43.2 GiB" memory.required.partial="0 B" memory.required.kv="2.5 GiB" memory.required.allocations="[43.2 GiB]" memory.weights.total="40.7 GiB" memory.weights.repeating="39.9 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB"
time=2025-02-08T12:59:11.863+11:00 level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible=[]
time=2025-02-08T12:59:11.888+11:00 level=DEBUG source=gpu.go:695 msg="no filter required for library cpu"
time=2025-02-08T12:59:11.888+11:00 level=INFO source=server.go:381 msg="starting llama server" cmd="C:\Users\XXX\AppData\Local\Programs\Ollama\ollama.exe runner --model C:\Users\XXX\.ollama\models\blobs\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --ctx-size 8192 --batch-size 512 --verbose --threads 6 --no-mmap --parallel 4 --port 55567"
time=2025-02-08T12:59:11.888+11:00 level=DEBUG source=server.go:399 msg=subprocess environment="[PATH=C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\Docker\Docker\resources\bin;C:\Users\XXX\AppData\Local\Microsoft\WindowsApps;;C:\Users\XXX\AppData\Local\Programs\Ollama;C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama]"
time=2025-02-08T12:59:11.900+11:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-02-08T12:59:11.900+11:00 level=INFO source=server.go:558 msg="waiting for llama runner to start responding"
time=2025-02-08T12:59:11.901+11:00 level=INFO source=server.go:592 msg="waiting for server to become available" status="llm server error"
time=2025-02-08T12:59:11.964+11:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-02-08T12:59:11.964+11:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(clang)" threads=6
time=2025-02-08T12:59:11.964+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Windows\system32
time=2025-02-08T12:59:11.966+11:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:55567"
time=2025-02-08T12:59:12.037+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Windows
time=2025-02-08T12:59:12.041+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Windows\System32\Wbem
time=2025-02-08T12:59:12.046+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Windows\System32\WindowsPowerShell\v1.0
time=2025-02-08T12:59:12.047+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Windows\System32\OpenSSH
time=2025-02-08T12:59:12.048+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path="C:\Program Files\Docker\Docker\resources\bin"
time=2025-02-08T12:59:12.049+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Users\XXX\AppData\Local\Microsoft\WindowsApps
time=2025-02-08T12:59:12.050+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Users\XXX\AppData\Local\Programs\Ollama
time=2025-02-08T12:59:12.052+11:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from C:\Users\XXX\.ollama\models\blobs\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv 3: general.version str = 2024-12
llama_model_loader: - kv 4: general.finetune str = Instruct
llama_model_loader: - kv 5: general.basename str = Llama-3.1
llama_model_loader: - kv 6: general.size_label str = 70B
llama_model_loader: - kv 7: general.license str = llama3.1
llama_model_loader: - kv 8: general.base_model.count u32 = 1
llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B
llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama
llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv 14: llama.block_count u32 = 80
llama_model_loader: - kv 15: llama.context_length u32 = 131072
llama_model_loader: - kv 16: llama.embedding_length u32 = 8192
llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 18: llama.attention.head_count u32 = 64
llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 22: llama.attention.key_length u32 = 128
llama_model_loader: - kv 23: llama.attention.value_length u32 = 128
llama_model_loader: - kv 24: general.file_type u32 = 15
llama_model_loader: - kv 25: llama.vocab_size u32 = 128256
llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe
time=2025-02-08T12:59:12.154+11:00 level=INFO source=server.go:592 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 35: general.quantization_version u32 = 2
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q4_K: 441 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 81 tensors
llm_load_vocab: control token: 128254 '<|reserved_special_token_246|>' is not marked as EOG
llm_load_vocab: control token: 128252 '<|reserved_special_token_244|>' is not marked as EOG
llm_load_vocab: control token: 128251 '<|reserved_special_token_243|>' is not marked as EOG
llm_load_vocab: control token: 128250 '<|reserved_special_token_242|>' is not marked as EOG
llm_load_vocab: control token: 128248 '<|reserved_special_token_240|>' is not marked as EOG
llm_load_vocab: control token: 128247 '<|reserved_special_token_239|>' is not marked as EOG
llm_load_vocab: control token: 128245 '<|reserved_special_token_237|>' is not marked as EOG
llm_load_vocab: control token: 128244 '<|reserved_special_token_236|>' is not marked as EOG
llm_load_vocab: control token: 128243 '<|reserved_special_token_235|>' is not marked as EOG
llm_load_vocab: control token: 128240 '<|reserved_special_token_232|>' is not marked as EOG
llm_load_vocab: control token: 128238 '<|reserved_special_token_230|>' is not marked as EOG
llm_load_vocab: control token: 128235 '<|reserved_special_token_227|>' is not marked as EOG
llm_load_vocab: control token: 128234 '<|reserved_special_token_226|>' is not marked as EOG
llm_load_vocab: control token: 128229 '<|reserved_special_token_221|>' is not marked as EOG
llm_load_vocab: control token: 128227 '<|reserved_special_token_219|>' is not marked as EOG
llm_load_vocab: control token: 128226 '<|reserved_special_token_218|>' is not marked as EOG
llm_load_vocab: control token: 128224 '<|reserved_special_token_216|>' is not marked as EOG
llm_load_vocab: control token: 128223 '<|reserved_special_token_215|>' is not marked as EOG
llm_load_vocab: control token: 128221 '<|reserved_special_token_213|>' is not marked as EOG
llm_load_vocab: control token: 128219 '<|reserved_special_token_211|>' is not marked as EOG
llm_load_vocab: control token: 128218 '<|reserved_special_token_210|>' is not marked as EOG
llm_load_vocab: control token: 128217 '<|reserved_special_token_209|>' is not marked as EOG
llm_load_vocab: control token: 128216 '<|reserved_special_token_208|>' is not marked as EOG
llm_load_vocab: control token: 128215 '<|reserved_special_token_207|>' is not marked as EOG
llm_load_vocab: control token: 128213 '<|reserved_special_token_205|>' is not marked as EOG
llm_load_vocab: control token: 128211 '<|reserved_special_token_203|>' is not marked as EOG
llm_load_vocab: control token: 128210 '<|reserved_special_token_202|>' is not marked as EOG
llm_load_vocab: control token: 128209 '<|reserved_special_token_201|>' is not marked as EOG
llm_load_vocab: control token: 128208 '<|reserved_special_token_200|>' is not marked as EOG
llm_load_vocab: control token: 128207 '<|reserved_special_token_199|>' is not marked as EOG
llm_load_vocab: control token: 128204 '<|reserved_special_token_196|>' is not marked as EOG
llm_load_vocab: control token: 128202 '<|reserved_special_token_194|>' is not marked as EOG
llm_load_vocab: control token: 128197 '<|reserved_special_token_189|>' is not marked as EOG
llm_load_vocab: control token: 128195 '<|reserved_special_token_187|>' is not marked as EOG
llm_load_vocab: control token: 128194 '<|reserved_special_token_186|>' is not marked as EOG
llm_load_vocab: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG
llm_load_vocab: control token: 128190 '<|reserved_special_token_182|>' is not marked as EOG
llm_load_vocab: control token: 128188 '<|reserved_special_token_180|>' is not marked as EOG
llm_load_vocab: control token: 128187 '<|reserved_special_token_179|>' is not marked as EOG
llm_load_vocab: control token: 128185 '<|reserved_special_token_177|>' is not marked as EOG
llm_load_vocab: control token: 128184 '<|reserved_special_token_176|>' is not marked as EOG
llm_load_vocab: control token: 128183 '<|reserved_special_token_175|>' is not marked as EOG
llm_load_vocab: control token: 128178 '<|reserved_special_token_170|>' is not marked as EOG
llm_load_vocab: control token: 128177 '<|reserved_special_token_169|>' is not marked as EOG
llm_load_vocab: control token: 128176 '<|reserved_special_token_168|>' is not marked as EOG
llm_load_vocab: control token: 128175 '<|reserved_special_token_167|>' is not marked as EOG
llm_load_vocab: control token: 128174 '<|reserved_special_token_166|>' is not marked as EOG
llm_load_vocab: control token: 128173 '<|reserved_special_token_165|>' is not marked as EOG
llm_load_vocab: control token: 128172 '<|reserved_special_token_164|>' is not marked as EOG
llm_load_vocab: control token: 128169 '<|reserved_special_token_161|>' is not marked as EOG
llm_load_vocab: control token: 128167 '<|reserved_special_token_159|>' is not marked as EOG
llm_load_vocab: control token: 128166 '<|reserved_special_token_158|>' is not marked as EOG
llm_load_vocab: control token: 128160 '<|reserved_special_token_152|>' is not marked as EOG
llm_load_vocab: control token: 128159 '<|reserved_special_token_151|>' is not marked as EOG
llm_load_vocab: control token: 128157 '<|reserved_special_token_149|>' is not marked as EOG
llm_load_vocab: control token: 128156 '<|reserved_special_token_148|>' is not marked as EOG
llm_load_vocab: control token: 128154 '<|reserved_special_token_146|>' is not marked as EOG
llm_load_vocab: control token: 128152 '<|reserved_special_token_144|>' is not marked as EOG
llm_load_vocab: control token: 128151 '<|reserved_special_token_143|>' is not marked as EOG
llm_load_vocab: control token: 128150 '<|reserved_special_token_142|>' is not marked as EOG
llm_load_vocab: control token: 128147 '<|reserved_special_token_139|>' is not marked as EOG
llm_load_vocab: control token: 128144 '<|reserved_special_token_136|>' is not marked as EOG
llm_load_vocab: control token: 128142 '<|reserved_special_token_134|>' is not marked as EOG
llm_load_vocab: control token: 128141 '<|reserved_special_token_133|>' is not marked as EOG
llm_load_vocab: control token: 128140 '<|reserved_special_token_132|>' is not marked as EOG
llm_load_vocab: control token: 128133 '<|reserved_special_token_125|>' is not marked as EOG
llm_load_vocab: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG
llm_load_vocab: control token: 128128 '<|reserved_special_token_120|>' is not marked as EOG
llm_load_vocab: control token: 128127 '<|reserved_special_token_119|>' is not marked as EOG
llm_load_vocab: control token: 128126 '<|reserved_special_token_118|>' is not marked as EOG
llm_load_vocab: control token: 128125 '<|reserved_special_token_117|>' is not marked as EOG
llm_load_vocab: control token: 128124 '<|reserved_special_token_116|>' is not marked as EOG
llm_load_vocab: control token: 128123 '<|reserved_special_token_115|>' is not marked as EOG
llm_load_vocab: control token: 128122 '<|reserved_special_token_114|>' is not marked as EOG
llm_load_vocab: control token: 128121 '<|reserved_special_token_113|>' is not marked as EOG
llm_load_vocab: control token: 128120 '<|reserved_special_token_112|>' is not marked as EOG
llm_load_vocab: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG
llm_load_vocab: control token: 128116 '<|reserved_special_token_108|>' is not marked as EOG
llm_load_vocab: control token: 128115 '<|reserved_special_token_107|>' is not marked as EOG
llm_load_vocab: control token: 128114 '<|reserved_special_token_106|>' is not marked as EOG
llm_load_vocab: control token: 128113 '<|reserved_special_token_105|>' is not marked as EOG
llm_load_vocab: control token: 128111 '<|reserved_special_token_103|>' is not marked as EOG
llm_load_vocab: control token: 128110 '<|reserved_special_token_102|>' is not marked as EOG
llm_load_vocab: control token: 128107 '<|reserved_special_token_99|>' is not marked as EOG
llm_load_vocab: control token: 128106 '<|reserved_special_token_98|>' is not marked as EOG
llm_load_vocab: control token: 128105 '<|reserved_special_token_97|>' is not marked as EOG
llm_load_vocab: control token: 128104 '<|reserved_special_token_96|>' is not marked as EOG
llm_load_vocab: control token: 128103 '<|reserved_special_token_95|>' is not marked as EOG
llm_load_vocab: control token: 128100 '<|reserved_special_token_92|>' is not marked as EOG
llm_load_vocab: control token: 128097 '<|reserved_special_token_89|>' is not marked as EOG
llm_load_vocab: control token: 128096 '<|reserved_special_token_88|>' is not marked as EOG
llm_load_vocab: control token: 128094 '<|reserved_special_token_86|>' is not marked as EOG
llm_load_vocab: control token: 128093 '<|reserved_special_token_85|>' is not marked as EOG
llm_load_vocab: control token: 128090 '<|reserved_special_token_82|>' is not marked as EOG
llm_load_vocab: control token: 128089 '<|reserved_special_token_81|>' is not marked as EOG
llm_load_vocab: control token: 128087 '<|reserved_special_token_79|>' is not marked as EOG
llm_load_vocab: control token: 128085 '<|reserved_special_token_77|>' is not marked as EOG
llm_load_vocab: control token: 128080 '<|reserved_special_token_72|>' is not marked as EOG
llm_load_vocab: control token: 128077 '<|reserved_special_token_69|>' is not marked as EOG
llm_load_vocab: control token: 128076 '<|reserved_special_token_68|>' is not marked as EOG
llm_load_vocab: control token: 128073 '<|reserved_special_token_65|>' is not marked as EOG
llm_load_vocab: control token: 128070 '<|reserved_special_token_62|>' is not marked as EOG
llm_load_vocab: control token: 128069 '<|reserved_special_token_61|>' is not marked as EOG
llm_load_vocab: control token: 128067 '<|reserved_special_token_59|>' is not marked as EOG
llm_load_vocab: control token: 128064 '<|reserved_special_token_56|>' is not marked as EOG
llm_load_vocab: control token: 128062 '<|reserved_special_token_54|>' is not marked as EOG
llm_load_vocab: control token: 128061 '<|reserved_special_token_53|>' is not marked as EOG
llm_load_vocab: control token: 128060 '<|reserved_special_token_52|>' is not marked as EOG
llm_load_vocab: control token: 128054 '<|reserved_special_token_46|>' is not marked as EOG
llm_load_vocab: control token: 128045 '<|reserved_special_token_37|>' is not marked as EOG
llm_load_vocab: control token: 128044 '<|reserved_special_token_36|>' is not marked as EOG
llm_load_vocab: control token: 128043 '<|reserved_special_token_35|>' is not marked as EOG
llm_load_vocab: control token: 128042 '<|reserved_special_token_34|>' is not marked as EOG
llm_load_vocab: control token: 128038 '<|reserved_special_token_30|>' is not marked as EOG
llm_load_vocab: control token: 128037 '<|reserved_special_token_29|>' is not marked as EOG
llm_load_vocab: control token: 128035 '<|reserved_special_token_27|>' is not marked as EOG
llm_load_vocab: control token: 128034 '<|reserved_special_token_26|>' is not marked as EOG
llm_load_vocab: control token: 128033 '<|reserved_special_token_25|>' is not marked as EOG
llm_load_vocab: control token: 128032 '<|reserved_special_token_24|>' is not marked as EOG
llm_load_vocab: control token: 128030 '<|reserved_special_token_22|>' is not marked as EOG
llm_load_vocab: control token: 128029 '<|reserved_special_token_21|>' is not marked as EOG
llm_load_vocab: control token: 128028 '<|reserved_special_token_20|>' is not marked as EOG
llm_load_vocab: control token: 128026 '<|reserved_special_token_18|>' is not marked as EOG
llm_load_vocab: control token: 128025 '<|reserved_special_token_17|>' is not marked as EOG
llm_load_vocab: control token: 128024 '<|reserved_special_token_16|>' is not marked as EOG
llm_load_vocab: control token: 128022 '<|reserved_special_token_14|>' is not marked as EOG
llm_load_vocab: control token: 128020 '<|reserved_special_token_12|>' is not marked as EOG
llm_load_vocab: control token: 128017 '<|reserved_special_token_9|>' is not marked as EOG
llm_load_vocab: control token: 128016 '<|reserved_special_token_8|>' is not marked as EOG
llm_load_vocab: control token: 128015 '<|reserved_special_token_7|>' is not marked as EOG
llm_load_vocab: control token: 128014 '<|reserved_special_token_6|>' is not marked as EOG
llm_load_vocab: control token: 128013 '<|reserved_special_token_5|>' is not marked as EOG
llm_load_vocab: control token: 128011 '<|reserved_special_token_3|>' is not marked as EOG
llm_load_vocab: control token: 128010 '<|python_tag|>' is not marked as EOG
llm_load_vocab: control token: 128006 '<|start_header_id|>' is not marked as EOG
llm_load_vocab: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
llm_load_vocab: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
llm_load_vocab: control token: 128000 '<|begin_of_text|>' is not marked as EOG
llm_load_vocab: control token: 128041 '<|reserved_special_token_33|>' is not marked as EOG
llm_load_vocab: control token: 128063 '<|reserved_special_token_55|>' is not marked as EOG
llm_load_vocab: control token: 128046 '<|reserved_special_token_38|>' is not marked as EOG
llm_load_vocab: control token: 128007 '<|end_header_id|>' is not marked as EOG
llm_load_vocab: control token: 128065 '<|reserved_special_token_57|>' is not marked as EOG
llm_load_vocab: control token: 128171 '<|reserved_special_token_163|>' is not marked as EOG
llm_load_vocab: control token: 128162 '<|reserved_special_token_154|>' is not marked as EOG
llm_load_vocab: control token: 128165 '<|reserved_special_token_157|>' is not marked as EOG
llm_load_vocab: control token: 128057 '<|reserved_special_token_49|>' is not marked as EOG
llm_load_vocab: control token: 128050 '<|reserved_special_token_42|>' is not marked as EOG
llm_load_vocab: control token: 128056 '<|reserved_special_token_48|>' is not marked as EOG
llm_load_vocab: control token: 128230 '<|reserved_special_token_222|>' is not marked as EOG
llm_load_vocab: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG
llm_load_vocab: control token: 128153 '<|reserved_special_token_145|>' is not marked as EOG
llm_load_vocab: control token: 128084 '<|reserved_special_token_76|>' is not marked as EOG
llm_load_vocab: control token: 128082 '<|reserved_special_token_74|>' is not marked as EOG
llm_load_vocab: control token: 128102 '<|reserved_special_token_94|>' is not marked as EOG
llm_load_vocab: control token: 128253 '<|reserved_special_token_245|>' is not marked as EOG
llm_load_vocab: control token: 128179 '<|reserved_special_token_171|>' is not marked as EOG
llm_load_vocab: control token: 128071 '<|reserved_special_token_63|>' is not marked as EOG
llm_load_vocab: control token: 128135 '<|reserved_special_token_127|>' is not marked as EOG
llm_load_vocab: control token: 128161 '<|reserved_special_token_153|>' is not marked as EOG
llm_load_vocab: control token: 128164 '<|reserved_special_token_156|>' is not marked as EOG
llm_load_vocab: control token: 128134 '<|reserved_special_token_126|>' is not marked as EOG
llm_load_vocab: control token: 128249 '<|reserved_special_token_241|>' is not marked as EOG
llm_load_vocab: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
llm_load_vocab: control token: 128036 '<|reserved_special_token_28|>' is not marked as EOG
llm_load_vocab: control token: 128148 '<|reserved_special_token_140|>' is not marked as EOG
llm_load_vocab: control token: 128181 '<|reserved_special_token_173|>' is not marked as EOG
llm_load_vocab: control token: 128222 '<|reserved_special_token_214|>' is not marked as EOG
llm_load_vocab: control token: 128075 '<|reserved_special_token_67|>' is not marked as EOG
llm_load_vocab: control token: 128241 '<|reserved_special_token_233|>' is not marked as EOG
llm_load_vocab: control token: 128051 '<|reserved_special_token_43|>' is not marked as EOG
llm_load_vocab: control token: 128068 '<|reserved_special_token_60|>' is not marked as EOG
llm_load_vocab: control token: 128149 '<|reserved_special_token_141|>' is not marked as EOG
llm_load_vocab: control token: 128201 '<|reserved_special_token_193|>' is not marked as EOG
llm_load_vocab: control token: 128058 '<|reserved_special_token_50|>' is not marked as EOG
llm_load_vocab: control token: 128146 '<|reserved_special_token_138|>' is not marked as EOG
llm_load_vocab: control token: 128143 '<|reserved_special_token_135|>' is not marked as EOG
llm_load_vocab: control token: 128023 '<|reserved_special_token_15|>' is not marked as EOG
llm_load_vocab: control token: 128039 '<|reserved_special_token_31|>' is not marked as EOG
llm_load_vocab: control token: 128132 '<|reserved_special_token_124|>' is not marked as EOG
llm_load_vocab: control token: 128101 '<|reserved_special_token_93|>' is not marked as EOG
llm_load_vocab: control token: 128212 '<|reserved_special_token_204|>' is not marked as EOG
llm_load_vocab: control token: 128189 '<|reserved_special_token_181|>' is not marked as EOG
llm_load_vocab: control token: 128225 '<|reserved_special_token_217|>' is not marked as EOG
llm_load_vocab: control token: 128129 '<|reserved_special_token_121|>' is not marked as EOG
llm_load_vocab: control token: 128005 '<|reserved_special_token_2|>' is not marked as EOG
llm_load_vocab: control token: 128078 '<|reserved_special_token_70|>' is not marked as EOG
llm_load_vocab: control token: 128163 '<|reserved_special_token_155|>' is not marked as EOG
llm_load_vocab: control token: 128072 '<|reserved_special_token_64|>' is not marked as EOG
llm_load_vocab: control token: 128112 '<|reserved_special_token_104|>' is not marked as EOG
llm_load_vocab: control token: 128186 '<|reserved_special_token_178|>' is not marked as EOG
llm_load_vocab: control token: 128095 '<|reserved_special_token_87|>' is not marked as EOG
llm_load_vocab: control token: 128109 '<|reserved_special_token_101|>' is not marked as EOG
llm_load_vocab: control token: 128099 '<|reserved_special_token_91|>' is not marked as EOG
llm_load_vocab: control token: 128138 '<|reserved_special_token_130|>' is not marked as EOG
llm_load_vocab: control token: 128193 '<|reserved_special_token_185|>' is not marked as EOG
llm_load_vocab: control token: 128199 '<|reserved_special_token_191|>' is not marked as EOG
llm_load_vocab: control token: 128048 '<|reserved_special_token_40|>' is not marked as EOG
llm_load_vocab: control token: 128088 '<|reserved_special_token_80|>' is not marked as EOG
llm_load_vocab: control token: 128192 '<|reserved_special_token_184|>' is not marked as EOG
llm_load_vocab: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG
llm_load_vocab: control token: 128092 '<|reserved_special_token_84|>' is not marked as EOG
llm_load_vocab: control token: 128158 '<|reserved_special_token_150|>' is not marked as EOG
llm_load_vocab: control token: 128001 '<|end_of_text|>' is not marked as EOG
llm_load_vocab: control token: 128049 '<|reserved_special_token_41|>' is not marked as EOG
llm_load_vocab: control token: 128031 '<|reserved_special_token_23|>' is not marked as EOG
llm_load_vocab: control token: 128255 '<|reserved_special_token_247|>' is not marked as EOG
llm_load_vocab: control token: 128182 '<|reserved_special_token_174|>' is not marked as EOG
llm_load_vocab: control token: 128066 '<|reserved_special_token_58|>' is not marked as EOG
llm_load_vocab: control token: 128180 '<|reserved_special_token_172|>' is not marked as EOG
llm_load_vocab: control token: 128233 '<|reserved_special_token_225|>' is not marked as EOG
llm_load_vocab: control token: 128079 '<|reserved_special_token_71|>' is not marked as EOG
llm_load_vocab: control token: 128081 '<|reserved_special_token_73|>' is not marked as EOG
llm_load_vocab: control token: 128231 '<|reserved_special_token_223|>' is not marked as EOG
llm_load_vocab: control token: 128196 '<|reserved_special_token_188|>' is not marked as EOG
llm_load_vocab: control token: 128047 '<|reserved_special_token_39|>' is not marked as EOG
llm_load_vocab: control token: 128083 '<|reserved_special_token_75|>' is not marked as EOG
llm_load_vocab: control token: 128139 '<|reserved_special_token_131|>' is not marked as EOG
llm_load_vocab: control token: 128131 '<|reserved_special_token_123|>' is not marked as EOG
llm_load_vocab: control token: 128118 '<|reserved_special_token_110|>' is not marked as EOG
llm_load_vocab: control token: 128053 '<|reserved_special_token_45|>' is not marked as EOG
llm_load_vocab: control token: 128220 '<|reserved_special_token_212|>' is not marked as EOG
llm_load_vocab: control token: 128108 '<|reserved_special_token_100|>' is not marked as EOG
llm_load_vocab: control token: 128091 '<|reserved_special_token_83|>' is not marked as EOG
llm_load_vocab: control token: 128203 '<|reserved_special_token_195|>' is not marked as EOG
llm_load_vocab: control token: 128059 '<|reserved_special_token_51|>' is not marked as EOG
llm_load_vocab: control token: 128019 '<|reserved_special_token_11|>' is not marked as EOG
llm_load_vocab: control token: 128170 '<|reserved_special_token_162|>' is not marked as EOG
llm_load_vocab: control token: 128205 '<|reserved_special_token_197|>' is not marked as EOG
llm_load_vocab: control token: 128040 '<|reserved_special_token_32|>' is not marked as EOG
llm_load_vocab: control token: 128200 '<|reserved_special_token_192|>' is not marked as EOG
llm_load_vocab: control token: 128236 '<|reserved_special_token_228|>' is not marked as EOG
llm_load_vocab: control token: 128145 '<|reserved_special_token_137|>' is not marked as EOG
llm_load_vocab: control token: 128168 '<|reserved_special_token_160|>' is not marked as EOG
llm_load_vocab: control token: 128214 '<|reserved_special_token_206|>' is not marked as EOG
llm_load_vocab: control token: 128137 '<|reserved_special_token_129|>' is not marked as EOG
llm_load_vocab: control token: 128232 '<|reserved_special_token_224|>' is not marked as EOG
llm_load_vocab: control token: 128239 '<|reserved_special_token_231|>' is not marked as EOG
llm_load_vocab: control token: 128055 '<|reserved_special_token_47|>' is not marked as EOG
llm_load_vocab: control token: 128228 '<|reserved_special_token_220|>' is not marked as EOG
llm_load_vocab: control token: 128206 '<|reserved_special_token_198|>' is not marked as EOG
llm_load_vocab: control token: 128018 '<|reserved_special_token_10|>' is not marked as EOG
llm_load_vocab: control token: 128012 '<|reserved_special_token_4|>' is not marked as EOG
llm_load_vocab: control token: 128198 '<|reserved_special_token_190|>' is not marked as EOG
llm_load_vocab: control token: 128021 '<|reserved_special_token_13|>' is not marked as EOG
llm_load_vocab: control token: 128086 '<|reserved_special_token_78|>' is not marked as EOG
llm_load_vocab: control token: 128074 '<|reserved_special_token_66|>' is not marked as EOG
llm_load_vocab: control token: 128027 '<|reserved_special_token_19|>' is not marked as EOG
llm_load_vocab: control token: 128242 '<|reserved_special_token_234|>' is not marked as EOG
llm_load_vocab: control token: 128155 '<|reserved_special_token_147|>' is not marked as EOG
llm_load_vocab: control token: 128052 '<|reserved_special_token_44|>' is not marked as EOG
llm_load_vocab: control token: 128246 '<|reserved_special_token_238|>' is not marked as EOG
llm_load_vocab: control token: 128117 '<|reserved_special_token_109|>' is not marked as EOG
llm_load_vocab: control token: 128237 '<|reserved_special_token_229|>' is not marked as EOG
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 39.59 GiB (4.82 BPW)
llm_load_print_meta: general.name = Llama 3.1 70B Instruct 2024 12
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: CPU model buffer size = 40543.11 MiB
load_all_data: no device found for buffer type CPU for async uploads
time=2025-02-08T12:59:12.908+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.01"
time=2025-02-08T12:59:13.159+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.04"
time=2025-02-08T12:59:13.410+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.05"
time=2025-02-08T12:59:13.662+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.07"
time=2025-02-08T12:59:13.913+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.09"
time=2025-02-08T12:59:14.163+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.11"
time=2025-02-08T12:59:14.413+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.12"
time=2025-02-08T12:59:14.664+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.14"
time=2025-02-08T12:59:14.916+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.15"
time=2025-02-08T12:59:15.167+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.17"
time=2025-02-08T12:59:15.418+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.19"
time=2025-02-08T12:59:15.669+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.21"
time=2025-02-08T12:59:15.919+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.23"
time=2025-02-08T12:59:16.170+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.24"
time=2025-02-08T12:59:16.421+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.26"
time=2025-02-08T12:59:16.671+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.28"
time=2025-02-08T12:59:16.924+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.30"
time=2025-02-08T12:59:17.174+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.31"
time=2025-02-08T12:59:17.426+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.33"
time=2025-02-08T12:59:17.676+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.35"
time=2025-02-08T12:59:17.928+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.37"
time=2025-02-08T12:59:18.178+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.38"
time=2025-02-08T12:59:18.429+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.40"
time=2025-02-08T12:59:18.679+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.42"
time=2025-02-08T12:59:18.931+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.44"
time=2025-02-08T12:59:19.182+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.45"
time=2025-02-08T12:59:19.433+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.47"
time=2025-02-08T12:59:19.684+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.49"
time=2025-02-08T12:59:19.934+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.51"
time=2025-02-08T12:59:20.186+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.52"
time=2025-02-08T12:59:20.437+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.54"
time=2025-02-08T12:59:20.688+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.56"
time=2025-02-08T12:59:20.939+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.58"
time=2025-02-08T12:59:21.189+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.59"
time=2025-02-08T12:59:21.441+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.61"
time=2025-02-08T12:59:21.692+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.63"
time=2025-02-08T12:59:21.943+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.65"
time=2025-02-08T12:59:22.128+11:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=C:\Users\XXX\.ollama\models\blobs\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d
time=2025-02-08T12:59:22.195+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.66"
time=2025-02-08T12:59:22.446+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.68"
time=2025-02-08T12:59:22.697+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.70"
time=2025-02-08T12:59:22.948+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.72"
time=2025-02-08T12:59:23.199+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.73"
time=2025-02-08T12:59:23.449+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.75"
time=2025-02-08T12:59:23.701+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.77"
time=2025-02-08T12:59:23.952+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.79"
time=2025-02-08T12:59:24.203+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.80"
time=2025-02-08T12:59:24.454+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.82"
time=2025-02-08T12:59:24.705+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.84"
time=2025-02-08T12:59:24.956+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.85"
time=2025-02-08T12:59:25.208+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.87"
time=2025-02-08T12:59:25.458+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.89"
time=2025-02-08T12:59:25.709+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.91"
time=2025-02-08T12:59:25.960+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.92"
time=2025-02-08T12:59:26.211+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.94"
time=2025-02-08T12:59:26.462+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.95"
time=2025-02-08T12:59:26.714+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.98"
time=2025-02-08T12:59:26.965+11:00 level=DEBUG source=server.go:603 msg="model load progress 0.99"
llama_new_context_with_model: n_seq_max = 4
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 80, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 48: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 49: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 50: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 51: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 52: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 53: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 54: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 55: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 56: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 57: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 58: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 59: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 60: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 61: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 62: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 64: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 65: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 66: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 67: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 68: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 69: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 70: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 71: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 72: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 73: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 74: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 75: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 76: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 77: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 78: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 79: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
time=2025-02-08T12:59:27.216+11:00 level=DEBUG source=server.go:603 msg="model load progress 1.00"
time=2025-02-08T12:59:27.467+11:00 level=DEBUG source=server.go:606 msg="model load completed, waiting for server to become available" status="llm server loading model"
llama_kv_cache_init: CPU KV buffer size = 2560.00 MiB
llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model: CPU output buffer size = 2.08 MiB
llama_new_context_with_model: CPU compute buffer size = 1104.01 MiB
llama_new_context_with_model: graph nodes = 2566
llama_new_context_with_model: graph splits = 1
time=2025-02-08T12:59:27.718+11:00 level=INFO source=server.go:597 msg="llama runner started in 15.82 seconds"
time=2025-02-08T12:59:27.718+11:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=C:\Users\XXX\.ollama\models\blobs\sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d
time=2025-02-08T12:59:27.719+11:00 level=DEBUG source=routes.go:1461 msg="chat request" images=0 prompt="<|start_header_id|>user<|end_header_id|>\n\n### Task:\nYou are an autocompletion system. Continue the text in `<text>` based on the **completion type** in `<type>` and the given language. \n\n### **Instructions**:\n1. Analyze `<text>` for context and meaning. \n2. Use `<type>` to guide your output: \n - **General**: Provide a natural, concise continuation. \n - **Search Query**: Complete as if generating a realistic search query. \n3. Start as if you are directly continuing `<text>`. Do **not** repeat, paraphrase, or respond as a model. Simply complete the text. \n4. Ensure the continuation:\n - Flows naturally from `<text>`. \n - Avoids repetition, overexplaining, or unrelated ideas. \n5. If unsure, return: `{ \"text\": \"\" }`. \n\n### **Output Rules**:\n- Respond only in JSON format: `{ \"text\": \"<your_completion>\" }`.\n\n### **Examples**:\n#### Example 1: \nInput: \n<type>General</type> \n<text>The sun was setting over the horizon, painting the sky</text> \nOutput: \n{ \"text\": \"with vibrant shades of orange and pink.\" }\n\n#### Example 2: \nInput: \n<type>Search Query</type> \n<text>Top-rated restaurants in</text> \nOutput: \n{ \"text\": \"New York City for Italian cuisine.\" } \n\n---\n### Context:\n<chat_history>\n\n</chat_history>\n<type>search query</type> \n<text>OLLAMA_DEBUG=1</text> \n#### Output:\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
time=2025-02-08T12:59:27.719+11:00 level=DEBUG source=routes.go:1461 msg="chat request" images=0 prompt="<|start_header_id|>user<|end_header_id|>\n\n XXXXXXXXXXXXXXXXXXXX<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
time=2025-02-08T12:59:27.722+11:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=85 used=0 remaining=85
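
Aside for anyone auditing the figures above: the KV buffer size follows directly from the parameters printed in the log. A minimal sketch of the arithmetic, assuming llama.cpp's plain f16 KV layout (n_layer × kv_size × n_embd per K and per V); all values are read off this log, not from any other source:

```python
# Editorial sanity check (not from the issue thread): reproduce the KV cache
# sizes that llama_kv_cache_init / llama_new_context_with_model report above.
n_layer = 80          # layers 0..79 in the kv_cache_init lines
kv_size = 8192        # total KV slots (--ctx-size 8192, shared by --parallel 4)
n_embd_k_gqa = 1024   # per-layer K width (8 KV heads x 128 head dim, GQA)
n_embd_v_gqa = 1024   # per-layer V width
f16_bytes = 2         # type_k = type_v = 'f16'

mib = 1024 * 1024
k_mib = n_layer * kv_size * n_embd_k_gqa * f16_bytes / mib
v_mib = n_layer * kv_size * n_embd_v_gqa * f16_bytes / mib
print(f"K: {k_mib:.2f} MiB, V: {v_mib:.2f} MiB, total: {k_mib + v_mib:.2f} MiB")
# -> K: 1280.00 MiB, V: 1280.00 MiB, total: 2560.00 MiB (matches the log)
```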


@ghost commented on GitHub (Feb 8, 2025):

v0.5.8 app.log

time=2025-02-08T12:58:52.742+11:00 level=INFO source=logging.go:50 msg="ollama app started"
time=2025-02-08T12:58:52.743+11:00 level=INFO source=lifecycle.go:19 msg="app config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\XXX\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-02-08T12:58:52.771+11:00 level=DEBUG source=lifecycle.go:34 msg="starting callback loop"
time=2025-02-08T12:58:52.771+11:00 level=DEBUG source=store.go:60 msg="loaded existing store C:\Users\XXX\AppData\Local\Ollama\config.json - ID: 6d1b25fa-4845-4505-9ee6-373722b790a6"
time=2025-02-08T12:58:52.771+11:00 level=DEBUG source=lifecycle.go:68 msg="Not first time, skipping first run notification"
time=2025-02-08T12:58:52.773+11:00 level=DEBUG source=server.go:181 msg="heartbeat from server: Head \"http://127.0.0.1:11434/\": dial tcp 127.0.0.1:11434: connectex: No connection could be made because the target machine actively refused it."
time=2025-02-08T12:58:52.773+11:00 level=INFO source=server.go:182 msg="unable to connect to server"
time=2025-02-08T12:58:52.773+11:00 level=DEBUG source=eventloop.go:22 msg="starting event handling loop"
time=2025-02-08T12:58:52.773+11:00 level=INFO source=server.go:141 msg="starting server..."
time=2025-02-08T12:58:52.782+11:00 level=INFO source=server.go:127 msg="started ollama server with pid 12696"
time=2025-02-08T12:58:52.782+11:00 level=INFO source=server.go:129 msg="ollama server logs C:\Users\XXX\AppData\Local\Ollama\server.log"
time=2025-02-08T12:58:54.848+11:00 level=DEBUG source=eventloop.go:145 msg="unmanaged app message, lParm: 0x204"
time=2025-02-08T12:58:55.778+11:00 level=DEBUG source=updater.go:74 msg="checking for available update" requestURL="https://ollama.com/api/update?arch=amd64&nonce=lCI0WQwLU_03jz5AKxS95w&os=windows&ts=1738979935&version=0.5.8-rc11"
time=2025-02-08T12:58:55.785+11:00 level=DEBUG source=logging_windows.go:12 msg="viewing logs with start C:\Users\XXX\AppData\Local\Ollama"
time=2025-02-08T12:58:56.174+11:00 level=DEBUG source=updater.go:83 msg="check update response 204 (current version is up to date)"


@mxyng commented on GitHub (Feb 8, 2025):

It appears one of the system dependencies is missing, which prevents the backend from loading correctly. This will be fixed in the next RC.
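
If you want to confirm which dependency is failing, one quick check on Windows is to try loading the runner library directly and inspect the error: LoadLibrary fails even when the named DLL exists on disk if any of its transitive dependencies (such as vcruntime140_1.dll) cannot be found. A minimal Go sketch, assuming a DLL path for illustration (substitute the library ollama actually ships in its lib\ollama directory):

```go
// dllcheck.go - minimal sketch for surfacing missing DLL dependencies on Windows.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/windows"
)

func main() {
	// Hypothetical path for illustration; point this at the runner DLL
	// that your ollama install actually ships.
	path := `C:\Users\XXX\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll`

	// LoadLibrary fails with a "module could not be found" style error even
	// when this file exists, if one of its dependencies (e.g.
	// vcruntime140_1.dll) is missing from the search path.
	h, err := windows.LoadLibrary(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed to load %s: %v\n", path, err)
		os.Exit(1)
	}
	defer windows.FreeLibrary(h)
	fmt.Println("loaded OK:", path)
}
```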


@ghost commented on GitHub (Feb 8, 2025):

Thank you. Confirmed this fixes the error: copying vcruntime140_1.dll from the c:\windows\system folder into the lib/ollama folder resolved the issue.

Speeds for v0.5.8 are now in line with v0.5.7: I'm getting a response rate of 1.11 t/s and a prompt rate of 2.17 t/s.

Can you advise how to check whether the AVX512-VNNI module is loaded and working?

I have a Cascade Lake W-2235 CPU, yet according to the logs it seems to be loading the next-generation Ice Lake DLL from the ollama lib folder.

Maybe it would be a good idea to record the supported CPU flags being used in the log file. v0.5.8's system info line doesn't seem to include all the flags, e.g.:
time=2025-02-08T16:16:31.990+11:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(clang)" threads=6
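
In the meantime, one way to check hardware support is to query the CPU feature flags directly. Here is a minimal Go sketch using golang.org/x/sys/cpu; note this only reports what the CPU advertises, not which backend DLL ollama actually loaded:

```go
// cpuflags.go - print which AVX-512 extensions the host CPU advertises.
// On a Cascade Lake W-2235, everything except AVX512BF16 should print true.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	fmt.Println("AVX512F:   ", cpu.X86.HasAVX512F)
	fmt.Println("AVX512CD:  ", cpu.X86.HasAVX512CD)
	fmt.Println("AVX512DQ:  ", cpu.X86.HasAVX512DQ)
	fmt.Println("AVX512BW:  ", cpu.X86.HasAVX512BW)
	fmt.Println("AVX512VL:  ", cpu.X86.HasAVX512VL)
	fmt.Println("AVX512VNNI:", cpu.X86.HasAVX512VNNI)
	fmt.Println("AVX512BF16:", cpu.X86.HasAVX512BF16)
}
```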


@ghost commented on GitHub (Feb 8, 2025):

Here are the supported AVX512 extensions from Intel:
Skylake doesn't support VNNI or BF16.
Cascade Lake doesn't support BF16.

| Extension | 1st Gen Intel® Xeon® Scalable (Skylake) | 2nd Gen Intel® Xeon® Scalable (Cascade Lake) | 3rd Gen Intel® Xeon® Scalable (Cooper Lake) |
| -- | -- | -- | -- |
| AVX512F | Yes | Yes | Yes |
| AVX512CD | Yes | Yes | Yes |
| AVX512DQ | Yes | Yes | Yes |
| AVX512BW | Yes | Yes | Yes |
| AVX512VL | Yes | Yes | Yes |
| AVX512_VNNI | No | Yes | Yes |
| AVX512_BF16 | No | No | Yes |
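
This table also suggests why a Cascade Lake part can end up on a build labeled for a newer microarchitecture: if variant selection keys only on feature flags, Cascade Lake satisfies everything a VNNI-level build requires. A hypothetical selection routine (not ollama's actual code; variant names and requirements are invented for illustration):

```go
// pickvariant.go - hypothetical sketch of feature-based variant selection.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

type variant struct {
	name     string
	required []bool // feature flags this build needs
}

func main() {
	x := cpu.X86
	avx512 := x.HasAVX512F && x.HasAVX512CD && x.HasAVX512DQ &&
		x.HasAVX512BW && x.HasAVX512VL

	// Ordered most to least capable; the first fully satisfied entry wins.
	variants := []variant{
		{"avx512-bf16", []bool{avx512, x.HasAVX512VNNI, x.HasAVX512BF16}}, // Cooper Lake
		{"avx512-vnni", []bool{avx512, x.HasAVX512VNNI}},                  // Cascade Lake lands here
		{"avx512", []bool{avx512}},                                        // Skylake lands here
		{"avx2", []bool{x.HasAVX2}},
	}

	for _, v := range variants {
		ok := true
		for _, f := range v.required {
			ok = ok && f
		}
		if ok {
			fmt.Println("selected variant:", v.name)
			return
		}
	}
	fmt.Println("selected variant: baseline")
}
```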
Reference: github-starred/ollama#31558