[GH-ISSUE #6240] Not executed in gpu amd rx 6750 GRE #29664

Closed
opened 2026-04-22 08:44:44 -05:00 by GiteaMirror · 3 comments

Originally created by @21307369 on GitHub (Aug 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6240

(screenshot attached by the reporter: https://github.com/user-attachments/assets/87a706b4-7749-443f-86ea-86a3c7de1cc1)

```
2024/08/08 14:17:22 routes.go:1108: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/lsmir2/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR:]"
time=2024-08-08T14:17:22.106+08:00 level=INFO source=images.go:781 msg="total blobs: 5"
time=2024-08-08T14:17:22.106+08:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
time=2024-08-08T14:17:22.107+08:00 level=INFO source=routes.go:1155 msg="Listening on 127.0.0.1:11434 (version 0.3.4)"
time=2024-08-08T14:17:22.113+08:00 level=WARN source=assets.go:100 msg="unable to cleanup stale tmpdir" path=/var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama1166271814 error="remove /var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama1166271814: directory not empty"
time=2024-08-08T14:17:22.114+08:00 level=WARN source=assets.go:100 msg="unable to cleanup stale tmpdir" path=/var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama2116948276 error="remove /var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama2116948276: directory not empty"
time=2024-08-08T14:17:22.114+08:00 level=WARN source=assets.go:100 msg="unable to cleanup stale tmpdir" path=/var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama3670480486 error="remove /var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama3670480486: directory not empty"
time=2024-08-08T14:17:22.115+08:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama4115735691/runners
time=2024-08-08T14:17:22.146+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2]"
time=2024-08-08T14:17:22.146+08:00 level=INFO source=types.go:105 msg="inference compute" id="" library=cpu compute="" driver=0.0 name="" total="32.0 GiB" available="16.1 GiB"
time=2024-08-08T14:17:24.808+08:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[16.1 GiB]" memory.required.full="5.7 GiB" memory.required.partial="0 B" memory.required.kv="320.0 MiB" memory.required.allocations="[5.7 GiB]" memory.weights.total="4.6 GiB" memory.weights.repeating="4.1 GiB" memory.weights.nonrepeating="485.6 MiB" memory.graph.full="561.0 MiB" memory.graph.partial="789.6 MiB"
time=2024-08-08T14:17:24.809+08:00 level=INFO source=server.go:392 msg="starting llama server" cmd="/var/folders/3j/0tc5g9350n128f02l1tm845m0000gn/T/ollama4115735691/runners/cpu_avx2/ollama_llama_server --model /Users/lsmir2/.ollama/models/blobs/sha256-816441b33390807d429fbdb1de7e33bb4d569ac68e2203bdbca5d8d79b5c7266 --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --parallel 4 --port 57883"
time=2024-08-08T14:17:24.816+08:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
time=2024-08-08T14:17:24.816+08:00 level=INFO source=server.go:592 msg="waiting for llama runner to start responding"
time=2024-08-08T14:17:24.817+08:00 level=INFO source=server.go:626 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=3535 commit="1e6f6554" tid="0x107f92600" timestamp=1723097844
INFO [main] system info | n_threads=6 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="0x107f92600" timestamp=1723097844 total_threads=6
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="6" port="57883" tid="0x107f92600" timestamp=1723097844
llama_model_loader: loaded meta data with 23 key-value pairs and 283 tensors from /Users/lsmir2/.ollama/models/blobs/sha256-816441b33390807d429fbdb1de7e33bb4d569ac68e2203bdbca5d8d79b5c7266 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = chatglm
llama_model_loader: - kv 1: general.name str = codegeex4-all-9b
llama_model_loader: - kv 2: chatglm.context_length u32 = 131072
llama_model_loader: - kv 3: chatglm.embedding_length u32 = 4096
llama_model_loader: - kv 4: chatglm.feed_forward_length u32 = 13696
llama_model_loader: - kv 5: chatglm.block_count u32 = 40
llama_model_loader: - kv 6: chatglm.attention.head_count u32 = 32
llama_model_loader: - kv 7: chatglm.attention.head_count_kv u32 = 2
llama_model_loader: - kv 8: chatglm.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 9: general.file_type u32 = 2
llama_model_loader: - kv 10: chatglm.rope.dimension_count u32 = 64
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: chatglm.rope.freq_base f32 = 5000000.000000
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = chatglm-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151552] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151552] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151073] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 151329
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 151329
llama_model_loader: - kv 20: tokenizer.ggml.eot_token_id u32 = 151336
llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 151329
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type q4_0: 161 tensors
llama_model_loader: - type q6_K: 1 tensors
time=2024-08-08T14:17:25.069+08:00 level=INFO source=server.go:626 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens cache size = 223
llm_load_vocab: token to piece cache size = 0.9732 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = chatglm
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 151552
llm_load_print_meta: n_merges = 151073
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 2
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 16
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 13696
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 9B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 9.40 B
llm_load_print_meta: model size = 5.08 GiB (4.64 BPW)
llm_load_print_meta: general.name = codegeex4-all-9b
llm_load_print_meta: EOS token = 151329 '<|endoftext|>'
llm_load_print_meta: UNK token = 151329 '<|endoftext|>'
llm_load_print_meta: PAD token = 151329 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 151336 '<|user|>'
llm_load_print_meta: max token length = 1024
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: CPU buffer size = 5196.84 MiB
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 320.00 MiB
llama_new_context_with_model: KV self size = 320.00 MiB, K (f16): 160.00 MiB, V (f16): 160.00 MiB
llama_new_context_with_model: CPU output buffer size = 2.38 MiB
llama_new_context_with_model: CPU compute buffer size = 561.01 MiB
llama_new_context_with_model: graph nodes = 1606
llama_new_context_with_model: graph splits = 1
INFO [main] model loaded | tid="0x107f92600" timestamp=1723097848
time=2024-08-08T14:17:28.352+08:00 level=INFO source=server.go:631 msg="llama runner started in 3.54 seconds"
[GIN] 2024/08/08 - 14:17:33 | 200 | 8.515598533s | 127.0.0.1 | POST "/v1/chat/completions"
```

GiteaMirror added the question label 2026-04-22 08:44:44 -05:00

@dhiltgen commented on GitHub (Aug 9, 2024):

This line in the log

```
time=2024-08-08T14:17:22.146+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2]"
```

is missing all of the GPU LLM libraries, which indicates you most likely didn't install using the instructions at https://ollama.com/download, but instead used a packaged version, not maintained by the Ollama maintainers, that omits GPU support. Please consult the documentation for your packaging source to find out which package you need for GPU support, or use the official Ollama install instructions and the GPU should work.
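
For anyone triaging the same symptom, that line can be checked on your own install by searching the server log; a minimal sketch, assuming the default log locations (the macOS app writes to ~/.ollama/logs/server.log, and the Linux systemd service logs to the journal):

```
# An official install with a supported GPU should list GPU runners
# (e.g. rocm_* or cuda_*) alongside the cpu variants on this line.
grep "Dynamic LLM libraries" ~/.ollama/logs/server.log   # macOS app
journalctl -u ollama | grep "Dynamic LLM libraries"      # Linux service
```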


@21307369 commented on GitHub (Aug 11, 2024):

I am using the installation package from https://ollama.com/download. Currently, the official version, tested on Windows 10 and macOS 12.7.6, does not support the RX 6750 GRE graphics card.
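
A quick way to confirm where a loaded model actually runs, as a sketch not taken from the original thread (the model name here is illustrative): `ollama ps` reports a PROCESSOR column for each resident model.

```
# Load a model, then check which processor it landed on; the model stays
# resident for OLLAMA_KEEP_ALIVE (5m0s in the log above), so `ps` can see it.
# PROCESSOR reads "100% CPU" when no usable GPU runner was found.
ollama run codegeex4 "hello" >/dev/null
ollama ps
```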


@dhiltgen commented on GitHub (Sep 3, 2024):

Sorry, I got confused. You're on an x86 Mac. That set of runners is expected, and GPUs are not currently supported on x86 Macs. That's tracked via issue #1016.
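
For readers unsure which architecture their Mac has, a one-line check with standard macOS tooling (not from the original thread):

```
# Prints x86_64 on Intel Macs and arm64 on Apple Silicon; Ollama's Metal
# GPU path is only available on Apple Silicon.
uname -m
```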

Reference: github-starred/ollama#29664