[GH-ISSUE #5108] ollama run loading a long time #3223

Closed
opened 2026-04-12 13:43:42 -05:00 by GiteaMirror · 11 comments

Originally created by @wangzi2124 on GitHub (Jun 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5108

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

[Screenshot "20240618105223": https://github.com/ollama/ollama/assets/13045190/24009f4a-46ee-4ec4-a5c5-582a80208aeb]

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 13:43:42 -05:00

@rb81 commented on GitHub (Jun 18, 2024):

+1 on this. Linux with CPU only.


@jmorganca commented on GitHub (Jun 18, 2024):

Would it be possible to share the logs? Thanks so much!


@pdevine commented on GitHub (Jun 18, 2024):

Also the GPU type. @rb81 I would imagine your issue is different than @wangzi2124 , but if you could both post the specs of your systems that would really help.


@dhiltgen commented on GitHub (Jun 18, 2024):

In addition to server logs, disabling mmap might yield a faster load, depending on what is leading to the slowdown.

curl http://localhost:11434/api/generate -d '{
  "model": "llava:7b-lora",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"use_mmap": false}
}'
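Since the maintainers asked for server logs (and the reporter later says they cannot find them), a quick way to pull them on a systemd-based Linux install is via journalctl. A minimal sketch, assuming the standard install that registers an `ollama` systemd unit:

```shell
# Print the last 200 lines of the Ollama server log on systemd-based Linux.
# Assumes the standard install that registers an "ollama" systemd unit;
# falls back to a hint on systems without journalctl.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u ollama --no-pager -n 200
else
    echo "journalctl not found; check wherever your install writes server logs"
fi
```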

@rb81 commented on GitHub (Jun 18, 2024):

@pdevine - I'm running Linux, Intel i7, 64GB RAM, CPU only.


@wangzi2124 commented on GitHub (Jun 19, 2024):

This can happen if the interactive session is not exited cleanly (e.g. with /bye or Ctrl+D) before the remote connection tool is closed.

Solution: restart the service:

sudo systemctl daemon-reload
sudo systemctl restart ollama.service


@wangzi2124 commented on GitHub (Jun 19, 2024):

I can only see the operation log, with no errors; I can't find the run log.


@wangzi2124 commented on GitHub (Jun 19, 2024):

> Also the GPU type. @rb81 I would imagine your issue is different than @wangzi2124, but if you could both post the specs of your systems that would really help.

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2699A v4 @ 2.40GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
Stepping: 1
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4800.20
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d


@wangzi2124 commented on GitHub (Jun 19, 2024):

> Would it be possible to share the logs? Thanks so much!

After running the model, if the remote connection tool is disconnected without ending the model session (e.g. with /bye or Ctrl+D), this will happen.


@wangzi2124 commented on GitHub (Jun 19, 2024):

> sudo systemctl restart ollama.service

# Modelfile

FROM "./data/Model/Llama3-8B-Chinese-Chat-GGUF/Llama3-8B-Chinese-Chat-q4_1-v1.gguf"

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# add the parameter
# PARAMETER num_gpu 5

# Many chat models need a prompt template in order to answer correctly. The default prompt template can be specified with the TEMPLATE instruction in the Modelfile
# TEMPLATE "[INST] {{ .Prompt }} [/INST]"

TEMPLATE "
{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
"

SYSTEM "
You are a helpful assistant.
"
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>

@wangzi2124 commented on GitHub (Jun 19, 2024):

> @pdevine - I'm running Linux, Intel i7, 64GB RAM, CPU only.

In the Modelfile used by ollama create qwen0_5b -f Modelfile, add PARAMETER num_gpu 5 and try it.
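The suggested workaround can be run end to end as follows. A sketch, not a verified fix: it assumes a local Modelfile (like the one posted above, with `PARAMETER num_gpu 5` uncommented) and reuses the model name `qwen0_5b` from this comment.

```shell
# Rebuild the model from the Modelfile and give it a quick test prompt.
# Guarded so the sketch is a no-op on machines without the ollama CLI.
if command -v ollama >/dev/null 2>&1; then
    ollama create qwen0_5b -f Modelfile   # Modelfile should contain: PARAMETER num_gpu 5
    ollama run qwen0_5b "Why is the sky blue?"
else
    echo "ollama CLI not installed; skipping"
fi
```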

Reference: github-starred/ollama#3223