Mirror of https://github.com/ollama/ollama.git (synced 2026-05-06 08:02:14 -05:00)
[GH-ISSUE #3511] On Windows, launching ollama from the shortcut or executable by clicking causes very slow tokens generation, but launching from commandline is fast #48675
Closed · opened 2026-04-28 09:04:53 -05:00 by GiteaMirror · 44 comments
Originally created by @lrq3000 on GitHub (Apr 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3511
Originally assigned to: @dhiltgen on GitHub.
What is the issue?
Since I installed ollama (v0.1.30) on Windows 11 Pro, I run into a peculiar issue. When I launch ollama from the installed shortcut, which launches "ollama app.exe", or when I boot up my OS (which also starts the same shortcut, as configured by the ollama installer), ollama is extremely slow. If I do an `ollama run deepseek-coder`, the model startup takes a very long time, several minutes, and when I type any input (e.g., Hello), it again takes several minutes to generate each token (instead of 200-500 ms/T with the workarounds).
However, I could fix the issue by simply closing the systray icon, and then either:
- running `ollama serve` in a terminal, but then I need to keep this open and I don't get the ollama systray icon; or
- `ollama run deepseek-coder` (or any other model), which will then also launch the ollama systray icon, just like launching `ollama app.exe`, but this time it works flawlessly, just like `ollama serve`.

I can confirm I can easily reproduce the bug simply by launching `ollama app.exe` manually, and the bug is not present with `ollama serve` and `ollama run <model>` (once `ollama app.exe` is first closed, of course). I read the logs but I did not find anything particularly telling. I will post a trace soon.
/EDIT: Here are the logs for when I launch `ollama app.exe` and it's slower (I launched `ollama app.exe` from the Windows shortcut, then `ollama run deepseek-coder:6.7b-instruct-q8_0`, then I typed `Hello` as a prompt, then Ctrl-C to stop the generation that was too long after 2 tokens):
app.log
server.log

Here are the logs for when I launch `ollama run deepseek-coder:6.7b-instruct-q8_0` directly when `ollama app.exe` is killed:
app.log
server.log
What did you expect to see?
200-500ms/T generation speed and much faster model initialization, instead of several minutes for each.
Steps to reproduce
Launch `ollama app.exe` on Windows; this will be much slower than `ollama serve` or `ollama run <model>`.

Are there any recent changes that introduced the issue?
I don't know, I never used ollama before (since it was not available on Windows until recently).
OS
Windows
Architecture
x86
Platform
No response
Ollama version
0.1.30
GPU
Nvidia
GPU info
Nvidia GeForce 3060 Laptop
CPU
Intel
Other software
Intel i7-12700h
@dhiltgen commented on GitHub (Apr 12, 2024):
This definitely isn't expected behavior. Looking at the logs, they seem ~identical, so I'm not sure what's going wrong. Is it possible there's an AV involved which could be slowing things down? Can you fire up Task Manager and look at it for both scenarios and see if you see any notable differences?
@lrq3000 commented on GitHub (Apr 13, 2024):
@dhiltgen Thank you for your attention and help with this issue. There is indeed an AV, but looking at the task manager it did not seem to be involved; I will try again with it disabled.
@lrq3000 commented on GitHub (Apr 13, 2024):
/TL;DR: the issue now happens systematically when double-clicking the `ollama app.exe` executable (without even a shortcut), but not when launching it from `cmd.exe` or PowerShell. More precisely, launching by double-clicking makes `ollama.exe` use 3-4x as much CPU and also increases RAM usage, and hence causes models to not fit in memory/CPU when they should run fine on my system.

@dhiltgen Unfortunately, disabling all my protection systems (antivirus and firewall) did not help. The only thing that helps is closing the `ollama app.exe` that is launched at startup and relaunching it, or just typing `ollama run <model>`, which then works fine, even when all my protection systems are activated.

In the task manager I don't remember seeing anything in particular, but I'll try again.
/EDIT: I retried several times to monitor what is happening in the task manager; there are differences (observed in both the native task manager and Process Explorer).
Additional observations not related to the task manager:
- After `/bye` and then `ollama run <model>` again, generation is just as slow (although the startup time is faster, since the model still seems to be loaded in memory).

After I wrote the above observations, my ollama got updated today to 0.1.31, and although I still observe the points above after the update, it seems to have slightly changed the behavior in a more reproducible way (or maybe I just did not notice before - /EDIT: actually I'm pretty sure this behavior happened before too in past releases; I just did not notice because I did not expect this to be possible). Either way, hopefully this will make tracking down this issue much easier. Here are the new insights I think you will find very interesting:
- The issue can be reproduced simply by double-clicking `C:\Users\<username>\AppData\Local\Programs\Ollama\ollama app.exe` (hence no need for a shortcut). Then simply run a big enough model like `ollama run deepseek-coder:6.7b-instruct-q8_0` for a 16 GB RAM system such as mine, and it will start behaving badly; closing it will even make the whole OS lag and become unresponsive (a sign of a RAM overflow).
- The issue does not happen when launching `C:\Users\<username>\AppData\Local\Programs\Ollama\ollama app.exe` in a terminal (I tried both the old terminal and PowerShell; it works in both cases) and then again `ollama run deepseek-coder:6.7b-instruct-q8_0`; or by directly launching `C:\Users\<username>\AppData\Local\Programs\Ollama\ollama app.exe` without launching `ollama app.exe` beforehand (this latter behavior did not change).

Note that being administrator or not does not change anything: I did all my tests without being an administrator, but I tested launching the icon as an administrator by creating a shortcut with admin mode checked, and it did not help. Launching `ollama app.exe` from the command line, in contrast, is always fast, even in user mode.

Hence, it seems the issue happens only when launching `ollama app.exe` by double-clicking it or at startup (which emulates double-clicking it from its shortcut). A simple fix is to launch `ollama app.exe` via a batch command (and ollama could do this in its installer: instead of just creating a shortcut in the Startup folder of the start menu, place a batch file there, or simply prepend `cmd.exe /k "path-to-ollama-app.exe"` in the shortcut), but the correct fix will come when we find what causes the somehow different behavior between launching from the command line and double-clicking, which somehow increases both the memory and CPU usage of `ollama app.exe`.

(My guess is that Windows is treating both cases differently because of the UI library used somehow, bundling a bunch of additional UI-related libraries or other stuff that ollama doesn't need, which then makes the app just a bit too heavy for my system RAM to support the model I loaded. This would explain why the issue does not seem to happen with small enough models. But really I don't have enough data to ascertain this is the case.)

My new guess is that some logic inside of `ollama app.exe`, or some library used by it, is detecting how (in which environment) the app is launched and behaving differently depending on this condition, attaching more useless libraries (UI libraries?) that snowball and make the whole app much more resource-consuming. My guess would be the UI/systray icon library.

If you have any idea for me to debug this further, please let me know; I'm eager to understand what's really happening here even though I already have a workaround!
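As an illustration of the batch-launch workaround described in the comment above, a small script placed in the Startup folder could start the tray app from a shell rather than via the Explorer shortcut. This is only a sketch: the install path is the default per-user location and the file name is hypothetical.

```powershell
# launch-ollama.ps1 (hypothetical): start "ollama app.exe" from a shell instead of
# double-clicking its shortcut, which is the workaround reported above.
$ollamaApp = Join-Path $env:LOCALAPPDATA 'Programs\Ollama\ollama app.exe'

if (Test-Path $ollamaApp) {
    # Start the tray app detached from this console, like the installer's shortcut would.
    Start-Process -FilePath $ollamaApp
} else {
    Write-Warning "ollama app.exe not found at $ollamaApp"
}
```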
@joubertdj commented on GitHub (Apr 14, 2024):
@lrq3000 : Thanks for finding a workaround ... and for highlighting it here ... I thought I broke something ... I am only playing with Ollama and Langchain at the moment ... and I thought I did something (as I am playing with my own indexing and metadata etc.) that made it suddenly become SO slow ... but I am also experiencing this slowdown. After killing "ollama app.exe" and starting it the way you mentioned, it then improved ... however ... to me it still feels far slower than it was previously (considering I am also using it on an equivalent laptop to yours, albeit Windows 10 Pro).
@joubertdj commented on GitHub (Apr 14, 2024):
PS. I can confirm that the pre-release of 0.1.32 addresses the issue. It seems that the primary reason for it being slow was that it tried ONLY to use vRAM and not system RAM ...
@dhiltgen commented on GitHub (Apr 15, 2024):
@joubertdj just to confirm, you're seeing good/consistent performance with 0.1.32 regardless of how the app is started?
In the 0.1.32 release we've changed to using subprocesses to manage the GPU.
@lrq3000 could you give it a try as well and see if the problem goes away in 0.1.32 for you?
If that release doesn't resolve it, the next things we can look at are what the processes are set to for Priority in the Task Manager Details view, and possibly looking at resmon.exe's cpu/disk/memory usage in these two scenarios.
@joubertdj commented on GitHub (Apr 16, 2024):
That is correct. It looks like it solved the issue for me.
@lrq3000 commented on GitHub (Apr 17, 2024):
I just tried v0.1.32 stable (not pre-release) and unfortunately no change, the same behavior appears, but worsened.
I can see the new `ollama_llama_server.exe` process, and now instead of `ollama app.exe` being the culprit, it's the server exe.

So now what is happening is that when I launch from the command line, which previously worked very well, it starts generating a sentence but then stops after a few tokens and the same issue happens (it essentially freezes for a long time before generating the next token - a really, really long time; it's a real freeze). I could wait the whole evening and I'm not sure I would get one whole response from the LLM.
In comparison, when I launch from the shortcut, this issue happens right away after the first generated token.
One thing I noticed is that when this freeze happens, the CPU gets overutilized, it shoots up from like 1-10% to 40% and stays there even if I CTRL-C halt the LLM generation and even after /bye. When the LLM generates the tokens correctly, the CPU is always much less used.
@joubertdj commented on GitHub (Apr 17, 2024):
I just upgraded to the stable release. Although I have to admit, the upgrade via the system icon ("Update available, restart") didn't work at all; maybe that was a pre-release thing. I had to download OllamaSetup.exe manually, kill all the ollama processes (as even Quit Ollama didn't work) and just install it via the downloaded installer.
My performance was “stable”, meaning, it is as I expected for this class of machine and better than v0.1.31.
When you run your application and it is busy generating tokens, how much RAM does ollama_llama_server.exe consume? When I stress-tested a big LLM in v0.1.31 it got stuck at a maximum of 2GB (or at least ollama.exe got stuck at that level; initially I thought they compiled the executable to be a 32-bit exe). With v0.1.32 it consumes +- 9GB (using deepseek-coder:6.7b-instruct-fp16). However, even 9GB is a tad "low" for the 6.7b fp16, is it not?
When I ran the "deepseek-coder:6.7b-instruct-q4_1" model the memory utilization was again at +-9GB, which seems high for a q4_1 model ... or am I missing something again?
In both instances the GPU was barely used … maybe once or twice during the entire token generation, and then it was never used again? Maybe the integrated RTX3050’s vRAM is not sufficient or something?
@lrq3000 commented on GitHub (Apr 17, 2024):
So I ran additional tests to observe whether it's an issue with a lack of RAM. I uninstalled and reinstalled several times both versions from the exe installers on the GitHub releases page.
v0.1.31 command line: 1463 MB to 1616 MB RAM -- when reloading the model some time after /bye, it consumes only 1254-1360 MB RAM. In fact, without /bye, by waiting for some time, the RAM usage goes as low as 1090 MB. These figures do not change when sending prompts to the model. CPU usage: from 0% to ~60% while generating tokens (other processes cumulatively take only 3-4% CPU).
Note that when lacking RAM, this indeed reproduces the issue. But when there is no lack of RAM, the issue persists only when launching from clicking on the executable or shortcut (see below).
v0.1.31 from clicking on binary or shortcut: 1614 MB, stays at 25-30% CPU usage even after CTRL-C break and /bye for a while, a dozen seconds maybe (until next token generation?) then stops and falls back to 0%.
But even so, 2.1 GB of RAM is still available and despite this, the program is still horribly slow. So it's not only caused by insufficient available RAM.
After waiting, it still goes to 1531 MB. And it is still horribly slow. After some more time after reloading, during generation, it can go as low as 1392 MB. Still very, very slow, despite 2.1 GB of free RAM.
CPU usage: 30-40% while generating tokens (other processes cumulatively take only 3-4% CPU).
v0.1.32 commandline: 1462MB RAM right away, same RAM after waiting a while. Fast generation, as before. CPU usage: from 0% to ~60% while generating tokens (other processes cumulatively take only 3-4% CPU) (same as before).
v0.1.32 from clicking on binary or shortcut: 1460 MB RAM right away, same RAM after waiting a while. Slow generation as before. CPU usage: 30-40% while generating tokens (other processes cumulatively take only 3-4% CPU) (as before).
Key takeaways:
Note that I am not only using ollama; I regularly use this exact same model (same quantization, same number of parameters, same instruct mode) with koboldcpp and gpt4all and there is no such issue, I always get fast token generation there. This issue is specific to ollama and only when launching by clicking on the executable/shortcut.
So indeed my issue seems to be different from @joubertdj 's, and is still present in the latest v0.1.32 .
Given that my system is pretty vanilla, I expect others will run into the same issue at some point, so even if we can't figure out the culprit right now, I think it's worth keeping this issue open for others to find it and contribute to the detective work.
@joubertdj commented on GitHub (Apr 18, 2024):
@lrq3000 : This morning, I read your feedback and, without initially starting Ollama from the icon, I started it from the console via "ollama serve". Its performance was, at minimum, four times better/faster than when I started it via the icon!!! I thought maybe this was only due to a fresh start, so I stopped "ollama serve" and started it with the icon. It was noticeably slower!!!
You are correct; although I may have had some different issue previously, there is definitely a performance issue here somewhere. I still, however, do not fully understand how a 7B model that is (almost) 4GB in file size only takes 2GB of RAM ... that portion has me a bit "scratching-my-head" ...
@lrq3000 commented on GitHub (Apr 18, 2024):
@joubertdj I'm not sure either about the difference but I think this filesize is not the model but the size the app is taking in memory, and also Windows uses compression methods. The figures I reported are from the Windows native task manager. When I use Process Explorer, I get the expected total memory size of ~8GB, but I do not report it because it does not change, this is simply the model's size, so there is no difference when I launch via the icon or via commandline, or between different versions, etc.
@mmacphail commented on GitHub (Apr 21, 2024):
Thank you !!! I have exactly the same problem. I was wondering why after a fresh install of ollama it was fast, and then sometimes it was slow. Fucking hell !!
@papyr commented on GitHub (Apr 28, 2024):
This happens on AMD too, and the GUI / window does not open.
I can verify it's running in the task manager, but there is no open window, with the latest release...
@dhiltgen commented on GitHub (Apr 28, 2024):
We've reshuffled the packaging model a bit on windows in the latest release. I still don't understand the root cause on this performance bug, but it's possible that reshuffling might have an impact, so I'd suggest folks give 0.1.33 a try.
@qsdhj commented on GitHub (May 3, 2024):
I have the same issue now. After the update to ollama 0.1.33 on Windows 11 Pro
To be honest I am unsure if I have the same problem.
The first prompt I do works normally.
From the second prompt onwards, my GPU gets maxed out for 2-6 minutes, then I get the answer from the LLM.
I tried it with the CLI and with langchain, and get the same issue with both methods.
This happened after updating from an older version of ollama to 0.1.33, and I also installed torch with CUDA support.
Before that I got a GPU utilisation of around 40%-50% instead of the maximum utilisation.
I now have CUDA installed both via the nvidia installer and with torch. Maybe that is my problem?

nvidia-smi:
torch.version: '2.3.0+cu118'
@dhiltgen commented on GitHub (May 11, 2024):
I still haven't managed to reproduce this or figure out what the culprit is for the strange slow-down in performance.
For folks who have seen this behavior, can you share a bit more about your system setup so that hopefully I can find a way to repro and ultimately fix whatever it is?
`Get-WmiObject -Class Win32_Processor -ComputerName . | Select-Object -Property [a-z]*`
`systeminfo | find "Virtual Memory"`

@lrq3000 commented on GitHub (May 13, 2024):
@dhiltgen
OS version
Windows 11 pro.
Personal system or work system
Personal system.
AV is Avira but I also used Avast in the past and I also tried to uninstall it fully at some point in my tests.
Laptop or Desktop?
Laptop. It is always plugged and I am using an external fan stand (KLIM Cyclone).
CPU model
Additionally I used Intel's app to list the number of P-cores and E-cores:



System memory
/EDIT: done the tests (sorry for the delay).
(`ollama run ...`) but not generating:

@qsdhj commented on GitHub (May 14, 2024):
OS version
Windows 11 Enterprise
Version: 23H2
System
CPU
11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
RAM
32,0 GB
VRAM
6GB
NVIDIA RTX A3000
I think my problem was different from the one described in this issue. I use ollama together with a HuggingFace Sentence-Transformer in Langchain and LlamaIndex.
My problem is that if I install the torch version with CUDA to use it with my embedding model, I get this weird behaviour in Ollama where the GPU runs at 100% load for a few minutes before the LLM responds.
Terminating my Python script and the ollama processes fixes it for the first call to an LLM; after that it's the same, until I restart Windows.
Installing the non-CUDA torch version fixed that, but now I can't use the GPU for creating embeddings, so it's not a usable solution for me.
@lowfatgeek commented on GitHub (May 23, 2024):
I am faced with exactly the same issue as well. I have to run "ollama serve" in the terminal and keep the Terminal window open side by side with the browser (OpenWebUI), or else the performance becomes very slow and sluggish. Even if I maximize the browser with the Terminal window behind it, the performance is slow; both the Terminal and the browser need to be open side by side.
I'm on Intel i5-13450HX, 16GB RAM, and RTX4050 6GB GPU.
@dhiltgen commented on GitHub (May 31, 2024):
It seems so far that the common theme is recent CPUs on a laptop. Perhaps there's some quirk causing ollama to run on efficiency cores? Has anyone experienced this slowdown on a desktop or a system without efficiency cores?
Perhaps another experiment to try - open Task Manager and see if any of the ollama processes have a green leaf indicating they're being scheduled on the efficiency cores. If so, right-click on the process and uncheck Efficiency mode.

If this does turn out to be the underlying cause, there does appear to be an API we can call to programmatically adjust how the process is scheduled.
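For reference, a hedged sketch of the kind of call being referred to: SetProcessInformation with the ProcessPowerThrottling class is the Win32 API that opts a process out of EcoQoS ("efficiency mode"). The PowerShell wrapper and the process names below are assumptions for illustration, not something Ollama ships.

```powershell
# Hedged sketch: opt a running process out of EcoQoS power throttling via
# SetProcessInformation(ProcessPowerThrottling). Process names are assumptions.
Add-Type @"
using System;
using System.Runtime.InteropServices;

public static class PowerThrottle
{
    [StructLayout(LayoutKind.Sequential)]
    public struct PROCESS_POWER_THROTTLING_STATE
    {
        public uint Version;
        public uint ControlMask;
        public uint StateMask;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool SetProcessInformation(
        IntPtr hProcess,
        int ProcessInformationClass,
        ref PROCESS_POWER_THROTTLING_STATE ProcessInformation,
        uint ProcessInformationSize);

    public const int ProcessPowerThrottling = 4;                      // PROCESS_INFORMATION_CLASS value
    public const uint PROCESS_POWER_THROTTLING_CURRENT_VERSION = 1;
    public const uint PROCESS_POWER_THROTTLING_EXECUTION_SPEED = 0x1;

    public static bool DisableThrottling(IntPtr processHandle)
    {
        var state = new PROCESS_POWER_THROTTLING_STATE
        {
            Version = PROCESS_POWER_THROTTLING_CURRENT_VERSION,
            ControlMask = PROCESS_POWER_THROTTLING_EXECUTION_SPEED,
            StateMask = 0  // 0 = do not throttle execution speed
        };
        return SetProcessInformation(processHandle, ProcessPowerThrottling,
            ref state, (uint)Marshal.SizeOf(state));
    }
}
"@

# Apply to any ollama processes owned by the current user (names are assumptions).
Get-Process ollama, 'ollama app', ollama_llama_server -ErrorAction SilentlyContinue |
    ForEach-Object { [PowerThrottle]::DisableThrottling($_.Handle) | Out-Null }
```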
@wac81 commented on GitHub (Jun 3, 2024):
Check the power management settings, including the BIOS. In my case, if the screen is turned on it runs fast; if the screen is turned off it runs very slowly. When this happens, most of the time is spent on the CPU.
@chyok commented on GitHub (Jul 4, 2024):
Hi @dhiltgen ,
Same problem here, and I am working on a desktop computer. When I open it using the desktop icon, the CPU only allocates 4 small cores, while the other 8 big cores are just observing. If I use `ollama serve`, everything is normal and fast.

My PC details are as follows:
Windows 11 23H2 version 22631.2861
i7-12700K
RTX 3060Ti
It feels like Windows is treating it as a background task and only allocating the small cores to it.
@dhiltgen commented on GitHub (Jul 22, 2024):
@chyok when you see it running on the "small cores", does it also show up with the green leaf icon in Task Manager? I haven't been able to reproduce this in my setups, but if someone can confirm definitively that this is the case, then I should be able to code up a fix to force it not to go into efficiency mode.
@chyok commented on GitHub (Jul 23, 2024):
It's quite strange that it doesn't enter efficiency mode, and yet it's very easy to reproduce the problem on my system. I had to use Process Lasso to force it to be scheduled on the big cores.
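For anyone who wants to try the same pinning without Process Lasso, a hedged sketch: the .NET ProcessorAffinity property can be set from PowerShell. The mask below (0xFF = first 8 logical processors) and the process name are assumptions; the right mask depends on which logical processors map to the P-cores on a given machine.

```powershell
# Hedged sketch: pin the runner to the first 8 logical processors (mask 0xFF).
# Adjust the mask for your CPU layout; the process name is an assumption.
Get-Process ollama_llama_server -ErrorAction SilentlyContinue | ForEach-Object {
    $_.ProcessorAffinity = 0xFF   # bit i set = allowed to run on logical processor i
}
```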
@dhiltgen commented on GitHub (Jul 30, 2024):
From what folks are seeing, it sounds like this may not be an efficiency setting, but a process priority setting.
In Task Manager, what is the Priority for the `ollama_llama_server.exe` process?

In Win 10, click the "Details" tab at the top. In Win 11, click the Details option on the left. Then right-click on `ollama_llama_server.exe`.

Win10 example: (screenshot)
@chyok commented on GitHub (Jul 31, 2024):
I'm traveling and unable to use my computer, but I'm quite certain that the priority is normal; I've tried to increase it, but it still hasn't worked. Setting the affinity and manually specifying the big cores can solve the issue, but the process is new each time, so I need to set it every time.
@lrq3000 commented on GitHub (Jul 31, 2024):
Yes I also confirm that priority is normal, I even tried to set it to high using ProcessExplorer.
@dhiltgen commented on GitHub (Aug 1, 2024):
Strange. So it's not the Priority, and it's not efficiency mode either.
@greggft commented on GitHub (Aug 12, 2024):
Under Windows, maybe try this:
1. Open Settings and type "GPU" in the search box.
2. Select Graphics Settings; this should take you to Graphics performance settings.
3. Click on Browse.
4. Add each of the executables you want to assign to your GPU - add the executables for ollama here.
See if that helps.
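The GUI steps above just write a per-app GPU preference, so the same setting can also be scripted. A hedged sketch, assuming the default per-user install path (the registry key is the documented per-user store for these preferences):

```powershell
# Hedged sketch: set the per-app GPU preference to "High performance" (GpuPreference=2)
# for the ollama executables. Paths assume the default per-user install location.
$key  = 'HKCU:\Software\Microsoft\DirectX\UserGpuPreferences'
$exes = @(
    "$env:LOCALAPPDATA\Programs\Ollama\ollama app.exe",
    "$env:LOCALAPPDATA\Programs\Ollama\ollama.exe"
)

if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
foreach ($exe in $exes) {
    # Value name is the full executable path; "2" means "High performance" GPU.
    New-ItemProperty -Path $key -Name $exe -Value 'GpuPreference=2;' `
        -PropertyType String -Force | Out-Null
}
```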
@ivanvengeruk commented on GitHub (Aug 18, 2024):
I have the same issue with slow tokens.
ollama version: 0.3.6 - latest at date of post
OS: Windows 11 Pro
23H2 22631.4037 Windows Feature Experience Pack 1000.22700.1027.0
configuration:
Laptop
CPU
12th Gen Intel(R) Core(TM) i7-12800HX
2,00 GHz
Core: 16
Logical cores: 24
Virtualization: ON
RAM: 64GB
Graphical processor: 0
NVIDIA RTX A1000 Laptop GPU
Driver version: 32.0.15.6076
DirectX: 12 (FL 12.1)
Dedicated GPU memory 0,9/4,0 Gb
Shared GPU memory 0,2/31,9 Gb
GPU Memory 1,1/35,9 Gb
reproduce problem:
start ollama app.exe from the GUI (main menu) or double-click on the app (it runs the processes Ollama and ollama.exe with ~30 MB memory)
then in a terminal "ollama run llama3:8b --verbose" (it runs the processes ollama.exe (again, but with ~10 MB memory) and ollama_llama_server.exe)
then "hello"
result:
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
total duration: 2m29.5910962s
load duration: 67.9495ms
prompt eval count: 11 token(s)
prompt eval duration: 17.11587s
prompt eval rate: 0.64 tokens/s
eval count: 26 token(s)
eval duration: 2m12.403243s
eval rate: 0.20 tokens/s
workaround:
kill all ollama processes in taskmanager
then in terminal "ollama run llama3:8b --verbose"
(it will start Ollama, ollama.exe twice (one with ~30 mb memory, one with ~10 mb), ollama_llama_server.exe)
then "hello"
result:
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
total duration: 3.0733124s
load duration: 24.1601ms
prompt eval count: 11 token(s)
prompt eval duration: 762.752ms
prompt eval rate: 14.42 tokens/s
eval count: 26 token(s)
eval duration: 2.284004s
eval rate: 11.38 tokens/s
In both cases the processes are the same. In both cases, if I try to kill the ollama.exe process (30 MB), it auto-restarts; in the slow case it continues to generate slowly, in the fast case it continues fast. Only how the running Ollama was started matters: from the GUI it is slow, from the terminal it is fast.
I made a dump for both cases; maybe it can help (dump from Task Manager).
I'm trying to attach the dumps, but one DMP is ~80 MB and GitHub allows 25 MB; compressed to .7z it is 20 MB, but GitHub only allows GZ or ZIP, which come out larger than 25 MB even with ultra compression. So just tell me if these dumps would be helpful and how I could send them. Or you can make them locally the same way I mentioned above.
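The "kill all ollama processes in Task Manager" step from the workaround above can also be scripted; a minimal sketch, with the process-name pattern as an assumption:

```powershell
# Hedged sketch of the workaround: stop every ollama-related process, then start
# the model from a terminal so the fast path is used.
Get-Process | Where-Object { $_.ProcessName -like 'ollama*' } | Stop-Process -Force
ollama run llama3:8b --verbose
```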
@jayghoshrao commented on GitHub (Aug 31, 2024):
In my case, setting the process priority to high speeds up the output instantly. The difference between "normal" and high is quite stark, too.
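A hedged sketch of doing the same from PowerShell instead of the Task Manager Details view, for anyone who wants to test this quickly (the process names are assumptions, and the change only lasts for the lifetime of the process):

```powershell
# Hedged sketch: check and raise the priority class of the ollama processes.
$procs = Get-Process ollama, 'ollama app', ollama_llama_server -ErrorAction SilentlyContinue
$procs | Select-Object Name, Id, PriorityClass          # check the current priority
$procs | ForEach-Object { $_.PriorityClass = 'High' }   # raise it, as described above
```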
@Fatalmemory commented on GitHub (Sep 18, 2024):
Old thread, but to chance a guess - could it be related to Windows Superfetch? Windows could either be caching files differently for files that load on startup, or, worst-case scenario, could be paging the files into a hibernation file on the hard drive and caching them that way, which would be much slower than RAM until they're swapped into actual RAM.
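One quick way to test that guess, as a hedged sketch: SysMain is the service behind Superfetch on current Windows builds, and it can be checked or temporarily stopped from an elevated PowerShell session.

```powershell
# Hedged sketch: check whether SysMain (Superfetch) is running, and stop it
# temporarily to see if the slowdown changes. Requires an elevated session.
Get-Service -Name SysMain | Select-Object Status, StartType
Stop-Service -Name SysMain   # temporary; it starts again after a reboot
```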
@lrq3000 commented on GitHub (Sep 22, 2024):
Thank you for working on this, but I think it is maybe premature to close this issue: as I wrote, manually changing the task priority did not help in the past. I will try again when this is released in an update and will let you know whether the issue is fixed for me.
@ValleZ commented on GitHub (Oct 16, 2024):
It still doesn't run on windows
@dhiltgen commented on GitHub (Oct 17, 2024):
@ValleZ can you clarify? This issue was tracking a performance problem which we believe we've fixed, but if it completely doesn't run for you, that sounds like an unrelated defect. If you mean the performance problem wasn't fixed, please explain a bit more about your scenario.
@ValleZ commented on GitHub (Oct 17, 2024):
Yeah, maybe. I tried to run it on Windows by downloading the exe file; it spent some time setting up and then the setup closed itself with no confirmation or anything. I repeated the setup several times with the same result, and on one try I rebooted the PC just in case. Then after setup I tried to run it from the start menu and it didn't start; nothing appeared. I then noticed there was a non-clickable ollama icon in the status bar. Maybe that's how it's expected to "run", but I just uninstalled it and then ran koboldcpp with no problems. Compiling and running plain llama.cpp also works fine.
@ValleZ commented on GitHub (Oct 17, 2024):
Googling why it won't start gave this ticket for some reason, which is weird; yes, it's totally unrelated.
@dhiltgen commented on GitHub (Oct 17, 2024):
@ValleZ what happens if you open a fresh PowerShell terminal and type `ollama run llama3.1`?

@ValleZ commented on GitHub (Oct 17, 2024):
Idk, I uninstalled it. If it's a terminal-only solution, it should not have a UI installer, or it should confirm that the installation is completed and that you are now supposed to type something somewhere to proceed.
@dhiltgen commented on GitHub (Oct 17, 2024):
@ValleZ yes, Ollama is a terminal and API based tool for running LLMs. If you're looking for a GUI, there's a ton of community based UIs listed here you can explore - https://github.com/ollama/ollama?tab=readme-ov-file#web--desktop
@mimo-to commented on GitHub (Oct 31, 2025):
Yeah, the same happened here. The ollama GUI is not opening and you can't access it from the system tray, but in the CLI everything is working fine.
@cdll commented on GitHub (Mar 18, 2026):
Same issue still on ollama@v0.18.1 😭

@lrq3000 commented on GitHub (Mar 18, 2026):
I use lemonade-server; the GUI and systray work fine and very fast on Windows, and it exposes an API and port compliant with most of ollama's features (it can be used in local chatbot apps as an ollama server).