Mirror of https://github.com/ollama/ollama.git (synced 2026-05-07)
Closed · opened 2026-04-22 by GiteaMirror · 27 comments
Originally created by @RadEdje on GitHub (Mar 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9517
What is the issue?
This seems to be happening for all my LLMs.
Ollama logs that the GPU is detected (it names my RTX 4070 Super), and ollama ps says 15% CPU / 85% GPU. But if I look at my system tray, my GPU VRAM usage doesn't budge while my CPU RAM gets full, and all my LLMs run slowly. I think this could be an NVIDIA driver update issue.
Based on this thread, this has happened before:
https://github.com/ollama/ollama/issues/4563
I remember updating my NVIDIA drivers recently. Could it be the NVIDIA driver update?
Here are the server logs:
Any advice would be much appreciated. Is it an NVIDIA driver issue? llama3.2-vision used to run really fast; now it's slow. I wondered why, and then I saw it was stuck in CPU RAM.
Relevant log output
OS: Windows
GPU: Nvidia
CPU: AMD
Ollama version: 0.5.13
@rick-github commented on GitHub (Mar 5, 2025):
Apparently no GPU backends. What does the following show:
@RadEdje commented on GitHub (Mar 5, 2025):
This is what it shows:
@RadEdje commented on GitHub (Mar 5, 2025):
the cuda folders both contain files... they're not empty.
I already uninstalled and then re-installed ollama.
To test, I tried LM studio.... and LM studio uses the GPU.... but I feel Ollama is just better.
@NGC13009 commented on GitHub (Mar 6, 2025):
I ran into the same problem on Linux; it was resolved after I reinstalled ollama.
@RadEdje commented on GitHub (Mar 6, 2025):
Hello, I tried uninstalling and reinstalling, but it's the same issue. I think I need to purge something that Ollama might be leaving behind that affects the reinstall?
@NGC13009 commented on GitHub (Mar 6, 2025):
You can try the steps I did. After I read install.sh and followed the installation steps, I did the following to uninstall:

sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /usr/local/bin/ollama
cp /etc/systemd/system/ollama.service ~/backup/   (the unit file gets removed on re-install; keep a copy if you need it)
sudo rm -rf /etc/ollama

What I did NOT do: remove the ollama user/group or /etc/systemd/system/ollama.service.
In addition, my CUDA environments (including cuDNN, cuBLAS, etc.) were not reinstalled, and based on the prompts during reinstallation, install.sh automatically skipped installing the CUDA environments. That's because I had installed them myself before installing ollama. If your CUDA environments were installed automatically by ollama, you may still need to find the dependencies that were once installed and reinstall them based on the contents of install.sh, I guess.

@RadEdje commented on GitHub (Mar 6, 2025):
I tried uninstalling and reinstalling; same problem.
But I just read your reply and will try these, thanks. Unfortunately I'm on Windows, not Linux, but I'll see if I can replicate what you did. Gotta sleep, work tomorrow, but I will update here with what happens when I try your suggestion. Thanks again!
@RadEdje commented on GitHub (Mar 8, 2025):
I tried, but I couldn't figure out which folders to remove on the Windows side of things. Would you have some idea what your Linux folders correspond to on Windows? All I removed was .ollama in the Windows user folder, aside from the folder automatically removed during uninstall. This is really sad. One thing I noticed: if I use a really, REALLY small 2 GB model, it seems to run on the GPU, but when I use bigger models like deepseek 14b or llama3.2-vision, which used to fit on my GPU, Ollama now goes straight to the CPU and doesn't even touch my GPU before offloading to CPU RAM. So 2 GB models seem to run on the GPU. Could there be a setting I need to fix?
@NGC13009 commented on GitHub (Mar 8, 2025):
How much VRAM does your graphics card have? Some models support simultaneous CPU/GPU inference; those can load part of their layers onto different devices. Other models must run inference on a single device and cannot be split across heterogeneous hardware.
So it may be that your VRAM is insufficient, and this particular model happens not to support split inference, in which case it all gets loaded onto the CPU.
You can refer to this issue: #8509
Use this command to force the model to compute on the GPU.
Even though ollama ps will still look like it's running on the CPU, the model is actually being computed by the GPU. On Windows, the GPU will fall back to shared VRAM when physical VRAM is insufficient, but shared VRAM is very slow, possibly slower than CPU inference.
VRAM consumed by a model = the model's own parameter size + the size needed by the KV cache. When the ctx size is set large, the KV cache eats a lot of VRAM. If the model itself is smaller than the card's physical VRAM, try shortening the context.
Also, ollama's default concurrency is 3, which multiplies the effective ctx size by the configured concurrency. Try setting the Windows system environment variable to change the concurrency to 1?
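A minimal sketch of the kind of override being described, assuming the standard Modelfile parameters num_gpu (GPU layer-offload count) and num_ctx plus the OLLAMA_NUM_PARALLEL environment variable; the exact command was not preserved in the mirror, and the tag name is only an example:

# Modelfile: force all layers onto the GPU and keep the context modest
FROM llama3.2-vision
PARAMETER num_gpu 999
PARAMETER num_ctx 2048

# build and run the override (Windows cmd)
ollama create llama3.2-vision:gpu -f Modelfile
# OLLAMA_NUM_PARALLEL must be visible to the server, so set it as a
# system environment variable and restart Ollama so it takes effect
set OLLAMA_NUM_PARALLEL=1
ollama run llama3.2-vision:gpu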
@RadEdje commented on GitHub (Mar 8, 2025):
Thank you for your help and quick replies.
I did as you suggested:
changed the environment variable to 1, and
created a new model called llama3.2-vision:gpu, and still it runs in CPU RAM. This is weird, since these all used to run on my GPU. I had deepseek-r1 running on top of llama3.2-vision in Open WebUI and it all fit into my GPU; now even llama3.2-vision won't work, and even the tiny Granite vision model doesn't run on my GPU. This is really sad. BTW, I also installed the CUDA toolkit to update my CUDA after my NVIDIA drivers. Still nothing happened.
I'm using an RTX 4070 Super with 12 GB of VRAM.
Thank you again for any advice.
At this rate I will have to remove Ollama and rely mainly on LM Studio, which is sad since LM Studio still doesn't support mllama for vision models, so llama3.2-vision doesn't work there. And running llama vision in Ollama on CPU RAM is also pointless given how slow it has become; it's basically unusable locally now.
Still, thank you for all your help.
@rick-github commented on GitHub (Mar 8, 2025):
The server log will show why the GPU is not used.
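(For reference, the server log being requested lives in a standard place per ollama's troubleshooting docs: %LOCALAPPDATA%\Ollama\server.log on Windows, and journalctl -e -u ollama on Linux.)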
@RadEdje commented on GitHub (Mar 9, 2025):
Oh sorry, I will put server logs here.
Initial start of Ollama from the terminal as of today.
After using ollama run llama3.2-vision:
ollama ps notes that the GPU is utilized way more than the CPU, but this is not reflected in the system tray.
Updated server logs:
Unloading the model with ollama stop shows a drastic decrease in CPU RAM usage.
Here is the updated server log after using ollama stop:
I hope this sheds some light on what is going on with my Ollama installation.
I already updated my NVIDIA drivers and installed the NVIDIA toolkit.
Ollama used to fully utilize my GPU (RTX 4070 Super with 12 GB VRAM).
Based on the last line in the server log, Ollama is loading the mllama model into the CPU backend instead of the GPU, even though it detects my 12 GB of VRAM.
I'm not sure what happened or which update broke GPU support.
But I think this started before even the last Ollama update, so I suspect it was an NVIDIA driver update?
Anyway, thank you all for your help.
@RadEdje commented on GitHub (Mar 9, 2025):
Not sure if this will help explain what I'm going through, but this is what happened in the server logs when I used ollama run llama3.2-vision:gpu (which is supposed to force GPU use, correct?).
Server logs:
It still uses the CPU backend. :-(
@NGC13009 commented on GitHub (Mar 9, 2025):
What ollama ps and Task Manager show is inaccurate. First, ollama ps suggests there isn't enough VRAM to load the whole model, so 13% of the layers were loaded into system memory. Although your model is only 7.9 GB, inference can need more memory than your card's VRAM, because the context requires a larger KV cache and the feature maps in different layers take memory too. However, if you forced GPU in the Modelfile and built a new model, then actual inference is certainly all-GPU; it's just that VRAM is insufficient, so shared VRAM (a portion of CPU memory) is used. Also, the VRAM and shared-VRAM figures in Task Manager are not correct; don't trust Task Manager's statistics.
You need to have the model output something first, then watch the CUDA utilization and CPU utilization in Task Manager.
Generally, if the CPU is participating in the computation, CPU usage should be above 90% while the card's CUDA utilization stays low (<40%). If it really is 100% GPU computation, then CUDA utilization should exceed 90% and CPU usage won't go above 20% (the CPU is still needed because shared VRAM requires the CPU to manage it; the CPU isn't actually participating in the model's inference).
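As a rough worked example of that memory accounting (hypothetical shapes, not the actual llama3.2-vision configuration): a half-precision KV cache needs

2 (K and V) × n_layers × n_ctx × n_kv_heads × head_dim × 2 bytes

so with, say, 32 layers, 8 KV heads, head dim 128, and an 8192-token context:

2 × 32 × 8192 × 8 × 128 × 2 B = 1 GiB

On top of 7.9 GB of weights, and before the vision encoder's activations, that puts a 12 GB card close enough to its limit that spilling into shared memory is plausible.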
@RadEdje commented on GitHub (Mar 9, 2025):
Thank you for your reply. I tried what you said. I'm using the llama3.2-vision:gpu model version where I set the GPU to be used by force. The thing is, this used to work before: I would see my GPU VRAM rise when running Ollama. Now it doesn't. However, if I use LM Studio, my VRAM rises and my CPU RAM stays at a typical level when using other big models like granite vision.
I'm just wondering: if my GPU could run llama3.2-vision on top of deepseek-r1 before, what changed such that Ollama prefers to put it all into CPU RAM now?
Thank you again for your advice.
@rick-github commented on GitHub (Mar 9, 2025):
This is the problem: ollama is failing to load the backends required for GPU/advanced CPU use. There have been a couple of other reports of this (#9266, #9245), but no root cause has been determined yet.
@RadEdje commented on GitHub (Mar 9, 2025):
Ohhhh, so it isn't just me. Glad to know. I hope it gets fixed. So for now, just watch and wait?
Really sad; granite vision and Phi4-mini models just became usable on the latest and greatest Ollama. Guess I'll have to wait and see. Thank you.
@rick-github commented on GitHub (Mar 9, 2025):
Well, it is sort of "just you" - there have been only a few reports so it seems most users don't have a problem. Something specific to the environment of those few users is causing the issue but it hasn't been identified yet.
@RadEdje commented on GitHub (Mar 9, 2025):
Oh, I guess I have to figure this out 😔 I wonder, would rolling back NVIDIA drivers work?
@rick-github commented on GitHub (Mar 9, 2025):
I don't think rolling back drivers will help - the failures to load the CPU backends in the logs have nothing to do with external drivers. It seems like there's an inability to load a file or link a DLL into the running process.
@RadEdje commented on GitHub (Mar 9, 2025):
Oh, OK. Thanks for the heads up; you saved me a lot of painful rollbacks. I just wish I knew what changed. Would totally removing everything work: uninstall, remove environment variables, remove all downloaded models, remove the .ollama folders that remain even after uninstall? I wonder if uninstalling leaves something behind that should be purged on Windows.
My Ollama models folder is on drive D, since I didn't want to fill up my drive C with models.
@rick-github commented on GitHub (Mar 9, 2025):
Un-installing, clearing variables, removing everything in C:\Users\PC\AppData\Local\Programs\Ollama, and re-installing is worth trying. Removing the models shouldn't be necessary, as there are no code-related objects there.
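A minimal sketch of that cleanup from an elevated Command Prompt, assuming a default install location (the tray process name is the standard one):

rem stop the tray app and any running server first
taskkill /f /im "ollama app.exe"
taskkill /f /im ollama.exe
rem remove the leftover program files before re-installing
rmdir /s /q "%LOCALAPPDATA%\Programs\Ollama"
rem models under %USERPROFILE%\.ollama (or OLLAMA_MODELS) can stay; no code lives there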
@pluberd commented on GitHub (Mar 10, 2025):
I don't know if this helps you. Today I started my ollama (Linux), which was installed about six weeks ago, and it told me it was using the GPU. But in fact it was using the CPU. I reinstalled, and now it works again. Strange.
@RadEdje commented on GitHub (Mar 12, 2025):
Hello, to update:
I tried uninstalling everything and removing all environment variables.
I even updated to version 0.6. Nothing worked.
Then I saw this post somewhere, from
https://github.com/ollama/ollama/issues/9266
by https://github.com/Hsq12138
I added this to the PATH:
THIS FIXED IT!
llama3.2-vision now fully loads onto the GPU.
Here is the server log:
I hope this helps clear up anything that might be needed to fix this bug.
Thanks again, everyone.
(Time to try out Gemma 3!!!! Multimodal! Yahooooo!)
@Jay021 commented on GitHub (Mar 16, 2025):
When running the ollama ps command, my Ollama shows 100% GPU usage, but in reality the CPU is running at full speed while NVIDIA MSI shows 0% GPU usage and no VRAM being used. I tried uninstalling and reinstalling Ollama, but the issue persisted, until I came across this post: #9266
by https://github.com/Hsq12138
I added the following path to the PATH environment variable:
C:\Users\PC\AppData\Local\Programs\Ollama\lib\ollama
This perfectly resolved the issue. You can give it a try.
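A minimal sketch of that PATH change from Command Prompt, assuming the default install path (note that setx only affects newly opened terminals, and it merges the machine and user PATH values into the user one, so editing PATH via System Properties > Environment Variables is the tidier route):

setx PATH "%PATH%;C:\Users\PC\AppData\Local\Programs\Ollama\lib\ollama"

Restart the terminal (and Ollama) afterwards so the new PATH is picked up.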
@RadEdje commented on GitHub (Mar 17, 2025):
Yes, thanks. I tried this too; it fixed everything. Will close this thread now. Thanks again, everyone.
@mentalblood0 commented on GitHub (Nov 21, 2025):
I had a problem that looked exactly the same on Arch Linux and resolved it by using /usr/bin/ollama instead of /usr/local/bin/ollama in the systemd service.
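A minimal sketch of that change as a systemd drop-in, assuming the stock unit's ExecStart pointed at /usr/local/bin/ollama:

# /etc/systemd/system/ollama.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/ollama serve

sudo systemctl daemon-reload
sudo systemctl restart ollama

The empty ExecStart= line resets the unit's original command, so the replacement is the only one that runs.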