Mirror of https://github.com/ollama/ollama.git (synced 2026-05-06 08:02:14 -05:00)
Closed · opened 2026-04-28 05:15:00 -05:00 by GiteaMirror · 57 comments
Originally created by @jmorganca on GitHub (Jan 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2187
Originally assigned to: @dhiltgen on GitHub.
@dhiltgen this will be of interest to you
@dhiltgen commented on GitHub (Jan 25, 2024):
At present we're compiling the GPU runners with some of the vector CPU features turned on, which is the likely cause of this. I'll explore removing that and run performance tests to see whether it has a negative impact.
@JadenSWang commented on GitHub (Jan 26, 2024):
It's quite exciting to see the errors I've been eating glass over get asked about 20 hours earlier; I guess I'm on the right path. Any idea when this might be resolved? I'm running in Docker.
@dhiltgen commented on GitHub (Jan 26, 2024):
At the very least, we should detect this scenario, not load the library that will crash, and fall back to CPU to remain functional.
@dhiltgen commented on GitHub (Jan 26, 2024):
Until this is resolved, you can force CPU mode https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#llm-libraries
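In practice, forcing CPU mode is an environment variable on the server process; a minimal sketch, assuming the `OLLAMA_LLM_LIBRARY` variable described in the linked troubleshooting doc (the available library names vary by build and are printed in the server log at startup):

```sh
# Force a CPU-only LLM library so the AVX-compiled GPU library is never loaded.
# Typical names are cpu, cpu_avx, cpu_avx2; check your server log for the list.
OLLAMA_LLM_LIBRARY="cpu" ollama serve
```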
@dhiltgen commented on GitHub (Jan 26, 2024):
With #2214 we'll at least fall back to CPU mode and not crash. A warning in the server log will help users understand why we didn't even try to use their GPU (if present) and are running slowly.
@JadenSWang commented on GitHub (Jan 27, 2024):
Wait, so does this mean that if I have GPUs and get this error, (a) my GPUs are not configured properly, and (b) my GPUs won't be used and the CPU will be used instead?
@JadenSWang commented on GitHub (Jan 27, 2024):
@dhiltgen I'm not sure this is resolved, I'm still getting the same error:
running: ollama/ollama:0.1.22
@JadenSWang commented on GitHub (Jan 27, 2024):
I just fixed it by enabling AVX in Proxmox, but it still seemed to crash without AVX support.
@dhiltgen commented on GitHub (Jan 27, 2024):
The fix that falls back to CPU mode when we detect no AVX support, and doesn't even try to load the GPU library, was merged after we shipped 0.1.22, so it will show up in 0.1.23 when that ships.
@dhiltgen commented on GitHub (Jan 27, 2024):
To clarify how this works: we compile multiple variations of the LLM native library. In particular for your scenario, we currently compile a single CUDA library, and that library is compiled with the AVX extensions turned on. This helps improve performance when the entire model doesn't fit on the GPU (which is quite common for larger models) and we have to fall back to partially running on the CPU. AVX is ~400% faster than no AVX. However, this means that if we load that library on a system without AVX, it will crash when those instructions are executed by the process.
What has changed in 0.1.23 (not yet shipped) is detecting this scenario, rejecting the GPU library entirely, and falling back to pure CPU without AVX so that we remain functional, albeit much slower, instead of crashing. This will also report a warning in the server log to help users understand that there's a significant performance penalty due to the lack of AVX.
I highly recommend enabling the vector math extensions on your CPU virtualization system where possible.
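A quick way to check whether AVX is actually exposed to the guest or container (the usual culprit when virtualization masks CPU features); a small sketch for Linux:

```sh
# If this prints "AVX missing", the hypervisor or container is hiding the flag
# (or the CPU genuinely lacks it); fix the VM CPU model before anything else.
grep -qw avx /proc/cpuinfo && echo "AVX available" || echo "AVX missing"
```

On Proxmox and similar hypervisors, setting the virtual CPU type to "host" (as noted later in this thread) is the usual way to expose the host's AVX flags.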
@Cybervet commented on GitHub (Jan 27, 2024):
So if the CPU has no AVX, it cannot use CUDA and the GPU no matter what, even after compiling from source?
@JadenSWang commented on GitHub (Jan 28, 2024):
@Cybervet yes, it seems GPU support requires the AVX instruction set; luckily, a lot of modern CPUs support it: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
@dhiltgen commented on GitHub (Jan 28, 2024):
AVX has been around for ~13 years and I'm not aware of any modern x86 CPU that doesn't support it. The intersection of 14+ year old CPUs and a similar vintage GPU that's supported by CUDA or ROCm and useful for LLM tasks seems unlikely. The more likely scenario is a virtualization/emulation system where it's masking out those features for portability, and given the massive performance hit by not using these features of the CPU, we recommend trying to enable them. We'll at least be functional in 0.1.23, just slow.
@Cybervet to answer your question about building from source: we don't currently optimize our build configuration for this scenario, but if you do have a situation that calls for this combination (CUDA support without AVX), modify the default flags we use to build llama.cpp and take a look at the CUDA section further down in that file.
@Cybervet commented on GitHub (Jan 29, 2024):
Well, I have a couple of HP Z800 workstations with dual Xeon X5680 CPUs (12c/24t) and 128 GB of RAM running Proxmox, and I run Ollama in a Linux container. The X5680 is a 2010 CPU without AVX, so I thought I'd use my RTX 3060 12GB in the machine to speed up LLMs with CUDA. The CPU is old, but the GPU is new. So far I have not managed to compile with custom flags no matter what I tried; it works, but only in CPU mode. Any ideas?
@dhiltgen commented on GitHub (Jan 29, 2024):
@Cybervet the one other change you'll need is to alter the GPU detection logic to bypass the fairly recent check we added to skip GPUs on non-AVX systems: https://github.com/ollama/ollama/blob/main/gpu/gpu.go#L133
@Cybervet commented on GitHub (Jan 29, 2024):
Is this the only change needed in gpu.go (it doesn't seem to work), or should we also change cpu_common.go? I just want to see what the situation is with no AVX and a capable GPU.
@dhiltgen commented on GitHub (Jan 29, 2024):
@Cybervet I believe the two changes you'll need to make are the compile flags and the gpu.go changes, but I haven't tested this scenario. You can set OLLAMA_DEBUG=1 to get more logs in your experiments to understand the flow better.
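For completeness, the debug toggle mentioned above is just an environment variable on the server process:

```sh
# Verbose logging while experimenting with a modified build
OLLAMA_DEBUG=1 ollama serve
```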
@dbzoo commented on GitHub (Feb 1, 2024):
I too ran into this problem; these changes (45eb104849) worked for me.
@JadenSWang commented on GitHub (Feb 2, 2024):
@Cybervet my understanding is that you cannot use GPUs with Ollama if you don't have AVX support.
@dhiltgen commented on GitHub (Feb 16, 2024):
@khromov was pointing out that you can buy fairly recent CPUs that Intel has chosen not to include AVX features in, so unfortunately there are ~modern systems out there that fall into this scenario. I'm still concerned that performance is going to be really bad if you can't fit 100% of the model into the GPU.
I think what probably makes the most sense for this one is to refine our build scripts to make it much easier for users to build their own copy of ollama from source that disables AVX and other vector extensions for all build components.
@navr32 commented on GitHub (Apr 4, 2024):
Hello, I have the same problem with my Z800: dual Xeon X5675 (24 threads), 96 GB of RAM,
and one RTX 3090 FE with 24 GB of VRAM.
llama.cpp runs well for me without any issue at more than 30 tok/s on models that fit in VRAM, and a previous Vulkan build of llama.cpp with an old AMD RX480 gave me 2 tok/s.
So now I want Ollama to use my RTX 3090 too, but it fails:
the latest Ollama from git says "no AVX" and refuses to use the GPU, yet this works with llama.cpp, so it should work with Ollama too.
So I found commit 45eb104849, cloned the dbzoo branch, and it built fine (built on Manjaro latest stable, kernel 6.6.19-1-MANJARO #1 SMP PREEMPT_DYNAMIC, CUDA 12.3.2-1, gcc (GCC) 13.2.1 20230801, and nvcc cuda_12.3.r12.3/compiler.33567101_0).
With the dbzoo version I get 3.67 to 4 tok/s with the nous-hermes2-mixtral:8x7b-dpo-q4_K_M 26 GB model and 20 threads set with /set parameter num_thread 20.
Prompt evaluation is also much better with this version using the GPU on an old processor:
prompt eval rate: 108.09 tokens/s with the GPU, versus just 2.68 tok/s without it.
So could you merge the dbzoo commit into the latest dev branch, so that people with old processors (or newer CPUs where Intel omitted AVX) and a good GPU can still use that GPU?
Many thanks to all for this wonderful development of Ollama. Have a nice day.
@angiopteris commented on GitHub (Apr 9, 2024):
You need to set your CPU to "host" mode to allow the AVX instructions to be passed through.
@apunkt commented on GitHub (Apr 18, 2024):
You can force GPU compilation from source by editing gpu/cpu_common.go line 20:
-> `return "avx"`
then compile with custom options:
OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_F16C=off -DLLAMA_FMA=off"
It will still complain:
level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
but it will run:
level=INFO source=server.go:125 msg="offload to gpu" reallayers=33 layers=33 required="5888.5 MiB" used="5888.5 MiB" available="6454.3 MiB" kv="1024.0 MiB" fulloffload="560.0 MiB" partialoffload="585.0 MiB"
So far I haven't noticed any negative effects...
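To spell out the build step the snippet above compresses, here is a hedged reconstruction of the full invocation for a source checkout of that era, assuming the documented go generate / go build flow and the cpu_common.go edit described above:

```sh
# Build from source with the CUDA-enabled runner but all x86 vector extensions off
# (flag list copied from the comment above; adjust for your release).
export OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major \
  -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_F16C=off -DLLAMA_FMA=off"
go generate ./...   # runs llm/generate/gen_linux.sh, which honors the custom defs
go build .          # produces the ./ollama binary
```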
@dhiltgen commented on GitHub (Apr 18, 2024):
@apunkt if you want to have a go at making a PR, see if you can set up a model where people can build from source and set an env var (or two) to toggle this. As you pointed out, you have to modify both the gpu variant logic, and the compile flags. For the GPU logic, take a look at how we override the version at compile time here for inspiration.
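For reference, the compile-time override pattern being pointed at is Go's -X linker flag; a hedged example of the same mechanism (the exact variable path lives in the repo's version package, so treat the path below as illustrative):

```sh
# Stamp a custom version string into the binary at build time
go build -ldflags "-X github.com/ollama/ollama/version.Version=0.0.0-noavx" .
```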
@apunkt commented on GitHub (Apr 19, 2024):
@dhiltgen I did further testing.
It works for me with 1 GPU: loading bigger models that don't fit into VRAM also works, and the load is split between GPU and CPU.
It causes problems for me with 2 GPUs when the model fits into VRAM but a non-default num_ctx is used.
This then causes `cudaMalloc failed: out of memory` on the first device.
For example, codellama:7b works fine on two 3050 8GB cards up to a 12k num_ctx; setting it to 16k causes an unrecoverable crash at `llm_load_tensors: offloaded 31/33 layers to GPU`.
Any hints? Is it related to https://github.com/ollama/ollama/issues/3711?
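For anyone reproducing this, num_ctx can be set per request through the API as well as in a Modelfile; a small sketch using the model and values from the report above:

```sh
# Request a 12k context explicitly; raising this to 16384 is what triggered the
# cudaMalloc out-of-memory crash described above on the dual-GPU machine.
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:7b",
  "prompt": "hello",
  "options": { "num_ctx": 12288 }
}'
```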
@navr32 commented on GitHub (Apr 29, 2024):
Thanks to @apunkt for this new trick. Now I can run recent versions of Ollama on Manjaro Linux again.
Before building, as @apunkt says:
force GPU compilation from source by editing gpu/cpu_common.go line 20:
-> `return "avx"`, and force AVX off as @dbzoo says in line 54 of llm/generate/gen_linux.sh,
and build with:
For a quick test I ran ollama run llama3:70b-instruct-q4_K_M (42 GB) on an even older machine than the one I mentioned before: a Z800 with dual Xeon X5570 (16 threads) and only 32 GB of RAM, so less than the size of the model tested here, but with two RTX 3090 FE cards with 24 GB of VRAM each, so the whole model fits in the total amount of VRAM.
ollama serve:
This setup gives:
GPU usage when generation occurs:
I think it's a very nice result, and it makes the GPU runner a viable alternative even without an AVX processor; it would be good to have this in mainline Ollama. Thanks to all.
@galets commented on GitHub (May 2, 2024):
I was able to build and run Ollama on an older device without AVX using the above-mentioned method. Thanks!
@lenhardtx commented on GitHub (May 12, 2024):
It works for me.
HP DL360 G7
Proxmox 7.4-3 - Kernel 6.2.11-1-pve-relaxablermrr -> https://github.com/Aterfax/relax-intel-rmrr (pci passthrough)
VM: Oracle Linux 9.4
CUDA 12.4
NVidia A4000
@andydvsn commented on GitHub (May 14, 2024):
Just to note: on a MacPro5,1 with X5690 CPUs and an AMD Radeon VII GPU running Debian Bookworm, attempting to run any model drops out with "Error: llama runner process has terminated: signal: illegal instruction (core dumped)".
This is version 0.1.37 and there's no graceful fallback to CPU only. I'm brand new to Ollama and can't do any more troubleshooting this evening, but will keep an eye on things. Just wanted to flag this X5690 / AMD combination, as I think most solutions above are Nvidia focused.
@dhiltgen commented on GitHub (May 14, 2024):
@andydvsn can you open a new issue for your scenario? We don't currently support GPUs on Intel Macs (tracked via #1016); however, it shouldn't crash.
@andydvsn commented on GitHub (May 14, 2024):
Apologies, I should have been clearer - this is a Mac Pro system, but it is not running macOS. This is on Debian Bookworm and the Ollama installation with the Linux instructions proceeded perfectly, including the detection of the GPU and download of AMD dependencies. It's essentially a Xeon workstation PC at this point, just in a silver box with an Apple logo on it.
@mii-key commented on GitHub (May 30, 2024):
Firstly, thank you for the exceptional product!
Special thanks to @apunkt for the valuable tip.
For Windows systems, use "-DLLAMA_AVX=OFF" and "-DLLAMA_AVX2=OFF" in `$script:cmakeDefs` in the `llm\generate\gen_windows.ps1` script file, function `build_cuda()`.
Regarding older CPU models: while the 13-year argument holds merit, it's important to acknowledge that these old processors remain quite capable on servers and workstations today. Consequently, I believe many users would appreciate support for CPUs without AVX in your remarkable product.
@dhiltgen commented on GitHub (Jun 1, 2024):
PR #4517 lays the foundation so we can document how to ~easily build from source and get a local build with different vector extensions for the GPU runners. Once that's merged, this issue can be resolved with developer docs.
@hithold commented on GitHub (Jun 10, 2024):
There are still many modern processors without AVX instructions; for example, the Celeron G5900 and Pentium G6600 (LGA1200) are only 3 years old.
It would be great to be able to run LLMs on the GPU on these modern systems without AVX.
@AlexDeveloperUwU commented on GitHub (Jul 14, 2024):
Hey! Any updates on this?
I'm trying to follow the instructions and make the required modifications, but when I do a build, it only builds "ollama_llama_server", so I'm confused.
What should I do? Can anyone share their build?
This is the PC that I'm using:
It's a very old CPU, I know, but it's a test server and I wanted to try some AI on the GPU to test its capabilities.
Thanks!
@navr32 commented on GitHub (Jul 14, 2024):
The Nvidia Quadro K620 is too old to give you interesting results with Ollama. You need a minimum VRAM size of about 20% more than the size of the model to get decent usage. For example, llama3 8B Q4 is 4.7 GB, so you need a minimum of about 6 GB of VRAM for it to run acceptably on this old computer. You must also check the Nvidia CUDA version requirements, and so on. Once you've confirmed your hardware meets the build requirements, small models may run fine, for example Qwen2 0.5b or 1.5b (352 MB and 935 MB); try all of this and tell us, but verify your CUDA version.
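As a sanity check after loading a model, releases from around this time can also report how the load was split between CPU and GPU:

```sh
# The PROCESSOR column shows the CPU/GPU split; SIZE is the loaded footprint,
# which is larger than the model file on disk because of the KV cache.
ollama ps
```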
@AlexDeveloperUwU commented on GitHub (Jul 14, 2024):
I finally managed to get it working. Initially, the commands weren't executing properly, but I resolved that issue.
The Nvidia Quadro K620 works well with the Qwen 0.5b and Qwen 1.8b models. Both models generate AI responses quite quickly.
I made sure to update CUDA and the Toolkit to the latest available versions. Currently, I'm using Driver Version: 550.100 and CUDA Version: 12.4.
Given that this setup is primarily for testing purposes, I’m not concerned about running only small models. The goal is to learn and perform very basic tasks, and these smaller models are sufficient for that. So far, they've been working great.
@navr32 commented on GitHub (Jul 15, 2024):
Perhaps it would be good to publish here the commands you used, to help other Ubuntu users who want to try this setup on this old type of hardware. Have a nice day.
@navr32 commented on GitHub (Aug 1, 2024):
To use this with the latest v0.3.2, only some very simple changes are needed:
change cpu_common.go line 15 to:
and gen_linux.sh line 54 to:
@Pesc0 commented on GitHub (Sep 26, 2024):
Please remove this artificial restriction. llama3.1 8B works perfectly fine with my setup:
It would have saved me a full afternoon of trying to compile and manually install Ollama (and I'm lucky I'm not a beginner; imagine someone who is new to all of this).
@brycetryan commented on GitHub (Sep 27, 2024):
This worked perfectly, thank you. I have a cheap Celeron from a few years ago without AVX/AVX2 but five 1060s. Inference now runs perfectly fine with multiple GPUs.
@digitalspaceport commented on GitHub (Oct 22, 2024):
Would love to see support for NVIDIA GPUs remain on the N3150/N3350/N3450 CPUs, which do not have AVX/AVX2 support. These are popular chips in the modern-ish SBC space, which can have PCIe slots that do work with bus-powered cards like the P2000; the Zimaboard/Zimablade is one decent example where I ran into this.
@shkron commented on GitHub (Nov 9, 2024):
In the meantime, is there a similar "hack" to get it to work from the source code, along the lines of the earlier suggestions? Since the migration to the Go runner, I am no longer able to locate "gen_linux.sh", nor its migrated analog.
cc: @navr32 , @brycetryan , @dhiltgen
@chris-hatton commented on GitHub (Nov 16, 2024):
As others have proven, it's technically unnecessary to require AVX support where only GPU inference is needed.
I hope one or more of the maintainers would consider looking into this and unlocking the app for what is probably a small but significant bracket of users.
I'm running an HP workstation as a home server, with dual Intel(R) Xeon(R) CPU X5570 @ 2.93GHz. In spite of being older, it's a relatively powerful server computer and just the kind of target for GPU inference.
@tobiasgraeber commented on GitHub (Nov 17, 2024):
Subscribing. How do we get around this AVX restriction?
Flags seem to be upcoming in docs/development.md (commit 7d686a38e9, diff-97db29a7915320e63d41d38a0440360a87055ee8ed03757aa263116dbbb4aabe). Are there any additional docs, or plans to auto-detect/handle this in the regular release as well? Thanks!
@SplendidAppendix commented on GitHub (Nov 24, 2024):
For those looking to run Ollama without the AVX flags, I have been running the 0.3.2 version according to these instructions with success. Still waiting for the new merges.
@roycwalton commented on GitHub (Nov 24, 2024):
Subscribing. I am running a P2000 with dual Xeon E5530 CPUs without AVX support. I have Ollama running in a container; I can pass env vars easily enough, but I'm unsure how to compile with the above fixes.
@osering commented on GitHub (Dec 2, 2024):
Right now version 0.4.7 is out, and the last advice on how to circumvent this issue with AVX-less CPUs is for 0.3.2; there are pretty substantial changes (including program structure changes, file deletions, and new files) between the 0.3 and 0.4 versions.
Therefore the advice from navr32 is no longer fully usable.
There is advice for the same situation ("Running Ollama 0.3.12 on multiple GPUs without AVX/AVX2"), but it is also incomplete.
You can clone the respective 0.3 version, for instance the last of the 0.3 series, 0.3.14:
git clone --branch v0.3.14 https://github.com/ollama/ollama
Then edit https://github.com/ollama/ollama/blob/v0.3.14/llm/generate/gen_linux.sh, changing line 55 to replace "-DGGML_AVX=on" with "-DGGML_AVX=off", and the second file, https://github.com/ollama/ollama/blob/v0.3.14/discover/cpu_common.go, replacing "return CPUCapabilityNone" with "return CPUCapabilityAVX" in line 20.
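The two edits just described can also be applied non-interactively; a hedged sketch that matches on content rather than line numbers (those drift between releases):

```sh
cd ollama   # the checkout from the git clone above
# Turn AVX off in the generate script and report AVX capability regardless
sed -i 's/-DGGML_AVX=on/-DGGML_AVX=off/' llm/generate/gen_linux.sh
sed -i 's/return CPUCapabilityNone/return CPUCapabilityAVX/' discover/cpu_common.go
```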
I installed cmake and Go.
The problem is that in the 0.3 versions there is no Makefile, so the make command can't be used, and it's not clear what the right go generate / go build commands are to compile. I used go generate ./... and go build -tags cuda12, but no working ollama executable was produced (it gave errors related to ggml after being copied to /usr/local/bin). Are there some environment variables that should be used?
Could somebody explain, while there is no respective flag implemented, the right procedure to compile this modified code on Linux (Ubuntu 24.10, amd64, nvidia-560 driver, CUDA 12.6) and install it? What should be done with the files libggml.so.gz, libllama.so.gz and ollama_llama_server.gz created in the folders ~/ollama/build/linux/amd64/cpu|cpu_avx|cpu_avx2? What are the next steps?
Thanks in advance!
@navr32 commented on GitHub (Dec 7, 2024):
Good News for Non-AVX Processors!
Hi all! We've got very good news for everyone waiting to run Ollama on their non-AVX processors with big "GPUs"!
I'm not sure how many of you have seen the update by @dhiltgen, so I'm posting it here too for those who haven't seen it.
With this change, you can now run Ollama without AVX in a few easy steps, and with no hacks in the code.
To learn more and get started, head over to:
https://github.com/ollama/ollama/issues/7622#issuecomment-2524637378
Have a nice day and happy running!
@osering commented on GitHub (Dec 8, 2024):
The AVX-less CPU runs now (and most probably this can be upstreamed)! Great, but the next problem has arisen.
It's great that an AVX-less Celeron CPU is finally deployable. There are just some installation issues afterwards, getting the right things into the right folders with the right permissions, since there is no pre-made install script for a locally compiled Ollama version (I had to change the user/group from ollama to root in ollama.service).
But there is still a problem (I'm not sure if it's AVX related).
When running without env variables set, it just identifies the CPU as the inference engine and still no cuda_v12, although the GPUs are identified as capable.
When cuda_v12 is forced, another problem arises: although there is plenty (9 x 5.9 GB) of available VRAM, it says "gpu has too little memory to allocate any layers" and "insufficient VRAM to load any model layers", and it still falls back to the CPU (and memory overflows, as there is not enough RAM).
Here are the logs (Ubuntu 24.10 x86-64, 2-core Celeron with 4 GB of RAM, testing the llama3.2-vision model):
and
What am I doing wrong?
@navr32 commented on GitHub (Dec 10, 2024):
@osering please give more details! What model size are you loading? Command line or API? Have you had success with other models?
@shkron commented on GitHub (Dec 10, 2024):
A follow-up question to the latest messages: how much VRAM does llama3.2-vision 11b (7.9 GB) really need? I have an 11 GB GPU, and apparently it doesn't fit with this build/patch and defaults to the CPU runner. Originally I thought it was just a bit too big to fit; now we have a similar experience here. I'm wondering if there is a pattern with this particular model.
I am able to fit this one with no problem:
@osering commented on GitHub (Dec 12, 2024):
Model used: llama3.2-vision 11b (7.9 GB), which gives the message: Error: model requires more system memory (6.2 GiB) than is available (3.1 GiB)
The problem is that it cannot load the model and fails, refusing to use the VRAM while the RAM is too small (4 GB total). Basically I run the CLI: ollama run llama3.2-vision
I also tried: curl http://localhost:11434/api/generate -d '{"model":"llama3.2-vision","options":{"num_gpu":40},"prompt":"hello","stream":false}' and received: {"error":{"message":"model requires more system memory (6.2 GiB) than is available (3.1 GiB)","type":"api_error","param":null,"code":null}}
So no difference. CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8 ollama serve did not help either.
No models load into VRAM (I also tried llama3.1:8b and it gave a different message, although dmesg showed the same): Error: llama runner process has terminated: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model
Only small ones (ollama run llama3.2:1b as well as :3b) run peacefully on this AVX-less Celeron 3865U CPU.
When loading these bigger models, nvtop shows for a second or two that one or more of the GPUs switches from Graphics to Compute mode and occupied VRAM jumps by about 60 MB, then it goes back.
I'm new to this. I don't know which way to head to find the source of the problem.
BTW dmesg showed errors:
Can these trap errors be the source of the problem? Why are they there on a freshly installed Ubuntu 24.10 with a freshly compiled Ollama? After a reboot they were gone. Should I reinstall?
And here is the ollama serve output (no errors, but msg="Dynamic LLM libraries" runners=[cpu] is alarming):
Thanks again...
@navr32 commented on GitHub (Dec 18, 2024):
@osering have you had any success stabilizing your system? Before trying anything you must have a running system
without traps on any running process; here I see too many, and they are not Ollama problems: piper, whisper, huggingface.
Do you run in a VM? Have you given the VM enough memory? Check your RAM; if that's not the problem, look at power supply stability, processor overheating, a bad kernel version, and so on. Good luck.
After all of this, if you still fail, I think you should post a new issue!
@kaplanski commented on GitHub (Dec 19, 2024):
Concerning this topic, let me just chime in with a workaround.
I tried my best to hack around the build scripts and such, but the compiled binary would always end up crashing upon loading a model (Illegal instruction; it tried executing AVX no matter what). After digging deep, the workaround I found was to use Intel's Software Development Emulator (SDE) to emulate AVX. What you do is download SDE, then install a stock copy of Ollama and modify the systemd ollama.service.
Change
ExecStart=/usr/local/bin/ollama serve
to
ExecStart=/path/to/intel/sde64 -hsw -- /usr/local/bin/ollama serve
This emulates a Haswell processor, including full AVX support. Loading a model takes a bit longer than normal due to the emulation overhead, but once the model has fully loaded and the first message has been sent, it works and performs as expected.
According to the System Configuration section of the SDE docs, yama needs to be disabled on Linux for SDE to work:
echo 0 > /proc/sys/kernel/yama/ptrace_scope
(Make this a service as well and make ollama.service run after it.)
I tried this on an Intel Xeon X5550 (Nehalem EP/Gainestown) from 2009 w/ an Nvidia Tesla P4.
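A hedged sketch of wiring this up cleanly on a stock Linux install (assumes the standard ollama.service from the installer; the sde64 path is wherever you unpacked SDE):

```sh
# Override only ExecStart instead of editing the packaged unit file.
sudo systemctl edit ollama.service
#   In the editor that opens, add:
#   [Service]
#   ExecStart=
#   ExecStart=/path/to/intel/sde64 -hsw -- /usr/local/bin/ollama serve

# Relax yama persistently so SDE keeps working across reboots.
echo 'kernel.yama.ptrace_scope = 0' | sudo tee /etc/sysctl.d/99-sde-ptrace.conf
sudo sysctl --system

sudo systemctl restart ollama
```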
@osering commented on GitHub (Dec 22, 2024):
As I wrote, those traps appeared on the first run only; afterwards there were no traps in dmesg.
2 types of error messages:
for ollama run llama3.2-vision
Error: model requires more system memory (6.2 GiB) than is available (3.0 GiB)
for ollama run llama3.1:8b
Error: llama runner process has terminated: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model
No VMs, just a clean, regular, minimal Lubuntu 24.10 on an AVX-less Celeron 3865U (with built-in iGPU), 4 GB of RAM, and 9 x Nvidia 106-90 5.9 GB GPUs.
Linux rig2 6.11.0-13-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Sat Nov 30 23:51:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Nvidia proprietary driver 560.35.03, CUDA version 12.6, and a freshly compiled Ollama (the AVX-less version that is not yet upstreamed).
I'll probably have to try llama.cpp and/or koboldcpp...
P.S. I also researched Intel's Software Development Emulator (SDE) to emulate AVX, but read that it adds significant overhead, so I did not try it out:
https://www.intel.com/content/www/us/en/developer/articles/license/pre-release-license-agreement-for-software-development-emulator.html
https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html
https://downloadmirror.intel.com/843185/sde-external-9.48.0-2024-11-25-lin.tar.xz
After the revelations by kaplanski, maybe I'll have to give it a try as well.
Although my slow PC took something like 8 hours to compile the AVX-less Ollama, there were no later errors such as crashes with Illegal instruction or attempts to use the AVX version.
P.P.S. Is there any info on when this AVX-less version by @dhiltgen is planned to be upstreamed?
P.P.P.S. If it's useful for somebody, I can share the compiled ollama, ollama_llama_server and libggml_cuda_v12.so.
@akesterson commented on GitHub (Jan 8, 2025):
@kaplanski wrote
Brilliant. This works on an Intel Xeon L5520 from 2009 as well. The models do indeed take a while to load, but performance is quite good once the model is loaded and starts responding.