Mirror of https://github.com/ollama/ollama.git (synced 2026-05-05)
Open · opened 2026-04-12 by GiteaMirror · 97 comments
Originally created by @jmorganca on GitHub (Jun 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5245
What is the issue?
Currently Ollama can import GGUF files. However, larger models are sometimes split into separate files. Ollama should support loading multi-file GGUF models, similar to how it already loads multi-file safetensors models.
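A sharded upload typically looks like a numbered series of files that together make up one model, for example (illustrative names; the -0000N-of-0000M suffix is the convention produced by llama.cpp's gguf-split tool):
mymodel-Q5_K_M-00001-of-00003.gguf
mymodel-Q5_K_M-00002-of-00003.gguf
mymodel-Q5_K_M-00003-of-00003.gguf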
OS
No response
GPU
No response
CPU
No response
Ollama version
No response
@gsoul commented on GitHub (Aug 22, 2024):
Just in case someone finds this issue like I did a few weeks ago, without knowing any workaround: currently, probably one of the easiest ways to import a multi-file GGUF into Ollama is to run
./llama-gguf-split --merge mymodel-00001-of-00002.gguf out_file_name.gguf
for example
./llama-gguf-split --merge Mistral-Large-Instruct-2407-IQ4_XS-00001-of-00002.gguf outfile.gguf
Hope this will help somebody.
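To then get the merged file into Ollama, the remaining steps look roughly like this (a sketch; the model name and file paths are placeholders, and only the first shard needs to be passed since the tool finds the rest on its own):
# 1) merge the shards into a single GGUF
./llama-gguf-split --merge mymodel-00001-of-00002.gguf merged.gguf
# 2) point a Modelfile at the merged file and import it
cat > Modelfile <<'EOF'
FROM ./merged.gguf
EOF
ollama create mymodel -f Modelfile
ollama run mymodel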
@nauen commented on GitHub (Aug 23, 2024):
yes it does <3
@werruww commented on GitHub (Oct 17, 2024):
Does Ollama support fragmented models? Important: they must be merged before running in Ollama via a Modelfile.
@werruww commented on GitHub (Oct 17, 2024):
llama.cpp and llama-cpp-python can run multi-part models.
Can ollama run them?
@werruww commented on GitHub (Oct 17, 2024):
/content# ollama run hf.co/goodasdgood/dracarys2-72b-instruct
pulling manifest
Error: pull model manifest: 400: The specified repository contains sharded GGUF. Ollama does not support this yet. Follow this issue for more info: https://github.com/ollama/ollama/issues/5245
/content#
@mitchross commented on GitHub (Oct 24, 2024):
https://x.com/reach_vb/status/1846545312548360319
@ahmetkca commented on GitHub (Nov 11, 2024):
Having this issue with
Qwen/Qwen2.5-Coder-32B-Instruct-GGUF:Q8_0
@rar0n commented on GitHub (Nov 12, 2024):
Try what gsoul above suggests!
I just did and it worked for qwen2.5-coder-14b-instruct-q5_k_m-00001-of-00002.gguf. Didn't have to specify any other input files.
(Thanks gsoul!)
Still, it'd be nice if ollama could do this natively ofc.
@Kamayuq commented on GitHub (Nov 14, 2024):
Even though the workaround works, it is very cumbersome, especially if you have to update the model. And you also have to create a manifest, AFAIK. Ollama should really do this locally, completely transparently to the user.
@rotvaldi commented on GitHub (Nov 17, 2024):
ugh "the falling"
@DrewGalbraith commented on GitHub (Nov 29, 2024):
For anyone else wondering how this merges all the other parts of the model, or whether you have to run this command for each split to merge it into the new file, this reddit post laid it out: the gguf-split utility simply figures out the rest of the shards to merge given the name of the first. I assume it requires that they be named basically the same, with only the numbers differing.
@BornSupercharged commented on GitHub (Dec 17, 2024):
In case you want a way to easily handle this, add this to your .bashrc or .zshrc file:
Usage example:
add_gguf hf.co/bartowski/EVA-LLaMA-3.33-70B-v0.1-GGUF:Q6_K
It also handles the edge case where you've copied the full command out of hugging face (strips "ollama run" out):
add_gguf ollama run hf.co/bartowski/EVA-LLaMA-3.33-70B-v0.1-GGUF:Q6_K
What it does:
After executing the command, you can run your model like this:
ollama run EVA-LLaMA-3.33-70B-v0.1-Q6_K
Verify the model information by entering:
/show info
To stop, enter:
/bye
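A rough idea of what such a helper can look like, as an illustrative sketch rather than the original script (it assumes huggingface-cli, llama-gguf-split and ollama are on PATH, and that the reference follows the usual hf.co/user/name-GGUF:QUANT form):
add_gguf() {
  # accept either "hf.co/user/repo-GGUF:QUANT" or a pasted "ollama run hf.co/..."
  local ref="${*#ollama run }"
  local repo="${ref%%:*}"      # e.g. hf.co/bartowski/EVA-LLaMA-3.33-70B-v0.1-GGUF
  local quant="${ref##*:}"     # e.g. Q6_K
  local name; name="$(basename "$repo")"; name="${name%-GGUF}-$quant"

  # download only the shards for the requested quant from Hugging Face
  huggingface-cli download "${repo#hf.co/}" --include "*${quant}*.gguf" --local-dir "$name"

  # merge, starting from the first shard (the tool finds the rest)
  local first; first="$(find "$name" -name "*-00001-of-*.gguf" | head -n1)"
  llama-gguf-split --merge "$first" "$name/$name.gguf"

  # import the merged file into Ollama under a friendly name
  printf 'FROM ./%s.gguf\n' "$name" > "$name/Modelfile"
  (cd "$name" && ollama create "$name" -f Modelfile)
}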
@AlgorithmicKing737 commented on GitHub (Dec 27, 2024):
Where are the ".bashrc or .zshrc" files located?
@BornSupercharged commented on GitHub (Dec 30, 2024):
@AlgorithmicKing in your user directory, i.e. ~/.bashrc
@ngxson commented on GitHub (Jan 16, 2025):
Upstream llama.cpp added a new API called llama_model_load_from_splits that may help with implementing this feature in ollama. Let's hope they will work on this in the next version!
@mattapperson commented on GitHub (Jan 23, 2025):
Here is an updated version of @BornSupercharged's script that:
@sakthi-geek commented on GitHub (Jan 23, 2025):
Why is this not natively supported yet? Is it being worked on for future updates?
@renato-umeton commented on GitHub (Jan 26, 2025):
Please work on this guys <3 we love ollama and want to continue using it!
ollama run hf.co/unsloth/DeepSeek-R1-GGUF:Q4_K_M
pulling manifest
Error: pull model manifest: 400: The specified repository contains sharded GGUF. Ollama does not support this yet. Follow this issue for more info: https://github.com/ollama/ollama/issues/5245 (this page)
@AlgorithmicKing737 commented on GitHub (Jan 27, 2025):
I 100% agree with you, but if you want to pull the full version of DeepSeek R1, it is already on ollama; you can run:
ollama run deepseek-r1:671b
@LeiHao0 commented on GitHub (Jan 28, 2025):
Will ollama consider supporting unsloth/DeepSeek-R1-GGUF, the 1.58-bit + 2-bit dynamic quants?
@corticalstack commented on GitHub (Jan 28, 2025):
Watching this space for solutions to run lower quantized versions of DeepSeek-R1 that mere mortals can self host, e.g. 1.58-bit + 2-bit Dynamic Quants?
@Tanote650 commented on GitHub (Jan 29, 2025):
Implementing this in Ollama and giving a large number of less experienced users access to the models would be great!
@LeiHao0 commented on GitHub (Jan 29, 2025):
Good News:
I've successfully quantized and run the DeepSeek-R1 model at 1.58 bits on my M2 Ultra, using the instructions from https://unsloth.ai/blog/deepseekr1-dynamic.
Bad News:
Higher quantization levels of 1.73 bits, 2.22 bits and 2.51 bits failed to run. More importantly, even the successful 1.58-bit model generates nonsensical output despite multiple attempts.
@corticalstack commented on GitHub (Jan 29, 2025):
Hoping some smaller DeepSeek quants / multi-file GGUF support provided soon, given the HUGE interest in this model right now.
@fserb commented on GitHub (Jan 29, 2025):
@LeiHao0 can you confirm you were able to run it with ollama? If so, can you share your Modelfile?
@LeiHao0 commented on GitHub (Jan 30, 2025):
You need to merge these split model files into a single file, then you can load it with ollama.
Details are here: https://unsloth.ai/blog/deepseekr1-dynamic
./llama.cpp/llama-gguf-split --merge \
  DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
  merged_file.gguf
@verygreen commented on GitHub (Jan 30, 2025):
@LeiHao0 don't forget to also apply this context fix if you want the output to actually run to completion: https://github.com/ollama/ollama/issues/5975#issuecomment-2295330804 (obviously don't use 24k context unless you have gobs of video RAM).
But overall I am having bad results with this particular quant in ollama: even the thinking tags don't appear, and the model seems to ramble on endlessly until eventually cutting out.
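In Modelfile terms, that context fix presumably boils down to a PARAMETER line on top of the merged file; a minimal sketch (the model name is a placeholder, and 24576 simply mirrors the 24k figure mentioned above, so size it to your VRAM):
cat > Modelfile <<'EOF'
FROM ./merged_file.gguf
PARAMETER num_ctx 24576
EOF
ollama create deepseek-r1-iq1s -f Modelfile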
@dmatora commented on GitHub (Feb 1, 2025):
What about the TEMPLATE section for the DeepSeek Modelfile? Doesn't it need one?
@dmatora commented on GitHub (Feb 1, 2025):
Ok, found a tip at issue #8571
@mistrjirka commented on GitHub (Feb 6, 2025):
It seems that ollama does not support IQ1 quantization, so for 1-bit quantization to work some update to ollama may be needed. It is weird, though, because llama.cpp supports one-bit quantization; ollama errors out on a wrong magic number.
Is there a technical explanation for why implementing multi-file GGUF is difficult, or why ollama does not support 1-bit quantization? Is it a lack of developer time, or something more fundamental?
@peng3502 commented on GitHub (Feb 18, 2025):
There are 10 files for DeepSeek
DeepSeek-R1-Q5_K_M-00001-of-00010.gguf DeepSeek-R1-Q5_K_M-00005-of-00010.gguf DeepSeek-R1-Q5_K_M-00009-of-00010.gguf
DeepSeek-R1-Q5_K_M-00002-of-00010.gguf DeepSeek-R1-Q5_K_M-00006-of-00010.gguf DeepSeek-R1-Q5_K_M-00010-of-00010.gguf
DeepSeek-R1-Q5_K_M-00003-of-00010.gguf DeepSeek-R1-Q5_K_M-00007-of-00010.gguf
DeepSeek-R1-Q5_K_M-00004-of-00010.gguf DeepSeek-R1-Q5_K_M-00008-of-00010.gguf
An error was thrown during the llama merge.
llama-gguf-split --merge DeepSeek-R1-Q5_K_M-00001-of-00010.gguf DeepSeek-R1-Q5.gguf
gguf_merge: DeepSeek-R1-Q5_K_M-00001-of-00010.gguf -> DeepSeek-R1-Q5.gguf
terminate called after throwing an instance of 'std::__ios_failure'
what(): basic_ios::clear: iostream error
Does anyone else have a solution?
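One possible cause, just a guess, is an incomplete shard or too little free disk space for the merged output, which for a Q5_K_M DeepSeek-R1 runs to several hundred GB. A quick sanity check before retrying (paths are placeholders):
# all ten shards should be present and roughly the same size (the last one may be smaller)
ls -lh DeepSeek-R1-Q5_K_M-*.gguf
# the target filesystem needs room for the full merged file
df -h .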
@MaoJianwei commented on GitHub (Feb 21, 2025):
This works for me! Thanks!
@thalesluoyx commented on GitHub (Mar 4, 2025):
I face a similar issue: the model generates nonsensical output, even though there is only one .gguf file downloaded. May I know if your problem has been solved? Thanks.
@renato-umeton commented on GitHub (Mar 4, 2025):
For prototyping, I ended up switching from Ollama to LM Studio
https://lmstudio.ai/
For prod, I'm still on Ollama
🤷
@PaulGilmartin commented on GitHub (Mar 12, 2025):
Hi all,
I am attempting to merge the gguf files from a hugging face DeepSeek V3 download. Using llama-gguf-split as follows:
I encounter the following error:
The files were downloaded by cloning https://huggingface.co/unsloth/DeepSeek-V3-GGUF. This is the content of the DeepSeek-V3-GGUF/DeepSeek-V3-Q2_K_L folder:
Does anyone know what's going wrong here/ how to fix this? Thanks in advance!
@MrEdigital commented on GitHub (Mar 27, 2025):
More and more models are going the sharded route, exclusively. This needs to be addressed soon.
@mcDandy commented on GitHub (Apr 5, 2025):
It is highly needed. I want to use a custom Gemma 3, but the vision tower is separate, and llama.cpp does not help with merging the vision part into the model.
@1472583610 commented on GitHub (Apr 17, 2025):
I plus-one this. Ollama needs to support this natively. Aside from it being an unreasonably large amount of work to constantly download and manually merge models, there seem to be issues when running the merged models on multiple GPUs.
We have a 2x A6000 AI server and it doesn't load merged models larger than what fits on a single card (48 GB).
Support for sharded models is becoming a must.
@mNandhu commented on GitHub (May 21, 2025):
+1. There's no mention of the incompatibility in the Modelfile docs either.
@lknight commented on GitHub (May 28, 2025):
+1. It's almost impossible to do anything with DeepSeek (a multi-file model) in Ollama with multiple A6000s.
@LastMinuteStudio commented on GitHub (Jun 8, 2025):
Relatively new to LLMs so pardon my ignorance. I've tried merging and running the split GGUFs especially on the latest R1 from unsloth https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/UD-IQ1_S.
The way I run it is through their recommended command
./llama-server --port 10000 --ctx-size 1024 --n-gpu-layers 40 --model a:/Deepseek-R1-0528-UD-IQ1_S-Merged.gguf
./llama-server --port 10000 --ctx-size 1024 --n-gpu-layers 40 --model a:/DeepSeek-R1-0528-UD-IQ1_S-00001-of-00004.gguf
They work both merged and split in llama.cpp using llama-server.exe.
I've then tried creating a Modelfile for the merged gguf in ollama
I create the model using
ollama create DeepseekR1 -f Modelfile
However, when I run it through ollama, I keep getting an error telling me that the memory required is insufficient, even though llama.cpp has no issues loading it (slowly).
Error: model requires more system memory (164.9 GiB) than is available (94.4 GiB)
Is this due to the gguf being split, or some other reason? The merged gguf works fine in llama.cpp though.
edit: I've increased my swap size just to get past the memory error but now I get the following
llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
@MrEdigital commented on GitHub (Jul 22, 2025):
This doesn't appear to be given the weight it deserves. It's now been more than a year since this was raised.
@giorgostheo commented on GitHub (Jul 22, 2025):
This needs to be prioritized, IMO. Sharded GGUFs are the norm now. No support for them in probably the most-used platform for local LLMs is bonkers...
@kappa8219 commented on GitHub (Jul 23, 2025):
Qwen Coder is coming :) Also sharded...
@giorgostheo commented on GitHub (Jul 24, 2025):
I think it's fair to say that this should become priority No. 1 for the dev team at this point. Not having sharded GGUF support will soon make ollama unusable...
@tolysz commented on GitHub (Jul 27, 2025):
If llama.cpp has llama_model_load_from_splits, we are almost halfway there... I wonder what the desired solution is. Currently all the models are stored under hashed names; a naive implementation could have some virtual top-level folder with all the files symlinked to the hashed files, with filenames following the split pattern. Otherwise, the model loader could accept the list of splits and skip the symlinking.
edit:
We just need to provide all the hashed files as a list... the filename itself is irrelevant...
@xNefas commented on GitHub (Aug 10, 2025):
Just gonna add to the "noise" and say this should be a priority, it's really rough being unable to load sharded GGUFs with Ollama.
@Likkkez commented on GitHub (Aug 16, 2025):
Is it really this hard to just add two files together automatically? Is this one of those delusional ideological stances or what?
@giorgostheo commented on GitHub (Aug 16, 2025):
Honestly, at this point I'm not even sure this will ever be implemented. After all the fuss with GPT-OSS and the push for Ollama Turbo, it seems this will be another one of those open-source projects that remains open source mostly for show... I really hope I'm wrong, but tbh I'm already transitioning to llama.cpp because of this. I suggest that others do the same.
@OdinVex commented on GitHub (Aug 21, 2025):
? Turbo? What is that, some upsell or something? Is it time to fork Ollama?
@kappa8219 commented on GitHub (Sep 1, 2025):
"The Little Engine That Could" (c)
@bokkob556644-coder commented on GitHub (Oct 22, 2025):
ollama run hf.co/asdgad/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M-GGUF:Q4_K_M
https://huggingface.co/asdgad/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M-GGUF
Error: pull model manifest: 400: {"error":"The specified repository contains sharded GGUF. Ollama does not support this yet. Follow this issue for more info: https://github.com/ollama/ollama/issues/5245"}
@kappa8219 commented on GitHub (Oct 27, 2025):
Welcome to the club
@SvenMeyer commented on GitHub (Nov 8, 2025):
nobody looking into this ?
@slenderq commented on GitHub (Nov 9, 2025):
It would be great to run this model in ollama!
@cvrunmin commented on GitHub (Nov 13, 2025):
Not a golang user, tho; from my glance at the code, there is only one place which actually calls the model load function using llama.cpp, here:
8a75d8b015/llama/llama.go (L259-L309)
At L303, C.llama_model_load_from_file is called to load the model; its implementation is here:
8a75d8b015/llama/llama.cpp/src/llama.cpp (L304-L325)
Yes, there is a function named llama_model_load_from_splits that should be able to load split files!
This function requires a list of split file paths as its parameter, so we have to know the correct order of the model split files (assuming the function that actually loads them doesn't guess how they are split). We might need to add fields in the Modelfile to provide this information. This information should be provided by HF too once ollama is ready for multi-file GGUF (they only provide a list of hash-named blobs with a blob type, AFAIK).
@tolysz commented on GitHub (Nov 13, 2025):
The challenge is in the config files: since the files are renamed to some hash, the config file needs to support storing a list of filenames, i.e. the file entry is not a single string but a list of them.
@FearL0rd commented on GitHub (Nov 14, 2025):
Looks like Ollama has more focus on their cloud instead of working on this
@OdinVex commented on GitHub (Nov 14, 2025):
So it seems the only thing holding it back is a config-representation of split-files and a simple 'if split call load-splits' instead?
@rdeforest commented on GitHub (Nov 14, 2025):
One of the many great things about open-source projects is that everyone gets to work on whatever they want to work on. I bet if you put together a quality PR to address this issue, the team would consider merging it. Or if you don't want to wait you could just maintain your own fork.
If you don't want to help, that's fine too. Just don't complain about the priorities of volunteers please?
@OdinVex commented on GitHub (Nov 14, 2025):
I think extension of Ollama manifests to describe split GGUFs (and their order) is necessary, first. Perhaps any Ollama developer could chime in for that?
@Mikec78660 commented on GitHub (Nov 14, 2025):
EDIT: Leaving this in case anyone else has this problem. Seems from the bash script that was posted you can then run:
ollama create GLM-4.5-Air-Q4_0.gguf -f GLM-4.5-Air-Q4_0.gguf.model
And voilà, the model shows up in ollama; no need to use the import in Open WebUI, which doesn't seem to work.
I use the llama.cpp method to combine my model:
And it seemed to work:
But trying to import "GLM-4.5-Air-Q4_0.gguf" into ollama after a minute or so I get an error saying error parsing the body. Any idea what I am doing wrong?
@OdinVex commented on GitHub (Nov 14, 2025):
This issue is about importing sharded GGUF files, not about llama.cpp. But on a side note, I've never had llama.cpp produce a file that didn't end up in gibberish or corrupt output.
@shimmyshimmer commented on GitHub (Nov 14, 2025):
We make smaller versions specifically for Ollama, see: https://huggingface.co/unsloth/MiniMax-M2-GGUF/blob/main/MiniMax-M2-UD-TQ1_0.gguf
Usually we do these non-sharded files for any model under 300B parameters or so. But it is very small and 1.77-bit ish
@cvrunmin commented on GitHub (Nov 17, 2025):
I only focused on the llamarunner and didn't realize that ollama has its own runner (ollamarunner), which is a different story. That one loads models at ml/backend/ggml/ggml.go, which really does not support multi-file GGUF.
Anyway, modifying the config format to support the file paths of a sharded model in the correct order is still necessary.
@giorgostheo commented on GitHub (Nov 17, 2025):
Hi Michael. Since the ollama team seems to really not care about the sharded GGUF thing, it would be great if we got more of those "merged" exports for larger quants. For GLM-4.6, for example, something like Q3 would be great. I understand that it is stretching it size-wise, but for us ollama users it's the only way to go for now.
Thanks for all your work.
@OdinVex commented on GitHub (Nov 17, 2025):
To the best of my knowledge, splits always have their filenames suffixed (before the extension) with a splitNumber-totalSplits format. That's probably the only assumption that could be made about order. Maybe the backend doesn't care about order and loads them fine.
@cvrunmin commented on GitHub (Nov 18, 2025):
If the multi-file GGUF model is created by the user with ollama create -f Modelfile, and the filenames of the split GGUFs are nicely named like xxxxxx-00001-of-00003.gguf, then that is your case. However, when the model is pulled from the Internet, we only have the hash of each file.
For example, this is the manifest of gpt-oss hosted on the ollama registry (https://registry.ollama.ai/v2/library/gpt-oss/manifests/latest):
In the GGUF metadata of a split GGUF, we have the split file information: split.no, split.tensors.count and split.count. This is where llama.cpp checks whether the split files are provided in order:
584e2d646f/llama/llama.cpp/src/llama-model-loader.cpp (L526-L573)
In the worst case, we can cache the ordering from the metadata when the model is first created or pulled; then the changes to the config spec could be minimal.
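For reference, those split keys can be read straight out of each shard's header, for example with the gguf-dump script that ships with llama.cpp's gguf Python package (tool name and flags assumed here; any GGUF metadata reader would do):
pip install gguf
for f in model-*-of-*.gguf; do
  # print only the split bookkeeping keys from each shard
  gguf-dump --no-tensors "$f" | grep -E 'split\.(no|count|tensors\.count)'
done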
@shimmyshimmer commented on GitHub (Nov 21, 2025):
Even though this is possible it might not be the best idea because if one file breaks or the internet gets cut off, you'll need to redownload the hundreds of GB again. It might be fine if you have good internet but around 50% of people have very slow internet :( But we'll see what we can do - it will be confusing for users to navigate which is which
@eurekin commented on GitHub (Dec 9, 2025):
Is this still a thing?
Happy birthday to the issue I guess
@FearL0rd commented on GitHub (Dec 9, 2025):
Now all the focus is on cloud
@johnml1135 commented on GitHub (Dec 17, 2025):
I had a similar issue so I spun up my own tooling to adapt poor-fitting GGUF models into ollama by reworking the top layers - https://github.com/johnml1135/ollama-copilot-fixer.
@giorgostheo commented on GitHub (Dec 23, 2025):
Hey,
With GLM-4.7 out, would it be possible for you to upload a single gguf for the main config (Q4_K_M or whatever it is)? You can add some sort of id like "mono" or "single" to make sure that users are not confused. I know it's not pretty, but it's an easy way to solve the complete lack of compatibility with ollama and allow tons more people to use the newest and best models!
Keep up the awesome work.
@scorpion7slayer commented on GitHub (Jan 30, 2026):
I have this problem with Kimi K2.5. Will this be added in the future?
@boomam commented on GitHub (Feb 20, 2026):
It's amusing that they went to the effort of changing the output of the Ollama error to reference this issue. :-p
@cvrunmin commented on GitHub (Feb 21, 2026):
While such error messages are more likely produced by the Hugging Face side and not by ollama itself (the same error message can be triggered by trying to access the model's manifest file in a browser), it is still very amusing that more issues keep being marked as duplicates of this issue, meaning that some contributors know this issue exists, yet a pull request that claims to solve it is now about four months old with zero comments from any contributor. Neither "this solution looks good!" nor "this solution is not good".
@OdinVex commented on GitHub (Feb 21, 2026):
Anyone know of an alternative to Ollama?
@SvenMeyer commented on GitHub (Feb 22, 2026):
@OdinVex I switched to LMstudio which does not have this problem and has a nice GUI as well.
Actually, I would not be surprised if you found it much better in every respect. LM Studio also continues to provide a solid basis for running AI models locally, which is the whole point of ollama/LM Studio, while ollama has now diverted toward becoming just a proxy to online AI models.
@OdinVex commented on GitHub (Feb 22, 2026):
Doesn't appear at all to be an alternative, though. Ollama's use at the moment is container-supported for network-based interactions.
@elkay commented on GitHub (Mar 2, 2026):
How is this an issue still 2 years later? Isn't it as simple as combining the split files and using the single file after download? I know you can use llama.cpp to combine the files if you manually download them, but it's really unclear how you would then add that file into ollama manually. It makes no sense why the Ollama team is dragging their feet on just supporting the combine in the internal download process itself.
@FearL0rd commented on GitHub (Mar 3, 2026):
looks like the focus today is Ollama Cloud
@alexanderjacuna commented on GitHub (Mar 10, 2026):
Running into this issue as well with: hf.co/unsloth/Qwen3-Coder-Next-GGUF:UD-Q8_K_XL
This bug has been open since June of 2024, but let's put in an error message that references this issue with no traction after 2 years.
@SvenMeyer commented on GitHub (Mar 10, 2026):
I found a solution and it is pretty easy; it also adds a lot of other features and usability at the same time: use LM Studio.
@OdinVex commented on GitHub (Mar 10, 2026):
Not a viable solution to those needing a drop-in replacement for software that specifically depends upon Ollama.
@alexanderjacuna commented on GitHub (Mar 10, 2026):
My setup doesn't allow for this unfortunately.
@SvenMeyer commented on GitHub (Mar 11, 2026):
@OdinVex @alexanderjacuna what software is so tightly coupled to ollama that you cannot replace it with another inference service? In the end it should be just an IP and port, and even those you could set the same way.
Also, if you prefer CLI and do not need the GUI, just use llama.cpp
@OdinVex commented on GitHub (Mar 11, 2026):
Several, but most commonly Open-WebUI.
@FearL0rd commented on GitHub (Mar 14, 2026):
I have a drop-in solution.
I've built a solution called Ovllm. It's essentially an Ollama-style wrapper, but for vLLM instead of llama.cpp. It's still a work in progress, but the core downloading feature is live. Instead of pulling from a custom registry, it downloads models directly from Hugging Face. Just make sure to set your HF_TOKEN environment variable with your API key. Check it out: https://github.com/FearL0rd/Ovllm
Ovllm is an Ollama-inspired wrapper designed to simplify working with vLLM.
@OdinVex commented on GitHub (Mar 14, 2026):
If it doesn't have complete feature parity and speak the Ollama API so other software can integrate with it, then it's not a drop-in solution. Considering the README has enough spelling/grammar issues, I'm gravely concerned it's AI, or at the very least unpolished. Good luck with your project, but it's not a drop-in solution.
Edit: Considering Ollama uses llama.cpp, and llama.cpp supports shards, I'd wager it would be better to PR it (at least for now, until Ollama is forked by someone who cares about shard support and more).
@FearL0rd commented on GitHub (Mar 14, 2026):
Thx. This is the first release, and it will become more mature over time. It works for my needs with OpenWebUI and custom apps. It also merges the .gguf files (it works with safetensors as well; just pass the HF location, e.g. google/gemma-7b-it).
@Mikec78660 commented on GitHub (Mar 20, 2026):
llama.cpp in router mode should work exactly like ollama now, and it can use multi-part gguf files.
@OdinVex commented on GitHub (Mar 20, 2026):
So software like Open-WebUI can (without any changes except IP address and port) speak to it as if it were Ollama, even with the Ollama-specific code? And there's an official container for it as well? Not seeing it at all, so... Edit: See my lower post about how this went (failure, does not at all work like Ollama).
@Mikec78660 commented on GitHub (Mar 23, 2026):
@OdinVex yes.
A very minimal implementation is:
llama-server --host [0.0.0.0, or hostname] --port 8080 --models-dir /mnt/AI
If you do this and create a connection in openwebui to [ip or dns name]:8080/v1, it will give you any model in the /mnt/AI directory as an option in openwebui.
Even better is using a config.ini file where you can set the settings for each model:
llama-server --host 0.0.0.0 --port 8080 --models-dir /mnt/AI --models-preset config.ini
This will allow you to set a custom ctx size, kv cache settings, etc.
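If a model doesn't show up, a quick way to confirm the server side is to query the OpenAI-compatible listing directly (host and port are whatever llama-server was started with):
curl http://localhost:8080/v1/models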
@OdinVex commented on GitHub (Mar 24, 2026):
Edit: I see, you meant an OpenAI endpoint, not an Ollama endpoint. Will try and report back if it works.
Edit: It does not work, unfortunately. Trying to download just results in the API reporting 404.
@raro42 commented on GitHub (Mar 24, 2026):
.
@CleyFaye commented on GitHub (Mar 25, 2026):
We already know about the issue, and the general outline of what should be done.
I don't see the value of regurgitating the existing discussion, especially to end on the suggestion to "do what was proposed, then tests and docs".
@raro42 commented on GitHub (Mar 25, 2026):
@CleyFaye sorry for disturbing. Will edit the comment. Thanks for commenting.