Mirror of https://github.com/ollama/ollama.git (synced 2026-05-06)
Open · opened 2026-04-22 by GiteaMirror · 86 comments
Originally created by @donuts-are-good on GitHub (Jan 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2006
Originally assigned to: @mxyng on GitHub.
Is there interest in implementing a rate limiter in the `pull` command? I'm open to working on this. This is the syntax I have in mind for now:

`ollama pull modelname --someflagname 1024` <- this would limit to 1024 kbps

I took a look at the code in server/download.go, and I think I can do this with `x/time/rate` applied to the `downloadChunk` method of the blob downloader.
This feature, or something like it that accomplishes the same thing, would be quite useful for me. Ollama is able to saturate my network faster than BitTorrent or anything else I've tried.
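(For illustration, a minimal Go sketch of that idea: a reader wrapper that charges each Read against a token bucket from golang.org/x/time/rate. The `limitedReader` type and all numbers here are hypothetical, not Ollama's actual server/download.go code.)

```go
package main

import (
	"context"
	"io"
	"strings"

	"golang.org/x/time/rate"
)

// limitedReader blocks in Read until the shared token bucket grants as many
// tokens as bytes just read. Hypothetical type, not Ollama's actual code.
type limitedReader struct {
	r   io.Reader
	lim *rate.Limiter
	ctx context.Context
}

func (l *limitedReader) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	if n > 0 {
		// The burst passed to NewLimiter must be >= len(p), or WaitN errors.
		if werr := l.lim.WaitN(l.ctx, n); werr != nil {
			return n, werr
		}
	}
	return n, err
}

func main() {
	// One shared limiter caps the aggregate rate of every chunk reader:
	// here ~128 KiB/s with a 64 KiB burst.
	lim := rate.NewLimiter(rate.Limit(128<<10), 64<<10)
	src := strings.NewReader(strings.Repeat("x", 1<<20)) // stand-in for a chunk body
	io.Copy(io.Discard, &limitedReader{r: src, lim: lim, ctx: context.Background()})
}
```

Sharing one limiter across all chunk readers is what would keep the aggregate rate, rather than each connection's rate, under the cap.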
@tkafka commented on GitHub (Jan 23, 2024):
Yes, definitely! Same here - when I download models, everyone in the office gets really slow internet.
How about `rate-limit`?

@jukofyork commented on GitHub (Jan 24, 2024):
Yeah, same here!
I'm finding `ollama pull` is really killing my connection and I have to limit myself to just using it at night now...

I assume it's using multiple threads to download multiple chunks at the same time or something, as it seems a lot more lag-inducing than either `wget` or `curl`? If so, then it might be good to have control over these parameter(s) too.

@escaroda commented on GitHub (Jan 31, 2024):
I would do the same as `wget`:

@easp commented on GitHub (Feb 1, 2024):
I think this is coming. I saw either a branch or a pull request to provide rate limiting by one of the maintainers.
@akulbe commented on GitHub (Feb 16, 2024):
I would LOVE to see this implemented. It reliably and repeatedly kills my connection on anything larger than a 13b model. I think it's the sustained speed (I have a 1G/1G connection, and downloads get up to 115M/s) when it happens.
@BruceMacD commented on GitHub (Feb 16, 2024):
Behavior here will be improved by #2221, working on getting that unblocked now
@donuts-are-good commented on GitHub (Feb 17, 2024):
We want to define an arbitrary download speed limit. It'd be great if #2221 could address that somehow.
@pablo-01 commented on GitHub (Mar 17, 2024):
Had this issue every day since I started using Ollama a few days ago. In my case - Pop!_OS 22.04 LTS - it freezes randomly and eventually freezes completely, leaving no choice but to hard reset.
@simmonsm commented on GitHub (Mar 20, 2024):
I agree. This is definitely a good idea as, for instance, pulling down the 7b 39Gb model without rate limiting is very antisocial network behaviour. I did install and play with the trickle command but couldn't figure out how to use it with the `ollama run` command, as it isn't that process that needs limiting.
@fermuch commented on GitHub (Mar 20, 2024):
@simmonsm I think trickle wouldn't work anyways since go doesn't use libc (and trickle uses LD_PRELOAD for its magic).
@simmonsm commented on GitHub (Mar 20, 2024):
Fair enough. In the meantime I'm using a VM connected via a virtual traffic shaping network switch.
@LagSlug commented on GitHub (Apr 14, 2024):
You might be able to accomplish this with a docker container
https://stackoverflow.com/questions/25497523/how-can-i-rate-limit-network-traffic-on-a-docker-container
@supercurio commented on GitHub (Apr 19, 2024):
I'm downloading a bunch of Llama 3 models at the moment, and last night my upstairs neighbor, with whom I'm sharing a 300/100 fiber connection, asked for help because he couldn't use the internet anymore. Indeed, I ran a speedtest on another machine connected over Ethernet and the bandwidth left was 1.6 Mbit/s download, with a whopping ~1000 ms of ping latency.
My Ollama instance is running on macOS as a native app.
For now I found a workaround for my neighbor using Xcode's Network Link Conditioner, but I'm still essentially unable to browse the web on my primary machine when pulling models.
I appreciate that Ollama maximizes the bandwidth to download large models as quickly as possible, but the default behavior does not run with sane parameters at all.
My suggestion as a simple solution that can be implemented quickly: a lower default number of concurrent connections for `ollama pull` or `ollama run`.

Then later, sure, an adaptive algorithm can try to optimize the concurrent connection count based on latency and throughput. But it might never work that great on shared and mobile connections, where the available bandwidth and latency vary based on external parameters.
This GitHub issue suggests a rate limit, which would be helpful as well, but selecting an appropriate number of concurrent connections should do the trick just fine without resorting to manual tuning.
If Ollama is competing with something else trying to use bandwidth, like a neighbor trying to watch Netflix, it should not try to work around TCP's congestion control standards to grab all the bandwidth; it should respect them instead.
I hope this can be addressed shortly.
@mcraveiro commented on GitHub (May 19, 2024):
Been hit by this issue when downloading large models; it literally hogs the entire device. Would be nice to be able to limit the rate in some way, à la BitTorrent clients.
@strangehelix commented on GitHub (May 24, 2024):
Same issue. The downloader uses up the entire bandwidth (the internet becomes unusable) and eventually crashes because of the timeouts. A rate limiter seems like a critically important feature.
@FeyNyXx commented on GitHub (May 28, 2024):
Same issue. I'm killing my (not exactly mine, but I have to deal with it...) router using ollama pull :<
@metamec commented on GitHub (May 30, 2024):
Love Ollama, but this is murdering the end user experience for me. I'm having to ctrl+c just to post this comment. I have a 2Gb/s connection too, so it's not a limited bandwidth issue. It's simply downloading too many chunks simultaneously, deprioritising internet bandwidth to every other process on the system. (Just realised it's network-wide, not system-wide. When downloading large models, it feels like my home network is being DDoSed.)

@mcraveiro commented on GitHub (May 30, 2024):
Yes, same here, I can only download models at night. Machine is unusable.
@MihailCosmin commented on GitHub (May 30, 2024):
Had the same problem at work: not only was my computer getting slow, but so was the internet for all the other colleagues.
I had to find a solution to rate limit the download speed; older solutions (wondershaper, trickle, or tc) did not work for me. The only one that worked was FireQOS, in case anyone else needs it.
@LutzFassl commented on GitHub (Jun 27, 2024):
+1
@robins commented on GitHub (Jul 6, 2024):
My Linux box (i5) got reliably stuck every single time I pulled a model... so +1 for the `--rate-limit` feature.

Two solutions that did help me limp on for now:

As soon as I started the fetch, I used `iotop` to change the `ionice` priority (using `i`) to `idle`. That made the issue completely go away, in that although the downloads were still fast, the Linux system was quite usable. However, this was still frustrating since one had to type the PIDs when trying to set `ionice` for them (and there were a few)!

Now since Ollama spun up multiple downloads, the `ionice` tool didn't work for me - IIUC that's because `ionice` needs to be run for each process. So it ended up being far simpler to just get the parent PID, and then set `ionice` for each of the child processes, each time I was downloading a model.

@treibholz commented on GitHub (Jul 12, 2024):
I use this horrible "workaround" to not consume the whole internet bandwidth, so I can still work on my other machine while pulling a model:
This negotiates the linkspeed of my network-interface to 10Mbit.
Yes, the pulling machine itself is not useable as well and sometimes the download is interrupted, because (and I'm not kidding you!) there is not enough bandwidth left for DNS. But at least others on my local network are not angry anymore.
@supercurio commented on GitHub (Jul 13, 2024):
I'm preparing a patch and will submit a PR to address this soon.
@Netzvamp commented on GitHub (Jul 13, 2024):
My solution for now, works fine. This docker-tc can also simulate packet loss 😂
@supercurio commented on GitHub (Jul 13, 2024):
To everyone in this thread, I'd encourage you to build your own Ollama from my branch for testing purposes and report how the issue is solved. I'm curious how much of your available bandwidth is used during downloads with the new default.
For me, on macOS, making a build was easy following https://github.com/ollama/ollama/blob/main/docs/development.md
I see that many PRs are awaiting review and merging, so I don't know how long it'll take.
However, using Ollama is so annoying until this issue is solved that I'm determined to make this fix happen.
@treibholz commented on GitHub (Jul 14, 2024):
@supercurio Works great here. I can still download at 11 MB/sec on my 100 Mbit line, the machine is still responsive, AND I can still watch something in HD over a streaming service.
@binarynoise commented on GitHub (Jul 18, 2024):
I can confirm as well that reducing the number of parallel connections restores network usability (even though I patched the source by hand rather than using your feature branch).
Interestingly, the download speed is unaffected (it maxes out at 20 MB/s, which I think is a server limit).
@scscgit commented on GitHub (Jul 23, 2024):
I'm adding a vote. It's really disruptive when you can't even join a meeting at work due to slow internet, and some users may not be able to figure out the root cause. I ran it in Docker; even after pausing the container/engine, it still consumed the entire bandwidth, so I had to either wait some more or completely shut down Docker. Note that our scenario could be using a tool like Open WebUI to download the models, so it's not enough to provide hidden CLI parameters; we need a quick solution that we won't need to spend time googling. Ollama should properly display a warning if using it may cause such disruptions.
@enrico3 commented on GitHub (Jul 30, 2024):
I am using @Netzvamp 's solution. I added these parameters, maybe it helps someone:
The docker-tc container needs to be started before the ollama container I think, because it listens to container:start events (https://github.com/lukaszlach/docker-tc#usage)
Sometimes my download stopped because of a TLS handshake timeout. So I put the command in a loop to immediately resume it after an error:
With this command on the host the speed limit can be changed while the download is running (https://github.com/lukaszlach/docker-tc#post)
curl -d'rate=20Mbit' localhost:4080/ollama

@Fluffkin commented on GitHub (Jul 31, 2024):
For those on Linux "Traffic Toll" https://github.com/cryzed/TrafficToll sort of works. But I gave up on ollama because even with that some segments get zero data for long enough to trigger timeouts, so even with a fully saturated bandwidth the download fails. :)
I'm puzzled by two things:
Why hasn't Huggingface politely asked to stop opening stupid amounts of connections? It's generally seen as bad internet etiquette to attempt to hog an unnecessary amount of connections for a download.
What's the reasoning / use case behind the way the download threads are currently handled in ollama? Is the dev using a near-backbone-speed internet connection where the bandwidth used doesn't affect anything or anybody else?
@Kisaragi-ng commented on GitHub (Aug 1, 2024):
AFAIK, `ollama pull` doesn't retrieve data from Huggingface; the URL being used to download models is Cloudflare R2.
When a download fails you can see its URL, for example:
compared to a Huggingface model download:
@joelanman commented on GitHub (Aug 6, 2024):
Just to note, as I don't think it's stated clearly in this thread: the issue isn't rate limiting per se - it downloads at 10 Mbps for me. It's that it is setting up 64 concurrent connections to do so, as per the PR here:
@igorschlum commented on GitHub (Aug 11, 2024):
@joelanman I agree with you. From the same location, I was able to download a 3K model but not the 405b model. For the 3K model, the 3GB were downloaded without any connection issues, whereas with the 405b model, the download kept stopping after about every 200MB.
@ShayBox commented on GitHub (Aug 17, 2024):
This reliably crashes my router and causes it to restart, it's too fast.
@numbermaniac commented on GitHub (Aug 19, 2024):
Same here. I literally can't even search Google while it's downloading something. I can download files that are multiple gigabytes in my web browser or in macOS Homebrew and still use the internet just fine, but when Ollama is downloading a model, my entire internet becomes unusable.
It doesn't even work well for Ollama either, because the download speed starts at around 6 MB/s and then keeps gradually reducing - down to like 1 MB/s, then 500 KB/s, then 350 KB/s - getting slower and slower, so Ollama is practically sabotaging itself with the way it downloads files.
@supercurio commented on GitHub (Aug 19, 2024):
I wonder how to get some progress going on this issue: the description of how badly it affects the user experience should bump its priority to critical in my opinion, and I already submitted a PR with a fix a month ago.
After testing Ollama on a high bandwidth VPS, I appreciated that its aggressive strategy led to 300 MB/s (megabytes) downloads, so I get the point of keeping the many-concurrent-connections capability available. Which is the case with my patch.
How to move forward from here?
Routers crashing, people unable to use their computers: that's not the expected result when using any kind of software.
@igorschlum commented on GitHub (Aug 19, 2024):
Hi @supercurio (bonjour François), your patch is here:
https://github.com/ollama/ollama/pull/5683
I know that there were issues that were more important during those weeks, like CUDA fixes, memory, and function calling.
Ollama can run easily without this patch, but if it's just a matter of approving your patch, @jmorganca could decide to do it.
@supercurio commented on GitHub (Aug 19, 2024):
Salut @igorschlum 😌
All of Ollama's core functionalities are important, that's for sure.
Downloading model(s) is still the first action every Ollama user will take.
I solved it for my individual use case already, but I'm hoping to use Ollama as the LLM runtime for the app I'm developing at the moment. It's a non-starter until this issue is solved, sadly.
Fortunately, llamafile provides a good enough alternative in that case.
@joelanman commented on GitHub (Aug 20, 2024):
Is it fixed by this?
@Fluffkin commented on GitHub (Aug 20, 2024):
It'll help in many cases, but it's probably still too high for some domestic broadband users with low bandwidth. ¯\_(ツ)_/¯
@igorschlum commented on GitHub (Aug 20, 2024):
@Fluffkin I think it would work for any type of connection, because right now Ollama is downloading 64 files simultaneously, leaving very little bandwidth for other computers. With this modification, it will use only one download at a time.
@joelanman commented on GitHub (Aug 20, 2024):
@igorschlum no it's changed from 64 to 16, so still a lot of connections
@robins commented on GitHub (Aug 20, 2024):
My 2c: unless there's a way for customers to "request" more speed, the default shouldn't hurt low-end users.
I'd take Git's example here. Albeit its downloads are not split (like this tool's), when churning through a repo - even on a 20-core machine - it doesn't spin up an obscene number of processes, only 4.
So yes, we shouldn't cut down from 64 to 1 just to accommodate everyone, but settling for a saner default of 4 threads sounds like a good middle ground here... If/when there's a feature to request more threads, customers with more resources can always go back to 64 concurrent threads!
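(For what it's worth, capping the connection count is a small change in Go. A minimal counting-semaphore sketch of the idea follows; the constants and the loop are illustrative only, not the actual server/download.go code.)

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const numChunks = 64  // chunks making up one blob (illustrative)
	const maxInFlight = 4 // the saner default suggested above

	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	for i := 0; i < numChunks; i++ {
		wg.Add(1)
		go func(chunk int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire one of maxInFlight slots
			defer func() { <-sem }() // release when the chunk is done
			fmt.Println("downloading chunk", chunk) // stand-in for downloadChunk
		}(i)
	}
	wg.Wait()
}
```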
@mrtysn commented on GitHub (Sep 2, 2024):
Reporting from Türkiye, I am unable to run `ollama pull` during the day due to it causing nearly all other connections on my shared Wi-Fi network to almost come to a halt. A download speed rate limit would be greatly appreciated.

@igorschlum commented on GitHub (Sep 2, 2024):
@mrtysn what version of Ollama are you using, and on which OS? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads.
@mrtysn commented on GitHub (Sep 2, 2024):
@igorschlum Apologies for the lack of details.
However, I believe I did ~90% of my model pulls while on version 0.3.8; the homebrew formula was updated to 0.3.9 ~2 days ago.
@igorschlum commented on GitHub (Sep 2, 2024):
@mrtysn I installed Go on my Mac and was able to build Ollama from the source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then, you could change the 4 simultaneous downloads to 1 to see if this fixes the issue you're facing.
@joelanman commented on GitHub (Sep 2, 2024):
@igorschlum it's 16, which is still very high
@igorschlum commented on GitHub (Sep 2, 2024):
@joelanman Sorry, I had assumed it was 4 :-)
@mrtysn commented on GitHub (Sep 3, 2024):
I've installed the pre-requisites. However, unfortunately, I will not be able to attend to this issue further with my current workload. Scheduling model downloads for the night has been an okay workaround so far, and I currently have most of the models that I would like to utilize.
Nevertheless, I still think a CLI flag with reasonable defaults to control the number of concurrent download threads would be helpful for all ollama users, especially new users who would be downloading models for the first time or users from regions with slower internet speeds.
@donuts-are-good commented on GitHub (Sep 5, 2024):
Hello all,
I appreciate everyone's input in this thread, and I do hope this eventually gets solved. When I initially created this issue as a place to discuss and outline a solution to the network saturation issue, I assumed it would be welcomed with open arms. In the time since, the software has gotten bigger and more complex; it's no longer a simple solution, and it doesn't appear to be a priority.
With that in mind, life must go on. I'm withdrawing the offer to implement this as I no longer have the resources to write and test a feature that isn't a concern of the developers. I don't think anybody's going to worry about it, but I wanted to be clear in my intentions. I look forward to seeing this issue fixed some day.
@ShayBox commented on GitHub (Sep 5, 2024):
The issue was fixed 2 weeks ago in 0.3.7...
@mdlmarkham commented on GitHub (Sep 7, 2024):
I'm still having the issue.
@devrandom commented on GitHub (Oct 6, 2024):
It would be best if the number of threads was configurable. Most of these issues can be mitigated by setting the number of concurrent downloads to 1.
@augusto-rehfeldt commented on GitHub (Dec 18, 2024):
I'm still having this issue. I'm from Argentina, and 16 concurrent connections kill my Internet connection, hang my machine, and downloads over 1GB never finish.
Any idea about how to rate limit this on Windows?
@jimbothegrey commented on GitHub (Jan 7, 2025):
used wondershare to slow down the connection. looks like it is working....
@mrtysn commented on GitHub (Jan 10, 2025):
@jimbothegrey what might wondershare be? all I'm finding is a file converter
@TiddlyWiddly commented on GitHub (Jan 12, 2025):
Still experiencing this on a pretty quick connection; it knocks all my devices off.
@ankh2054 commented on GitHub (Jan 14, 2025):
Yeah, same here. It's hard to work on anything else while models download, and very large models will take multiple days to download at my 3.5MB/s.
@xtareq commented on GitHub (Jan 17, 2025):
Same issue here. But in my case it was fine on my Windows 11 machine for llama3.1; the issue arose when I tried to pull phi4. Any specific reason why this happens?
@ading2210 commented on GitHub (Jan 21, 2025):
As a rudimentary workaround, I wrote a bash script that monitors ollama's network usage and constantly suspends/resumes the ollama process.
https://gist.github.com/ading2210/882565526f7e1f2b9b14a022ac3741ac
Make sure you have `nethogs` installed (`sudo apt install nethogs`).

For example, if you want to limit downloads to 5000 KB/s:
@digitalextremist commented on GitHub (Feb 15, 2025):
Seems this is a well-known problem... Short of handling this at the router level, the @ading2210 solution seems to help a lot! Thanks very much for sharing that.
Still looking forward to a long-term solution that does not chop the process, though that approach is a great idea:
@donuts-are-good commented on GitHub (Feb 18, 2025):
It appears that is not the case...
@martinoturrina commented on GitHub (Mar 26, 2025):
bump
@digitalextremist commented on GitHub (Mar 26, 2025):
My permanent solution for this is `lukaszlach/docker-tc`... I no longer see this as an Ollama issue to resolve.

Example `docker-compose.yml` that works:

And then labels added to the Ollama container:

@joelanman commented on GitHub (Mar 27, 2025):
That wouldn't fix it for people not using Docker?
@digitalextremist commented on GitHub (Mar 28, 2025):
No; but if you do not use `docker` but still use Linux... the `nethogs` solution still works: https://github.com/ollama/ollama/issues/2006#issuecomment-2603584766

@StdLogicTrig commented on GitHub (May 7, 2025):
Bump
@whjvenyl commented on GitHub (Jun 20, 2025):
Mac version of a rate limiter based on the previous script from @ading2210
@codeisnotcode commented on GitHub (Jun 27, 2025):
Rate-limiting is a really important feature! With the source code it should be simple to implement and it would provide high value. It is super annoying to have Ollama kill other network activity every time I download a new model.
@sotander commented on GitHub (Jul 22, 2025):
I have this issue also. I pull at ca. 200 MB/s. The admins are sending me emails asking me not to slow down the network.
@Mondonno commented on GitHub (Aug 29, 2025):
I have this issue also; it hurts especially on slower networks in my case. Seems related also to #3741.
@anton-karlovskiy commented on GitHub (Sep 16, 2025):
@jmorganca
👋 I’ve been trying to download llama3.2 on Windows 11 (ollama run llama3.2) and keep running into issues:
What I’ve tried so far:
From what I can tell, the main culprit seems to be an unstable connection. The workaround right now is to just re-run the command until it eventually completes, but this gets tedious.
To automate retries, I wrapped it in a small script. Sharing here in case it helps others or the devs want to consider integrating retry logic directly:
Windows
Open a new Notepad window and put in this code.
Next go to Save as…, pick "All files" as the file type, and name the file with a `.bat` extension. And finally you can run it.
To edit the code above, just change the model name from `llama3.2` to any other model you like.
I didn’t test the code for Linux but this should hopefully work.
#!/bin/bash
Save this as `run_loop.sh`, then run these commands.

Re: https://medium.com/@timnirmal/ollama-max-retries-exceeded-error-de6e0f86383e
Good luck 🤞
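(The same retry idea as a minimal Go sketch, assuming - as the scripts above do - that re-running `ollama pull` resumes a partial download. The model name and backoff are arbitrary choices, not anything from Ollama itself.)

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

func main() {
	const model = "llama3.2" // change to any model you like
	for attempt := 1; ; attempt++ {
		cmd := exec.Command("ollama", "pull", model)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err == nil {
			return // pull finished successfully
		}
		fmt.Fprintf(os.Stderr, "attempt %d failed, retrying in 10s...\n", attempt)
		time.Sleep(10 * time.Second) // fixed backoff between retries
	}
}
```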
@ading2210 commented on GitHub (Sep 16, 2025):
@anton-karlovskiy On Linux there is a `retry` command already, so there's no need for a shell script.

You would just use it like:
@anton-karlovskiy commented on GitHub (Sep 16, 2025):
Thank you for your tip. @ading2210
Let's keep in touch. :)
@digitalextremist commented on GitHub (Sep 16, 2025):
Agreed with @anton-karlovskiy, that was a real-life Pro Tip™️ @ading2210 ... extra `rtfm` points there.

And nice work on the meticulousness and DIY solutions @anton-karlovskiy. Do you ever use containers?
@PeaStew commented on GitHub (Oct 15, 2025):
Coming up on 2 years now and no proper solution, just hacks.
@apassi commented on GitHub (Mar 10, 2026):
I am using trickle to limit the bandwidth:
trickle -s -d 50mb ollama pull xxxxx
@floriandotorg commented on GitHub (Mar 20, 2026):
Super important feature; not sure why it's not here.
@dhirajlochib commented on GitHub (Apr 2, 2026):
I've opened a PR to address this: #15219
The implementation adds a new `OLLAMA_MAX_DOWNLOAD_SPEED` environment variable that lets you cap download bandwidth when pulling models. It uses a shared token-bucket rate limiter (golang.org/x/time/rate) across all 16 concurrent download chunks, so the aggregate bandwidth stays within the specified limit.

Usage examples:

- `OLLAMA_MAX_DOWNLOAD_SPEED=10m` — limit to 10 MB/s
- `OLLAMA_MAX_DOWNLOAD_SPEED=500k` — limit to 500 KB/s
- `OLLAMA_MAX_DOWNLOAD_SPEED=1g` — limit to 1 GB/s

When unset, downloads run at full speed with zero overhead.
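(A rough Go sketch of what such a shared bucket could look like; the `parseSpeed` helper and all constants are my own reading of the description above, not the PR's actual code.)

```go
package main

import (
	"context"
	"fmt"
	"os"
	"strconv"
	"strings"

	"golang.org/x/time/rate"
)

// parseSpeed turns "10m", "500k", or "1g" into bytes per second,
// treating k/m/g as KiB/MiB/GiB. Hypothetical helper, not the PR's code.
func parseSpeed(s string) (int, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	mult := 1
	switch {
	case strings.HasSuffix(s, "k"):
		mult, s = 1<<10, strings.TrimSuffix(s, "k")
	case strings.HasSuffix(s, "m"):
		mult, s = 1<<20, strings.TrimSuffix(s, "m")
	case strings.HasSuffix(s, "g"):
		mult, s = 1<<30, strings.TrimSuffix(s, "g")
	}
	n, err := strconv.Atoi(s)
	return n * mult, err
}

func main() {
	bps, err := parseSpeed(os.Getenv("OLLAMA_MAX_DOWNLOAD_SPEED"))
	if err != nil || bps <= 0 {
		fmt.Println("unset or invalid: run at full speed")
		return
	}
	// A single limiter shared by all 16 chunk goroutines keeps the
	// aggregate rate under the cap, however many connections are open.
	lim := rate.NewLimiter(rate.Limit(bps), 256<<10)
	_ = lim.WaitN(context.Background(), 64<<10) // each worker charges its bytes
	fmt.Printf("capped at %d bytes/sec\n", bps)
}
```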
@donuts-are-good commented on GitHub (Apr 13, 2026):
It's crazy that this issue is still open, still being replied to, and still no solution.
@phamngocduy98 commented on GitHub (Apr 15, 2026):
Yes, I have the same issue when uploading as well. My internet will fail with 16 concurrent connections.
@gregs007 commented on GitHub (Apr 16, 2026):
@dhirajlochib Thanks for developing a solution for this problem for us. It's been a nagging issue for quite some time. Really appreciate the contribution. Looking forward to it!
@Legendary-Lava commented on GitHub (Apr 19, 2026):
A hack I did was isolating by source/server on an IFB (intermediate functional block) that is set below my max bandwidth.
Modifying anything on ingress is always a hack, but it's a little more universal than addressing ollama spamming 16 connections specifically.
It is more prone to issues with multiple different "server" connections, like when torrenting.
Make sure to change eth2 to the correct interface & set the bandwidth somewhere below your actual speed.
To test, just go to fast.com, set it to 30 streams, extend the test duration, & try to do anything else throughout the download.
@gregs007 commented on GitHub (Apr 20, 2026):
Maybe this will help someone until this change gets into main.
I use a tool called `tc` to limit the ollama container after it starts. I'm no developer, so I'm sure someone could make this more universal by automagically parsing out the right interface, burst size, etc., but this limits my ollama network speed to 500m (I have 1g internet).