Originally created by @ALLMI78 on GitHub (Nov 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7526
Originally assigned to: @dhiltgen on GitHub.
What is the issue?
Sorry, I'm new to GitHub, but I've got this problem and no solution...
When processing requests with any model in Ollama, a 500 Internal Server Error consistently occurs whenever the LLM computation exceeds exactly 2 minutes. This happens regardless of the model size or GPU/CPU usage, indicating a strict runtime limit. Notably, if the model completes processing under 2 minutes, the response returns without error.
Observed Behavior:
The API returns a 500 error precisely at the 2-minute mark, interrupting the LLM’s processing. Debug logs show no specific timeout warnings or errors related to resource limits. No documented configuration settings appear available to adjust this runtime limit.
Expected Behavior:
Ability to configure or bypass the 2-minute processing timeout to allow longer LLM computations, or receive more detailed error feedback regarding timeout settings.
Debug Attempts:
Verified high debug level (OLLAMA_DEBUG=true).
Tested with models of various sizes (confirming 70% VRAM usage or less).
Checked for relevant timeout settings in logs and source files without success.
Searched for relevant timeout settings in Ollama’s documentation and codebase but found no configurable options related to runtime limits.
Environment:
System: Windows 10 64-bit / 4060 Ti 16 GB / 32 GB RAM
Ollama version: 0.3.14
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=4 tid="8572" timestamp=1730876370
DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=5 tid="8572" timestamp=1730876370
DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=8431 slot_id=0 task_id=5 tid="8572" timestamp=1730876370
DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=5 tid="8572" timestamp=1730876370
time=2024-11-06T08:01:27.258+01:00 level=DEBUG source=sched.go:466 msg="context for request finished"
time=2024-11-06T08:01:27.259+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=M:\OLLAMA\models\blobs\sha256-cc04e85e1f866a5ba87dd66b5260f0cb32354e2c66505e86a7ac3c0092272b7d duration=5s
time=2024-11-06T08:01:27.259+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=M:\OLLAMA\models\blobs\sha256-cc04e85e1f866a5ba87dd66b5260f0cb32354e2c66505e86a7ac3c0092272b7d refCount=0
[GIN] 2024/11/06 - 08:01:27 | 500 | 2m0s | 127.0.0.1 | POST "/api/chat"
time=2024-11-06T08:01:27.285+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=M:\OLLAMA\models\blobs\sha256-cc04e85e1f866a5ba87dd66b5260f0cb32354e2c66505e86a7ac3c0092272b7d
DEBUG [process_single_task] slot data | n_idle_slots=0 n_processing_slots=1 task_id=2921 tid="8572" timestamp=1730876487
DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=56096 status=200 tid="1200" timestamp=1730876487
DEBUG [update_slots] slot released | n_cache_tokens=11346 n_ctx=32768 n_past=11345 n_system_tokens=0 slot_id=0 task_id=5 tid="8572" timestamp=1730876487 truncated=false
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=2924 tid="8572" timestamp=1730876487
DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=56099 status=200 tid="9824" timestamp=1730876487
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=2925 tid="8572" timestamp=1730876487
DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=56099 status=200 tid="9824" timestamp=1730876487
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=2926 tid="8572" timestamp=1730876487
DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=56100 status=200 tid="8676" timestamp=1730876487

The first time the error occurs, the Ollama API does not return anything, only logging the 500 error message. After this, the model continues running, but if it doesn't return within another 2 minutes, the 500 Internal Server Error is triggered again, and the request finally returns with an error.
OS
Windows
GPU
Nvidia
CPU
Intel
Ollama version
0.3.14
@rick-github commented on GitHub (Nov 6, 2024):
Your client has a two minute timeout.
@ALLMI78 commented on GitHub (Nov 6, 2024):
Thank you for your response. However, I believe there’s a misunderstanding regarding the timeout. My client is set to allow up to 180 seconds (3 minutes) for the request, as seen in the code:
And changing this does not help...
So, the issue does not originate from the client-side timeout. It seems the problem is on the server side, where a hard limit of 2 minutes runtime may be causing the error 500 to trigger if the model takes longer than that to compute a response. Can you confirm whether the server has a 2-minute timeout or any other limit that would cause this issue?
https://pypi.org/project/llama-index-llms-ollama/ has a request_timeout for that, I think...
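For anyone hitting the same wall: a quick way to rule out a client-side limit is to call the API with a deliberately generous timeout. A minimal sketch in Python, assuming a default local Ollama at http://127.0.0.1:11434, the requests package, and an example model name:

import requests

# A deliberately generous read timeout (30 minutes) so any 2-minute
# cutoff cannot come from this client; (connect, read) in seconds.
resp = requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "llama3.1",  # example model name
        "messages": [{"role": "user", "content": "a long-running prompt"}],
        "stream": False,
    },
    timeout=(5, 1800),
)
print(resp.status_code, resp.elapsed)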
@rick-github commented on GitHub (Nov 6, 2024):
Same for chat endpoint. AFAIK there's nothing on the server side that has a two minute timeout.
@ALLMI78 commented on GitHub (Nov 6, 2024):
I see, many thanks, but you did that on CPU only. Can you please check this on GPU? Is there a GPU runtime limit?
Is it possible that stream true/false influences something here?
I use num_gpu=100, stream=false...
@ALLMI78 commented on GitHub (Nov 6, 2024):
There are two things that puzzle me:
1. When the Ollama server logs the 500 error for the first time, my web request does not return yet. The server somehow restarts the model, which is noticeable when I set keep_alive to 0. The model quickly unloads and reloads, while the web request still does not return. In the first run, before the first 500 error, the GPU runs quietly and normally. After the first 500 error and the internal reload, the GPU becomes significantly louder in the second run. If the response then takes longer than 2 minutes again, the Ollama server cancels, and my web request returns with an error.
2. I am surprised that I can't find anything about this. It seems like no one else has these concerns, but this exact termination after 2 minutes confuses me.
Most of the time, it works as expected: I get a response text with status 200, and if the 500 error appears first, the model runs again, generates its response, and returns with a status 200. However, when this doesn't work and the 2-minute limit is reached twice, after the second 500 error, the web request returns with an error.
@rick-github commented on GitHub (Nov 6, 2024):
If I do it on GPU, it finishes in less than 30 seconds, so it doesn't exceed two minutes. Setting "stream":false still takes over two minutes.
What hardware do you have that can load 33 layers into GPU but still takes more than 2 minutes for a completion?
What language is your client written in? The examples of WebRequest I find on the internet have the timeout in milliseconds, so your 180 is not 3 minutes, it's 0.18 seconds.
Do you run a proxy? If so, have you set noproxy=127.0.0.1 in the client environment?
Server logs may shed some light.
@ALLMI78 commented on GitHub (Nov 6, 2024):
So, does the process on your end also run for more than 2 minutes on the GPU?
What exactly do you want to know? i7-3770, 32 GB RAM, 4060 Ti 16 GB VRAM, Windows 10 64-bit.
The client is written in MQL5, and you are right that the timeout for the MQL5 WebRequest is actually in milliseconds, but this is only a connection timeout ("timeout [in] Sets the maximum time to wait for a response from the server, in milliseconds"), and if that were the cause, I wouldn't even reach the 2-minute runtime.
No proxy.
Server logs, happy to share, I’ve already included them above. They are quite long since I’m using 32k context, but if you tell me what you want to see, I can show you more. However, there's nothing specific in them...
@ALLMI78 commented on GitHub (Nov 6, 2024):
First error 500: it does not stop, and the next try is 200.
I have only removed my prompt text.
Can you see something there?
@rick-github commented on GitHub (Nov 6, 2024):
My test only runs on the GPU for 20-30 seconds, because that's as long as the completion takes. If you could share a completion that your system is trying that takes more than 2 minutes, I could try it here to compare.
It's better if you just attach the full log as a file. There's configuration information at the start and calculations about model loading that are not included in the snippets above, and they may give some insight into what's going on. The model unloading/loading around the 500 is unusual, and also not shown in the snippets.
@dhiltgen commented on GitHub (Nov 6, 2024):
Everywhere we return a 500 from the server, we include some error message about why. Can you make sure your client logs the body of a 500 response, and hopefully that will help explain why the server is running into a problem.
@ALLMI78 commented on GitHub (Nov 6, 2024):
error-500-1.txt
I've prepared a file from the start of the log through one complete cycle including the 500 error, with only the prompt content removed. In this case, you can clearly see how the 500 error is triggered, but everything continues running as normal, and shortly after, the call returns with a 200 status. The strange part is that directly afterward, a second 200 status appears, and I don't know what it's doing.
I haven’t removed anything around the 500 error from the log file; there really isn’t any additional information or details about the error. Unfortunately, I can’t share the prompts used, as they contain sensitive information, and removing that information causes the prompt to no longer work.
In another cycle after a 500 error, two 200 statuses also appear one right after the other—check out the timestamps:
[GIN] 2024/11/06 - 05:45:12 | 500 | 2m0s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/11/06 - 05:46:51 | 200 | 1m38s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/11/06 - 05:46:51 | 200 | 1.0942ms | 127.0.0.1 | POST "/api/chat"
Version 0.4.0 of Ollama was just installed on my system. I’ll test with this one; maybe this issue is already resolved in this version, which is also supposed to improve support for the 4060 GPU.
I’m still considering whether this could be caused by my MQL5-WebRequest, but I’m leaning toward ‘no.’ If the WebRequest were triggering the 500 error, why does it remain connected and shortly afterward return with a 200 status? That doesn’t really make sense.
"Can you make sure your client logs the body of a 500 response, and hopefully that will help explain why the server is running into a problem."
Sure, I can do that, but as far as I remember, with error 500 the body was empty. I'll try that again if the problem appears in 0.4.0 ;) Thanks for your help, guys, you're doing very well ;)
@dhiltgen commented on GitHub (Nov 6, 2024):
I'll mark it as needing more info for now - let us know if 0.4.0 does in fact resolve the problem, or if you see it happen again, and hopefully you can capture the body the client receives.
@ALLMI78 commented on GitHub (Nov 6, 2024):
I’ve now tested with version 0.4.0, and the issue still persists.
I’ve isolated a part of the attached log here...
Please take a close look at the lines marked with the prompt (search for ###REMOVED), and you'll see that:
1. My prompt is sent, then an error 500 occurs after supposedly 2m0s (though I'm not even sure it really ran for 2 minutes).
2. The same prompt is then sent again, and this time it returns a normal 200 response with a time of 55.3543941s, for the same prompt. But that's not me; Ollama does that on its own.
3. SOLVED: Afterward, there's another 200 status with a duration of 1.0847ms, again extremely quick or almost instant. Why two? EDIT: I know now, I do a model unload after it has returned. Sorry, my bad, please ignore the second fast status 200 response, I forgot about that ;)
Back to the 500 error... Unfortunately, I can’t log the response body after 500 error because everything continues as normal after the 500 error. As I mentioned, my WebRequest doesn’t return at the 500 error but only with the 200, and at that point, a regular response (answer) is included.
I noticed this for two reasons: First, the models sometimes run longer than usual (clearly because the first run fails with a 500 error), and after the 500 error, the GPU gets noticeably loud. Normally, it’s barely audible, even when running at 100%, but after the 500 error, something changes. The card still runs at 100%, but it gets significantly warmer, making the fan much more audible.
My assumption is that something in Ollama fails or some timeout expires; the Ollama server changes something internally (maybe to ensure a smoother next run?), runs again, and then successfully returns with a 200 status. Flash Attention possibly? Do you disable something after a bad run?
And why is there no error? As you told me, there should be some info after status 500, but there is nothing ;(
EDIT! What's that: time=2024-11-06T23:39:57.330+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" <<< directly after the 500 error, "evaluating already loaded"?
error-500-2.txt
@dhiltgen commented on GitHub (Nov 6, 2024):
There are various code paths that can result in a 500 error being returned to the client application, and in some the error is not logged on the server, but only returned to the client. We can work on improving that in a future version, but if you can find a way to log in your client on non-200 responses that may help us narrow down exactly where the problem is occurring.
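As a sketch of what that client-side logging might look like, in Python under the same assumptions as above (local default endpoint, example model name); per the comment above, the server includes an error message with a 500, typically as a JSON body with an "error" field:

import requests

payload = {
    "model": "llama3.1",  # example model name
    "messages": [{"role": "user", "content": "..."}],
    "stream": False,
}
resp = requests.post("http://127.0.0.1:11434/api/chat", json=payload, timeout=(5, 1800))
if resp.status_code != 200:
    # Log the raw body: a 500 usually carries a JSON object
    # like {"error": "..."} explaining the failure.
    print("HTTP", resp.status_code, "body:", resp.text)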
@ALLMI78 commented on GitHub (Nov 7, 2024):
I'm currently trying to do that, because sometimes it comes back with an error after 2 or more 500 errors... The system is running, but it can work for hours; last time it ran fine for 4-5 hours - until it failed...
Any idea on that -> time=2024-11-06T23:39:57.330+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded"
Thanks for your support mate ;)
@rick-github commented on GitHub (Nov 7, 2024):
This looks like a timeout/retry.
There's 1m56.185s of processing in the first attempt, but the request terminates two minutes after the client sent the request. The client retries and gets a cache hit from the previous processing, letting the completion finish more quickly and return a result.
I'm not familiar with the cache code so I don't know if my interpretation is correct, but it's a cache - it's supposed to speed things up, right?
@rick-github commented on GitHub (Nov 7, 2024):
If you can run wireshark on your machine you could capture the packets at the time of the 500 and see any server response and which side closes the connection.
@ALLMI78 commented on GitHub (Nov 7, 2024):
I got something...
Some MQL5, to explain the result:

int timeout = 1000;
ResetLastError();
int awc = WebRequest("POST", endpoint, header, timeout, postDataUTF8, result, responseHeader);
int gle = GetLastError();
responseHeader = CleanData(responseHeader);
if(gle != 0) {
   // awc holds the HTTP status code returned by WebRequest; gle holds the MQL5 last-error code
   Print(__LINE__);
   Print(__FUNCTION__ + "() | GLE-ERROR | WebRequest() | GLE " + ErrorDescription(gle));
   Print(__FUNCTION__ + "() | GLE-ERROR | WebRequest() | AWC " + HttpErrorDescription(awc));
}
After two 500 errors:
5203 -> MQL5 -> ERR_WEBREQUEST_REQUEST_FAILED
1003 -> WebRequest return from the Ollama API
The returned result and responseHeader are both empty ;(
I got four 500 errors in my Ollama log. The first three did nothing, or at least didn't stop anything, but number four did something different:
The last two [GIN] entries are 500 errors. All earlier 500s did not stop the request; they had a 200 after the restart or reload. But this time, after two 500 errors, the WebRequest returned...
The pattern is:
[500] + [200] -> no WebRequest return until the 200 comes back
[500] -> no WebRequest return + another [500] -> WebRequest returns with the 5203 + 1003
1003 -> WebRequest return from the Ollama API -> what does 1003 from Ollama mean?
I hope it helps; that's all I can see at the moment...
@rick-github commented on GitHub (Nov 7, 2024):
I'm guessing that the requests have keep_alive=5. It's like the client has a maximum number of retries, and when it exceeded that this time, it finally returned an error up the call stack.
@ALLMI78 commented on GitHub (Nov 7, 2024):
Yes, keep_alive is set to 5 because previously I had keep_alive at 0, and when a 500 error occurred, the model was immediately unloaded and then reloaded to continue. With keep_alive set to 5, the model remains loaded, but unfortunately this still didn't solve my issue: after two 500 errors, the WebRequest returns empty.
I’ve been struggling with this issue for several weeks and haven’t found any leads on how to solve it so far.
1003 -> WebRequest return from ollama api -> what does 1003 from ollama mean?
@rick-github commented on GitHub (Nov 7, 2024):
ollama doesn't return 1003, it's an MQL5 internal error code.
It still looks like a client timeout. Note that the client is not necessarily your program; it may be something in between, like a load balancer, but you don't have a proxy, so that's not likely.
@ALLMI78 commented on GitHub (Nov 7, 2024):
Sorry mate, but there is nothing in between; it's my local PC, just MT5 (MetaTrader 5) and the Ollama Windows API...
"ollama doesn't return 1003, it's an MQL5 internal error code."
Are you sure about that? The 5203 from GetLastError() is the MQL5 code...
Because Ollama also returns the 200 that I want to see? I always check that with if(awc != 200) and so on...
awc in my example is the HTTP code returned from the server called in the WebRequest, so in this case, the Ollama server.
After two 500 errors:
5203 -> MQL5 -> ERR_WEBREQUEST_REQUEST_FAILED
1003 -> HTTP status code returned by WebRequest from the Ollama server or API
@rick-github commented on GitHub (Nov 7, 2024):
200, 500, 404, etc are well defined HTTP return codes. 1003 is not on the list. My guess is that MQL5 developers have decided to overload the return value of WebRequest, using numbers < 1000 as normal HTTP return codes, and numbers > 1000 as MQL5 specific return codes.
@ALLMI78 commented on GitHub (Nov 7, 2024):
That may be possible, but we have no proof of that...?
https://www.mql5.com/en/docs/network/webrequest -> Return Value: HTTP server response code or -1 for an error.
My ChatGPT may have found something:
It’s exactly what I am doing—I'm loading two models in a rotation, one is exactly llama3.1, and the other is qwen2.5.
ChatGPT found it here:
I can't find anything with 1003 there, but the model switching and error 500 are the same...
Does unloading a model really clean everything?
I run llama -> unload -> run qwen -> unload -> run llama, and so on...
@rick-github commented on GitHub (Nov 7, 2024):
1003 is not a valid HTTP server response. 1003 is not even in the source code.
ChatGPT is taking a kernel of truth and hallucinating an entire field of wheat. Your logs don't indicate any decode errors, the issue was fixed some time ago, and it doesn't explain the two minute timeout or the client retries.
@ALLMI78 commented on GitHub (Nov 7, 2024):
Thanks for the explanation... I'm aware that 1003 is not a standard HTTP status code.
I had a similar thought but in reverse: I suspected that maybe Ollama or llama.cpp could be returning a custom status code 1003. I hadn’t mentioned to ChatGPT that I was switching between different models, so the insight it found really seems relevant. I had specifically asked ChatGPT to search for connections between 1003 and Ollama or llama.cpp, which led to these findings.
If 1003 isn’t actually a specific return code from Ollama or llama.cpp, I agree with your theory that the problem might lie with the MQL5 WebRequest. This is supported by the fact that no one else seems to have encountered this 'Error 500 after 2 minutes' issue.
So, your assumption would be that there’s a problem with the MQL5 WebRequest where Ollama outputs a 500 error, but then the MQL5 WebRequest just converts it into a 1003 instead?
@rick-github commented on GitHub (Nov 7, 2024):
My assumption is that WebRequest has some hidden internals that cause it to timeout after two minutes of no response from a server, retry that 2 more times, and if there's still no response, return a 1003. If that's the case and there's no feature to override that behaviour, then the way around it would be to enable streaming in the ollama request and aggregate the results in a loop around the WebRequest call.
@rick-github commented on GitHub (Nov 7, 2024):
Something like
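In Python, the streaming-and-aggregate loop described above might look like the following sketch (the thread's client is MQL5, but the logic is the same; this assumes /api/chat streams one JSON object per line, each carrying a message.content fragment and a final done flag, with the same assumed local endpoint and example model as earlier):

import json
import requests

with requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "llama3.1",  # example model name
        "messages": [{"role": "user", "content": "a long-running prompt"}],
        "stream": True,
    },
    stream=True,
    timeout=(5, 1800),
) as resp:
    parts = []
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each streamed chunk carries a fragment of the reply; collect them all.
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
print("".join(parts))

Because tokens arrive as they are generated, the connection is never silent for two minutes at a stretch, which is the behaviour the WebRequest timeout appears to key on.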
@ALLMI78 commented on GitHub (Nov 7, 2024):
That sounds very plausible. I was just about to ask you how your theory fits with the fact that my request is actually sent twice.
But you already explained that, hmmm...
Can you please explain what the "evaluating already loaded" after the 500 error means?
Alright, with stream=true, this issue shouldn’t occur then - I’ll test it out! ;)
@rick-github commented on GitHub (Nov 7, 2024):
Whenever a request comes in, ollama checks to see if there's a currently loaded model that can be used to process the request. "evaluating already loaded" means that just after the 500, the client re-sent the query, so ollama checked that the current model would be able to be used.
@ALLMI78 commented on GitHub (Nov 7, 2024):
Thanks mate, and does unloading a model (with messages:[] and keep_alive:0) really clean everything, all state, caches, and so on?
So switching between models is no problem? I wasn't sure if this was the cause...
@rick-github commented on GitHub (Nov 7, 2024):
It should clean everything. If it doesn't, that would be a bug. Model switching should be no problem.
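For reference, the unload being discussed is just a chat call with an empty message list and keep_alive set to 0; a minimal sketch, under the same assumed endpoint and example model name:

import requests

# Asks the server to unload the model immediately instead of
# keeping it resident for the default keep-alive period.
requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={"model": "llama3.1", "messages": [], "keep_alive": 0},
)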
@ALLMI78 commented on GitHub (Nov 7, 2024):
04:43:20 | 200 | 2m6s | 127.0.0.1 | POST "/api/chat"
but then: [GIN] 2024/11/07 - 04:53:04 | 200 | 2m0s | 127.0.0.1 | POST "/api/chat"
and
2024.11.07 04:53:04.041 SendAPIRequest() | GLE-ERROR | WebRequest() | GLE 5203 - WebRequest - request failed
2024.11.07 04:53:04.041 SendAPIRequest() | GLE-ERROR | WebRequest() | HTTP-Error [1003]
2024.11.07 04:53:04.041 SendAPIRequest() | GLE-ERROR | WebRequest() | header -> StringLen = 0 | body -> ArraySize = 0
stream = true does not help, but as you can see, there was a 200 in Ollama, and the WebRequest returns on that with 5203 and 1003, an empty header and an empty body. So yes, that does seem to be a problem with WebRequest ;(...
@rick-github commented on GitHub (Nov 7, 2024):
Perhaps WebRequest wants the full response to arrive and the server to close the socket, signalling it's finished, before it returns. Maybe you have to spin your own WebRequest with SocketCreate/SocketConnect.
@rick-github commented on GitHub (Nov 7, 2024):
Do you know if ollama has actually output any tokens when it times out? Maybe your completion is just so complex that it takes more than two minutes before it emits something that WebRequest can read.
You could try using curl with one of your prompts and see how long it takes for ollama to respond.
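Timing the first streamed chunk makes the same check without curl; a sketch under the same assumptions as the earlier snippets:

import time
import requests

start = time.monotonic()
with requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "llama3.1",  # example model name
        "messages": [{"role": "user", "content": "one of the long prompts"}],
        "stream": True,
    },
    stream=True,
    timeout=(5, 1800),
) as resp:
    for line in resp.iter_lines():
        if line:
            # If this exceeds ~120s, a client like WebRequest would have seen
            # nothing at all before its apparent two-minute cutoff.
            print(f"first chunk after {time.monotonic() - start:.1f}s")
            break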
@ALLMI78 commented on GitHub (Nov 7, 2024):
Hello, I’m going to take a close look at this now, but it might take some time. I need to check what options are available for accessing the Ollama API from MQL5, maybe using sockets or ShellExecute / curl.
There are indications that other MQL5 users have also had issues with 1003 errors:
The message 'read failed, 0 error, 1 byte needed' is interesting here, but it seems like 1003 is a general error that can appear in very different situations. What surprises me, though, is that MetaTrader 5 and MQL5 have been around for a while, and if this is indeed a bug, it hasn’t been fixed yet.
This was my thought as well: that my request might be too complex or unsolvable. But I don't understand why, on retry after a 500 error, I often get a response on a new run. In most cases, I first get a 500 error and then a 200 response within two minutes. Only in rare cases do I get two 500 errors, at which point I cancel on the client side.
I’ll update you once I know more. ;)
@rick-github commented on GitHub (Nov 7, 2024):
Cache.
@ALLMI78 commented on GitHub (Nov 7, 2024):
OK, I did some tests with my large prompts in Windows PowerShell, and:
[GIN] 2024/11/07 - 19:44:40 | 200 | 3m56s | 127.0.0.1 | POST "/api/chat"
No 500 error...
Sorry for the confusing bug hunt, but WebRequest was working fine before this problem...
Many thanks again for the fast support ;)
EDIT: I'm now using a self-made WebRequest replacement, using ShellExecuteW in MQL5 to run a PowerShell Invoke-WebRequest, and I can confirm that the runtime limit was on the client side. It works very well now...
[GIN] 2024/11/11 - 00:30:02 | 200 | 6m9s | 127.0.0.1 | POST "/api/chat"
Zero 500 errors, thanks again ;)