mirror of
https://github.com/ollama/ollama.git
synced 2026-05-07 16:40:08 -05:00
Open
opened 2026-04-12 19:51:39 -05:00 by GiteaMirror
·
87 comments
No Branch/Tag Specified
main
dhiltgen/ci
dhiltgen/llama-runner
hoyyeva/anthropic-local-image-path
hoyyeva/anthropic-reference-images-path
parth-anthropic-reference-images-path
brucemacd/download-before-remove
hoyyeva/editor-config-repair
parth-mlx-decode-checkpoints
parth-launch-codex-app
hoyyeva/fix-codex-model-metadata-warning
hoyyeva/qwen
parth/hide-claude-desktop-till-release
hoyyeva/opencode-image-modality
parth-add-claude-code-autoinstall
release_v0.22.0
pdevine/manifest-list
codex/fix-codex-model-metadata-warning
pdevine/addressable-manifest
brucemacd/launch-fetch-reccomended
jmorganca/llama-compat
launch-copilot-cli
hoyyeva/opencode-thinking
release_v0.20.7
parth-auto-save-backup
parth-test
jmorganca/gemma4-audio-replacements
fix-manifest-digest-on-pull
hoyyeva/vscode-improve
brucemacd/install-server-wait
parth/update-claude-docs
brucemac/start-ap-install
pdevine/mlx-update
pdevine/qwen35_vision
drifkin/api-show-fallback
mintlify/image-generation-1773352582
hoyyeva/server-context-length-local-config
jmorganca/faster-reptition-penalties
jmorganca/convert-nemotron
parth-pi-thinking
pdevine/sampling-penalties
jmorganca/fix-create-quantization-memory
dongchen/resumable_transfer_fix
pdevine/sampling-cache-error
jessegross/mlx-usage
hoyyeva/openclaw-config
hoyyeva/app-html
pdevine/qwen3next
brucemacd/sign-sh-install
brucemacd/tui-update
brucemacd/usage-api
jmorganca/launch-empty
fix-app-dist-embed
mxyng/mlx-compile
mxyng/mlx-quant
mxyng/mlx-glm4.7
mxyng/mlx
brucemacd/simplify-model-picker
jmorganca/qwen3-concurrent
fix-glm-4.7-flash-mla-config
drifkin/qwen3-coder-opening-tag
brucemacd/usage-cli
fix-cuda12-fattn-shmem
ollama-imagegen-docs
parth/fix-multiline-inputs
brucemacd/config-docs
mxyng/model-files
mxyng/simple-execute
fix-imagegen-ollama-models
mxyng/async-upload
jmorganca/lazy-no-dtype-changes
imagegen-auto-detect-create
parth/decrease-concurrent-download-hf
fix-mlx-quantize-init
jmorganca/x-cleanup
usage
imagegen-readme
jmorganca/glm-image
mlx-gpu-cd
jmorganca/imagegen-modelfile
parth/agent-skills
parth/agent-allowlist
parth/signed-in-offline
parth/agents
parth/fix-context-chopping
improve-cloud-flow
parth/add-models-websearch
parth/prompt-renderer-mcp
jmorganca/native-settings
jmorganca/download-stream-hash
jmorganca/client2-rebased
brucemacd/oai-chat-req-multipart
jessegross/multi_chunk_reserve
grace/additional-omit-empty
grace/mistral-3-large
mxyng/tokenizer2
mxyng/tokenizer
jessegross/flash
hoyyeva/windows-nacked-app
mxyng/cleanup-attention
grace/deepseek-parser
hoyyeva/remember-unsent-prompt
parth/add-lfs-pointer-error-conversion
parth/olmo2-test2
hoyyeva/ollama-launchagent-plist
nicole/olmo-model
parth/olmo-test
mxyng/remove-embedded
parth/render-template
jmorganca/intellect-3
parth/remove-prealloc-linter
jmorganca/cmd-eval
nicole/nomic-embed-text-fix
mxyng/lint-2
hoyyeva/add-gemini-3-pro-preview
hoyyeva/load-model-list
mxyng/expand-path
mxyng/environ-2
hoyyeva/deeplink-json-encoding
parth/improve-tool-calling-tests
hoyyeva/conversation
hoyyeva/assistant-edit-response
hoyyeva/thinking
origin/brucemacd/invalid-char-i-err
parth/improve-tool-calling
jmorganca/required-omitempty
grace/qwen3-vl-tests
mxyng/iter-client
parth/docs-readme
nicole/embed-test
pdevine/integration-benchstat
parth/remove-generate-cmd
parth/add-toolcall-id
mxyng/server-tests
jmorganca/glm-4.6
jmorganca/gin-h-compat
drifkin/stable-tool-args
pdevine/qwen3-more-thinking
parth/add-websearch-client
nicole/websearch_local
jmorganca/qwen3-coder-updates
grace/deepseek-v3-migration-tests
mxyng/fix-create
jmorganca/cloud-errors
pdevine/parser-tidy
revert-12233-parth/simplify-entrypoints-runner
parth/enable-so-gpt-oss
brucemacd/qwen3vl
jmorganca/readme-simplify
parth/gpt-oss-structured-outputs
revert-12039-jmorganca/tools-braces
mxyng/embeddings
mxyng/gguf
mxyng/benchmark
mxyng/types-null
parth/move-parsing
mxyng/gemma2
jmorganca/docs
mxyng/16-bit
mxyng/create-stdin
pdevine/authorizedkeys
mxyng/quant
parth/opt-in-error-context-window
brucemacd/cache-models
brucemacd/runner-completion
jmorganca/llama-update-6
brucemacd/benchmark-list
brucemacd/partial-read-caps
parth/deepseek-r1-tools
mxyng/omit-array
parth/tool-prefix-temp
brucemacd/runner-test
jmorganca/qwen25vl
brucemacd/model-forward-test-ext
parth/python-function-parsing
jmorganca/cuda-compression-none
drifkin/num-parallel
drifkin/chat-truncation-fix
jmorganca/sync
parth/python-tools-calling
drifkin/array-head-count
brucemacd/create-no-loop
parth/server-enable-content-stream-with-tools
qwen25omni
mxyng/v3
brucemacd/ropeconfig
jmorganca/silence-tokenizer
parth/sample-so-test
parth/sampling-structured-outputs
brucemacd/doc-go-engine
parth/constrained-sampling-json
jmorganca/mistral-wip
brucemacd/mistral-small-convert
parth/sample-unmarshal-json-for-params
brucemacd/jomorganca/mistral
pdevine/bfloat16
jmorganca/mistral
brucemacd/mistral
pdevine/logging
parth/sample-correctness-fix
parth/sample-fix-sorting
jmorgan/sample-fix-sorting-extras
jmorganca/temp-0-images
brucemacd/parallel-embed-models
brucemacd/shim-grammar
jmorganca/fix-gguf-error
bmizerany/nameswork
jmorganca/faster-releases
bmizerany/validatenames
brucemacd/err-no-vocab
brucemacd/rope-config
brucemacd/err-hint
brucemacd/qwen2_5
brucemacd/logprobs
brucemacd/new_runner_graph_bench
progress-flicker
brucemacd/forward-test
brucemacd/go_qwen2
pdevine/gemma2
jmorganca/add-missing-symlink-eval
mxyng/next-debug
parth/set-context-size-openai
brucemacd/next-bpe-bench
brucemacd/next-bpe-test
brucemacd/new_runner_e2e
brucemacd/new_runner_qwen2
pdevine/convert-cohere2
brucemacd/convert-cli
parth/log-probs
mxyng/next-mlx
mxyng/cmd-history
parth/templating
parth/tokenize-detokenize
brucemacd/check-key-register
bmizerany/grammar
jmorganca/vendor-081b29bd
mxyng/func-checks
jmorganca/fix-null-format
parth/fix-default-to-warn-json
jmorganca/qwen2vl
jmorganca/no-concat
parth/cmd-cleanup-SO
brucemacd/check-key-register-structured-err
parth/openai-stream-usage
parth/fix-referencing-so
stream-tools-stop
jmorganca/degin-1
brucemacd/install-path-clean
brucemacd/push-name-validation
brucemacd/browser-key-register
jmorganca/openai-fix-first-message
jmorganca/fix-proxy
jessegross/sample
parth/disallow-streaming-tools
dhiltgen/remove_submodule
jmorganca/ga
jmorganca/mllama
pdevine/newlines
pdevine/geems-2b
jmorganca/llama-bump
mxyng/modelname-7
mxyng/gin-slog
mxyng/modelname-6
jyan/convert-prog
jyan/quant5
paligemma-support
pdevine/import-docs
jmorganca/openai-context
jyan/paligemma
jyan/p2
jyan/palitest
bmizerany/embedspeedup
jmorganca/llama-vit
brucemacd/allow-ollama
royh/ep-methods
royh/whisper
mxyng/api-models
mxyng/fix-memory
jyan/q4_4/8
jyan/ollama-v
royh/stream-tools
roy-embed-parallel
bmizerany/hrm
revert-5963-revert-5924-mxyng/llama3.1-rope
royh/embed-viz
jyan/local2
jyan/auth
jyan/local
jyan/parse-temp
jmorganca/template-mistral
jyan/reord-g
royh-openai-suffixdocs
royh-imgembed
royh-embed-parallel
jyan/quant4
royh-precision
jyan/progress
pdevine/fix-template
jyan/quant3
pdevine/ggla
mxyng/update-registry-domain
jmorganca/ggml-static
mxyng/create-context
jyan/v0.146
mxyng/layers-from-files
build_dist
bmizerany/noseek
royh-ls
royh-name
timeout
mxyng/server-timestamp
bmizerany/nosillyggufslurps
royh-params
jmorganca/llama-cpp-7c26775
royh-openai-delete
royh-show-rigid
jmorganca/enable-fa
jmorganca/no-error-template
jyan/format
royh-testdelete
bmizerany/fastverify
language_support
pdevine/ps-glitches
brucemacd/tokenize
bruce/iq-quants
bmizerany/filepathwithcoloninhost
mxyng/split-bin
bmizerany/client-registry
jmorganca/if-none-match
native
jmorganca/native
jmorganca/batch-embeddings
jmorganca/initcmake
jmorganca/mm
pdevine/showggmlinfo
modenameenforcealphanum
bmizerany/modenameenforcealphanum
jmorganca/done-reason
jmorganca/llama-cpp-8960fe8
ollama.com
bmizerany/filepathnobuild
bmizerany/types/model/defaultfix
rmdisplaylong
nogogen
bmizerany/x
modelfile-readme
bmizerany/replacecolon
jmorganca/limit
jmorganca/execstack
jmorganca/replace-assets
mxyng/tune-concurrency
jmorganca/testing
whitespace-detection
jmorganca/options
upgrade-all
scratch
cuda-search
mattw/airenamer
mattw/allmodelsonhuggingface
mattw/quantcontext
mattw/whatneedstorun
brucemacd/llama-mem-calc
mattw/faq-context
mattw/communitylinks
mattw/noprune
mattw/python-functioncalling
rename
mxyng/install
pulse
remove-first
editor
mattw/selfqueryingretrieval
cgo
mattw/howtoquant
api
matt/streamingapi
format-config
mxyng/extra-args
shell
update-nous-hermes
cp-model
upload-progress
fix-unknown-model
fix-model-names
delete-fix
insecure-registry
ls
deletemodels
progressbar
readme-updates
license-layers
skip-list
list-models
modelpath
matt/examplemodelfiles
distribution
go-opts
v0.30.0-rc5
v0.23.2-rc0
v0.30.0-rc4
v0.30.0-rc3
v0.30.0-rc2
v0.30.0-rc1
v0.30.0-rc0
v0.23.1
v0.23.1-rc0
v0.23.0
v0.23.0-rc0
v0.22.1
v0.22.1-rc1
v0.22.1-rc0
v0.22.0
v0.22.0-rc1
v0.21.3-rc0
v0.21.2-rc1
v0.21.2
v0.21.2-rc0
v0.21.1
v0.21.1-rc1
v0.21.1-rc0
v0.21.0
v0.21.0-rc1
v0.21.0-rc0
v0.20.8-rc0
v0.20.7
v0.20.7-rc1
v0.20.7-rc0
v0.20.6
v0.20.6-rc1
v0.20.6-rc0
v0.20.5
v0.20.5-rc2
v0.20.5-rc1
v0.20.5-rc0
v0.20.4
v0.20.4-rc2
v0.20.4-rc1
v0.20.4-rc0
v0.20.3
v0.20.3-rc0
v0.20.2
v0.20.1
v0.20.1-rc2
v0.20.1-rc1
v0.20.1-rc0
v0.20.0
v0.20.0-rc1
v0.20.0-rc0
v0.19.0
v0.19.0-rc2
v0.19.0-rc1
v0.19.0-rc0
v0.18.4-rc1
v0.18.4-rc0
v0.18.3
v0.18.3-rc2
v0.18.3-rc1
v0.18.3-rc0
v0.18.2
v0.18.2-rc1
v0.18.2-rc0
v0.18.1
v0.18.1-rc1
v0.18.1-rc0
v0.18.0
v0.18.0-rc2
v0.18.0-rc1
v0.18.0-rc0
v0.17.8-rc4
v0.17.8-rc3
v0.17.8-rc2
v0.17.8-rc1
v0.17.8-rc0
v0.17.7
v0.17.7-rc2
v0.17.7-rc1
v0.17.7-rc0
v0.17.6
v0.17.5
v0.17.4
v0.17.3
v0.17.2
v0.17.1
v0.17.1-rc2
v0.17.1-rc1
v0.17.1-rc0
v0.17.0
v0.17.0-rc2
v0.17.0-rc1
v0.17.0-rc0
v0.16.3
v0.16.3-rc2
v0.16.3-rc1
v0.16.3-rc0
v0.16.2
v0.16.2-rc0
v0.16.1
v0.16.0
v0.16.0-rc2
v0.16.0-rc0
v0.16.0-rc1
v0.15.6
v0.15.5
v0.15.5-rc5
v0.15.5-rc4
v0.15.5-rc3
v0.15.5-rc2
v0.15.5-rc1
v0.15.5-rc0
v0.15.4
v0.15.3
v0.15.2
v0.15.1
v0.15.1-rc1
v0.15.1-rc0
v0.15.0-rc6
v0.15.0
v0.15.0-rc5
v0.15.0-rc4
v0.15.0-rc3
v0.15.0-rc2
v0.15.0-rc1
v0.15.0-rc0
v0.14.3
v0.14.3-rc3
v0.14.3-rc2
v0.14.3-rc1
v0.14.3-rc0
v0.14.2
v0.14.2-rc1
v0.14.2-rc0
v0.14.1
v0.14.0-rc11
v0.14.0
v0.14.0-rc10
v0.14.0-rc9
v0.14.0-rc8
v0.14.0-rc7
v0.14.0-rc6
v0.14.0-rc5
v0.14.0-rc4
v0.14.0-rc3
v0.14.0-rc2
v0.14.0-rc1
v0.14.0-rc0
v0.13.5
v0.13.5-rc1
v0.13.5-rc0
v0.13.4-rc2
v0.13.4
v0.13.4-rc1
v0.13.4-rc0
v0.13.3
v0.13.3-rc1
v0.13.3-rc0
v0.13.2
v0.13.2-rc2
v0.13.2-rc1
v0.13.2-rc0
v0.13.1
v0.13.1-rc2
v0.13.1-rc1
v0.13.1-rc0
v0.13.0
v0.13.0-rc0
v0.12.11
v0.12.11-rc1
v0.12.11-rc0
v0.12.10
v0.12.10-rc1
v0.12.10-rc0
v0.12.9-rc0
v0.12.9
v0.12.8
v0.12.8-rc0
v0.12.7
v0.12.7-rc1
v0.12.7-rc0
v0.12.7-citest0
v0.12.6
v0.12.6-rc1
v0.12.6-rc0
v0.12.5
v0.12.5-rc0
v0.12.4
v0.12.4-rc7
v0.12.4-rc6
v0.12.4-rc5
v0.12.4-rc4
v0.12.4-rc3
v0.12.4-rc2
v0.12.4-rc1
v0.12.4-rc0
v0.12.3
v0.12.2
v0.12.2-rc0
v0.12.1
v0.12.1-rc1
v0.12.1-rc2
v0.12.1-rc0
v0.12.0
v0.12.0-rc1
v0.12.0-rc0
v0.11.11
v0.11.11-rc3
v0.11.11-rc2
v0.11.11-rc1
v0.11.11-rc0
v0.11.10
v0.11.9
v0.11.9-rc0
v0.11.8
v0.11.8-rc0
v0.11.7-rc1
v0.11.7-rc0
v0.11.7
v0.11.6
v0.11.6-rc0
v0.11.5-rc4
v0.11.5-rc3
v0.11.5
v0.11.5-rc5
v0.11.5-rc2
v0.11.5-rc1
v0.11.5-rc0
v0.11.4
v0.11.4-rc0
v0.11.3
v0.11.3-rc0
v0.11.2
v0.11.1
v0.11.0-rc0
v0.11.0-rc1
v0.11.0-rc2
v0.11.0
v0.10.2-int1
v0.10.1
v0.10.0
v0.10.0-rc4
v0.10.0-rc3
v0.10.0-rc2
v0.10.0-rc1
v0.10.0-rc0
v0.9.7-rc1
v0.9.7-rc0
v0.9.6
v0.9.6-rc0
v0.9.6-ci0
v0.9.5
v0.9.4-rc5
v0.9.4-rc6
v0.9.4
v0.9.4-rc3
v0.9.4-rc4
v0.9.4-rc1
v0.9.4-rc2
v0.9.4-rc0
v0.9.3
v0.9.3-rc5
v0.9.4-citest0
v0.9.3-rc4
v0.9.3-rc3
v0.9.3-rc2
v0.9.3-rc1
v0.9.3-rc0
v0.9.2
v0.9.1
v0.9.1-rc1
v0.9.1-rc0
v0.9.1-ci1
v0.9.1-ci0
v0.9.0
v0.9.0-rc0
v0.8.0
v0.8.0-rc0
v0.7.1-rc2
v0.7.1
v0.7.1-rc1
v0.7.1-rc0
v0.7.0
v0.7.0-rc1
v0.7.0-rc0
v0.6.9-rc0
v0.6.8
v0.6.8-rc0
v0.6.7
v0.6.7-rc2
v0.6.7-rc1
v0.6.7-rc0
v0.6.6
v0.6.6-rc2
v0.6.6-rc1
v0.6.6-rc0
v0.6.5-rc1
v0.6.5
v0.6.5-rc0
v0.6.4-rc0
v0.6.4
v0.6.3-rc1
v0.6.3
v0.6.3-rc0
v0.6.2
v0.6.2-rc0
v0.6.1
v0.6.1-rc0
v0.6.0-rc0
v0.6.0
v0.5.14-rc0
v0.5.13
v0.5.13-rc6
v0.5.13-rc5
v0.5.13-rc4
v0.5.13-rc3
v0.5.13-rc2
v0.5.13-rc1
v0.5.13-rc0
v0.5.12
v0.5.12-rc1
v0.5.12-rc0
v0.5.11
v0.5.10
v0.5.9
v0.5.9-rc0
v0.5.8-rc13
v0.5.8
v0.5.8-rc12
v0.5.8-rc11
v0.5.8-rc10
v0.5.8-rc9
v0.5.8-rc8
v0.5.8-rc7
v0.5.8-rc6
v0.5.8-rc5
v0.5.8-rc4
v0.5.8-rc3
v0.5.8-rc2
v0.5.8-rc1
v0.5.8-rc0
v0.5.7
v0.5.6
v0.5.5
v0.5.5-rc0
v0.5.4
v0.5.3
v0.5.3-rc0
v0.5.2
v0.5.2-rc3
v0.5.2-rc2
v0.5.2-rc1
v0.5.2-rc0
v0.5.1
v0.5.0
v0.5.0-rc1
v0.4.8-rc0
v0.4.7
v0.4.6
v0.4.5
v0.4.4
v0.4.3
v0.4.3-rc0
v0.4.2
v0.4.2-rc1
v0.4.2-rc0
v0.4.1
v0.4.1-rc0
v0.4.0
v0.4.0-rc8
v0.4.0-rc7
v0.4.0-rc6
v0.4.0-rc5
v0.4.0-rc4
v0.4.0-rc3
v0.4.0-rc2
v0.4.0-rc1
v0.4.0-rc0
v0.4.0-ci3
v0.3.14
v0.3.14-rc0
v0.3.13
v0.3.12
v0.3.12-rc5
v0.3.12-rc4
v0.3.12-rc3
v0.3.12-rc2
v0.3.12-rc1
v0.3.11
v0.3.11-rc4
v0.3.11-rc3
v0.3.11-rc2
v0.3.11-rc1
v0.3.10
v0.3.10-rc1
v0.3.9
v0.3.8
v0.3.7
v0.3.7-rc6
v0.3.7-rc5
v0.3.7-rc4
v0.3.7-rc3
v0.3.7-rc2
v0.3.7-rc1
v0.3.6
v0.3.5
v0.3.4
v0.3.3
v0.3.2
v0.3.1
v0.3.0
v0.2.8
v0.2.8-rc2
v0.2.8-rc1
v0.2.7
v0.2.6
v0.2.5
v0.2.4
v0.2.3
v0.2.2
v0.2.2-rc2
v0.2.2-rc1
v0.2.1
v0.2.0
v0.1.49-rc14
v0.1.49-rc13
v0.1.49-rc12
v0.1.49-rc11
v0.1.49-rc10
v0.1.49-rc9
v0.1.49-rc8
v0.1.49-rc7
v0.1.49-rc6
v0.1.49-rc4
v0.1.49-rc5
v0.1.49-rc3
v0.1.49-rc2
v0.1.49-rc1
v0.1.48
v0.1.47
v0.1.46
v0.1.45-rc5
v0.1.45
v0.1.45-rc4
v0.1.45-rc3
v0.1.45-rc2
v0.1.45-rc1
v0.1.44
v0.1.43
v0.1.42
v0.1.41
v0.1.40
v0.1.40-rc1
v0.1.39
v0.1.39-rc2
v0.1.39-rc1
v0.1.38
v0.1.37
v0.1.36
v0.1.35
v0.1.35-rc1
v0.1.34
v0.1.34-rc1
v0.1.33
v0.1.33-rc7
v0.1.33-rc6
v0.1.33-rc5
v0.1.33-rc4
v0.1.33-rc3
v0.1.33-rc2
v0.1.33-rc1
v0.1.32
v0.1.32-rc2
v0.1.32-rc1
v0.1.31
v0.1.30
v0.1.29
v0.1.28
v0.1.27
v0.1.26
v0.1.25
v0.1.24
v0.1.23
v0.1.22
v0.1.21
v0.1.20
v0.1.19
v0.1.18
v0.1.17
v0.1.16
v0.1.15
v0.1.14
v0.1.13
v0.1.12
v0.1.11
v0.1.10
v0.1.9
v0.1.8
v0.1.7
v0.1.6
v0.1.5
v0.1.4
v0.1.3
v0.1.2
v0.1.1
v0.1.0
v0.0.21
v0.0.20
v0.0.19
v0.0.18
v0.0.17
v0.0.16
v0.0.15
v0.0.14
v0.0.13
v0.0.12
v0.0.11
v0.0.10
v0.0.9
v0.0.8
v0.0.7
v0.0.6
v0.0.5
v0.0.4
v0.0.3
v0.0.2
v0.0.1
Labels
Clear labels
amd
api
app
bug
build
cli
cloud
compatibility
context-length
create
docker
documentation
embeddings
feature request
feedback wanted
good first issue
gpt-oss
gpu
harmony
help wanted
image
install
intel
js
launch
linux
macos
memory
mlx
model
needs more info
networking
nvidia
ollama.com
performance
pull-request
python
question
registry
rendering
thinking
tools
top
vulkan
windows
wsl
Mirrored from GitHub Pull Request
Milestone
No items
No Milestone
Projects
Clear projects
No project
No Assignees
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: github-starred/ollama#7736
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @taagarwa-rh on GitHub (Aug 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11691
Originally assigned to: @ParthSareen on GitHub.
What is the issue?
OpenAI SDK is unable to parse structured output from gpt-oss:20b responses. Ollama is supposed to be compatible with OpenAI SDK structured outputs per this Blog Post.
Reproducer:
Relevant log output
OS
macOS
GPU
Apple
CPU
Apple
Ollama version
0.11.0
@BeatWolf commented on GitHub (Aug 5, 2025):
i think i have a similar issue with langchain and the ollamachatmodel. my pipelines that depend on structured output dont work, making the model unuseable
@jbcallaghan commented on GitHub (Aug 5, 2025):
I can report the same issue, structured output doesn't work. Content = ''
@sheneman commented on GitHub (Aug 6, 2025):
Structured outputs doesn't work with gpt-oss due to the use of the new Harmony response format.
I believe this can be addressed via an integration layer by the Ollama team, but the lack of structured outputs really makes using gpt-oss model useless for most serious purposes.
@frozenkp commented on GitHub (Aug 6, 2025):
Same issue here. I'm using Pydantic with ChatOllama, and nothing in the response content is causing a parsing exception.
@tneQpx commented on GitHub (Aug 6, 2025):
Same issue. Message.Content = "
When using ollama chat with format
@KlausGPaul commented on GitHub (Aug 6, 2025):
As a workaround, it seems as if adding the desired response schema to the prompt could work, have not tried it at scale yet, though.
The response, though will also enclose the JSON inside markdown, but it follows the schema.
@lachlansleight commented on GitHub (Aug 6, 2025):
Adding +1 - same issue here. Adding some tests:
Sending:
Results in the following response:
Sometimes
responseis empty, sometimes it begins with some of the thinking text, as above. Often this is just a tiny fragment, such as"response": "{\"\n\n }". Removing the system prompt seems to give me these little error fragments much more often (about 60% of the time, as opposed to 10% of the time with a system prompt)If I remove the
"format": "json"parameter altogether, I get the following response:Finally, if I try setting the format to a specific format, I always get the empty response text, with or without a system prompt.
@jbcallaghan commented on GitHub (Aug 6, 2025):
I tried this and it works randomly, sometimes I get a response with the correct formatting and other times no output at all. This is using exactly the same query each time. I also noticed there is a lot of blank content before the structured output is populated when it does work, almost like thinking is being shown as blank content
@duxor commented on GitHub (Aug 6, 2025):
It's a little bit crazy, but it works...
You need to find a position of ```json in the string, in some cases there is more text as a prefix:
@sheneman commented on GitHub (Aug 6, 2025):
@duxor - Yeah, you can specify the desired format in the prompt, but that's not really enforced structured output with a compiled grammar. No guarantee that it will work, and yes - you have to filter other stuff around it.
@frozenkp commented on GitHub (Aug 7, 2025):
My current alternative workaround is not asking gpt-oss to reply with a structured format, and asking another small model to produce the structured format from gpt-oss's response. It's redundant while stable and working.
@duxor commented on GitHub (Aug 7, 2025):
You are right. Do you think it's worth it?
I will definitely avoid
gpt-oss, for now.@frozenkp commented on GitHub (Aug 7, 2025):
Well, it depends. In my case, I used it as one of my research evaluations.
I would suggest using it after the issue is fixed. I didn't expect that I would take this two-layer approach that I used in the very beginning of the LLM era back. LOL
@rick-github commented on GitHub (Aug 7, 2025):
@andreys42 commented on GitHub (Aug 7, 2025):
+1 here
I guess absence of StructuredOutput support is signigicant drawback now
@Mohammadtvk commented on GitHub (Aug 7, 2025):
+1
this feature is very important
@dontriskit commented on GitHub (Aug 8, 2025):
same issue with vLLM
resolved with official vllm docs
@Koki-Itai commented on GitHub (Aug 8, 2025):
same issue
@tttturtle-russ commented on GitHub (Aug 8, 2025):
same issue here, it's an important feature.
@ddudek commented on GitHub (Aug 11, 2025):
As a better workaround, you should put the schema in "developer" role, e.g.:
The above works pretty good, although the model also outputs "analysis" and "commentary" streams, e.g.:
So adding this to the system prompt again improves the output:
Full example:
Output:
<|channel|>final<|message|>{"my_field": "some content"}still needs removing "<|channel|>final<|message|>" but gives very stable behavior.
This is nicely documented in the cookbook https://cookbook.openai.com/articles/openai-harmony#developer-message-format and looks like the model follows this very well.
@youngbinkim0 commented on GitHub (Aug 11, 2025):
can you link to the official vLLM doc referred? running into the same issue @dontriskit
@youngbinkim0 commented on GitHub (Aug 11, 2025):
While this does work for many schemas, it's not the same as enforcing structured outputs. As referred in the structured output section of the cookbook:
"This prompt alone will, however, only influence the model’s behavior but doesn’t guarantee the full adherence to the schema. For this you still need to construct your own grammar and enforce the schema during sampling."
https://cookbook.openai.com/articles/openai-harmony#structured-output
@lachlansleight commented on GitHub (Aug 12, 2025):
Agreed - since identifying this as an issue I basically put GPT-OSS down and haven't touched it since. It's interesting to see how it differs from other agents, but without JSON output it's completely useless for anything other than basic chat applications.
I didn't realise how harmful harmony would be to the update of GPT-OSS. It's so bad that I almost want to put on my tin foil hat and wonder whether Open AI is trying to intentionally harm the open-weight community by fragmenting the ecosystem with a complex, difficult-to-implement response format.
@ParthSareen commented on GitHub (Aug 12, 2025):
Hey folks! Just came across this issue. We currently do not support structured outputs with this model or other thinking models. Working on a change later this week to bring some of the token parsing down to the runner level. With that we'd be able to do grammar sampling and token parsing closer together to know when to start the constrained sampling. Sorry for the delay!
@sheneman commented on GitHub (Aug 13, 2025):
@ParthSareen : Thank you so much for your attention to the issue of Structured outputs and thinking models in Ollama, including gpt-oss. This issue been a roadblock for my organization using Ollama for awhile. We love Ollama but have been considering alternatives because of this limitation with effective use of thinking models.
I assume the issue with gpt-oss is at least related to open issue #523.
But I assume the problem is compounded with gpt-oss because of the use of the Harmony response format.
Again, thank you and the Ollama team for prioritizing this!
@nicholas-johnson-techxcel commented on GitHub (Aug 13, 2025):
Yeah progress, the GBNF generator is now pretty stable and I wrote a function to turn chat history into Harmony. The model is quite smart (for a local model) but also unhinged (although it might be that I have not yet tuned llama.cpp well). It tries to put reasoning output in URL params of tools, they really need to find a way to make it think without emitting those tokens. It can handle web browsing tasks using Playwright but I keep overflowing the context window so now I am writing a full node based infrastructure where we can have nodes consolidate / summarise the history and have tool calls be able to be more context aware and veto certain steps.
I got that idea from Haystack but their tooling is hardly type-safe (it was good as a proof of concept) and it was only after a tonne of complaining that they fixed the issue that by importing Haystack, half your python file lines light up bright red with compiler errors. I am still baffled by the fact that everyone is using old Python syntax, type-safety as an afterthought. Like I am new to Python and somehow it feels like I have to write almost everything because too many of the libraries are missing stubs or have serious issues. I mean, is it so hard for, OpenAI, Firecrawl-py, etc that if you give the function a Pydantic class, that it give back an instance of that class, and if you hand it a dict, you get back a dict? If you are streaming then you can use a small state machine to parse half-completed json and you use Pydantic alias generator so that the JSON is camelCase (JSON is inherently born from JS and this is best practice) but when you await the finished Pydantic class instance in Python, the fields are in snake_case as is idiomatic for Python.
Python still has poor generic templating (things like cannot specify the generic type of the function you are calling without it parsing it as array indexing, and it does not allow each overload to have its own code, because it has no idea which one will be used until runtime) and inheritance support (it only enforces checking if you mark as @final, and there is no way of requiring an instance passed in having been marked as final), but it is enough to do the trick. Well, at least it has come a long way, right? Pydantic is a life-saver, although under the hood it has some code I am not pleased with, and it had failed to implement certain cases, and the field aliasing is a nightmare - but I found a good workaround - use an alias generator and a switch statement inside of the generator with a case for each field.
Golden rule of LLMs, because they are extremely obtuse and get distracted easily (I believe next-token prediction to be merely a stepping stone). Use highly constrained schemas by rooting under the hood of Pydantic and cut down the number of choices it has and filter out noisy data (although there was a research paper recently "Let me speak freely", which argues against this, but does not seem to have used GBNF as a constraint method). The most frustrating thing is that I should not need to prompt an LLM to understand causality and the passage of time. The models try to emit all of the tool calls in one round, making bad assumptions about what the future state will be. Making them follow instructions is near impossible. But this gpt-oss:20b is more promising. We will have the hardware to run gpt-oss:120b in a couple of weeks, but my opinion is it will likely be a dud, not worth the 6x RAM and compute required compared to the 20b version.
I personally hope that all of these things get sorted. I mean there is no agent library I can download which has good general-purpose performance out of the box. I hope to change that. But OpenAI has just muddied the water with Harmony (ironic), there is no universally standardised LLM interface in Python, and the fact that I have to implement JSON schema support on my own is just wild. Again, the industry has billions of dollars and somehow they did not add support for JSON schema in llama.cpp when it's only a little over 300 lines of code. And I don't see how this code couldn't have been in there before, rendering the issue with gpt-oss to be only a matter of converting chat history to Harmony. Maybe they have kept a lot of features closed-source. Anyway, I want to be dealing with LLM concepts, not Python concepts. There are things like MCP (relatively clear) and A2A (can't even figure out at what level of abstraction is it meant to be at) and neither of these things have standardised the LLM interface.
One thing is clear, though: GBNF is the way. I see limitless possibilities with this, provided I can continue writing working compilers for it. It is also food for thought for how I was trying to train my own model (not next-token predictor) in my own time: I had issues formalising grammatical concepts for the training process, and this could be the thing I need. Ditching next-token should allow the inlining of classical functions (latch float inputs to closest binary state, one bit per input) to give LLMs extroadinary mathematical abilities without needing to execute Python or other script parsers. I would also wager similar techniques but for neural nets could solve the RSA problem. I also was struggling to understand how the Ollama JSON response format was implemented - it seemed like a mix of prompting, two-shot examples, and re-prompting. But GBNF, from what I can tell, eliminates illegal next tokens from the probability list (compiled into, meaning that it is relatively fail-safe and does not waste time having to go back and re-generate.
If anyone knows of other gems like GBNF, I would love to hear - I may be overlooking other great solutions.
@rick-github commented on GitHub (Aug 13, 2025):
Ollama use GBNF to implement structured outputs. The problem with applying it to reasoning models is that it currently also constrains the reasoning phase to the GBNF grammar, which compromises quality. Ideally the model should be allowed to consider the full gamut of probabilistic generation during reasoning, and only apply the GBNF grammer during content generation.
@Croups commented on GitHub (Aug 13, 2025):
same issue here, I noticed that sometimes it returns null while using it via pydantic-ai, I tested it without defining a structured output schema, it is generating the answer with tags here is a sample :
AgentRunResult(output='{"analysis":"The user says 'hi how are you', which is a greeting and question about how I am. The user wants to know what ChatGPT is. So respond with a friendly greeting and explanation.<|channel|>commentary:"} ')
This is why when you define a model, pydantic can't parse it.
@nicholas-johnson-techxcel commented on GitHub (Aug 15, 2025):
Can't it reason silently and not emit these symbols? Besides, I have been trying to disable reasoning as it just adds latency, and I can easily add reasoning fields to the json output of non-reasoning models if and when it makes sense for the application. Sometimes I have found it creates a better agent, other times it is just wasting electricity and time.
@rick-github commented on GitHub (Aug 15, 2025):
Models don't have an internal monologue or subconscious, all they do is probabilistically generate tokens. "Reasoning" models are trained to generate "thinking" tokens as a way to guide the generation of tokens in the response phase, but it's just tokens all the way down.
@adamoutler commented on GitHub (Aug 15, 2025):
Confirmed. Same issue. Any model except
gpt-ossseems to work with Structured Outputs.gpt-ossreturns a blank. I hope to see an adapter later in Ollama soon.@adamoutler commented on GitHub (Aug 15, 2025):
Is this being worked on? That thinking section on the json... It's a likely culprit and a good starting point.
@nicholas-johnson-techxcel commented on GitHub (Aug 20, 2025):
I did this in Python and llama.cpp but I since found in the Ollama source code a JSON=>GBNF compiler. You just need to hit llama.cpp with
grammar=gbnfand then it works, but because it is a reasoning model, it then becomes a bit unhinged and tries to insert reasoning into json fields instead of keeping reasoning internally.All Ollama had to do is use the grammar just like it already does and it would work to the extent which I get from llama.cpp (we still need to stop it from reasoning when we use
think=Falsewhich it ignores) but for some reason they seemed to have made an exception for this model and hence they broke it. If I get some time I can look at their code and give a patch.This also begs the question: if Ollama is a wrapper around llama.cpp then it could just become a python library which adds features to llama.cpp (basically a llama.cpp client library) and the actual Ollama server become a heap of scripts for running llama.cpp as a service and automatically pulling models down for it.
@mpauly commented on GitHub (Aug 20, 2025):
@nicholas-johnson-techxcel With regards to llama.cpp: there is an open issue for structured outputs in llama.cpp and things are mostly working.
Those changes would need to be merged into llama.cpp, and could then eventually trickle down/be ported to ollama
@ParthSareen commented on GitHub (Aug 20, 2025):
Hey @nicholas-johnson-techxcel @mpauly that's not how it works - we're not consuming any of the llama.cpp changes for structured outputs - although we do use GBNF.
You can't turn thinking "off" for this model - those tokens have to get generated as this model follows the Harmony format.
@adamoutler As mentioned above I have started working on this. It's not a trivial change unfortunately and needs some moving around of where our parsers current live. Thanks for your patience friends!
@nicholas-johnson-techxcel commented on GitHub (Aug 25, 2025):
I thought that llama.cpp did not have the structured output field, that you have to give GBNF, and it most certainly is working already, I now just compile JSON to GBNF and reluctantly make the root element to be
<thinking>.*</thinking>{json}or just<thinking></thinking>{json}if I want to disable reasoning, and I just capture the thinking tags in a state machine, emit chunk messages with role="thinking" for those (and handle any split messages) and either put the structured messages through a JSON stream parser, or accumulate them and then parse for tool calling.The way I see it, the issue is Ollama.
If anyone is wondering, forcing it to output
<thinking></thinking>does seem to properly disable reasoning - despite those saying it cannot be - it stops it from trying to cram reasoning into JSON fields, and this results in significant decreases in latency.This model still might be one of my favourites for local use, but it has still not caught up to 4o.
@CL415 commented on GitHub (Aug 26, 2025):
In case somebody needs to force GPT-OSS to output valid JSON despite Ollama's shortcomings like I had, I found some success using Pydantic AI
.run_sync, with the Ollama server asprovider, although using the retrial parameter since sometimes OSS does not comply on the first shot.@nicholas-johnson-techxcel commented on GitHub (Aug 27, 2025):
Okay in terms of progress on my side. It was working extremely well except I had to clamp the number of analysis/thinking chars to stop it rambling. But it seems that if I wrap the JSON-GBNF with GBNF for the harmony format, it no longer needs clamping. But I cannot force it with GBNF to emit tokens like <|end|> which is an issue, because then feeding it into the openai-harmony library encoder, it does not strip the harmony frames from it properly. It's not actually that hard to process streaming chunks using a state machine, but still, it would be best doing it properly. Anyone have any ideas on forcing <|end|> to be emitted with GBNF?
@erennyuksell commented on GitHub (Aug 29, 2025):
any reliable solution?
@steenharsted commented on GitHub (Aug 29, 2025):
I’m experiencing the same issue with
gpt-oss:20busingchat_ollama()andchat_structured()fromellmer. The model consistently returns truncated or non-JSON responses. This error persists even after:The output appears to be malformed almost every time.
Are there any plans to improve structured output compliance for
gpt-oss:20b?Thanks
@josemita87 commented on GitHub (Aug 29, 2025):
Same issue here with structured outputs...
@rick-github commented on GitHub (Aug 29, 2025):
https://github.com/ollama/ollama/issues/11691#issuecomment-3181220084
@rakadam commented on GitHub (Sep 1, 2025):
I have the same problem. I looked at the code and Ollama was designed in a way that the HarmonyParser runs on high level, while the sampler runs in the cpp code, with some go code for glue. And it is not possible to connect them, so the sampler cannot know when it is supposed to apply the grammar or not. Since the grammar is only valid inside the message, and Harmony formatting is outside the message, this is a big problem.
One not terribly insane solution, already mentioned in this thread: implementing a minimalistic Harmony parser in the go sampler glue code, so it knows when to enable the grammar constraining. Or this could be calling HarmonyParser basically in both layers.
@nicholas-johnson-techxcel commented on GitHub (Sep 2, 2025):
Okay got it done. The steps are:
/completions(not/v1/completions- this is not the same endpoint)enc.render_conversation_for_completion(Conversation.from_messages(messages), Role.ASSISTANT)from openai_harmony librarytokens=Truein bodygbnf["root"] = f'thinking-block "{tok_start}" "{tok_assistant}" "{tok_channel}" "final" "{tok_constrain}" "json" "{tok_message}" json-root "{tok_end}"'where the tok_X are like "<|start|>", etcthinking-blockis basically the same thing except withanalysisin the channel name, and any number of characters which are not "<" as the thought contentStreamableParser.process()one by oneStreamableParser.current_channelto know ifStreamableParser.last_content_deltawhere we cannot use it because it is contaminated with Harmony tags which should have been filtered out so instead we usecurrent_contentand diff that from last iteration to get the channel delta. Actually nevermind, turns out I have been using last_content_delta now, and it has stopped doing that.Finally the model has stopped being unhinged. It is extremely fast and consistent as an agent.
gpt-oss:20band I still doubt thegpt-oss:120bwill justify its size with performance, but I guess we will find out in a while once we have the 128GB Macbook Pro. Mine is only 64GB.@adamoutler commented on GitHub (Sep 2, 2025):
This caught my attention. I asked ChatGPT more about harmony format.
@MarioRicoIbanez commented on GitHub (Sep 3, 2025):
+1 having the same issue
@inf-bud commented on GitHub (Sep 3, 2025):
+1 having the same issue
@shahidazim commented on GitHub (Sep 3, 2025):
+1 having the same issue
@ParthSareen commented on GitHub (Sep 3, 2025):
Hey everyone! This is currently being worked on - trying to get it to y'all asap. https://github.com/ollama/ollama/pull/12052
@sheneman commented on GitHub (Sep 3, 2025):
@ParthSareen Thank you SO much!
@MarioRicoIbanez commented on GitHub (Sep 4, 2025):
@ParthSareen Thanks!
@kiwamizamurai commented on GitHub (Sep 4, 2025):
@ParthSareen so nice
@Seyid-cmd commented on GitHub (Sep 5, 2025):
@ParthSareen thanks
@ParthSareen commented on GitHub (Sep 6, 2025):
You guys can use this branch until I get it into main: https://github.com/ollama/ollama/tree/parth/gpt-oss-structured-outputs 😁
Would also love to know what you use structured outputs for if you do give the branch a shot
@adamoutler commented on GitHub (Sep 6, 2025):
Analyzing test results and reacting to binary/enum decisions.
How does it work? What did you do with the thinking?
@asabla commented on GitHub (Sep 6, 2025):
Mostly when interacting with LLMs I want to avoid writing too much fuzzy validation code (e.g making sure all needed data is there). Structured output is basically a very convenient shortcut for doing so. On top of that, most agentic frameworks for building reliable workflows, is using structured output under the hood for the same reasons.
Haven't had the time to test out the feature branch yet, but I'll get back to you when I've done so @ParthSareen
@sheneman commented on GitHub (Sep 6, 2025):
@ParthSareen The fix you implemented appears to generally work! Structured outputs with gpt-oss are working for me as expected and are present in the content field of the response. Reasoning traces are located in the "thinking" field. This is very improved behavior, and I am very grateful for your help! THANK YOU.
I did have a couple observations, as there still are some inconsistencies with responses from gpt-oss compared to other thinking models:
gpt-oss now provides separate reasoning traces, even if you specify "think": False. This is not a horrible default behavior, but technically it is incorrect and different than other thinking models (e.g. qwen3) which honor the "think" boolean for controlling thinking output as described here: https://ollama.com/blog/thinking
Other thinking models (qwen3) don't behave in the same way as gpt-oss:
a. If you set "thinking": False, there will be no thinking trace (correct for qwen, fails for gpt-oss)
b. If you use thinking mode and structured outputs with qwen3, it still will not emit a thinking trace (BUG)
@adamoutler commented on GitHub (Sep 7, 2025):
I believe with the harmony format, thinking is never turned off which is the problem we are experiencing here. I'm pretty sure this is not fixable on this particular model. It may be on the other one though.
@ParthSareen commented on GitHub (Sep 7, 2025):
@sheneman @adamoutler is correct. the thinking cannot be turned off for gpt-oss - you can only do
lowmedium, andhigh. And currently my PR only supports gpt-oss as a trial. Going to do thinking models as a whole next!@sheneman commented on GitHub (Sep 7, 2025):
@adamoutler @ParthSareen Thank you! While you can't actually turn off thinking in gpt-oss, you could set thinking to "low" and then suppress or mask the thinking trace. This would maintain response format compatibility with other thinking models. I could also see why you would prefer to output the thinking trace since its being generated anyway. It's easy enough to ignore if needed, so not a huge deal either way.
And Thank you @ParthSareen for now attacking the issue of structured outputs X thinking mode in the other models!!! With that, Ollama becomes so much more compelling for our organization!
@nicholas-johnson-techxcel commented on GitHub (Sep 8, 2025):
You can force it to emit an empty thinking tag using GBNF if you wish to save time.
@ParthSareen commented on GitHub (Sep 8, 2025):
You could but you're breaking the format the model was trained on. From experience the model is very sensitive to breaking the format which results in poor outputs. So your mileage may vary with that.
@vishalgoel2 commented on GitHub (Sep 14, 2025):
I tested the fix PR branch for structured outputs and it does improve things — simple structured outputs work now.
However, I’m running into mixed results when using it with
browser-use+gpt-oss:20b. With the release version of Ollama, it fails consistently with the familiarOn the fix branch, sometimes it works, but other times I see warnings like this in the logs:
and then
browser-useerrors out withSo it looks like the PR handles some structured output cases, but not all. Not sure yet if
browser-useis passing the tool schema in a way Ollama doesn’t expect, or if the fix still misses some scenarios.@trebor commented on GitHub (Sep 20, 2025):
i'm on ollama v0.12.0 and still seeing the issue. the query takes time, but returns with a zero-length response field. i'm happy to include payload and response text if that is helpful, but it is the typical prompt and json scheme in the format field.
@ParthSareen commented on GitHub (Sep 20, 2025):
Hi @trebor it's not released yet
@srshkmr commented on GitHub (Sep 24, 2025):
Hi @ParthSareen any ETA on the release? is there changes required on the PR?
@MarioRicoIbanez commented on GitHub (Sep 29, 2025):
Any news on when it will be released?
@AlexanderKozhevin commented on GitHub (Oct 2, 2025):
funny thing, structured output does work on Groq cloud
@ParthSareen commented on GitHub (Oct 2, 2025):
Had to make some updates to how we ran it. Just put up another PR. Aiming for next release.
@bogzbonny commented on GitHub (Oct 12, 2025):
haven't gotten it to work with ollama-rs calling on ollama 0.12.5 - I'm assuming https://github.com/ollama/ollama/pull/12460 doesn't actually fully resolve this issue but is only a stepping stone? (@ParthSareen)
@ParthSareen commented on GitHub (Oct 12, 2025):
Hmm it should be working... can you try running
ollama run gpt-oss --format json hello!and see if it shows thinking + the final output? if so it might be some weird client behavior
@vansatchen commented on GitHub (Oct 12, 2025):
ollama run gpt-oss:20b --format json hello!
We need to respond to ":", basically greet, friendly.Hello! 👋 How can I help you today?Thinking...
We responded.We are done.
...done thinking.
Hey there! What's on your mind today? 😊✨ "}Error: error parsing tool call: raw='Hey there! What’s on your mind today? 😊 <|constrain|> <|constrain|><|constrain|>.} ', err=invalid character 'H' looking for beginning of value
ollama -v
ollama version is 0.12.5
@sheneman commented on GitHub (Oct 12, 2025):
So just to be clear, the fix for this issue has not yet been merged to main, as of 0.12.5?
@ParthSareen commented on GitHub (Oct 12, 2025):
Ah I gave the wrong query @vansatchen @bogzbonny. Run just
ollama run gpt-oss --format jsonand then type something to the model.I haven't updated the generate endpoint yet there's some refactoring to do. Let me know if this doesn't work.
@vansatchen commented on GitHub (Oct 12, 2025):
@bogzbonny commented on GitHub (Oct 12, 2025):
@ParthSareen Okay cool, appreciated. I tried it and got similar output to @vansatchen I'm not sure how to feed a schema from the CLI but within ollama-rs it appears to be using the generate endpoints HENCE I think I'm still blocked on that endpoint refactor you mentioned get this operating.
(also https://github.com/ollama/ollama/pull/12460 was merged into 0.12.5 @sheneman if you look at the commit history)
@trebor commented on GitHub (Oct 16, 2025):
i have been testing ollama 0.12.5 with the most recent gpt-oss:20b, see below for specific examples. is this the expected behavior? is the change maybe still percolating through the system? am i calling it wrong?
curl commands i used to test:
curl 'http://localhost:11434/api/generate' --data-raw '{"model":"gpt-oss:20b","stream":false,"format":{"type":"integer","minimum":1,"maximum":10},"prompt":"choose a number\n"}'and am still seeing an empty response. for completeness:
{"model":"gpt-oss:20b","created_at":"2025-10-16T22:39:42.787141Z","response":"","done":true,"done_reason":"stop","context":[200006,17360,200008,3575,553,17554,162016,11,261,4410,6439,2359,22203,656,7788,17527,558,87447,100594,25,220,1323,19,12,3218,198,6576,3521,25,220,1323,20,12,702,12,1125,279,30377,289,25,14093,279,2,13888,18403,25,8450,11,49159,11,1721,13,21030,2804,413,7360,395,1753,3176,13,200007,200006,1428,200008,47312,261,2086,198,200007,200006,173781,16,220],"total_duration":639591959,"load_duration":152973209,"prompt_eval_count":71,"prompt_eval_duration":279207292,"eval_count":3,"eval_duration":50198457}%if i use qwen:14b, for example:
curl 'http://localhost:11434/api/generate' --data-raw '{"model":"qwen3:14b","stream":false,"format":{"type":"integer","minimum":1,"maximum":10},"prompt":"choose a number\n"}'i see what i would expect:
{"model":"qwen3:14b","created_at":"2025-10-16T22:45:54.903555Z","response":"8\n\n","done":true,"done_reason":"stop","context":[151644,872,198,27052,264,1372,198,151645,198,151644,77091,198,23,271],"total_duration":594441292,"load_duration":85178125,"prompt_eval_count":12,"prompt_eval_duration":394819125,"eval_count":3,"eval_duration":89476292}%@ParthSareen commented on GitHub (Oct 16, 2025):
hi @trebor sorry for the lack of documentation at the moment. It should work for the
/chatendpoint. Need to do some cleanup on/generatebefore we can support it there@trebor commented on GitHub (Oct 16, 2025):
oh got it, thank you!
@ParthSareen commented on GitHub (Oct 16, 2025):
Hey folks it should be out! Closing this issue. It'll work as expected with the
/chatendpoint./generatewill come at some point but might be a bit. Just wanted to unblock everyone!@dhicks commented on GitHub (Oct 16, 2025):
Could I suggest leaving this open until the issue has been resolved for
/generateas well?@trebor commented on GitHub (Oct 16, 2025):
btw: here is a minimal example curl that worked for me:
curl 'http://localhost:11434/api/chat' --data-raw '{"model":"gpt-oss:20b","stream":false,"format":{"type":"integer","minimum":1,"maximum":10},"messages":[{"content": "choose a number", "role": "user"}]}'huge thanks to @ParthSareen!
@jacksimpsoncartesian commented on GitHub (Oct 31, 2025):
So glad I found this - thought I was doing something wrong when I was getting the wrong structured outputs. Any word on whether this is likely to be fixed?
@sheneman commented on GitHub (Nov 23, 2025):
Hello @ParthSareen - Has this fix been addressed in /generate and pulled into the main branch? Thank you for your consideration.
@chakka-guna-sekhar-venkata-chennaiah commented on GitHub (Nov 26, 2025):
@sheneman
hey hi, but for its workign when i used the improt statement
from langchain_ollama import ChatOllamawhere i called model asits worked for me. In the reponse im getting json as
@bogzbonny commented on GitHub (Nov 27, 2025):
still doesn't work for me with (using
ollama-rs)@4IbWNsis3S commented on GitHub (Dec 7, 2025):
OpenAI harmony format used in OSS 20b and 120b breaks Ollama response handling. It's been over two-months since 20b and 120b were released and it's still broken in current/0.13.1