[GH-ISSUE #13157] Which quantization do Ollama cloud models use? #8703

Closed
opened 2026-04-12 21:28:35 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @MertcanTekin on GitHub (Nov 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13157

I'm wondering if I'll get the same performance running Ollama models locally versus on their cloud service. For example, are ollama run gpt-oss:120b-cloud and ollama run gpt-oss:120b the same quantized models, or does the cloud version use less-quantized (higher quality) models?

GiteaMirror added the feature request label 2026-04-12 21:28:35 -05:00

@rick-github commented on GitHub (Nov 19, 2025):

```
glm-4.6                 FP8
kimi-k2:1t              FP8
kimi-k2-thinking        INT4
qwen3-coder:480b        BF16
deepseek-v3.1:671b      FP8
gpt-oss:120b            MXFP4
gpt-oss:20b             MXFP4
qwen3-vl:235b-instruct  FP8
qwen3-vl:235b           BF16
minimax-m2              unknown
gemini-3-pro-preview    unknown
```

@yuch85 commented on GitHub (Jan 9, 2026):

Is there somewhere this info can be found? I have the same question, but for nemotron-3-nano:30b-cloud.


@joetifa2003 commented on GitHub (Mar 26, 2026):

Is this info public somewhere?

Why not show this on the website?


@rick-github commented on GitHub (Mar 26, 2026):

```shell
(
  echo "|model|size|context|quant|capabilities|"
  echo "|--|--|--|--|--|"
  for i in $(curl -s https://ollama.com/api/tags | jq -r '.models[].name') ; do
    curl -s https://ollama.com/api/show \
      -H "Authorization: Bearer $OLLAMA_API_KEY" \
      -d '{"model":"'$i'"}' \
      | jq -r '"|['$i'](https://ollama.com/library/'$i')|\(.details.parameter_size|tonumber/1e9|tostring|split(".")[0])b|\(.model_info|."general.architecture" as $arch | ."\($arch).context_length")|\(.details.quantization_level)|\(.capabilities|sort|join(","))|"'
  done | sort
)
```
|model|size|context|quant|capabilities|
|--|--|--|--|--|
|[cogito-2.1:671b](https://ollama.com/library/cogito-2.1:671b)|671b|163840|FP8|completion,thinking,tools|
|[deepseek-v3.1:671b](https://ollama.com/library/deepseek-v3.1:671b)|671b|163840|FP8|completion,thinking,tools|
|[deepseek-v3.2](https://ollama.com/library/deepseek-v3.2)|671b|163840|FP8|completion,thinking,tools|
|[devstral-2:123b](https://ollama.com/library/devstral-2:123b)|123b|262144|FP8|completion,tools|
|[devstral-small-2:24b](https://ollama.com/library/devstral-small-2:24b)|24b|262144|FP8|completion,tools,vision|
|[gemini-3-flash-preview](https://ollama.com/library/gemini-3-flash-preview)|0b|1048576||completion,thinking,tools|
|[gemma3:12b](https://ollama.com/library/gemma3:12b)|12b|131072|BF16|completion,vision|
|[gemma3:27b](https://ollama.com/library/gemma3:27b)|27b|131072|BF16|completion,vision|
|[gemma3:4b](https://ollama.com/library/gemma3:4b)|4b|131072|BF16|completion,vision|
|[gemma4:31b](https://ollama.com/library/gemma4:31b)|32b|262144|BF16|completion,thinking,tools,vision|
|[glm-4.6](https://ollama.com/library/glm-4.6)|357b|202752|FP8|completion,thinking,tools|
|[glm-4.7](https://ollama.com/library/glm-4.7)|357b|202752|FP8|completion,thinking,tools|
|[glm-5](https://ollama.com/library/glm-5)|756b|202752|FP8|completion,thinking,tools|
|[gpt-oss:120b](https://ollama.com/library/gpt-oss:120b)|116b|131072|MXFP4|completion,thinking,tools|
|[gpt-oss:20b](https://ollama.com/library/gpt-oss:20b)|20b|131072|MXFP4|completion,thinking,tools|
|[kimi-k2:1t](https://ollama.com/library/kimi-k2:1t)|1042b|262144|FP8|completion,tools|
|[kimi-k2.5](https://ollama.com/library/kimi-k2.5)|1042b|262144|INT4|completion,thinking,tools,vision|
|[kimi-k2-thinking](https://ollama.com/library/kimi-k2-thinking)|1042b|262144|INT4|completion,thinking,tools|
|[minimax-m2.1](https://ollama.com/library/minimax-m2.1)|230b|204800|FP8|completion,thinking,tools|
|[minimax-m2.5](https://ollama.com/library/minimax-m2.5)|230b|204800|FP8|completion,thinking,tools|
|[minimax-m2.7](https://ollama.com/library/minimax-m2.7)|0b|204800||completion,thinking,tools|
|[minimax-m2](https://ollama.com/library/minimax-m2)|230b|204800||completion,tools|
|[ministral-3:14b](https://ollama.com/library/ministral-3:14b)|14b|262144|FP8|completion,tools,vision|
|[ministral-3:3b](https://ollama.com/library/ministral-3:3b)|3b|262144|FP8|completion,tools,vision|
|[ministral-3:8b](https://ollama.com/library/ministral-3:8b)|8b|262144|FP8|completion,tools,vision|
|[mistral-large-3:675b](https://ollama.com/library/mistral-large-3:675b)|675b|262144|FP8|completion,tools,vision|
|[nemotron-3-nano:30b](https://ollama.com/library/nemotron-3-nano:30b)|32b|1048576|FP8|completion,thinking,tools|
|[nemotron-3-super](https://ollama.com/library/nemotron-3-super)|120b|262144|NVFP4|completion,thinking,tools|
|[qwen3.5:397b](https://ollama.com/library/qwen3.5:397b)|397b|262144|BF16|completion,thinking,tools,vision|
|[qwen3-coder:480b](https://ollama.com/library/qwen3-coder:480b)|480b|262144|FP8|completion,tools|
|[qwen3-coder-next](https://ollama.com/library/qwen3-coder-next)|80b|262144|FP8|completion,tools|
|[qwen3-next:80b](https://ollama.com/library/qwen3-next:80b)|80b|262144|FP8|completion,thinking,tools|
|[qwen3-vl:235b](https://ollama.com/library/qwen3-vl:235b)|235b|262144|BF16|completion,thinking,tools,vision|
|[qwen3-vl:235b-instruct](https://ollama.com/library/qwen3-vl:235b-instruct)|235b|262144|FP8|completion,tools,vision|
|[rnj-1:8b](https://ollama.com/library/rnj-1:8b)|8b|32768|FP16|completion,tools|
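For a single model, the same `/api/show` endpoint can be queried directly. A minimal sketch of pulling out just the quantization field (the model name here is only an example, and `$OLLAMA_API_KEY` must be set, as in the command above):

```shell
# Ask ollama.com for one model's metadata and print its quantization level.
# The jq filter assumes the same response shape the table-building command uses.
curl -s https://ollama.com/api/show \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{"model":"gpt-oss:120b"}' \
  | jq -r '.details.quantization_level'
```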

@ferenci84 commented on GitHub (Apr 5, 2026):

It may be helpful to clarify, if not already apparent from the above, that you can query this info for a single model from the command line:

```
ollama show glm-5:cloud
```
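Building on that tip, the output of `ollama show` can be filtered in the usual way. A sketch, assuming the output includes a quantization line as it does for local models:

```shell
# Print only the quantization line from the model details.
ollama show glm-5:cloud | grep -i quantization
```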

Reference: github-starred/ollama#8703