[GH-ISSUE #13157] which quantization ollama cloud models use #55216

Closed
opened 2026-04-29 08:31:36 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @MertcanTekin on GitHub (Nov 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13157

I'm wondering if I'll get the same performance running Ollama models locally versus on their cloud service. For example, are ollama run gpt-oss:120b-cloud and ollama run gpt-oss:120b the same quantized models, or does the cloud version use less-quantized (higher quality) models?

GiteaMirror added the feature request label 2026-04-29 08:31:36 -05:00

@rick-github commented on GitHub (Nov 19, 2025):

```
glm-4.6                 FP8
kimi-k2:1t              FP8
kimi-k2-thinking        INT4
qwen3-coder:480b        BF16
deepseek-v3.1:671b      FP8
gpt-oss:120b            MXFP4
gpt-oss:20b             MXFP4
qwen3-vl:235b-instruct  FP8
qwen3-vl:235b           BF16
minimax-m2              unknown
gemini-3-pro-preview    unknown
```

@yuch85 commented on GitHub (Jan 9, 2026):

Is there somewhere this info can be found? I have the same query, but for nemotron-3-nano:30b-cloud.


@joetifa2003 commented on GitHub (Mar 26, 2026):

Is this info public somewhere?

Why not show this on the website?


@rick-github commented on GitHub (Mar 26, 2026):

```shell
(
  echo "|model|size|context|quant|capabilities|"
  echo "|--|--|--|--|--|"
  for i in $(curl -s https://ollama.com/api/tags | jq -r '.models[].name'); do
    curl -s https://ollama.com/api/show -H "Authorization: Bearer $OLLAMA_API_KEY" -d '{"model":"'$i'"}' |
      jq -r '"|['$i'](https://ollama.com/library/'$i')|\(.details.parameter_size|tonumber/1e9|tostring|split(".")[0])b|\(.model_info|."general.architecture" as $arch | ."\($arch).context_length")|\(.details.quantization_level)|\(.capabilities|sort|join(","))|"'
  done | sort
)
```
|model|size|context|quant|capabilities|
|--|--|--|--|--|
|[cogito-2.1:671b](https://ollama.com/library/cogito-2.1:671b)|671b|163840|FP8|completion,thinking,tools|
|[deepseek-v3.1:671b](https://ollama.com/library/deepseek-v3.1:671b)|671b|163840|FP8|completion,thinking,tools|
|[deepseek-v3.2](https://ollama.com/library/deepseek-v3.2)|671b|163840|FP8|completion,thinking,tools|
|[devstral-2:123b](https://ollama.com/library/devstral-2:123b)|123b|262144|FP8|completion,tools|
|[devstral-small-2:24b](https://ollama.com/library/devstral-small-2:24b)|24b|262144|FP8|completion,tools,vision|
|[gemini-3-flash-preview](https://ollama.com/library/gemini-3-flash-preview)|0b|1048576||completion,thinking,tools|
|[gemma3:12b](https://ollama.com/library/gemma3:12b)|12b|131072|BF16|completion,vision|
|[gemma3:27b](https://ollama.com/library/gemma3:27b)|27b|131072|BF16|completion,vision|
|[gemma3:4b](https://ollama.com/library/gemma3:4b)|4b|131072|BF16|completion,vision|
|[gemma4:31b](https://ollama.com/library/gemma4:31b)|32b|262144|BF16|completion,thinking,tools,vision|
|[glm-4.6](https://ollama.com/library/glm-4.6)|357b|202752|FP8|completion,thinking,tools|
|[glm-4.7](https://ollama.com/library/glm-4.7)|357b|202752|FP8|completion,thinking,tools|
|[glm-5](https://ollama.com/library/glm-5)|756b|202752|FP8|completion,thinking,tools|
|[gpt-oss:120b](https://ollama.com/library/gpt-oss:120b)|116b|131072|MXFP4|completion,thinking,tools|
|[gpt-oss:20b](https://ollama.com/library/gpt-oss:20b)|20b|131072|MXFP4|completion,thinking,tools|
|[kimi-k2:1t](https://ollama.com/library/kimi-k2:1t)|1042b|262144|FP8|completion,tools|
|[kimi-k2.5](https://ollama.com/library/kimi-k2.5)|1042b|262144|INT4|completion,thinking,tools,vision|
|[kimi-k2-thinking](https://ollama.com/library/kimi-k2-thinking)|1042b|262144|INT4|completion,thinking,tools|
|[minimax-m2.1](https://ollama.com/library/minimax-m2.1)|230b|204800|FP8|completion,thinking,tools|
|[minimax-m2.5](https://ollama.com/library/minimax-m2.5)|230b|204800|FP8|completion,thinking,tools|
|[minimax-m2.7](https://ollama.com/library/minimax-m2.7)|0b|204800||completion,thinking,tools|
|[minimax-m2](https://ollama.com/library/minimax-m2)|230b|204800||completion,tools|
|[ministral-3:14b](https://ollama.com/library/ministral-3:14b)|14b|262144|FP8|completion,tools,vision|
|[ministral-3:3b](https://ollama.com/library/ministral-3:3b)|3b|262144|FP8|completion,tools,vision|
|[ministral-3:8b](https://ollama.com/library/ministral-3:8b)|8b|262144|FP8|completion,tools,vision|
|[mistral-large-3:675b](https://ollama.com/library/mistral-large-3:675b)|675b|262144|FP8|completion,tools,vision|
|[nemotron-3-nano:30b](https://ollama.com/library/nemotron-3-nano:30b)|32b|1048576|FP8|completion,thinking,tools|
|[nemotron-3-super](https://ollama.com/library/nemotron-3-super)|120b|262144|NVFP4|completion,thinking,tools|
|[qwen3.5:397b](https://ollama.com/library/qwen3.5:397b)|397b|262144|BF16|completion,thinking,tools,vision|
|[qwen3-coder:480b](https://ollama.com/library/qwen3-coder:480b)|480b|262144|FP8|completion,tools|
|[qwen3-coder-next](https://ollama.com/library/qwen3-coder-next)|80b|262144|FP8|completion,tools|
|[qwen3-next:80b](https://ollama.com/library/qwen3-next:80b)|80b|262144|FP8|completion,thinking,tools|
|[qwen3-vl:235b](https://ollama.com/library/qwen3-vl:235b)|235b|262144|BF16|completion,thinking,tools,vision|
|[qwen3-vl:235b-instruct](https://ollama.com/library/qwen3-vl:235b-instruct)|235b|262144|FP8|completion,tools,vision|
|[rnj-1:8b](https://ollama.com/library/rnj-1:8b)|8b|32768|FP16|completion,tools|

@ferenci84 commented on GitHub (Apr 5, 2026):

I think it's helpful to clarify, in case it's not immediately apparent from the above, that you can query this info for a single model from the command line:

```
ollama show glm-5:cloud
```

@weathon commented on GitHub (Apr 25, 2026):

what about glm-5.1


@themw123 commented on GitHub (Apr 25, 2026):

> what about glm-5.1

It's in FP8

```
PS C:\Users\marvi> ollama show glm-5.1:cloud
  Model
    architecture        glm5.1
    parameters          756162687872
    context length      202752
    embedding length    6144
    quantization        FP8

  Capabilities
    thinking
    completion
    tools
```

@themw123 commented on GitHub (Apr 25, 2026):

`ollama show minimax-m2.7:cloud` does not yet show the quantization of minimax-m2.7.


@weathon commented on GitHub (Apr 25, 2026):

Hmmm, why did mine not show it with the CLI?

```bash
(base) wg25r:@MacBook-Pro:/Users/wenqiguo => ollama show glm-5.1:cloud
  Model
    Remote model    glm-5.1                   
    Remote URL      https://ollama.com:443    
    quantization                              

  Capabilities
    completion    
    tools         
    thinking      
```
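When the local CLI leaves the quantization field blank for a cloud model like this, the ollama.com `/api/show` endpoint used by the one-liner earlier in the thread still returns it. A minimal Python sketch of building that request; the `show_request` helper and the placeholder API key are illustrative, not part of Ollama's tooling:

```python
import json
import urllib.request

def show_request(model: str, api_key: str) -> urllib.request.Request:
    """Build the ollama.com /api/show request (same endpoint, header,
    and payload as the curl one-liner earlier in the thread)."""
    return urllib.request.Request(
        "https://ollama.com/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )

req = show_request("glm-5.1", "YOUR_API_KEY")
print(req.full_url)  # https://ollama.com/api/show
# To actually fetch the quantization (requires a valid key and network):
# resp = json.load(urllib.request.urlopen(req))
# print(resp["details"]["quantization_level"])
```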

@jhult commented on GitHub (Apr 28, 2026):

I feel like the table posted by @rick-github would be better than the following on https://ollama.com/pricing:

> **What quantization or data format do cloud models use?**
> Native weights, as released by the model provider. On modern NVIDIA hardware, models may use accelerated data formats supported by Blackwell and Vera Rubin architectures (e.g. NVFP4).


<img width="775" height="537" alt="Image" src="https://github.com/user-attachments/assets/a20e97f1-31cf-4131-81bf-b7bf72f42518" />

@jhult commented on GitHub (Apr 29, 2026):

Related repo: https://github.com/EndoTheDev/OMeter


Reference: github-starred/ollama#55216