[GH-ISSUE #216] Something might still be wrong with K-Quant #62126

Closed
opened 2026-05-03 07:36:56 -05:00 by GiteaMirror · 3 comments

Originally created by @nkoehring on GitHub (Jul 26, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/216

When I run a 30B model (in this case upstage-llama-30b-instruct-2048.ggmlv3.q5_K_M.bin) the debug output in ollama talks about a 13B model size:
![Screenshot from 2023-07-26 13-07-51](https://github.com/jmorganca/ollama/assets/246402/36bb44f1-a534-44ae-94bb-3e87d7ce5a74)

when running the same model with llama.cpp it outputs the correct size:
![Screenshot from 2023-07-26 13-11-07](https://github.com/jmorganca/ollama/assets/246402/2eb0621f-683b-4ea7-82a0-5aedf8292a03)

I tested with a 13B model and the output seems correct. Both models seem to work (as in generating output).
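One way to sanity-check which weights were actually loaded is to read the hyperparameters straight from the GGML file header instead of trusting the printed label: for LLaMA-family models, `n_layer = 40` corresponds to 13B and `n_layer = 60` to 30B. A minimal sketch, assuming the GGJT v3 header layout (4-byte magic, then little-endian uint32 fields `version`, `n_vocab`, `n_embd`, `n_mult`, `n_head`, `n_layer`, ...); the helper name is made up:

```shell
# read_n_layer FILE — print the n_layer field of a GGJT-style header.
# Assumed layout: magic(4) version(4) n_vocab(4) n_embd(4) n_mult(4)
# n_head(4) n_layer(4), all little-endian uint32, so n_layer sits at
# byte offset 24.
read_n_layer() {
  od -An -t u4 -j 24 -N 4 "$1" | tr -d ' '
}
```

Running `read_n_layer upstage-llama-30b-instruct-2048.ggmlv3.q5_K_M.bin` should print 60 for a 30B model and 40 for a 13B one.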

GiteaMirror added the bug label 2026-05-03 07:36:56 -05:00

@mxyng commented on GitHub (Jul 26, 2023):

@nkoehring can you confirm the Modelfile you're using for upstage_llama_30b references the correct file on disk? You can also compare the SHA256 of the file with the blob path; they should match.
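The suggested check can be scripted: ollama stores each blob under a filename of the form `sha256-<digest>`, so the digest of the original `.bin` should appear verbatim in the blob path. A hedged sketch (`check_blob` is a made-up helper; the blob directory shown in the usage line is the default `~/.ollama/models/blobs`):

```shell
# check_blob FILE BLOB_PATH — succeed when FILE's SHA256 digest matches
# the ollama-style blob name "sha256-<digest>" at the end of BLOB_PATH.
check_blob() {
  local digest
  digest=$(sha256sum "$1" | awk '{print $1}')
  [ "sha256-$digest" = "$(basename "$2")" ]
}
```

For example: `check_blob ~/Downloads/model.ggmlv3.q5_K_M.bin ~/.ollama/models/blobs/sha256-00e2f30c... && echo match` — a mismatch would mean the Modelfile pointed at a different file than expected.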


@mxyng commented on GitHub (Jul 26, 2023):

This is what I see using the [same](https://huggingface.co/TheBloke/upstage-llama-30b-instruct-2048-GGML/tree/main) model:

```
2023/07/26 20:12:50 images.go:208: [model] - C:\Users\michael_yang\Downloads\upstage-llama-30b-instruct-2048.ggmlv3.q5_K_M.bin
[GIN] 2023/07/26 - 20:25:05 | 200 |        12m14s |       127.0.0.1 | POST     "/api/create"
llama.cpp: loading model from C:\Users\michael_yang\.ollama\models\blobs\sha256-00e2f30c83fb9230da3e54de681bfb9514b5b1f73a932fafcd7a01b058515e40
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 17 (mostly Q5_K - Medium)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.14 MB
llama_model_load_internal: mem required  = 24238.87 MB (+ 3124.00 MB per state)
llama_new_context_with_model: kv self size  = 3120.00 MB
```
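As an aside, the `model size = 30B` line above is not read directly from the file; ggml-era llama.cpp inferred the label from `n_layer`. A sketch of that mapping (treat the exact table as an assumption about that era's source):

```shell
# model_size_from_layers N_LAYER — the size label llama.cpp would print,
# inferred from the layer count of a LLaMA-family model.
model_size_from_layers() {
  case "$1" in
    26) echo "3B"  ;;
    32) echo "7B"  ;;
    40) echo "13B" ;;
    60) echo "30B" ;;
    80) echo "65B" ;;
    *)  echo "unknown" ;;
  esac
}
```

Here `model_size_from_layers 60` prints `30B`, matching `n_layer = 60` in the log; a file reported as 13B would have `n_layer = 40`.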

Notice the SHA256 hash is different:

  • your screenshot shows `6c6636176e21820adfc6781dee5b60d8385deea57fcc1d880930a401f339f95f` (there may be some transcription errors)
  • my log output shows `00e2f30c83fb9230da3e54de681bfb9514b5b1f73a932fafcd7a01b058515e40`.

@mxyng commented on GitHub (Aug 2, 2023):

Closing this as I can't reproduce it. If there are more issues, please create a new issue.
