[GH-ISSUE #320] Cannot create a model based on llama2:70b #62178

Closed
opened 2026-05-03 07:45:40 -05:00 by GiteaMirror · 6 comments

Originally created by @asarturas on GitHub (Aug 10, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/320

If we change the example [devops-engineer](https://github.com/jmorganca/ollama/blob/main/examples/devops-engineer/Modelfile) model slightly to use the 70b model instead of 13b, like:

```
# Modelfile for creating a devops engineer assistant
# Run `ollama create devops-engineer -f ./Modelfile` and then `ollama run devops-engineer` and enter a topic

FROM llama2:13b
PARAMETER temperature 1
SYSTEM """
You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible
"""
```

Then `ollama create` generates everything fine, but the model fails at runtime with an error:

```
$ ollama run devops
>>> hello
Error: failed to load model
For more details, check the error logs at /Users/ollama/.ollama/logs/server.log
```

and the diagnostics are:

```
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected  8192 x  8192, got  8192 x  1024
llama_load_model_from_file: failed to load model
```

while the same Modelfile with the originally used 13b works fine.
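As an editorial aside (not from the thread itself): the 1024 in the error is consistent with grouped-query attention, where the key/value projections shrink by a factor of `n_gqa`. Assuming the 70b model's reported `n_embd = 8192` and `n_gqa = 8`, a quick sanity check:

```python
# Hypothetical check: under grouped-query attention (GQA), the key
# projection maps n_embd -> n_embd / n_gqa dimensions, so a loader
# that assumes standard multi-head attention expects the full width.
n_embd = 8192  # embedding width of the 70b model
n_gqa = 8      # query heads per KV head for llama2:70b

kv_dim = n_embd // n_gqa
print(kv_dim)  # matches the "got 8192 x 1024" in the error
```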

GiteaMirror added the question label 2026-05-03 07:45:40 -05:00

@mxyng commented on GitHub (Aug 10, 2023):

@asarturas in your Modelfile, did you mean `FROM llama2:70b`?

The error you're seeing is likely due to `num_gqa` not being set correctly. The value should be correct in the upstream 70b model, so I would first try `ollama pull llama2:70b` and then recreate the `devops-engineer` model.
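A hedged sketch (not from the thread itself): with the re-pulled upstream tag carrying the correct `num_gqa`, the fixed Modelfile would just swap the `FROM` line:

```
FROM llama2:70b
PARAMETER temperature 1
SYSTEM """
You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible
"""
```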


@asarturas commented on GitHub (Aug 10, 2023):

@mxyng brilliant, that works, thank you. Does this documentation tweak look legit? https://github.com/jmorganca/ollama/pull/326


@technovangelist commented on GitHub (Aug 23, 2023):

I see that #326 was merged, resolving everything from this issue. Thanks so much for the PR and identifying the issue. I'll go ahead and close this.


@iam4x commented on GitHub (Sep 5, 2023):

I'm running into the same issue:

```
FROM codellama:34b-instruct
PARAMETER temperature 0.2
```

Changing the temperature triggers the same error. I've tried adding:

`PARAMETER num_gqa [1|2|3|4]`

That didn't change anything; I also made sure to re-pull `codellama:34b-instruct`.

Should I open a new issue?


@iam4x commented on GitHub (Sep 5, 2023):

From the 7B WizardLM I can do this and it works:

```
FROM wizardlm-uncensored:latest

PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER top_p 0.95
PARAMETER repetition_penalty 1.15
PARAMETER repeat_last_n -1
```

@iam4x commented on GitHub (Sep 5, 2023):

Running the base model with `--verbose` and checking the log after exit surfaces the default params:

```
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_head_kv  = 8
llama_model_load_internal: n_layer    = 48
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 8
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: freq_base  = 1000000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 34B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 18168.87 MB (+  384.00 MB per state)
```

- We can see `num_gqa` should be set to 8
- Using this value in the Modelfile works now
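An editorial aside (not from the thread itself): the 8 also falls straight out of the log above, since the number of query groups is the ratio of attention heads to KV heads:

```python
# Values reported by llama_model_load_internal for codellama:34b-instruct
n_head = 64     # attention heads
n_head_kv = 8   # key/value heads

n_gqa = n_head // n_head_kv  # query heads sharing each KV head
print(n_gqa)  # the num_gqa value to set in the Modelfile
```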

Edit ->

"Working" is a big word; it at least starts replying to my commands, but after 2-3 lines of output it fails and prints only blank lines / line breaks.

![image](https://github.com/jmorganca/ollama/assets/893837/72ebae99-e5ee-4c79-9b92-cba1aa5925de)

(It can't really be seen on the screenshot, but all the black/blank area is new lines emitted by the LLM reply, and it doesn't stop itself.)
