[GH-ISSUE #5919] ollama run (modelname) runs instruction-finetuned model #3695

Closed
opened 2026-04-12 14:30:38 -05:00 by GiteaMirror · 5 comments

Originally created by @d-kleine on GitHub (Jul 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5919

I just noticed that `ollama run llama3` runs **Llama 3 8B Instruct** (the instruction-finetuned variant) instead of **Llama 3 8B**:
https://ollama.com/library/llama3/blobs/6a0746a1ec1a

These are different models:
- **Llama 3 8B**: https://huggingface.co/meta-llama/Meta-Llama-3-8B
- **Llama 3 8B Instruct**: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

Would it be possible to fix this?

GiteaMirror added the model label 2026-04-12 14:30:38 -05:00

@rick-github commented on GitHub (Jul 24, 2024):

I don't know if it's policy, but it seems that the plain model name in the ollama library usually references the finetuned model, and the base model is usually `<model>:text` or `<model>:base`, e.g. [llama3:text](https://ollama.com/library/llama3:text).
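
A quick sketch of that convention (tags as of July 2024; verify against the library pages, since available tags change):

```sh
# plain tag -> instruction-finetuned variant (the default)
ollama run llama3

# :text tag -> the base, non-finetuned model
ollama run llama3:text
```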


@d-kleine commented on GitHub (Jul 24, 2024):

Oh, wow, thanks! Not very intuitive from my POV...


@Cephra commented on GitHub (Jul 24, 2024):

But it is indeed odd that there is a `:text` tag for [llama3](https://ollama.com/library/llama3/tags) while there isn't one for [llama3.1](https://ollama.com/library/llama3.1/tags). Maybe this is indeed an oversight?


@gwillen commented on GitHub (Jul 24, 2024):

I have been assuming that it's just because llama3.1 is still too new, and it takes time to do format conversion and quantization and so forth.

(Note that, in addition to defaulting to the instruct-tuned model, ollama also defaults to a 4-bit quantization of the model. I think this makes sense, because otherwise most people would waste a ton of bandwidth downloading the full-sized version, and then not be able to run it. But it's worth inspecting the specific quantization levels yourself; I don't recommend ever downloading the default option. NB: I am in no way affiliated with ollama, this is just my sense as a user.)
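
For readers who want to inspect before downloading, a minimal sketch (the explicit quantization tag below is illustrative; the real tag names are listed at https://ollama.com/library/llama3/tags):

```sh
# show metadata for a pulled model, including its quantization level
ollama show llama3

# pull an explicit quantization instead of the default
# (tag name is an example -- confirm it on the library tags page)
ollama pull llama3:8b-instruct-q8_0
```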


@d-kleine commented on GitHub (Jul 24, 2024):

> I have been assuming that it's just because llama3.1 is still too new, and it takes time to do format conversion and quantization and so forth.

Might be, but llama3.1 is already provided (`llama3.1`), and the instruction-finetuned variant most likely won't have a different architecture. So I don't know... and I would be surprised if this process isn't automated to some extent, to be honest.

> (Note that, in addition to defaulting to the instruct-tuned model, ollama also defaults to a 4-bit quantization of the model.

Yeah, it seems to be `q_4` by default.
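
One way to confirm the quantization locally is via the REST API (assuming a default server at localhost:11434; the quantization level appears in the `details` object of the response):

```sh
# query a local ollama server for model details,
# including the quantization level behind the default tag
curl http://localhost:11434/api/show -d '{"name": "llama3"}'
```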
