[GH-ISSUE #4101] Support NVIDIA's Llama fine-tune (ChatQA-1.5) #28308

Closed
opened 2026-04-22 06:20:24 -05:00 by GiteaMirror · 26 comments

Originally created by @DuckyBlender on GitHub (May 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4101

GiteaMirror added the model label 2026-04-22 06:20:25 -05:00

@DuckyBlender commented on GitHub (May 2, 2024):

ChatQA-1.5 has two variants: ChatQA-1.5-8B and ChatQA-1.5-70B.
Nvidia/ChatQA-1.5-70B: https://huggingface.co/nvidia/ChatQA-1.5-70B
Nvidia/ChatQA-1.5-8B: https://huggingface.co/nvidia/ChatQA-1.5-8B


@DuckyBlender commented on GitHub (May 2, 2024):

https://ollama.com/duckyblender/llama3-chatqa

gonna try to implement this one myself


@DuckyBlender commented on GitHub (May 2, 2024):

Damn, they updated the site LMAO. It's now nvidia/Llama3-ChatQA-1.5-8B because of the licence. This is actually so funny.


@thinkverse commented on GitHub (May 2, 2024):

> Can you even upload new models from huggingface to ollama?

Yes, you need to log in or create an Ollama account on ollama.com, copy the Ollama public key to that account, and use `ollama push`. 👍

For more step-by-step instructions I recommend reading the [import.md documentation file](https://github.com/ollama/ollama/blob/main/docs/import.md). 🙂
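
Sketched out, that flow is roughly as follows (a minimal sketch; the username, key path, and file names are illustrative, not exact):

```shell
# Build a local model from a GGUF file plus a Modelfile.
ollama create <your username>/llama3-chatqa -f ./Modelfile

# Copy your Ollama public key and add it to your ollama.com account
# (the user-level key is typically at ~/.ollama/id_ed25519.pub).
cat ~/.ollama/id_ed25519.pub

# Push the model to the registry under your username.
ollama push <your username>/llama3-chatqa
```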


@DuckyBlender commented on GitHub (May 2, 2024):

@thinkverse Thanks! How can I upload multiple .gguf files with different quantization levels?


@thinkverse commented on GitHub (May 2, 2024):

> How can I upload multiple .gguf files with different quantization levels?

You push a model with a tag just like you'd pull a different model with a tag.

```shell
ollama push <your username>/example:tag
```
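
So for several quantization levels, each tag gets its own Modelfile pointing at the corresponding GGUF (tag and file names below are illustrative):

```shell
# One Modelfile per quantization, one create+push per tag.
ollama create <your username>/llama3-chatqa:q4_K_M -f Modelfile.q4_K_M
ollama push <your username>/llama3-chatqa:q4_K_M

ollama create <your username>/llama3-chatqa:q8_0 -f Modelfile.q8_0
ollama push <your username>/llama3-chatqa:q8_0
```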

@DuckyBlender commented on GitHub (May 3, 2024):

![image](https://github.com/ollama/ollama/assets/42645784/b9398222-230a-4740-ae73-3c35e26909a1)
uh, something's wrong

@DuckyBlender commented on GitHub (May 3, 2024):

> uh, something's wrong

Turns out NVIDIA used a very strange chat template. Also, this model is specialized for RAG applications; maybe there should be a feature in Ollama that allows easier RAG?


@DuckyBlender commented on GitHub (May 3, 2024):

This is my `Modelfile`

```
FROM ./q4_K_M.gguf
TEMPLATE """{{ if .System }}System: {{ .System }}

{{ end }}{{ if .Prompt }}User: {{ .Prompt }}

{{ end }}Assistant: {{ .Response }}
"""
PARAMETER num_keep 24
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "User: <|begin_of_text|>"
PARAMETER stop "User:"
```

What am I doing wrong?
![image](https://github.com/ollama/ollama/assets/42645784/47abcdfc-ce9d-48a6-b0fb-b49d282259fa)

From NVIDIA's Huggingface repo:
![image](https://github.com/ollama/ollama/assets/42645784/1a241542-c809-430b-98a9-d549d21c98da)
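
One way to narrow this down (a debugging sketch, not from the thread; the model name is whatever you created locally) is to bypass the template with the API's `raw` mode and send NVIDIA's documented prompt format verbatim. If the raw prompt behaves but templated chat doesn't, the template is the culprit:

```shell
# "raw": true skips Ollama's prompt templating, so the prompt below reaches
# the model exactly as written (format per NVIDIA's model card screenshot).
curl http://localhost:11434/api/generate -d '{
  "model": "llama3-chatqa-test",
  "raw": true,
  "stream": false,
  "prompt": "System: This is a chat between a user and an artificial intelligence assistant.\n\nUser: What is the capital of France?\n\nAssistant:"
}'
```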

@Arcitec commented on GitHub (May 3, 2024):

@DuckyBlender

> maybe there should be a feature in Ollama that allows easier RAG?

Have you tried the RAG support here? https://github.com/open-webui/open-webui


@rickbeeloo commented on GitHub (May 5, 2024):

@DuckyBlender Did you manage to get ChatQA working?


@djdookie commented on GitHub (May 6, 2024):

There seems to be an issue with Llama 3 GGUF conversion. Related? https://github.com/ggerganov/llama.cpp/issues/7062
Also: https://www.reddit.com/r/LocalLLaMA/comments/1ckvx9l/part2_confirmed_possible_bug_llama3_gguf/


@DuckyBlender commented on GitHub (May 6, 2024):

> @DuckyBlender Did you manage to get ChatQA working?

Nope, waiting for more interest on this topic.


@sadaisystems commented on GitHub (May 6, 2024):

Support for ChatQA would be really nice; the model seems quite useful.


@thinkverse commented on GitHub (May 6, 2024):

> Nope, waiting for more interest on this topic.

Did you catch this discussion? https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B/discussions/5. There seems to be an issue with the `tokenizer_config.json` and the chat template; maybe that is causing the issue you were having.

I'm trying to pull down the weights myself, but Git doesn't seem to want to do so.


@DuckyBlender commented on GitHub (May 7, 2024):

> Did you catch this discussion? https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B/discussions/5. There seems to be an issue with the `tokenizer_config.json` and the chat template; maybe that is causing the issue you were having.

Yes, I used the correct chat template (the one I sent above).

> I'm trying to pull down the weights myself, but Git doesn't seem to want to do so.

You need to run `git lfs install` when downloading large files with Git. If you don't have it, you can install it with `sudo apt install git-lfs`.
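
Putting those steps together with the actual clone (repo URL as linked above; the apt package applies to Debian/Ubuntu):

```shell
sudo apt install git-lfs   # Debian/Ubuntu package name
git lfs install            # one-time LFS setup per user
git clone https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B   # large download
```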


@thinkverse commented on GitHub (May 7, 2024):

> You need to run `git lfs install` when downloading large files with Git.

I have LFS installed; I have cloned weights from HF before. 👍


@lazyracket commented on GitHub (May 8, 2024):

Found the available .GGUF files in the LM Studio community: [Llama3-ChatQA-1.5-8B-GGUF](https://huggingface.co/lmstudio-community/Llama3-ChatQA-1.5-8B-GGUF)

I tried to write a `Modelfile` to install it in Ollama, and it doesn't run very well; the conversation is always confusing. Maybe I wrote the `Modelfile` incorrectly...

However, on LM Studio, after following the step-by-step instructions on the model card to set up the software, there was no problem.


@lazyracket commented on GitHub (May 8, 2024):

> This is my `Modelfile`
>
> ```
> FROM ./q4_K_M.gguf
> TEMPLATE """{{ if .System }}System: {{ .System }}
>
> {{ end }}{{ if .Prompt }}User: {{ .Prompt }}
>
> {{ end }}Assistant: {{ .Response }}
> """
> PARAMETER num_keep 24
> PARAMETER stop "<|start_header_id|>"
> PARAMETER stop "<|eot_id|>"
> PARAMETER stop "User: <|begin_of_text|>"
> PARAMETER stop "User:"
> ```

I used the `Modelfile` you wrote: the dialogue is basically normal, but there are some problems with the summary text and so on. I'll test it more. Thank you very much.

On second thought, maybe I'm getting too hung up on `{context}`.
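
For what it's worth, NVIDIA's model card appends the retrieved context after the system message rather than giving it its own template slot, so one hedged way to feed `{context}` through Ollama is via the API's `system` field (model name and strings below are placeholders):

```shell
# Context rides along in the system message, per NVIDIA's documented format.
curl http://localhost:11434/api/generate -d '{
  "model": "<your local chatqa model>",
  "stream": false,
  "system": "This is a chat between a user and an AI assistant. The assistant gives helpful answers based on the context.\n\n<retrieved document text goes here>",
  "prompt": "What does the document say about the topic?"
}'
```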


@taozhiyuai commented on GitHub (May 9, 2024):

> > This is my `Modelfile`
> >
> > ```
> > FROM ./q4_K_M.gguf
> > TEMPLATE """{{ if .System }}System: {{ .System }}
> >
> > {{ end }}{{ if .Prompt }}User: {{ .Prompt }}
> >
> > {{ end }}Assistant: {{ .Response }}
> > """
> > PARAMETER num_keep 24
> > PARAMETER stop "<|start_header_id|>"
> > PARAMETER stop "<|eot_id|>"
> > PARAMETER stop "User: <|begin_of_text|>"
> > PARAMETER stop "User:"
> > ```
>
> I used the `Modelfile` you wrote: the dialogue is basically normal, but there are some problems with the summary text and so on. I'll test it more. Thank you very much.
>
> On second thought, maybe I'm getting too hung up on `{context}`.

I've tried this template; as you said, it is not perfect.


@taozhiyuai commented on GitHub (May 9, 2024):

> Found the available .GGUF files in the LM Studio community: [Llama3-ChatQA-1.5-8B-GGUF](https://huggingface.co/lmstudio-community/Llama3-ChatQA-1.5-8B-GGUF)
>
> I tried to write a `Modelfile` to install it in Ollama, and it doesn't run very well; the conversation is always confusing. Maybe I wrote the `Modelfile` incorrectly...
>
> However, on LM Studio, after following the step-by-step instructions on the model card to set up the software, there was no problem.

In LM Studio, what preset do you pick?


@lazyracket commented on GitHub (May 9, 2024):

> In LM Studio, what preset do you pick?

Download from [Llama3-ChatQA-1.5-8B-GGUF](https://huggingface.co/lmstudio-community/Llama3-ChatQA-1.5-8B-GGUF) and follow its prompts:

**Choose the `LM Studio Blank Preset`**
`System Message Prefix`: 'System: '
`User Message Prefix`: '\n\nUser: '
`User Message Suffix`: '\n\nAssistant: <|begin_of_text|>'

_(if you want to provide context)_
`System Message Suffix`: '\n\n{context}'
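
Translated back into an Ollama `Modelfile`, that preset would look roughly like this (a sketch only; the GGUF path and model name are placeholders):

```shell
# Write the Modelfile via a quoted heredoc so {{ }} is not expanded by the shell.
cat > Modelfile <<'EOF'
FROM ./Llama3-ChatQA-1.5-8B-Q4_K_M.gguf
TEMPLATE """{{ if .System }}System: {{ .System }}

{{ end }}{{ if .Prompt }}User: {{ .Prompt }}

{{ end }}Assistant: <|begin_of_text|>{{ .Response }}"""
EOF
ollama create llama3-chatqa-lmstudio -f Modelfile
```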


@thinkverse commented on GitHub (May 11, 2024):

The Ollama team released a ChatQA version to the registry a few hours ago - https://ollama.com/library/llama3-chatqa, as part of the [0.1.35 release](https://github.com/ollama/ollama/releases/tag/v0.1.35).


@spflueger commented on GitHub (May 11, 2024):

Hi, I couldn't get this to run nicely. I'm a bit confused about how to provide the context. Shouldn't the template look more like:

```
TEMPLATE """{{ if .System }}System: {{ .System }}

{{ end }}{{ if .Prompt }}{{ .Prompt }}

{{ end }}Assistant: {{ .Response }}
"""
```

And you would provide the context at the beginning of the prompt, together with the user query, according to that official template.
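
Under that reading (a sketch of the suggestion above, with placeholder strings), the caller assembles the context and the `User:` turn inside the prompt, and the template only adds the `System:`/`Assistant:` scaffolding:

```shell
# The template leaves the prompt body to the caller, so context goes first.
curl http://localhost:11434/api/generate -d '{
  "model": "<your chatqa model>",
  "stream": false,
  "prompt": "<retrieved context here>\n\nUser: <question here>"
}'
```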


@lazyracket commented on GitHub (May 11, 2024):

> The Ollama team released a ChatQA version to the registry a few hours ago - https://ollama.com/library/llama3-chatqa, as part of the [0.1.35 release](https://github.com/ollama/ollama/releases/tag/v0.1.35).

Cool. I upgraded Ollama and pulled this model, and it's working.

I noticed how the `template` part is written, and there is no `params` part:

```
{{ if .System }}System: {{ .System }}
{{ end }}{{ if .Prompt }}User: {{ .Prompt }}
{{ end }}Assistant: <|begin_of_text|>{{ .Response }}
```
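
For anyone landing here later, the library model can be pulled directly (tag names as listed on the model page at the time):

```shell
ollama pull llama3-chatqa        # 8B by default
ollama pull llama3-chatqa:70b    # larger variant
ollama run llama3-chatqa "What is retrieval-augmented generation?"
```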

@DuckyBlender commented on GitHub (May 11, 2024):

Closing this issue since Ollama v0.1.35 supports ChatQA.

Reference: github-starred/ollama#28308