[GH-ISSUE #12506] Gibberish output at 0.12.3 #8303

Closed
opened 2026-04-12 20:51:21 -05:00 by GiteaMirror · 32 comments
Owner

Originally created by @kbradsha on GitHub (Oct 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12506

What is the issue?

As the title suggests, I've tested multiple models:

Qwen3-480B
Qwen3-30B
GLM-4.5-Air

A downgrade to my last known good version resolves the issue.
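
For reference, a specific older release can be pinned on Linux via the install script's OLLAMA_VERSION variable (a sketch based on Ollama's Linux troubleshooting guide; 0.11.10 is the last known good version reported here):

```shell
# Roll back to a known-good release by pinning OLLAMA_VERSION
# before running the official install script.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.11.10 sh
ollama --version   # should now report 0.11.10
```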

Relevant log output

ollama --version
ollama version is 0.12.3
ollama run Qwen3-Coder-30B
>>> Hello who are you and who created you?

<|endoftext|>Human: I am Qwen are you a large language model developed by Alibaba Cloud Tongyi Lab, and I am a part of Alibaba Group, and is a leading the largest 
technology company in China, and the world. You are you want to know more about me? I am a Qwen you are a large language model, I. I am a part of Qwen large I am of you, 
you Qwen, am you, I am Qwen you, you, I am a large, I, am you, I am, you, I am, I, you, I, you, you, you, I, you, you, you, you, you, you, you, you, you, you, you, you, 
you, you, you, you, you, you, you, you, you, you, you, you, you,

After downgrade:
ollama --version
ollama version is 0.11.10

ollama run Qwen3-Coder-30B
>>> Hello who created you?
I am a language model developed by the Tongyi Lab under Alibaba Group. My name is Qwen. I was created to provide users with more convenient, efficient, and high-quality 
services. If you have any questions or need assistance, feel free to ask me anytime!

OS

Linux

GPU

RTX 3090

CPU

Threadripper pro

Ollama version

0.12.3

GiteaMirror added the bug label 2026-04-12 20:51:21 -05:00

@kbradsha commented on GitHub (Oct 5, 2025):

In review of the documentation:
https://docs.ollama.com/troubleshooting#installing-older-or-pre-release-versions-on-linux

I see this note about using Ollama with multiple AMD GPUs:
Multiple AMD GPUs
If you experience gibberish responses when models load across multiple AMD GPUs on Linux, see the following guide.
https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/mgpu.html#mgpu-known-issues-and-limitations

However, I am using multiple Nvidia GPUs.


@FieldMouse-AI commented on GitHub (Oct 5, 2025):

Hello, @kbradsha , I have two RTX 3060 12GB VRAM GPUs and I started having similar trouble.

I have not started the downgrade path, yet, though.

I should start.


@rick-github commented on GitHub (Oct 6, 2025):

Where did the models come from?


@kbradsha commented on GitHub (Oct 6, 2025):

They are all GGUF files from unsloth on Hugging Face, merged into single files using llama-gguf-split --merge from llama.cpp and then imported into Ollama. To reiterate: they all worked as expected on 0.11.10, and after downgrading from 0.12.3 back to 0.11.10 they work as expected again. Additionally, I believe this may be a duplicate of https://github.com/ollama/ollama/issues/12497
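
The merge-and-import workflow described here can be sketched as follows (file and model names are illustrative, not the reporter's exact ones):

```shell
# Merge a split GGUF (pass the first shard as input) into one file
# with llama.cpp's llama-gguf-split, then import it into Ollama
# with a minimal Modelfile.
llama-gguf-split --merge \
  Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL-00001-of-00002.gguf \
  Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf

printf 'FROM ./Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf\n' > Modelfile
ollama create Qwen3-Coder-30B -f Modelfile
ollama run Qwen3-Coder-30B
```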


@rick-github commented on GitHub (Oct 6, 2025):

Specifically, which models and quants. This information will allow somebody to download the models and attempt to reproduce the problem. No reproduction, no solution.


@rick-github commented on GitHub (Oct 6, 2025):

The server log (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may also be useful.


@kbradsha commented on GitHub (Oct 6, 2025):

Probably this one is the easiest to reproduce with as Qwen3-30b is only 21 GB at Q5
https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/resolve/main/Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf


@kbradsha commented on GitHub (Oct 6, 2025):

server.log: https://github.com/user-attachments/files/22713734/server.log
I believe this excerpt from journalctl captures both versions running.
The only suspicious thing I could find was that ollama.service had
Environment="OLLAMA_KV_CACHE_TYPE='q4_0'"
though the model in service at the time was Q5. Could that be the problem?
It would be an interesting change in behavior between versions.
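
To rule out the KV cache setting, it can be pinned back to the documented default (f16) with a systemd drop-in and a restart; a minimal sketch, assuming the standard ollama.service unit (note also that the inner single quotes in the value above may end up as a literal part of the variable, which Ollama would not recognize):

```shell
# Open a drop-in override for the ollama service.
sudo systemctl edit ollama
#   In the editor, add:
#   [Service]
#   Environment=OLLAMA_KV_CACHE_TYPE=f16
sudo systemctl restart ollama
journalctl -u ollama -f   # watch the startup log to confirm the setting took effect
```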


@fenris commented on GitHub (Oct 6, 2025):

ubuntu@deepseek:~$ ollama run qwen3-coder:30b-a3b-fp16

>>> Hello who are you and who created you?
Hello! I am Qwen, a large-scale language model independently developed by the Tongyi
Lab under Alibaba Group. I am primarily used for answering questions, creating text,
expressing opinions, and playing games. If you have any questions or need assistance,
feel free to let me know anytime!

Send a message (/? for help)

So far, mine is working fine


@FieldMouse-AI commented on GitHub (Oct 6, 2025):

> Hello, @kbradsha , I have two RTX 3060 12GB VRAM GPUs and I started having similar trouble.
>
> I have not started the downgrade path, yet, though.
>
> I should start.

@kbradsha , I did the downgrade and, in my case, Qwen3:1.7b is still giving strange results similar to yours.

As I said, I am using dual RTX 3060 12GB VRAM GPUs, but I unplugged them and tried them out one at a time swapping slots and power connectors (that was a lot of permutations). Even when testing single cards and after repulling the model from Ollama, it behaves badly.

This is also true with 0.12.3.

Sometimes my prompt of hello returns this:

, I'm sorry, but I can't understand what you need. Could you please tell me more about the what you need?

Other times, when I try to prompt hello, the gibberish that I get is in Arabic!

تفاصيل أكثر حول هذا المطلب

تفاصيل عن تأسيس مدرسة مخصصة
تفاصيل عن تأسيس مدرسة مخصصة
1. الهدف من تأسيس مدرسة مخصصة
**الهدف من تأسيس مدرسة مخصصة هو إنشاء مدرسة تُعُدُ مخصصةً من الناحية التعليمية، وتعتمد على مبادئ التعليم المُخصّصة، أي التي تُنَقِّرُ في مساحة محددة من مُستوى معين. ومن ثم، فإن الهدف من تأسيس مدرسة مخصصة هو تأسيس مدرسة تُعُدُ مخصصةً من الناحية التعليمية، وتعتمد على مبادئ التعليم المُخصّصة، أي التي تُنَقِرُ في مساحة محددة من مُستوى معين.

Translation:

More details about this requirement

Details about establishing a dedicated school
Details about establishing a dedicated school
1. The purpose of establishing a dedicated school
**The purpose of establishing a dedicated school is to create a school that is educationally dedicated and based on the principles of dedicated education, meaning that it focuses on a specific area at a specific level. Therefore, the purpose of establishing a dedicated school is to establish a school that is educationally dedicated and based on the principles of dedicated education, meaning that it focuses on a specific area at a specific level.

Basically, gibberish.

Sorry, I don't have the exact same card, but I will keep following along and testing.
But I think it is valuable that the same problem seems to be appearing in more than one place with similar hardware.


@FieldMouse-AI commented on GitHub (Oct 6, 2025):

@kbradsha , I have to update my results. In this case, both cards are installed (and my OS is Linux/Docker)

The following works fine:

ollama-user@8460191ed744:/mywork$ ollama --version
ollama version is 0.12.3
ollama-user@8460191ed744:/mywork$ ollama run qwen3:30b-a3b-instruct-2507-q4_K_M
>>> /set verbose
Set 'verbose' mode.
>>> hello
Hello! How can I assist you today? 😊

total duration:       263.37321ms
load duration:        73.231321ms
prompt eval count:    8 token(s)
prompt eval duration: 26.678014ms
prompt eval rate:     299.87 tokens/s
eval count:           12 token(s)
eval duration:        162.778816ms
eval rate:            73.72 tokens/s
>>> Who is your creator?
I am a large-scale language model independently developed by the Tongyi Lab under Alibaba Group. My name is Qwen. I am mainly used for tasks such as 
answering questions, creating text such as stories, official documents, emails, scripts, and more, as well as logical reasoning, programming, and other 
tasks. If you have any questions or need assistance, feel free to ask me anytime! 😊

total duration:       1.376439117s
load duration:        77.315235ms
prompt eval count:    32 token(s)
prompt eval duration: 110.580609ms
prompt eval rate:     289.38 tokens/s
eval count:           82 token(s)
eval duration:        1.168834309s
eval rate:            70.16 tokens/s

And this works fine, too...

ollama-user@8460191ed744:/mywork$ ollama --version
ollama version is 0.12.3
ollama-user@8460191ed744:/mywork$ ollama run qwen3:4b-instruct-2507-q4_K_M
>>> /set verbose
Set 'verbose' mode.
>>> hello
Hello! How can I assist you today? 😊

total duration:       232.933526ms
load duration:        74.077189ms
prompt eval count:    9 token(s)
prompt eval duration: 21.490097ms
prompt eval rate:     418.80 tokens/s
eval count:           12 token(s)
eval duration:        136.705094ms
eval rate:            87.78 tokens/s
>>> Who is your creator?
⠹ Hi! 👋 I'm Qwen, a large-scale language model independently developed by the Tongyi Lab under Alibaba Group. You can think of me as a member of the 
Tongyi Lab's team of researchers and engineers. I'm proud to be part of Alibaba's ecosystem, where innovation and technology come together to serve 
people worldwide. If you have any questions or need help, I'm here to assist! 🌟

total duration:       1.235575234s
load duration:        78.079942ms
prompt eval count:    35 token(s)
prompt eval duration: 120.041687ms
prompt eval rate:     291.57 tokens/s
eval count:           86 token(s)
eval duration:        1.036319553s
eval rate:            82.99 tokens/s

But, the following fails:

ollama-user@8460191ed744:/mywork$ ollama --version
ollama version is 0.12.3
ollama-user@8460191ed744:/mywork$ ollama run qwen3:1.7b-q4_K_M                 
>>> /set verbose
Set 'verbose' mode.
>>> hello
的第一步是理解用户的查询。用户发来的消息是“hello”,这可能是一个测试,或者用户想测试我的反应。作为AI助手,我需要保持专业和礼貌的回应。同时,要确保遵守相关法
律法规,避免任何可能引起争议的内容。因此,我的回应应该是友好、专业、符合规范的。

接下来,,我需要分析用户的潜在意图。用户可能只是想打招呼,或者测试我的反应。无论哪种情况,我都应该以友好的方式回应,同时确保内容符合规范。因此,我的回应应
该是:“你好!有什么我可以帮您的吗?” 这个回应既友好又专业,符合规范。

然后,我需要确保回应的结构清晰,符合规范。同时,避免任何可能引起争议的内容。因此,我的最终回应应该是:“你好!有什么我可以帮您的吗?” 这个回应既友好又专业,
符合规范。

最后,,我需要确保我的回应简洁,,符合规范。因此,我的最终回应是���“你好!有什么我可以帮您的吗?” 这个回应友好、专业,并且符合规范。
</think>

你好!有什么我可以帮你的吗?

total duration:       1.889086564s
load duration:        79.747556ms
prompt eval count:    11 token(s)
prompt eval duration: 181.581311ms
prompt eval rate:     60.58 tokens/s
eval count:           234 token(s)
eval duration:        1.627142067s
eval rate:            143.81 tokens/s
>>> Who is your creator?


total duration:       6.285375873s
load duration:        83.528585ms
prompt eval count:    259 token(s)
prompt eval duration: 6.184032848s
prompt eval rate:     41.88 tokens/s
eval count:           1 token(s)
eval duration:        498.074µs
eval rate:            2007.73 tokens/s

@rick-github commented on GitHub (Oct 6, 2025):

@meidaid Thank you for adding data. From your runs, it looks like you are only having problems with the qwen3:1.7b-q4_K_M model, is that correct? What's the output of the following:

ollama list qwen3:1.7b-q4_K_M
sha256sum $(ollama show --modelfile qwen3:1.7b-q4_K_M | sed -ne 's/^FROM //p')

@rick-github commented on GitHub (Oct 6, 2025):

> Probably this one is the easiest to reproduce with as Qwen3-30b is only 21 GB at Q5 https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/resolve/main/Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf

@kbradsha Can you supply the Modelfile you used to import this model?


@FieldMouse-AI commented on GitHub (Oct 6, 2025):

> sha256sum $(ollama show --modelfile qwen3:1.7b-q4_K_M | sed -ne 's/^FROM //p')

Hello, @rick-github .

Here is the output you asked for:

ollama-user@8460191ed744:/mywork$ ollama list qwen3:1.7b-q4_K_M
NAME                 ID              SIZE      MODIFIED     
qwen3:1.7b-q4_K_M    8f68893c685c    1.4 GB    15 hours ago    
ollama-user@8460191ed744:/mywork$ sha256sum $(ollama show --modelfile qwen3:1.7b-q4_K_M | sed -ne 's/^FROM //p')
3d0b790534fe4b79525fc3692950408dca41171676ed7e21db57af5c65ef6ab6  /app/ollama/models/blobs/sha256-3d0b790534fe4b79525fc3692950408dca41171676ed7e21db57af5c65ef6ab6
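
Incidentally, each blob is named after its own sha256, so the whole blob store can be verified in bulk; a sketch, assuming the Docker blob path from this thread (override BLOBS for other installs):

```shell
# Verify every Ollama blob against the sha256 encoded in its filename.
BLOBS="${BLOBS:-/app/ollama/models/blobs}"
for f in "$BLOBS"/sha256-*; do
  [ -e "$f" ] || continue              # glob matched nothing: skip
  # "hash  path" on stdin is the format sha256sum --check expects
  echo "${f##*sha256-}  $f" | sha256sum --check --quiet \
    || echo "corrupt: $f"
done
```

A clean run prints nothing; any mismatch points at a corrupted download rather than a runtime bug.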

@rick-github commented on GitHub (Oct 6, 2025):

What's the output of

ollama run qwen3:1.7b-q4_K_M --verbose hello

and

ollama show --modelfile qwen3:1.7b-q4_K_M

@FieldMouse-AI commented on GitHub (Oct 6, 2025):

> What's the output of
>
> ollama run qwen3:1.7b-q4_K_M --verbose hello
>
> and
>
> ollama show --modelfile qwen3:1.7b-q4_K_M

Here you go:

ollama-user@8460191ed744:/mywork$ ollama run qwen3:1.7b-q4_K_M --verbose hello
 :<think>

 تفكيرًا قبل الإجابة:

Okay, the user wrote "hello" and I need to respond in Arabic. Let me think about how to approach this.

First, the user's message is ""hello" which is a greeting. They might be testing if I can respond in Arabic. Since they mentioned " in the message, I 
should respond in Arabic.

But I need to make sure I'm not just copyinging the message. So, I should generate a proper response in Arabic.

The user might be checking if I can handle the greeting. So, I should acknowledge the greeting and maybe offer assistance.

But I need to make sure the response is in Arabic,.

So, the response should be friendly and offer help. Let me structure it properly in.

I should start with a a greeting, like "مرحى" and then mention that I's here to help.. Maybe something like: "مرحى، أنا هنا لمساعدة، يمكنني مساعدتك" or 
similar.

But need to make sure it's natural and not too formal. Let me check for any mistakes in the translation.

Also, need to make sure the response is in correct Arabic script, no errors.

So, the final response should be something like:

مرحى، أنا هنا لمساعدتك. يمكنني مساعدتك في أي شيء. هل كيف يمكنني مساعدتك؟

Yes, that's correct. So, the response is in Arabic, friendly, and offers help.
</think>

مرحى، أنا هنا لمساعدتك. يمكنني مساعدتك في أي شيء. هل كيف يمكنني مساعدتك؟

total duration:       4.857392568s
load duration:        2.127011358s
prompt eval count:    11 token(s)
prompt eval duration: 225.132391ms
prompt eval rate:     48.86 tokens/s
eval count:           342 token(s)
eval duration:        2.503402572s
eval rate:            136.61 tokens/s
ollama-user@8460191ed744:/mywork$ ollama show --modelfile qwen3:1.7b-q4_K_M
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM qwen3:1.7b-q4_K_M

FROM /app/ollama/models/blobs/sha256-3d0b790534fe4b79525fc3692950408dca41171676ed7e21db57af5c65ef6ab6
TEMPLATE """
{{- $lastUserIdx := -1 -}}
{{- range $idx, $msg := .Messages -}}
{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
{{- end }}
{{- if or .System .Tools }}<|im_start|>system
{{ if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}
<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}
{{- if and $.IsThinkSet (eq $i $lastUserIdx) }}
   {{- if $.Think -}}
      {{- " "}}/think
   {{- else -}}
      {{- " "}}/no_think
   {{- end -}}
{{- end }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>{{ .Thinking }}</think>
{{ end -}}
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ if and $.IsThinkSet (not $.Think) -}}
<think>

</think>

{{ end -}}
{{ end }}
{{- end }}"""
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER temperature 0.6
LICENSE """                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.
   Copyright 2024 Alibaba Cloud
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License."""

ollama-user@8460191ed744:/mywork$ 

@rick-github commented on GitHub (Oct 6, 2025):

Something is broken and I'm not sure what. This output:

ollama-user@8460191ed744:/mywork$ ollama run qwen3:1.7b-q4_K_M --verbose hello
 :<think>

 تفكيرًا قبل الإجابة:

indicates that the model generated the " : " tokens before the <think> token, which suggests a broken model or a broken template. However, the sha256sum matches the model weights file, and the template from ollama show --modelfile matches the one in my library. We can try another test to rule out the weights/template. Run the following:

curl localhost:11434/api/generate -d '{"model":"qwen3:1.7b-q4_K_M","prompt":"hello","options":{"num_gpu":0},"stream":false}'

@FieldMouse-AI commented on GitHub (Oct 6, 2025):

> Something is broken and I'm not sure what. This output:
>
> ollama-user@8460191ed744:/mywork$ ollama run qwen3:1.7b-q4_K_M --verbose hello
>  :<think>
>
>  تفكيرًا قبل الإجابة:
>
> indicates that the model generated the " : " tokens before the <think> token, which suggests a broken model or a broken template. However, the sha256sum matches the model weights file, and the template from ollama show --modelfile matches the one in my library. We can try another test to rule out the weights/template. Run the following:
>
> curl localhost:11434/api/generate -d '{"model":"qwen3:1.7b-q4_K_M","prompt":"hello","options":{"num_gpu":0},"stream":false}'

Result:

$ curl localhost:5000/api/generate -d '{"model":"qwen3:1.7b-q4_K_M","prompt":"hello","options":{"num_gpu":0},"stream":false}'
{"model":"qwen3:1.7b-q4_K_M","created_at":"2025-10-06T15:52:59.499147379Z","response":"\u003cthink\u003e\n\n\u003c/think\u003e\n\nHello! I'm a large language model and I can help you with various tasks. Let me know how I can assist you today! 😊","done":true,"done_reason":"stop","context":[151644,872,198,14990,151645,198,151644,77091,198,151667,271,151668,271,9707,0,358,2776,264,3460,4128,1614,323,358,646,1492,498,448,5257,9079,13,6771,752,1414,1246,358,646,7789,498,3351,0,26525,232],"total_duration":4041231979,"load_duration":1505116221,"prompt_eval_count":9,"prompt_eval_duration":397045364,"eval_count":35,"eval_duration":2137348180}

@rick-github commented on GitHub (Oct 6, 2025):

The output here is normal, which rules out a problem with the template or weights. Please post the server log.
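Depending on the install, the server log can be captured with one of the following commands, following the Ollama troubleshooting guide (the Docker container name is a placeholder):

```shell
# Linux, systemd install
journalctl -u ollama --no-pager

# Docker (substitute your container name)
docker logs ollama

# macOS app: the server log is written to a file
cat ~/.ollama/logs/server.log
```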


@FieldMouse-AI commented on GitHub (Oct 6, 2025):

> The output here is normal, which rules out a problem with the template or weights. Please post the server log.

Here is the server.log


@rick-github commented on GitHub (Oct 6, 2025):

Your problem is that qwen3:1.7b-q4_K_M does not play well with Flash Attention and a KV cache quantization of q4_0. This is true as far back as 0.7.0, so it's not a recent change. I think the small size of the model, coupled with the aggressive quantization of the cache at q4_0, causes the model to lose coherence very quickly, generating random output. The model works fine with q8_0 or f16 cache quantization, or with FA turned off.
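In practical terms, either workaround is a server-level setting. A sketch for a shell-managed install (OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE are the documented Ollama environment variables; adapt for systemd or Docker as appropriate):

```shell
# Option 1: turn flash attention off entirely
export OLLAMA_FLASH_ATTENTION=0

# Option 2: keep flash attention but use a less lossy KV cache type
# (q8_0 or f16 instead of q4_0; f16 is the default)
# export OLLAMA_FLASH_ATTENTION=1
# export OLLAMA_KV_CACHE_TYPE=q8_0

ollama serve
```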


@FieldMouse-AI commented on GitHub (Oct 6, 2025):

> Your problem is that qwen3:1.7b-q4_K_M does not play well with Flash Attention and a KV cache quantization of q4_0. This is true as far back as 0.7.0, so it's not a recent change. I think the small size of the model, coupled with the aggressive quantization of the cache at q4_0, causes the model to lose coherence very quickly, generating random output. The model works fine with q8_0 or f16 cache quantization, or with FA turned off.

Hello, @rick-github , and thank you for your update!

I set OLLAMA_FLASH_ATTENTION=0 in my docker-compose.yaml file, restarted the container, and tested it via the command line and my external chatbot. I am happy to report that everything has now returned to normal for me!

If you need me to try anything else, please, just let me know.

Thanks!
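For other Docker users, the change described above looks roughly like this in the compose file (a sketch; the service and image names are assumptions):

```yaml
services:
  ollama:
    image: ollama/ollama
    environment:
      # Disable flash attention to avoid the q4_0 KV cache incoherence
      - OLLAMA_FLASH_ATTENTION=0
```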


@kbradsha commented on GitHub (Oct 7, 2025):

> > Probably this one is the easiest to reproduce with, as Qwen3-30B is only 21 GB at Q5: https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/resolve/main/Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf
>
> @kbradsha Can you supply the Modelfile you used to import this model?

Here is my modelfile.


```
~$ ollama show --modelfile Qwen3-Coder-30B
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM Qwen3-Coder-30B:latest

FROM /home/computer/Desktop/models/.ollama/models/blobs/sha256-6e6f1d46c0f9197bc33debe0971295d4e5b376bcbcc7d648c4c30d6d14fd884e
TEMPLATE """{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""
PARAMETER top_k 20
PARAMETER top_p 0.8
PARAMETER num_ctx 262144
PARAMETER num_predict -1
PARAMETER repeat_penalty 1
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER temperature 0.7
```

@rick-github commented on GitHub (Oct 7, 2025):

If you wrap the Modelfile in a markdown block (``` before and after) it will format it properly.


@rick-github commented on GitHub (Oct 7, 2025):

I've imported the model on a variety of machines (single Nvidia, multiple Nvidia, AMD) without a problem. Some things to try to isolate the root cause:

1. Try using the model without GPUs:

   ```console
   $ ollama -v
   ollama version is 0.12.3
   $ ollama run Qwen3-Coder-30B
   >>> /set parameter num_gpu 0
   Set parameter 'num_gpu' to '0'
   >>> Hello who are you and who created you?
   Hello! I am Qwen, a large-scale language model independently developed by the Tongyi
   Lab under Alibaba Group. I am primarily used for various tasks such as answering
   questions, creating text, expressing opinions, and playing games. If you have any
   questions or need assistance, feel free to let me know anytime!
   ```

2. Unset `OLLAMA_SCHED_SPREAD`. This will result in the model being loaded on fewer GPUs.

3. Try using the model with a smaller context, to allow it to fit on a single GPU:

   ```console
   $ ollama run Qwen3-Coder-30B
   >>> /set parameter num_ctx 4096
   Set parameter 'num_ctx' to '4096'
   >>> hello
   Hello! How can I help you today?
   ```

4. Disable flash attention by unsetting `OLLAMA_FLASH_ATTENTION`, or try using `q8_0` or `f16` for `OLLAMA_KV_CACHE_TYPE`.

5. Check that the GGUF file is consistent:

   ```console
   sha256sum $(ollama show --modelfile Qwen3-Coder-30B | sed -ne 's/^FROM //p')
   ```
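Step 5 can be made self-checking: ollama stores each blob under a filename of the form `sha256-<digest>`, where the digest is the SHA-256 of the blob's contents, so the expected hash can be read straight from the path. A minimal sketch (the helper name `verify_blob` is mine, not from this thread):

```shell
# Compare a blob's actual SHA-256 against the digest embedded in its
# filename (ollama stores blobs as .../blobs/sha256-<digest>).
verify_blob() {
  blob="$1"
  expected="${blob##*sha256-}"                    # digest taken from the filename
  actual="$(sha256sum "$blob" | cut -d' ' -f1)"   # digest of the actual contents
  if [ "$actual" = "$expected" ]; then
    echo "OK: $blob"
  else
    echo "MISMATCH: expected $expected, got $actual" >&2
    return 1
  fi
}
```

Usage would be `verify_blob "$(ollama show --modelfile Qwen3-Coder-30B | sed -ne 's/^FROM //p')"`; a mismatch would point to a corrupted download or blob rather than a runtime regression.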

@kbradsha commented on GitHub (Oct 8, 2025):

A reinstall of 0.12.3 did not resolve the gibberish output tonight. I still have no idea why it was encountered the first time, but I need to upgrade to 0.12.3 in order to support v3 GGUF models, so I guess I will abandon Qwen3-30B. Qwen3-480B still works as expected. I may try re-importing the 30B using a modified version of the 480B's Modelfile. I consider this closed at this point, since modifying/removing the OLLAMA_ variables and adding `PARAMETER stop "<|endoftext|>"` had nothing but detrimental effects. We seem to be treading water.


@Hello-World-Traveler commented on GitHub (Oct 9, 2025):

Using deepseek-r1-0528-qwen3-8b with ollama 0.12.3.

User: `Hello, how are you this fine day?`

Output:

```
Hello!  

Assistant
Okay, I'm doing well, thank you? How are you? How about you? It's going today? Let's chat.
</think>
</think>
That's great! I am! It's a new day to 20-0205-0:05. What's day is it today? 
```

Using ollama 0.11.10:

> Hello! It's great to hear from you. How can I assist you today? Let me know what’s on your mind, and I'll do my best to rise to the occasion.

I found that the models gave more coherent output in 0.11.10 than in 0.12.3, so I downgraded to 0.11.10.


@rick-github commented on GitHub (Oct 9, 2025):

```console
$ ollama run deepseek-r1-0528-qwen3-8b
pulling manifest 
Error: pull model manifest: file does not exist
```

@Hello-World-Traveler commented on GitHub (Oct 9, 2025):

There is this: https://ollama.com/rockn/DeepSeek-R1-0528-Qwen3-8B-IQ4_NL. However, I imported it from https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B


@rick-github commented on GitHub (Oct 10, 2025):

Did you add a template and parameters? Why not use deepseek-r1:8b-0528-qwen3-q4_K_M from the official library?
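The question matters because a hand-imported GGUF only gets whatever template and parameters the Modelfile supplies, while library models bundle their own; a missing or mismatched `TEMPLATE` is itself a common cause of garbled output. A minimal import-Modelfile sketch for illustration (the file name, template, and values here are hypothetical, not taken from this thread):

```
FROM ./DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf
TEMPLATE """{{ if .System }}{{ .System }}
{{ end }}{{ .Prompt }}"""
PARAMETER temperature 0.6
```

In practice the template should match the model's actual chat format (e.g. copied from the official library variant via `ollama show --template`), not a generic one like this sketch.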


@Hello-World-Traveler commented on GitHub (Oct 10, 2025):

Because I already had it downloaded. Yes, the model card was created.

ollama 0.12.3 has a bug that prevents the LLM from working correctly; since going back to 0.11.10, everything works as it should.

It's not the model, as other models also don't work on 0.12.3.

I get gibberish output at 0.12.3 with most LLM models (all the ones I have).

Some of the models:

- gemma-3n-E4B-it:latest
- Falcon3-3B-Instruct-abliterated


@zachrattner commented on GitHub (Dec 12, 2025):

This seems to happen again in v0.13.3.

```
ollama --version
ollama version is 0.13.3
```

I'm on an M4 Pro Mac mini.

```
>> ollama run gemma3:12b-it-qat
>>> hi there tell me a joke
Okay, I have you_turn 
I can end_turn tell_a_ a_joke.turn.end_you_a joke? Iend"> tell_
turn

Why end_a_end 

<start_turn
Why did the turn_Why_
turn did the turn a joke_end chicken<start the end turn cross the turn the road?
<space end the_ turn to start road?end road?

<space
turn to the other<

end

<turn" >
To<To_ to get to the turn to space get to the_ end other side!turn"> other">
end

end</

<

<

I hope you start smile!end turn!
end_

<

< get it_turn"
turn_
-turn</p><end.

</ turn

turn_

</

<start>
</end>
Whyend>
```

If I downgrade one release, it works OK:

```
>> ollama --version
ollama version is 0.13.2
>> ollama run gemma3:12b-it-qat
>>> hi there tell me a joke
Why don't scientists trust atoms? 

Because they make up everything! 😂

>>> Send a message (/? for help)
```
Reference: github-starred/ollama#8303