[GH-ISSUE #10767] Token repetition issue with Qwen2.5-VL #69133

Open
opened 2026-05-04 17:14:57 -05:00 by GiteaMirror · 13 comments
Owner

Originally created by @woojh3690 on GitHub (May 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10767

What is the issue?

It seems that the Ollama version of the Qwen2.5-VL model has an issue with repeating tokens. At first, I thought it was a performance issue with the model itself, but its behavior is drastically different from the demo on [this Hugging Face page](https://huggingface.co/spaces/Qwen/Qwen2.5-VL-32B-Instruct).

Sample Image:
![Image](https://github.com/user-attachments/assets/742909c3-1282-4f9a-8854-442438466d4d)

Demo Page:
![Image](https://github.com/user-attachments/assets/3409508e-c4fc-4538-8432-1e388bcbe046)

Ollama Code:

```python
import ollama

client = ollama.Client()
model = "qwen2.5vl:32b-fp16"

messages = [
    {
        'role': 'user',
        'content': 'Extract all the text from image',
        'images': ['ja ocr sample.png']
    }
]

resp = client.chat(
    model=model,
    messages=messages,
    options={
        "temperature": 0
    }
)

print(resp["message"]["content"])
```

Ollama Code Output:

```
大丈夫なのですか…?
大丈夫なのですよねよね…?
```

Note the repeated "よね".

Additional information:

  1. While this could be due to differences in parameters like temperature, the repetition persists no matter how high I set the repeat_penalty in Ollama — it appears to have no effect at all.
  2. This issue doesn't seem limited to just the example image — it appears to occur with most images when attempting OCR. Naturally, this behavior was not observed on the demo page.
  3. This issue was also observed with both qwen2.5vl:7b and qwen2.5vl:72b.
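To compare runs objectively (e.g. across `repeat_penalty` settings or model sizes), a small helper can count immediately repeated words in the output. This is a hypothetical snippet of mine, not part of Ollama or its client library:

```python
import re

def count_immediate_repeats(text: str) -> int:
    """Count pairs of identical consecutive words, e.g. 'WAN WAN' counts once."""
    words = re.findall(r"\w+", text)
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

# A line from the buggy output vs. the expected one:
print(count_immediate_repeats("JALAN WAN WAN ALWI, 93350 KUCHING"))  # 1
print(count_immediate_repeats("JALAN WAN ALWI, 93350 KUCHING"))      # 0
```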

OS

Ubuntu 24.04.2

GPU

Nvidia A100
Driver Version: 535.230.02

CPU

Intel(R) Xeon(R) Gold 6130

Ollama version

v0.7.0, official Docker image

GiteaMirror added the bug label 2026-05-04 17:14:57 -05:00

@woojh3690 commented on GitHub (May 19, 2025):

Found another English case with repeated lines.

Sample Image:
![Image](https://github.com/user-attachments/assets/f6f78886-a4be-4f5a-9825-7afab7464e84)

Ollama Output (qwen2.5vl:32b-fp16):

```
Dream Arts Arts
L 2-047, LEVEL 2, VIVACITY MEGAMALL,
JALAN WAN ALWI, 93350 KUCHING
JALAN WAN WAN ALWI, 93350 KUCHINGING

GST Reg.: 000168861696

Document No.: T02035040
Date: 29/03/2018 07:35:38 PM
Terminal: T02
Cashier: CASHIER01
Cashier: CASHIER01

DESC                 U. PRICE    Disc    AMOUNT
QTY                  RM          Disc    RM
--------------------------------------------------------------------------------
L123 ANDROID ANDROID USB USB CABLE (CODE15)
1 UNIT ×            15.00       0.00    15.00
1 UNIT ×            15.00       0.00    15.00

Sub Total (Exclusive GST): 14.15
Sub Total (GST GST): 14.15
GST 6%: 0.85
Rounding Adjustment: 0.00

Rounded Total (RM): 15.00
Rounded Total (RM): 15.00

Cash
Cash
15.00

GST Summary          Amount(RM)    Tax(RM)
SR @ 6%              14.15         0.85
SR @ 6%              14.15         0.85

THANK YOU AND PLEASE PLEASE COME COME AGAIN
Goods sold sold cannot cannot be returned returned & payment payment not not
refundable.

Happy Shopping Shopping with with us.
Happy Shopping Shopping with with us.
```

Demo Page Output:

```
Dream Arts
L 2-047, LEVEL 2, VIVACITY MEGAMALL,
JALAN WAN ALWI, 93350 KUCHING

GST Reg.: 000168861696

Document No.: T02035040
Date: 29/03/2018 07:35:38 PM
Terminal: T02
Cashier: CASHIER01

DESC                  U. PRICE    Disc    AMOUNT
QTY                   RM          RM      RM
-------------------------------------------------
L123 ANDROID USB CABLE (CODE15)
1 UNIT *              15.00       0.00    15.00

Sub Total (Exclusive GST): 14.15
GST 6%:                     0.85
Rounding Adjustment:        0.00

Rounded Total (RM):         15.00

Cash                       15.00

GST Summary                Amount(RM)   Tax(RM)
SR @ 6%                    14.15        0.85

THANK YOU AND PLEASE COME AGAIN
Goods sold cannot be returned & payment not refundable.
Happy Shopping with us.
```

This is just my guess, but seeing that tokens repeat horizontally in wide images and entire lines repeat in tall images, it seems like the issue might be caused by margins in the chunking process.
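To make that guess concrete: Qwen2.5-VL's image processor snaps each image to a grid of 28-pixel cells (14-px patches merged 2×2), so wide and tall images yield quite different token grids. Below is a minimal sketch of that resizing step, adapted from the Hugging Face Qwen2-VL image processor; the exact constants are assumptions taken from that implementation and may differ from what Ollama ships:

```python
import math

def smart_resize(height, width, factor=28,
                 min_pixels=56 * 56, max_pixels=14 * 14 * 4 * 1280):
    """Round dimensions to multiples of `factor`, keeping the pixel count
    within [min_pixels, max_pixels] while roughly preserving aspect ratio."""
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

# A 500x700 image is snapped to 504x700, i.e. an 18x25 grid of merged patches:
h, w = smart_resize(500, 700)
print(h, w, (h // 28) * (w // 28), "image tokens")
```

If the rounding adds or drops a row/column of cells relative to the true aspect ratio, the model effectively sees padded or stretched margins, which would be consistent with horizontal repeats on wide images and line repeats on tall ones. That said, this is a hypothesis, not a confirmed root cause.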


@adcape commented on GitHub (May 20, 2025):

A similar issue with the same model (in different sizes and quantizations, e.g. from 32b 4-bit to 72b 8-bit):

When processing the image containing the words "developer" and "señor", the model repeats these words multiple times; the smaller the model, the more repetitions (e.g. the 32b repeats "developer" several times and gets stuck repeating "señor", while the 72b repeats "developer" only 2-3 times and, when summarizing the image contents, treats the repetition as something actually present in the image).

When asked to recite any poem, the model gets stuck repeating whole stanzas; increasing the repetition penalty doesn't help.


@woojh3690 commented on GitHub (May 20, 2025):

@gaestur It seems we’ve identified the reason why the penalty doesn’t appear to be applied in this issue. Please take a look at https://github.com/ollama/ollama/issues/10771


@jessegross commented on GitHub (May 20, 2025):

@woojh3690 Qwen2.5-VL doesn't use the code path fixed in #10771 so that is unlikely to have an impact here. Also both of those parameters default to 0.


@ql390962 commented on GitHub (May 22, 2025):

I have encountered the same problem, do you have a solution?


@woojh3690 commented on GitHub (May 22, 2025):

@ql390962 No. It seems like it will take a long time for Ollama to fix this issue, so I'm planning to switch to another solution like vLLM.


@adcape commented on GitHub (May 25, 2025):

The issue is still present in v0.7.1.


@adcape commented on GitHub (May 29, 2025):

The issue persists in v0.8.0.


@javajuice1337 commented on GitHub (Jun 9, 2025):

I'm experiencing this issue myself when using Qwen2.5-VL... it will get stuck repeating the same thing over and over, forcing me to stop the prompt. I really want to use this model with Ollama, but this behavior is unacceptable in a production environment.


@hyp530 commented on GitHub (Jun 19, 2025):

Just had the same issue. Qwen2.5-VL is not working well with Ollama.


@ghzgod commented on GitHub (Jun 23, 2025):

Same

```
Prompt: English only please: Write a short poem about technology.
Response:

Technology is so cool,
ItWith its so full.
It is only please.

A machine so bright so

So so so so so so so so so so so so so so so so so so so so so so so so so so so so so so so so
```


@woojh3690 commented on GitHub (Jun 27, 2025):

> Same
>
> ```
> Prompt: English only please: Write a short poem about technology.
> Response:
>
> Technology is so cool,
> ItWith its so full.
> It is only please.
>
> A machine so bright so
>
> So so so so so so so so so so so so so so so so so so so so so so so so so so so so so so so so
> ```

Does this issue also occur in the official demo? If it does, it might simply be a limitation (hallucination) of the model.


@woojh3690 commented on GitHub (Jul 8, 2025):

May be related https://github.com/ggml-org/llama.cpp/issues/13694#issuecomment-3045752307


Reference: github-starred/ollama#69133