[GH-ISSUE #6865] qwen2.5 context length #66372

Open
opened 2026-05-04 03:15:03 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @zlwu on GitHub (Sep 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6865

What is the issue?

![image](https://github.com/user-attachments/assets/03949cc7-07fd-45c4-a09a-4a971e0a3586)

According to the model card, the context length should be 128k?

OS

No response

GPU

No response

CPU

No response

Ollama version

0.3.10

GiteaMirror added the bug label 2026-05-04 03:15:03 -05:00
Author
Owner

@rezzie-rich commented on GitHub (Sep 19, 2024):

from Qwen 2.5 model card:
"Currently, only vLLM supports YARN for length extrapolating. If you want to process sequences up to 131,072 tokens, please refer to non-GGUF models."

It looks like llama.cpp has merged some PRs related to YaRN. Would it be possible for Ollama to add YaRN support for qwen2.5? Perhaps dynamic YaRN, or whichever variant is better suited, rather than static?
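In the meantime, the context window Ollama actually allocates can be raised with the `num_ctx` parameter in a Modelfile. A minimal sketch (the tag `qwen2.5-32k` is an arbitrary example name; note that without YaRN, pushing `num_ctx` past the GGUF's native training length may degrade output quality rather than unlock the advertised 128k):

```
FROM qwen2.5
# num_ctx sets the runtime context window; the default is much smaller
# than the 131,072 tokens the model card advertises with YaRN.
PARAMETER num_ctx 32768
```

Build and run it with `ollama create qwen2.5-32k -f Modelfile` followed by `ollama run qwen2.5-32k`.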


Reference: github-starred/ollama#66372