[GH-ISSUE #8493] Long context for Qwen2.5 is possible but needs something to work #67527

Open
opened 2026-05-04 10:40:06 -05:00 by GiteaMirror · 1 comment

Originally created by @devlux76 on GitHub (Jan 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8493

The instructions for Qwen2.5 (all of them) state quite clearly that every model from 7B on up has 128K context. However, in order to use that context you need to do something:
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct#processing-long-texts

For supported frameworks, you could add the following to config.json to enable YaRN:

```
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

What is the Ollama method of achieving this? I'd really like to use these models at their full context length.
Thanks!
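
For the context window itself, a minimal sketch of a custom Modelfile that raises `num_ctx` (assuming the `qwen2.5-coder:32b` tag is already pulled; the `qwen2.5-coder-128k` name is just an illustration):

```
# Sketch: raise the context window Ollama allocates for the model.
# Note: num_ctx alone only enlarges the window; it does not enable YaRN
# rope scaling unless the underlying GGUF already carries that metadata.
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:32b
PARAMETER num_ctx 131072
EOF
ollama create qwen2.5-coder-128k -f Modelfile
ollama run qwen2.5-coder-128k
```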


@rick-github commented on GitHub (Jan 20, 2025):

https://github.com/ggerganov/llama.cpp/pull/10698 adds YaRN support for Qwen. The model needs to be re-quantized with llama.cpp.

> We advise adding the rope_scaling configuration only when processing long contexts is required.

It seems it might degrade performance at shorter context lengths, so it's probably not appropriate for the default model.
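
For reference, a sketch of the re-quantization path (assuming a local llama.cpp checkout that includes the PR above, plus the original Hugging Face weights on disk):

```
# convert_hf_to_gguf.py reads rope_scaling from config.json if present,
# so the YaRN parameters can be baked into the GGUF metadata.
python convert_hf_to_gguf.py Qwen2.5-Coder-32B-Instruct \
  --outfile qwen2.5-coder-32b-yarn.gguf

# Alternatively, llama.cpp can apply YaRN at load time via runtime flags:
llama-cli -m qwen2.5-coder-32b-yarn.gguf \
  --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 -c 131072
```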
