[GH-ISSUE #15316] Ollama Claude Code context auto-compact/Timeout issue #35556

Open
opened 2026-04-22 20:08:16 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @kausikp11 on GitHub (Apr 4, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15316

What is the issue?

We set the context length in Ollama based on GPU size, but when the model reaches that context length, Claude Code does not auto-compact or warn that the context is nearly full. When I run `/context` in the interface, it shows `tokens used / model's max tokens`; instead it should show `tokens used / max tokens configured in Ollama`. That would let Claude Code's auto-compact kick in and address the real timeout issue.

Also, could Ollama detect the available GPU memory and fit the model's full context size when possible?
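For reference, the context length described above can be pinned server-wide via Ollama's `OLLAMA_CONTEXT_LENGTH` environment variable (supported in recent Ollama versions). A minimal sketch; the value 32768 is illustrative, pick what fits your GPU memory:

```shell
# Sketch: set Ollama's server-wide default context length before
# starting the server. 32768 is an illustrative value, not a
# recommendation -- size it to your GPU memory.
export OLLAMA_CONTEXT_LENGTH=32768
# then start the server: ollama serve
echo "context length set to $OLLAMA_CONTEXT_LENGTH tokens"
```

Per-request, the same limit can instead be passed as the `num_ctx` option in the Ollama API.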

Added screenshot for reference.

Screenshot: https://github.com/user-attachments/assets/9ad323e0-8fe9-4601-9332-e38bdc615ac4

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.18.2

GiteaMirror added the bug label 2026-04-22 20:08:16 -05:00

@rick-github commented on GitHub (Apr 5, 2026):

This is a Claude Code issue, not an ollama issue. CC assumes that the model has a context size of 200k because that's the default for Anthropic models (https://support.claude.com/en/articles/8606394-how-large-is-the-context-window-on-paid-claude-plans). In theory CC could probe the model to find out what context the model has been loaded with, but logs of HTTP requests don't show any attempt at discoverability. CC does support setting a larger context with `/model gemma4[1m]`, but smaller values are rejected.


@ParthSareen commented on GitHub (Apr 13, 2026):

We currently set the context length correctly for cloud models but intentionally not for local ones. I've given it some thought and decided on this because we do context shifting under the hood. The experience for local models would degrade significantly if you had to hit compaction every couple of prompts. Caching doesn't work well either, since the compaction uses a different prompt. You can configure it yourself if you'd like to try it with `CLAUDE_CODE_AUTO_COMPACT_WINDOW`.
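A minimal sketch of that opt-in workaround, assuming the variable takes a token count and should match the context length configured in Ollama (32768 below is illustrative):

```shell
# Sketch: align Claude Code's auto-compact window with the context
# length you gave Ollama. The value is illustrative; set it to
# whatever context length your Ollama server actually uses.
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=32768
# then launch Claude Code in the same shell: claude
echo "auto-compact window: $CLAUDE_CODE_AUTO_COMPACT_WINDOW tokens"
```

Exported this way, the setting applies only to Claude Code sessions started from that shell.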


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15316
Analyzed: 2026-04-18T18:22:37.800109

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


Reference: github-starred/ollama#35556