Support client-side context clipping using OpenAI compliant API #1129

Closed
opened 2025-11-11 14:38:13 -06:00 by GiteaMirror · 1 comment

Originally created by @matbeedotcom on GitHub (Jun 4, 2024).

**Is your feature request related to a problem? Please describe.**
Yes. Once the maximum context window is reached, requests fail because the entire conversation history is sent to the LLM API.

**Describe the solution you'd like**
Support a variety of context clipping mechanisms.

I'm trying to use Exllamav2 models, as Ollama is far too slow with LLaMa3-70b.
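One possible shape for such a mechanism is a client-side trimmer that drops the oldest messages until the conversation fits a token budget, keeping the system prompt and the most recent turns. The sketch below is illustrative only, not Open WebUI's actual implementation; it approximates token counts as roughly four characters per token, where a real client would use the model's tokenizer.

```python
# Hypothetical sketch of client-side context clipping before sending
# messages to an OpenAI-compatible chat API. Token counts are
# approximated as len(content) // 4; a real client would use the
# model's tokenizer instead.

def approx_tokens(message: dict) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(message.get("content", "")) // 4)

def clip_context(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the system prompt (if any) plus as many of the most
    recent messages as fit within max_tokens, dropping oldest first."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(approx_tokens(m) for m in system)
    kept: list[dict] = []
    for m in reversed(rest):  # walk newest to oldest
        cost = approx_tokens(m)
        if cost > budget:
            break  # everything older than this is dropped
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "old question " * 50},
    {"role": "assistant", "content": "old answer " * 50},
    {"role": "user", "content": "latest question"},
]
# With a tight budget, only the system prompt and the newest
# message survive; the oversized older turns are dropped.
clipped = clip_context(messages, max_tokens=100)
```

Other clipping strategies (summarizing the dropped prefix, or keeping the first user message for task grounding) would slot into the same place in the request path.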


@tjbck commented on GitHub (Jun 4, 2024):

#1268


Reference: github-starred/open-webui#1129