[GH-ISSUE #11036] gemma3:27b unable to get full 128k context #7279

Closed
opened 2026-04-12 19:19:43 -05:00 by GiteaMirror · 4 comments

Originally created by @pavanrajkg04 on GitHub (Jun 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11036

What is the issue?

I tried the gemma3:27b model but I am unable to get the full 128k context. I am working on a chatbot that consumes SQL data, and I am unable to deliver what I am expecting. I need help with this.

Relevant log output


OS

macOS, Linux

GPU

Nvidia

CPU

No response

Ollama version

v0.9.1

GiteaMirror added the bug label 2026-04-12 19:19:43 -05:00

@rick-github commented on GitHub (Jun 10, 2025):

> but I am unable to get the full 128k context

What does this mean? Your client can't send 128k tokens of data? The model doesn't load because the context is too big? The model doesn't generate tokens when the context is 128k?


@duck-5 commented on GitHub (Jun 10, 2025):

Please provide more context: the GPU model and, if possible, the Modelfile and the request JSON.
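
(For anyone gathering these details later, a minimal sketch of commands that collect them, assuming an NVIDIA GPU and a local Ollama install; `gemma3:27b` stands in for whichever model is affected.)

```shell
# GPU model and memory (NVIDIA).
nvidia-smi --query-gpu=name,memory.total --format=csv

# The Modelfile of the model in question.
ollama show --modelfile gemma3:27b

# The installed Ollama version.
ollama -v
```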


@pavanrajkg04 commented on GitHub (Jun 11, 2025):

I found the solution for this. Ollama doesn't directly support the 128k context; it maxes out at a 32k context. The context length can be increased with this command:
`export OLLAMA_CONTEXT_LENGTH=131072`
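
(A hedged aside for later readers: the limit can also be raised per model or per request instead of globally via the environment variable. The sketch below assumes a reasonably recent Ollama release; `gemma3-128k` is a hypothetical name for the derived model.)

```shell
# Per-model: bake a larger context into a derived model via a Modelfile.
cat > Modelfile <<'EOF'
FROM gemma3:27b
PARAMETER num_ctx 131072
EOF
ollama create gemma3-128k -f Modelfile

# Per-request: pass num_ctx in the options of the request JSON.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:27b",
  "prompt": "Generate SQL for ...",
  "options": { "num_ctx": 131072 }
}'
```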


@pavanrajkg04 commented on GitHub (Jun 11, 2025):

> > but I am unable to get the full 128k context
>
> What does this mean? Your client can't send 128k tokens of data? The model doesn't load because the context is too big? The model doesn't generate tokens when the context is 128k?

The problem was:

  1. I used a RAG-based SQL generator. When a user asks a question, the LLM has to generate the SQL using the available schema. Because my schema was very large, the model context would overflow and I would get hallucinated SQL.
  2. Once the SQL was generated, the model would also receive 1000+ rows of data and was unable to produce a correct summary.

I searched the web and found that Ollama doesn't directly support a 128k context length; it maxes out at 32k. It can be increased with `export OLLAMA_CONTEXT_LENGTH=131072`.
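
(Both failure modes above come down to the prompt silently exceeding the context window, so a rough pre-flight size check can catch the overflow before a request is sent. A minimal sketch, assuming the schema dump lives in a hypothetical `schema.sql` and using the common ~4-characters-per-token heuristic.)

```shell
# Estimate the token count of a large schema before prompting the model.
# The 4-chars-per-token ratio is a rough heuristic, not an exact tokenizer.
chars=$(wc -c < schema.sql)
approx_tokens=$((chars / 4))
echo "schema is ~${approx_tokens} tokens"
if [ "$approx_tokens" -gt 131072 ]; then
  echo "schema alone exceeds the 128k window; trim it or retrieve tables selectively" >&2
fi
```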

Reference: github-starred/ollama#7279