[GH-ISSUE #11283] Ollama 0.9.3 truncating prompt to 8192 no matter what params are passed #33201

Closed
opened 2026-04-22 15:38:24 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @alexveli1 on GitHub (Jul 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11283

What is the issue?

Dear colleagues,
I want to share my experience with 0.9.3 and would appreciate a fix.
I upgraded Ollama to 0.9.3 to pull and run gemma3n (unsuccessfully; gemma3n does not run, but that is not the topic here. I see the codebase is being adapted for this model, so I will wait).
The issue: running other models on 0.9.3 (e.g. gemma3:12b-it-q8_0), Ollama truncates the prompt to 8192 regardless of OLLAMA_CONTEXT_LENGTH (100000, 70000, 10000, etc.).
My workaround: falling back to 0.9.2 solved the prompt/context truncation, but of course gemma3n does not start there.
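One way to reproduce the symptom outside of open-webui (not part of the original report) is to request a specific context window per call via the API's options.num_ctx field; the model name and host below are the ones from this report:

```shell
# Hypothetical sanity check: ask for a 32768-token context explicitly.
# On 0.9.3 the server logs reportedly still show truncation to 8192.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:12b-it-q8_0",
  "prompt": "hello",
  "options": { "num_ctx": 32768 }
}'
```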

Environment:
OS: Ubuntu 24.04
hardware: 1xRTX 4090
docker: 28.2.2
docker compose: v2.36.2
client: open-webui 0.6.15

docker-compose.yml

 ollama:
    image: ollama/ollama:0.9.3
    volumes:
      - /home/ollama/.ollama:/root/.ollama
    container_name: ollama
    hostname: ollama
    pull_policy: refresh
    tty: true
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=ollama:11434
      - OLLAMA_ORIGINS=*
      - OLLAMA_MAX_LOADED_MODELS=2
      - OLLAMA_KEEP_ALIVE=-1
      - OLLAMA_NUM_PARALLEL=1
      - OLLAMA_NUM_THREADS=10
      - OLLAMA_MAX_QUEUE=1
      - OLLAMA_NO_PULL=1
      - OLLAMA_CUDA=1
      - OLLAMA_MODELS=/root/.ollama/models
      - OLLAMA_HOST=ollama
      - OLLAMA_DEBUG=2
    ports:
      - "11434:11434"
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu

Relevant log output

msg="truncating input prompt" limit=8192 prompt=66476 keep=0 new=8192
(66476 is the size of the submitted prompt/context.)
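The log line above can be parsed to quantify how much of the prompt is lost; this small sketch (not from the report) extracts the limit and prompt fields:

```shell
# Parse the truncation log line and compute the number of dropped tokens.
line='msg="truncating input prompt" limit=8192 prompt=66476 keep=0 new=8192'
limit=$(printf '%s' "$line" | grep -o 'limit=[0-9]*' | cut -d= -f2)
prompt=$(printf '%s' "$line" | grep -o 'prompt=[0-9]*' | cut -d= -f2)
echo "$((prompt - limit)) tokens dropped"
```

With the values from this report, 66476 - 8192 = 58284 tokens are discarded.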

OS

Ubuntu 24.04

GPU

RTX 4090

CPU

i9-14900K

Ollama version

0.9.3

GiteaMirror added the bug label 2026-04-22 15:38:24 -05:00
Author
Owner

@rick-github commented on GitHub (Jul 3, 2025):

https://github.com/ollama/ollama/pull/11175

The initial version of gemma3 had the wrong context size (see https://github.com/ollama/ollama/issues/9702#issuecomment-2727063304), so the PR above limits the prompt to the incorrect smaller size. Re-pull the model and the context limit will increase.
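The suggested fix can be sketched as follows, assuming the "ollama" container from the compose file above and the example model from the report:

```shell
# Re-pull the model so the corrected metadata (context size) is fetched.
docker exec ollama ollama pull gemma3:12b-it-q8_0
# Inspect the model; the reported "context length" should now exceed 8192.
docker exec ollama ollama show gemma3:12b-it-q8_0
```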

Author
Owner

@pdevine commented on GitHub (Jul 3, 2025):

@alexveli1 sorry about this. As @rick-github (thank you Rick!) mentioned, just re-pull the model and it should be fine.

Reference: github-starred/ollama#33201