[GH-ISSUE #14130] v0.15.5: OLLAMA_NUM_CTX ignored, causes OOM on systems with ≥48GB VRAM #55730

Closed
opened 2026-04-29 09:39:32 -05:00 by GiteaMirror · 2 comments

Originally created by @Bunkmil on GitHub (Feb 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14130

What is the issue?

Bug Report: v0.15.5 Ignores OLLAMA_NUM_CTX and Forces Excessive Context Size

Summary

Ollama v0.15.5 ignores the OLLAMA_NUM_CTX and OLLAMA_MAX_CONTEXT environment variables and automatically sets extremely large context sizes based on detected VRAM, causing OOM kills on systems with limited RAM.

Environment

  • Ollama Version: 0.15.5
  • OS: CachyOS (Arch Linux)
  • GPU: AMD Radeon 890M (64GB shared VRAM)
  • System RAM: 56GB available
  • Backend: Vulkan / ROCm

Expected Behavior

When OLLAMA_NUM_CTX=6144 is explicitly set in the systemd service configuration, Ollama should respect this value and allocate memory accordingly.

Actual Behavior

Ollama v0.15.5 detects 64GB VRAM and automatically sets default_num_ctx=262144 (262K tokens), completely ignoring the configured OLLAMA_NUM_CTX value. This results in:

  • KV cache allocation of ~77GB (~48GB GPU + ~30GB CPU, per the logs; see the scaling check after this list)
  • Total memory requirement of 118GB for an 18GB model
  • Immediate OOM kills when attempting to load models
  • Service crashes with oom-kill status
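
KV-cache memory scales linearly with context length for a fixed model, so a rough consistency check is possible from the logs below (assuming the same OLLAMA_NUM_PARALLEL=4 in both cases):

  202752 / 6144 = 33
  (30294 MiB + 48807 MiB) / 33 ≈ 2.3 GiB

In other words, the same KV cache would need roughly 2.3 GiB, not ~77GB, if the configured per-sequence context of 6144 were honored.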

Steps to Reproduce

  1. Configure the Ollama service with an explicit context limit:
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_NUM_CTX=6144"
Environment="OLLAMA_MAX_CONTEXT=6144"
  2. Reload and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
  3. Attempt to load a 30B Q4 model:
ollama run hf.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF:Q4_K_M
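
To confirm which context size the server actually chose, one option (a suggestion, not part of the original report) is to grep the startup log for the routes.go line quoted below:

journalctl -u ollama -b | grep default_num_ctx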

Logs

time=2026-02-06T17:33:34.142-05:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="64.0 GiB" default_num_ctx=262144

llama_context: n_ctx = 811008
llama_context: n_ctx_seq = 202752
llama_kv_cache: CPU KV buffer size = 30294.00 MiB
llama_kv_cache: Vulkan0 KV buffer size = 48807.00 MiB
total memory size="118.3 GiB"

radv/amdgpu: Failed to allocate a buffer:
systemd[1]: ollama.service: The kernel OOM killer killed some processes in this unit.
ollama.service: Failed with result 'oom-kill'.
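
A hedged reading of these numbers: n_ctx_seq = 202752 appears to be the per-sequence context (derived from the 262144 default, presumably trimmed to fit), and with OLLAMA_NUM_PARALLEL=4 from the override below,

  4 × 202752 = 811008 = n_ctx

so the parallel-slot setting multiplies the already oversized per-sequence context a further four times.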

Root Cause Analysis

The changelog for v0.15.5 states:

Ollama will now default to the following context lengths based on VRAM:

  • < 24 GiB VRAM: 4,096 context
  • 24-48 GiB VRAM: 32,768 context
  • ≥ 48 GiB VRAM: 262,144 context

This auto-detection appears to override user-configured OLLAMA_NUM_CTX values instead of being used as a fallback when the variable is not set.

Workaround

Downgrade to v0.15.4:

sudo pacman -U https://archive.archlinux.org/packages/o/ollama/ollama-0.15.4-2-x86_64.pkg.tar.zst

v0.15.4 correctly respects OLLAMA_NUM_CTX and models load successfully.
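
To keep a routine system upgrade from pulling v0.15.5 back in, the package can additionally be pinned in pacman's configuration (standard Arch mechanism; added here as a suggestion):

# /etc/pacman.conf
IgnorePkg = ollama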

Impact

This bug makes Ollama v0.15.5 completely unusable on:

  • Systems with shared VRAM/RAM architectures (iGPUs)
  • Systems with ≥48GB VRAM but limited total system memory
  • Any configuration requiring explicit context-size control

Proposed Fix

The VRAM-based auto-detection should only apply when OLLAMA_NUM_CTX is not explicitly set. User-configured values should always take precedence over auto-detection.

Suggested logic:

if OLLAMA_NUM_CTX is set:
    use OLLAMA_NUM_CTX
else:
    use VRAM-based auto-detection
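
A minimal Go sketch of that precedence, reusing the changelog tiers quoted above (illustrative only: this is not Ollama's actual code, and the variable name is taken from the pseudocode):

package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultContextForVRAM reproduces the tiers quoted from the v0.15.5 changelog.
func defaultContextForVRAM(vramGiB uint64) int {
	switch {
	case vramGiB >= 48:
		return 262144
	case vramGiB >= 24:
		return 32768
	default:
		return 4096
	}
}

// contextLength prefers an explicitly set environment variable and falls back
// to the VRAM-based default only when the variable is unset or invalid.
func contextLength(envVar string, vramGiB uint64) int {
	if v := os.Getenv(envVar); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultContextForVRAM(vramGiB)
}

func main() {
	// With OLLAMA_NUM_CTX=6144 exported this prints 6144; unset, it prints 262144.
	fmt.Println(contextLength("OLLAMA_NUM_CTX", 64))
}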

Additional Context

  • Models tested: Qwen3-30B-A3B, GLM-4.7-Flash-REAP-23B-A3B, Ministral-3-14B
  • All models work perfectly on v0.15.4 with OLLAMA_NUM_CTX=6144
  • All models fail with OOM on v0.15.5 despite identical configuration
  • The issue affects both Vulkan and ROCm backends

System Configuration Files

Systemd override used (works on v0.15.4, ignored on v0.15.5):

[Service]
Environment="OLLAMA_VULKAN=1"
Environment="VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json"
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_KEEP_ALIVE=24h"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_NUM_GPU=1"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_NUM_CTX=6144"
Environment="OLLAMA_MAX_CONTEXT=6144"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_NUM_THREAD=12"
Environment="OLLAMA_MAX_VRAM=24576"
Environment="OLLAMA_NUM_BATCH=512"

Relevant log output

No response

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-29 09:39:32 -05:00

@rick-github commented on GitHub (Feb 7, 2026):

Set OLLAMA_CONTEXT_LENGTH (see https://github.com/ollama/ollama/blob/main/docs/faq.mdx#how-can-i-specify-the-context-window-size).

The following are not Ollama configuration variables:

OLLAMA_NUM_GPU
OLLAMA_NUM_CTX
OLLAMA_MAX_CONTEXT
OLLAMA_NUM_THREAD
OLLAMA_MAX_VRAM
OLLAMA_NUM_BATCH
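
For example (an illustrative override, reusing the 6144 value attempted above):

[Service]
Environment="OLLAMA_CONTEXT_LENGTH=6144"

followed by sudo systemctl daemon-reload and sudo systemctl restart ollama.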


@Bunkmil commented on GitHub (Feb 7, 2026):

> Set OLLAMA_CONTEXT_LENGTH.
>
> The following are not Ollama configuration variables:
>
> OLLAMA_NUM_GPU OLLAMA_NUM_CTX OLLAMA_MAX_CONTEXT OLLAMA_NUM_THREAD OLLAMA_MAX_VRAM OLLAMA_NUM_BATCH

You're right. Those must be artifacts from my earlier tinkering; they didn't cause any trouble until now...
Thanks!

Reference: github-starred/ollama#55730