[GH-ISSUE #15945] OLLAMA_DEBUG_LOG_REQUESTS is missing from ollama serve --help. #72213

Open
opened 2026-05-05 03:38:54 -05:00 by GiteaMirror · 0 comments

Originally created by @your-diary on GitHub (May 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15945

What is the issue?

The `OLLAMA_DEBUG_LOG_REQUESTS=1` env var enables request logging:

- https://github.com/ollama/ollama/pull/14106
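For context, enabling it looks like this. A minimal sketch, assuming the `=1` toggle works as described in the PR above:

```shell
# Start the server with request logging enabled.
# Assumes OLLAMA_DEBUG_LOG_REQUESTS=1 toggles the behavior, per the PR linked above.
OLLAMA_DEBUG_LOG_REQUESTS=1 ollama serve
```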
But that is not listed in the output of `ollama serve --help`:

```
$ ollama serve --help
Start Ollama

Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_CONTEXT_LENGTH      Context length to use unless otherwise specified (default: 4k/32k/256k based on VRAM)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NO_CLOUD            Disable Ollama cloud features (remote inference and web search)
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_KV_CACHE_TYPE       Quantization type for the K/V cache (default: f16)
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
      OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")
```
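A quick way to confirm the omission from a shell (a hypothetical check, assuming help is printed to stdout; stderr is redirected just in case):

```shell
# Search the serve help text for the undocumented variable.
# Redirect stderr too, in case help is printed there on some versions.
ollama serve --help 2>&1 | grep OLLAMA_DEBUG_LOG_REQUESTS \
  || echo "OLLAMA_DEBUG_LOG_REQUESTS is not documented in --help"
```

Presumably the fix is a one-line addition to whatever table backs the Environment Variables section of the help output.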

Relevant log output


OS

macOS

GPU

No response

CPU

No response

Ollama version

0.22.0

GiteaMirror added the bug label 2026-05-05 03:38:54 -05:00