[GH-ISSUE #11834] Improve log readability by adding a single summary line after model load #7854

Open
opened 2026-04-12 20:01:04 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @kha84 on GitHub (Aug 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11834

Currently, Ollama’s logs make it hard to quickly see the key parameters a model was actually loaded with. Important details such as the following are scattered across multiple log lines, often printed by different parts of the runtime as the model initializes:

  • Model name
  • Context size
  • Whether flash attention was enabled
  • KV cache quantization setting
  • Device used (CPU/GPU, etc.)

I understand this is partly because model loading decisions happen dynamically in different modules, but it would greatly improve troubleshooting if Ollama printed one concise, standardized summary line once the model is fully loaded.

Example of what I have in mind:

Loaded model <MODEL_NAME> with context=<N>, flash_attention=<on/off>, kv_cache_quant=<mode>, device=<CPU/GPU/other>, ...

This would allow developers and users to:

  • Quickly confirm the model was loaded as intended
  • Spot mismatches in parameters without scanning multiple log entries
  • Troubleshoot tricky runtime behaviors more easily

Even if the underlying details are still logged separately during initialization, this one-line summary would be an excellent quality-of-life improvement.
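
To make the proposed format concrete, here is a minimal Go sketch of how such a line could be assembled once all load decisions are known. Everything in it is an assumption for illustration: the `LoadSummary` struct, its field names, and the `Line` method are hypothetical and do not come from Ollama's code base, and the values in `main` are made-up examples.

```go
package main

import (
	"fmt"
	"strings"
)

// LoadSummary is a hypothetical container for the final load parameters.
// It is NOT an Ollama type; the fields mirror the list in this issue.
type LoadSummary struct {
	Model          string
	ContextSize    int
	FlashAttention bool
	KVCacheQuant   string
	Device         string
}

// onOff renders a boolean as "on" or "off", matching the proposed format.
func onOff(enabled bool) string {
	if enabled {
		return "on"
	}
	return "off"
}

// Line renders the summary as one key=value line in the proposed format.
func (s LoadSummary) Line() string {
	fields := []string{
		fmt.Sprintf("context=%d", s.ContextSize),
		"flash_attention=" + onOff(s.FlashAttention),
		"kv_cache_quant=" + s.KVCacheQuant,
		"device=" + s.Device,
	}
	return fmt.Sprintf("Loaded model %s with %s", s.Model, strings.Join(fields, ", "))
}

func main() {
	// Example values only; a real implementation would fill these in
	// after the runtime has made its final loading decisions.
	s := LoadSummary{
		Model:          "example-model",
		ContextSize:    8192,
		FlashAttention: true,
		KVCacheQuant:   "q8_0",
		Device:         "GPU",
	}
	fmt.Println(s.Line())
}
```

Collecting the values in one struct and printing it once at the end would keep the separate per-module log lines untouched while still giving a single line to grep for.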

GiteaMirror added the feature request label 2026-04-12 20:01:04 -05:00
Reference: github-starred/ollama#7854