[GH-ISSUE #15067] Feature: Capability-aware tool presentation based on model size #9668

Closed
opened 2026-04-12 22:33:27 -05:00 by GiteaMirror · 3 comments

Originally created by @spranab on GitHub (Mar 26, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15067

## Summary

When using `/api/chat` with `tools`, Ollama presents all tool definitions identically regardless of model size. A 0.8B model receives the same 80 tool descriptions as a 35B model. This wastes prompt tokens and degrades tool selection accuracy for smaller models.

Since Ollama already knows the loaded model's parameter count, it could **adapt tool presentation automatically**.

## The Problem

Benchmarked with Ollama's native tool calling API (`/api/chat` with `tools`), 80 tools, 50 prompts:

| Model | Accuracy | Prompt tokens |
|-------|----------|---------------|
| qwen2.5:1.5b | **50%** | 3,408 |
| qwen3.5:9b | 80% | 5,272 |
| gpt-oss:20b | 80% | 2,143 |
| qwen3.5:35b | **88%** | 5,272 |

Small models waste 3,000-5,000 tokens on tool descriptions they can't effectively use.

## Key Finding

Tool selection accuracy decomposes as:

```
P(correct tool) = P(correct family) × P(correct tool | correct family)
```

Even qwen2.5:1.5b achieves **89% within-family accuracy**. The bottleneck is navigation, not selection.
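As a quick sanity check, the decomposition lets us back out the implied family-level accuracy from the measured numbers (a rough estimate, assuming the 89% within-family figure holds uniformly across families):

```python
# Implied family-level routing accuracy for qwen2.5:1.5b, assuming the
# decomposition above and a uniform 89% within-family accuracy.
p_correct_tool = 0.50    # overall accuracy from the benchmark table
p_within_family = 0.89   # measured within-family accuracy
p_correct_family = p_correct_tool / p_within_family
print(f"implied P(correct family) ≈ {p_correct_family:.0%}")  # ≈ 56%
```

Routing to the right tool family at roughly 56% is the weak link that the tiered presentations below are meant to address.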

## Proposed Feature

When the loaded model is small, Ollama could:

### Option A: Server-side tool adaptation

Ollama automatically shortens tool descriptions and reduces parameter schemas for smaller models. The client sends full tools; Ollama adapts them before prompting.
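A minimal sketch of what that adaptation might look like, in Python for illustration; the 3B cutoff and the first-sentence truncation rule are hypothetical choices, not anything Ollama implements today:

```python
# Hypothetical server-side adaptation (sketch, not Ollama code): shorten
# descriptions and drop optional parameters for models below a size cutoff.
SMALL_MODEL_THRESHOLD = 3_000_000_000  # illustrative 3B-parameter cutoff

def adapt_tool(tool: dict, parameter_count: int) -> dict:
    if parameter_count >= SMALL_MODEL_THRESHOLD:
        return tool  # large models keep the full definition
    fn = dict(tool["function"])
    # Keep only the first sentence of the description.
    fn["description"] = fn["description"].split(". ")[0]
    # Keep only the required parameters in the schema.
    params = fn.get("parameters", {})
    required = set(params.get("required", []))
    fn["parameters"] = {
        "type": "object",
        "properties": {k: v for k, v in params.get("properties", {}).items()
                       if k in required},
        "required": sorted(required),
    }
    return {**tool, "function": fn}
```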

### Option B: Expose model metadata to clients

Add the model's parameter count to the `/api/show` response (if it isn't already there) so clients can adapt tools before sending. This is the lighter-touch option.
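As the comments below confirm, the parameter count is in fact already exposed. A client-side sketch using plain HTTP (this assumes the `requests` package and the flat `general.parameter_count` key inside `model_info`):

```python
import requests

def get_parameter_count(model: str, host: str = "http://localhost:11434") -> int:
    """Read a model's parameter count from Ollama's /api/show."""
    resp = requests.post(f"{host}/api/show", json={"model": model})
    resp.raise_for_status()
    return resp.json()["model_info"]["general.parameter_count"]

# e.g. get_parameter_count("qwen3.5:9b") -> 9_653_104_368 (see the thread below)
```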

### Option C: Support tier hints in tool definitions

Allow an optional `tiers` field in each tool definition:

```json
{
  "type": "function",
  "function": {
    "name": "file_read",
    "description": "Read file contents with line numbers, offset, and encoding",
    "parameters": { ... },
    "tiers": {
      "small": {
        "description": "Read file",
        "parameters": {
          "type": "object",
          "properties": { "path": {"type": "string"} },
          "required": ["path"]
        }
      }
    }
  }
}
```

Ollama picks the right tier based on loaded model size. Unknown tiers fall back to the top-level definition.
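A sketch of the selection and fallback logic this implies; the size cutoffs for `small` and `medium` are illustrative, since the proposal leaves the exact thresholds open:

```python
def select_tier(parameter_count: int) -> str:
    # Illustrative cutoffs; the proposal does not pin these down.
    if parameter_count < 3_000_000_000:
        return "small"
    if parameter_count < 15_000_000_000:
        return "medium"
    return "large"

def resolve_tool(tool: dict, parameter_count: int) -> dict:
    """Apply the matching tier override; an absent or unknown tier falls
    back to the top-level definition, as proposed above."""
    fn = tool["function"]
    override = fn.get("tiers", {}).get(select_tier(parameter_count))
    if override is None:
        return tool  # fall back to the full definition
    merged = {**fn, **override}
    merged.pop("tiers", None)  # tiers are hints; don't send them to the model
    return {**tool, "function": merged}
```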

## Benchmark Evidence

With tier-adapted presentation on Ollama:

| Strategy | 1.5B | 20B |
|----------|------|-----|
| Baseline (all 80 tools) | 50% | 80% |
| Hybrid (8 detailed + 72 name-only) | **60%** (+10pp) | 76% |
| Semantic reorder + category hint | 54% | **88%** (+8pp) |

Token reduction: **83-97%** depending on strategy.
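For concreteness, the hybrid strategy can be approximated client-side as below; how the detailed subset is chosen (e.g. by semantic similarity to the prompt) is left open here:

```python
def hybrid_presentation(tools: list[dict], detailed_names: set[str]) -> list[dict]:
    """Present a few tools in full and the rest as name-only stubs,
    mirroring the '8 detailed + 72 name-only' strategy above."""
    def stub(name: str) -> dict:
        return {"type": "function",
                "function": {"name": name, "description": "",
                             "parameters": {"type": "object", "properties": {}}}}
    return [tool if tool["function"]["name"] in detailed_names
            else stub(tool["function"]["name"])
            for tool in tools]
```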

## References

- Whitepaper: [DOI: 10.5281/zenodo.19228710](https://zenodo.org/records/19228710)
- SDK: [yantrikos-sdk](https://pypi.org/project/yantrikos-sdk/) on PyPI
- Benchmark: [github.com/yantrikos/tier](https://github.com/yantrikos/tier) (harness_v3.py, all data in JSONL)

All benchmarks were run on Ollama's `/api/chat` with native tool calling. Happy to provide additional data.

GiteaMirror added the feature request label 2026-04-12 22:33:27 -05:00

@PiyushInt commented on GitHub (Mar 28, 2026):

Hi @spranab, I hope my solution can fix this issue.

Please review it.


@spranab commented on GitHub (Mar 28, 2026):

Thanks @PiyushInt! Exposing `parameter_count` in `/api/show` is exactly Option B from the proposal — gives clients what they need to adapt tool presentation without Ollama having to implement the routing logic itself. Clean approach.

This would let the [yantrikos-sdk](https://pypi.org/project/yantrikos-sdk/) call `/api/show` → read `parameter_count` → auto-detect tier → adapt tools, all client-side.


@spranab commented on GitHub (Mar 29, 2026):

Update: we've published [yantrikos-sdk v0.3.0](https://pypi.org/project/yantrikos-sdk/), which now reads the exact parameter count from Ollama's `/api/show` — no PR needed, the data is already there.

```python
from yantrikos import detect_tier_from_ollama

tier = detect_tier_from_ollama('qwen3.5:9b')
# Queries /api/show → reads model_info.general.parameter_count → 9,653,104,368 → Tier.M
```

Thanks to @rick-github for pointing out that `general.parameter_count` is already available. This made the implementation trivial.

`pip install yantrikos-sdk`
