[PR #15081] feat: autotune - automatic hardware-based performance optimization #61704

Open
opened 2026-04-29 16:44:37 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15081
Author: @GuilhermeP96
Created: 2026-03-26
Status: 🔄 Open

Base: main ← Head: main


📝 Commits (2)

  • 9d65fdf feat: add autotune package for automatic hardware-based performance optimization
  • f9ac5d5 docs: add autotune documentation to CLI reference and FAQ

📊 Changes

12 files changed (+1498 additions, -24 deletions)


autotune/README.md (+87 -0)
autotune/api.go (+152 -0)
autotune/apply.go (+82 -0)
autotune/autotune_test.go (+332 -0)
autotune/hardware.go (+120 -0)
autotune/profile.go (+145 -0)
autotune/tuner.go (+328 -0)
📝 cmd/cmd.go (+132 -1)
📝 docs/cli.mdx (+26 -0)
📝 docs/faq.mdx (+51 -0)
📝 envconfig/config.go (+27 -23)
📝 server/routes.go (+16 -0)

📄 Description

Summary

Adds an autotune package that automatically detects hardware (GPU VRAM, CPU cores, RAM) and applies optimal Ollama environment variables for best inference performance.

Changes

New package: autotune/

  • hardware.go — Detects GPU VRAM, CPU cores, total RAM
  • profile.go — Defines 5 performance profiles: balanced, speed, memory, multiuser, max
  • tuner.go — Selects the best profile based on detected hardware
  • apply.go — Applies profile settings as Ollama environment variables
  • api.go — HTTP handlers for the autotune API
  • autotune_test.go — Unit tests
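The PR body does not show the selection logic in tuner.go; the following is a minimal sketch of how hardware-based profile selection could work. The `Hardware` struct and all thresholds are assumptions for illustration, not the PR's actual code (the only threshold the description states is that the memory profile targets GPUs with less than 6GB of VRAM).

```go
package main

import "fmt"

// Hardware mirrors what hardware.go is described as detecting:
// GPU VRAM, CPU core count, and total system RAM.
type Hardware struct {
	VRAMGiB  float64
	CPUCores int
	RAMGiB   float64
}

// SelectProfile picks one of the five documented profiles from the
// detected hardware. Thresholds here are illustrative guesses.
func SelectProfile(hw Hardware) string {
	switch {
	case hw.VRAMGiB > 0 && hw.VRAMGiB < 6:
		return "memory" // low-VRAM GPUs get conservative settings
	case hw.VRAMGiB >= 24 && hw.RAMGiB >= 64:
		return "max" // high-end GPUs can take aggressive settings
	case hw.CPUCores >= 32:
		return "multiuser" // many cores suggest a shared server
	default:
		return "balanced"
	}
}

func main() {
	// The PR author's test machine: GTX 1660 SUPER 6GB, 24C/48T, 56GB RAM.
	hw := Hardware{VRAMGiB: 6, CPUCores: 24, RAMGiB: 56}
	fmt.Println(SelectProfile(hw)) // 6GB is at the boundary, so "balanced" here
}
```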

API Endpoints

  • GET /api/autotune — Returns current autotune status and hardware info
  • GET /api/autotune/profiles — Lists all available profiles
  • POST /api/autotune — Applies a specific profile (body: {"profile": "speed"})

Integration Points

  • envconfig/ — Registered autotune environment variables
  • server/routes.go — Registered API route handlers
  • cmd/cmd.go — Added ollama autotune CLI commands (status, profiles, apply)

Documentation

  • docs/cli.mdx — Documented ollama autotune commands
  • docs/faq.mdx — Added autotune FAQ entry

Profiles

Profile   | Use Case                           | Key Settings
----------|------------------------------------|--------------------------------------
balanced  | Default, general use               | Auto-scaled to hardware
speed     | Single-user, max throughput        | KEEP_ALIVE=30m, optimized threading
memory    | Low-VRAM GPUs (<6GB)               | Reduced context, conservative memory
multiuser | Shared servers                     | NUM_PARALLEL=4, shorter keep-alive
max       | Maximum performance, high-end GPUs | Max context, aggressive settings

Testing

  • Unit tests included (go test ./autotune/)
  • Tested on: Intel Xeon E5-2676 v3 (24C/48T) + GTX 1660 SUPER 6GB + 56GB RAM
  • Built and validated on Windows with go build (v0.18.2-autotune rebased onto v0.18.4-rc0)

Motivation

Many users don't know the optimal environment variables for their hardware. Autotune eliminates the guesswork by detecting system capabilities and applying the best settings automatically, improving inference speed without manual configuration.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/ollama#61704