[PR #14969] create: move safetensor model creation server-side #14945

Open
opened 2026-04-13 01:06:27 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14969
Author: @dhiltgen
Created: 3/19/2026
Status: 🔄 Open

Base: main ← Head: create_phase2


📝 Commits (1)

  • e6a479b create: add remote safetensors creation via API

📊 Changes

15 files changed (+1505 additions, -170 deletions)

View changed files

📝 api/client.go (+13 -0)
📝 api/types.go (+9 -0)
📝 cmd/cmd.go (+23 -5)
📝 docs/api.md (+39 -0)
📝 integration/create_test.go (+57 -16)
📝 progress/bar.go (+4 -0)
📝 server/create.go (+169 -0)
📝 server/routes_create_test.go (+91 -146)
📝 x/create/client/create.go (+1 -1)
📝 x/create/client/quantize.go (+5 -2)
➕ x/create/client/remote.go (+633 -0)
➕ x/create/client/remote_test.go (+420 -0)
📝 x/mlxrunner/mlx/memory.go (+5 -0)
📝 x/mlxrunner/mlx/stream.go (+5 -0)
📝 x/safetensors/extractor.go (+31 -0)

📄 Description

Add a streaming pipeline for creating safetensors models through the server API. The client extracts tensors, uploads blobs concurrently (with HEAD checks to skip existing), and sends a CreateRequest with model_format="safetensors". The server assembles the manifest from pre-uploaded blobs, with optional server-side quantization via MLX.
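For illustration, here is a minimal sketch of that client-side flow against the documented /api/blobs/sha256:&lt;digest&gt; and /api/create endpoints. The tensorBlob type and the uploadMissingBlob/createSafetensorsModel helpers are hypothetical names for this sketch, not the PR's actual code in x/create/client/remote.go, and the JSON payload handling is assumed from the description rather than taken from the diff.

```go
// Sketch of the client-side safetensors create flow: HEAD-check each blob,
// upload only the missing ones concurrently, then send a create request
// that references the pre-uploaded blobs by digest.
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"os"

	"golang.org/x/sync/errgroup"
)

const host = "http://127.0.0.1:11434"

// tensorBlob describes one extracted safetensors file (hypothetical type).
type tensorBlob struct {
	Name   string // file name recorded in the manifest, e.g. "model-00001-of-00002.safetensors"
	Digest string // "sha256:<hex>" of the file contents
	Path   string // local path, uploaded only if the server lacks the digest
}

// uploadMissingBlob HEAD-checks /api/blobs/<digest> and uploads only on a miss.
func uploadMissingBlob(ctx context.Context, b tensorBlob) error {
	url := fmt.Sprintf("%s/api/blobs/%s", host, b.Digest)

	head, _ := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
	if resp, err := http.DefaultClient.Do(head); err == nil {
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return nil // server already has this blob; skip the upload
		}
	}

	f, err := os.Open(b.Path)
	if err != nil {
		return err
	}
	defer f.Close()

	post, _ := http.NewRequestWithContext(ctx, http.MethodPost, url, f)
	resp, err := http.DefaultClient.Do(post)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated && resp.StatusCode != http.StatusOK {
		return fmt.Errorf("blob upload failed: %s", resp.Status)
	}
	return nil
}

// createSafetensorsModel uploads missing blobs concurrently, then asks the
// server to assemble the model from the pre-uploaded blobs.
func createSafetensorsModel(ctx context.Context, name string, blobs []tensorBlob) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(4) // bounded upload concurrency
	for _, b := range blobs {
		b := b
		g.Go(func() error { return uploadMissingBlob(ctx, b) })
	}
	if err := g.Wait(); err != nil {
		return err
	}

	files := make(map[string]string, len(blobs))
	for _, b := range blobs {
		files[b.Name] = b.Digest
	}

	// model_format tells the server to build a safetensors manifest from the blobs.
	body, _ := json.Marshal(map[string]any{
		"model":        name,
		"files":        files,
		"model_format": "safetensors",
	})
	req, _ := http.NewRequestWithContext(ctx, http.MethodPost, host+"/api/create", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("create failed: %s", resp.Status)
	}
	return nil
}
```

Keeping the tensor data on the blob endpoint keeps the create request itself small, and the HEAD check lets repeated creates reuse blobs the server already stores.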

Key changes:

  • New remote creation pipeline (x/create/client/remote.go)
  • Server-side createSafetensorsModel handler with quantization support
  • API changes: new ModelFormat and Capabilities fields; new HeadBlob client method (see the sketch after this list)
  • MLX lifecycle fixes: cached DefaultCPUStream, Pin/Sweep/ClearCache
  • Safetensors LLM always uses API path (faster due to parallelism)
  • Imagegen stays on local-only path behind --experimental
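As a rough sketch of the API surface named in the list above: the field names ModelFormat and Capabilities come from the PR description, but the JSON tags, surrounding fields, and the BlobChecker interface below are assumptions, not the actual api/types.go and api/client.go definitions.

```go
// Rough sketch of the request-side additions; only illustrative, not the diff.
package api

import "context"

type CreateRequest struct {
	Model string            `json:"model"`
	Files map[string]string `json:"files,omitempty"` // file name -> sha256 digest of a pre-uploaded blob

	// New in this PR per the description: how the server should interpret the
	// uploaded blobs ("safetensors") and which capabilities to record.
	ModelFormat  string   `json:"model_format,omitempty"`
	Capabilities []string `json:"capabilities,omitempty"`
}

// BlobChecker captures the shape of a HeadBlob-style client method: report
// whether the server already stores a blob so the upload can be skipped.
type BlobChecker interface {
	HeadBlob(ctx context.Context, digest string) (exists bool, err error)
}
```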

Using GLM-4.7-Flash as a test case, this change improves performance for both unquantized and quantized creates by more than 2x (51s before, 23-25s after).


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:06:27 -05:00

Reference: github-starred/ollama#14945