[PR #2824] [MERGED] Convert Safetensors to an Ollama model #10980

Closed
opened 2026-04-12 23:17:37 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2824
Author: @pdevine
Created: 2/29/2024
Status: Merged
Merged: 3/7/2024
Merged by: @pdevine

Base: main ← Head: convert


📝 Commits (10+)

  • 06ec129 wip mistral converter
  • 2b5b123 formatting
  • b069fb4 working convert for mistral safetensors -> gguf f16
  • 804ba07 hook convert into ollama create
  • 7ab42f4 switch gorgonia package to fork to fix gc problem
  • 4bf5a2d dry out creation + remove the temp zip file after creation
  • 23dc370 remove debugging printfs
  • 4e428e3 more cleanup
  • e0f04cd fix linter issues
  • 5e4f0c8 address comments

📊 Changes

9 files changed (+3083 additions, -153 deletions)

View changed files

📝 cmd/cmd.go (+89 -8)
➕ convert/convert.go (+331 -0)
➕ convert/sentencepiece/sentencepiece_model.pb.go (+1497 -0)
➕ convert/sentencepiece_model.proto (+333 -0)
📝 go.mod (+22 -3)
📝 go.sum (+148 -2)
📝 llm/ggml.go (+2 -2)
📝 llm/gguf.go (+574 -137)
📝 server/images.go (+87 -1)

📄 Description

This (admittedly very large) change converts a safetensors file in the FROM line of a Modelfile into a 16-bit non-quantized Ollama model, without having to use llamacpp's convert.py script. This initial version works with Mistral v0.2 (and presumably v0.1, although I haven't tested that yet), and with some tweaks should probably work with Gemma as well.
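To illustrate the intended workflow (the path and model name below are hypothetical, not taken from the PR), a Modelfile would point its FROM line directly at the safetensors weights and be built with ollama create:

```
# Modelfile — FROM points at a directory containing the safetensors weights
FROM /path/to/mistral-7b-v0.2
```

```
# build a 16-bit model without llamacpp's convert.py
ollama create my-mistral -f Modelfile
```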

Some things to note:

  • Currently it only works with SentencePiece tokenization. We can add BPE and other tokenizers in the future to support more model types.
  • Quantization is not yet supported; a future change will add it and make it easy to choose different quantization levels.
  • LoRA adapters don't currently work, but it would be nice to support them in the future.
  • llamacpp requires repacking the q and k attention layers to swap some of the axes. This required pulling in the gorgonia.org/tensors package, which is somewhat abandoned. I forked that library and it's hosted at github.com/pdevine/tensors, but ideally we wouldn't have to fork it.
  • A lot of the processing is converting bfloat16 (brain float) numbers into float16. Neither format is supported natively by Go, which kind of sucks.
  • I mapped all of the params that Mistral requires to build the GGUF file, but some are probably missing. Those should be added as we support more models.
  • I haven't yet added unit tests here, which would be really nice, but there's so much to test!
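The bfloat16-to-float16 point above can be sketched in Go. This is not the PR's code, just a minimal illustration of why the conversion is awkward: bfloat16 is exactly the top 16 bits of a float32, so widening is a lossless shift, while narrowing float32 to float16 needs real rounding logic that the standard library does not provide.

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToFloat32 widens a bfloat16 bit pattern to float32.
// bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits,
// so it is exactly the top half of a float32 and widening is exact.
func bf16ToFloat32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	// 0x3F80 is bfloat16 for 1.0 (the top 16 bits of float32 0x3F800000).
	fmt.Println(bf16ToFloat32(0x3F80)) // prints 1
	// Narrowing to float16 (5 exponent / 10 mantissa bits) is the hard
	// direction: Go has no float16 type, so a converter must round the
	// mantissa and re-bias the exponent by hand, or pull in a helper
	// package.
}
```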

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:17:37 -05:00

Reference: github-starred/ollama#10980