[GH-ISSUE #2342] Quantize an Ollama Model #1354

Closed
opened 2026-04-12 11:11:28 -05:00 by GiteaMirror · 2 comments

Originally created by @stealthier-ai on GitHub (Feb 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2342

I need to quantize a full-precision Ollama model that I layered new weights into for a specialized use case. Is there a way to do that within Ollama? It seems like I need to clone llama.cpp and quantize through that. There are also other ways to quantize GGUF files and then recreate an Ollama model file. Am I missing anything, or is there a specific method I should be using?
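
For concreteness, "recreate an Ollama model file" from a quantized GGUF would look roughly like the sketch below, assuming the quantized weights sit at `./model-q4_0.gguf` (the file and model names are placeholders, not a confirmed workflow):

```sh
# Hypothetical Modelfile that wraps the quantized GGUF
cat > Modelfile <<'EOF'
FROM ./model-q4_0.gguf
EOF

# Re-import the quantized weights as a new Ollama model (name is a placeholder)
ollama create mymodel-q4_0 -f Modelfile
```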


@pdevine commented on GitHub (Feb 4, 2024):

Hey @stealthier-ai. There are some instructions on how to do this [here](https://github.com/ollama/ollama/blob/main/docs/import.md). I'm guessing you probably want to follow the steps for [manually converting](https://github.com/ollama/ollama/blob/main/docs/import.md#manually-converting--quantizing-models) your model, but you don't actually need to clone a copy of llama.cpp if you have ollama already cloned, as there is a copy in the `llm/llama.cpp` directory. You can just run `make quantize` in that directory to build the binary.

That said, the process is less than ideal, and I've been working on creating a new way to convert/quantize models to make this a lot easier.
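
For reference, a minimal sketch of that manual path, assuming an existing clone of the ollama repo and an FP16 GGUF export of the model at `./model-f16.gguf` (the file names and the `q4_0` target are placeholders, not the only options):

```sh
# Build the quantize binary from the llama.cpp copy vendored inside the
# ollama repo (no separate llama.cpp clone needed, as noted above)
cd ollama/llm/llama.cpp
make quantize

# Quantize the FP16 GGUF down to 4-bit; arguments are
# <input gguf> <output gguf> <quantization type>
./quantize ./model-f16.gguf ./model-q4_0.gguf q4_0
```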


@bmizerany commented on GitHub (Mar 11, 2024):

Going to go ahead and close this. Please feel free to reopen if you need to.


Reference: github-starred/ollama#1354