[GH-ISSUE #156] Fine-tuning support #46569

Open
opened 2026-04-27 23:01:17 -05:00 by GiteaMirror · 21 comments

Originally created by @shrikrishnaholla on GitHub (Jul 21, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/156

Originally assigned to: @pdevine on GitHub.

First of all, thanks for building this tool and releasing it as open source. I like that the interfaces seem similar to docker.

I also like the idea of Modelfile. Maybe it could also be used to define a fine-tuning process. That would also allow the build process to be part of a CI/CD routine and would allow building private fine-tuned models with a good developer UX, which I'm sure lots of people are looking for right now.
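
To make the idea concrete, a fine-tuning-aware Modelfile might look something like the sketch below. The DATA and TRAIN instructions are purely hypothetical and are not part of the actual Modelfile syntax; they only illustrate what a declarative fine-tuning step could look like.

# Hypothetical sketch only: DATA and TRAIN are invented instructions,
# not part of the real Modelfile format.
FROM llama2:7b
DATA ./training/examples.jsonl
TRAIN epochs 3
TRAIN learning_rate 2e-5
SYSTEM You are an assistant fine-tuned on our internal documentation.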

GiteaMirror added the feature request label 2026-04-27 23:01:17 -05:00

@mchiang0610 commented on GitHub (Jul 21, 2023):

@shrikrishnaholla This is a feature that we actively think about. That being said, there are many foundational features that we need to prioritize first before embarking on this major feature.

Would love to get an understanding of how you do this today without Ollama, and where?


@SaraiQX commented on GitHub (Jul 21, 2023):

@mchiang0610 Re your last question: I've just started to learn about CUDA, Metal, and ggml (and new languages like Mojo, Taichi, ...) and to try to understand the challenge of getting Apple devices to be used as N cards 😂. Given my zero CS background, I feel excited to learn about Ollama and really look forward to your updates. 💪🏻💪🏻


@shrikrishnaholla commented on GitHub (Jul 26, 2023):

I haven't finetuned any model yet. However, I will need to soon for my work. So I have been exploring easy ways to do so. Currently, I have come across the following links that could be useful:

  • https://github.com/OpenAccess-AI-Collective/axolotl (this is the one with by far the best UX for fine-tuning, imo)
  • https://old.reddit.com/r/LocalLLaMA/comments/14vnfh2/my_experience_on_starting_with_fine_tuning_llms/ (no code, but interesting discussion and the experience of someone who has actually done fine-tuning work)
  • https://docs.ray.io/en/latest/ray-air/examples/gptj_deepspeed_fine_tuning.html (coding example from ray.io; production-grade fine-tuning)
  • https://huggingface.co/docs/trl/main/en/sft_trainer (this could be a starting point for the project: start simple and gradually improve based on community feedback)
  • https://github.com/kuutsav/llm-toys (simple fine-tunes)

@OmeliaEngineering commented on GitHub (Aug 4, 2023):

@mchiang0610 The Replicate offering is currently the simplest and best presented, from what we've seen. We're actively looking for an alternative to GPT-4 and so are also very interested in easy ways to fine-tune foundational models.

https://replicate.com/blog/fine-tune-llama-2


@repollo commented on GitHub (Sep 26, 2023):

After going through some of the provided links, I've come to understand that there seems to be a distinction between a fundamental or "base" fine-tuning implementation and a more sophisticated, "ideal" approach. I particularly found insights from the ray.io example on deepspeed fine-tuning useful for conceptualizing the base implementation. On the other hand, the Reddit post on fine-tuning LLMs provided a comprehensive view of an ideal fine-tuning strategy.

The implementations can be summarized as:

Experimental Process (Tinkering with Specific Content):

  1. Select a Pre-trained Model: Begin with a model that already has generalized knowledge, serving as a foundation.
  2. Curate Specific Data: Gather content-specific datasets, such as Shakespeare's works, for a targeted fine-tuning goal.
  3. Set Training Parameters: Define hyperparameters, training epochs, learning rates, etc., tailored to the content-specific objective.
  4. Engage in Fine-tuning: Use appropriate tools to train the model on the curated dataset, aiming to achieve the desired style or content knowledge.
  5. Evaluate & Play: Test the model's outputs to ensure they align with the intended style or content. Iterate as necessary for improvements.

Scalable Process (For Broad Knowledge Absorption and Retrieval):

  1. Define Clear Objectives: Understand the broader goals, such as making the model knowledgeable about a wide range of topics or company-specific information.
  2. Establish an Embedding Store: As new and relevant data emerges, convert and store it in the form of embeddings. This serves as a dynamic, quickly accessible information repository.
  3. Query and Reference: When a question is posed to the model, it can check the embeddings to provide information, even if it hasn't been directly trained on that data.
  4. Periodic Fine-tuning: Monitor the embedding store's size and relevance. Once it reaches a certain threshold, use this data to fine-tune the model, enabling it to internalize the new knowledge.
  5. Cleanse and Refresh: After a successful fine-tuning, purge the embedding store of data that the model has been trained on, ensuring efficiency and preventing redundancy.
  6. Continuous Monitoring & Updates: Regularly evaluate the model's performance, and stay updated with new data and methodologies for consistent relevance and accuracy.

By distinguishing between these two processes, users can decide whether they want a more playful, content-specific model or a broader, continually updating knowledge base.
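
To make the scalable process concrete, here is a minimal Go sketch of its "periodic fine-tuning" and "cleanse and refresh" steps. The embeddingStore and fineTuner types and the threshold value are invented purely for illustration; nothing like them exists in Ollama today.

// Hypothetical sketch of the "scalable" loop: accumulate documents,
// fine-tune once a threshold is reached, then purge the store.
package finetune

import (
	"context"
	"log"
)

// embeddingStore stands in for the dynamic store described above.
type embeddingStore struct {
	docs []string // raw documents backing the stored embeddings
}

func (s *embeddingStore) Add(doc string) { s.docs = append(s.docs, doc) }
func (s *embeddingStore) Count() int     { return len(s.docs) }

// fineTuner stands in for whatever component would actually run training.
type fineTuner interface {
	FineTune(ctx context.Context, trainingData []string) error
}

// fineTuneThreshold is a placeholder for "once it reaches a certain threshold".
const fineTuneThreshold = 1000

// maybeFineTune implements steps 4 and 5 of the scalable process:
// fine-tune once the store is large enough, then purge only on success.
func maybeFineTune(ctx context.Context, store *embeddingStore, ft fineTuner) error {
	if store.Count() < fineTuneThreshold {
		return nil // keep answering from the embedding store for now
	}
	data := store.docs
	if err := ft.FineTune(ctx, data); err != nil {
		return err // keep the store intact so no data is lost
	}
	store.docs = nil // cleanse and refresh after a successful fine-tune
	log.Printf("fine-tuned on %d documents", len(data))
	return nil
}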

Note: Before embarking on any fine-tuning process, it's highly recommended to make a copy of the original model. This ensures that the original weights and biases remain unaffected, allowing users to revert to the base model if necessary or to keep multiple versions for different applications. It would probably be wise to make the copy by default when using the fine-tuning methods, and optional when overwriting the original model.

From all of this, I think the approach should be something along the lines of the following code in llm.go:

package llm

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/pbnjay/memory"

	"github.com/jmorganca/ollama/api"
)

type FineTuningData struct {
	TrainingData []string // Placeholder for actual training data
	Epochs       int      // Number of training epochs
	LearningRate float64  // Learning rate
	CreateCopy   bool     // Whether to create a copy of the model for fine-tuning or modify the original
	// Add more fields as needed
}

type LLM interface {
	Predict(context.Context, []int, string, func(api.GenerateResponse)) error
	Embedding(context.Context, string) ([]float64, error)
	Encode(context.Context, string) ([]int, error)
	Decode(context.Context, []int) (string, error)
	SetOptions(api.Options)
	Close()
	Ping(context.Context) error
	FineTune(context.Context, FineTuningData) error
}

func (l *llama) FineTune(ctx context.Context, data FineTuningData) error {
	// Fine-tuning logic would go here.
	// Use the data provided in the FineTuningData struct to adjust the model:
	// - data.TrainingData holds the training examples.
	// - data.LearningRate sets the learning rate.
	// - data.Epochs is the number of epochs to train for.

	// If data.CreateCopy is true, create a copy of the model before fine-tuning.

	// Ensure any errors encountered during fine-tuning are returned.
	return nil // return an appropriate error if something goes wrong
}

func New(workDir, model string, adapters []string, opts api.Options) (LLM, error) {
	if _, err := os.Stat(model); err != nil {
		return nil, err
	}

	f, err := os.Open(model)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	ggml, err := DecodeGGML(f)
	if err != nil {
		return nil, err
	}

	switch ggml.FileType() {
	case "Q8_0":
		if ggml.Name() != "gguf" && opts.NumGPU != 0 {
			// GGML Q8_0 do not support Metal API and will
			// cause the runner to segmentation fault so disable GPU
			log.Printf("WARNING: GPU disabled for F32, Q5_0, Q5_1, and Q8_0")
			opts.NumGPU = 0
		}
	case "F32", "Q5_0", "Q5_1":
		if opts.NumGPU != 0 {
			// F32, Q5_0, Q5_1, and Q8_0 do not support Metal API and will
			// cause the runner to segmentation fault so disable GPU
			log.Printf("WARNING: GPU disabled for F32, Q5_0, Q5_1, and Q8_0")
			opts.NumGPU = 0
		}
	}

	totalResidentMemory := memory.TotalMemory()
	switch ggml.ModelType() {
	case "3B", "7B":
		if ggml.FileType() == "F16" && totalResidentMemory < 16*1024*1024 {
			return nil, fmt.Errorf("F16 model requires at least 16GB of memory")
		} else if totalResidentMemory < 8*1024*1024 {
			return nil, fmt.Errorf("model requires at least 8GB of memory")
		}
	case "13B":
		if ggml.FileType() == "F16" && totalResidentMemory < 32*1024*1024 {
			return nil, fmt.Errorf("F16 model requires at least 32GB of memory")
		} else if totalResidentMemory < 16*1024*1024 {
			return nil, fmt.Errorf("model requires at least 16GB of memory")
		}
	case "30B", "34B", "40B":
		if ggml.FileType() == "F16" && totalResidentMemory < 64*1024*1024 {
			return nil, fmt.Errorf("F16 model requires at least 64GB of memory")
		} else if totalResidentMemory < 32*1024*1024 {
			return nil, fmt.Errorf("model requires at least 32GB of memory")
		}
	case "65B", "70B":
		if ggml.FileType() == "F16" && totalResidentMemory < 128*1024*1024 {
			return nil, fmt.Errorf("F16 model requires at least 128GB of memory")
		} else if totalResidentMemory < 64*1024*1024 {
			return nil, fmt.Errorf("model requires at least 64GB of memory")
		}
	case "180B":
		if ggml.FileType() == "F16" && totalResidentMemory < 512*1024*1024 {
			return nil, fmt.Errorf("F16 model requires at least 512GB of memory")
		} else if totalResidentMemory < 128*1024*1024 {
			return nil, fmt.Errorf("model requires at least 128GB of memory")
		}
	}

	switch ggml.Name() {
	case "gguf":
		opts.NumGQA = 0 // TODO: remove this when llama.cpp runners differ enough to need separate newLlama functions
		return newLlama(model, adapters, chooseRunners(workDir, "gguf"), ggml.NumLayers(), opts)
	case "ggml", "ggmf", "ggjt", "ggla":
		return newLlama(model, adapters, chooseRunners(workDir, "ggml"), ggml.NumLayers(), opts)
	default:
		return nil, fmt.Errorf("unknown ggml type: %s", ggml.ModelFamily())
	}
}

I'm really bad at Go.
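
For illustration only, a caller of the proposed interface might look roughly like this, assuming it lived alongside the sketch above in llm.go; the FineTune method and the FineTuningData fields are the proposal above, not a real Ollama API, and the values are placeholders.

// Hypothetical caller of the proposed FineTune method; illustration only.
func fineTuneExample(ctx context.Context, workDir, modelPath string, opts api.Options) error {
	m, err := New(workDir, modelPath, nil, opts)
	if err != nil {
		return err
	}
	defer m.Close()

	return m.FineTune(ctx, FineTuningData{
		TrainingData: []string{"training examples would go here"},
		Epochs:       3,    // placeholder value
		LearningRate: 2e-5, // placeholder value
		CreateCopy:   true, // leave the base model untouched, per the note above
	})
}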


@shrikrishnaholla commented on GitHub (Nov 6, 2023):

Guys, I found this project that might be helpful: https://github.com/promptslab/LLMtuner

Discussion: https://old.reddit.com/r/LocalLLaMA/comments/17o8zl2/open_sourcing_llmtuner_an_experimental_framework/


@MostlyKIGuess commented on GitHub (Nov 17, 2023):

Oh boy, I would love to work on the embedding side. We could implement something similar to localGPT: use smaller instructor models to find the most relevant data, then include that data along with the prompt to get the most accurate answer, keeping the creativity (temperature) at 0.

so the workflow would go from

  • user query -> model

to:

  • user query -> data analyzer
  • from the data analyzer, the most relevant data (chunk size fixed by the user) along with the number of citations + the prompt -> model (a rough sketch follows below)
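
A rough Go sketch of that flow, with the retriever (the "data analyzer") and generator interfaces invented purely for illustration:

// Hypothetical retrieval step in front of generation: pick the most
// relevant chunks, prepend them to the prompt, and run the model with
// temperature 0 so the answer stays grounded in the retrieved data.
package rag

import (
	"context"
	"fmt"
	"strings"
)

// retriever is the "data analyzer": it returns the k chunks most relevant
// to the query. generator wraps the model call. Both are invented here.
type retriever interface {
	TopK(ctx context.Context, query string, k int) ([]string, error)
}

type generator interface {
	Generate(ctx context.Context, prompt string, temperature float64) (string, error)
}

func answer(ctx context.Context, r retriever, g generator, query string, k int) (string, error) {
	chunks, err := r.TopK(ctx, query, k)
	if err != nil {
		return "", err
	}
	prompt := fmt.Sprintf(
		"Answer using only the context below and cite the chunks you used.\n\nContext:\n%s\n\nQuestion: %s",
		strings.Join(chunks, "\n---\n"),
		query,
	)
	return g.Generate(ctx, prompt, 0)
}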

@ahdyt commented on GitHub (Feb 28, 2024):

Hi, is it possible to fine-tune the model with something as simple as ollama train <input_model> "books/input.pdf/anything" <output_model>?


@Nicat-dcw commented on GitHub (Mar 29, 2024):

> Hi, is it possible to fine-tune the model with something as simple as ollama train <input_model> "books/input.pdf/anything" <output_model>?

Can you send a PDF file?


@AlgoClaw commented on GitHub (Apr 10, 2024):

Can this issue be renamed to "fine tuning support"? (instead of "fune")


@eokic commented on GitHub (Apr 11, 2024):

Best I can do is "fun-tuning support"


@KSemenenko commented on GitHub (May 4, 2024):

Can’t wait


@Chukarslan commented on GitHub (Jun 7, 2024):

Any updates on this?


@KSemenenko commented on GitHub (Jul 20, 2024):

Can’t wait for this functionality :)


@Chukarslan commented on GitHub (Aug 4, 2024):

Following intently; we're forced to use LLaMA-Factory right now and we really don't want to :(


@chigkim commented on GitHub (Aug 9, 2024):

Being able to fine-tune with Ollama would be amazing.
Maybe let users feed messages in JSON with the OpenAI format, have Ollama take care of all the chat formatting according to the model template, and fine-tune based on them?

[
	{
		"role": "system",
		"content": "You are a helpful assistant."
	},
	{
		"role": "user",
		"content": "Hello!"
	},
	{
		"role": "assistant",
		"content": "Hi, how can I help you?"
	},
	...
]
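
A small Go sketch of the "format according to the model template" step, using text/template over OpenAI-style messages; the template string is a generic example for illustration, not any specific model's actual template.

// Hypothetical rendering of OpenAI-style messages through a chat template
// before they are handed to a fine-tuning run.
package tune

import (
	"strings"
	"text/template"
)

// Message mirrors the OpenAI-style chat format shown above.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatTemplate is a generic example template, not any particular model's.
var chatTemplate = template.Must(template.New("chat").Parse(
	"{{range .}}<|{{.Role}}|>\n{{.Content}}\n{{end}}",
))

// RenderExample turns one training conversation into a single prompt string
// that a fine-tuning run could consume.
func RenderExample(msgs []Message) (string, error) {
	var sb strings.Builder
	if err := chatTemplate.Execute(&sb, msgs); err != nil {
		return "", err
	}
	return sb.String(), nil
}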

@savejeff commented on GitHub (Sep 28, 2024):

Would also be highly interested in a straightforward fine-tuning process.
I would like to fine-tune on a large software project or a non-public framework so that the model can give detailed and insightful answers based on the code. For extending that project, etc., it would know about the coding style, the practices used, and the code styling.


@insidesecurity-yhojann-aguilera commented on GitHub (Oct 4, 2024):

It seems like a very interesting option, considering that Ollama is a high-level system. If I have to train models manually, then it's not worth having an automated manager if you're going to have to touch code anyway. In a professional or corporate environment you can't opt for a hybrid like Ollama: if you have qualified personnel to perform low-level tuning (out of obligation), then there's no need for a mechanism that exposes the execution of a high-level model.

Is there a fork with this feature?


@hg0428 commented on GitHub (Oct 8, 2024):

I'd love to see support for this coming soon.


@KSemenenko commented on GitHub (Oct 16, 2024):

Yah, it will be amazing!


@coxfrederic commented on GitHub (Feb 5, 2025):

This would be gold! Following. It is very hard to fine-tune a model with your own data, and the RAG solutions via PDFs are also not easy or reliable. If this worked better through Ollama, it would really increase the quality of the product.


Reference: github-starred/ollama#46569