[GH-ISSUE #1193] FR: PEFT and QLoRA adapter loading, huggingface transformers load balancer #26367

Closed
opened 2026-04-22 02:37:09 -05:00 by GiteaMirror · 2 comments

Originally created by @loopyd on GitHub (Nov 18, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1193

This discussion is on the validity of adding PEFT fine-tuning abilities and QLoRA adapter loading to ollama: allowing expandable, tensor-level knowledge bases to be added via ollama model configuration or via API calls, and triggering training via the API to bake adapters using PEFT. This feature request also covers the transformers load balancer from Hugging Face.

These options are enabled in projects such as [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

As LLaMA 2 was the original experiment for QLoRA, automating adapter loading in ollama is a much-desired feature for performant, modular knowledge-base extensions.

If you are not aware of Parameter-Efficient Fine-Tuning (PEFT) or QLoRA adapters, see [this resource](https://github.com/artidoro/qlora).
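
For concreteness, loading a trained QLoRA adapter onto a base model with the PEFT library looks roughly like this (a minimal sketch; the model ID and adapter path are placeholders, not part of the original request):

```python
# Minimal sketch (not ollama code): attaching a trained QLoRA adapter
# to a base model with Hugging Face PEFT. Model ID and adapter path
# are hypothetical placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
adapter_path = "./my-qlora-adapter"   # hypothetical adapter directory

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# PEFT layers the small LoRA weight matrices on top of the frozen base.
model = PeftModel.from_pretrained(base, adapter_path)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```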

Addition of QLoRA is proposed based on its:

  • performance;
  • convenience for applying modular fine-tuned knowledge bases;
  • accessibility: it enables fine-tuning large language models on lower-end hardware (see the sketch after this list);
  • support for GGUF and LLaMA-based models;
  • extensions to a wider range of model formats, such as GPTQ;
  • mature automation API.
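
As a sketch of the recipe behind those points, QLoRA pairs a 4-bit-quantized base model with a small trainable LoRA adapter, which is what makes lower-end hardware viable. Roughly (all model names and hyperparameters here are illustrative assumptions, not from the original request):

```python
# Illustrative QLoRA setup: a 4-bit NF4-quantized base model plus a
# small trainable LoRA adapter. Names/hyperparameters are assumptions.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which layers receive adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```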

Please add any feedback on including this feature below.

GiteaMirror added the feature request label 2026-04-22 02:37:09 -05:00

@runvnc commented on GitHub (Nov 25, 2023):

Sounds amazing. How do we use this with our qLoRAs?


@BruceMacD commented on GitHub (Mar 11, 2024):

Hi @loopyd, thanks for your feedback here. Ollama now supports loading QLoRA adapters via the `ADAPTER` instruction in a Modelfile:
https://github.com/ollama/ollama/blob/main/docs/modelfile.md#adapter
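
For reference, a minimal Modelfile sketch (the base model name and adapter path are illustrative, not from the original comment):

```
# Apply a locally trained (Q)LoRA adapter on top of a base model.
FROM llama2
ADAPTER ./ollama-lora.gguf
```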

For PEFT fine-tuning, I'm going to start consolidating requests around #156 to keep things organized.

Feel free to add more feedback if you have any.

Reference: github-starred/ollama#26367