[GH-ISSUE #2873] Improvement suggestion: "Recommended" and brief explanation on ollama.com/library #48267

Open
opened 2026-04-28 07:30:08 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @ewebgh33 on GitHub (Mar 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2873

Hi

Would you consider adding some features to the website, like:

  • Select your GPU (dropdown box). This could help filter to suitable models. Personally I have 2x 4090s so I can run 70b models; for me such a filter might show close to the entire list, but it would be far more useful to someone with 8 GB of VRAM.
  • A brief explanation at the top of each model page, for people who aren't deep into reading model blogs daily.

For example, I went to pull phind-codellama.
Then I discovered on the tags tab that there are 49 options, two of which are the same (latest and 34b). Also, v2, which reportedly has more training, is not the default.

Do I need to care about all these other versions? It depends on my needs or GPU. OK, so can I filter by accuracy and VRAM, i.e. what's the most capable model I can actually run?
Or can I filter by speed and accuracy? Etc.

For a lot of pages I imagine this could just be a boilerplate reminder.
What do K, KM, and KS mean? Is a lower number better, or a higher one? Honestly, I have a lot of models and have been playing with LLMs a lot, and I still can't keep track of which abbreviation and quantization level is which.
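
For what it's worth, a rough decoder (common GGUF/llama.cpp conventions as I understand them, not something from this thread): the number after Q is the nominal bits per weight, so a lower number means a smaller file and more quality loss; K marks the newer "k-quant" family; and S/M/L are small/medium/large variants within a level, trading a little size for a little quality. A handy rule of thumb for "what's the most capable I can run" is file size ≈ parameters × bits-per-weight ÷ 8. A minimal sketch (the bits-per-weight values are approximations, not official figures):

```javascript
// Rough GGUF file-size estimator. The bits-per-weight values below are
// approximations for common quantization levels, not official figures.
const APPROX_BITS_PER_WEIGHT = {
  q4_0: 4.5,
  q4_K_S: 4.6,
  q4_K_M: 4.8,
  q5_K_M: 5.7,
  q6_K: 6.6,
  q8_0: 8.5,
};

function estimateSizeGB(paramsBillions, quant) {
  const bpw = APPROX_BITS_PER_WEIGHT[quant];
  if (bpw === undefined) throw new Error(`unknown quant: ${quant}`);
  // bits -> bytes -> GB; real files also carry some metadata overhead,
  // and running the model needs extra VRAM for the context.
  return (paramsBillions * 1e9 * bpw) / 8 / 1e9;
}

// e.g. a 34B model at q4_0 comes out around 19 GB:
console.log(estimateSizeGB(34, "q4_0").toFixed(1)); // ~19.1
```

The "most capable that fits" filter asked for above then reduces to picking the largest parameter/quantization pair whose estimate fits under your VRAM, with some headroom for context.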

Anyway, I think this would be very helpful to a lot of people. I'd volunteer to help with this, but clearly I don't have a deep understanding of the tradeoffs between all the quantized versions!

GiteaMirror added the ollama.com and feature request labels 2026-04-28 07:30:08 -05:00
Author
Owner

@ewebgh33 commented on GitHub (Mar 2, 2024):

A good example is TheBloke on Huggingface.
And speaking of which, I can't work out how the models there correspond to the ones here:
https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v2-GGUF

The default pull for this model in Ollama is 19 GB; what quantization level is this? It has no suffix at all, yet it's smaller than the Q5, Q6, and Q8 models that are supposed to have less loss. What loss does the 19 GB model have, and why was it selected as the default? Is it explained anywhere in the docs how you choose which version of a model is the default?
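
As a back-of-envelope check (a rough estimate, not an official figure): file size ≈ parameters × bits-per-weight ÷ 8, so 34 × 10⁹ × ~4.5 ÷ 8 ≈ 19.1 GB. That lines up with Q4_0's typical ~4.5 bits per weight, consistent with pdevine's confirmation downthread that the unsuffixed default is Q4_0.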

################################
Edit:
I've found some information on a different model page. I guess there's just a lot to do and a small team; my apologies.

Example:
https://ollama.com/library/wizardlm

This model page has the brief explanation I was talking about (though in much less detail than TheBloke's).
And it explains that the default for Ollama is 4-bit.

So if I have a larger GPU I could try a 5- or 6-bit version? I'll see how it goes.
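
For anyone landing here: a specific quantization level can be pulled by tag instead of taking the default. A minimal sketch against Ollama's local REST API (the model tag below is hypothetical; check the model's tags page for the ones that actually exist):

```javascript
// Pull a specific quantization by tag via Ollama's local API.
// POST /api/pull streams progress as newline-delimited JSON.
async function pullTag(name) {
  const res = await fetch("http://localhost:11434/api/pull", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name }),
  });
  // Log progress chunks as they stream in.
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log(decoder.decode(value));
  }
}

pullTag("wizardlm:13b-q5_K_M"); // hypothetical tag -- check the tags page
```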

Author
Owner

@pdevine commented on GitHub (Mar 4, 2024):

We definitely need to make it easier for users so they don't have to wade through dozens of potential tags with all the different quantizations. I have been thinking about some different ways to do this, and hopefully we'll have something in the next few weeks.

I think ideally you would just choose the quantization level that you want to run with, and ollama would take care of the rest; no need to ever look at the different tags. In the case where there is a model which doesn't have the quantization level that you wanted, it would just fall back to a different quantization level.
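
A sketch of what that fallback could look like (illustrative only, not Ollama's actual implementation): given the quantization tags a model actually offers, serve the requested level if present, otherwise the nearest available one.

```javascript
// Illustrative quantization fallback -- not Ollama's actual code.
// Levels ordered from most to least compressed.
const QUANT_ORDER = [
  "q2_K", "q3_K_M", "q4_0", "q4_K_S", "q4_K_M", "q5_K_M", "q6_K", "q8_0",
];

function pickQuant(requested, available) {
  if (available.includes(requested)) return requested;
  const want = QUANT_ORDER.indexOf(requested);
  // Otherwise fall back to whichever available level is nearest in order.
  return available
    .filter((q) => QUANT_ORDER.includes(q))
    .sort(
      (a, b) =>
        Math.abs(QUANT_ORDER.indexOf(a) - want) -
        Math.abs(QUANT_ORDER.indexOf(b) - want)
    )[0];
}

// e.g. q4_K_M requested but only q4_0 and q5_K_M published:
console.log(pickQuant("q4_K_M", ["q4_0", "q5_K_M"])); // "q5_K_M"
```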

That said, Ollama defaults right now to Q4_0 for each model. We chose that because it was a decent balance between model performance and what kinds of systems most people have (most people don't have 2x4090s!).

Author
Owner

@ewebgh33 commented on GitHub (Mar 4, 2024):

Thanks for the reply, appreciate it.

Q4 is a good middle ground, as you said; it just took me some digging to find out that's what you were doing. And I get it: new models every day, trying to keep up is like running in place!

Author
Owner

@gaardhus commented on GitHub (Jun 17, 2024):

For browsing tags I created a small userscript that lets you filter the long list of tags:

```javascript
// ==UserScript==
// @name             OllamaFilter
// @match            *://ollama.com/library/*/tags
// @version          1.0
// @author           gaardhus
// ==/UserScript==

// Add a text input next to the tags header that live-filters the list.
let tagsContainer = document.querySelector("div.px-4:nth-child(1)");
tagsContainer.style.alignItems = "center";

let input = document.createElement("input");
input.placeholder = "Filter tags";
input.style.marginLeft = "10px";
// className, not classList: classList is a token list, not a string.
input.className =
  "w-full resize-none rounded-lg py-1.5 pr-10 text-sm border-gray-200";
tagsContainer.appendChild(input);

// "input" fires on every keystroke; "change" only fires when the
// field loses focus.
input.addEventListener("input", filterTags);

function filterTags(event) {
  // Every row after the header is a tag entry.
  let tags = document.querySelectorAll("div.px-4:nth-child(n+2)");
  tags.forEach((element) => {
    let tag = element.querySelector("a > div");
    // Hide rows whose tag name doesn't contain the filter text.
    element.style.display =
      tag && tag.textContent.includes(event.target.value) ? "block" : "none";
  });
}
```
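
To use it, install it with a userscript manager such as Tampermonkey or Violentmonkey; the @match pattern restricts it to the library tags pages, so it leaves the rest of the site alone.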
Author
Owner

@gwillen commented on GitHub (Jun 22, 2024):

> I think ideally you would just choose the quantization level that you want to run with, and ollama would take care of the rest; no need to ever look at the different tags. In the case where there is a model which doesn't have the quantization level that you wanted, it would just fall back to a different quantization level.
>
> That said, Ollama defaults right now to Q4_0 for each model. We chose that because it was a decent balance between model performance and what kinds of systems most people have (most people don't have 2x4090s!).
>
> Is there any plan to shift to Q4_K_S as the default? It has a similar file size and speed, but better perplexity and output quality (specifically for 7B models).

I came here to say the same thing -- as far as I know, q4_0 is considered deprecated, and q4_K_S is strictly superior (better output, same size). I don't think it makes sense to keep defaulting to q4_0 for models where q4_K_S is available instead. (And arguably you might even want to default to q4_K_M, which is very slightly larger but usually significantly higher quality. But at least q4_K_S should be a no-cost improvement over q4_0.)
