[GH-ISSUE #2415] Provide logits or logprobs in the API #63445

Closed
opened 2026-05-03 13:29:48 -05:00 by GiteaMirror · 95 comments

Originally created by @freQuensy23-coder on GitHub (Feb 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2415

Originally assigned to: @BruceMacD, @ParthSareen on GitHub.

Feature request:
How can I get logits (the probabilities of each next token) during generation, just like I can with the OpenAI API (logprobs)? This feature would be helpful for apps that use logprobs to measure model awareness and confidence.
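
For reference, a minimal sketch of what the requested capability looks like with the OpenAI Python client (the model name is a placeholder); the ask here is for an equivalent option in Ollama's API:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    logprobs=True,      # return the logprob of each generated token
    top_logprobs=5,     # also return the 5 most likely alternatives per position
)

for token in response.choices[0].logprobs.content:
    print(token.token, token.logprob)
```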

GiteaMirror added the feature request, api labels 2026-05-03 13:29:50 -05:00

@neychevr commented on GitHub (Mar 25, 2024):

Hello! This seems like a really useful feature for more advanced usage of Ollama.
Is this feature already WIP, or would a contribution be welcome?

UPD: seems there is already a pending PR with this feature implemented: https://github.com/ollama/ollama/pull/1640
Could we help somehow to speed up the merge? :)


@josiahbryan commented on GitHub (Apr 19, 2024):

This would be super helpful to some ongoing research work I'm doing. Does anyone know of any providers that DO return logprobs, other than OpenAI of course? Any ETA when this might land here in ollama?


@mateon1 commented on GitHub (Apr 27, 2024):

I would also like to have this. I'm interested in having both echo + logprobs, so I can get information about the prompt too, instead of just the completion. Right now I'm using very small models with PyTorch to compute logits directly, but that's really slow.
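
As a rough sketch of that PyTorch workaround (per-token logprobs over the prompt itself, i.e. what echo + logprobs would return), assuming a small Hugging Face model such as gpt2 purely as a stand-in:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The quick brown fox jumps over the lazy dog"
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits                              # [1, seq_len, vocab_size]
logprobs = F.log_softmax(logits, dim=-1)

# Logprob of each prompt token, conditioned on the tokens before it.
for i in range(1, ids.shape[1]):
    token_id = ids[0, i]
    print(tok.decode(token_id), logprobs[0, i - 1, token_id].item())
```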


@magic-YuanTian commented on GitHub (May 2, 2024):

Any updates?


@briancleland commented on GitHub (May 6, 2024):

Any updates?

https://github.com/ollama/ollama/pull/1640#issuecomment-2043381653


@SharmaM-dev commented on GitHub (Jul 14, 2024):

Any updates?


@drdsgvo commented on GitHub (Jul 29, 2024):

Are there any updates on this very important issue? To not implement logits is not a valid solution. Anyone (including me) who needs logits will move from Ollama to a different solution! Please be aware of that.


@The-Inscrutable-X commented on GitHub (Aug 9, 2024):

support, would be very nice


@moritz-gross commented on GitHub (Aug 30, 2024):

I'm surprised this is not one of the first things implemented 🤔


@haukelicht commented on GitHub (Sep 11, 2024):

Hi there,

any progress on integrating this feature request?


@szocsbarni commented on GitHub (Sep 19, 2024):

Hi, is there a timeline available for integration?


@mommi84 commented on GitHub (Sep 19, 2024):

Hi, is there a timeline available for integration?

We are going to get AGI before this, 100%!


@SharmaM-dev commented on GitHub (Sep 19, 2024):

[laugh] Mridul Sharma reacted to the comment above via email reply.


@martinkozle commented on GitHub (Sep 19, 2024):

We are going to get GTA 6 before this.


@latent-variable commented on GitHub (Oct 16, 2024):

Man, I really need this to implement a CoT-Decoding pipeline for open-webui. I guess I'll go back to playing Sparking Zero.


@josiahbryan commented on GitHub (Oct 16, 2024):

I've given up hope and switched back to llama.cpp for production inference. Using it with ramalama, which can pull from the ollama model library.

Really disappointed that the maintainers here show such disregard for such a huge community request.

Makes me want to make sure I don't use the project in any way. If an obvious thing like this is being totally ignored by the maintainers, then it shows they don't really care much about what the community is asking for.


@briancleland commented on GitHub (Oct 16, 2024):

@jmorganca @bmizerany Has the team given up on implementing this feature?


@NumberChiffre commented on GitHub (Oct 19, 2024):

Pls make this happen lol, as a painful user on Mac :(


@josiahbryan commented on GitHub (Oct 19, 2024):

@jmorganca @bmizerany you guys broadcast your partnership with Hugging Face - great! What about this though? This seems like less than 1/10th the effort - why are you ignoring everyone asking for input here? Why don't you at least provide a timeline?


@athmanar commented on GitHub (Oct 22, 2024):

Insane that this is not given as an option? Maybe better to switch to pure Hugging Face models.


@codelion commented on GitHub (Oct 28, 2024):

Man, I really need this to implement a CoT-Decoding pipeline for open-webui. I guess I'll go back to playing Sparking Zero.

CoT decoding and entropy decoding are available in optillm - https://github.com/codelion/optillm


@Cy-Fi commented on GitHub (Oct 29, 2024):

Ollama will stop being an option for us if crucial features like this are not being implemented...


@magic-YuanTian commented on GitHub (Nov 3, 2024):

For such a simple but important feature, the team has demonstrated unexpected arrogance and ignorance over such a long time. I think this is a red flag for us to give up on using Ollama as an LLM backend, since they cannot go far for sure.


@codelion commented on GitHub (Nov 3, 2024):

I have implemented it in PyTorch if anyone is looking for it they can use the following colab - https://colab.research.google.com/drive/1zPv47_tog2_KOFJY-WJxwPYR6mgoxKlK?usp=sharing

Here is the discussion on optillm where it was brought up as well - https://github.com/codelion/optillm/discussions/82


@codelion commented on GitHub (Nov 13, 2024):

Logprobs are now directly supported in our local inference server - https://github.com/codelion/optillm?tab=readme-ov-file#local-inference-server we use the OpenAI compatible API so you can get them using the same code.


@jooray commented on GitHub (Nov 13, 2024):

This would be very useful for enforcing the structure of output (output_cls with langchain, that currently works with llama.cpp and hugging face).

It can reject tokens that would break the output structure. Very useful for tool calling as well.


@drdsgvo commented on GitHub (Nov 13, 2024):

Logprobs are now directly supported in our local inference server - https://github.com/codelion/optillm?tab=readme-ov-file#local-inference-server we use the OpenAI compatible API so you can get them using the same code.

Great to hear that we have a cool alternative to ollama as those guys are not doing what needs to be done!


@jooray commented on GitHub (Nov 14, 2024):

I would suggest being more kind. Ollama is an open source project; they are not working for you. Feel free to offer a bounty to implement this, or create a pull request.

I would really like to see this implemented, but that does not mean I have to be mean to the authors of software I get for free. And being an ass in the comments (which many of you are here) is not very motivating for developers either.


@ParthSareen commented on GitHub (Dec 9, 2024):

Hey everyone! Sorry for the delay and no updates here - going to be picking this up soon and hopefully getting it in early Jan. There have been a ton of changes to the API, and even more are coming on the inference engine layer, so we just need to be a bit careful since it would be an API addition, but it is something we want to support!


@ParthSareen commented on GitHub (Dec 13, 2024):

Hey folks would like to get your thoughts:
Would you care if it was logits vs logprobs? Would you prefer one over the other? If so why?

Working on designing the API and getting the functionality right along with that :) Appreciate your patience!


@josiahbryan commented on GitHub (Dec 13, 2024):

Personally would prefer logprobs, just because all my tooling is set up for that and I'm used to thinking in logprobs haha


@martinkozle commented on GitHub (Dec 13, 2024):

But if you want to calculate the probability of the LLM generating "yes" or "no", for example, you would either have to use constrained generation with logprobs, where only those 2 tokens will be non-zero, or you can use the logits directly and do the constraining yourself.
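
A minimal sketch of the second approach ("do the constraining yourself"), assuming the raw logits for one decoding step are available as a tensor and that yes_id/no_id are the (hypothetical) token ids of the two answers:

```python
import torch

def yes_probability(logits: torch.Tensor, yes_id: int, no_id: int) -> float:
    """Renormalize a single decoding step over just the two candidate tokens.

    logits: 1-D tensor of raw, unnormalized scores over the vocabulary.
    Returns P("yes") with probability mass restricted to {"yes", "no"}.
    """
    pair = torch.stack([logits[yes_id], logits[no_id]])
    probs = torch.softmax(pair, dim=0)  # softmax over the two candidates only
    return probs[0].item()
```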


@mommi84 commented on GitHub (Dec 13, 2024):

Would you care if it was logits vs logprobs? Would you prefer one over the other? If so why?

Definitely logprobs so that it doesn't deviate from the OpenAI standards (see examples here: https://cookbook.openai.com/examples/using_logprobs) and the following can be supported:

import numpy as np

# response is a chat completion created with logprobs enabled
for token in response.choices[0].logprobs.content:
    for top_logprob in token.top_logprobs:
        print((top_logprob.token, np.exp(top_logprob.logprob)))

@ParthSareen commented on GitHub (Dec 13, 2024):

Thanks @josiahbryan, @martinkozle, @mommi84. I also think it makes more sense to have logprobs for now and then happy to re-evaluate when doing the new engine. Have some fun plans for sampling :)


@Elimane0800 commented on GitHub (Dec 15, 2024):

Hey folks would like to get your thoughts:

Would you care if it was logits vs logprobs? Would you prefer one over the other? If so why?

Working on designing the API and getting the functionality right along with that :) Appreciate your patience!

Personally I prefer logits, because we can derive logprobs ourselves if we have logits. Plus, for tasks such as distillation or uncertainty estimation, logits are more interesting to have. So please add logits 🙏, we can do the logprobs calculations ourselves


@Elimane0800 commented on GitHub (Dec 15, 2024):

To be more precise: logprobs can be derived directly from logits through a softmax followed by a logarithmic transformation. This kind of operation is neither the most difficult nor the most time-consuming. It's easy to get logprobs from logits, but not that easy to get logits from logprobs, except up to an additive constant that only gives an approximation of the original logit values.
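
A minimal sketch of the transformation being described, assuming a full logits vector for a single step is available:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32000)                # hypothetical vocabulary-sized logit vector
logprobs = F.log_softmax(logits, dim=-1)   # softmax followed by log, in one numerically stable op

# The reverse direction only works up to an unknown additive constant:
# logits == logprobs + log(sum(exp(logits))), and that constant cannot be
# recovered from the logprobs alone.
```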


@BenjaminMarechalEVITECH commented on GitHub (Dec 20, 2024):

I disagree with @Elimane0800 : to get logprobs from logits, one needs to compute log-softmax over the logits of the complete vocabulary, which is an expensive operation and is already done in the model for the inference of the next token.
Moreover, it is often only necessary to get the N most probable tokens (with N of the order of a few dozen). In this case, logprobs (or probs) are relevant, logits are not.


@Elimane0800 commented on GitHub (Dec 20, 2024):

I disagree with @Elimane0800 : to get logprobs from logits, one needs to compute log-softmax over the logits of the complete vocabulary, which is an expensive operation and is already done in the model for the inference of the next token.

Moreover, it is often only necessary to get the N most probable tokens (with N of the order of a few dozen). In this case, logprobs (or probs) are relevant, logits are not.

Having already computed logprobs from logits on CPU, I can say it is definitely not the kind of task that runs for 24 hours. So if the only concern is the time and cost of the operation, I can say it is neither the most time-consuming nor the most expensive operation.


@BenjaminMarechalEVITECH commented on GitHub (Dec 20, 2024):

There are two use cases:

  • we want to know the probabilities of the N most probable tokens
  • we want to know the probabilities or the logits of all the tokens in the vocabulary.

I think the first use case is the most requested today. To answer it effectively, the API has no choice but to give the logprobs (or probs) of the N most probable tokens. Indeed, if the API only provides the logits, then it must provide them for the entire vocabulary if we want to deduce the logprobs. With vocabulary sizes sometimes approaching 100k, this overloads the API's JSON response enormously.
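
To illustrate the size argument, a sketch (assuming the server holds the full logits vector) of returning only the top-N logprobs instead of the whole vocabulary:

```python
import torch
import torch.nn.functional as F

def top_n_logprobs(logits: torch.Tensor, n: int = 20) -> dict[int, float]:
    """Return the N most probable token ids with their logprobs.

    For a ~100k-token vocabulary this keeps the response at N entries
    instead of 100k floats.
    """
    logprobs = F.log_softmax(logits, dim=-1)
    values, indices = torch.topk(logprobs, k=n)
    return {int(i): float(v) for i, v in zip(indices, values)}
```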


@ParthSareen commented on GitHub (Jan 3, 2025):

Hey folks, thanks for your patience! Going to be doing something like top-k logprobs for the API. Not saying no to logits, but given the comments, API design, and system constraints it just makes sense to do this first.

There's also some adjacent work going on right now in which there's some API refactoring/design to be done. Upon its completion this will be one of the first things that go out :)


@OriginalGoku commented on GitHub (Jan 25, 2025):

any timeframe for this update?


@codelion commented on GitHub (Jan 25, 2025):

any timeframe for this update?

It may take a while to get it here since they rely on llama.cpp underneath. You can try an alternative like optillm - https://github.com/codelion/optillm


@ClaudiuCreanga commented on GitHub (Jan 30, 2025):

It may take a while to get it here since they rely on llama.cpp underneath. You can try an alternative like optillm - https://github.com/codelion/optillm

Seems like this one is an openai tooling, while we're interested in other models.


@ParthSareen commented on GitHub (Jan 30, 2025):

Hey everyone - sorry for the delay, making my rounds right now.

It is not due to relying on llama.cpp; I did have a branch working back in early Jan. As you may know, we've been working on a new Go engine, and all of us have been pretty heads-down on that for the last month. There's going to be some bifurcation between running the new engine vs. the current one, which means that new API features need to have parity on both engines. In order to do that, we just need to make sure the behavior of the old and new engines is the same for the logprobs endpoint as well as tokenize/detokenize.

I've also been mainly working on our new sampling interfaces, and the logprobs feature has been top of mind as I build them. I do hope to get to it soon; I know it's super important to you all! The new engine is going to bring a lot of stability and maintainability throughout, so we can support these kinds of features much faster in the future.


@codelion commented on GitHub (Jan 30, 2025):

It may take a while to get it here since they rely on llama.cpp underneath. You can try an alternative like optillm - https://github.com/codelion/optillm

Seems like this one is an openai tooling, while we're interested in other models.

This uses the same format and API as OpenAI but works for any model from hugging face. You can use it to build datasets for distillation like this - https://huggingface.co/datasets/arcee-ai/LLama-405B-Logits


@chaoyupeng commented on GitHub (Feb 18, 2025):

Hi, Ollama team, any updates on the logprobs functionality?


@BruceMacD commented on GitHub (Feb 20, 2025):

I've taken over implementation of this, expect some progress soon.


@BruceMacD commented on GitHub (Feb 27, 2025):

Update for those interested:
I've opened some pull requests that refactor the model runners to make it possible to get information such as logprobs:
https://github.com/ollama/ollama/pull/9282

Once that gets in I'll move forward with returning the values from the Ollama server.


@SabaPivot commented on GitHub (Mar 5, 2025):

Update for those interested: I've opened some pull requests that refactor the model runners to make it possible to get information such as logprobs: #9282

Once that gets in I'll move forward with returning the values from the Ollama server.

Cool! This is really cool!


@SeriousJ55 commented on GitHub (Mar 19, 2025):

Hello! Do you know when this feature will be implemented? Pull request #9282 seems to be pending.

And thanks to the people in the dev team for their amazing work!


@BruceMacD commented on GitHub (Mar 19, 2025):

I'm still working on this at the same time as a few other things, but I haven't forgotten about it. Updated the #9282 pull request, so hopefully that one gets in soon.


@K0IN commented on GitHub (Apr 7, 2025):

Hi, I know this might be the wrong thread, and it is surely documented somewhere, but why did Ollama switch from the llama.cpp server to a custom implementation in the first place?


@aakash232 commented on GitHub (Apr 9, 2025):

Hello, Any updates on this feature?


@qzhou711 commented on GitHub (Apr 9, 2025):

Thank you very much for the efforts of the developers. Is there any update to this feature? This is a very important feature.


@BruceMacD commented on GitHub (Apr 14, 2025):

Still working on it! I've taken a diversion to work on some other stuff, but further steps toward it are still in my short-term plans.


@brunodifranco commented on GitHub (Apr 21, 2025):

Hello! Any estimate on when the feature will be completed?


@Proteusiq commented on GitHub (Apr 28, 2025):

Still working on it! I've taken a diversion to work on some other stuff, but further steps toward it are still in my short-term plans.

How can we help?


@CodeHatchling commented on GitHub (May 1, 2025):

Please consider merging in pull request #9282 that implements this very needed feature.


@BarryKeee commented on GitHub (Jun 26, 2025):

Any update on this? This feature is very much needed!


@enochlev commented on GitHub (Jun 30, 2025):

May I offer a statement of motivation. The completion of this feature is critical for a wide range of research. Enabling Ollama to expose raw logits allows researchers to extract high-quality supervision signals from large models like LLaMA 70B, even with limited hardware. This supports efficient knowledge distillation, making it possible to train smaller models with soft targets on modest GPUs like a single L40S or a 24GB GPU, which I argue is the hardware limit for 95% of researchers. Unlocking this capability is a major contribution to the LLM research industry.

Most paid LLM hosting providers prohibit the extraction of logits (all except for OpenAI: https://docs.litellm.ai/docs/completion/input), and we are limited to only using vLLM, which is quite GPU hungry.

Again, we understand this contribution comes from your free time, and it is much appreciated.


@SharmaM-dev commented on GitHub (Jun 30, 2025):

[like] Mridul Sharma reacted to enochlev's comment above via email reply.


@baptistejamin commented on GitHub (Jul 1, 2025):

It can be used for other things besides research. At Crisp, we utilize logprobs with fine-tuned models for binary classification on highly complex queries.

It can be super helpful.

Categorize this into two categories: SPAM, HAM

Query:

You can then have the log prob for SPAM, and use thresholds to classify.
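
A minimal sketch of that classification pattern against an OpenAI-compatible endpoint that returns top_logprobs; the model name, prompt, and 0.5 threshold are placeholders, and the label is assumed to surface as a single token:

```python
import math
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="my-finetuned-classifier",  # placeholder model name
    messages=[{"role": "user", "content": "Categorize this into two categories: SPAM, HAM\n\nQuery: win a free prize now"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
)

# Probability mass the model puts on "SPAM" as the first generated token.
first_token = resp.choices[0].logprobs.content[0]
p_spam = sum(math.exp(t.logprob) for t in first_token.top_logprobs if t.token.strip() == "SPAM")
print("SPAM" if p_spam >= 0.5 else "HAM", p_spam)
```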


@jaylinwylie commented on GitHub (Jul 15, 2025):

Looks like we are waiting on the pull request to be approved. Unless there's a branch/fork we can experiment with in the meantime?


@unacceptable commented on GitHub (Jul 16, 2025):

I am surprised that @jmorganca or @rick-github haven't closed this out yet saying to use a proxy. That's what they did in #1053 and #8573 for auth and token usage.


@rick-github commented on GitHub (Jul 16, 2025):

A proxy does not have access to logits. See https://huggingface.co/blog/logits-processor-zoo#what-are-logits-in-language-models to learn more about logits.


@kjam commented on GitHub (Aug 12, 2025):

Commenting to also say this is useful for implementing privacy and security controls, such as differential privacy predictions and regularization when managing adversarial input. Would be nice to merge https://github.com/ollama/ollama/pull/9282


@codelion commented on GitHub (Aug 12, 2025):

A proxy does not have access to logits. See here to learn more about logits.

Depends on how it is implemented, OptiLLM is also a proxy but has an inbuilt local inference server and supports logits - https://github.com/codelion/optillm/issues/182


@Tritonio commented on GitHub (Aug 12, 2025):

A proxy does not have access to logits. See here to learn more about logits.

Depends on how it is implemented, OptiLLM is also a proxy but has an inbuilt local inference server and supports logits - codelion/optillm#182

Is optiLLM able to get log-probs from Ollama though? I may be wrong but from a cursory look at the code it looks like it uses torch internally to get the logits, so if that is the case it's not acting as a proxy in front of ollama when it does so.


@codelion commented on GitHub (Aug 12, 2025):

Is optiLLM able to get log-probs from Ollama though?

No, it is not through Ollama; you do not need Ollama. You can do the inference directly in OptiLLM with the built-in server, which provides full logits via the standard OpenAI-compatible API.


@rick-github commented on GitHub (Aug 12, 2025):

The issue is about getting logits from ollama, not optillm. Please don't distract from the issue.


@codelion commented on GitHub (Aug 12, 2025):

The issue is about getting logits from ollama, not optillm. Please don't distract from the issue.

OptiLLM is just an alternative, this issue has been open for over 18 months, if ollama wanted to implement it, they would have done it by now.


@rick-github commented on GitHub (Aug 12, 2025):

Feel free to use optillm. Others would like to get logits from ollama. If you want to discuss alternatives, open a new issue. This issue is for getting logits from ollama.


@SharmaM-dev commented on GitHub (Aug 12, 2025):

[like] Mridul Sharma reacted to rick-github's comment above via email reply.


@VKrishna04 commented on GitHub (Aug 28, 2025):

yes please add it


@CodeHatchling commented on GitHub (Sep 7, 2025):

I figured I'd throw in a couple of the many possible use cases for a feature like this.

Suppose you wanted the LLM to perform in a multiple choice type situation, or any other scenario where you want the model to determine the best-fitting response from a finite selection of options. The cleanest way to do this would be to evaluate the probability of each option and select the highest scoring one. This eliminates the need to handle unexpected responses from the model.
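
A sketch of that option-scoring approach with a local Hugging Face model (Ollama does not expose prompt logprobs yet, so gpt2 is used purely as a stand-in); each option is scored by the summed logprob of its tokens given the question, which assumes the question/option tokenization boundary is clean:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def option_logprob(question: str, option: str) -> float:
    """Summed logprob of the option's tokens, conditioned on the question."""
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    ids = tok(question + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                          # [1, seq_len, vocab_size]
    logprobs = F.log_softmax(logits, dim=-1)
    # The token at position i is predicted by the logits at position i - 1.
    return sum(logprobs[0, i - 1, ids[0, i]].item() for i in range(q_len, ids.shape[1]))

question = "Q: What is the capital of France?\nA:"
options = [" Paris", " London", " Berlin"]
print(max(options, key=lambda o: option_logprob(question, o)))
```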

Another case is assisted writing, where the top N continuations are offered instead of the usual single continuation, such as with my tokenscape project on github.

Cheers!


@martinkozle commented on GitHub (Sep 9, 2025):

Suppose you wanted the LLM to perform in a multiple choice type situation, or any other scenario where you want the model to determine the best-fitting response from a finite selection of options. The cleanest way to do this would be to evaluate the probability of each option and select the highest scoring one. This eliminates the need to handle unexpected responses from the model.

Ollama does have structured outputs which you can use for this use-case.

https://github.com/ollama/ollama/blob/main/docs/openai.md#structured-outputs
https://ollama.com/blog/structured-outputs

In the Pydantic model use an enum with the valid options.

Not exactly what you said and doesn't excuse the lack of logits feature, but it may be useful if you need it for this specific case.
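
A minimal sketch of that suggestion with the ollama Python client and a Pydantic enum, following the structured-outputs pattern in the links above (model name and option labels are placeholders):

```python
from enum import Enum

from ollama import chat
from pydantic import BaseModel

class Option(str, Enum):
    A = "A"
    B = "B"
    C = "C"

class Answer(BaseModel):
    choice: Option

response = chat(
    model="llama3.1",  # placeholder model name
    messages=[{"role": "user", "content": "Which option fits best: A, B, or C? Answer with the letter."}],
    format=Answer.model_json_schema(),  # constrain the output to the schema
)

answer = Answer.model_validate_json(response.message.content)
print(answer.choice)
```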


@Abdulrahman392011 commented on GitHub (Sep 21, 2025):

Hey guys, after this long read I still can't really tell when I can expect this to be available.

I also want to say that Ollama already provides logprobs via the openai-python compatibility layer, so I don't really understand what the holdup is. Logically speaking, it should be almost copy-and-paste with some modifications. Can someone please correct me?

<!-- gh-comment-id:3316266205 -->
Author
Owner

@rick-github commented on GitHub (Sep 21, 2025):

Ollama does not make logprobs available, via the ollama API or the OpenAI compatible API.

<!-- gh-comment-id:3316274708 -->
Author
Owner

@Abdulrahman392011 commented on GitHub (Sep 21, 2025):

> Ollama does not make logprobs available, via the ollama API or the OpenAI compatible API.

Yes, you're right. I mistakenly saw logprobs listed in
https://docs.ollama.com/openai
but it was unchecked. Sorry for the confusion.

<!-- gh-comment-id:3316276645 -->
Author
Owner

@Abdulrahman392011 commented on GitHub (Sep 21, 2025):

So what is needed to do this, and why is it not implemented yet? I know that Ollama runs a private instance of llama.cpp, and llama.cpp does provide logprobs, so in my mind it's mostly a matter of modifying how the llama.cpp instance is initiated.

I am here to learn, so correct me please.

<!-- gh-comment-id:3316277995 -->
Author
Owner

@rick-github commented on GitHub (Sep 21, 2025):

> I know that Ollama runs a private instance of llama.cpp, and llama.cpp does provide logprobs.

This issue was opened before llama.cpp provided OpenAI compatible logprobs. In the meantime, ollama has migrated away from llama.cpp as the primary backend. Work to support logprobs needs to be done on the new ollama engine. The main developers are busy with other tasks.

<!-- gh-comment-id:3316281223 -->
Author
Owner

@baptistejamin commented on GitHub (Nov 1, 2025):

I just released this PR adding logprobs: https://github.com/ollama/ollama/pull/12899

You can try it with:

GOTOOLCHAIN=auto go build .
./ollama serve
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "The capital of France is",
  "stream": false,
  "logprobs": true,
  "top_logprobs": 1
}'
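For reference, the same call as a minimal Python sketch. The exact field under which the per-token logprob data comes back is an assumption here, so inspect the raw response if it differs:

```python
import json
import requests

# Same request as the curl example above, against a locally built server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "The capital of France is",
        "stream": False,
        "logprobs": True,
        "top_logprobs": 1,
    },
    timeout=120,
)
resp.raise_for_status()
body = resp.json()

print(body.get("response"))
# Assumption: per-token logprob data is returned under a "logprobs" key;
# print the full body if the merged version names the field differently.
print(json.dumps(body.get("logprobs"), indent=2))
```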
<!-- gh-comment-id:3476355744 -->
Author
Owner

@tobiaswuerth commented on GitHub (Nov 6, 2025):

+1

<!-- gh-comment-id:3497183936 -->
Author
Owner

@rick-github commented on GitHub (Nov 13, 2025):

https://github.com/ollama/ollama/releases/tag/v0.12.11

<!-- gh-comment-id:3530063076 -->
Author
Owner

@jmorganca commented on GitHub (Nov 13, 2025):

Wanted to say a huge thanks to @baptistejamin for the PR that got this in! And thank you to @BruceMacD who did some original work around this, @jessegross for the reviews and @ParthSareen for some fit and finish on the feature 🎉

Thanks for closing this @rick-github 😊

<!-- gh-comment-id:3530080198 -->
Author
Owner

@neuhaus commented on GitHub (Nov 24, 2025):

I came across this new Ollama API feature while wondering how to implement the technique described in the paper
"[LLMs can hide text in other text of the same length](https://arxiv.org/abs/2510.20075)" (Norelli & Bronstein, 2024/2025) using the API.

The logprobs and top_logprobs options return only the top N most likely tokens. I believe the API does not provide the full probability distribution over the entire vocabulary, nor does it allow efficient querying of a specific token's rank if it falls outside the top N.

I was wondering if this missing functionality could be added to the API.
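For context, here is a minimal sketch of the kind of computation I mean, done locally with Hugging Face transformers rather than through the Ollama API (model, prompt, and target token are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM works for illustration.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Full log-probability distribution over the whole vocabulary for the next token.
next_logprobs = torch.log_softmax(logits[0, -1], dim=-1)

# Rank of an arbitrary token, even if it falls far outside any top-N cutoff.
token_id = tokenizer(" Paris", add_special_tokens=False)["input_ids"][0]
rank = int((next_logprobs > next_logprobs[token_id]).sum().item()) + 1
print(f"logprob={next_logprobs[token_id].item():.4f}, rank={rank}")
```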

<!-- gh-comment-id:3570771286 -->
Author
Owner

@codelion commented on GitHub (Nov 24, 2025):

> The logprobs and top_logprobs options return only the top N most likely tokens. I believe the API does not provide the full probability distribution over the entire vocabulary, nor does it allow efficient querying of a specific token's rank if it falls outside the top N.

You can use OptiLLM for this if you want; it provides the full API: https://github.com/algorithmicsuperintelligence/optillm/issues/182

<!-- gh-comment-id:3570817766 -->
Author
Owner

@rick-github commented on GitHub (Nov 24, 2025):

To avoid confusion: OptiLLM is an alternative to ollama; it does not provide the full probability distribution for ollama models.

<!-- gh-comment-id:3570870476 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Integrate "memlayer" locally

<!-- gh-comment-id:3570959226 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

And use a coral edge TPU, they are public now.

<!-- gh-comment-id:3570961030 -->
Author
Owner

@baptistejamin commented on GitHub (Nov 24, 2025):

Just to understand better: would you like to build a production product/inference system based on this paper?

IMO, what you need is a lower-level API, such as llama.cpp, which is made for this.

Ollama's philosophy is to be similar to the OpenAI API and to be easily plug & play.

<!-- gh-comment-id:3570963415 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Kobold cpp.

<!-- gh-comment-id:3570988246 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Built around llama.cpp, BUT EASIER TO USE.

<!-- gh-comment-id:3570989862 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

AND it has a community based horde. For all ages.

<!-- gh-comment-id:3570991889 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Kobold lite


<!-- gh-comment-id:3571026616 -->

Reference: github-starred/ollama#63445