[GH-ISSUE #2415] Provide logits or logprobs in the API #63445

Closed
opened 2026-05-03 13:29:48 -05:00 by GiteaMirror · 95 comments

Originally created by @freQuensy23-coder on GitHub (Feb 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2415

Originally assigned to: @BruceMacD, @ParthSareen on GitHub.

Feature request:
How can I get logits (the probabilities of each next token) during generation, just like I can with the OpenAI API (logprobs)? This feature would be helpful for apps that use logprobs to measure model awareness and confidence.
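
For reference, a minimal sketch of what the requested capability looks like with the OpenAI Python client (the model name is a placeholder); the ask here is for an equivalent option in Ollama's API:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    logprobs=True,      # return the logprob of each generated token
    top_logprobs=5,     # also return the 5 most likely alternatives per position
)

for token in response.choices[0].logprobs.content:
    print(token.token, token.logprob)
```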

GiteaMirror added the feature request, api labels 2026-05-03 13:29:50 -05:00

@neychevr commented on GitHub (Mar 25, 2024):

Hello! This seems like a really useful feature for more advanced usage of Ollama.
Is this feature already WIP, or would a contribution be welcome?

UPD: seems there is already a pending PR with this feature implemented: https://github.com/ollama/ollama/pull/1640
Could we help somehow to speed up the merge? :)


@josiahbryan commented on GitHub (Apr 19, 2024):

This would be super helpful to some ongoing research work I'm doing. Does anyone know of any providers that DO return logprobs, other than OpenAI of course? Any ETA when this might land here in ollama?


@mateon1 commented on GitHub (Apr 27, 2024):

I would also like to have this. I'm interested in having both echo + logprobs, so I can get information about the prompt too, instead of just the completion. Right now I'm using very small models with PyTorch to compute logits directly, but that's really slow.
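
As a rough sketch of that PyTorch workaround (per-token logprobs over the prompt itself, i.e. what echo + logprobs would return), assuming a small Hugging Face model such as gpt2 purely as a stand-in:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The quick brown fox jumps over the lazy dog"
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits                              # [1, seq_len, vocab_size]
logprobs = F.log_softmax(logits, dim=-1)

# Logprob of each prompt token, conditioned on the tokens before it.
for i in range(1, ids.shape[1]):
    token_id = ids[0, i]
    print(tok.decode(token_id), logprobs[0, i - 1, token_id].item())
```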


@magic-YuanTian commented on GitHub (May 2, 2024):

Any updates?


@briancleland commented on GitHub (May 6, 2024):

Any updates?

https://github.com/ollama/ollama/pull/1640#issuecomment-2043381653


@SharmaM-dev commented on GitHub (Jul 14, 2024):

Any updates?


@drdsgvo commented on GitHub (Jul 29, 2024):

Are there any updates on this very important issue? To not implement logits is not a valid solution. Anyone (including me) who needs logits will move from Ollama to a different solution! Please be aware of that.


@The-Inscrutable-X commented on GitHub (Aug 9, 2024):

support, would be very nice


@moritz-gross commented on GitHub (Aug 30, 2024):

I'm surprised this is not one of the first things implemented 🤔


@haukelicht commented on GitHub (Sep 11, 2024):

Hi there,

any progress on integrating this feature request?


@szocsbarni commented on GitHub (Sep 19, 2024):

Hi, is there a timeline available for integration?


@mommi84 commented on GitHub (Sep 19, 2024):

Hi, is there a timeline available for integration?

We are going to get AGI before this, 100%!


@SharmaM-dev commented on GitHub (Sep 19, 2024):

[laugh] Mridul Sharma reacted to the comment above via email reply.


@martinkozle commented on GitHub (Sep 19, 2024):

We are going to get GTA 6 before this.


@latent-variable commented on GitHub (Oct 16, 2024):

Man, I really need this to implement a CoT-Decoding pipeline for open-webui. I guess I'll go back to playing Sparking Zero.


@josiahbryan commented on GitHub (Oct 16, 2024):

I've given up hope and switched back to llama.cpp for production inference. Using it with ramalama, which can pull from the ollama model library.

Really disappointed that the maintainers here show such disregard for such a huge community request.

Makes me want to make sure I don't use the project in any way. If an obvious thing like this is being totally ignored by the maintainers, then it shows they don't really care much about what the community is asking for.


@briancleland commented on GitHub (Oct 16, 2024):

@jmorganca @bmizerany Has the team given up on implementing this feature?


@NumberChiffre commented on GitHub (Oct 19, 2024):

Pls make this happen lol, as a painful user on Mac :(


@josiahbryan commented on GitHub (Oct 19, 2024):

@jmorganca @bmizerany you guys broadcast your partnership with Hugging Face - great! What about this though? This seems like less than 1/10th the effort - why are you ignoring everyone asking for input here? Why don't you at least provide a timeline?


@athmanar commented on GitHub (Oct 22, 2024):

Insane that this is not given as an option? Maybe better to switch to pure Hugging Face models.


@codelion commented on GitHub (Oct 28, 2024):

Man, I really need this to implement a CoT-Decoding pipeline for open-webui. I guess I'll go back to playing Sparking Zero.

CoT decoding and entropy decoding are available in optillm - https://github.com/codelion/optillm


@Cy-Fi commented on GitHub (Oct 29, 2024):

Ollama will stop being an option for us if crucial features like this are not being implemented...


@magic-YuanTian commented on GitHub (Nov 3, 2024):

For such a simple but important feature, the team has demonstrated unexpected arrogance and ignorance over such a long time. I think this is a red flag for us to give up on using Ollama as an LLM backend, since they cannot go far for sure.


@codelion commented on GitHub (Nov 3, 2024):

I have implemented it in PyTorch if anyone is looking for it they can use the following colab - https://colab.research.google.com/drive/1zPv47_tog2_KOFJY-WJxwPYR6mgoxKlK?usp=sharing

Here is the discussion on optillm where it was brought up as well - https://github.com/codelion/optillm/discussions/82


@codelion commented on GitHub (Nov 13, 2024):

Logprobs are now directly supported in our local inference server - https://github.com/codelion/optillm?tab=readme-ov-file#local-inference-server we use the OpenAI compatible API so you can get them using the same code.


@jooray commented on GitHub (Nov 13, 2024):

This would be very useful for enforcing the structure of output (output_cls with langchain, that currently works with llama.cpp and hugging face).

It can reject tokens that would break the output structure. Very useful for tool calling as well.


@drdsgvo commented on GitHub (Nov 13, 2024):

Logprobs are now directly supported in our local inference server - https://github.com/codelion/optillm?tab=readme-ov-file#local-inference-server we use the OpenAI compatible API so you can get them using the same code.

Great to hear that we have a cool alternative to ollama as those guys are not doing what needs to be done!


@jooray commented on GitHub (Nov 14, 2024):

I would suggest being more kind. Ollama is an open source project; they are not working for you. Feel free to offer a bounty to implement this, or create a pull request.

I would really like to see this implemented, but that does not mean I have to be mean to the authors of software I get for free. And being an ass in the comments (which many of you are here) is not very motivating for developers either.


@ParthSareen commented on GitHub (Dec 9, 2024):

Hey everyone! Sorry for the delay and no updates here - going to be picking this up soon and hopefully getting it in early Jan. There have been a ton of changes to the API, and even more are coming on the inference engine layer, so we just need to be a bit careful since it would be an API addition, but it is something we want to support!


@ParthSareen commented on GitHub (Dec 13, 2024):

Hey folks would like to get your thoughts:
Would you care if it was logits vs logprobs? Would you prefer one over the other? If so why?

Working on designing the API and getting the functionality right along with that :) Appreciate your patience!


@josiahbryan commented on GitHub (Dec 13, 2024):

Personally would prefer logprobs, just because all my tooling is set up for that and I'm used to thinking in logprobs haha


@martinkozle commented on GitHub (Dec 13, 2024):

But if you want to calculate the probability of the LLM generating "yes" or "no", for example, you would either have to use constrained generation with logprobs, where only those 2 tokens will be non-zero, or you can use the logits directly and do the constraining yourself.
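
A minimal sketch of the second approach ("do the constraining yourself"), assuming the raw logits for one decoding step are available as a tensor and that yes_id/no_id are the (hypothetical) token ids of the two answers:

```python
import torch

def yes_probability(logits: torch.Tensor, yes_id: int, no_id: int) -> float:
    """Renormalize a single decoding step over just the two candidate tokens.

    logits: 1-D tensor of raw, unnormalized scores over the vocabulary.
    Returns P("yes") with probability mass restricted to {"yes", "no"}.
    """
    pair = torch.stack([logits[yes_id], logits[no_id]])
    probs = torch.softmax(pair, dim=0)  # softmax over the two candidates only
    return probs[0].item()
```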


@mommi84 commented on GitHub (Dec 13, 2024):

Would you care if it was logits vs logprobs? Would you prefer one over the other? If so why?

Definitely logprobs so that it doesn't deviate from the OpenAI standards (see examples here: https://cookbook.openai.com/examples/using_logprobs) and the following can be supported:

import numpy as np

# response is a chat completion created with logprobs enabled
for token in response.choices[0].logprobs.content:
    for top_logprob in token.top_logprobs:
        print((top_logprob.token, np.exp(top_logprob.logprob)))

@ParthSareen commented on GitHub (Dec 13, 2024):

Thanks @josiahbryan, @martinkozle, @mommi84. I also think it makes more sense to have logprobs for now and then happy to re-evaluate when doing the new engine. Have some fun plans for sampling :)


@Elimane0800 commented on GitHub (Dec 15, 2024):

Hey folks would like to get your thoughts:

Would you care if it was logits vs logprobs? Would you prefer one over the other? If so why?

Working on designing the API and getting the functionality right along with that :) Appreciate your patience!

Personally I prefer logits, because we can derive logprobs ourselves if we have logits. Plus, for tasks such as distillation or uncertainty estimation, logits are more interesting to have. So please add logits 🙏, we can do the logprobs calculations ourselves


@Elimane0800 commented on GitHub (Dec 15, 2024):

To be more precise: logprobs can be derived directly from logits through a softmax followed by a logarithmic transformation. This kind of operation is neither the most difficult nor the most time-consuming. It's easy to get logprobs from logits, but not that easy to get logits from logprobs, except up to an additive constant that only gives an approximation of the original logit values.
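
A minimal sketch of the transformation being described, assuming a full logits vector for a single step is available:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32000)                # hypothetical vocabulary-sized logit vector
logprobs = F.log_softmax(logits, dim=-1)   # softmax followed by log, in one numerically stable op

# The reverse direction only works up to an unknown additive constant:
# logits == logprobs + log(sum(exp(logits))), and that constant cannot be
# recovered from the logprobs alone.
```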


@BenjaminMarechalEVITECH commented on GitHub (Dec 20, 2024):

I disagree with @Elimane0800 : to get logprobs from logits, one needs to compute log-softmax over the logits of the complete vocabulary, which is an expensive operation and is already done in the model for the inference of the next token.
Moreover, it is often only necessary to get the N most probable tokens (with N of the order of a few dozen). In this case, logprobs (or probs) are relevant, logits are not.


@Elimane0800 commented on GitHub (Dec 20, 2024):

I disagree with @Elimane0800 : to get logprobs from logits, one needs to compute log-softmax over the logits of the complete vocabulary, which is an expensive operation and is already done in the model for the inference of the next token.

Moreover, it is often only necessary to get the N most probable tokens (with N of the order of a few dozen). In this case, logprobs (or probs) are relevant, logits are not.

Having already computed logprobs from logits on CPU, I can say it is definitely not the kind of task that runs for 24 hours. So if the only concern is the time and cost of the operation, I can say it is neither the most time-consuming nor the most expensive operation.


@BenjaminMarechalEVITECH commented on GitHub (Dec 20, 2024):

There are two use cases:

  • we want to know the probabilities of the N most probable tokens
  • we want to know the probabilities or the logits of all the tokens in the vocabulary.

I think the first use case is the most requested today. To answer it effectively, the API has no choice but to give the logprobs (or probs) of the N most probable tokens. Indeed, if the API only provides the logits, then it must provide them for the entire vocabulary if we want to deduce the logprobs. With vocabulary sizes sometimes approaching 100k, this overloads the API's JSON response enormously.
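
To illustrate the size argument, a sketch (assuming the server holds the full logits vector) of returning only the top-N logprobs instead of the whole vocabulary:

```python
import torch
import torch.nn.functional as F

def top_n_logprobs(logits: torch.Tensor, n: int = 20) -> dict[int, float]:
    """Return the N most probable token ids with their logprobs.

    For a ~100k-token vocabulary this keeps the response at N entries
    instead of 100k floats.
    """
    logprobs = F.log_softmax(logits, dim=-1)
    values, indices = torch.topk(logprobs, k=n)
    return {int(i): float(v) for i, v in zip(indices, values)}
```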


@ParthSareen commented on GitHub (Jan 3, 2025):

Hey folks, thanks for your patience! Going to be doing something like top-k logprobs for the API. Not saying no to logits, but given the comments, API design, and system constraints it just makes sense to do this first.

There's also some adjacent work going on right now in which there's some API refactoring/design to be done. Upon its completion this will be one of the first things that go out :)


@OriginalGoku commented on GitHub (Jan 25, 2025):

any timeframe for this update?


@codelion commented on GitHub (Jan 25, 2025):

any timeframe for this update?

It may take a while to get it here since they rely on llama.cpp underneath. You can try an alternative like optillm - https://github.com/codelion/optillm


@ClaudiuCreanga commented on GitHub (Jan 30, 2025):

It may take a while to get it here since they rely on llama.cpp underneath. You can try an alternative like optillm - https://github.com/codelion/optillm

Seems like this one is an openai tooling, while we're interested in other models.


@ParthSareen commented on GitHub (Jan 30, 2025):

Hey everyone - sorry for the delay, making my rounds right now.

It is not due to relying on llama.cpp; I did have a branch working back in early Jan. As you may know, we've been working on a new Go engine, and all of us have been pretty heads-down on that for the last month. There's going to be some bifurcation between running the new engine vs. the current one, which means that new API features need to have parity on both engines. In order to do that, we just need to make sure the behavior of the old and new engines is the same for the logprobs endpoint as well as tokenize/detokenize.

I've also been mainly working on our new sampling interfaces, and the logprobs feature has been top of mind as I build them. I do hope to get to it soon; I know it's super important to you all! The new engine is going to bring a lot of stability and maintainability throughout, so we can support these kinds of features much faster in the future.


@codelion commented on GitHub (Jan 30, 2025):

It may take a while to get it here since they rely on llama.cpp underneath. You can try an alternative like optillm - https://github.com/codelion/optillm

Seems like this one is an openai tooling, while we're interested in other models.

This uses the same format and API as OpenAI but works for any model from hugging face. You can use it to build datasets for distillation like this - https://huggingface.co/datasets/arcee-ai/LLama-405B-Logits


@chaoyupeng commented on GitHub (Feb 18, 2025):

Hi, Ollama team, any updates on the logprobs functionality?


@BruceMacD commented on GitHub (Feb 20, 2025):

I've taken over implementation of this, expect some progress soon.


@BruceMacD commented on GitHub (Feb 27, 2025):

Update for those interested:
I've opened some pull requests that refactor the model runners to make it possible to get information such as logprobs:
https://github.com/ollama/ollama/pull/9282

Once that gets in I'll move forward with returning the values from the Ollama server.


@SabaPivot commented on GitHub (Mar 5, 2025):

Update for those interested: I've opened some pull requests that refactor the model runners to make it possible to get information such as logprobs: #9282

Once that gets in I'll move forward with returning the values from the Ollama server.

Cool! This is really cool!


@SeriousJ55 commented on GitHub (Mar 19, 2025):

Hello! Do you know when this feature will be implemented? Pull request #9282 seems to be pending.

And thanks to the people in the dev team for their amazing work!


@BruceMacD commented on GitHub (Mar 19, 2025):

I'm still working on this at the same time as a few other things, but I haven't forgotten about it. Updated the #9282 pull request, so hopefully that one gets in soon.


@K0IN commented on GitHub (Apr 7, 2025):

Hi, I know this might be the wrong thread, and it is surely documented somewhere, but why did Ollama switch from the llama.cpp server to a custom implementation in the first place?


@aakash232 commented on GitHub (Apr 9, 2025):

Hello, Any updates on this feature?


@qzhou711 commented on GitHub (Apr 9, 2025):

Thank you very much for the efforts of the developers. Is there any update to this feature? This is a very important feature.


@BruceMacD commented on GitHub (Apr 14, 2025):

Still working on it! I've taken a diversion to work on some other stuff, but further steps toward it are still in my short-term plans.


@brunodifranco commented on GitHub (Apr 21, 2025):

Hello! Any estimate on when the feature will be completed?


@Proteusiq commented on GitHub (Apr 28, 2025):

Still working on it! I've taken a diversion to work on some other stuff, but further steps toward it are still in my short-term plans.

How can we help?


@CodeHatchling commented on GitHub (May 1, 2025):

Please consider merging in pull request #9282 that implements this very needed feature.


@BarryKeee commented on GitHub (Jun 26, 2025):

Any update on this? This feature is very much needed!


@enochlev commented on GitHub (Jun 30, 2025):

May I offer a statement of motivation. The completion of this feature is critical for a wide range of research. Enabling Ollama to expose raw logits allows researchers to extract high-quality supervision signals from large models like LLaMA 70B, even with limited hardware. This supports efficient knowledge distillation, making it possible to train smaller models with soft targets on modest GPUs like a single L40S or a 24GB GPU, which I argue is the hardware limit for 95% of researchers. Unlocking this capability is a major contribution to the LLM research industry.

Most paid LLM hosting providers prohibit the extraction of logits (all except for OpenAI: https://docs.litellm.ai/docs/completion/input), and we are limited to only using vLLM, which is quite GPU hungry.

Again, we understand this contribution comes from your free time, and it is much appreciated.


@SharmaM-dev commented on GitHub (Jun 30, 2025):

[like] Mridul Sharma reacted to enochlev's comment above via email reply.


@baptistejamin commented on GitHub (Jul 1, 2025):

It can be used for other things besides research. At Crisp, we utilize logprobs with fine-tuned models for binary classification on highly complex queries.

It can be super helpful.

Categorize this into two categories: SPAM, HAM

Query:

You can then have the log prob for SPAM, and use thresholds to classify.
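
A minimal sketch of that classification pattern against an OpenAI-compatible endpoint that returns top_logprobs; the model name, prompt, and 0.5 threshold are placeholders, and the label is assumed to surface as a single token:

```python
import math
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="my-finetuned-classifier",  # placeholder model name
    messages=[{"role": "user", "content": "Categorize this into two categories: SPAM, HAM\n\nQuery: win a free prize now"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
)

# Probability mass the model puts on "SPAM" as the first generated token.
first_token = resp.choices[0].logprobs.content[0]
p_spam = sum(math.exp(t.logprob) for t in first_token.top_logprobs if t.token.strip() == "SPAM")
print("SPAM" if p_spam >= 0.5 else "HAM", p_spam)
```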


@jaylinwylie commented on GitHub (Jul 15, 2025):

Looks like we are waiting on the pull request to be approved. Unless there's a branch/fork we can experiment with in the meantime?


@unacceptable commented on GitHub (Jul 16, 2025):

I am surprised that @jmorganca or @rick-github haven't closed this out yet saying to use a proxy. That's what they did in #1053 and #8573 for auth and token usage.


@rick-github commented on GitHub (Jul 16, 2025):

A proxy does not have access to logits. See https://huggingface.co/blog/logits-processor-zoo#what-are-logits-in-language-models to learn more about logits.


@kjam commented on GitHub (Aug 12, 2025):

Commenting to also say this is useful for implementing privacy and security controls, such as differential privacy predictions and regularization when managing adversarial input. Would be nice to merge https://github.com/ollama/ollama/pull/9282


@codelion commented on GitHub (Aug 12, 2025):

A proxy does not have access to logits. See here to learn more about logits.

Depends on how it is implemented, OptiLLM is also a proxy but has an inbuilt local inference server and supports logits - https://github.com/codelion/optillm/issues/182


@Tritonio commented on GitHub (Aug 12, 2025):

A proxy does not have access to logits. See here to learn more about logits.

Depends on how it is implemented, OptiLLM is also a proxy but has an inbuilt local inference server and supports logits - codelion/optillm#182

Is optiLLM able to get log-probs from Ollama though? I may be wrong but from a cursory look at the code it looks like it uses torch internally to get the logits, so if that is the case it's not acting as a proxy in front of ollama when it does so.


@codelion commented on GitHub (Aug 12, 2025):

Is optiLLM able to get log-probs from Ollama though?

No, it is not through Ollama; you do not need Ollama. You can do the inference directly in OptiLLM with the built-in server, which provides full logits via the standard OpenAI-compatible API.


@rick-github commented on GitHub (Aug 12, 2025):

The issue is about getting logits from ollama, not optillm. Please don't distract from the issue.


@codelion commented on GitHub (Aug 12, 2025):

The issue is about getting logits from ollama, not optillm. Please don't distract from the issue.

OptiLLM is just an alternative, this issue has been open for over 18 months, if ollama wanted to implement it, they would have done it by now.


@rick-github commented on GitHub (Aug 12, 2025):

Feel free to use optillm. Others would like to get logits from ollama. If you want to discuss alternatives, open a new issue. This issue is for getting logits from ollama.


@SharmaM-dev commented on GitHub (Aug 12, 2025):

[like] Mridul Sharma reacted to rick-github's comment above via email reply.


@VKrishna04 commented on GitHub (Aug 28, 2025):

yes please add it


@CodeHatchling commented on GitHub (Sep 7, 2025):

I figured I'd throw in a couple of the many possible use cases for a feature like this.

Suppose you wanted the LLM to perform in a multiple choice type situation, or any other scenario where you want the model to determine the best-fitting response from a finite selection of options. The cleanest way to do this would be to evaluate the probability of each option and select the highest scoring one. This eliminates the need to handle unexpected responses from the model.
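
A sketch of that option-scoring approach with a local Hugging Face model (Ollama does not expose prompt logprobs yet, so gpt2 is used purely as a stand-in); each option is scored by the summed logprob of its tokens given the question, which assumes the question/option tokenization boundary is clean:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def option_logprob(question: str, option: str) -> float:
    """Summed logprob of the option's tokens, conditioned on the question."""
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    ids = tok(question + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                          # [1, seq_len, vocab_size]
    logprobs = F.log_softmax(logits, dim=-1)
    # The token at position i is predicted by the logits at position i - 1.
    return sum(logprobs[0, i - 1, ids[0, i]].item() for i in range(q_len, ids.shape[1]))

question = "Q: What is the capital of France?\nA:"
options = [" Paris", " London", " Berlin"]
print(max(options, key=lambda o: option_logprob(question, o)))
```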

Another case is assisted writing, where the top N continuations are offered instead of the usual single continuation, such as with my tokenscape project on github.

Cheers!


@martinkozle commented on GitHub (Sep 9, 2025):

Suppose you wanted the LLM to perform in a multiple choice type situation, or any other scenario where you want the model to determine the best-fitting response from a finite selection of options. The cleanest way to do this would be to evaluate the probability of each option and select the highest scoring one. This eliminates the need to handle unexpected responses from the model.

Ollama does have structured outputs which you can use for this use-case.

https://github.com/ollama/ollama/blob/main/docs/openai.md#structured-outputs
https://ollama.com/blog/structured-outputs

In the Pydantic model use an enum with the valid options.

Not exactly what you said and doesn't excuse the lack of logits feature, but it may be useful if you need it for this specific case.
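
A minimal sketch of that suggestion with the ollama Python client and a Pydantic enum, following the structured-outputs pattern in the links above (model name and option labels are placeholders):

```python
from enum import Enum

from ollama import chat
from pydantic import BaseModel

class Option(str, Enum):
    A = "A"
    B = "B"
    C = "C"

class Answer(BaseModel):
    choice: Option

response = chat(
    model="llama3.1",  # placeholder model name
    messages=[{"role": "user", "content": "Which option fits best: A, B, or C? Answer with the letter."}],
    format=Answer.model_json_schema(),  # constrain the output to the schema
)

answer = Answer.model_validate_json(response.message.content)
print(answer.choice)
```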


@Abdulrahman392011 commented on GitHub (Sep 21, 2025):

Hey guys, after this long read I still can't really tell when I can expect this to be available.

I also want to say that Ollama already provides logprobs via the openai-python compatibility layer, so I don't really understand what the holdup is. Logically speaking, it should be almost copy-and-paste with some modifications. Can someone please correct me?

<!-- gh-comment-id:3316266205 -->
Author
Owner

@rick-github commented on GitHub (Sep 21, 2025):

Ollama does not make logprobs available, via the ollama API or the OpenAI compatible API.

<!-- gh-comment-id:3316274708 -->
Author
Owner

@Abdulrahman392011 commented on GitHub (Sep 21, 2025):

> Ollama does not make logprobs available, via the ollama API or the OpenAI compatible API.

Yes, you're right. I mistakenly saw logprobs listed in
https://docs.ollama.com/openai
but it was unchecked. Sorry for the confusion.

<!-- gh-comment-id:3316276645 -->
Author
Owner

@Abdulrahman392011 commented on GitHub (Sep 21, 2025):

So what is needed to do this, and why is it not implemented yet? I know that Ollama runs a private instance of llama.cpp, and llama.cpp does provide logprobs, so in my mind it's mostly a matter of modifying how the llama.cpp instance is initiated.

I am here to learn, so correct me please.

<!-- gh-comment-id:3316277995 -->
Author
Owner

@rick-github commented on GitHub (Sep 21, 2025):

> I know that Ollama runs a private instance of llama.cpp, and llama.cpp does provide logprobs.

This issue was opened before llama.cpp provided OpenAI compatible logprobs. In the meantime, ollama has migrated away from llama.cpp as the primary backend. Work to support logprobs needs to be done on the new ollama engine. The main developers are busy with other tasks.

<!-- gh-comment-id:3316281223 -->
Author
Owner

@baptistejamin commented on GitHub (Nov 1, 2025):

I just released this PR adding logprobs: https://github.com/ollama/ollama/pull/12899

You can try it with:

GOTOOLCHAIN=auto go build .
./ollama serve
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "The capital of France is",
  "stream": false,
  "logprobs": true,
  "top_logprobs": 1
}'
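For reference, the same call as a minimal Python sketch. The exact field under which the per-token logprob data comes back is an assumption here, so inspect the raw response if it differs:

```python
import json
import requests

# Same request as the curl example above, against a locally built server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "The capital of France is",
        "stream": False,
        "logprobs": True,
        "top_logprobs": 1,
    },
    timeout=120,
)
resp.raise_for_status()
body = resp.json()

print(body.get("response"))
# Assumption: per-token logprob data is returned under a "logprobs" key;
# print the full body if the merged version names the field differently.
print(json.dumps(body.get("logprobs"), indent=2))
```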
<!-- gh-comment-id:3476355744 -->
Author
Owner

@tobiaswuerth commented on GitHub (Nov 6, 2025):

+1

<!-- gh-comment-id:3497183936 -->
Author
Owner

@rick-github commented on GitHub (Nov 13, 2025):

https://github.com/ollama/ollama/releases/tag/v0.12.11

<!-- gh-comment-id:3530063076 -->
Author
Owner

@jmorganca commented on GitHub (Nov 13, 2025):

Wanted to say a huge thanks to @baptistejamin for the PR that got this in! And thank you to @BruceMacD who did some original work around this, @jessegross for the reviews and @ParthSareen for some fit and finish on the feature 🎉

Thanks for closing this @rick-github 😊

<!-- gh-comment-id:3530080198 -->
Author
Owner

@neuhaus commented on GitHub (Nov 24, 2025):

I came across this new Ollama API feature while wondering how to implement the technique described in the paper
"[LLMs can hide text in other text of the same length](https://arxiv.org/abs/2510.20075)" (Norelli & Bronstein, 2024/2025) using the API.

The logprobs and top_logprobs options return only the top N most likely tokens. I believe the API does not provide the full probability distribution over the entire vocabulary, nor does it allow efficient querying of a specific token's rank if it falls outside the top N.

I was wondering if this missing functionality could be added to the API.
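For context, here is a minimal sketch of the kind of computation I mean, done locally with Hugging Face transformers rather than through the Ollama API (model, prompt, and target token are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM works for illustration.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Full log-probability distribution over the whole vocabulary for the next token.
next_logprobs = torch.log_softmax(logits[0, -1], dim=-1)

# Rank of an arbitrary token, even if it falls far outside any top-N cutoff.
token_id = tokenizer(" Paris", add_special_tokens=False)["input_ids"][0]
rank = int((next_logprobs > next_logprobs[token_id]).sum().item()) + 1
print(f"logprob={next_logprobs[token_id].item():.4f}, rank={rank}")
```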

<!-- gh-comment-id:3570771286 -->
Author
Owner

@codelion commented on GitHub (Nov 24, 2025):

> The logprobs and top_logprobs options return only the top N most likely tokens. I believe the API does not provide the full probability distribution over the entire vocabulary, nor does it allow efficient querying of a specific token's rank if it falls outside the top N.

You can use OptiLLM for this if you want; it provides the full API: https://github.com/algorithmicsuperintelligence/optillm/issues/182

<!-- gh-comment-id:3570817766 -->
Author
Owner

@rick-github commented on GitHub (Nov 24, 2025):

To avoid confusion: OptiLLM is an alternative to ollama; it does not provide the full probability distribution for ollama models.

<!-- gh-comment-id:3570870476 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Integrate "memlayer" locally

<!-- gh-comment-id:3570959226 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

And use a coral edge TPU, they are public now.

<!-- gh-comment-id:3570961030 -->
Author
Owner

@baptistejamin commented on GitHub (Nov 24, 2025):

Just to understand better: would you like to build a production product/inference system based on this paper?

IMO, what you need is a lower-level API, such as llama.cpp, which is made for this.

Ollama's philosophy is to be similar to the OpenAI API and to be easily plug & play.

<!-- gh-comment-id:3570963415 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Kobold cpp.

<!-- gh-comment-id:3570988246 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Built around llama.cpp, BUT EASIER TO USE.

<!-- gh-comment-id:3570989862 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

AND it has a community based horde. For all ages.

<!-- gh-comment-id:3570991889 -->
Author
Owner

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Kobold lite


<!-- gh-comment-id:3571026616 -->

Reference: github-starred/ollama#63445