[GH-ISSUE #8786] Prediction aborted due to token repeat limit reached error in granite3.1-dense:8b #5705

Closed
opened 2026-04-12 17:00:02 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @ALLMI78 on GitHub (Feb 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8786

What is the issue?

I am using the ollama API (v0.5.7) with the granite3.1-dense:8b-instruct-q6_K model. Although the model generally performs well, I occasionally encounter an error where the response returns a JSON output containing multiple `<fim_prefix>` tokens instead of a valid answer.

response >{"model":"granite3.1-dense:8b-instruct-q6_K","created_at":"2025-02-03T10:02:35.4537802Z","message":{"role":"assistant","content":"\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e\u003cfim_prefix\u003e"},"done":false}<

In the logs, I see:

```
time=2025-02-03T11:02:35.503+01:00 level=DEBUG source=server.go:816 msg="prediction aborted, token repeat limit reached"
[GIN] 2025/02/03 - 11:02:35 | 200 |   18.2363314s |       127.0.0.1 | POST     "/api/chat"
```

I suspect this error might be related to exceeding the context length (currently set at 32768 tokens), but what exactly does "token repeat" mean? Since (as far as I know) Ollama does not provide a built-in method to count tokens before sending a request, I am unable to trim or control the context length dynamically, which may be causing the system to abort the prediction.
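
For reference, the final (non-streaming) response does report `prompt_eval_count` and `eval_count`, which at least lets a client see how many tokens the prompt and reply consumed after the fact; a minimal sketch using the ollama Python client:

```python
#!/usr/bin/env python3
# Minimal sketch (assumes the ollama Python client): the final, non-streaming
# response reports how many tokens the prompt and the reply consumed.
import ollama

response = ollama.chat(
    model="granite3.1-dense:8b-instruct-q6_K",
    messages=[{"role": "user", "content": "Say hello."}],
    options={"num_ctx": 32768},
)
# prompt_eval_count = prompt tokens evaluated (may be lower if part of the
# prompt was served from cache); eval_count = tokens generated.
print(response["prompt_eval_count"], response["eval_count"])
```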

  • Are there recommended workarounds or configuration adjustments (e.g., trimming the conversation history, parameter tuning) to mitigate this issue?
  • Would it be possible to implement or expose a token counting mechanism to avoid exceeding the limit?
  • Is there any additional debug information or logging that might help pinpoint the root cause?

Any guidance or suggestions to resolve this would be greatly appreciated!

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.5.7

EDIT: exceeding the context length is not the problem; I got this error with granite3.2 at roughly 18k tokens of context used with a 32k context window.

GiteaMirror added the bug label 2026-04-12 17:00:02 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 3, 2025):

`token repeat limit reached` is about the output tokens, not the input tokens. ollama has detected that a repeating pattern is being generated, which generally indicates the model has lost coherence. This is a condition models get into occasionally; it can be triggered by exceeding the context buffer, but that's not the only cause. If it is a buffer size issue you will see lines about `shifting` in the log.

If you can give some context about the type of query you are sending, there might be some specific advice. Generally, you could try increasing the context buffer, setting `num_predict` to control the number of output tokens, reducing `temperature` to control variability, or adjusting your prompting.
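
Those options can be passed per request; a minimal sketch with illustrative values, using the Python client (the same option names apply to the raw HTTP API):

```python
import ollama

response = ollama.chat(
    model="granite3.1-dense:8b-instruct-q6_K",
    messages=[{"role": "user", "content": "..."}],  # your prompt here
    options={
        "num_ctx": 32768,     # context buffer size
        "num_predict": 512,   # hard cap on generated tokens
        "temperature": 0.2,   # lower variability
    },
)
print(response["message"]["content"])
```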

Author
Owner

@ALLMI78 commented on GitHub (Feb 3, 2025):

Hello Rick,

Thanks for your quick response. I'm currently testing different parameters, but I can't say for sure yet whether any of them make a difference. My temperature was set to 0.4, and I just increased it to 0.6 to see what happens. Do you think lowering it might be a better approach?

What impact could the following parameters have?

```cpp
options.repeat_last_n = 64;
options.repeat_penalty = 1.1;
```

One important note: I only experience this issue with Granite3.1-dense models. When I use Llama 3.1, Tulu, or Qwen, the problem does not occur.

Regarding the "context about the type of query": Two LLMs perform analyses and need to generate a signal by inserting numbers into a template. Unfortunately, I can't provide more details, but I might be close to the 32k context limit when the error occurs. In the beginning, when the conversation history is still short, I believe the issue does not appear.

Looking forward to your thoughts!

EDIT:

  • no "shifting" in the log
  • changes of the temperature from 0.4 to 0.6 dit not help
  • 1 truncate but at a different time and in another run

```
time=2025-02-03T14:46:05.922+01:00 level=DEBUG source=prompt.go:77 msg="truncating input messages which exceed context length" truncated=3
time=2025-02-03T14:56:13.646+01:00 level=DEBUG source=server.go:816 msg="prediction aborted, token repeat limit reached"
[GIN] 2025/02/03 - 14:56:13 | 200 | 18.3095295s | 127.0.0.1 | POST "/api/chat"
```

Author
Owner

@rick-github commented on GitHub (Feb 3, 2025):

Lower temperature would be better.

It's interesting that your output is composed of `<fim_prefix>` tokens when the template doesn't support FIM. Does your prompt ask the model to do that, or is it spontaneous? Has the template been modified? Are you using the chat or generate endpoint? I poked around a bit with granite3.1-dense:8b-instruct-q6_K and was unable to trigger the behaviour you see.

Author
Owner

@ALLMI78 commented on GitHub (Feb 3, 2025):

I’m using the chat endpoint. I can test again with a lower temperature.

But I need to be careful not to use the wrong terms. When I said "template," I meant that my LLMs perform analyses and then have to generate a "SIGNAL."

Example of my template or SIGNAL:
`####[SIGNAL_START] OPTIONA=INT; OPTIONB=INT; OPTIONC=INT; VALUEA=%.2f; VALUEB=%.2f; [SIGNAL_END]####`

The LLMs are tasked with filling in specific numerical values based on their analysis and outputting them as a SIGNAL. And they do that very well, until the error...

This predefined structure for a signal is what I referred to as a "template." However, I now realize that there are also model templates (related to the models), but I have no experience with those. I wasn’t referring to them and haven’t changed anything there. That was my mistake—sorry for the confusion.

Author
Owner

@rick-github commented on GitHub (Feb 3, 2025):

So your client is something like this:

```python
#!/usr/bin/env python3

import ollama
import argparse

prompt = """
Analyse the following data and return a SIGNAL in the format specified.  Only the SIGNAL should be returned, no explanatory text.  You are to determine the colour, shape and mass of an object.  Use the following values for the attributes:

Shape:
  ROUND = 1
  SQUARE = 2
Colour:
  RED = 1
  GREEN = 2
  BLUE = 3

The SIGNAL to be returned must be in the following format:
####[SIGNAL_START] SHAPE=INT; COLOUR=INT; WEIGHT=%.2f; [SIGNAL_END]####

Here is the data:
{data}
"""

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", default="granite3.1-dense:8b")
parser.add_argument("-t", "--temperature", default=0.4)
parser.add_argument("-c", "--context", default=2048)
parser.add_argument("data", nargs='*')
args = parser.parse_args()

for d in args.data:
  response = ollama.chat(
      model=args.model,
      messages=[{"role":"user","content":prompt.format(data=d)}],
      options={"temperature":args.temperature, "num_ctx":args.context},
  )
  print(response["message"]["content"])
```

```console
$ ./8786.py 'the red ball weighs 4 kilos' 'the blue box weighs 10 and a half kilos'
####[SIGNAL_START] SHAPE=1; COLOUR=1; WEIGHT=4.00; [SIGNAL_END]####
####[SIGNAL_START] SHAPE=2; COLOUR=3; WEIGHT=10.50; [SIGNAL_END]####
```

Have you tried using structured outputs? The addition of a schema may make the model adhere more closely to the required output.

```python
#!/usr/bin/env python3

import ollama
import argparse
import json
from pydantic import BaseModel, Field
from decimal import Decimal

prompt = """
Analyse the following data and return a SIGNAL in the format specified.  You are to determine the colour, shape and mass of an object.  Use the following values for the attributes:

Shape:
  ROUND = 1
  SQUARE = 2
Colour:
  RED = 1
  GREEN = 2
  BLUE = 3

Here is the data:
{data}
"""

class Signal(BaseModel):
  SHAPE: int = Field(..., description="Shape of the object")
  COLOUR: int = Field(..., description="Colour of the object")
  WEIGHT: Decimal = Field(..., description="Weight of the object", decimal_places=2)

  def __str__(self):
    return f"####[SIGNAL_START] SHAPE={self.SHAPE}; COLOUR={self.COLOUR}; WEIGHT={self.WEIGHT:.2f}; [SIGNAL_END]####"


parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", default="granite3.1-dense:8b")
parser.add_argument("-t", "--temperature", default=0.4)
parser.add_argument("-c", "--context", default=2048)
parser.add_argument("data", nargs='*')
args = parser.parse_args()

for d in args.data:
  response = ollama.chat(
      model=args.model,
      messages=[{"role":"user","content":prompt.format(data=d)}],
      options={"temperature":args.temperature, "num_ctx":args.context},
      format=Signal.model_json_schema(),
  )
  signal = Signal.model_validate_json(response["message"]["content"])
  print(signal)
```

```console
$ ./8786-structured.py 'the red ball weighs 4 kilos' 'the blue box weighs 10 and a half kilos'
####[SIGNAL_START] SHAPE=1; COLOUR=1; WEIGHT=4.00; [SIGNAL_END]####
####[SIGNAL_START] SHAPE=2; COLOUR=3; WEIGHT=10.50; [SIGNAL_END]####
```
Author
Owner

@ALLMI78 commented on GitHub (Feb 3, 2025):

Hello Rick,

Thank you so much for taking the time to recreate this issue. If you're trying to trigger the error, it would make sense to set the context length to 32k and send more data in your request to get close to that limit. But yes, my setup (in MQL5) is structured similarly, with the difference that I send the response from LLMA to LLMB and vice versa. I let both models discuss and generate analyses while they control and refine each other's outputs, with additional instructions from me. I keep the last 3 messages (U>A>U>next assistant answer) in the context window, but some are long...

I've seen "structured outputs" and tools before, but I'm still unsure if I want to rebuild my system around them. My current purely text-based solution, where I manually parse the responses, runs 90% stable and allows me to use all models. I'm hesitant because I don't know if all models fully support tools and structured outputs yet. Additionally, since my client runs in MQL5, I need to be careful with implementation—I can't use universal solutions like in Python.

Unfortunately, I had to remove Granite for now, as I couldn't find a solution. Parameter changes didn't help. I've replaced Granite with DeepSeek-Qwen2.5, and the system is running quite well with it. Something about the Granite-Dense models is causing issues in my setup, but I'm not sure if you'll be able to reproduce it.

Author
Owner

@rick-github commented on GitHub (Feb 3, 2025):

No, I wasn't trying to trigger the error, just wanted to get a feel for the use case and offer a workaround. I've already tried and failed to trigger the issue, so it may be specific to the data that you are feeding the model. It's an unfortunate fact that each model has its quirks and sometimes returns unexpected results. In those cases it's sometimes easier to switch to a different model rather than trying to untie the Gordian knot of model weights, as you have done.

All models support structured outputs, but, as with tools, how well they adhere to the schema comes down to how the model was trained. Some will be better than others. I understand that re-tooling is an extra burden for unknown results.

In the absence of a clear trigger, I don't think we can make much headway, and switching models is the best solution.

Author
Owner

@ALLMI78 commented on GitHub (Feb 3, 2025):

Thanks for your awesome work here ;)

Author
Owner

@ALLMI78 commented on GitHub (Feb 9, 2025):

Same problem with the granite-3.2-8b-instruct-preview model,

tested with hf.co/AaronFeng753/granite-3.2-8b-instruct-preview-Q8_0-GGUF:Q8_0.

And the context length is not the problem; I had this error at 18k tokens with a 32k context window...

Author
Owner

@rick-github commented on GitHub (Feb 9, 2025):

In the absence of more details there's not much that can be done.

Reference: github-starred/ollama#5705