[GH-ISSUE #14959] Cloud and local models produce no meaningful output on large-context generation tasks via OpenAI-compatible API #71679

Closed
opened 2026-05-05 02:19:44 -05:00 by GiteaMirror · 2 comments

Originally created by @Adam-Researchh on GitHub (Mar 19, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14959

## Bug

When using Ollama's OpenAI-compatible API (as consumed by agent frameworks), both cloud and local models fail to produce meaningful output on large-context generation tasks. They ingest the input (~22K tokens from a 1,879-line Python file), then return only 145-178 tokens of regurgitated source code instead of the requested analysis.

## Repro

Models tested via Ollama's OpenAI-compatible API endpoint:

| Model | Type | Result |
|-------|------|--------|
| `glm-5:cloud` | Cloud | 178 tokens, code chunks, no analysis |
| `deepseek-v3.2:cloud` | Cloud | Same — code regurgitation, no findings |
| `nemotron-3-nano:30b` | Local | 145 tokens, zero analysis after 3 min |

Task: "Review this Python file for bugs, logic issues, and improvements" with a 1,879-line file attached as context.

All three models read the file successfully but fail to generate any substantive response. Output is fragments of the input source code echoed back.
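For context, the request an agent framework sends over the OpenAI-compatible endpoint is roughly equivalent to the following sketch. The file name here is a placeholder (the real report used a 1,879-line Python file, ~22K tokens); the endpoint URL and payload shape follow Ollama's documented OpenAI compatibility.

```python
import json
from pathlib import Path
from urllib import request

# Placeholder source file; substitute the real file under review.
path = Path("big_module.py")
source = path.read_text() if path.exists() else "x = 1\n" * 1879

# Payload shape expected by the OpenAI-compatible chat endpoint.
payload = {
    "model": "nemotron-3-nano:30b",  # any pulled model name works here
    "messages": [
        {
            "role": "user",
            "content": "Review this Python file for bugs, logic issues, "
                       "and improvements\n\n" + source,
        }
    ],
}
body = json.dumps(payload).encode()

# Sending requires a running Ollama server on the default port:
# req = request.Request(
#     "http://localhost:11434/v1/chat/completions",
#     data=body, headers={"Content-Type": "application/json"})
# resp = json.load(request.urlopen(req))
# print(resp["usage"], resp["choices"][0]["message"]["content"][:200])
```

Comparing `resp["usage"]["completion_tokens"]` from the raw response against what the framework surfaces helps narrow down whether output is lost server-side or in the client.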

## Key Context — Regression?

- **Qwen 3.5 17B** via direct Ollama API previously handled similar large-file generation tasks (thousands of training pairs from large files) without issues
- The failure pattern is identical across cloud and local models, suggesting it may be in the API/serving layer rather than the models themselves

## Environment

- Ollama: **0.18.2**
- macOS, Apple M1 Ultra, 128GB RAM
- Consuming via OpenAI-compatible API (not CLI)

Cross-ref: https://github.com/openclaw/openclaw/issues/50526


@rick-github commented on GitHub (Mar 19, 2026):

Seems to work fine through the API.

```console
$ file=/usr/lib/python3.12/subprocess.py
$ wc $file
 2216  8092 88725 /usr/lib/python3.12/subprocess.py
$ echo '{
  "model": "nemotron-3-nano:30b",
  "messages":[
    {
      "role":"user",
      "content":'"$((echo Review this Python file for bugs, logic issues, and improvements ; echo ; cat $file) | jq -sR)"'
    }
  ]
}' | curl -s http://localhost:11434/v1/chat/completions -d @- | jq -r '"\(.choices[0].message.content)\n\(.usage)"'
```

Below is a **compact but thorough code‑review** of the whole file.  
I have grouped the findings into three categories:

| Category | What to look for | Why it matters | How to fix it (short description) |
|----------|------------------|----------------|-----------------------------------|
| **Logic / functional bugs** | Situations where the code can raise an exception, produce a wrong result, or leak resources. | These are the things that can break a user’s program or cause hard‑to‑debug crashes. | Detailed fixes (often a one‑liner) are given after each bullet. |
| **Performance / design improvements** | Unnecessary work, inefficient data‑structures, or APIs that can be simplified. | Makes the module faster, smaller and easier to maintain. | Suggested refactorings. |
| **Style / safety / Python‑3.12+ readiness** | Use of deprecated constructs, missing type‑hints, missing `__all__` entries, etc. | Future‑proofing and better IDE support. | Minor changes. |

## 1️⃣  Functional / Logic Bugs

### 1.1  `getstatusoutput` / `getoutput` compare a **bytes** value with a **str** newline  
...
### 1.2  `Popen._translate_newlines` receives **bytes** but is written for **str**
...
## 2️⃣  Design / Performance Improvements

| Issue | Why it hurts | Suggested improvement |
|-------|--------------|------------------------|
...
## 3️⃣  Style / Future‑Proofing Recommendations

| Item | Comment |
|------|---------|
...
## 4️⃣  Minimal Patch that Eliminates the Critical Bugs

Below is a **self‑contained diff** that you can drop into the file (or apply as a patch).  
It fixes the three most severe runtime bugs without changing the public API.
## 5️⃣  Quick “What to do next” checklist

1. **Run the test suite**  

   ```bash
   python -X -m test.test_subprocess
   ```
...
### TL;DR

*The file works, but it contains a handful of subtle bugs that can raise
`TypeError`/`AttributeError` or dead‑lock when children write large amounts of
...
{"prompt_tokens":19159,"completion_tokens":11509,"total_tokens":30668}
```console
$ echo '{
  "model": "deepseek-v3.2:cloud",
  "messages":[
    {
      "role":"user",
      "content":'"$((echo Review this Python file for bugs, logic issues, and improvements ; echo ; cat $file) | jq -sR)"'
    }
  ]
}' | curl -s http://localhost:11434/v1/chat/completions -d @- | jq -r '"\(.choices[0].message.content)\n\(.usage)"'
```

This appears to be the Python standard library's `subprocess` module implementation. Here's a comprehensive review:

## Bugs & Issues Found:

### 1. **Potential Race Condition in Signal Handling (POSIX)**
...
The code is generally robust and handles many edge cases, but could benefit from modernization and cleanup of some legacy patterns.
{"prompt_tokens":19584,"completion_tokens":7396,"total_tokens":26980}
```console
$ echo '{
  "model": "glm-5:cloud",
  "messages":[
    {
      "role":"user",
      "content":'"$((echo Review this Python file for bugs, logic issues, and improvements ; echo ; cat $file) | jq -sR)"'
    }
  ]
}' | curl -s http://localhost:11434/v1/chat/completions -d @- | jq -r '"\(.choices[0].message.content)\n\(.usage)"'
```

Here is a review of the provided Python `subprocess` module implementation, focusing on bugs, logic issues, and potential improvements.

### Critical Bugs

**1. Crash on POSIX if `sys.__stdout__` is None**
...
{"prompt_tokens":18149,"completion_tokens":11384,"total_tokens":29533}
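Note the `usage` objects in the transcripts above: the raw API reports 7K-11K completion tokens per request, versus the 145-178 tokens the agent frameworks saw in the original report. Inspecting that field directly on a saved response isolates where the output is lost; a minimal sketch (the JSON literal here is a hypothetical stand-in for real curl output):

```python
import json

# Hypothetical saved response body; substitute the real curl output.
raw = '{"usage":{"prompt_tokens":19159,"completion_tokens":11509,"total_tokens":30668}}'
usage = json.loads(raw)["usage"]

# The server-side count to compare against what the client framework shows.
print(usage["completion_tokens"])  # → 11509
```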

@ryanmon1 commented on GitHub (Mar 19, 2026):

same issue here but not with code

Reference: github-starred/ollama#71679