[GH-ISSUE #13949] Ollama API Compatibility Issue with Claude Code / Anthropic CLI #71187

Open
opened 2026-05-05 00:37:51 -05:00 by GiteaMirror · 9 comments

Originally created by @choki-lin on GitHub (Jan 28, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13949

What is the issue?

Environment

- Ollama Version: 0.15.2
- Deployment: Docker container behind Traefik reverse proxy
- Client: Claude Code CLI (Anthropic's official CLI tool)
- Models Tested: llama3.2:1b, qwen3-coder:30b
- API Endpoint: /v1/messages (Anthropic-compatible endpoint)

Issue Summary
When using Claude Code CLI to connect to Ollama's Anthropic-compatible API (/v1/messages), the server becomes unresponsive after receiving requests to unsupported endpoints, specifically /v1/messages/count_tokens?beta=true. Requests then fail with increasing delays until Ollama has to be manually restarted.
Expected Behavior
Ollama should gracefully handle requests to unsupported endpoints by returning 404 errors without affecting subsequent requests to supported endpoints.
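
A quick manual check of that expectation, as a sketch (the localhost URL and minimal request body are placeholders, not from the report):

```bash
# The unsupported endpoint should fail fast with a 404...
curl -i -X POST "http://localhost:11434/v1/messages/count_tokens?beta=true" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Hello"}]}'

# ...and a supported endpoint should still answer normally afterwards.
curl -s "http://localhost:11434/api/version"
```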
Actual Behavior

- Claude Code sends initial requests to /v1/messages/count_tokens?beta=true → 404 (expected)
- Claude Code sends multiple requests to /v1/messages?beta=true → some return 404, others succeed
- Subsequent requests start returning 500 errors with increasing timeouts (10s, 20s, 40s, 80s+)
- Eventually: "aborting completion request due to client closing the connection"
- Ollama becomes completely unresponsive - even curl requests stop working
- Manual restart required to restore functionality
- After restart, the cycle repeats

Steps to Reproduce

1. Set up Ollama with the Anthropic API endpoint

```bash
# Run Ollama 0.15.2 in Docker
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.15.2

# Pull a model
docker exec ollama ollama pull llama3.2:1b
```
2. Configure Traefik (or any reverse proxy) with SSE support

```yaml
# Traefik labels for Ollama service
labels:
  - "traefik.http.routers.ollama.tls.options=no-http2@file" # Force HTTP/1.1 for SSE
  - "traefik.http.services.ollama.loadbalancer.responseforwarding.flushinterval=1ms" # Enable streaming
```
3. Test with Claude Code CLI

```bash
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=https://your-ollama-server.com
export ANTHROPIC_API_KEY=""

claude --model llama3.2:1b
```

Type a message and wait. Claude Code will hang indefinitely.
4. Observe logs

```bash
docker logs ollama -f
```

Logs showing the issue:

```
[GIN] 2026/01/28 - 03:19:47 | 404 |      12.018µs | POST     "/v1/messages/count_tokens?beta=true"
[GIN] 2026/01/28 - 03:19:47 | 404 |    9.552628ms | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 03:19:49 | 404 |     875.863µs | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 03:19:49 | 404 |    1.162474ms | POST     "/v1/messages?beta=true"
time=2026-01-28T03:19:49.683Z level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=llama3.2:1b

[Model loads successfully...]

[GIN] 2026/01/28 - 03:21:11 | 500 |         1m22s | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 03:21:30 | 500 | 10.514709239s | POST     "/v1/messages"
[GIN] 2026/01/28 - 03:21:57 | 500 | 20.733825012s | POST     "/v1/messages"
[GIN] 2026/01/28 - 03:23:41 | 500 | 41.709381611s | POST     "/v1/messages?beta=true"
time=2026-01-28T03:23:41.178Z level=INFO source=runner.go:682 msg="aborting completion request due to client closing the connection"

[Ollama becomes unresponsive - manual restart required]
[After manual restart: docker restart ollama]

time=2026-01-28T03:23:42.384Z level=INFO source=routes.go:1631 msg="server config"
time=2026-01-28T03:23:42.386Z level=INFO source=routes.go:1684 msg="Listening on [::]:11434 (version 0.15.2)"
```
What Works

Direct curl requests work perfectly, even with streaming:

```bash
# Non-streaming - works
curl "https://your-ollama-server.com/v1/messages" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'

# Streaming - works
curl "https://your-ollama-server.com/v1/messages?beta=true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Count to 3"}],
    "stream": true
  }' \
  --no-buffer
```

Both return proper responses with correct SSE streaming format.
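
For scripted checks, the text deltas can be extracted from that SSE stream. A minimal sketch, assuming GNU sed and jq are available (host and model are the same placeholders as above):

```bash
# Stream the response, keep only the "data:" payloads, and print the text deltas.
curl -sN "https://your-ollama-server.com/v1/messages?beta=true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Count to 3"}],
    "stream": true
  }' \
  | sed -un 's/^data: //p' \
  | jq -rj 'select(.type == "content_block_delta") | .delta.text'
echo  # final newline after the joined deltas
```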
Analysis

The issue appears to be triggered by:

- Unsupported endpoint: /v1/messages/count_tokens?beta=true - Claude Code uses this for token counting
- Multiple rapid requests: Claude Code sends several requests in quick succession (some to unsupported endpoints)
- State corruption: after the 404s to unsupported endpoints, Ollama's request handling degrades and legitimate requests start to time out

A minimal way to test the state-corruption hypothesis without Claude Code is sketched below.
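
The following replays the same request sequence by hand: a few requests to the unsupported count_tokens endpoint, then one to the supported endpoint. This is a sketch under the issue's own assumptions; the localhost URL and model are placeholders.

```bash
# Fire the same unsupported requests Claude Code sends (expect 404s)...
for i in 1 2 3; do
  curl -s -o /dev/null -w "count_tokens -> %{http_code}\n" -X POST \
    "http://localhost:11434/v1/messages/count_tokens?beta=true" \
    -H "Content-Type: application/json" \
    -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Hello"}]}'
done

# ...then check whether a legitimate request still completes promptly.
curl -s -X POST "http://localhost:11434/v1/messages" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:1b", "max_tokens": 20, "messages": [{"role": "user", "content": "Hello"}], "stream": false}'
```

If the last request hangs or returns 500, the 404s alone are enough to trigger the degradation; if it succeeds, the trigger more likely lies in how Claude Code holds connections open.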

Additional Context

- After each Ollama restart, the first curl request works immediately
- Claude Code triggers the issue consistently every time it connects
- The issue is reproducible with both llama3.2:1b and qwen3-coder:30b
- The Traefik proxy is properly configured for SSE (proven by curl working with stream: true)
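
When reproducing, a debug-level capture makes it easier to see where requests stall. A sketch (OLLAMA_DEBUG=1 is the same flag that appears in later comments; the container name matches the run command above):

```bash
# Recreate the container with debug logging enabled and capture everything to a file.
docker rm -f ollama
docker run -d -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.15.2
docker logs -f ollama 2>&1 | tee ollama-debug.log
```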

OS

Linux

GPU

No response

CPU

No response

Ollama version

0.15.2

GiteaMirror added the bug label 2026-05-05 00:37:51 -05:00

@nishtahir commented on GitHub (Jan 28, 2026):

I'm seeing similar issues on 0.15.2 when I launch using `ollama serve` on macOS:

```
OLLAMA_CONTEXT_LENGTH=64000 ollama serve

time=2026-01-28T01:40:39.337-05:00 level=INFO source=routes.go:1631 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:64000 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/.../.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-01-28T01:40:39.339-05:00 level=INFO source=images.go:473 msg="total blobs: 11"
time=2026-01-28T01:40:39.340-05:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-01-28T01:40:39.340-05:00 level=INFO source=routes.go:1684 msg="Listening on 127.0.0.1:11434 (version 0.15.2)"
time=2026-01-28T01:40:39.341-05:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-01-28T01:40:39.345-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.15.2/bin/ollama runner --ollama-engine --port 56868"
time=2026-01-28T01:40:39.476-05:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M1 Max" libdirs="" driver=0.0 pci_id="" type=discrete total="21.3 GiB" available="21.3 GiB"
[GIN] 2026/01/28 - 01:40:46 | 404 |      16.584µs |       127.0.0.1 | POST     "/v1/messages/count_tokens?beta=true"
[GIN] 2026/01/28 - 01:40:46 | 404 |    3.886042ms |       127.0.0.1 | POST     "/v1/messages?beta=true"
```

@rick-github commented on GitHub (Jan 28, 2026):

Ollama doesn't support Claude telemetry; set the following environment variables to prevent the 404s:

```
DISABLE_TELEMETRY=1
DISABLE_ERROR_REPORTING=1
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```
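
For example, exported in the shell that launches Claude Code (a sketch; the model name is a placeholder):

```bash
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
claude --model llama3.2:1b
```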

@choki-lin commented on GitHub (Jan 28, 2026):

Good morning guys, thanks for the reply.

I did more tests with no luck.

Environments:

Inside the Docker container running Ollama on my NAS:

```
# env
DISABLE_TELEMETRY=1
HOSTNAME=f9aa23af0edd
OLLAMA_CONTEXT_LENGTH=64000
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
HOME=/root
OLLAMA_HOST=0.0.0.0
NVIDIA_DRIVER_CAPABILITIES=compute,utility
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DISABLE_ERROR_REPORTING=1
PWD=/
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
NVIDIA_VISIBLE_DEVICES=all
```

On the machine launching Claude:

```
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=https://my_domain.com.ar
export ANTHROPIC_API_KEY=""

ollama launch claude --model qwen3-coder:30b
```

Nothing happens inside Claude. I said "hi" and it never answered my message.

Also tried:

```
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://192.168.68.150
export ANTHROPIC_API_KEY=""

ollama launch claude --model qwen3-coder:30b
```

Also tried:

```
claude --model llama3.2:1b
claude --model qwen3-coder:30b
```

I double-checked that it's working with this curl:

```bash
curl "https://my_domain.com.ar/v1/messages?beta=true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder:30b",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "di hola"}],
    "stream": true
  }' \
  --no-buffer
```

```
event: message_start
data: {"type":"message_start","message":{"id":"msg_2d9032d8c9e156915b347e64","type":"message","role":"assistant","model":"qwen3-coder:30b","content":[],"usage":{"input_tokens":0,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"¡"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hola"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" ¿"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"En"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" qué"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" puedo"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" ayud"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"arte"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" hoy"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"?"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}
```

Thanks for your help!


@wyattearp commented on GitHub (Jan 28, 2026):

Reporting in from Linux land, same issue on Ubuntu 24.04. Version 0.15.2 reports a 404 for the messages API:

```
[GIN] 2026/01/28 - 23:40:13 | 404 |       3.232µs | 192.168.100.252 | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 23:40:13 | 404 |       3.968µs | 192.168.100.252 | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 23:40:13 | 404 |       3.616µs | 192.168.100.252 | POST     "/v1/messages?beta=true"
wyatt@spark-6f4b:~$ curl -X POST http://localhost:11434/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen3-coder",
    "max_tokens": 1024,
    "messages": [{ "role": "user", "content": "Hello, how are you?" }]
  }'
404 page not found
```

@rick-github commented on GitHub (Jan 29, 2026):

@wyattearp What's the output of `ollama -v`?

```console
$ ollama -v
ollama version is 0.15.2
$ curl -X POST http://localhost:11434/v1/messages -H "Content-Type: application/json" -H "x-api-key: ollama" -H "anthropic-version: 2023-06-01" -d '{
  "model": "qwen3-coder",
  "max_tokens": 1024,
  "messages": [{ "role": "user", "content": "Hello, how are you?" }]
}'
{"id":"msg_97a90bbca03afc72a993992b","type":"message","role":"assistant","model":"qwen3-coder","content":[{"type":"text","text":"Hello! I'm doing well, thank you for asking. I'm here and ready to help with whatever you'd like to discuss or work on. How are you doing today?"}],"stop_reason":"end_turn","usage":{"input_tokens":14,"output_tokens":37}}
```

@wyattearp commented on GitHub (Jan 30, 2026):

Oh I am so annoyed - scratch the Linux version of this statement - it's the freaking docker container somehow not being updated for 11 months. If I manually drop into the container and re-install because of the new `zstd` dep:

```
wyatt@spark-6f4b:~$ docker exec -it open-webui /bin/bash
root@e55a7fd7beb5:/app/backend# curl -fsSL https://ollama.com/install.sh | sh
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
ERROR: This version requires zstd for extraction. Please install zstd and try again:
  - Debian/Ubuntu: sudo apt-get install zstd
  - RHEL/CentOS/Fedora: sudo dnf install zstd
  - Arch: sudo pacman -S zstd
root@e55a7fd7beb5:/app/backend# apt install zstd
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package zstd
root@e55a7fd7beb5:/app/backend# apt update && apt install zstd
Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
Get:2 http://deb.debian.org/debian bookworm-updates InRelease [55.4 kB]
Get:3 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Get:4 http://deb.debian.org/debian bookworm/main arm64 Packages [8691 kB]
Get:5 http://deb.debian.org/debian bookworm-updates/main arm64 Packages [6936 B]
Get:6 http://deb.debian.org/debian-security bookworm-security/main arm64 Packages [289 kB]
Fetched 9241 kB in 1s (8628 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
20 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  zstd
0 upgraded, 1 newly installed, 0 to remove and 20 not upgraded.
Need to get 584 kB of archives.
After this operation, 1956 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bookworm/main arm64 zstd arm64 1.5.4+dfsg2-5 [584 kB]
Fetched 584 kB in 0s (3299 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package zstd.
(Reading database ... 17995 files and directories currently installed.)
Preparing to unpack .../zstd_1.5.4+dfsg2-5_arm64.deb ...
Unpacking zstd (1.5.4+dfsg2-5) ...
Setting up zstd (1.5.4+dfsg2-5) ...
root@e55a7fd7beb5:/app/backend# curl -fsSL https://ollama.com/install.sh | sh
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-arm64.tar.zst
######################################################################## 100.0%
WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
root@e55a7fd7beb5:/app/backend# kill 1
root@e55a7fd7beb5:/app/backend#

.......

wyatt@spark-6f4b:~$ docker restart open-webui
wyatt@spark-6f4b:~$ docker exec -it open-webui /bin/bash
root@e55a7fd7beb5:/app/backend# ollama -v
ollama version is 0.15.2
root@e55a7fd7beb5:/app/backend# curl -X POST http://localhost:11434/v1/messages -H "Content-Type: application/json" -H "x-api-key: ollama" -H "anthropic-version: 2023-06-01" -d '{ "model": "gpt-oss:20b", "max_tokens": 1024, "messages": [{ "role": "user", "content": "Hello, how are you?" }] }'
{"id":"msg_9948aec9e4a2d46a6e562f44","type":"message","role":"assistant","model":"gpt-oss:20b","content":[{"type":"thinking","thinking":"User says \"Hello, how are you?\" They expect a friendly reply. We'll answer politely."},{"type":"text","text":"Hello! I'm doing great—thanks for asking. How about you? Is there anything you'd like to chat about or any question I can help with?"}],"stop_reason":"end_turn","usage":{"input_tokens":73,"output_tokens":59}}
root@e55a7fd7beb5:/app/backend#
```

Go back to hunting the mac version :-|


@fraction01 commented on GitHub (Feb 14, 2026):

Same problem using Claude Code CLI on Linux:

```
Feb 14 15:35:04 mypc ollama[1490]: time=2026-02-14T15:35:04.372+01:00 level=INFO source=server.go:1385 msg="llama runner started in 4.21 seconds"
Feb 14 15:35:29 mypc ollama[1490]: [GIN] 2026/02/14 - 15:35:29 | 200 |      23.197µs |       127.0.0.1 | HEAD     "/"
Feb 14 15:35:29 mypc ollama[1490]: [GIN] 2026/02/14 - 15:35:29 | 200 |       49.22µs |       127.0.0.1 | GET      "/api/ps"
Feb 14 15:50:58 mypc ollama[1490]: [GIN] 2026/02/14 - 15:50:58 | 200 |        15m58s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 15:55:39 mypc ollama[1490]: [GIN] 2026/02/14 - 15:55:39 | 500 |         4m41s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:00:28 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:28 | 200 |      65.402µs |       127.0.0.1 | GET      "/api/version"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |     619.886µs |       127.0.0.1 | GET      "/api/tags"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |  259.829278ms |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |  192.574489ms |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |     559.553µs |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |  189.006135ms |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:39 | 500 |         4m59s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:05:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:05:39 | 500 |         4m58s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:10:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:10:39 | 500 |         4m57s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:15:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:15:39 | 500 |         4m55s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:20:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:20:39 | 500 |         4m51s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:25:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:25:39 | 500 |         4m42s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:30:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:30:39 | 500 |         4m22s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:35:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:35:39 | 500 |         4m26s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:40:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:40:39 | 500 |         4m27s |       127.0.0.1 | POST     "/v1/messages?beta=true"
```


@rick-github commented on GitHub (Feb 14, 2026):

Your client has a 5-minute timeout, and the prompts being sent seem to take longer than that to process. A full server log and information on the client and what it is doing will help in debugging.
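
If the 5-minute client timeout is the limiting factor, raising it on the Claude Code side should change the failure mode. A sketch using API_TIMEOUT_MS, the variable a later comment sets (the 10-minute value here is only an example):

```bash
# 600000 ms = 10 minutes; pick a value longer than your prompt-processing time.
export API_TIMEOUT_MS=600000
claude --model qwen3-coder:30b
```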


@lvvorovi commented on GitHub (Apr 2, 2026):

```
time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000

time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198

time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5

time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""

time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
```

```
# CLAUDE
export PATH="$HOME/.local/bin:$PATH"
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
export API_TIMEOUT_MS=600000000
export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
export CLAUDE_ENABLE_STREAM_WATCHDOG=0
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

# OLLAMA
export OLLAMA_LOAD_TIMEOUT=60000000
export OLLAMA_KEEP_ALIVE=60000000
export OLLAMA_CONTEXT_LENGTH=128000
export OLLAMA_DEBUG=1
```

```
ollama serve
ollama launch claude --model qwen3.5:0.8b
```

Claude Code v2.1.90
ollama version is 0.19.0


Reference: github-starred/ollama#71187