[GH-ISSUE #13949] Ollama API Compatibility Issue with Claude Code / Anthropic CLI #71187

Open
opened 2026-05-05 00:37:51 -05:00 by GiteaMirror · 9 comments

Originally created by @choki-lin on GitHub (Jan 28, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13949

What is the issue?

Environment

- Ollama Version: 0.15.2
- Deployment: Docker container behind Traefik reverse proxy
- Client: Claude Code CLI (Anthropic's official CLI tool)
- Models Tested: llama3.2:1b, qwen3-coder:30b
- API Endpoint: /v1/messages (Anthropic-compatible endpoint)

Issue Summary
When using Claude Code CLI to connect to Ollama's Anthropic-compatible API (/v1/messages), the server becomes unresponsive after receiving requests to unsupported endpoints, specifically /v1/messages/count_tokens?beta=true. Requests then fail with increasing delays until Ollama has to be manually restarted.
Expected Behavior
Ollama should gracefully handle requests to unsupported endpoints by returning 404 errors without affecting subsequent requests to supported endpoints.
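
A quick manual check of that expectation, as a sketch (the localhost URL and minimal request body are placeholders, not from the report):

```bash
# The unsupported endpoint should fail fast with a 404...
curl -i -X POST "http://localhost:11434/v1/messages/count_tokens?beta=true" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Hello"}]}'

# ...and a supported endpoint should still answer normally afterwards.
curl -s "http://localhost:11434/api/version"
```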
Actual Behavior

- Claude Code sends initial requests to /v1/messages/count_tokens?beta=true → 404 (expected)
- Claude Code sends multiple requests to /v1/messages?beta=true → some return 404, others succeed
- Subsequent requests start returning 500 errors with increasing timeouts (10s, 20s, 40s, 80s+)
- Eventually: "aborting completion request due to client closing the connection"
- Ollama becomes completely unresponsive - even curl requests stop working
- Manual restart required to restore functionality
- After restart, the cycle repeats

Steps to Reproduce

1. Set up Ollama with the Anthropic API endpoint

```bash
# Run Ollama 0.15.2 in Docker
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.15.2

# Pull a model
docker exec ollama ollama pull llama3.2:1b
```
2. Configure Traefik (or any reverse proxy) with SSE support

```yaml
# Traefik labels for Ollama service
labels:
  - "traefik.http.routers.ollama.tls.options=no-http2@file" # Force HTTP/1.1 for SSE
  - "traefik.http.services.ollama.loadbalancer.responseforwarding.flushinterval=1ms" # Enable streaming
```
3. Test with Claude Code CLI

```bash
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=https://your-ollama-server.com
export ANTHROPIC_API_KEY=""

claude --model llama3.2:1b
```

Type a message and wait. Claude Code will hang indefinitely.
4. Observe logs

```bash
docker logs ollama -f
```

Logs showing the issue:

```
[GIN] 2026/01/28 - 03:19:47 | 404 |      12.018µs | POST     "/v1/messages/count_tokens?beta=true"
[GIN] 2026/01/28 - 03:19:47 | 404 |    9.552628ms | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 03:19:49 | 404 |     875.863µs | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 03:19:49 | 404 |    1.162474ms | POST     "/v1/messages?beta=true"
time=2026-01-28T03:19:49.683Z level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=llama3.2:1b

[Model loads successfully...]

[GIN] 2026/01/28 - 03:21:11 | 500 |         1m22s | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 03:21:30 | 500 | 10.514709239s | POST     "/v1/messages"
[GIN] 2026/01/28 - 03:21:57 | 500 | 20.733825012s | POST     "/v1/messages"
[GIN] 2026/01/28 - 03:23:41 | 500 | 41.709381611s | POST     "/v1/messages?beta=true"
time=2026-01-28T03:23:41.178Z level=INFO source=runner.go:682 msg="aborting completion request due to client closing the connection"

[Ollama becomes unresponsive - manual restart required]
[After manual restart: docker restart ollama]

time=2026-01-28T03:23:42.384Z level=INFO source=routes.go:1631 msg="server config"
time=2026-01-28T03:23:42.386Z level=INFO source=routes.go:1684 msg="Listening on [::]:11434 (version 0.15.2)"
```
What Works

Direct curl requests work perfectly, even with streaming:

```bash
# Non-streaming - works
curl "https://your-ollama-server.com/v1/messages" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'

# Streaming - works
curl "https://your-ollama-server.com/v1/messages?beta=true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Count to 3"}],
    "stream": true
  }' \
  --no-buffer
```

Both return proper responses with correct SSE streaming format.
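
For scripted checks, the text deltas can be extracted from that SSE stream. A minimal sketch, assuming GNU sed and jq are available (host and model are the same placeholders as above):

```bash
# Stream the response, keep only the "data:" payloads, and print the text deltas.
curl -sN "https://your-ollama-server.com/v1/messages?beta=true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Count to 3"}],
    "stream": true
  }' \
  | sed -un 's/^data: //p' \
  | jq -rj 'select(.type == "content_block_delta") | .delta.text'
echo  # final newline after the joined deltas
```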
Analysis

The issue appears to be triggered by:

- Unsupported endpoint: /v1/messages/count_tokens?beta=true - Claude Code uses this for token counting
- Multiple rapid requests: Claude Code sends several requests in quick succession (some to unsupported endpoints)
- State corruption: after the 404s to unsupported endpoints, Ollama's request handling degrades and legitimate requests start to time out

A minimal way to test the state-corruption hypothesis without Claude Code is sketched below.
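
The following replays the same request sequence by hand: a few requests to the unsupported count_tokens endpoint, then one to the supported endpoint. This is a sketch under the issue's own assumptions; the localhost URL and model are placeholders.

```bash
# Fire the same unsupported requests Claude Code sends (expect 404s)...
for i in 1 2 3; do
  curl -s -o /dev/null -w "count_tokens -> %{http_code}\n" -X POST \
    "http://localhost:11434/v1/messages/count_tokens?beta=true" \
    -H "Content-Type: application/json" \
    -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Hello"}]}'
done

# ...then check whether a legitimate request still completes promptly.
curl -s -X POST "http://localhost:11434/v1/messages" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:1b", "max_tokens": 20, "messages": [{"role": "user", "content": "Hello"}], "stream": false}'
```

If the last request hangs or returns 500, the 404s alone are enough to trigger the degradation; if it succeeds, the trigger more likely lies in how Claude Code holds connections open.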

Additional Context

- After each Ollama restart, the first curl request works immediately
- Claude Code triggers the issue consistently every time it connects
- The issue is reproducible with both llama3.2:1b and qwen3-coder:30b
- The Traefik proxy is properly configured for SSE (proven by curl working with stream: true)
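
When reproducing, a debug-level capture makes it easier to see where requests stall. A sketch (OLLAMA_DEBUG=1 is the same flag that appears in later comments; the container name matches the run command above):

```bash
# Recreate the container with debug logging enabled and capture everything to a file.
docker rm -f ollama
docker run -d -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.15.2
docker logs -f ollama 2>&1 | tee ollama-debug.log
```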

OS

Linux

GPU

No response

CPU

No response

Ollama version

0.15.2

GiteaMirror added the bug label 2026-05-05 00:37:51 -05:00

@nishtahir commented on GitHub (Jan 28, 2026):

I'm seeing similar issues on 0.15.2 when I launch using `ollama serve` on macOS:

```
OLLAMA_CONTEXT_LENGTH=64000 ollama serve

time=2026-01-28T01:40:39.337-05:00 level=INFO source=routes.go:1631 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:64000 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/.../.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-01-28T01:40:39.339-05:00 level=INFO source=images.go:473 msg="total blobs: 11"
time=2026-01-28T01:40:39.340-05:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-01-28T01:40:39.340-05:00 level=INFO source=routes.go:1684 msg="Listening on 127.0.0.1:11434 (version 0.15.2)"
time=2026-01-28T01:40:39.341-05:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-01-28T01:40:39.345-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.15.2/bin/ollama runner --ollama-engine --port 56868"
time=2026-01-28T01:40:39.476-05:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M1 Max" libdirs="" driver=0.0 pci_id="" type=discrete total="21.3 GiB" available="21.3 GiB"
[GIN] 2026/01/28 - 01:40:46 | 404 |      16.584µs |       127.0.0.1 | POST     "/v1/messages/count_tokens?beta=true"
[GIN] 2026/01/28 - 01:40:46 | 404 |    3.886042ms |       127.0.0.1 | POST     "/v1/messages?beta=true"
```

@rick-github commented on GitHub (Jan 28, 2026):

Ollama doesn't support Claude telemetry; set the following environment variables to prevent the 404s:

```
DISABLE_TELEMETRY=1
DISABLE_ERROR_REPORTING=1
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```
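
For example, exported in the shell that launches Claude Code (a sketch; the model name is a placeholder):

```bash
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
claude --model llama3.2:1b
```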

@choki-lin commented on GitHub (Jan 28, 2026):

Good morning guys, thanks for the reply.

I did more tests with no luck.

Environments:

Inside the Docker container running Ollama on my NAS:

```
# env
DISABLE_TELEMETRY=1
HOSTNAME=f9aa23af0edd
OLLAMA_CONTEXT_LENGTH=64000
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
HOME=/root
OLLAMA_HOST=0.0.0.0
NVIDIA_DRIVER_CAPABILITIES=compute,utility
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DISABLE_ERROR_REPORTING=1
PWD=/
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
NVIDIA_VISIBLE_DEVICES=all
```

On the machine launching Claude:

```
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=https://my_domain.com.ar
export ANTHROPIC_API_KEY=""

ollama launch claude --model qwen3-coder:30b
```

Nothing happens inside Claude. I said "hi" and it never answered my message.

Also tried:

```
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://192.168.68.150
export ANTHROPIC_API_KEY=""

ollama launch claude --model qwen3-coder:30b
```

Also tried:

```
claude --model llama3.2:1b
claude --model qwen3-coder:30b
```

I double-checked that it's working with this curl:

```bash
curl "https://my_domain.com.ar/v1/messages?beta=true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder:30b",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "di hola"}],
    "stream": true
  }' \
  --no-buffer
```

```
event: message_start
data: {"type":"message_start","message":{"id":"msg_2d9032d8c9e156915b347e64","type":"message","role":"assistant","model":"qwen3-coder:30b","content":[],"usage":{"input_tokens":0,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"¡"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hola"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" ¿"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"En"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" qué"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" puedo"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" ayud"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"arte"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" hoy"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"?"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}
```

Thanks for your help!


@wyattearp commented on GitHub (Jan 28, 2026):

Reporting in from Linux land, same issue on Ubuntu 24.04. Version 0.15.2 reports a 404 for the messages API:

```
[GIN] 2026/01/28 - 23:40:13 | 404 |       3.232µs | 192.168.100.252 | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 23:40:13 | 404 |       3.968µs | 192.168.100.252 | POST     "/v1/messages?beta=true"
[GIN] 2026/01/28 - 23:40:13 | 404 |       3.616µs | 192.168.100.252 | POST     "/v1/messages?beta=true"
wyatt@spark-6f4b:~$ curl -X POST http://localhost:11434/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen3-coder",
    "max_tokens": 1024,
    "messages": [{ "role": "user", "content": "Hello, how are you?" }]
  }'
404 page not found
```

@rick-github commented on GitHub (Jan 29, 2026):

@wyattearp What's the output of `ollama -v`?

```console
$ ollama -v
ollama version is 0.15.2
$ curl -X POST http://localhost:11434/v1/messages -H "Content-Type: application/json" -H "x-api-key: ollama" -H "anthropic-version: 2023-06-01" -d '{
  "model": "qwen3-coder",
  "max_tokens": 1024,
  "messages": [{ "role": "user", "content": "Hello, how are you?" }]
}'
{"id":"msg_97a90bbca03afc72a993992b","type":"message","role":"assistant","model":"qwen3-coder","content":[{"type":"text","text":"Hello! I'm doing well, thank you for asking. I'm here and ready to help with whatever you'd like to discuss or work on. How are you doing today?"}],"stop_reason":"end_turn","usage":{"input_tokens":14,"output_tokens":37}}
```

@wyattearp commented on GitHub (Jan 30, 2026):

Oh I am so annoyed - scratch the Linux version of this statement - it's the freaking docker container somehow not being updated for 11 months. If I manually drop into the container and re-install because of the new `zstd` dep:

```
wyatt@spark-6f4b:~$ docker exec -it open-webui /bin/bash
root@e55a7fd7beb5:/app/backend# curl -fsSL https://ollama.com/install.sh | sh
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
ERROR: This version requires zstd for extraction. Please install zstd and try again:
  - Debian/Ubuntu: sudo apt-get install zstd
  - RHEL/CentOS/Fedora: sudo dnf install zstd
  - Arch: sudo pacman -S zstd
root@e55a7fd7beb5:/app/backend# apt install zstd
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package zstd
root@e55a7fd7beb5:/app/backend# apt update && apt install zstd
Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
Get:2 http://deb.debian.org/debian bookworm-updates InRelease [55.4 kB]
Get:3 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Get:4 http://deb.debian.org/debian bookworm/main arm64 Packages [8691 kB]
Get:5 http://deb.debian.org/debian bookworm-updates/main arm64 Packages [6936 B]
Get:6 http://deb.debian.org/debian-security bookworm-security/main arm64 Packages [289 kB]
Fetched 9241 kB in 1s (8628 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
20 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  zstd
0 upgraded, 1 newly installed, 0 to remove and 20 not upgraded.
Need to get 584 kB of archives.
After this operation, 1956 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bookworm/main arm64 zstd arm64 1.5.4+dfsg2-5 [584 kB]
Fetched 584 kB in 0s (3299 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package zstd.
(Reading database ... 17995 files and directories currently installed.)
Preparing to unpack .../zstd_1.5.4+dfsg2-5_arm64.deb ...
Unpacking zstd (1.5.4+dfsg2-5) ...
Setting up zstd (1.5.4+dfsg2-5) ...
root@e55a7fd7beb5:/app/backend# curl -fsSL https://ollama.com/install.sh | sh
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-arm64.tar.zst
######################################################################## 100.0%
WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
root@e55a7fd7beb5:/app/backend# kill 1
root@e55a7fd7beb5:/app/backend#

.......

wyatt@spark-6f4b:~$ docker restart open-webui
wyatt@spark-6f4b:~$ docker exec -it open-webui /bin/bash
root@e55a7fd7beb5:/app/backend# ollama -v
ollama version is 0.15.2
root@e55a7fd7beb5:/app/backend# curl -X POST http://localhost:11434/v1/messages -H "Content-Type: application/json" -H "x-api-key: ollama" -H "anthropic-version: 2023-06-01" -d '{ "model": "gpt-oss:20b", "max_tokens": 1024, "messages": [{ "role": "user", "content": "Hello, how are you?" }] }'
{"id":"msg_9948aec9e4a2d46a6e562f44","type":"message","role":"assistant","model":"gpt-oss:20b","content":[{"type":"thinking","thinking":"User says \"Hello, how are you?\" They expect a friendly reply. We'll answer politely."},{"type":"text","text":"Hello! I'm doing great—thanks for asking. How about you? Is there anything you'd like to chat about or any question I can help with?"}],"stop_reason":"end_turn","usage":{"input_tokens":73,"output_tokens":59}}
root@e55a7fd7beb5:/app/backend#
```

Go back to hunting the mac version :-|


@fraction01 commented on GitHub (Feb 14, 2026):

Same problem using Claude Code CLI on Linux:

```
Feb 14 15:35:04 mypc ollama[1490]: time=2026-02-14T15:35:04.372+01:00 level=INFO source=server.go:1385 msg="llama runner started in 4.21 seconds"
Feb 14 15:35:29 mypc ollama[1490]: [GIN] 2026/02/14 - 15:35:29 | 200 |      23.197µs |       127.0.0.1 | HEAD     "/"
Feb 14 15:35:29 mypc ollama[1490]: [GIN] 2026/02/14 - 15:35:29 | 200 |       49.22µs |       127.0.0.1 | GET      "/api/ps"
Feb 14 15:50:58 mypc ollama[1490]: [GIN] 2026/02/14 - 15:50:58 | 200 |        15m58s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 15:55:39 mypc ollama[1490]: [GIN] 2026/02/14 - 15:55:39 | 500 |         4m41s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:00:28 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:28 | 200 |      65.402µs |       127.0.0.1 | GET      "/api/version"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |     619.886µs |       127.0.0.1 | GET      "/api/tags"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |  259.829278ms |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |  192.574489ms |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |     559.553µs |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:29 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:29 | 200 |  189.006135ms |       127.0.0.1 | POST     "/api/show"
Feb 14 16:00:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:00:39 | 500 |         4m59s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:05:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:05:39 | 500 |         4m58s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:10:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:10:39 | 500 |         4m57s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:15:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:15:39 | 500 |         4m55s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:20:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:20:39 | 500 |         4m51s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:25:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:25:39 | 500 |         4m42s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:30:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:30:39 | 500 |         4m22s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:35:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:35:39 | 500 |         4m26s |       127.0.0.1 | POST     "/v1/messages?beta=true"
Feb 14 16:40:39 mypc ollama[1490]: [GIN] 2026/02/14 - 16:40:39 | 500 |         4m27s |       127.0.0.1 | POST     "/v1/messages?beta=true"
```


@rick-github commented on GitHub (Feb 14, 2026):

Your client has a 5-minute timeout, and the prompts being sent seem to take longer than that to process. A full server log and information on the client and what it is doing will help in debugging.
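
If the 5-minute client timeout is the limiting factor, raising it on the Claude Code side should change the failure mode. A sketch using API_TIMEOUT_MS, the variable a later comment sets (the 10-minute value here is only an example):

```bash
# 600000 ms = 10 minutes; pick a value longer than your prompt-processing time.
export API_TIMEOUT_MS=600000
claude --model qwen3-coder:30b
```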


@lvvorovi commented on GitHub (Apr 2, 2026):

```
time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000

time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198

time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5

time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""

time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
```

```
# CLAUDE
export PATH="$HOME/.local/bin:$PATH"
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
export API_TIMEOUT_MS=600000000
export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
export CLAUDE_ENABLE_STREAM_WATCHDOG=0
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

# OLLAMA
export OLLAMA_LOAD_TIMEOUT=60000000
export OLLAMA_KEEP_ALIVE=60000000
export OLLAMA_CONTEXT_LENGTH=128000
export OLLAMA_DEBUG=1
```

```
ollama serve
ollama launch claude --model qwen3.5:0.8b
```

Claude Code v2.1.90
ollama version is 0.19.0


Reference: github-starred/ollama#71187