[GH-ISSUE #14698] Bug: qwen3.5:397b-cloud hangs on /v1/chat/completions endpoint #35269

Closed
opened 2026-04-22 19:39:48 -05:00 by GiteaMirror · 1 comment

Originally created by @oggixx on GitHub (Mar 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14698

Model

  • qwen3.5:397b-cloud

Description

The model qwen3.5:397b-cloud hangs on the /v1/chat/completions endpoint while /api/generate works fine.

Steps to Reproduce

  1. Start Ollama with qwen3.5:397b-cloud model loaded
  2. Send POST request to /v1/chat/completions
  3. The request hangs, then eventually fails with a 500 Internal Server Error

Expected Behavior

The chat completion should return within a reasonable time.

Actual Behavior

  • /api/generate works fine
  • /v1/chat/completions hangs, then eventually fails with a 500 Internal Server Error
  • Other models (qwen3.5:cloud, minimax-m2.5:cloud) work fine on the same server with both endpoints

Investigation

# Works:
curl -X POST http://localhost:11434/api/generate -d '{"model": "qwen3.5:397b-cloud", "prompt": "Hi"}'

# Hangs, then times out:
curl -X POST http://localhost:11434/v1/chat/completions -d '{"model": "qwen3.5:397b-cloud", "messages":[{"role":"user","content":"Hi"}]}'
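The two curl probes above can be wrapped in a small script that hits both endpoints with an explicit client-side timeout, so a hang is distinguishable from an immediate 500. This is a hedged sketch using only the Python standard library; the base URL, model name, and timeout are assumptions matching the report, not part of Ollama's tooling.

```python
import json
import urllib.request
import urllib.error

MODEL = "qwen3.5:397b-cloud"  # model from the report
TIMEOUT = 30  # seconds; a hang surfaces as a timeout error

def probe(url, payload, timeout=TIMEOUT):
    """POST a JSON payload and report the status or failure mode as a string."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"{url}: HTTP {resp.status}"
    except urllib.error.HTTPError as e:
        # Server answered with an error status (e.g. the reported 500)
        return f"{url}: HTTP {e.code} ({e.reason})"
    except Exception as e:
        # socket.timeout (hang), ConnectionRefusedError (server down), etc.
        return f"{url}: {type(e).__name__}: {e}"

if __name__ == "__main__":
    print(probe("http://localhost:11434/api/generate",
                {"model": MODEL, "prompt": "Hi", "stream": False}))
    print(probe("http://localhost:11434/v1/chat/completions",
                {"model": MODEL,
                 "messages": [{"role": "user", "content": "Hi"}]}))
```

If the issue reproduces, the first probe should print an HTTP 200 while the second prints either `HTTP 500` or a `timeout` failure mode, matching the asymmetry described above.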

Notes

  • Model size: 397B parameters
  • Same model works on /api/generate but not on /v1/chat/completions
  • Other models work on both endpoints without issues

Created via GitHub API

GiteaMirror added the cloud label 2026-04-22 19:39:48 -05:00

@rick-github commented on GitHub (Mar 8, 2026):

This seems like a transient issue; the same request succeeds now:

$ curl -X POST http://localhost:11434/v1/chat/completions -d '{"model": "qwen3.5:397b-cloud", "messages":[{"role":"user","content":"Hi"}]}'
{"id":"chatcmpl-392","object":"chat.completion","created":1772968250,"model":"qwen3.5:397b-cloud","system_fingerprint":"fp_ollama",
"choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How's it going? Is there anything I can help you with today?",...

Reference: github-starred/ollama#35269