[GH-ISSUE #15147] Problem with submitting multiple images to qwen3.5:27b as opposed to qwen3.5:397b-cloud #71757

Open
opened 2026-05-05 02:27:22 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @fchahun on GitHub (Mar 30, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15147

What is the issue?

I am facing a problem with qwen3.5:27b when submitting a list of 2-3 images (pages of the same document) along with a single text prompt asking for content extraction from the images. With qwen3.5:397b-cloud everything runs as expected, but with qwen3.5:27b, using exactly the same prompt and images, I consistently get the following error: "Internal Server Error -- Failed to create new sequence: SameBatch may not be specified within numKeep (index: 3 numKeep: 4 SameBatch: 2017)". This happens with both the "generate" and "chat" API endpoints.
I am using Ollama 0.18.0 running on an NVIDIA L4 GPU with 24 GB of VRAM.
I located the source of this error message in https://github.com/ollama/ollama/blob/main/runner/ollamarunner/runner.go, but I do not see how to circumvent this behavior.
Is this a bug, or normal behavior that can be worked around by adjusting some parameters?
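
For reproduction, here is a minimal sketch of the kind of request involved, assuming the standard /api/chat endpoint with images passed as base64 strings (the model name is from the report above; the file names are placeholders):

```python
import base64

import requests


def encode(path: str) -> str:
    # Ollama expects images as base64-encoded strings.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()


# page1.png / page2.png stand in for the document pages.
payload = {
    "model": "qwen3.5:27b",
    "messages": [{
        "role": "user",
        "content": "Extract the content of these pages.",
        "images": [encode("page1.png"), encode("page2.png")],
    }],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```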

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.18.0

GiteaMirror added the bug label 2026-05-05 02:27:22 -05:00
Author
Owner

@rick-github commented on GitHub (Mar 30, 2026):

Have you tried increasing the context window (https://github.com/ollama/ollama/blob/main/docs/faq.mdx#how-can-i-specify-the-context-window-size)?

Server logs (https://docs.ollama.com/troubleshooting) will aid in debugging.
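
For reference, the context window can be raised per request through the options field; a minimal sketch (16384 is just an example value, and the prompt is a placeholder):

```python
import requests

# Passing num_ctx in "options" overrides the model's default context
# window for this request only.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3.5:27b",
        "messages": [{"role": "user", "content": "hello"}],
        "options": {"num_ctx": 16384},
        "stream": False,
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```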

Author
Owner

@fchahun commented on GitHub (Mar 31, 2026):

Thank you: increasing num_ctx to 16384 did solve the problem. (I found that, for this run, at least 12,750 tokens are required, based on the reported prompt + eval counts.)

Two things actually fooled me in this case:

  • The Ollama docs say the default context window is 32K for 24-48 GiB of VRAM, but the "24 GiB VRAM" NVIDIA L4 GPU is reported with slightly less than 24 GiB, so the default num_ctx falls back to 2K.
  • qwen3.5:27b is very verbose in its thinking output, even with think="low", which adds to the token count.

But for such cases, would it be possible to emit a more explicit error message hinting at increasing the context window, rather than simply "Failed to create new sequence: SameBatch may not be specified within numKeep"?
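
For anyone sizing num_ctx the same way, the token counts mentioned above come back on any non-streaming response; a minimal sketch using the generate endpoint (the prompt is a placeholder):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen3.5:27b", "prompt": "hello", "stream": False},
)
data = resp.json()
# prompt_eval_count counts prompt tokens (text plus image tokens);
# eval_count counts generated tokens, including thinking output.
print("tokens this run:", data["prompt_eval_count"] + data["eval_count"])
```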

Author
Owner

@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15147
Analyzed: 2026-04-18T18:23:00.237216

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.

Reference: github-starred/ollama#71757