[GH-ISSUE #10274] Add a way to interleave messages and images in /api/chat #6748

Open
opened 2026-04-12 18:30:36 -05:00 by GiteaMirror · 3 comments
Originally created by @jmorganca on GitHub (Apr 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10274

Currently, `/api/chat` accepts a list of messages that each have a `content` and an `images` field: https://github.com/ollama/ollama/blob/main/docs/api.md#chat-request-with-images

However, a common request is to interleave the images with the text, for example:

```
Here is an image of my dog: <image 1>

Here is an image of my neighbour's dog: <image 2>

What color is my dog?
```

Today, when passing an array of images, they will be sent to the model like this:

```
<image 1> <image 2> Here is an image of my dog:

Here is an image of my neighbour's dog:

What color is my dog?
```

This lowers the output quality of prompts that interleave images and text.
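To make the current behavior concrete, here is a minimal Python sketch of a `/api/chat` request body as the API accepts it today (the model name and base64 strings are placeholders, not real values): the `images` field is a flat list on the message, so the payload carries no record of where `<image 1>` and `<image 2>` belong inside the text.

```python
import json

# A chat request as /api/chat accepts it today: each message has a
# "content" string and a flat "images" list (the base64 strings here
# are placeholders, not real image data).
message = {
    "role": "user",
    "content": (
        "Here is an image of my dog:\n\n"
        "Here is an image of my neighbour's dog:\n\n"
        "What color is my dog?"
    ),
    "images": ["<base64 of image 1>", "<base64 of image 2>"],
}
payload = {"model": "llava", "messages": [message], "stream": False}

# The images ride alongside the text; nothing in the payload records
# where each image belongs inside "content", so the server prepends
# them to the prompt.
print(json.dumps(payload, indent=2))
```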

GiteaMirror added the feature request label 2026-04-12 18:30:36 -05:00

@greengrass821 commented on GitHub (Apr 21, 2025):

Hey @jmorganca, I am interested in picking up this task. Can you assign it to me if no one has been assigned yet?


@Luca-Olivieri commented on GitHub (May 19, 2025):

Hi there, I'm also interested in this.

From `server/routes.go` and `server/prompt.go`, it seems that you can place the image placeholder `[img]` yourself in the `ollama.chat()` request; the placeholders then get substituted with `[img-{ID}]` using increasing IDs. Otherwise, Ollama places the placeholders at the beginning of the prompt itself.

However, I'm not familiar with Go, so I might be wrong and someone should validate this hypothesis. I tried placing the images in different positions interleaved with text and asked some LLMs to state precisely the ordering of the elements in the prompt, and the LLM almost always described the correct ordering.
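If the placeholder substitution described in this comment works as hypothesized (unverified, and the model name and base64 strings below are placeholders), a client could put one `[img]` marker inline in the content per image. A minimal Python sketch of such a request body:

```python
import json

# Sketch of the hypothesis above: an "[img]" placeholder is placed
# inline in the content at the position where each image should appear.
# Whether the server actually honors these markers is exactly what
# still needs validating.
message = {
    "role": "user",
    "content": (
        "Here is an image of my dog: [img]\n\n"
        "Here is an image of my neighbour's dog: [img]\n\n"
        "What color is my dog?"
    ),
    "images": ["<base64 of image 1>", "<base64 of image 2>"],
}
payload = {"model": "llava", "messages": [message], "stream": False}

# One "[img]" marker per entry in "images"; per the hypothesis, the
# server would rewrite them to "[img-0]", "[img-1]", ... in order.
print(json.dumps(payload, indent=2))
```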


@sagnikpal2004 commented on GitHub (Jun 24, 2025):

I looked through the code, and I think @Luca-Olivieri is right. There is already functionality that substitutes increasing image IDs for each `[img]` tag present in the textual prompt; if no tags are present, they are placed as a prefix.
This means that any application using the Ollama API would have to manually place those tags in the desired positions, and Ollama should be able to take care of the ordering.

However, when using the `ollama run` command to run a model in the CLI, inserted image paths are added as prefixes with no regard for their position in the actual prompt. I am working on a PR that fixes that.


Reference: github-starred/ollama#6748