[PR #2296] [MERGED] append image tags to user content #36720

Closed
opened 2026-04-22 21:22:10 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2296
Author: @mxyng
Created: 2/1/2024
Status: Merged
Merged: 2/1/2024
Merged by: @mxyng

Base: main ← Head: mxyng/img-tags


📝 Commits (8)

📊 Changes

6 files changed (+89 additions, -36 deletions)


📝 llm/dyn_ext_server.go (+3 -6)
📝 llm/llama.go (+1 -1)
📝 server/images.go (+17 -8)
📝 server/images_test.go (+26 -7)
📝 server/routes.go (+40 -13)
📝 server/routes_test.go (+2 -1)

📄 Description

summary of changes:

  1. add [img-x] to the prompt content when there are images. x corresponds to the image's id. for generate, this is the image's index in the Images list. for chat, this is the image's index among all images in the messages list
  2. account for image embeddings when trimming the context. image projections produce 768 tokens for clip models, so check for images and add this number to the total token count
  3. if the image tokens exceed the max token count, do not add the images to the final images list and strip the image tags from the prompt content
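
A minimal sketch of the three steps above, not the code from this PR. The `message` type and the `tagMessages` and `applyBudget` helpers are hypothetical names, and the zero-based ids, the leading space before each tag, and the 768-tokens-per-image accounting are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// clipProjectionTokens is the token cost the description quotes for clip
// models. Charging it once per image is an assumption of this sketch.
const clipProjectionTokens = 768

// message is a hypothetical stand-in for a chat message with attached images.
type message struct {
	Content string
	Images  [][]byte
}

// tagMessages appends one [img-x] tag per image to each message's content.
// x is the image's index among all images in the whole message list
// (zero-based here, which is an assumption).
func tagMessages(msgs []message) []string {
	out := make([]string, len(msgs))
	id := 0
	for i, m := range msgs {
		var sb strings.Builder
		sb.WriteString(m.Content)
		for range m.Images {
			fmt.Fprintf(&sb, " [img-%d]", id)
			id++
		}
		out[i] = sb.String()
	}
	return out
}

// applyBudget adds the image token cost to the text token count. If the total
// exceeds maxTokens, it strips the image tags from the content and returns no
// image ids; otherwise it returns the content and ids unchanged.
func applyBudget(content string, textTokens int, ids []int, maxTokens int) (string, []int) {
	total := textTokens + len(ids)*clipProjectionTokens
	if total <= maxTokens {
		return content, ids
	}
	for _, id := range ids {
		content = strings.ReplaceAll(content, fmt.Sprintf(" [img-%d]", id), "")
	}
	return content, nil
}

func main() {
	msgs := []message{
		{Content: "what is in this picture?", Images: [][]byte{{0x1}}},
		{Content: "compare these two", Images: [][]byte{{0x2}, {0x3}}},
	}
	for _, c := range tagMessages(msgs) {
		fmt.Println(c)
	}

	// With a 512-token limit a single 768-token image cannot fit,
	// so the tag is stripped and no image ids are kept.
	content, kept := applyBudget("describe [img-0]", 10, []int{0}, 512)
	fmt.Printf("%q %v\n", content, kept)
}
```

For generate there is a single prompt, so the same numbering reduces to the image's index in the Images list.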

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Commit list:

- [`b4e11be`](https://github.com/ollama/ollama/commit/b4e11be8ef3c4afc82a7357d51f93b336c1866a1) append image tags to user content
- [`8450bf6`](https://github.com/ollama/ollama/commit/8450bf66e60ab563552d31c0c69039cc12fe4603) trim images
- [`f11bf07`](https://github.com/ollama/ollama/commit/f11bf0740bfc4a9653a4c59bf3cb9a00361654b1) use `llm.ImageData`
- [`d046bee`](https://github.com/ollama/ollama/commit/d046bee790fd8549d324d2558693722a21b897e8) use llm.ImageData for chat
- [`fb56988`](https://github.com/ollama/ollama/commit/fb5698801426f045e46dfc228f1adca70ed79bbc) account for image projection in token count
- [`d125510`](https://github.com/ollama/ollama/commit/d125510b4b7fef09b8a5795f30692da354a0d9cd) remove image tags
- [`e49dc9f`](https://github.com/ollama/ollama/commit/e49dc9f3d882ca5a4d56f9b4dea1987c39ab8aef) fix tests
- [`f376140`](https://github.com/ollama/ollama/commit/f3761405c88d36becaba7589362aa976a39aa59c) use image id
GiteaMirror added the pull-request label 2026-04-22 21:22:10 -05:00
Reference: github-starred/ollama#36720