[GH-ISSUE #2769] Multimodal request seems to ignore the prompt at the first request. #1669

Closed
opened 2026-04-12 11:38:26 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @joakimeriksson on GitHub (Feb 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2769

If I use any of the existing Ollama examples and try to change the prompt it will not work. It seems like the first prompt is completely ignored - or assumed to be "describe the image".

First I did try the follwing example (with an image from an old Jfokus conference):

import ollama

res = ollama.chat(
        model="llava",
        messages=[
                {
                        'role': 'user',
                        'images': ['../jman-html5.jpg'],
                        'content': 'What is the man wearing?',
                }
        ]
)

Running this I always get the default description of the image as if I sent "describe the image".

>python image.py
 The image you've provided is a stylized illustration of a character that appears to be a superhero. The character is depicted with a muscular build, wearing a cape with a logo on the chest and the number "5" visible. The cape suggests a classic superhero theme. The character is also wearing a mask with a design that resembles a helmet, which could imply some sort of identity concealment or protection.

In the background, there's a banner with text that says "WWW.MHM.NET," which likely refers to a website related to the fictional superhero team "The Justice League" or similar content. The style of the image is reminiscent of comic book art and is designed in a way that would be appealing for fans of superheroes. 
(base) joakimeriksson@MAPVJLV3F57XY ollama % python image.py
 This is an image of a character, who appears to be a superhero based on the cape and the emblem on the chest. The character has a muscular build, with visible abs and defined biceps, suggesting strength and physical fitness. They are wearing a costume that includes a mask with a design that could be associated with a sports team or a particular brand, as indicated by the logo on the mask. The number "6" is prominently displayed on the chest of the costume, which might signify a team number or an identification number. Additionally, there's text on the image that reads "JWM" and "WWW," which are likely part of the character's name or a branding element. The overall style of the image is reminiscent of comic book art, with bold lines, shading, and a dynamic pose. 

Changing to the following works:

import ollama

res = ollama.chat(
        model="llava",
        messages=[
                {
                        'role': 'user',
                        'images': ['../jman-html5.jpg'],
                        'content': ' ',
                },
                {
                        'role': 'user',
                        'content': 'What is the man wearing?',
                        'images': []
                }
        ]
)

Running this will give a response about what the man i wearing.

>python image.py
 The man in the image is wearing a suit of armor, which appears to be inspired by a superhero theme. 
 He has a muscular build and is standing with his arms crossed. 

Originally created by @joakimeriksson on GitHub (Feb 26, 2024). Original GitHub issue: https://github.com/ollama/ollama/issues/2769 If I use any of the existing Ollama examples and try to change the prompt it will not work. It seems like the first prompt is completely ignored - or assumed to be "describe the image". First I did try the follwing example (with an image from an old Jfokus conference): ```python import ollama res = ollama.chat( model="llava", messages=[ { 'role': 'user', 'images': ['../jman-html5.jpg'], 'content': 'What is the man wearing?', } ] ) ``` Running this I always get the default description of the image as if I sent "describe the image". ``` >python image.py The image you've provided is a stylized illustration of a character that appears to be a superhero. The character is depicted with a muscular build, wearing a cape with a logo on the chest and the number "5" visible. The cape suggests a classic superhero theme. The character is also wearing a mask with a design that resembles a helmet, which could imply some sort of identity concealment or protection. In the background, there's a banner with text that says "WWW.MHM.NET," which likely refers to a website related to the fictional superhero team "The Justice League" or similar content. The style of the image is reminiscent of comic book art and is designed in a way that would be appealing for fans of superheroes. (base) joakimeriksson@MAPVJLV3F57XY ollama % python image.py This is an image of a character, who appears to be a superhero based on the cape and the emblem on the chest. The character has a muscular build, with visible abs and defined biceps, suggesting strength and physical fitness. They are wearing a costume that includes a mask with a design that could be associated with a sports team or a particular brand, as indicated by the logo on the mask. The number "6" is prominently displayed on the chest of the costume, which might signify a team number or an identification number. Additionally, there's text on the image that reads "JWM" and "WWW," which are likely part of the character's name or a branding element. The overall style of the image is reminiscent of comic book art, with bold lines, shading, and a dynamic pose. ``` Changing to the following works: ```python import ollama res = ollama.chat( model="llava", messages=[ { 'role': 'user', 'images': ['../jman-html5.jpg'], 'content': ' ', }, { 'role': 'user', 'content': 'What is the man wearing?', 'images': [] } ] ) ``` Running this will give a response about what the man i wearing. ``` >python image.py The man in the image is wearing a suit of armor, which appears to be inspired by a superhero theme. He has a muscular build and is standing with his arms crossed. ```
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#1669