[GH-ISSUE #5264] How to use the .mf model configuration file to register a custom vision-language model in Ollama #3297

Open
opened 2026-04-12 13:51:35 -05:00 by GiteaMirror · 2 comments

Originally created by @LJY16114 on GitHub (Jun 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5264

This is my .mf model configuration file:

FROM /root/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-f16.gguf
TEMPLATE """
{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>
"""
PARAMETER stop "<|system|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"

Ollama recognizes our custom .gguf model and shows it as a model option in open-webui.
However, the loaded model does not understand the content of the input image and gives answers that are completely unrelated to it.

What could be the cause of this problem?

Thank you in advance!

@jmorganca


@ljluestc commented on GitHub (Mar 11, 2025):


# Specify the base language model file
FROM /root/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-f16.gguf

# Specify the multimodal projection file (adjust the path as needed)
# This is critical for image processing
PARAMETER mmproj /root/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-mmproj-f16.gguf

# Define the template for multimodal input
# The <image> tag is a placeholder for the image data
TEMPLATE """
{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

<image>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>
"""

# Stop tokens to ensure proper conversation flow
PARAMETER stop "<|system|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "<|eot_id|>"

# Optional: Set additional parameters for better performance
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

@rjmalagon commented on GitHub (Mar 17, 2025):


# Specify the base language model file
FROM /root/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-f16.gguf

# Specify the multimodal projection file (adjust the path as needed)
# This is critical for image processing
PARAMETER mmproj /root/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-mmproj-f16.gguf

# Define the template for multimodal input
# The <image> tag is a placeholder for the image data
TEMPLATE """
{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

<image>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>
"""

# Stop tokens to ensure proper conversation flow
PARAMETER stop "<|system|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "<|eot_id|>"

# Optional: Set additional parameters for better performance
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

This is not correct: the projection file is not loaded as a PARAMETER.
Both GGUF files are loaded with FROM:

FROM /path/main_model.gguf
FROM /path/projection_mmproj.gguf
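
Putting the pieces of this thread together, a complete Modelfile might look like the sketch below. It assumes the projector file sits next to the main model under the file name used earlier in this thread, reuses the template from the original post, and uses the Llama 3 header and end-of-turn tokens as stop sequences. None of this has been verified against this particular model, so treat the paths and the stop tokens as assumptions to adjust.

```
# Sketch only: both paths come from earlier in this thread and may differ on your machine.
# Main language model weights
FROM /root/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-f16.gguf
# Multimodal projector, also loaded with FROM (not PARAMETER)
FROM /root/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-mmproj-f16.gguf

# Template copied from the original post
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

# Assumption: stop on the special tokens that the template itself uses
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
```

The model would then need to be rebuilt from this file, for example with `ollama create llava-llama3-custom -f llava.mf` (both the model name and the file name here are hypothetical), before testing it with an image again.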
Reference: github-starred/ollama#3297