[PR #10807] fix: mllama quality #13372

Closed
opened 2026-04-13 00:25:16 -05:00 by GiteaMirror · 0 comments

Original Pull Request: https://github.com/ollama/ollama/pull/10807

State: closed
Merged: Yes


this change contains a series of fixes for mllama, affecting both the model itself and the graph

model changes

  • attn_gate and ffn_gate did not have tanh applied to them
  • attn_q and attn_k tensors for the vision model did not have their attention heads swapped as is tradition for llama models
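
The two model changes above can be sketched on plain slices. This is a minimal illustration, not the PR's conversion code: `tanhGate` and `permuteHeads` are hypothetical names, and the row permutation shown is the one commonly used when converting llama weights (view each head's rows as (2, headDim/2), then transpose to (headDim/2, 2)). Whether the vision `attn_q` and `attn_k` tensors use exactly this layout is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// tanhGate returns a copy of a gate tensor with tanh applied elementwise,
// so the graph can multiply by the stored value directly.
// Hypothetical helper, not part of ollama.
func tanhGate(gate []float32) []float32 {
	out := make([]float32, len(gate))
	for i, g := range gate {
		out[i] = float32(math.Tanh(float64(g)))
	}
	return out
}

// permuteHeads reorders the rows of a row-major [rows x cols] weight matrix.
// Within each head, row (j*half + i) moves to row (i*2 + j), which interleaves
// the two halves of the head. Assumed layout; hypothetical helper.
func permuteHeads(w []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || cols <= 0 || rows%(2*heads) != 0 || len(w) != rows*cols {
		return nil, fmt.Errorf("invalid shape: rows=%d cols=%d heads=%d len=%d", rows, cols, heads, len(w))
	}
	headDim := rows / heads
	half := headDim / 2
	out := make([]float32, len(w))
	for h := 0; h < heads; h++ {
		for j := 0; j < 2; j++ {
			for i := 0; i < half; i++ {
				src := h*headDim + j*half + i
				dst := h*headDim + i*2 + j
				copy(out[dst*cols:(dst+1)*cols], w[src*cols:(src+1)*cols])
			}
		}
	}
	return out, nil
}

func main() {
	fmt.Println(tanhGate([]float32{0, 1, -1}))

	// One head with four rows and one column per row.
	permuted, err := permuteHeads([]float32{0, 1, 2, 3}, 4, 1, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(permuted)
}
```

Applying `tanh` once to the stored gate, rather than on every forward pass, keeps the graph side to a plain multiply.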

graph changes

  • during refactoring, ffn_gate was moved out of mllama.VisionMLP into its parent but was not applied in the correct spot
  • remove mllama.VisionSelfAttention.Gate which is unused
  • use nn.Attention
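
The description does not say where `ffn_gate` should be applied. A minimal sketch, assuming the gated-residual layout of the reference Mllama vision encoder, where the gate scales the sublayer output before it is added back to the residual: `gatedResidual` is a hypothetical name, and the gate is assumed to already have `tanh` applied, as in the model changes above.

```go
package main

import "fmt"

// gatedResidual computes residual + gate * sublayerOut elementwise.
// gate is assumed to already hold tanh(raw gate).
// Hypothetical helper, not the mllama graph code.
func gatedResidual(residual, sublayerOut []float32, gate float32) ([]float32, error) {
	if len(residual) != len(sublayerOut) {
		return nil, fmt.Errorf("length mismatch: %d vs %d", len(residual), len(sublayerOut))
	}
	out := make([]float32, len(residual))
	for i := range residual {
		// The gate scales only the sublayer output, never the residual path.
		out[i] = residual[i] + gate*sublayerOut[i]
	}
	return out, nil
}

func main() {
	hidden, err := gatedResidual([]float32{1, 2}, []float32{10, 20}, 0.5)
	if err != nil {
		panic(err)
	}
	fmt.Println(hidden)
}
```

Under this assumption, a gate applied anywhere else (for example to the sum, or inside the MLP) would also rescale the residual path or change the activation, which is one way output quality could degrade.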
GiteaMirror added the pull-request label 2026-04-13 00:25:16 -05:00
Reference: github-starred/ollama#13372