[PR #10388] [CLOSED] Fixes for Mistral 3 Small #44476

opened 2026-04-24 23:56:56 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10388
Author: @deece
Created: 4/24/2025
Status: Closed

Base: main ← Head: main


📝 Commits (3)

  • de7a840 Large tensors may have more than INT_MAX (32bit signed) elements
  • 16f8733 This patch squishes the assertion when we attempt to copy into a view
  • 731d54f Create a generic device properties cache, and use this in scale.cu

📊 Changes

4 files changed (+59 additions, -2 deletions)


📝 ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu (+3 -1)
ml/backend/ggml/ggml/src/ggml-cuda/device.cu (+22 -0)
ml/backend/ggml/ggml/src/ggml-cuda/device.cuh (+9 -0)
📝 ml/backend/ggml/ggml/src/ggml-cuda/scale.cu (+25 -1)

📄 Description

These patches are required for Mistral 3 Small to work with (larger?) images.

The first prevents an overflow in the CUDA scale function by chunking the work into blocks of at most INT_MAX elements.
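The chunking idea can be sketched host-side as follows. This is an illustrative sketch, not the actual patch: a 64-bit element count is split into chunks whose size each fits in a 32-bit signed int, so every kernel launch can keep using `int` indexing safely. The helper name `chunk_elements` is hypothetical.

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>
#include <utility>
#include <vector>

// Split n elements into (offset, count) chunks of at most INT_MAX
// elements each, so per-launch indices stay within 32-bit signed range.
static std::vector<std::pair<int64_t, int>> chunk_elements(int64_t n) {
    std::vector<std::pair<int64_t, int>> chunks;
    for (int64_t off = 0; off < n; off += INT_MAX) {
        int count = (int)std::min<int64_t>(INT_MAX, n - off);
        chunks.push_back({off, count});
    }
    return chunks;
}
```

Each chunk would then be handed to one kernel launch with the 64-bit offset applied to the base pointer before the launch.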

The second squishes an assertion triggered by Mistral 3 Small: on layer 27, we copy into a view that is backed by memory larger than the view itself. Because the view's size matches the source's, the operation is safe.
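A minimal sketch of the relaxed check (names and struct are hypothetical, not the ggml API): a view may sit inside a larger allocation, so safety should compare the source's size against the view's logical size, not against the backing buffer's size.

```cpp
#include <cstdint>

// Simplified stand-in for a tensor or tensor view.
struct Tensor {
    int64_t nbytes;         // logical size of this tensor (or view)
    int64_t backing_nbytes; // size of the underlying allocation
};

// A copy into a view is safe when the source matches the view's own
// size and the view fits inside its backing allocation, even if the
// backing allocation is larger than the view.
static bool copy_is_safe(const Tensor &src, const Tensor &dst_view) {
    return src.nbytes == dst_view.nbytes &&
           dst_view.nbytes <= dst_view.backing_nbytes;
}
```

The over-strict assertion effectively compared `src.nbytes` against `dst_view.backing_nbytes`, which fails exactly in the larger-backing case described above.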

These patches fix https://github.com/ollama/ollama/issues/10234
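The third commit adds a generic device properties cache used by scale.cu. A host-side sketch of the caching pattern, with hypothetical names (the real code lives in device.cu/device.cuh and would query CUDA, e.g. via cudaGetDeviceProperties, instead of the stand-in value below):

```cpp
#include <map>
#include <mutex>

// Illustrative device properties record; the real struct would hold
// whatever fields scale.cu consults (grid limits, etc.).
struct DeviceProps {
    int max_grid_dim_x;
};

// Compute properties once per device id and reuse them on later calls,
// guarded by a mutex so concurrent callers see a consistent cache.
static const DeviceProps &get_device_props(int device) {
    static std::mutex mu;
    static std::map<int, DeviceProps> cache;
    std::lock_guard<std::mutex> lock(mu);
    auto it = cache.find(device);
    if (it == cache.end()) {
        DeviceProps p{};
        p.max_grid_dim_x = 2147483647; // stand-in for a real device query
        it = cache.emplace(device, p).first;
    }
    return it->second;
}
```

`std::map` keeps node addresses stable, so the returned reference stays valid after later insertions.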


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-24 23:56:56 -05:00

Reference: github-starred/ollama#44476