[PR #15761] x/mlxrunner/model: add baseline test coverage for quant metadata scanning (supersedes #15745) #46495

Open
opened 2026-04-25 01:55:12 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15761
Author: @jodagreyhame
Created: 4/23/2026
Status: 🔄 Open

Base: mainHead: pr3/mlxrunner-model-baseline-tests


📝 Commits (2)

  • 620bcf8 x/mlxrunner: recognise mlx-lm plural aux naming at load time
  • 26b0e2c x/mlxrunner/model: add baseline test coverage for quant metadata scanning

📊 Changes

9 files changed (+745 additions, -27 deletions)

View changed files

📝 x/mlxrunner/model/embedding.go (+14 -3)
📝 x/mlxrunner/model/embedding_test.go (+32 -0)
📝 x/mlxrunner/model/linear.go (+18 -3)
x/mlxrunner/model/linear_test.go (+175 -0)
x/mlxrunner/model/quant_test.go (+157 -0)
📝 x/mlxrunner/model/root.go (+6 -1)
x/mlxrunner/model/root_test.go (+147 -0)
📝 x/mlxrunner/runner.go (+69 -20)
x/mlxrunner/runner_test.go (+127 -0)

📄 Description

Supersedes #15745, which was auto-closed when its head branch was transiently deleted from my fork during local cleanup; GitHub does not allow reopening cross-fork PRs once their head ref is deleted, so this is a fresh PR pointing at the same (now restored) branch.

Branch contents are unchanged from the closed PR's last state, plus:

  • Codex review on quant_test.go:124 (P1: int8_groupSize64 test had wrong shapes — [4,4]/[4,1] resolves to groupSize4 == 32 and early-returns (32, 4), never reaching the groupSize8 == 64 branch the test name advertised; corrected to [4, 16]/[4, 1] which actually triggers the groupSize8 == 64 arm and asserts (64, 8, true)).
  • Two latent test bugs surfaced when running with MLX active (Codex didn't catch these because skipIfNoMLX masks them in the bot's environment): TestMakeLinearLayer_OllamaNativeQuantized was using affine mode + shapes that triggered shape inference, overwriting the defaults the test intended to verify — switched to nvfp4 mode so defaults flow through cleanly. TestResolveLinearQuantParams_AffineShapeInference was using shapes that hit the explicit groupSize4 == 32 arm instead of the ambiguous case it was advertising — corrected to [4, 16] / [4, 2] which actually triggers groupSize4 == 64 && groupSize8 == 32.

Summary

The x/mlxrunner/model package ships substantial quant-metadata logic with embedding_test.go as the sole test file. This PR adds targeted tests for the remaining exported and package-private functions that exist on main today, filling a coverage gap without changing any production code.

Zero production code is touched. Every test passes against the current model/ package unmodified. The PR is self-contained and independent of the naming-recognition and config.json-override PRs.

Scope

This PR covers baseline behaviour in the model/ package that is common across all three PRs in the series — i.e. behaviour that holds regardless of whether the naming-recognition PR or the config-overrides PR has been merged:

  • QuantizationParams — the type → (group_size, bits, mode) lookup.
  • TensorQuantParams — the nil-map / missing-key / empty-QuantType fallbacks and the GroupSize override branch. Does not cover the Bits/Mode field semantics (those are introduced by the config-overrides PR and tested there).
  • ResolveLinearQuantParams — the non-affine shape-inference skip and the ambiguous 64/32 hint-bits tiebreak. Does not cover the new per-tensor-fields branch added by the config-overrides PR.
  • InferAffineQuantParamsFromShapes — all explicit and fallback branches.
  • parseGlobalQuantMetadata — present / absent / malformed metadata.
  • mainTensorNames — the Ollama-native singular-suffix filter. Plural-suffix filter extension is tested in the naming-recognition PR.
  • inferQuantTypeFromShapes — the <tensorName>.scale lookup path (singular).
  • readBlobTensorQuantInfo — both the Ollama-native happy path and the four error paths.
  • MakeLinearLayer — dense fallback, Ollama-native quantised, and the existing per-tensor {QuantType, GroupSize} override semantics.

Explicitly not in scope:

  • Plural aux naming (<module>.scales / <module>.biases) — introduced and tested by the naming-recognition PR.
  • config.json quant overrides and the new TensorQuantInfo.Bits/.Mode field semantics — introduced and tested by the config-overrides PR.
  • Any behaviour that only becomes true after one of the sibling PRs merges.

Order of merge doesn't matter: the assertions in this PR only describe behaviour that both sibling PRs preserve.

Why

The package's behaviour is nuanced enough that regression guards are worth having in place before any follow-up touches these functions. In particular:

  • Shape-based affine quant inference has several distinct branches (explicit 32/64 match, the 64/32 ambiguous pair resolved by hint bits, and the isCommonGroupSize fallback in both directions) — none tested.
  • readBlobTensorQuantInfo has four error paths (missing file, truncated header, oversize header, malformed JSON) and an Ollama-native happy path — neither tested outside of the indirect coverage via embedding_test.go.
  • QuantizationParams's internal uppercasing and the exact default-branch behaviour for unknown types are load-bearing for several other call sites and weren't locked in.

Fixing these in isolation also makes future edits in this package safer to review.

Test plan

  • go test ./x/mlxrunner/model/... on macOS arm64 with MLX — all pass.
  • go test ./x/mlxrunner/model/... on Linux / CI without MLX — MLX-dependent cases skip via existing skipIfNoMLX; pure-Go cases pass.
  • go vet ./x/mlxrunner/model/... — clean.

Files touched

x/mlxrunner/model/quant_test.go     new
x/mlxrunner/model/root_test.go      new
x/mlxrunner/model/linear_test.go    new

No production code changes.

Risk

Zero — test-only.

Relationship to other PRs

Fully independent. Lands cleanly before, after, or between the naming-recognition PR and the config.json-overrides PR.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/ollama/ollama/pull/15761 **Author:** [@jodagreyhame](https://github.com/jodagreyhame) **Created:** 4/23/2026 **Status:** 🔄 Open **Base:** `main` ← **Head:** `pr3/mlxrunner-model-baseline-tests` --- ### 📝 Commits (2) - [`620bcf8`](https://github.com/ollama/ollama/commit/620bcf80228ce31ab9ac1964c03f4627428f1009) x/mlxrunner: recognise mlx-lm plural aux naming at load time - [`26b0e2c`](https://github.com/ollama/ollama/commit/26b0e2c4695b2a59fee0924e2c7ab972152009af) x/mlxrunner/model: add baseline test coverage for quant metadata scanning ### 📊 Changes **9 files changed** (+745 additions, -27 deletions) <details> <summary>View changed files</summary> 📝 `x/mlxrunner/model/embedding.go` (+14 -3) 📝 `x/mlxrunner/model/embedding_test.go` (+32 -0) 📝 `x/mlxrunner/model/linear.go` (+18 -3) ➕ `x/mlxrunner/model/linear_test.go` (+175 -0) ➕ `x/mlxrunner/model/quant_test.go` (+157 -0) 📝 `x/mlxrunner/model/root.go` (+6 -1) ➕ `x/mlxrunner/model/root_test.go` (+147 -0) 📝 `x/mlxrunner/runner.go` (+69 -20) ➕ `x/mlxrunner/runner_test.go` (+127 -0) </details> ### 📄 Description Supersedes #15745, which was auto-closed when its head branch was transiently deleted from my fork during local cleanup; GitHub does not allow reopening cross-fork PRs once their head ref is deleted, so this is a fresh PR pointing at the same (now restored) branch. Branch contents are unchanged from the closed PR's last state, **plus**: - Codex review on `quant_test.go:124` (P1: `int8_groupSize64` test had wrong shapes — `[4,4]/[4,1]` resolves to `groupSize4 == 32` and early-returns `(32, 4)`, never reaching the `groupSize8 == 64` branch the test name advertised; corrected to `[4, 16]/[4, 1]` which actually triggers the `groupSize8 == 64` arm and asserts `(64, 8, true)`). - Two latent test bugs surfaced when running with MLX active (Codex didn't catch these because `skipIfNoMLX` masks them in the bot's environment): `TestMakeLinearLayer_OllamaNativeQuantized` was using `affine` mode + shapes that triggered shape inference, overwriting the defaults the test intended to verify — switched to `nvfp4` mode so defaults flow through cleanly. `TestResolveLinearQuantParams_AffineShapeInference` was using shapes that hit the explicit `groupSize4 == 32` arm instead of the ambiguous case it was advertising — corrected to `[4, 16] / [4, 2]` which actually triggers `groupSize4 == 64 && groupSize8 == 32`. --- ## Summary The `x/mlxrunner/model` package ships substantial quant-metadata logic with `embedding_test.go` as the sole test file. This PR adds targeted tests for the remaining exported and package-private functions that exist on main today, filling a coverage gap without changing any production code. Zero production code is touched. Every test passes against the current `model/` package unmodified. The PR is self-contained and independent of the naming-recognition and config.json-override PRs. ## Scope This PR covers baseline behaviour in the `model/` package that is common across all three PRs in the series — i.e. behaviour that holds regardless of whether the naming-recognition PR or the config-overrides PR has been merged: - `QuantizationParams` — the type → `(group_size, bits, mode)` lookup. - `TensorQuantParams` — the nil-map / missing-key / empty-`QuantType` fallbacks and the `GroupSize` override branch. Does **not** cover the `Bits`/`Mode` field semantics (those are introduced by the config-overrides PR and tested there). - `ResolveLinearQuantParams` — the non-affine shape-inference skip and the ambiguous 64/32 hint-bits tiebreak. Does **not** cover the new per-tensor-fields branch added by the config-overrides PR. - `InferAffineQuantParamsFromShapes` — all explicit and fallback branches. - `parseGlobalQuantMetadata` — present / absent / malformed metadata. - `mainTensorNames` — the Ollama-native singular-suffix filter. Plural-suffix filter extension is tested in the naming-recognition PR. - `inferQuantTypeFromShapes` — the `<tensorName>.scale` lookup path (singular). - `readBlobTensorQuantInfo` — both the Ollama-native happy path and the four error paths. - `MakeLinearLayer` — dense fallback, Ollama-native quantised, and the existing per-tensor `{QuantType, GroupSize}` override semantics. Explicitly **not** in scope: - Plural aux naming (`<module>.scales` / `<module>.biases`) — introduced and tested by the naming-recognition PR. - `config.json` quant overrides and the new `TensorQuantInfo.Bits`/`.Mode` field semantics — introduced and tested by the config-overrides PR. - Any behaviour that only becomes true after one of the sibling PRs merges. Order of merge doesn't matter: the assertions in this PR only describe behaviour that both sibling PRs preserve. ## Why The package's behaviour is nuanced enough that regression guards are worth having in place before any follow-up touches these functions. In particular: - Shape-based affine quant inference has several distinct branches (explicit 32/64 match, the 64/32 ambiguous pair resolved by hint bits, and the `isCommonGroupSize` fallback in both directions) — none tested. - `readBlobTensorQuantInfo` has four error paths (missing file, truncated header, oversize header, malformed JSON) and an Ollama-native happy path — neither tested outside of the indirect coverage via `embedding_test.go`. - `QuantizationParams`'s internal uppercasing and the exact default-branch behaviour for unknown types are load-bearing for several other call sites and weren't locked in. Fixing these in isolation also makes future edits in this package safer to review. ## Test plan - [ ] `go test ./x/mlxrunner/model/...` on macOS arm64 with MLX — all pass. - [ ] `go test ./x/mlxrunner/model/...` on Linux / CI without MLX — MLX-dependent cases skip via existing `skipIfNoMLX`; pure-Go cases pass. - [ ] `go vet ./x/mlxrunner/model/...` — clean. ## Files touched ``` x/mlxrunner/model/quant_test.go new x/mlxrunner/model/root_test.go new x/mlxrunner/model/linear_test.go new ``` No production code changes. ## Risk Zero — test-only. ## Relationship to other PRs Fully independent. Lands cleanly before, after, or between the naming-recognition PR and the config.json-overrides PR. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-04-25 01:55:12 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#46495