[PR #13309] [CLOSED] Fix Critical Data Loss in mxbai-embed-large Model (80% Batch Failure) #60857

Closed
opened 2026-04-29 15:58:27 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13309
Author: @ljluestc
Created: 12/3/2025
Status: Closed

Base: main ← Head: fix/mxbai-embed-data-loss


📝 Commits (10+)

  • d75c7bb Fix critical 80% data loss in mxbai-embed-large batch processing
  • 9fdbf15 Remove redundant test files, keep essential components
  • b7971ee Add comprehensive PR description
  • 228bf60 Replace vibe coding with professional code formatting
  • 5cc5ee3 Create focused PR description with essential components only
  • a775c0b Remove redundant PR_DESCRIPTION.md file
  • e8e93bb Add comprehensive PR description with detailed issue analysis
  • 0411de0 Remove empty PR description file
  • 0371bd7 Add comprehensive PR description with detailed issue analysis
  • 18c3a58 Add comprehensive PR description with detailed issue analysis

📊 Changes

5 files changed (+587 additions, -7 deletions)


➕ integration/mxbai_embed_test.go (+254 -0)
📝 llama/llama.go (+67 -4)
📝 runner/llamarunner/runner.go (+29 -1)
📝 server/routes.go (+45 -2)
➕ test_large_batch_stress.py (+192 -0)

📄 Description

Fix Critical Data Loss in mxbai-embed-large Model (80% Batch Failure)

🚨 Critical Issue Summary

The mxbai-embed-large embedding model was losing roughly 80% of batch embedding results (5000 inputs → ~1000 outputs) because batches were processed with llama_decode() instead of llama_encode(), the entry point that encoder-only models require.
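In outline, the fix routes batches through llama.cpp's encoder entry point whenever the model has no decoder. Below is a minimal Go sketch, assuming hypothetical wrapper types and methods (Context, Batch, Model, HasEncoder, HasDecoder, Encode, Decode) over the real llama.cpp C functions llama_model_has_encoder(), llama_model_has_decoder(), llama_encode(), and llama_decode(); it illustrates the dispatch, not the PR's verbatim diff.

```go
// Illustrative sketch only: dispatch a batch to the correct llama.cpp
// entry point. Context, Batch, Model, and the method names are assumed
// Go wrappers; the C functions they stand in for are real llama.cpp API.
func (c *Context) Process(batch *Batch) error {
	m := c.Model()
	if m.HasEncoder() && !m.HasDecoder() {
		// Encoder-only models such as mxbai-embed-large reject
		// llama_decode() with: "cannot decode batches with this
		// context (use llama_encode() instead)".
		return c.Encode(batch) // wraps llama_encode()
	}
	return c.Decode(batch) // wraps llama_decode()
}
```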

Impact Assessment

  • Severity: Critical - Production systems losing 80% of embedding data
  • Scope: Affects all encoder-only models (mxbai-embed-large and similar embedding models)
  • Root Cause: Architecture mismatch - using decoder API for encoder-only models
  • Error Pattern: "cannot decode batches with this context (use llama_encode() instead)"
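
Since the PR also touches server/routes.go, a natural complement to the dispatch fix is a count check that turns silent truncation into an explicit error. The snippet below is a hypothetical guard, not the PR's code; req.Input and embeddings are placeholder names.

```go
// Hypothetical guard: fail loudly instead of silently returning a
// truncated result set (e.g. 5000 inputs -> ~1000 outputs).
if len(embeddings) != len(req.Input) {
	return fmt.Errorf("embedding count mismatch: %d results for %d inputs",
		len(embeddings), len(req.Input))
}
```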

🔍 Root Cause Analysis

The issue stemmed from a fundamental architectural mismatch in llama.cpp model processing:

Model Architecture Types

  1. Encoder-only models (like mxbai-embed-large)

    • Use bidirectional attention for embedding generation
    • Require llama_encode() for batch processing
    • Cannot use llama_decode(), which is designed for text generation
  2. Decoder models (text generation models)

    • Use causal attention for token generation
    • Require llama_decode() for processing
    • Standard for LLM text generation
  3. Hybrid models (encoder-decoder)

    • Can use both, depending on the task (see the classification sketch below)
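
These three cases can be distinguished at load time. A minimal sketch, assuming hypothetical Go wrappers (Model, HasEncoder, HasDecoder) over llama.cpp's real capability predicates llama_model_has_encoder() and llama_model_has_decoder():

```go
// archKind classifies a loaded model by its llama.cpp capabilities.
// Model, HasEncoder, and HasDecoder are assumed wrapper names.
func archKind(m *Model) string {
	switch {
	case m.HasEncoder() && m.HasDecoder():
		return "encoder-decoder" // hybrid: entry point depends on the task
	case m.HasEncoder():
		return "encoder-only" // embeddings: must use llama_encode()
	default:
		return "decoder-only" // causal LLM: uses llama_decode()
	}
}
```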

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 15:58:27 -05:00

Reference: github-starred/ollama#60857