[GH-ISSUE #12508] [Performance] embeddinggemma: 164x slowdown with excessive whitespace in text #70364

Open
opened 2026-05-04 21:16:57 -05:00 by GiteaMirror · 2 comments

Originally created by @FunKite on GitHub (Oct 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12508

Originally assigned to: @mxyng on GitHub.

What is the issue?

Description

The embeddinggemma:300m-qat-q8_0 and embeddinggemma:300m-qat-q4_0 models exhibit severe performance degradation (up to a
164x slowdown) when processing text containing runs of consecutive whitespace, such as text extracted from PDFs. This
appears to be a tokenizer or preprocessing bottleneck whose cost grows far faster than linearly with whitespace density.

Key Finding: The same text with normalized whitespace processes in 0.098s, while text with double/triple spaces takes
16.059s.

Reproduction Steps

  1. Extract text from a PDF (or create text with excessive whitespace):

# Simulated PDF extraction result with double spaces between words
# (GitHub stripped the extra spaces from the original report, so they are written out here)
messy_text = "In  a  few  cases  (clearly  labelled  in  what  follows),  evaluations  were  performed  by  third  parties." * 20

  2. Generate an embedding with the messy text:

import requests

response = requests.post(
    'http://localhost:11434/api/embeddings',
    json={
        'model': 'embeddinggemma:300m-qat-q8_0',
        'prompt': messy_text,
        'keep_alive': '30m'
    },
    timeout=30
)
Result: ~16 seconds

  3. Clean the whitespace and retry:

import re
clean_text = re.sub(r'\s+', ' ', messy_text)

response = requests.post(
    'http://localhost:11434/api/embeddings',
    json={
        'model': 'embeddinggemma:300m-qat-q8_0',
        'prompt': clean_text,
        'keep_alive': '30m'
    },
    timeout=30
)
Result: ~0.1 seconds (164x faster!)

Performance Data

Controlled Test Results:

  • Normal spacing (1360 chars, single spaces): 0.096s (baseline)
  • Double spaces (2120 chars): 12.847s (134x slower)
  • PDF-extracted (2807 chars, mixed 2-3x spaces): 16.059s (167x slower)
  • Same PDF cleaned (2232 chars, normalized): 0.098s (164x faster)

Pattern-Based Testing:

  • "x" * 2000 (no spaces): 0.089s ✓
  • Normal sentence (6 spaces): 0.069s ✓
  • Same sentence with double spaces: 4.295s ✗
  • PDF chunk with excessive whitespace: 16.059s ✗

Batch Processing Impact (147 PDF chunks):

  • Without cleaning: ~15 minutes (0.16 chunks/sec)
  • With cleaning: 14.7 seconds (10.1 chunks/sec)
  • Improvement: 61x faster
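
The cleaning step used in the batch test amounts to a one-line normalization per chunk before it is sent to /api/embeddings. A minimal sketch (the helper name and sample chunks are illustrative, not from the report):

```python
import re

def clean_chunk(chunk: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, newlines)
    # into a single space and trim the ends.
    return re.sub(r"\s+", " ", chunk).strip()

chunks = ["First  page   text.", "Second\tpage \n text."]
cleaned = [clean_chunk(c) for c in chunks]
print(cleaned)  # → ['First page text.', 'Second page text.']
```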

System Information

Ollama Server:

  • Hardware: Mac M1 Max
  • OS: macOS Tahoe 26.0.1
  • Ollama Version: 0.12.3
  • Network: Exposed to LAN (accessed remotely)
  • Model: embeddinggemma:300m-qat-q8_0 (768 dimensions), e84a7acc2394
  • Model Size: 0.74GB (loaded in memory)

Client (Testing):

  • OS: Ubuntu Server (Linux)
  • Python: 3.13.3
  • Connection: Remote API access over LAN
  • Network Latency: <15ms (ruled out as cause)

API Endpoint: /api/embeddings

Model Comparison

Tested identical messy text across different embedding models:

  • embeddinggemma:300m-qat-q8_0 (768d): Messy 16.059s, Clean 0.098s - AFFECTED
  • embeddinggemma:300m-qat-q4_0 (768d): Messy 16.534s, Clean 0.079s - AFFECTED
  • snowflake-arctic-embed2:latest (1024d): Messy 0.115s, Clean 0.098s - NOT AFFECTED

Conclusion: Issue is specific to embeddinggemma models, not related to network, hardware, or OS.

Root Cause Analysis

Performance degrades with whitespace density, not linearly with text length: doubling the spaces in otherwise
identical text turns a ~0.1s request into a ~16s one.

Hypothesis: the tokenizer or preprocessor has O(n²) complexity when handling runs of consecutive whitespace

Evidence:

  • Pure character sequences are fast (~0.09s)
  • Normal text with proper spacing is fast (~0.1s)
  • Identical semantic content with extra spaces is very slow (16s)
  • Other models (snowflake) handle same text fine on same hardware
  • Network latency ruled out (consistent <15ms)
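
A toy model of the suspected failure mode (purely illustrative, not Ollama's actual tokenizer code): if a longest-match tokenizer's vocabulary contains only a single-space whitespace token, a naive scan that tries every prefix of the remaining text performs O(k²) comparisons on a run of k spaces:

```python
def toy_longest_match(text, vocab):
    """Naive longest-match tokenizer: at each position, try every prefix of
    the remainder (longest first) against the vocab. Returns the token list
    and the number of prefix comparisons performed."""
    tokens, i, comparisons = [], 0, 0
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):  # longest candidate first
            comparisons += 1
            if text[i:j] in vocab:
                match = text[i:j]
                break
        tokens.append(match if match else text[i])
        i += len(match) if match else 1
    return tokens, comparisons

vocab = {" "}  # only a single-space token: every space forces a full rescan
_, c10 = toy_longest_match(" " * 10, vocab)
_, c100 = toy_longest_match(" " * 100, vocab)
print(c10, c100)  # → 55 5050: 10x more spaces, ~100x more work
```

This is only a sketch of how quadratic behavior can arise from consecutive whitespace; the actual mechanism inside embeddinggemma's preprocessing remains to be confirmed.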

Complete Reproduction Script

#!/usr/bin/env python3
"""Test whitespace performance bug in embeddinggemma."""
import re
import time

import requests

OLLAMA_URL = "http://localhost:11434"  # Update to your Ollama server
MODEL = "embeddinggemma:300m-qat-q8_0"

# Test text with excessive whitespace (simulates PDF extraction).
# Note the double spaces; GitHub stripped them from the original report.
messy_text = "In  a  few  cases  (clearly  labelled  in  what  follows),  evaluations  were  performed  by  third  parties." * 20

# Cleaned version
clean_text = re.sub(r'\s+', ' ', messy_text)

print(f"Messy: {len(messy_text)} chars, {messy_text.count(' ')} spaces")
print(f"Clean: {len(clean_text)} chars, {clean_text.count(' ')} spaces\n")

# Test messy text
print("Testing messy text...")
start = time.time()
r1 = requests.post(f"{OLLAMA_URL}/api/embeddings", json={
    "model": MODEL, "prompt": messy_text, "keep_alive": "30m"
}, timeout=60)
messy_time = time.time() - start
print(f"Messy: {messy_time:.3f}s")

# Test clean text
print("\nTesting clean text...")
start = time.time()
r2 = requests.post(f"{OLLAMA_URL}/api/embeddings", json={
    "model": MODEL, "prompt": clean_text, "keep_alive": "30m"
}, timeout=60)
clean_time = time.time() - start
print(f"Clean: {clean_time:.3f}s")

print(f"\nSpeedup: {messy_time/clean_time:.1f}x faster with cleaning")

Expected Output:
Messy: 2120 chars, 760 spaces
Clean: 1360 chars, 380 spaces

Testing messy text...
Messy: 12.847s

Testing clean text...
Clean: 0.096s

Speedup: 133.8x faster with cleaning

Impact

Severity: High

  • Affects all PDF/document processing workflows
  • Makes embeddinggemma unusable for RAG systems without preprocessing
  • Can cause timeouts in production (16s vs typical 1-3s timeout)
  • Silent failure mode (no error, just extreme slowness)

Use Cases Affected:

  • PDF document embedding
  • Web scraping (HTML often has extra whitespace)
  • Legacy document processing
  • Any workflow using text extraction tools

Workaround

Pre-process all text before embedding:

import re
text = re.sub(r'\s+', ' ', text.strip())
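
If downstream chunking relies on paragraph breaks, a gentler variant can collapse runs of spaces and tabs while capping blank lines, instead of flattening all whitespace into single spaces. A sketch under that assumption (the function name is hypothetical):

```python
import re

def clean_for_embedding(text: str) -> str:
    # Collapse runs of spaces/tabs into one space, but leave newlines
    # alone so paragraph-aware chunkers still see document structure.
    text = re.sub(r"[ \t]+", " ", text)
    # Cap blank-line runs at a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

# Doubled spaces are collapsed; the paragraph break survives.
print(clean_for_embedding("System  Card:   extracted\n\n\n\ntext  here"))
```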

Expected Behavior

  • Embedding time should scale linearly with text length
  • Whitespace pattern should not significantly impact performance
  • A 2800-character text should take ~0.1-0.2s regardless of spacing

Additional Context

  • Discovered while processing PDF documents with PyPDF extraction
  • PDF extraction tools commonly insert extra spaces, producing strings like "System  Card:  Claude  Sonnet  4.5" (note the double spaces, which GitHub stripped from the original report)
  • Reproducible across different PDF sources with similar extraction artifacts
  • Tested on Mac M1 Max hardware - issue is not hardware-specific
  • Network latency confirmed negligible (<15ms) - slowdown occurs server-side
  • Issue #3944 reported pure-whitespace prompts hanging (now fixed), but this issue is different: it concerns excessive
    whitespace within otherwise normal text
  • Issue #12239 reports general embeddinggemma slowness, but doesn't identify the whitespace cause

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.12.3

GiteaMirror added the bug label 2026-05-04 21:16:57 -05:00

@by-lin commented on GitHub (Oct 13, 2025):

What tools are you using to extract information from PDFs? Have you tried using tools like Tika and tesseract to extract from PDFs?

<!-- gh-comment-id:3398758033 -->

@FunKite commented on GitHub (Oct 14, 2025):

@by-lin, thanks for reaching out. I discovered that extra spaces cause a massive slowdown in embeddinggemma while processing PDF documents with PyPDF extraction. I haven't tried the other tools you mentioned, but as detailed in my initial post I was able to work around the issue by stripping the extra spaces in preprocessing. I expect other folks will run into this too, and such preprocessing isn't necessary with other embedding models such as snowflake-arctic-embed2, as detailed in my initial post.

<!-- gh-comment-id:3403008499 -->
Reference: github-starred/ollama#70364