[GH-ISSUE #15816] [Performance Disclosure] Temple C-Runtime: 41x faster TTFB and 59x storage reduction vs pgvector #72138

Open
opened 2026-05-05 03:32:11 -05:00 by GiteaMirror · 1 comment

Originally created by @xxartfulxx on GitHub (Apr 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15816

### Summary

I am disclosing a performance breakthrough in LLM inference and storage architecture. **Temple** is a C-backed runtime boundary inspired by the "motorbike" philosophy of Terry Davis.

I have released a **public Proof Harness** to verify these claims, while the core optimization engine remains a **private suite**.

### Benchmarked Results (Audited)

On the `frozen_v1_x128` dataset, Temple outperforms the pgvector baseline significantly:

| Metric | pgvector (Baseline) | Temple (Private Suite) | Advantage |
| :--- | :--- | :--- | :--- |
| **Startup (TTFB)** | 0.414s | **0.010s** | **41x Faster** |
| **Storage Footprint** | 1,073 KB | **18 KB** | **59x Smaller** |
| **Data I/O (Read)** | 389 KB | **16 KB** | **23x Less I/O** |

HF-backed scaling proof:

**~10k train examples** (source: `comparison.md`; 9,984 train examples)

| Metric | Temple | Baseline |
| :--- | :--- | :--- |
| First batch | 0.060s | 0.432s |
| Throughput | 2354.376 ex/s | 420.715 ex/s |
| Token throughput | 46631.687 tok/s | 8039.136 tok/s |
| Total train time | 4.241s | 24.163s |
| Padding waste | 0.019 | 0.130 |
| Bytes read | 217,938 | 5,559,424 |
| Storage (bytes) | 234,190 | 6,447,104 |

Eval exact match: 0.0 / 0.0
**~100k train examples** (source: `comparison.md`; 100,608 train examples)

| Metric | Temple | Baseline |
| :--- | :--- | :--- |
| First batch | 0.050s | 2.434s |
| Throughput | 4986.644 ex/s | 515.727 ex/s |
| Token throughput | 96738.244 tok/s | 9612.227 tok/s |
| Total train time | 20.175s | 197.514s |
| Padding waste | 0.018 | 0.129 |
| Bytes read | 2,196,046 | 56,122,496 |
| Storage (bytes) | 2,359,231 | 60,243,968 |

Eval exact match: 0.0 / 0.0
### What the curve shows

- Temple startup stayed near zero: 0.060s -> 0.050s
- Temple throughput increased: 2354 -> 4987 ex/s
- Temple padding waste stayed low: 0.019 -> 0.018
- Baseline startup got much worse: 0.432s -> 2.434s
- Temple kept a very large bytes-read and storage advantage

### The "Temple" Architecture

1. **The Public Proof (available now):** The GitHub repository contains the `src/temple.c` runtime layer and the `proof/` harness. This allows anyone to verify the accuracy jumps (98% on gemma2:2b) and the speed-to-correct metrics using the public interface.
2. **The Private Suite:** The logic responsible for the 90% model storage reduction and the 0.01s startup is contained in a separate, private runtime suite. This suite handles live training and immutable version review.

### Why This Matters

By moving the validation and monolith path into a dedicated C engine, Temple bypasses the "Python tax" and standard vector bloat. This is not just a benchmark; it is a validated path to running high-accuracy models on edge hardware with near-zero latency.

**Repo:** https://github.com/xxartfulxx/Temple

I will shortly share a proof-page demo of the rest of the suite I have built.

CC: @jmorganca

@xxartfulxx commented on GitHub (Apr 25, 2026): https://github.com/user-attachments/assets/a544c8f5-b3ca-4c7b-8f35-db5135fd7b8b
Reference: github-starred/ollama#72138