[GH-ISSUE #14926] Showcase / question: a board-proven offline language runtime on ESP32-C3, and whether some local language capability may eventually be delivered beyond general local models #9605

Open
opened 2026-04-12 22:30:30 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Alpha-Guardian on GitHub (Mar 18, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14926

Hi Ollama folks,

I wanted to share a small but unusual language-runtime project that may still be relevant to the broader question of how local language capability is packaged and delivered, even though it sits far outside the usual desktop/server local-model path.

We built a public demo line called Engram and deployed it on a commodity ESP32-C3.

Current public numbers:

  • Host-side benchmark capability

    • LogiQA = 0.392523
    • IFEval = 0.780037
  • Published board proof

    • LogiQA 642 = 249 / 642 = 0.3878504672897196
    • host_full_match = 642 / 642 (an exact-match check of this kind is sketched after the list)
    • runtime artifact size = 1,380,771 bytes
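To make the host_full_match figure concrete: it presumably counts benchmark items whose board-side answer exactly matches the host-side reference answer. A minimal C sketch of such a check, with hypothetical names and a fixed answer-string layout that is not Engram's actual format:

```c
/* Hypothetical sketch of a host_full_match style check: compare each
 * board answer byte-for-byte against the host reference answer.
 * N_ITEMS and MAX_ANS are illustrative, not Engram's real layout. */
#include <string.h>

#define N_ITEMS 642
#define MAX_ANS 16

/* board[i] / host[i]: the answer strings produced for item i. */
int count_full_matches(const char board[][MAX_ANS],
                       const char host[][MAX_ANS]) {
    int matched = 0;
    for (int i = 0; i < N_ITEMS; i++)
        if (strcmp(board[i], host[i]) == 0)
            matched++;
    return matched;  /* full match means matched == N_ITEMS */
}
```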

Important scope note:

This is not presented as unrestricted, open-input, native LLM generation on an MCU.

The board-side path is closer to a flash-resident, table-driven runtime with the following pieces (a rough sketch follows the list):

  • packed token weights
  • hashed lookup structures
  • fixed compiled probe batches
  • streaming fold / checksum style execution over precompiled structures
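For readers who haven't seen this style of runtime, here is a minimal C sketch of the general shape. Every name in it (entry_t, probe_batch_t, fold_step, run_batch) is hypothetical, as is the fixed-point format; it illustrates the technique only, not Engram's actual data layout or code.

```c
/*
 * Sketch (assumptions throughout, not Engram's code) of a
 * flash-resident, table-driven runtime step: packed entries live in
 * flash, a precompiled probe batch drives hashed lookups, and results
 * stream through a fold/checksum-style accumulator instead of being
 * materialized in RAM.
 */
#include <stdint.h>
#include <stddef.h>

/* Packed flash record: hashed token key plus a fixed-point weight. */
typedef struct {
    uint32_t key_hash;  /* hash of a token or token n-gram; 0 = empty slot */
    int16_t  weight_q;  /* weight quantized to Q8.8 fixed point (assumed) */
} __attribute__((packed)) entry_t;

/* A fixed, precompiled probe batch: which keys to look up for one task. */
typedef struct {
    const entry_t  *table;   /* base of the flash-resident hash table */
    uint32_t        mask;    /* table size minus one (power of two) */
    const uint32_t *probes;  /* key hashes, baked in at compile time */
    size_t          nprobes;
} probe_batch_t;

/* One fold step: rotate-and-xor the looked-up weight into a running
 * checksum-style state, so execution streams over flash. */
static uint32_t fold_step(uint32_t state, int16_t w) {
    state = (state << 5) | (state >> 27);
    return state ^ (uint16_t)w;
}

/* Execute a probe batch: open-addressed (linear-probing) lookup per
 * key, folded into one scalar that downstream logic can decode. */
uint32_t run_batch(const probe_batch_t *b) {
    uint32_t state = 0;
    for (size_t i = 0; i < b->nprobes; i++) {
        uint32_t h = b->probes[i];
        for (uint32_t j = 0; j <= b->mask; j++) {
            const entry_t *e = &b->table[(h + j) & b->mask];
            if (e->key_hash == h) { state = fold_step(state, e->weight_q); break; }
            if (e->key_hash == 0) break;  /* empty slot: key absent */
        }
    }
    return state;
}
```

The point of the fold is that nothing larger than a scalar accumulator needs to live in RAM: the tables stay in flash and are only read, which is what makes a ~1.4 MB artifact plausible on an ESP32-C3.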

So this is not a standard local dense model running in a familiar local inference engine. It is closer to a task-specialized language runtime whose behavior has been crystallized into a compact executable form under severe physical constraints.

Repo:
https://github.com/Alpha-Guardian/Engram

I’m posting here because Ollama represents one of the clearest public paths for turning open language models into a local developer and end-user runtime experience.

What I’d be curious about is whether systems like this should be thought of as:

  • completely outside the normal local-model family
  • an extreme endpoint where some language capability is better delivered as a dedicated executable form rather than a general local model
  • or an early sign that future local language systems may include both general local models and highly specialized runtimes for certain capability slices

If this direction is relevant to your team, I’d be glad to compare notes.
