[GH-ISSUE #15778] Proposal: Local semantic compression for long‑term user memory (accepted by DeepSeek) #72112

Open
opened 2026-05-05 03:30:12 -05:00 by GiteaMirror · 0 comments

Originally created by @cassavasea-rgb on GitHub (Apr 23, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15778

Every time a user starts a new chat with an LLM, the model forgets everything about them. The user has to re‑introduce themselves, re‑explain their context, and restate their preferences. This “LLM amnesia” is a real usability problem.

I submitted a proposal to DeepSeek to solve this using local semantic compression. It was accepted as a feature request and added to their backlog.

The core idea

  1. Local storage – A folder (for example, /llm_memory/) on the user’s device stores short text files, each holding one “fact” about the user (e.g., “user name is X”, “prefers concise answers”, “currently working on project Y”).
  2. Semantic merging – When the folder reaches a limit (say, 10 files), the system merges semantically similar facts into one file instead of deleting old ones (see the sketch after this list).
  3. Automatic loading – When a new chat starts, the client sends a compressed profile (the last 5‑10 merged facts) to the model inside the system message. The user sees nothing, but the model already knows the relevant context.
  4. Privacy by default – No data leaves the user’s device. The server never sees the memory folder, which keeps the approach aligned with privacy regulations such as the GDPR.
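
To make steps 1–2 concrete, here is a minimal Python sketch of the store‑and‑merge loop. Everything specific in it is an assumption rather than part of the proposal: it talks to a locally running Ollama instance on the default port, uses the nomic-embed-text model via the documented /api/embeddings endpoint to judge similarity, and the folder path, file naming, and 10‑file limit are illustrative placeholders.

```python
import uuid
from pathlib import Path

import requests

MEMORY_DIR = Path.home() / "llm_memory"  # hypothetical location
MAX_FACTS = 10                           # the "say, 10 files" limit
OLLAMA = "http://localhost:11434"        # default local Ollama address


def embed(text: str) -> list[float]:
    """Embed a fact via Ollama's /api/embeddings endpoint."""
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm


def add_fact(text: str) -> None:
    """Store one fact per file; past the limit, merge the closest pair."""
    MEMORY_DIR.mkdir(exist_ok=True)
    (MEMORY_DIR / f"fact_{uuid.uuid4().hex}.txt").write_text(text)
    facts = sorted(MEMORY_DIR.glob("*.txt"))
    if len(facts) <= MAX_FACTS:
        return
    # Compress instead of rotating: fold the two most semantically
    # similar facts into a single file and delete the second one.
    vecs = {f: embed(f.read_text()) for f in facts}
    _, keep, drop = max(
        (cosine(vecs[a], vecs[b]), a, b)
        for i, a in enumerate(facts)
        for b in facts[i + 1:]
    )
    keep.write_text(keep.read_text().strip() + "; " + drop.read_text().strip())
    drop.unlink()
```

A fuller implementation would probably ask the local model itself to rewrite the merged pair into one concise fact instead of concatenating strings, but the control flow stays the same.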

Why it matters for local LLMs (Ollama, LM Studio, etc.)

  • No server‑side storage costs
  • No data ever leaves the device, so privacy exposure stays minimal
  • Bounded context size – the injected profile never grows past a fixed number of facts
  • Important facts are preserved via merging, not lost via rotation

Where it stands now
DeepSeek has accepted the proposal as a future feature, so the idea has at least passed an initial review. This is not production code, but a well‑defined architecture that can be implemented client‑side.

Question for the community
Could something like this be built as a client‑side plugin or script for Ollama? Are there existing implementations or experiments along these lines? What edge cases should be considered?
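
On the first question: a thin client‑side script looks feasible. As a sketch of step 3 (“automatic loading”), the snippet below reads the most recently updated facts and prepends them as a system message before calling Ollama’s documented /api/chat endpoint. The model name, folder path, and profile size are placeholders (MEMORY_DIR matches the earlier sketch), not anything specified in the proposal.

```python
from pathlib import Path

import requests

MEMORY_DIR = Path.home() / "llm_memory"  # same hypothetical folder as above
PROFILE_SIZE = 5                         # the "last 5-10 merged facts"
OLLAMA = "http://localhost:11434"


def chat_with_memory(user_prompt: str) -> str:
    """Prepend the newest stored facts as a system message, then chat."""
    facts = sorted(MEMORY_DIR.glob("*.txt"),
                   key=lambda f: f.stat().st_mtime, reverse=True)[:PROFILE_SIZE]
    profile = "\n".join("- " + f.read_text().strip() for f in facts)
    r = requests.post(f"{OLLAMA}/api/chat", json={
        "model": "llama3.2",  # placeholder model name
        "stream": False,
        "messages": [
            {"role": "system",
             "content": "Known facts about this user:\n" + profile},
            {"role": "user", "content": user_prompt},
        ],
    })
    r.raise_for_status()
    return r.json()["message"]["content"]


print(chat_with_memory("Pick up where we left off on my project."))
```

Even this toy version surfaces edge cases worth discussing: concurrent chats racing on the same folder, stale facts contradicting newer ones, and merged facts drifting from what the user actually said.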

I am a user, not a developer, but this problem has been bothering me for a while. I would love to hear from people who actually build this stuff.

GiteaMirror added the feature request label 2026-05-05 03:30:12 -05:00