[GH-ISSUE #14007] Optimization: Reuse JSON encoder in completion loop #9155

Closed
opened 2026-04-12 22:00:23 -05:00 by GiteaMirror · 0 comments

Originally created by @gyungrai-clumio on GitHub (Feb 1, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14007

In the code referenced below, a new JSON encoder is created for every token.

This may increase GC pressure during streaming, especially for long responses. It looks like the encoder could be created once and reused across the loop to avoid the repeated allocations.

This should be fixed in both `ollamarunner` and `llamarunner`.

https://github.com/ollama/ollama/blob/bd6c1d6b49aca86dbb1a59182b293c0d1f7b8db8/runner/llamarunner/runner.go#L715-L747

Reference: github-starred/ollama#9155