[GH-ISSUE #14008] Optimization: Reuse JSON encoder in completion loop #71216

Open
opened 2026-05-05 00:42:25 -05:00 by GiteaMirror · 0 comments

Originally created by @grwang91 on GitHub (Feb 1, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14008

In the code block below, a new JSON encoder is created for every token.

This may increase GC pressure during streaming, especially for long responses. It seems the encoder could be created once and reused across the loop to avoid these allocations.

This should be fixed in both ollamarunner and llamarunner.

https://github.com/ollama/ollama/blob/bd6c1d6b49aca86dbb1a59182b293c0d1f7b8db8/runner/llamarunner/runner.go#L715-L747


Reference: github-starred/ollama#71216