[PR #15404] openai: implement previous_response_id for /v1/responses #77433

Open
opened 2026-05-05 10:06:02 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15404
Author: @fsayahmob
Created: 4/7/2026
Status: 🔄 Open

Base: main ← Head: feature/previous-response-id


📝 Commits (2)

  • 8297390 openai: implement previous_response_id for /v1/responses
  • a1c78bd envconfig: make response store TTL and max configurable

📊 Changes

6 files changed (+722 additions, -23 deletions)

View changed files

📝 envconfig/config.go (+24 -0)
📝 middleware/openai.go (+99 -16)
📝 openai/responses.go (+17 -5)
➕ server/response_store.go (+225 -0)
➕ server/response_store_test.go (+341 -0)
📝 server/routes.go (+16 -2)

📄 Description

What

Implements previous_response_id on /v1/responses, currently marked as
"Not supported" in the codebase (responses.go:724).

The server now stores responses in memory and chains them. When a client
passes previous_response_id, the full conversation history is
reconstructed server-side — no need to resend all messages each time.

Why

The Responses API is OpenAI's successor to Chat Completions for multi-turn
conversations. Right now Ollama accepts the field but ignores it. This
means any client built against the OpenAI Responses API (agents, IDE
plugins, voice pipelines) can't do multi-turn on Ollama without falling
back to manual history management.

How it works

Request 1:

POST /v1/responses
{"model": "gemma3:4b", "input": "My name is John."}
→ {"id": "resp_abc123...", "output": [...]}

Request 2 — server reconstructs history automatically:

POST /v1/responses
{"model": "gemma3:4b", "input": "What's my name?", "previous_response_id": "resp_abc123..."}
→ "Your name is John."

No client-side history tracking needed. Works with streaming too.

Implementation

  • server/response_store.go — in-memory store with TTL (30 min default), GC, LRU eviction, max 1024 entries; TTL and max are configurable via envconfig
  • server/response_store_test.go — 13 tests covering chain traversal, circular refs, depth limit, concurrency, TTL expiry
  • openai/responses.go — added PreviousResponseID to request, propagated to response
  • middleware/openai.go — chain lookup via gin.Context (no import cycles), store on completion
  • server/routes.go — wiring

Response IDs follow the OpenAI format: resp_ + 32 hex chars.

What I didn't change

  • Existing /v1/responses behavior without previous_response_id is identical
  • No changes to /api/chat, /api/generate, or /v1/chat/completions
  • No new dependencies
  • FromResponsesRequest signature uses variadic to stay backward compatible

Tests

go test -run "TestResponseStore|TestNewResponseID" ./server/

13/13 pass. Existing openai tests unaffected.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:06:02 -05:00
Reference: github-starred/ollama#77433