[GH-ISSUE #6138] Empty response from API call given context #3832

Closed
opened 2026-04-12 14:39:57 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @stavsap on GitHub (Aug 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6138

Originally assigned to: @jmorganca on GitHub.

What is the issue?

When a generate API call is made with the context from a previous call, the response returns instantly and is empty.

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.3.2
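For reproduction, a minimal sketch of the reported flow (assuming the default local endpoint and a placeholder model name `llama3`; the helper name `build_payload` is illustrative, not part of any API):

```python
def build_payload(prompt, context=None):
    # Non-streaming /api/generate request; "context" carries the token
    # state returned by the previous /api/generate response.
    payload = {"model": "llama3", "prompt": prompt, "stream": False}
    if context is not None:
        payload["context"] = context
    return payload

if __name__ == "__main__":
    import requests  # imported here so the payload helper stays dependency-free

    url = "http://127.0.0.1:11434/api/generate"  # default local server
    first = requests.post(url, json=build_payload("What is the capital of France?")).json()
    follow_up = build_payload("And of Germany?", context=first.get("context"))
    second = requests.post(url, json=follow_up).json()
    # On 0.3.2 this reportedly printed an empty string almost instantly
    print(repr(second.get("response")))
```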

GiteaMirror added the bug label 2026-04-12 14:39:57 -05:00
Author
Owner

@jmorganca commented on GitHub (Aug 2, 2024):

Sorry about this – I believe this was fixed in https://github.com/ollama/ollama/commit/ff7c9060ec62fc3ce5fdc9cbec0788d46d908d81 and will be released in an upcoming update

Author
Owner

@Siddharth-Latthe-07 commented on GitHub (Aug 2, 2024):

The issue might be related to a configuration problem, an API usage error, or a bug in the software.
Troubleshooting steps:

  1. Verify API usage: check that the API calls being made are correct.
  2. Check the Ollama logs and update to the latest version.
  3. Rule out memory and resource constraints.
  4. As a diagnostic step, try making the generate API call without any context to see if it works as expected. This can help determine whether the issue is specifically related to context handling.

Sample snippet of the API call:
import requests

def generate_text(prompt, context=None):
    # Ollama's generate endpoint; stream=False returns a single JSON object
    url = "http://127.0.0.1:11434/api/generate"
    payload = {
        "model": "llama3",
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_predict": 50  # adjust as needed
        },
    }
    if context:
        payload["context"] = context

    response = requests.post(url, json=payload)
    return response.json()

# Initial call
initial_response = generate_text("What is the capital of France?")
print("Initial Response:", initial_response.get("response"))

# Subsequent call reusing the context tokens returned by the first call
context = initial_response.get("context")
subsequent_response = generate_text("What is the capital of Germany?", context=context)
print("Subsequent Response:", subsequent_response.get("response"))

Hope this helps,
Thanks

Author
Owner

@stavsap commented on GitHub (Aug 2, 2024):

thanks for the help, but everything works fine on 0.3.1 and earlier versions; for now it happens only on 0.3.2.

i am testing the same simple flow.

on 0.3.2, calls without context work fine.

Author
Owner

@igorschlum commented on GitHub (Aug 2, 2024):

@stavsap could you please tell if it works with version 0.3.3 and close this issue?

Author
Owner

@ismailbgr commented on GitHub (Aug 2, 2024):

The issue still seems to be present in the /api/chat endpoint.

[image: https://github.com/user-attachments/assets/15986675-8b77-41e5-9a1e-65ddfd5e0ed4]

All the requests after the first one are empty. As you can see in the logs, the reply times are significantly lower than for the first one.

[image: https://github.com/user-attachments/assets/2a344328-6803-4114-a40e-135a2992f5c1]

My version:

[image: https://github.com/user-attachments/assets/9af472f9-abb2-4576-acbe-b5fa460c22ad]

Author
Owner

@jmorganca commented on GitHub (Aug 3, 2024):

This should be fixed in 0.3.3: https://github.com/ollama/ollama/releases/tag/v0.3.3

@ismailbgr do you know what the /api/chat request looks like? It seems that the model is being loaded the first time and nothing is happening after – do you know what fields you are setting in the request?
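For reference, a minimal non-streaming /api/chat request resends the full message history on every call; unlike /api/generate there is no `context` field. A sketch, assuming a placeholder model name `llama3` (the helper name `chat_payload` is illustrative):

```python
def chat_payload(messages, model="llama3"):
    # /api/chat takes the whole conversation as a list of
    # {"role": ..., "content": ...} messages; stream=False returns one JSON object.
    return {"model": model, "messages": messages, "stream": False}

if __name__ == "__main__":
    import requests  # imported here so the payload helper stays dependency-free

    history = [{"role": "user", "content": "What is the capital of France?"}]
    reply = requests.post("http://127.0.0.1:11434/api/chat",
                          json=chat_payload(history)).json()
    history.append(reply["message"])  # keep the assistant turn for the next call
```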

Author
Owner

@ismailbgr commented on GitHub (Aug 3, 2024):

> This should be fixed in 0.3.3: https://github.com/ollama/ollama/releases/tag/v0.3.3

Apparently the code was using an outdated version from another location. Thanks for the help.

Author
Owner

@stavsap commented on GitHub (Aug 3, 2024):

it works on 0.3.3, thanks for the help

Reference: github-starred/ollama#3832