[GH-ISSUE #1948] Understanding Response Data Structure #1122

Closed
opened 2026-04-12 10:52:04 -05:00 by GiteaMirror · 3 comments

Originally created by @tmattoneill on GitHub (Jan 12, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1948

I'm really confused by Ollama's response from the API. Most other LLM APIs I've used return a consistent JSON object that can serve as the 'assistant' response. However, Ollama returns a different, seemingly random JSON object every time. This makes it nearly impossible to reliably extract the reply to any prompt. See below:

    generate_response("Hello world")
    Hello world
    {'dialogue': {'bot': 'Hello! How can I help you today?', 'user': 'Hello world'}}

    generate_response("Hello world")
    Hello world
    {'outputText': 'Hello, World!\n'}

    generate_response("Hello world")
    Hello world
    {'message': 'Hello! How can I assist you today?'}

The code generating this is:

    HOST = "localhost"
    PORT = "11434"
    api_request = {
        "model": "mistral",
        "stream": False,
        "raw": True,
        "format": "json",
        "prompt": f"[INST]{prompt}[/INST]"
    }

    try:
        response = requests.post(f"http://{HOST}:{PORT}/api/generate", json=api_request)
        response.raise_for_status()
        message = json.loads(response.text)['response']
        response = json.loads(message)
    except requests.exceptions.RequestException as e:
        raise ValueError("Error making API request") from e
    except json.JSONDecodeError as e:
        raise ValueError("Error parsing API response") from e

Can someone explain this to me? I've been through the docs extensively and cannot for the life of me figure out how to do this pretty straightforward task.

@mxyng commented on GitHub (Jan 13, 2024):

I think there's some confusion here. What you're experiencing is the LLM responding in JSON, as requested by `"format": "json"` in your Python script. It looks like you've already figured out the structure of the response: `json.loads(response.text)['response']`. The message you're returning (`json.loads(message)`) is the output from the LLM.

If you unset `format`, you will notice the response can no longer be JSON-deserialized. That's because the output from the LLM is no longer valid JSON but rather plain text.
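
For anyone landing here with the same confusion, here is a minimal sketch of what that means in practice, reusing the request from the original script (the `generate_plain_response` name is just for illustration): with `format` unset, the `'response'` field is the model's plain-text reply and needs no second decode.

    import json
    import requests

    def generate_plain_response(prompt):
        api_request = {
            "model": "mistral",
            "stream": False,
            "raw": True,
            # no "format": "json" here, so the model replies in free-form text
            "prompt": f"[INST]{prompt}[/INST]",
        }
        response = requests.post("http://localhost:11434/api/generate", json=api_request)
        response.raise_for_status()
        # Only the API envelope is JSON; the model's reply is a plain string.
        return json.loads(response.text)['response']

    print(generate_plain_response("Hello world"))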

<!-- gh-comment-id:1890227009 --> @mxyng commented on GitHub (Jan 13, 2024): I think there's some confusion here. What you're experiencing is the LLM responding in JSON, as requested by your Python script `"format": "json"`. It looks like you've already figured out the structure of the response `json.loads(response.text)['response']`. The message you're returning (`json.loads(message)`) is the output from the LLM. If you unset `format` you will notice the response cannot be JSON deserialized. That's because the response from the LLM is no longer valid JSON but rather plain text
Author
Owner

@tmattoneill commented on GitHub (Jan 13, 2024):

Thank you, @mxyng, I appreciate it. Yes, I can get the payload of the 'response', but the issue is that the contents of the response are different every time, so I can't reliably extract the reply. As you can see in my examples at the top, each has a different structure.

Is there a best practice for getting these results? If, say, I was building a chatbot, how could I use that response?
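
One common way to make that reliable, offered here as an assumption rather than something confirmed in this thread, is to keep `"format": "json"` but spell out the exact schema in the prompt, so the model's JSON always uses the same key (the `"reply"` key below is an arbitrary choice, and the model usually, though not always, honors the requested shape):

    import json
    import requests

    def generate_structured_response(prompt):
        api_request = {
            "model": "mistral",
            "stream": False,
            "format": "json",
            # Pin the schema in the prompt so every reply has the same key.
            "prompt": (
                f"{prompt}\n\n"
                'Respond with a JSON object of the form {"reply": "<your answer>"}.'
            ),
        }
        response = requests.post("http://localhost:11434/api/generate", json=api_request)
        response.raise_for_status()
        envelope = json.loads(response.text)              # API envelope
        return json.loads(envelope['response'])['reply']  # the model's JSON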

<!-- gh-comment-id:1890374598 --> @tmattoneill commented on GitHub (Jan 13, 2024): Thank you @mxyng I appreciate it. Yes, I can get the payload of the 'response' but the issue is that the contents of the response are different every time so I can't reliably extract the contents of that response. As you can see in my examples at the top, each has a different structure. Is there a best-practice to get these results? If, say, I was building a chat-bot how could I use that response?
Author
Owner

@MarsThunder commented on GitHub (Oct 1, 2025):

This is over a year later, so this is mainly for others who find this thread. I would imagine what you want is to inject a 'seed' number into your query. That would lock the model into the same response each time (at least it should):

    response_chat = ollama.chat(model, messages=[], options={'seed': 120})
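
A fuller sketch of that suggestion, assuming the official `ollama` Python package and a `mistral` model pulled locally (the message content and seed value are placeholders):

    import ollama

    # A fixed seed makes sampling deterministic, so repeated calls with the
    # same prompt should produce the same completion.
    response_chat = ollama.chat(
        model='mistral',
        messages=[{'role': 'user', 'content': 'Hello world'}],
        options={'seed': 120},
    )
    print(response_chat['message']['content'])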

<!-- gh-comment-id:3356711195 --> @MarsThunder commented on GitHub (Oct 1, 2025): This is over a year later, so useful to others. I would imagine what you want is to inject a 'seed' number into your query. That would lock the response into the same response each time (at least it should). `response_chat = ollama.chat(model, messages=[], options={'seed': 120})`
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#1122