[GH-ISSUE #7484] Invalid prompt generation when the request message exceeds the context size #30519

Open
opened 2026-04-22 10:13:11 -05:00 by GiteaMirror · 3 comments

Originally created by @b4rtaz on GitHub (Nov 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7484

### What is the issue?

Hello! You're doing a great job! Thank you so much!

I think I found a bug that occurs when the user message exceeds the `num_ctx` value in the API server.

I started the server in debug mode: `OLLAMA_ORIGINS=* OLLAMA_DEBUG=1 ollama serve`

The JS script below works correctly with the `x/llama3.2-vision:latest` model.

```ts
// IMAGE_BASE64 is assumed to be defined elsewhere as a data URL,
// e.g. 'data:image/jpeg;base64,<...>'.
async function test() {
  const r = await fetch('http://127.0.0.1:11434/v1/chat/completions', {
    method: 'POST',
    headers: {
      Accept: 'application/json',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'x/llama3.2-vision:latest',
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'text',
              text: 'describe the image.',
            },
            {
              type: 'image_url',
              image_url: {
                url: IMAGE_BASE64
              }
            }
          ]
        }
      ]
    }),
  });
  const j = await r.json();
  console.log(j);
}
```

In the console I can see:

```
time=2024-11-03T23:59:41.397+01:00 level=DEBUG source=routes.go:1453 msg="chat request" images=1 prompt="<|start_header_id|>user<|end_header_id|>\n\ndescribe the image\n\n[img-0]<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

In this case the generated prompt looks correct. But if I change `text: 'describe the image.',` to `text: 'describe the image.'.repeat(200),` in my script, then I see in the console:

```
time=2024-11-04T00:04:30.828+01:00 level=DEBUG source=routes.go:1453 msg="chat request" images=1 prompt="<|start_header_id|>user<|end_header_id|>\n\n[img-0]<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

So for some reason the content after `<|start_header_id|>user<|end_header_id|>\n\n` has disappeared. The problem is that the API returns a response generated without the submitted message.

When I increase the `num_ctx` value, it starts working again.
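For reference, a larger context can be requested per call through the native endpoint's `options` field; a minimal sketch, assuming the same model and `8192` as an example value:

```ts
// Sketch: raising the context window per request via the native
// /api/chat endpoint's options field. 8192 is an example value,
// not a recommendation.
const r = await fetch('http://127.0.0.1:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'x/llama3.2-vision:latest',
    messages: [{ role: 'user', content: 'describe the image.'.repeat(200) }],
    options: { num_ctx: 8192 },
    stream: false
  })
});
console.log(await r.json());
```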

**Expected behavior**: I think the API should return an error stating that the request contains a message that is too long.

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

0.4.0-rc6

GiteaMirror added the bug label 2026-04-22 10:13:11 -05:00

@jessegross commented on GitHub (Nov 4, 2024):

Not a complete solution, but you can avoid this problem and get better results if you use the native Ollama API (`/api/chat` endpoint).
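For example, the issue's request translated to the native endpoint might look roughly like this (a sketch; on `/api/chat` the image goes into the message's `images` array as a raw base64 string, without the `data:` URL prefix, so `IMAGE_BASE64_RAW` is an assumed variable holding that):

```ts
// Sketch: the same request via the native /api/chat endpoint.
// IMAGE_BASE64_RAW is assumed to hold raw base64 image data,
// without a "data:image/...;base64," prefix.
const r = await fetch('http://127.0.0.1:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'x/llama3.2-vision:latest',
    messages: [
      {
        role: 'user',
        content: 'describe the image.',
        images: [IMAGE_BASE64_RAW] // text and image stay in one message
      }
    ],
    stream: false
  })
});
console.log(await r.json());
```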


@jessegross commented on GitHub (Nov 5, 2024):

Another improvement, though not a perfect solution: current `main` now uses a much smaller size estimate for images, so the context limit is hit much later.

For background, the problem is that Ollama truncates the conversation history on message boundaries when the total exceeds the context size. When using the OpenAI compatibility API, the text and the image appear as two separate messages. When the context limit is exceeded, the oldest one is removed, which is the text. If you use the native Ollama API, the text and image are treated as one message, which behaves slightly better in some ways, though the context limit is still there.
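Roughly, the truncation behaves like the sketch below (illustrative only, not Ollama's actual code; `countTokens` is a stand-in for the real tokenizer):

```ts
// Illustrative sketch of message-boundary truncation (not Ollama's
// actual implementation): whole messages are dropped from the front
// until the remainder fits in the context window.
interface Msg { role: string; content: string; }

function truncateToContext(
  msgs: Msg[],
  numCtx: number,
  countTokens: (m: Msg) => number // stand-in for the real tokenizer
): Msg[] {
  let total = msgs.reduce((n, m) => n + countTokens(m), 0);
  let start = 0;
  while (total > numCtx && start < msgs.length - 1) {
    total -= countTokens(msgs[start]);
    start++;
  }
  return msgs.slice(start);
}
```

Via the OpenAI compatibility layer, the long text and the image arrive as two separate internal messages, so the text can be dropped while the image survives, which matches the prompt seen in the report above.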


@CrazyBoyM commented on GitHub (Feb 23, 2025):

Same problem here, and the Ollama API is not good for developers.
