[GH-ISSUE #9342] long context windows with llama3.3 #31862

Closed
opened 2026-04-22 12:37:40 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @nicho2 on GitHub (Feb 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9342

What is the issue?

Hello, i have a chatbot (dify platform) and when my context windows size is > 20000 the inference is very short and stop.
It seems the llm stop its inference not normaly

My configuration with llama3.3 is :

Image

Image

log ollama :

time=2025-02-25T15:05:37.465Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=19375 used=0 remaining=19375

[GIN] 2025/02/25 - 15:06:09 | 200 | 42.278632457s |      172.18.0.1 | POST     "/api/chat"

time=2025-02-25T15:06:09.195Z level=DEBUG source=sched.go:466 msg="context for request finished"

time=2025-02-25T15:06:09.195Z level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d duration=5m0s

time=2025-02-25T15:06:09.195Z level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d refCount=0

Trace wireshark

HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Date: Tue, 25 Feb 2025 15:06:08 GMT
Transfer-Encoding: chunked

{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.014582706Z","message":{"role":"assistant","content":"Il"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.123724884Z","message":{"role":"assistant","content":" existe"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.203487645Z","message":{"role":"assistant","content":" de"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.280554916Z","message":{"role":"assistant","content":" nombre"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.359012075Z","message":{"role":"assistant","content":"uses"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.434597984Z","message":{"role":"assistant","content":" att"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.509937528Z","message":{"role":"assistant","content":"a"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.585886273Z","message":{"role":"assistant","content":"ques"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.661012564Z","message":{"role":"assistant","content":" poss"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.737326959Z","message":{"role":"assistant","content":"ibles"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.813419906Z","message":{"role":"assistant","content":" contre"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.888345812Z","message":{"role":"assistant","content":" les"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.967605251Z","message":{"role":"assistant","content":" syst"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.043350434Z","message":{"role":"assistant","content":"..mes"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.119013925Z","message":{"role":"assistant","content":" Windows"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.194936789Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.195014561Z","message":{"role":"assistant","content":""},"done_reason":"length","done":true,"total_duration":42278341822,"load_duration":9569189812,"prompt_eval_count":19375,"prompt_eval_duration":30601000000,"eval_count":16,"eval_duration":1184000000}

message : Il existe de nombreuses attaques possibles contre les systèmes Windows. ==> and nothing else

Relevant log output


OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.5.12

Originally created by @nicho2 on GitHub (Feb 25, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/9342 ### What is the issue? Hello, i have a chatbot (dify platform) and when my context windows size is > 20000 the inference is very short and stop. It seems the llm stop its inference not normaly My configuration with llama3.3 is : ![Image](https://github.com/user-attachments/assets/d05d5660-21f0-4aa0-bbb9-6abbd596d323) ![Image](https://github.com/user-attachments/assets/e3571aac-2d62-4ab0-8960-5c032d287dbb) log ollama : time=2025-02-25T15:05:37.465Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=19375 used=0 remaining=19375 [GIN] 2025/02/25 - 15:06:09 | 200 | 42.278632457s | 172.18.0.1 | POST "/api/chat" time=2025-02-25T15:06:09.195Z level=DEBUG source=sched.go:466 msg="context for request finished" time=2025-02-25T15:06:09.195Z level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d duration=5m0s time=2025-02-25T15:06:09.195Z level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d refCount=0 Trace wireshark HTTP/1.1 200 OK Content-Type: application/x-ndjson Date: Tue, 25 Feb 2025 15:06:08 GMT Transfer-Encoding: chunked {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.014582706Z","message":{"role":"assistant","content":"Il"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.123724884Z","message":{"role":"assistant","content":" existe"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.203487645Z","message":{"role":"assistant","content":" de"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.280554916Z","message":{"role":"assistant","content":" nombre"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.359012075Z","message":{"role":"assistant","content":"uses"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.434597984Z","message":{"role":"assistant","content":" att"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.509937528Z","message":{"role":"assistant","content":"a"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.585886273Z","message":{"role":"assistant","content":"ques"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.661012564Z","message":{"role":"assistant","content":" poss"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.737326959Z","message":{"role":"assistant","content":"ibles"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.813419906Z","message":{"role":"assistant","content":" contre"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.888345812Z","message":{"role":"assistant","content":" les"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:08.967605251Z","message":{"role":"assistant","content":" syst"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.043350434Z","message":{"role":"assistant","content":"..mes"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.119013925Z","message":{"role":"assistant","content":" Windows"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.194936789Z","message":{"role":"assistant","content":"."},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-25T15:06:09.195014561Z","message":{"role":"assistant","content":""},"done_reason":"length","done":true,"total_duration":42278341822,"load_duration":9569189812,"prompt_eval_count":19375,"prompt_eval_duration":30601000000,"eval_count":16,"eval_duration":1184000000} message : Il existe de nombreuses attaques possibles contre les systèmes Windows. ==> and nothing else ### Relevant log output ```shell ``` ### OS Docker ### GPU Nvidia ### CPU Intel ### Ollama version 0.5.12
GiteaMirror added the bug label 2026-04-22 12:37:40 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 26, 2025):

Can you show the request from the wireshark trace? Not so much the contents of "messages", more the parameters that are being sent.

<!-- gh-comment-id:2683573859 --> @rick-github commented on GitHub (Feb 26, 2025): Can you show the request from the wireshark trace? Not so much the contents of `"messages"`, more the parameters that are being sent.
Author
Owner

@nicho2 commented on GitHub (Feb 26, 2025):

POST /api/chat HTTP/1.1
Host: 10.2.142.77:11434
User-Agent: python-requests/2.31.0
Accept-Encoding: gzip, deflate, br, zstd
Accept: */*
Connection: keep-alive
Content-Type: application/json
Content-Length: 81409

{"model": "llama3.3:latest", "stream": true, "options": {"num_predict": 16, "num_ctx": 50000}, "messages": [{"role": "system", "content": "Use the following context officielle de MITRE.\n......................lisateur.\nTes r\u00e9ponses sont pr\u00e9cises, source-centr\u00e9es, et p\u00e9dagogiques.\n"}, {"role": "user", "content": "j'ai un syst\u00e8me windows, je suis impact\u00e9 par quelles attaques"}]}


HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Date: Wed, 26 Feb 2025 06:08:00 GMT
Transfer-Encoding: chunked

{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.432961267Z","message":{"role":"assistant","content":"Je"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.530921208Z","message":{"role":"assistant","content":" vois"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.607343718Z","message":{"role":"assistant","content":" que"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.682280714Z","message":{"role":"assistant","content":" vous"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.756958093Z","message":{"role":"assistant","content":" utilise"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.831737407Z","message":{"role":"assistant","content":"z"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.910299561Z","message":{"role":"assistant","content":" un"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.986129177Z","message":{"role":"assistant","content":" syst..me"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.060267673Z","message":{"role":"assistant","content":" Windows"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.136932758Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.213017518Z","message":{"role":"assistant","content":" Il"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.287115356Z","message":{"role":"assistant","content":" existe"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.362972323Z","message":{"role":"assistant","content":" de"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.437849014Z","message":{"role":"assistant","content":" nombre"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.512939742Z","message":{"role":"assistant","content":"uses"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.588254473Z","message":{"role":"assistant","content":" att"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.588400654Z","message":{"role":"assistant","content":""},"done_reason":"length","done":true,"total_duration":41434776684,"load_duration":9331274096,"prompt_eval_count":19375,"prompt_eval_duration":30062000000,"eval_count":16,"eval_duration":1159000000}

Also, I haven't this problem with Deepseek-R1 70B

<!-- gh-comment-id:2684010216 --> @nicho2 commented on GitHub (Feb 26, 2025): POST /api/chat HTTP/1.1 Host: 10.2.142.77:11434 User-Agent: python-requests/2.31.0 Accept-Encoding: gzip, deflate, br, zstd Accept: */* Connection: keep-alive Content-Type: application/json Content-Length: 81409 {"model": "llama3.3:latest", "stream": true, "options": {"num_predict": 16, "num_ctx": 50000}, "messages": [{"role": "system", "content": "Use the following context officielle de MITRE.\n......................lisateur.\nTes r\u00e9ponses sont pr\u00e9cises, source-centr\u00e9es, et p\u00e9dagogiques.\n"}, {"role": "user", "content": "j'ai un syst\u00e8me windows, je suis impact\u00e9 par quelles attaques"}]} HTTP/1.1 200 OK Content-Type: application/x-ndjson Date: Wed, 26 Feb 2025 06:08:00 GMT Transfer-Encoding: chunked {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.432961267Z","message":{"role":"assistant","content":"Je"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.530921208Z","message":{"role":"assistant","content":" vois"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.607343718Z","message":{"role":"assistant","content":" que"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.682280714Z","message":{"role":"assistant","content":" vous"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.756958093Z","message":{"role":"assistant","content":" utilise"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.831737407Z","message":{"role":"assistant","content":"z"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.910299561Z","message":{"role":"assistant","content":" un"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:00.986129177Z","message":{"role":"assistant","content":" syst..me"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.060267673Z","message":{"role":"assistant","content":" Windows"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.136932758Z","message":{"role":"assistant","content":"."},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.213017518Z","message":{"role":"assistant","content":" Il"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.287115356Z","message":{"role":"assistant","content":" existe"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.362972323Z","message":{"role":"assistant","content":" de"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.437849014Z","message":{"role":"assistant","content":" nombre"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.512939742Z","message":{"role":"assistant","content":"uses"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.588254473Z","message":{"role":"assistant","content":" att"},"done":false} {"model":"llama3.3:latest","created_at":"2025-02-26T06:08:01.588400654Z","message":{"role":"assistant","content":""},"done_reason":"length","done":true,"total_duration":41434776684,"load_duration":9331274096,"prompt_eval_count":19375,"prompt_eval_duration":30062000000,"eval_count":16,"eval_duration":1159000000} **Also, I haven't this problem with Deepseek-R1 70B**
Author
Owner

@rick-github commented on GitHub (Feb 26, 2025):

{"num_predict": 16

You are limiting the number of tokens in the response to 16.

<!-- gh-comment-id:2684827729 --> @rick-github commented on GitHub (Feb 26, 2025): ``` {"num_predict": 16 ``` You are limiting the number of tokens in the response to 16.
Author
Owner

@ENUMERA8OR commented on GitHub (Feb 26, 2025):

As far as I know, ollama has a default token window of 2000 tokens. You can modify it by modifying the variable in ollama config file.

<!-- gh-comment-id:2685730018 --> @ENUMERA8OR commented on GitHub (Feb 26, 2025): As far as I know, ollama has a default token window of 2000 tokens. You can modify it by modifying the variable in ollama config file.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#31862