[GH-ISSUE #13313] ministral-3:14b num_ctx over 128512 results in repetition #55306

Closed
opened 2026-04-29 08:49:24 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @dan-and on GitHub (Dec 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13313

What is the issue?

The ministral-3:14b (ministral-3:14b-instruct-2512-q4_K_M) model description mentions a context length of 262144.

During my first tests I found that any value over 128512 results in the first token (sometimes the first two tokens) being repeated endlessly, and ministral-3:14b is not usable until num_ctx is set to 128512 or lower.

Update: The same applies to ministral-3:14b-instruct-2512-q8_0, but at num_ctx over 206848.

Important:
This is not a problem with ministral-3:8b or ministral-3:3b. These two models work fine with a num_ctx of 262144.
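The failing thresholds above were found by trial and error. Assuming the failure is monotone in num_ctx, a small script can bisect the largest working value automatically. This is only a sketch: the endpoint is Ollama's default `/api/generate`, and the repetition check is a crude heuristic keyed to the degenerate outputs seen in this issue — adapt both to your setup.

```shell
#!/bin/sh
# Bisect the largest num_ctx that still produces sane output.
# Assumes a local Ollama server on the default port; adapt MODEL as needed.
MODEL="ministral-3:14b"

probe() {  # $1 = num_ctx; succeeds if the reply looks sane
  reply=$(curl -s http://localhost:11434/api/generate \
    -d "{\"model\":\"$MODEL\",\"prompt\":\"why is the sky blue?\",\"stream\":false,\"options\":{\"num_ctx\":$1}}")
  [ -n "$reply" ] || return 1   # no/empty reply counts as a failure
  # crude heuristic: a broken reply repeats its first token back-to-back
  case "$reply" in
    *TheTheTheThe*|*HelloHelloHelloHello*) return 1 ;;
    *) return 0 ;;
  esac
}

bisect() {  # $1 = known-good lower bound, $2 = upper bound
  lo=$1 hi=$2
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))          # round up so lo always advances
    if probe "$mid"; then lo=$mid; else hi=$((mid - 1)); fi
  done
  echo "$lo"
}
```

Usage, starting from a known-good lower bound: `bisect 4096 262144` prints the largest num_ctx that still passes the probe (128512 for the q4_K_M case reported here), in about 18 probes instead of hundreds.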

System:
I am running Ubuntu 24.04, with 3x RTX 3080 - 20GB and 2x RTX 3060 - 12GB, so a total of 84 GB raw (76 GB usable) VRAM.

[ollama_0.13.1_ministral.txt](https://github.com/user-attachments/files/23904723/ollama_0.13.1_ministral.txt)

Relevant log output

# ollama show ministral-3:14b
  Model
    architecture        mistral3
    parameters          13.9B
    context length      262144
    embedding length    5120
    quantization        Q4_K_M

  Capabilities
    completion
    vision
    tools

  Parameters
    temperature    0.15

  System
    You are Ministral-3-14B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French
      startup headquartered in Paris.
    You power an AI assistant called Le Chat.
    ...


# ollama run ministral-3:14b
>>> /set parameter num_ctx 128512
Set parameter 'num_ctx' to '128512'
>>> why is the sky blue?
The sky appears blue due to a phenomenon called **Rayleigh scattering**, which is the scattering of sunlight by the molecules and tiny particles in Earth's atmosphere. Here’s a step-by-step explanation:

1. **Sunlight Composition**: Sunlight (white light) is made up of all colors of the visible spectrum—red, orange, yellow, green, blue, indigo, and violet—each with different wavelengths. Blue and violet light have the shortest
wavelengths, while red light has the longest.

2. **Scattering by the Atmosphere**: When sunlight reaches Earth's atmosphere, it interacts with nitrogen and oxygen molecules. Shorter wavelengths (blue and violet) are scattered more efficiently than longer wavelengths (red, orange,
yellow) because they interact more strongly with the molecules.

3. **Why Not Violet?**: Although violet light is scattered even more than blue, the human eye is less sensitive to violet light, and some of it is also absorbed by the upper atmosphere. As a result, blue light dominates what we
perceive.

4. **Perception**: Our eyes detect the scattered blue light from all directions, making the sky appear blue during the day.

At sunrise or sunset, the sky often appears red or orange because sunlight passes through more of the atmosphere, scattering the shorter blue wavelengths and leaving the longer red and orange wavelengths to reach our eyes.

>>> /set parameter num_ctx 128513
Set parameter 'num_ctx' to '128513'
>>> why is the sky blue?
TheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheTheThe

>>> Send a message (/? for help)


# ollama ps
NAME               ID              SIZE     PROCESSOR    CONTEXT    UNTIL
ministral-3:14b    8a5cdca192c0    40 GB    100% GPU     128513     59 minutes from now


# ollama -v
ollama version is 0.13.1

# nvidia-smi
Wed Dec  3 11:16:24 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        On  |   00000000:23:00.0 Off |                  N/A |
|  0%   32C    P8             21W /  105W |       4MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        On  |   00000000:25:00.0 Off |                  N/A |
|  0%   29C    P8              4W /  105W |   14319MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3060        On  |   00000000:2F:00.0 Off |                  N/A |
|  0%   36C    P8              9W /  105W |   10791MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3080        On  |   00000000:30:00.0 Off |                  N/A |
|  0%   32C    P8              8W /  105W |   14137MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA GeForce RTX 3060        On  |   00000000:31:00.0 Off |                  N/A |
|  0%   26C    P8              5W /  105W |       4MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    1   N/A  N/A           20744      C   /usr/bin/ollama                       14310MiB |
|    2   N/A  N/A           20744      C   /usr/bin/ollama                       10782MiB |
|    3   N/A  N/A           20744      C   /usr/bin/ollama                       14128MiB |
+-----------------------------------------------------------------------------------------+

OS

Linux Ubuntu 24.04

GPU

3x RTX 3080 20GB
2x RTX 3060 12GB

CPU

AMD Ryzen 5 5500GT

Ollama version

0.13.1

GiteaMirror added the bug label 2026-04-29 08:49:24 -05:00
Author
Owner

@rick-github commented on GitHub (Dec 3, 2025):

This may be caused by the model being split over multiple GPUs. I ran the same commands as your example and the second `why is the sky blue?` ran without a problem; the difference is that in my test the model fit on one GPU. What is the output of `nvidia-smi` for the two different `num_ctx` values?

It's also interesting that the size of the model in your case (40GB) is less than in my single GPU case (44GB). Typically the memory footprint goes up when a model is split across multiple devices.
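One way to test the multi-GPU hypothesis is to pin the server to a single device. Ollama honors `CUDA_VISIBLE_DEVICES` set on the server process; the sketch below assumes a systemd-managed Linux install and that the q4_K_M weights fit on one 20GB card at a reduced num_ctx — adapt the unit name and GPU index to your setup.

```shell
# Restrict the Ollama server to GPU 0 so the model cannot be split.
# (Assumption: systemd install; run "ollama serve" manually instead if not.)
sudo systemctl stop ollama
CUDA_VISIBLE_DEVICES=0 ollama serve &

# In another terminal: retry the failing prompt with a context that
# fits on a single card, and compare against the multi-GPU behavior.
ollama run ministral-3:14b
```

If the repetition disappears when the model sits on one GPU, that would point at the split/tensor-placement path rather than the model itself.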


@dan-and commented on GitHub (Dec 3, 2025):

That may explain why CPU offloading causes issues / is not currently supported (see https://github.com/ollama/ollama/issues/13312).

However, I am able to run ministral-3:14b (q8_0) with a num_ctx of 206848 across several GPUs, as it requires 60GB of VRAM:

$ ollama ps
NAME                       ID              SIZE     PROCESSOR    CONTEXT    UNTIL
ministral-3_14b_q8:206k    932e1d757f66    60 GB    100% GPU     206848     59 minutes from now

$ nvidia-smi
Wed Dec  3 17:29:28 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        On  |   00000000:23:00.0 Off |                  N/A |
| 50%   38C    P8             23W /  105W |   18839MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        On  |   00000000:25:00.0 Off |                  N/A |
| 50%   38C    P8             16W /  105W |   19629MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3060        On  |   00000000:2F:00.0 Off |                  N/A |
|  0%   36C    P8              9W /  105W |       4MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3080        On  |   00000000:30:00.0 Off |                  N/A |
| 50%   35C    P8              9W /  105W |   18895MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA GeForce RTX 3060        On  |   00000000:31:00.0 Off |                  N/A |
|  0%   33C    P8             10W /  105W |       4MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           79030      C   /usr/bin/ollama                       18830MiB |
|    1   N/A  N/A           79030      C   /usr/bin/ollama                       19620MiB |
|    3   N/A  N/A           79030      C   /usr/bin/ollama                       18886MiB |
+-----------------------------------------------------------------------------------------+ 

$ ollama ps
NAME                       ID              SIZE     PROCESSOR    CONTEXT    UNTIL
ministral-3_14b_q8:206k    932e1d757f66    60 GB    100% GPU     206849     59 minutes from now

$ nvidia-smi
Wed Dec  3 17:44:31 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        On  |   00000000:23:00.0 Off |                  N/A |
| 44%   37C    P2             95W /  105W |   14423MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        On  |   00000000:25:00.0 Off |                  N/A |
| 50%   39C    P2             90W /  105W |   16337MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3060        On  |   00000000:2F:00.0 Off |                  N/A |
|  0%   36C    P2             27W /  105W |   10955MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3080        On  |   00000000:30:00.0 Off |                  N/A |
| 50%   39C    P2             81W /  105W |   16339MiB /  20480MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA GeForce RTX 3060        On  |   00000000:31:00.0 Off |                  N/A |
|  0%   34C    P2             28W /  105W |     113MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           81364      C   /usr/bin/ollama                       14414MiB |
|    1   N/A  N/A           81364      C   /usr/bin/ollama                       16328MiB |
|    2   N/A  N/A           81364      C   /usr/bin/ollama                       10946MiB |
|    3   N/A  N/A           81364      C   /usr/bin/ollama                       16330MiB |
|    4   N/A  N/A           81364      C   /usr/bin/ollama                         104MiB |
+-----------------------------------------------------------------------------------------+

The allocation went from 3 GPUs to 5, and the last one holding just 104MiB is especially concerning.


@dabe-19 commented on GitHub (Dec 4, 2025):

I made this comment on a similar thread. I don't have quite the same capacity as you, but if you're expecting total VRAM usage to be close to your full GPU capacity, this might be related. It was certainly preventing me from running any of the new, smaller models at a ctx of 4096, and lowering it to 2048 didn't even help.

https://github.com/ollama/ollama/issues/13315#issuecomment-3609968353


@crwsolutions commented on GitHub (Dec 4, 2025):

I have the same problem, I think. This is the behavior out of the box:

C:\Users\yes>ollama run ministral-3:14b
>>> Hi
HelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHello

>>> /bye

C:\Users\yes>ollama --version
ollama version is 0.13.1

My PC: Windows 11, nvidia 5060 TI + nvidia 3060


@dan-and commented on GitHub (Dec 8, 2025):

Fixed with ollama 0.13.2-rc2.


Reference: github-starred/ollama#55306