[GH-ISSUE #9988] Incorrect memory requirement calculation for small models (32B model showing 659.2 GiB requirement) #53056

Closed
opened 2026-04-29 01:46:35 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @jaybom on GitHub (Mar 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9988

What is the issue?

When trying to run a small 32B model (qwq) using ollama run qwq, the system incorrectly reports that it needs 659.2 GiB of system memory, which is clearly incorrect for a 32B model. The system has 173.9 GiB available memory.

Relevant log output

"{"code": "completion_request_error", "message": "[ollama] Error: PluginInvokeError: {\"args\":{\"description\":\"[models] Error: API request failed with status code 500: {\\\"error\\\":\\\"model requires more system memory (659.2 GiB) than is available (173.9 GiB)\\\"}\"},\"error_type\":\"InvokeError\",\"message\":\"[models] Error: API request failed with status code 500: {\\\"error\\\":\\\"model requires more system memory (659.2 GiB) than is available (173.9 GiB)\\\"}\"}", "status": 400}<EOL>"

OS

Windows

GPU

Nvidia

CPU

No response

Ollama version

0.6.2

GiteaMirror added the bug label 2026-04-29 01:46:35 -05:00
Author
Owner

@rick-github commented on GitHub (Mar 26, 2025):

What context size are you setting? Server logs may aid in debugging.

Author
Owner

@jaybom commented on GitHub (Mar 26, 2025):

C:\Users\Administrator>ollama serve
2025/03/26 10:00:34 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES:0,1 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:131072 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:4096 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:24h0m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:8m0s OLLAMA_MAX_LOADED_MODELS:2 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:F:\ollama\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:20 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES:]"
time=2025-03-26T10:00:34.912+08:00 level=ERROR source=images.go:422 msg="couldn't remove blob" blob=blobs error="remove F:\ollama\.ollama\models\blobs\blobs: The directory is not empty."
time=2025-03-26T10:00:34.914+08:00 level=INFO source=images.go:432 msg="total blobs: 49"
time=2025-03-26T10:00:34.918+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-26T10:00:34.921+08:00 level=INFO source=routes.go:1297 msg="Listening on 127.0.0.1:11434 (version 0.6.2)"
time=2025-03-26T10:00:34.921+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-26T10:00:34.922+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=2
time=2025-03-26T10:00:34.923+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=52 efficiency=0 threads=104
time=2025-03-26T10:00:34.923+08:00 level=INFO source=gpu_windows.go:214 msg="" package=1 cores=52 efficiency=0 threads=104
time=2025-03-26T10:00:35.392+08:00 level=INFO source=gpu.go:319 msg="detected OS VRAM overhead" id=GPU-881b6982-5eba-2cbe-6d7b-9ac090c9a7ee library=cuda compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3090" overhead="1018.7 MiB"
time=2025-03-26T10:00:35.768+08:00 level=INFO source=gpu.go:319 msg="detected OS VRAM overhead" id=GPU-1e34a307-8fae-e07b-75bc-a69cc18fff6f library=cuda compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3090" overhead="187.9 MiB"
time=2025-03-26T10:00:35.776+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-881b6982-5eba-2cbe-6d7b-9ac090c9a7ee library=cuda variant=v12 compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3090" total="24.0 GiB" available="22.8 GiB"
time=2025-03-26T10:00:35.777+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-1e34a307-8fae-e07b-75bc-a69cc18fff6f library=cuda variant=v12 compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3090" total="24.0 GiB" available="22.8 GiB"
[GIN] 2025/03/26 - 10:00:51 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/26 - 10:00:52 | 200 | 58.9146ms | 127.0.0.1 | POST "/api/show"
time=2025-03-26T10:00:52.229+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.vision.block_count default=0
time=2025-03-26T10:00:52.252+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-26T10:00:52.252+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-26T10:00:52.257+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-26T10:00:52.259+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-26T10:00:52.284+08:00 level=INFO source=server.go:105 msg="system memory" total="127.7 GiB" free="85.7 GiB" free_swap="87.8 GiB"
time=2025-03-26T10:00:52.285+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.vision.block_count default=0
time=2025-03-26T10:00:52.317+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-26T10:00:52.318+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-26T10:00:52.320+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-26T10:00:52.321+08:00 level=WARN source=ggml.go:149 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-26T10:00:52.324+08:00 level=WARN source=server.go:133 msg="model request too large for system" requested="659.2 GiB" available=186216058880 total="127.7 GiB" free="85.7 GiB" swap="87.8 GiB"
time=2025-03-26T10:00:52.324+08:00 level=INFO source=sched.go:429 msg="NewLlamaServer failed" model=F:\ollama\.ollama\models\blobs\sha256-c62ccde5630c20c8a9cf601861d31977d07450cad6dfdf1c661aab307107bddb error="model requires more system memory (659.2 GiB) than is available (173.4 GiB)"
[GIN] 2025/03/26 - 10:00:52 | 500 | 251.9982ms | 127.0.0.1 | POST "/api/generate"

Image

Author
Owner

@jaybom commented on GitHub (Mar 26, 2025):

What context size are you setting? Server logs may aid in debugging.


Image

Author
Owner

@rick-github commented on GitHub (Mar 26, 2025):

OLLAMA_CONTEXT_LENGTH:131072

Context is too big.
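A rough back-of-envelope sketch of where the 659.2 GiB figure plausibly comes from, assuming QwQ shares the published Qwen2.5-32B dimensions (64 layers, 8 GQA key/value heads, head dim 128) and an f16 KV cache, and using OLLAMA_CONTEXT_LENGTH:131072 and OLLAMA_NUM_PARALLEL:20 from the server log above:

```python
# KV-cache size estimate (assumption: QwQ-32B uses the Qwen2.5-32B
# dimensions below; f16 KV cache, 2 bytes per value).
layers = 64        # transformer blocks
kv_heads = 8       # GQA key/value heads
head_dim = 128
bytes_per_val = 2  # f16

num_ctx = 131072   # OLLAMA_CONTEXT_LENGTH from the log
parallel = 20      # OLLAMA_NUM_PARALLEL from the log

# K and V for one token, summed over all layers
kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # 262144 B

total_tokens = num_ctx * parallel  # each parallel slot gets a full context
kv_gib = kv_per_token * total_tokens / 2**30
print(f"KV cache alone: {kv_gib:.1f} GiB")  # -> 640.0 GiB
```

Adding roughly 19 GiB for the quantized weights lands near the reported 659.2 GiB, so under these assumptions the estimate is dominated by the KV cache, not the model size.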

Author
Owner

@konian71 commented on GitHub (Mar 27, 2025):

Today I noticed that the VRAM requirements for qwen2.5-coder:32b and QwQ:32b are higher than before. I haven't changed anything, yet with a context window of 25,000 the 3 x 24 GB of VRAM on the Nvidia cards is suddenly insufficient. Previously the 72 GB of VRAM only started to overflow at a context window of around 67,000; now the limit is ~20,000!

Author
Owner

@konian71 commented on GitHub (Mar 27, 2025):

OLLAMA_CONTEXT_LENGTH:131072

Context is too big.

It never required more than 170 GB of RAM. It was around 120 GB or even less? I've tested it in the past...

Author
Owner

@jessegross commented on GitHub (Mar 27, 2025):

@konian71 Can you give the specific versions of Ollama that have different behavior along with the logs and output of ollama ps? Also the model, context length, etc. that you used.

Author
Owner

@rick-github commented on GitHub (Mar 28, 2025):

2 x A100-40G

| parallel | model | version | num_ctx | nvidia-smi | /api/ps/size_vram | /api/ps/size |
| -- | -- | -- | -- | -- | -- | -- |
| 1 | qwen2.5-coder:32b | 0.5.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.5.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.5.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.5.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.5.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.5.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.5.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.5.7 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.5.7 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.5.7 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.5.7 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.5.7 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.5.7 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.5.12 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.5.12 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.5.12 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.5.12 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.5.12 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.5.12 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.6.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.6.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.6.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.6.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.6.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.6.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.6.3 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.6.3 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.6.3 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.6.3 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.6.3 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.6.3 | 131072 | 72590 | 79352 | 79961 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 65536 | 67688 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.5.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.5.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.5.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.5.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.5.0 | 65536 | 67688 | 79359 | 140768 |
| 4 | qwq:32b | 0.5.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.5.7 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.5.7 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.5.7 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.5.7 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.5.7 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.5.7 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.5.12 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.5.12 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.5.12 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.5.12 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.5.12 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.5.12 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.6.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.6.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.6.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.6.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.6.0 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.6.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.6.3 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.6.3 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.6.3 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.6.3 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.6.3 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.6.3 | 131072 | 8 | 0 | 150735 |
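One way to read these measurements (a hedged observation, not a claim from the maintainers): memory appears to scale with the effective context, parallel * num_ctx, which is why the parallel=4 rows line up with parallel=1 rows at four times the context. A minimal sketch over a few readings transcribed from the table above:

```python
# Readings (nvidia-smi column) transcribed from the table above,
# keyed by (parallel, num_ctx). Pairs with equal parallel * num_ctx
# report the same usage.
readings = {
    (1, 65536): 47168,
    (4, 16384): 47168,
    (1, 8192): 21683,
    (4, 2048): 21683,
}

def effective_ctx(parallel, num_ctx):
    # Each parallel slot is allocated a full context window.
    return parallel * num_ctx

for key in readings:
    print(key, "-> effective ctx:", effective_ctx(*key))

# Same effective context, same measured usage.
assert effective_ctx(1, 65536) == effective_ctx(4, 16384)
assert readings[(1, 65536)] == readings[(4, 16384)]
```

Under that reading, the original report's OLLAMA_NUM_PARALLEL:20 with a 131072-token context behaves like a single 2.6M-token context, which is consistent with the oversized requirement.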
<!-- gh-comment-id:2759871949 --> @rick-github commented on GitHub (Mar 28, 2025):

2 x A100-40G

| parallel | model | version | num_ctx | nvidia-smi | /api/ps/size_vram | /api/ps/size |
| -- | -- | -- | -- | -- | -- | -- |
| 1 | qwen2.5-coder:32b | 0.5.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.5.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.5.7 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.5.12 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.6.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwen2.5-coder:32b | 0.6.3 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.5.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.5.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.5.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.5.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.5.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.5.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.5.7 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.5.7 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.5.7 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.5.7 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.5.7 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.5.7 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.5.12 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.5.12 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.5.12 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.5.12 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.5.12 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.5.12 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.6.0 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.6.0 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.6.0 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.6.0 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.6.0 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.6.0 | 131072 | 72590 | 79352 | 79961 |
| 1 | qwq:32b | 0.6.3 | 2048 | 19757 | 20090 | 20090 |
| 1 | qwq:32b | 0.6.3 | 4096 | 20331 | 20651 | 20651 |
| 1 | qwq:32b | 0.6.3 | 8192 | 21683 | 22019 | 22019 |
| 1 | qwq:32b | 0.6.3 | 16384 | 24387 | 24755 | 24755 |
| 1 | qwq:32b | 0.6.3 | 65536 | 47168 | 50009 | 50009 |
| 1 | qwq:32b | 0.6.3 | 131072 | 72590 | 79352 | 79961 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 65536 | 67688 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.5.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.5.7 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.5.12 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.6.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwen2.5-coder:32b | 0.6.3 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.5.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.5.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.5.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.5.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.5.0 | 65536 | 67688 | 79359 | 140768 |
| 4 | qwq:32b | 0.5.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.5.7 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.5.7 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.5.7 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.5.7 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.5.7 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.5.7 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.5.12 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.5.12 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.5.12 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.5.12 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.5.12 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.5.12 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.6.0 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.6.0 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.6.0 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.6.0 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.6.0 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.6.0 | 131072 | 8 | 0 | 150735 |
| 4 | qwq:32b | 0.6.3 | 2048 | 21683 | 22019 | 22019 |
| 4 | qwq:32b | 0.6.3 | 4096 | 24387 | 24755 | 24755 |
| 4 | qwq:32b | 0.6.3 | 8192 | 29795 | 30227 | 30227 |
| 4 | qwq:32b | 0.6.3 | 16384 | 47168 | 50009 | 50009 |
| 4 | qwq:32b | 0.6.3 | 65536 | 67504 | 79359 | 140768 |
| 4 | qwq:32b | 0.6.3 | 131072 | 8 | 0 | 150735 |
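The growth with `num_ctx` and `parallel` in the measurements above can be sanity-checked with a rough KV-cache estimate. This is a sketch, not Ollama's actual accounting: the function names (`kv_cache_bytes`, `total_estimate_mib`) are made up here, the architecture numbers (64 layers, 8 KV heads, head dim 128, f16 cache) are assumptions for Qwen2.5-32B-class models, and compute-graph and activation buffers are ignored, so the real figures run higher (e.g. ~79961 MiB measured vs. ~52525 MiB estimated at parallel=1, num_ctx=131072).

```python
# Rough lower-bound memory estimate for a GQA transformer served by Ollama.
# Architecture values are ASSUMED for a Qwen2.5-32B-class model, not read
# from the GGUF; weights_mib is taken from the small-context nvidia-smi rows.

def kv_cache_bytes(n_tokens, n_layers=64, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """K and V caches (factor of 2) for n_tokens cached positions, f16."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

def total_estimate_mib(num_ctx, parallel, weights_mib=19757):
    """Weights plus KV cache; the server allocates num_ctx * parallel slots."""
    return weights_mib + kv_cache_bytes(num_ctx * parallel) / 2**20

# 0.25 MiB of cache per token; at num_ctx=131072 with parallel=4 the cache
# alone is 128 GiB, which is why a "32B model" can demand huge memory.
print(total_estimate_mib(131072, 1))  # 52525.0
print(total_estimate_mib(131072, 4))  # 150829.0
```

The parallel=4 estimate (~147 GiB) lands near the ~150735 MiB that `/api/ps` reports in the table, illustrating that the large requirement is dominated by cache slots, not weights.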
<!-- gh-comment-id:2761305004 --> @konian71 commented on GitHub (Mar 28, 2025):

> [@konian71](https://github.com/konian71) Can you give the specific versions of Ollama that have different behavior along with the logs and output of `ollama ps`? Also the model, context length, etc. that you used.

I made a mistake in my configuration: I forgot to disable parallel processing after a test, so my statement is definitely wrong. I only realized it when I saw @rick-github's table. Please excuse any incorrect statements.
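For reference, the parallelism setting mentioned above is controlled by the `OLLAMA_NUM_PARALLEL` environment variable on the server. A minimal sketch of pinning it back to a single slot (the exact service-restart step depends on the platform):

```shell
# KV-cache memory scales with num_ctx * parallel slots, so serving with a
# single slot keeps requirements close to the parallel=1 rows in the table.
export OLLAMA_NUM_PARALLEL=1
ollama serve
```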
<!-- gh-comment-id:2762141151 --> @jessegross commented on GitHub (Mar 28, 2025):

Thanks a lot for the table @rick-github!
Reference: github-starred/ollama#53056