[GH-ISSUE #6556] cuda_v12 returns poor results or crashes for Driver Version: 525.147.05 #29885

Closed
opened 2026-04-22 09:10:42 -05:00 by GiteaMirror · 0 comments

Originally created by @rick-github on GitHub (Aug 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6556

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Between 0.3.7-rc5 and 0.3.7-rc6 the default CUDA runtime library was switched from v11 to v12, and results from a variety of models degraded. I first noticed this with 0.3.7-rc6, but the problem also exists in -rc4 if OLLAMA_LLM_LIBRARY is set to cuda_v12. The problem persists into 0.3.8.

$ for l in cuda_v11 cuda_v12 ; do
    for m in hermes3:8b-llama3.1-q4_0 llama3.1 qwen2:1.5b ; do
      echo $l $m
      OLLAMA_LLM_LIBRARY=$l OLLAMA_DOCKER_TAG=0.3.8 docker compose up -d ollama 2>/dev/null &&
        sleep 2 &&
        curl -s localhost:11434/api/chat \
          -d '{"model":"'$m'","messages":[{"role":"user","content":"say 'hello'"}],"stream":false}' |
        jq '{"response":"\(.message.content)","error":"\(.error)"}'
    done
  done
cuda_v11 hermes3:8b-llama3.1-q4_0
{
  "response": "\nHello! How can I assist you today?",
  "error": "null"
}
cuda_v11 llama3.1
{
  "response": "Hello! How can I assist you today?",
  "error": "null"
}
cuda_v11 qwen2:1.5b
{
  "response": "Hello! How can I help you today? Is there anything specific you'd like to talk about or learn more about? Please feel free to ask me any questions or provide more information.",
  "error": "null"
}
cuda_v12 hermes3:8b-llama3.1-q4_0
{
  "response": "null",
  "error": "an unknown error was encountered while running the model CUDA error: an illegal memory access was encountered\n  current device: 0, in function ggml_backend_cuda_synchronize at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2416\n  cudaStreamSynchronize(cuda_ctx->stream())\n/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:101: CUDA error"
}
cuda_v12 llama3.1
{
  "response": "Hello! How can I'm happy to help you with something?",
  "error": "null"
}
cuda_v12 qwen2:1.5b
{
  "response": "null",
  "error": "an unknown error was encountered while running the model CUDA error: an illegal memory access was encountered\n  current device: 0, in function ggml_backend_cuda_synchronize at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2416\n  cudaStreamSynchronize(cuda_ctx->stream())\n/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:101: CUDA error"
}
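
If it helps triage, the full backtrace can be pulled from the container log. A minimal sketch, assuming the same compose service name ("ollama") used in the loop above:

# Show a few lines of context around the CUDA error in the runner log.
docker compose logs ollama 2>&1 | grep -B2 -A4 "CUDA error"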
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   44C    P8     8W / 200W |   6251MiB / 12282MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     33633      C   /app/.venv/bin/python            1070MiB |
|    0   N/A  N/A   1902358      C   ...a_v12/ollama_llama_server     5178MiB |
+-----------------------------------------------------------------------------+
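
For anyone checking whether a host falls in the affected driver range, nvidia-smi can report just the driver version (a standard nvidia-smi query flag, nothing ollama-specific):

# Prints e.g. "525.147.05" on the affected host.
nvidia-smi --query-gpu=driver_version --format=csv,noheader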

The problem does not occur on systems with more recent Nvidia drivers (NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2, NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4):

cuda_v11 hermes3:8b-llama3.1-q4_0
{
  "response": "\nHello! How can I assist you today?",
  "error": "null"
}
cuda_v11 llama3.1
{
  "response": "Hello! How can I assist you today?",
  "error": "null"
}
cuda_v11 qwen2:1.5b
{
  "response": "Hello! How can I assist you today?",
  "error": "null"
}
cuda_v12 hermes3:8b-llama3.1-q4_0
{
  "response": "\nHello! How can I assist you today?",
  "error": "null"
}
cuda_v12 llama3.1
{
  "response": "Hello! How can I assist you today?",
  "error": "null"
}
cuda_v12 qwen2:1.5b
{
  "response": "Hello! How can I assist you today?",
  "error": "null"
}

This is more of an FYI, since it can be worked around by setting OLLAMA_LLM_LIBRARY or (hopefully; I have yet to try it) upgrading the Nvidia driver.
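
A minimal sketch of the workaround, assuming the compose file forwards OLLAMA_LLM_LIBRARY into the container (as the loop above implies); the version cutoff is an assumption from the two driver series tested, not a confirmed boundary:

# Pin the v11 runner for driver series 529 and below; newer drivers keep the v12 default.
drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -1)
case "$drv" in
  5[0-2][0-9].*) export OLLAMA_LLM_LIBRARY=cuda_v11 ;;  # e.g. 525.147.05
esac
OLLAMA_DOCKER_TAG=0.3.8 docker compose up -d ollama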

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.3.8

GiteaMirror added the nvidia and bug labels 2026-04-22 09:10:42 -05:00