ollama run gpt-oss:20b triggers Error: llama runner process has terminated: error:fault #7836

Closed
opened 2025-11-12 14:20:50 -06:00 by GiteaMirror · 2 comments

Originally created by @priece on GitHub (Aug 8, 2025).

What is the issue?

$ ollama run gpt-oss:20b
Error: llama runner process has terminated: error:fault

error log:

[ollama-error.log](https://github.com/user-attachments/files/21675379/ollama-error.log)

OS:
Linux localhost.localdomain 5.10.0-216.0.0.115.oe2203sp4.x86_64 #1 SMP Thu Jun 27 15:13:44 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi
Fri Aug 8 09:27:46 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla M60 On | 00000000:84:00.0 Off | Off |
| N/A 35C P8 16W / 150W | 3MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla M60 On | 00000000:85:00.0 Off | Off |
| N/A 33C P8 14W / 150W | 3MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla M60 On | 00000000:88:00.0 Off | 0 |
| N/A 33C P8 15W / 150W | 3MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla M60 On | 00000000:89:00.0 Off | 0 |
| N/A 38C P8 15W / 150W | 3MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Relevant log output

Aug 08 08:52:32 localhost.localdomain ollama[3690012]: [GIN] 2025/08/08 - 08:52:32 | 200 |       45.69µs |       127.0.0.1 | HEAD     "/"
Aug 08 08:52:32 localhost.localdomain ollama[3690012]: [GIN] 2025/08/08 - 08:52:32 | 200 |  183.855132ms |       127.0.0.1 | POST     "/api/show"
Aug 08 08:52:33 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:33.618+08:00 level=INFO source=sched.go:802 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 library=cuda parallel=1 required="23.6 GiB"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.103+08:00 level=INFO source=server.go:135 msg="system memory" total="251.3 GiB" free="227.6 GiB" free_swap="16.0 GiB"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.103+08:00 level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=25 layers.offload=25 layers.split=7,6,6,6 memory.available="[7.9 GiB 7.9 GiB 7.4 GiB 7.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="23.6 GiB" memory.required.partial="23.6 GiB" memory.required.kv="300.0 MiB" memory.required.allocations="[6.7 GiB 5.7 GiB 5.6 GiB 5.7 GiB]" memory.weights.total="11.7 GiB" memory.weights.repeating="10.7 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="2.0 GiB" memory.graph.partial="2.0 GiB"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.194+08:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --ctx-size 8192 --batch-size 512 --n-gpu-layers 25 --threads 20 --parallel 1 --tensor-split 7,6,6,6 --port 38131"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.195+08:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.195+08:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.195+08:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.218+08:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.228+08:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38131"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.315+08:00 level=INFO source=ggml.go:92 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.446+08:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: ggml_cuda_init: found 4 CUDA devices:
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 0: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 1: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 2: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 3: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: ggml_cuda_init: found 4 CUDA devices:
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 0: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 1: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 2: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 3: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:35.159+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 CUDA.2.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.2.USE_GRAPHS=1 CUDA.2.PEER_MAX_BATCH_SIZE=128 CUDA.3.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.3.USE_GRAPHS=1 CUDA.3.PEER_MAX_BATCH_SIZE=128 CUDA.4.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.4.USE_GRAPHS=1 CUDA.4.PEER_MAX_BATCH_SIZE=128 CUDA.5.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.5.USE_GRAPHS=1 CUDA.5.PEER_MAX_BATCH_SIZE=128 CUDA.6.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.6.USE_GRAPHS=1 CUDA.6.PEER_MAX_BATCH_SIZE=128 CUDA.7.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.7.USE_GRAPHS=1 CUDA.7.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: unexpected fault address 0x1f6ce0000
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: fatal error: fault
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x1f6ce0000 pc=0x55e648b26780]
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: goroutine 72 gp=0xc00011f180 m=13 mp=0xc000600808 [running]:
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.throw({0x55e649a4e48b?, 0xc00011f180?})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/panic.go:1096 +0x4a fp=0xc0000490b0 sp=0xc000049080 pc=0x55e648b96c2a
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.sigpanic()
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/signal_unix.go:939 +0x26c fp=0xc000049110 sp=0xc0000490b0 pc=0x55e648b990ac
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: indexbytebody()
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/bytealg/indexbyte_amd64.s:131 +0xe0 fp=0xc000049118 sp=0xc000049110 pc=0x55e648b26780
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.findnull(0xc000049198?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/string.go:577 +0x79 fp=0xc000049170 sp=0xc000049118 pc=0x55e648b7e8b9
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.gostring(0x1f6ce0000)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/string.go:363 +0x1c fp=0xc0000491a8 sp=0xc000049170 pc=0x55e648b99f1c
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/ml/backend/ggml._Cfunc_GoString(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         _cgo_gotypes.go:319
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/ml/backend/ggml.New({0x7fff48e5aca3, 0x6e}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/ml/backend/ggml/ggml.go:158 +0x1336 fp=0xc000049c18 sp=0xc0000491a8 pc=0x55e648fd4af6
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/ml.NewBackend({0x7fff48e5aca3, 0x6e}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/ml/backend.go:209 +0xb1 fp=0xc000049c70 sp=0xc000049c18 pc=0x55e648fc5bb1
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/model.New({0x7fff48e5aca3?, 0x0?}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/model/model.go:102 +0x8f fp=0xc000049d68 sp=0xc000049c70 pc=0x55e648fe500f
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc000286b40, {0x7fff48e5aca3?, 0x0?}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0}, ...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:841 +0x8d fp=0xc000049dc8 sp=0xc000049d68 pc=0x55e64908bc2d
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc000286b40, {0x55e649f0c790, 0xc000153540}, {0x7fff48e5aca3?, 0x0?}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, ...}, ...}, ...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:878 +0xb8 fp=0xc000049f20 sp=0xc000049dc8 pc=0x55e64908bf98
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xc7 fp=0xc000049fe0 sp=0xc000049f20 pc=0x55e64908d3c7
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.goexit({})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x55e648b9e481
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xa11
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: goroutine 1 gp=0xc000002380 m=nil [IO wait]:
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/proc.go:435 +0xce fp=0xc0003c9650 sp=0xc0003c9630 pc=0x55e648b96d4e
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.netpollblock(0xc0003c96a0?, 0x48b2fb46?, 0xe6?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/netpoll.go:575 +0xf7 fp=0xc0003c9688 sp=0xc0003c9650 pc=0x55e648b5b837
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.runtime_pollWait(0x7fd7be2e8eb0, 0x72)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/netpoll.go:351 +0x85 fp=0xc0003c96a8 sp=0xc0003c9688 pc=0x55e648b95f65
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.(*pollDesc).wait(0xc0001b8600?, 0x900000036?, 0x0)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003c96d0 sp=0xc0003c96a8 pc=0x55e648c1d3a7
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.(*pollDesc).waitRead(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/poll/fd_poll_runtime.go:89
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.(*FD).Accept(0xc0001b8600)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/poll/fd_unix.go:620 +0x295 fp=0xc0003c9778 sp=0xc0003c96d0 pc=0x55e648c22775
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net.(*netFD).accept(0xc0001b8600)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/fd_unix.go:172 +0x29 fp=0xc0003c9830 sp=0xc0003c9778 pc=0x55e648c94d89
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net.(*TCPListener).accept(0xc00015c000)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/tcpsock_posix.go:159 +0x1b fp=0xc0003c9880 sp=0xc0003c9830 pc=0x55e648caa73b
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net.(*TCPListener).Accept(0xc00015c000)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/tcpsock.go:380 +0x30 fp=0xc0003c98b0 sp=0xc0003c9880 pc=0x55e648ca95f0
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net/http.(*onceCloseListener).Accept(0xc0003103f0?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         <autogenerated>:1 +0x24 fp=0xc0003c98c8 sp=0xc0003c98b0 pc=0x55e648ec0d44
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net/http.(*Server).Serve(0xc00023c600, {0x55e649f0a2e8, 0xc00015c000})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/http/server.go:3424 +0x30c fp=0xc0003c99f8 sp=0xc0003c98c8 pc=0x55e648e9860c
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0000342b0, 0x10, 0x11})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc0003c9d08 sp=0xc0003c99f8 pc=0x55e64908d029
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner.Execute({0xc000034290?, 0x0?, 0x0?})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc0003c9d30 sp=0xc0003c9d08 pc=0x55e64908d929
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/cmd.NewCLI.func2(0xc00023d400?, {0x55e649a4d07e?, 0x4?, 0x55e649a4d082?})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc0003c9d58 sp=0xc0003c9d30 pc=0x55e6497f2685
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).execute(0xc000312f08, {0xc000286900, 0x11, 0x12})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0003c9e78 sp=0xc0003c9d58 pc=0x55e648d0e3dc
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000572908)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0003c9f30 sp=0xc0003c9e78 pc=0x55e648d0ec25
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).Execute(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:992
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).ExecuteContext(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:985
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: main.main()

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.11.3

GiteaMirror added the bug label 2025-11-12 14:20:50 -06:00

@cp90-pixel commented on GitHub (Aug 8, 2025):

Short version: your Ollama runner is crashing during CUDA backend init because it's loading two different ggml CUDA libraries at once, which points to a leftover library from an earlier install. That ABI mismatch is what triggers the `fatal error: fault` right after `ggml.New(...)`. You can see it in your log: it loads `/usr/lib/ollama/libggml-cuda.so` and then loads `/usr/lib/ollama/cuda_v12/libggml-cuda.so` too. Others hit the same "llama runner process has terminated: error:fault" with `gpt-oss:20b` on 0.11.3 and fixed it by removing the stale lib folder and reinstalling.

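If you want to confirm the duplicate-backend theory before deleting anything, a quick check along these lines will show it (a sketch assuming the install paths from your log and a systemd journal; adjust to your setup):

```bash
# List every ggml CUDA backend library under the install prefix.
find /usr/lib/ollama -name 'libggml-cuda*' 2>/dev/null

# Show the backend-load lines from the most recent model load; more
# than one "loaded CUDA backend" path is the symptom described above.
journalctl -u ollama --since "10 minutes ago" | grep 'loaded CUDA backend'
```
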
Your GPUs are Tesla M60s. They’re old Maxwell cards with 8 GB each. Ollama is splitting the 20B model across four of them and thinks it fits, but these older cards are touchy with CUDA Graphs. If you still see faults after cleaning the libs, disable graphs for stability. That workaround is known to fix ggml CUDA crashes.

What to do

1. Stop Ollama and kill any runners:

```bash
sudo systemctl stop ollama || true
pkill -f "ollama runner" || true
```

2. Remove stale ggml libraries so only one CUDA backend exists:

```bash
sudo rm -rf /usr/lib/ollama
sudo rm -f /usr/local/lib/libggml* /usr/lib/libggml* 2>/dev/null || true
sudo ldconfig
```
3. Reinstall Ollama 0.11.3 or newer using the CUDA build for Linux.
   Reinstall the official CUDA-enabled package the same way you originally installed Ollama. After the reinstall, verify there is a single CUDA backend path under `/usr/lib/ollama/` and that your logs no longer show two "loaded CUDA backend" lines. This exact cleanup fixed the identical 0.11.3 "error:fault" for other users.

4. Start Ollama and disable CUDA Graphs for the M60s:

```bash
export GGML_CUDA_DISABLE_GRAPHS=1
sudo systemctl start ollama
```

Note that `export` (and the inline prefix below) only affects processes started from your shell, not a systemd-managed service; if Ollama runs under systemd, set the variable on the service itself, as in the sketch after this list. Then run again:

```bash
GGML_CUDA_DISABLE_GRAPHS=1 ollama run gpt-oss:20b
```

Disabling graphs is a known fix path in ggml for stability on older architectures.

5. If it still crashes, make the split explicit.
   Create a tiny Modelfile so Ollama does not get clever with splitting. Note that the documented Modelfile parameter for GPU layer count is `num_gpu`, not `gpu_layers`:

```
FROM gpt-oss:20b
# num_gpu is Ollama's documented parameter for the number of offloaded layers
PARAMETER num_gpu 25
# tensor_split may not be recognized by every Ollama version
PARAMETER tensor_split 7,6,6,6
```

Then:

```bash
ollama create gpt-oss-20b-m60 -f Modelfile
GGML_CUDA_DISABLE_GRAPHS=1 ollama run gpt-oss-20b-m60
```
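
About the environment variable in step 4: the standard systemd way to set it on the service is a drop-in unit override. A minimal sketch (the drop-in path below is the usual systemd convention; `systemctl edit ollama` creates the same file interactively):

```bash
# Create a drop-in override for the ollama service, set the variable
# there, then reload systemd and restart the service.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="GGML_CUDA_DISABLE_GRAPHS=1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```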

Why this is the issue

- The crash occurs inside `ggml.New` while converting a C string to Go. That usually means a mismatch between the Go code and the loaded shared library. Your log shows two CUDA backends being loaded from different paths before the fault, which is a classic leftover-libs situation (the loader-cache check after this list is a quick way to confirm nothing stale remains). The 0.11.3 report that deleting the previous `lib` folder resolves "error:fault" confirms it.
- Maxwell-era GPUs often behave badly with CUDA Graphs in ggml. The environment flag to disable graphs is the standard fix.
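
To double-check the loader's view after the cleanup, `ldconfig -p` lists what is in the dynamic-loader cache (a sketch; it covers the standard library paths, not backends Ollama loads from its own directory):

```bash
# Any stray libggml copies under /usr/lib or /usr/local/lib show up
# here; after step 2 above this should print nothing.
ldconfig -p | grep -i ggml || echo "no stray libggml entries in the ld cache"
```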

One more reality check

Even if it runs, four M60s will be slow and sometimes flaky with large models. If you want a smoother ride, consider running a smaller variant or a newer single GPU with 24 GB VRAM. But first do the cleanup and the graphs tweak so this install stops faceplanting.


@priece commented on GitHub (Aug 11, 2025):

Removing the old Ollama install and reinstalling it resolved the problem.
Thank you for the detailed answer.

Reference: github-starred/ollama-ollama#7836