ollama run gpt-oss:20b triggers Error: llama runner process has terminated: error:fault #7836

Closed
opened 2025-11-12 14:20:50 -06:00 by GiteaMirror · 2 comments

Originally created by @priece on GitHub (Aug 8, 2025).

What is the issue?

$ ollama run gpt-oss:20b
Error: llama runner process has terminated: error:fault

error log:

[ollama-error.log](https://github.com/user-attachments/files/21675379/ollama-error.log)

OS:
Linux localhost.localdomain 5.10.0-216.0.0.115.oe2203sp4.x86_64 #1 SMP Thu Jun 27 15:13:44 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi
Fri Aug 8 09:27:46 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla M60 On | 00000000:84:00.0 Off | Off |
| N/A 35C P8 16W / 150W | 3MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla M60 On | 00000000:85:00.0 Off | Off |
| N/A 33C P8 14W / 150W | 3MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla M60 On | 00000000:88:00.0 Off | 0 |
| N/A 33C P8 15W / 150W | 3MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla M60 On | 00000000:89:00.0 Off | 0 |
| N/A 38C P8 15W / 150W | 3MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Relevant log output

Aug 08 08:52:32 localhost.localdomain ollama[3690012]: [GIN] 2025/08/08 - 08:52:32 | 200 |       45.69µs |       127.0.0.1 | HEAD     "/"
Aug 08 08:52:32 localhost.localdomain ollama[3690012]: [GIN] 2025/08/08 - 08:52:32 | 200 |  183.855132ms |       127.0.0.1 | POST     "/api/show"
Aug 08 08:52:33 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:33.618+08:00 level=INFO source=sched.go:802 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 library=cuda parallel=1 required="23.6 GiB"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.103+08:00 level=INFO source=server.go:135 msg="system memory" total="251.3 GiB" free="227.6 GiB" free_swap="16.0 GiB"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.103+08:00 level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=25 layers.offload=25 layers.split=7,6,6,6 memory.available="[7.9 GiB 7.9 GiB 7.4 GiB 7.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="23.6 GiB" memory.required.partial="23.6 GiB" memory.required.kv="300.0 MiB" memory.required.allocations="[6.7 GiB 5.7 GiB 5.6 GiB 5.7 GiB]" memory.weights.total="11.7 GiB" memory.weights.repeating="10.7 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="2.0 GiB" memory.graph.partial="2.0 GiB"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.194+08:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --ctx-size 8192 --batch-size 512 --n-gpu-layers 25 --threads 20 --parallel 1 --tensor-split 7,6,6,6 --port 38131"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.195+08:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.195+08:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.195+08:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.218+08:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.228+08:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38131"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.315+08:00 level=INFO source=ggml.go:92 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:34.446+08:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 08 08:52:34 localhost.localdomain ollama[3690012]: ggml_cuda_init: found 4 CUDA devices:
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 0: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 1: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 2: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:34 localhost.localdomain ollama[3690012]:   Device 3: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: ggml_cuda_init: found 4 CUDA devices:
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 0: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 1: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 2: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:   Device 3: Tesla M60, compute capability 5.2, VMM: yes
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: time=2025-08-08T08:52:35.159+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 CUDA.2.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.2.USE_GRAPHS=1 CUDA.2.PEER_MAX_BATCH_SIZE=128 CUDA.3.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.3.USE_GRAPHS=1 CUDA.3.PEER_MAX_BATCH_SIZE=128 CUDA.4.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.4.USE_GRAPHS=1 CUDA.4.PEER_MAX_BATCH_SIZE=128 CUDA.5.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.5.USE_GRAPHS=1 CUDA.5.PEER_MAX_BATCH_SIZE=128 CUDA.6.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.6.USE_GRAPHS=1 CUDA.6.PEER_MAX_BATCH_SIZE=128 CUDA.7.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.7.USE_GRAPHS=1 CUDA.7.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: unexpected fault address 0x1f6ce0000
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: fatal error: fault
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x1f6ce0000 pc=0x55e648b26780]
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: goroutine 72 gp=0xc00011f180 m=13 mp=0xc000600808 [running]:
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.throw({0x55e649a4e48b?, 0xc00011f180?})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/panic.go:1096 +0x4a fp=0xc0000490b0 sp=0xc000049080 pc=0x55e648b96c2a
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.sigpanic()
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/signal_unix.go:939 +0x26c fp=0xc000049110 sp=0xc0000490b0 pc=0x55e648b990ac
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: indexbytebody()
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/bytealg/indexbyte_amd64.s:131 +0xe0 fp=0xc000049118 sp=0xc000049110 pc=0x55e648b26780
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.findnull(0xc000049198?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/string.go:577 +0x79 fp=0xc000049170 sp=0xc000049118 pc=0x55e648b7e8b9
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.gostring(0x1f6ce0000)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/string.go:363 +0x1c fp=0xc0000491a8 sp=0xc000049170 pc=0x55e648b99f1c
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/ml/backend/ggml._Cfunc_GoString(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         _cgo_gotypes.go:319
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/ml/backend/ggml.New({0x7fff48e5aca3, 0x6e}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/ml/backend/ggml/ggml.go:158 +0x1336 fp=0xc000049c18 sp=0xc0000491a8 pc=0x55e648fd4af6
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/ml.NewBackend({0x7fff48e5aca3, 0x6e}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/ml/backend.go:209 +0xb1 fp=0xc000049c70 sp=0xc000049c18 pc=0x55e648fc5bb1
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/model.New({0x7fff48e5aca3?, 0x0?}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/model/model.go:102 +0x8f fp=0xc000049d68 sp=0xc000049c70 pc=0x55e648fe500f
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc000286b40, {0x7fff48e5aca3?, 0x0?}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, 0x4}, 0x0}, ...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:841 +0x8d fp=0xc000049dc8 sp=0xc000049d68 pc=0x55e64908bc2d
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc000286b40, {0x55e649f0c790, 0xc000153540}, {0x7fff48e5aca3?, 0x0?}, {0x14, 0x0, 0x19, {0xc000457ba0, 0x4, ...}, ...}, ...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:878 +0xb8 fp=0xc000049f20 sp=0xc000049dc8 pc=0x55e64908bf98
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xc7 fp=0xc000049fe0 sp=0xc000049f20 pc=0x55e64908d3c7
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.goexit({})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x55e648b9e481
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xa11
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: goroutine 1 gp=0xc000002380 m=nil [IO wait]:
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/proc.go:435 +0xce fp=0xc0003c9650 sp=0xc0003c9630 pc=0x55e648b96d4e
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: runtime.netpollblock(0xc0003c96a0?, 0x48b2fb46?, 0xe6?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/netpoll.go:575 +0xf7 fp=0xc0003c9688 sp=0xc0003c9650 pc=0x55e648b5b837
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.runtime_pollWait(0x7fd7be2e8eb0, 0x72)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         runtime/netpoll.go:351 +0x85 fp=0xc0003c96a8 sp=0xc0003c9688 pc=0x55e648b95f65
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.(*pollDesc).wait(0xc0001b8600?, 0x900000036?, 0x0)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003c96d0 sp=0xc0003c96a8 pc=0x55e648c1d3a7
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.(*pollDesc).waitRead(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/poll/fd_poll_runtime.go:89
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: internal/poll.(*FD).Accept(0xc0001b8600)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         internal/poll/fd_unix.go:620 +0x295 fp=0xc0003c9778 sp=0xc0003c96d0 pc=0x55e648c22775
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net.(*netFD).accept(0xc0001b8600)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/fd_unix.go:172 +0x29 fp=0xc0003c9830 sp=0xc0003c9778 pc=0x55e648c94d89
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net.(*TCPListener).accept(0xc00015c000)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/tcpsock_posix.go:159 +0x1b fp=0xc0003c9880 sp=0xc0003c9830 pc=0x55e648caa73b
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net.(*TCPListener).Accept(0xc00015c000)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/tcpsock.go:380 +0x30 fp=0xc0003c98b0 sp=0xc0003c9880 pc=0x55e648ca95f0
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net/http.(*onceCloseListener).Accept(0xc0003103f0?)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         <autogenerated>:1 +0x24 fp=0xc0003c98c8 sp=0xc0003c98b0 pc=0x55e648ec0d44
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: net/http.(*Server).Serve(0xc00023c600, {0x55e649f0a2e8, 0xc00015c000})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         net/http/server.go:3424 +0x30c fp=0xc0003c99f8 sp=0xc0003c98c8 pc=0x55e648e9860c
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0000342b0, 0x10, 0x11})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc0003c9d08 sp=0xc0003c99f8 pc=0x55e64908d029
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/runner.Execute({0xc000034290?, 0x0?, 0x0?})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc0003c9d30 sp=0xc0003c9d08 pc=0x55e64908d929
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/ollama/ollama/cmd.NewCLI.func2(0xc00023d400?, {0x55e649a4d07e?, 0x4?, 0x55e649a4d082?})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc0003c9d58 sp=0xc0003c9d30 pc=0x55e6497f2685
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).execute(0xc000312f08, {0xc000286900, 0x11, 0x12})
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0003c9e78 sp=0xc0003c9d58 pc=0x55e648d0e3dc
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000572908)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0003c9f30 sp=0xc0003c9e78 pc=0x55e648d0ec25
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).Execute(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:992
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: github.com/spf13/cobra.(*Command).ExecuteContext(...)
Aug 08 08:52:35 localhost.localdomain ollama[3690012]:         github.com/spf13/cobra@v1.7.0/command.go:985
Aug 08 08:52:35 localhost.localdomain ollama[3690012]: main.main()

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.11.3

GiteaMirror added the bug label 2025-11-12 14:20:50 -06:00

@cp90-pixel commented on GitHub (Aug 8, 2025):

Short version: your Ollama runner is crashing during CUDA backend init because it's loading two different ggml CUDA libraries at once, which points to a leftover library from an earlier install. That ABI mismatch is what triggers the `fatal error: fault` right after `ggml.New(...)`. You can see it in your log: it loads `/usr/lib/ollama/libggml-cuda.so` and then loads `/usr/lib/ollama/cuda_v12/libggml-cuda.so` too. Others hit the same "llama runner process has terminated: error:fault" with `gpt-oss:20b` on 0.11.3 and fixed it by removing the stale lib folder and reinstalling.

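If you want to confirm the duplicate-backend theory before deleting anything, a quick check along these lines will show it (a sketch assuming the install paths from your log and a systemd journal; adjust to your setup):

```bash
# List every ggml CUDA backend library under the install prefix.
find /usr/lib/ollama -name 'libggml-cuda*' 2>/dev/null

# Show the backend-load lines from the most recent model load; more
# than one "loaded CUDA backend" path is the symptom described above.
journalctl -u ollama --since "10 minutes ago" | grep 'loaded CUDA backend'
```
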
Your GPUs are Tesla M60s. They’re old Maxwell cards with 8 GB each. Ollama is splitting the 20B model across four of them and thinks it fits, but these older cards are touchy with CUDA Graphs. If you still see faults after cleaning the libs, disable graphs for stability. That workaround is known to fix ggml CUDA crashes.

What to do

1. Stop Ollama and kill any runners:

```bash
sudo systemctl stop ollama || true
pkill -f "ollama runner" || true
```

2. Remove stale ggml libraries so only one CUDA backend exists:

```bash
sudo rm -rf /usr/lib/ollama
sudo rm -f /usr/local/lib/libggml* /usr/lib/libggml* 2>/dev/null || true
sudo ldconfig
```
3. Reinstall Ollama 0.11.3 or newer using the CUDA build for Linux.
   Reinstall the official CUDA-enabled package the same way you originally installed Ollama. After the reinstall, verify there is a single CUDA backend path under `/usr/lib/ollama/` and that your logs no longer show two "loaded CUDA backend" lines. This exact cleanup fixed the identical 0.11.3 "error:fault" for other users.

4. Start Ollama and disable CUDA Graphs for the M60s:

```bash
export GGML_CUDA_DISABLE_GRAPHS=1
sudo systemctl start ollama
```

Note that `export` (and the inline prefix below) only affects processes started from your shell, not a systemd-managed service; if Ollama runs under systemd, set the variable on the service itself, as in the sketch after this list. Then run again:

```bash
GGML_CUDA_DISABLE_GRAPHS=1 ollama run gpt-oss:20b
```

Disabling graphs is a known fix path in ggml for stability on older architectures.

5. If it still crashes, make the split explicit.
   Create a tiny Modelfile so Ollama does not get clever with splitting. Note that the documented Modelfile parameter for GPU layer count is `num_gpu`, not `gpu_layers`:

```
FROM gpt-oss:20b
# num_gpu is Ollama's documented parameter for the number of offloaded layers
PARAMETER num_gpu 25
# tensor_split may not be recognized by every Ollama version
PARAMETER tensor_split 7,6,6,6
```

Then:

```bash
ollama create gpt-oss-20b-m60 -f Modelfile
GGML_CUDA_DISABLE_GRAPHS=1 ollama run gpt-oss-20b-m60
```
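
About the environment variable in step 4: the standard systemd way to set it on the service is a drop-in unit override. A minimal sketch (the drop-in path below is the usual systemd convention; `systemctl edit ollama` creates the same file interactively):

```bash
# Create a drop-in override for the ollama service, set the variable
# there, then reload systemd and restart the service.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="GGML_CUDA_DISABLE_GRAPHS=1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```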

Why this is the issue

- The crash occurs inside `ggml.New` while converting a C string to Go. That usually means a mismatch between the Go code and the loaded shared library. Your log shows two CUDA backends being loaded from different paths before the fault, which is a classic leftover-libs situation (the loader-cache check after this list is a quick way to confirm nothing stale remains). The 0.11.3 report that deleting the previous `lib` folder resolves "error:fault" confirms it.
- Maxwell-era GPUs often behave badly with CUDA Graphs in ggml. The environment flag to disable graphs is the standard fix.
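
To double-check the loader's view after the cleanup, `ldconfig -p` lists what is in the dynamic-loader cache (a sketch; it covers the standard library paths, not backends Ollama loads from its own directory):

```bash
# Any stray libggml copies under /usr/lib or /usr/local/lib show up
# here; after step 2 above this should print nothing.
ldconfig -p | grep -i ggml || echo "no stray libggml entries in the ld cache"
```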

One more reality check

Even if it runs, four M60s will be slow and sometimes flaky with large models. If you want a smoother ride, consider running a smaller variant or a newer single GPU with 24 GB VRAM. But first do the cleanup and the graphs tweak so this install stops faceplanting.


@priece commented on GitHub (Aug 11, 2025):

Removing the old Ollama install and reinstalling it resolved the problem.
Thank you for the detailed answer.

Reference: github-starred/ollama-ollama#7836