[GH-ISSUE #11229] gemma3 do not run #7396

GiteaMirror · 2026-04-12T19:28:58-05:00

GiteaMirror commented

2026-04-12 19:28:58 -05:00

Originally created by @zerowxt on GitHub (Jun 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11229

What is the issue?

Error Msg

I pulled gemma3n:e2b and gemma3:latest. Running either one fails with: llama runner process has terminated: error:fault.

System Info

Windows 11 26100.4484
Ollama 0.9.3

Relevant log output

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 19:28:58 -05:00

GiteaMirror closed this issue

2026-04-12 19:28:58 -05:00

GiteaMirror commented

2026-04-12 19:28:59 -05:00

@CRCODE22 commented on GitHub (Jun 28, 2025):

I get the same error since Ollama updated itself after notifying me that an update was available. I posted log information here:

https://github.com/ollama/ollama/issues/11211#issuecomment-3014386830

GiteaMirror commented

2026-04-12 19:28:59 -05:00

@rick-github commented on GitHub (Jun 28, 2025):

@zerowxt [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.
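For readers who need to find the server log before attaching it, here is a minimal Python sketch. It assumes the default locations described in the linked troubleshooting guide (`%LOCALAPPDATA%\Ollama\server.log` on Windows, `~/.ollama/logs/server.log` on macOS, and the systemd journal on Linux). A custom or manual install may write logs elsewhere. The helper names `default_server_log` and `tail` are hypothetical and not part of Ollama.

```python
import os
import sys
from pathlib import Path


def default_server_log(platform=sys.platform, env=os.environ, home=None):
    """Return the assumed default server.log path, or None if there is no log file.

    The locations are assumptions taken from the troubleshooting guide; a
    non-default install may log somewhere else.
    """
    home = Path.home() if home is None else Path(home)
    if platform == "win32":
        local = env.get("LOCALAPPDATA")
        return Path(local) / "Ollama" / "server.log" if local else None
    if platform == "darwin":
        return home / ".ollama" / "logs" / "server.log"
    # Linux with systemd: logs go to the journal, e.g. `journalctl -u ollama --no-pager`
    return None


def tail(path, count=50):
    """Return the last `count` lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readlines()[-count:]


if __name__ == "__main__":
    log_path = default_server_log()
    if log_path is not None and log_path.exists():
        print("".join(tail(log_path)))
    else:
        print("No server.log at the assumed default location.")
```

Pasting the last few dozen lines around the failed model load is usually enough for a first look.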

GiteaMirror commented

2026-04-12 19:28:59 -05:00

@zerowxt commented on GitHub (Jun 28, 2025):

[server.log](https://github.com/user-attachments/files/20960440/server.log)

GiteaMirror commented

2026-04-12 19:29:00 -05:00

@vgrechin commented on GitHub (Jun 28, 2025):

I also cannot run Gemma3:12b since the latest update on Thursday, on Windows.
Other models are fine. The version after the update is:
ollama version is 0.9.3

GiteaMirror commented

2026-04-12 19:29:00 -05:00

@Amigos-Flipado commented on GitHub (Jun 28, 2025):

Same here.

GiteaMirror commented

2026-04-12 19:29:00 -05:00

@rick-github commented on GitHub (Jun 28, 2025):

@zerowxt

```
load_backend: loaded CUDA backend from D:\programmer\ai\ollama\lib\ollama\ggml-cuda.dll
load_backend: loaded CUDA backend from D:\programmer\ai\ollama\lib\ollama\cuda_v12\ggml-cuda.dll
```

Your runner is loading the CUDA backend twice from different locations, so this is probably #11211. See [here](https://github.com/ollama/ollama/issues/11211#issuecomment-3014572069) for details.
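The same check can be run against any server log. The sketch below is a minimal Python example that assumes the `load_backend: loaded <NAME> backend from <PATH>` line format shown in the logs in this thread. The function names `find_backend_loads` and `duplicated_backends` are hypothetical helpers, not part of Ollama.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   load_backend: loaded CUDA backend from /path/to/libggml-cuda.so
LOAD_RE = re.compile(r"load_backend: loaded (\S+) backend from (.+)")


def find_backend_loads(log_text):
    """Map each backend name (e.g. "CUDA") to the list of paths it was loaded from."""
    loads = defaultdict(list)
    for line in log_text.splitlines():
        match = LOAD_RE.search(line)
        if match:
            loads[match.group(1)].append(match.group(2).strip())
    return dict(loads)


def duplicated_backends(log_text):
    """Return only the backends that were loaded from more than one distinct path."""
    return {
        name: paths
        for name, paths in find_backend_loads(log_text).items()
        if len(set(paths)) > 1
    }


# The two lines quoted in the comment above (raw string keeps the backslashes intact).
SAMPLE = r"""
load_backend: loaded CUDA backend from D:\programmer\ai\ollama\lib\ollama\ggml-cuda.dll
load_backend: loaded CUDA backend from D:\programmer\ai\ollama\lib\ollama\cuda_v12\ggml-cuda.dll
"""

if __name__ == "__main__":
    for name, paths in duplicated_backends(SAMPLE).items():
        print(f"{name} backend loaded from {len(paths)} locations:")
        for path in paths:
            print(f"  {path}")
```

A backend reported here is only a hint that files from an older install may still be present next to the new ones. The linked comment in #11211 has the details.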

GiteaMirror commented

2026-04-12 19:29:01 -05:00

@sol8712 commented on GitHub (Jun 29, 2025):

Here are the logs from my NVIDIA machine, which hits the error with the Gemma models. On my Raspberry Pi's CPU I can run both models with no problem.

Ignore the unusual paths to Ollama; this is a manual rootless install.

time=2025-06-29T02:00:15.548-07:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.5 GiB" free_swap="0 B"
time=2025-06-29T02:00:15.549-07:00 level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=48 layers.split="" memory.available="[10.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.2 GiB" memory.required.partial="8.6 GiB" memory.required.kv="736.0 MiB" memory.required.allocations="[8.6 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-06-29T02:00:15.591-07:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/home/ollama/bin/ollama runner --ollama-engine --model /mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 4096 --batch-size 512 --n-gpu-layers 48 --threads 6 --parallel 1 --port 38619"
time=2025-06-29T02:00:15.591-07:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-06-29T02:00:15.591-07:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
time=2025-06-29T02:00:15.591-07:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-06-29T02:00:15.597-07:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-06-29T02:00:15.598-07:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38619"
time=2025-06-29T02:00:15.638-07:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /home/ollama/lib/ollama/libggml-cuda.so
load_backend: loaded CPU backend from /home/ollama/lib/ollama/libggml-cpu-alderlake.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /home/ollama/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-06-29T02:00:15.679-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
unexpected fault address 0x2a9a50000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x2a9a50000 pc=0x561d4224b780]

goroutine 6 gp=0xc000187c00 m=10 mp=0xc0003ac008 [running]:
runtime.throw({0x561d43164479?, 0xc000187c00?})
        runtime/panic.go:1096 +0x4a fp=0xc000055110 sp=0xc0000550e0 pc=0x561d422bbc2a
runtime.sigpanic()
        runtime/signal_unix.go:939 +0x26c fp=0xc000055170 sp=0xc000055110 pc=0x561d422be0ac
indexbytebody()
        internal/bytealg/indexbyte_amd64.s:131 +0xe0 fp=0xc000055178 sp=0xc000055170 pc=0x561d4224b780
runtime.findnull(0xc0000551f8?)
        runtime/string.go:577 +0x79 fp=0xc0000551d0 sp=0xc000055178 pc=0x561d422a38b9
runtime.gostring(0x2a9a50000)
        runtime/string.go:363 +0x1c fp=0xc000055208 sp=0xc0000551d0 pc=0x561d422bef1c
github.com/ollama/ollama/ml/backend/ggml._Cfunc_GoString(...)
        _cgo_gotypes.go:311
github.com/ollama/ollama/ml/backend/ggml.New({0x7ffe1dde6db2, 0x60}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0})
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:158 +0x1336 fp=0xc000055c18 sp=0xc000055208 pc=0x561d426f8376
github.com/ollama/ollama/ml.NewBackend({0x7ffe1dde6db2, 0x60}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0})
        github.com/ollama/ollama/ml/backend.go:209 +0xb1 fp=0xc000055c70 sp=0xc000055c18 pc=0x561d426e9e11
github.com/ollama/ollama/model.New({0x7ffe1dde6db2?, 0x0?}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0})
        github.com/ollama/ollama/model/model.go:102 +0x8f fp=0xc000055d68 sp=0xc000055c70 pc=0x561d42707a2f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc000034900, {0x7ffe1dde6db2?, 0x0?}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0}, ...)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:841 +0x8d fp=0xc000055dc8 sp=0xc000055d68 pc=0x561d427a9b6d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc000034900, {0x561d4361c880, 0xc0000f4460}, {0x7ffe1dde6db2?, 0x0?}, {0x6, 0x0, 0x30, {0x0, 0x0, ...}, ...}, ...)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:878 +0xb8 fp=0xc000055f20 sp=0xc000055dc8 pc=0x561d427a9ed8
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xc7 fp=0xc000055fe0 sp=0xc000055f20 pc=0x561d427ab307
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000055fe8 sp=0xc000055fe0 pc=0x561d422c3481
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xa11

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000605650 sp=0xc000605630 pc=0x561d422bbd4e
runtime.netpollblock(0x561d422b9ad3?, 0x42254b46?, 0x1d?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000605688 sp=0xc000605650 pc=0x561d42280837
internal/poll.runtime_pollWait(0x7fdf70020de0, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc0006056a8 sp=0xc000605688 pc=0x561d422baf65
internal/poll.(*pollDesc).wait(0xc00001a500?, 0x380016?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0006056d0 sp=0xc0006056a8 pc=0x561d423423a7
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00001a500)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc000605778 sp=0xc0006056d0 pc=0x561d42347775
net.(*netFD).accept(0xc00001a500)
        net/fd_unix.go:172 +0x29 fp=0xc000605830 sp=0xc000605778 pc=0x561d423b9c89
net.(*TCPListener).accept(0xc000379240)
        net/tcpsock_posix.go:159 +0x1b fp=0xc000605880 sp=0xc000605830 pc=0x561d423cf63b
net.(*TCPListener).Accept(0xc000379240)
        net/tcpsock.go:380 +0x30 fp=0xc0006058b0 sp=0xc000605880 pc=0x561d423ce4f0
net/http.(*onceCloseListener).Accept(0x561d4361c810?)
        <autogenerated>:1 +0x24 fp=0xc0006058c8 sp=0xc0006058b0 pc=0x561d425e5c44
net/http.(*Server).Serve(0xc00026f600, {0x561d4361a408, 0xc000379240})
        net/http/server.go:3424 +0x30c fp=0xc0006059f8 sp=0xc0006058c8 pc=0x561d425bd50c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0001ac030, 0xe, 0xf})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc000605d08 sp=0xc0006059f8 pc=0x561d427aaf69
github.com/ollama/ollama/runner.Execute({0xc0001ac010?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000605d30 sp=0xc000605d08 pc=0x561d427ab869
github.com/ollama/ollama/cmd.NewCLI.func2(0xc00026f400?, {0x561d43163075?, 0x4?, 0x561d43163079?})
        github.com/ollama/ollama/cmd/cmd.go:1529 +0x45 fp=0xc000605d58 sp=0xc000605d30 pc=0x561d42f08645
github.com/spf13/cobra.(*Command).execute(0xc000020f08, {0xc0000b8ff0, 0xf, 0xf})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000605e78 sp=0xc000605d58 pc=0x561d424332dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0000e0908)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000605f30 sp=0xc000605e78 pc=0x561d42433b25
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000605f50 sp=0xc000605f30 pc=0x561d42f090cd
runtime.main()
        runtime/proc.go:283 +0x29d fp=0xc000605fe0 sp=0xc000605f50 pc=0x561d42287ebd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000605fe8 sp=0xc000605fe0 pc=0x561d422c3481

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x561d422bbd4e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x561d422881f8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x561d422c3481
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x561d422bbd4e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0xc000042080)
        runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x561d4227299f
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x561d42266d85
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x561d422c3481
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x561d43322c38?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x561d422bbd4e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0x561d43eaa8e0)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x561d422703e9
runtime.bgscavenge(0xc000042080)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x561d42270979
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x561d42266d25
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x561d422c3481
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 18 gp=0xc000186380 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?)
        runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x561d422bbd4e
runtime.runfinq()
        runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x561d42265d47
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x561d422c3481
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x3d

goroutine 19 gp=0xc000186e00 m=nil [chan receive]:
runtime.gopark(0xc0002a17c0?, 0xc000598018?, 0x60?, 0x7?, 0x561d423a09c8?)
        runtime/proc.go:435 +0xce fp=0xc000080718 sp=0xc0000806f8 pc=0x561d422bbd4e
runtime.chanrecv(0xc000182310, 0x0, 0x1)
        runtime/chan.go:664 +0x445 fp=0xc000080790 sp=0xc000080718 pc=0x561d42257725
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:506 +0x12 fp=0xc0000807b8 sp=0xc000080790 pc=0x561d422572b2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x2f fp=0xc0000807e0 sp=0xc0000807b8 pc=0x561d42269f2f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x561d422c3481
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x85

goroutine 20 gp=0xc000187180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000118738 sp=0xc000118718 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc0001187c8 sp=0xc000118738 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc0001187e0 sp=0xc0001187c8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0001187e8 sp=0xc0001187e0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000118f38 sp=0xc000118f18 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc000118fc8 sp=0xc000118f38 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000118fe0 sp=0xc000118fc8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000118fe8 sp=0xc000118fe0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000119738 sp=0xc000119718 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc0001197c8 sp=0xc000119738 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc0001197e0 sp=0xc0001197c8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0001197e8 sp=0xc0001197e0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc0001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0xbc121deb01?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000119f38 sp=0xc000119f18 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc000119fc8 sp=0xc000119f38 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000119fe0 sp=0xc000119fc8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0xbc1247c9c3?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 39 gp=0xc000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0xbc121dec6a?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 40 gp=0xc000102e00 m=nil [GC worker (idle)]:
runtime.gopark(0x561d43f59120?, 0x1?, 0x9e?, 0x4c?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 41 gp=0xc000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0xbc121e079f?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011bf38 sp=0xc00011bf18 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011bfc8 sp=0xc00011bf38 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011bfe0 sp=0xc00011bfc8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011bfe8 sp=0xc00011bfe0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 42 gp=0xc000103180 m=nil [GC worker (idle)]:
runtime.gopark(0xbc121e39b5?, 0x1?, 0x7c?, 0x49?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000114738 sp=0xc000114718 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc0001147c8 sp=0xc000114738 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc0001147e0 sp=0xc0001147c8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0001147e8 sp=0xc0001147e0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 43 gp=0xc000103340 m=nil [GC worker (idle)]:
runtime.gopark(0x561d43f59120?, 0x1?, 0x5a?, 0x33?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000114f38 sp=0xc000114f18 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc000114fc8 sp=0xc000114f38 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000114fe0 sp=0xc000114fc8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000114fe8 sp=0xc000114fe0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 5 gp=0xc000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0xbc121e05f4?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000086738 sp=0xc000086718 pc=0x561d422bbd4e
runtime.gcBgMarkWorker(0xc000183730)
        runtime/mgc.go:1423 +0xe9 fp=0xc0000867c8 sp=0xc000086738 pc=0x561d42269249
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc0000867e0 sp=0xc0000867c8 pc=0x561d42269125
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x561d422c3481
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 7 gp=0xc000187dc0 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x80?, 0x1?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000087ed0 sp=0xc000087eb0 pc=0x561d422bbd4e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0xc000034908, 0x0, 0x1, 0x0, 0x18)
        runtime/sema.go:188 +0x229 fp=0xc000087f38 sp=0xc000087ed0 pc=0x561d4229b489
sync.runtime_SemacquireWaitGroup(0x0?)
        runtime/sema.go:110 +0x25 fp=0xc000087f70 sp=0xc000087f38 pc=0x561d422bd765
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:118 +0x48 fp=0xc000087f98 sp=0xc000087f70 pc=0x561d422cedc8
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000034900, {0x561d4361c880, 0xc0000f4460})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:355 +0x25 fp=0xc000087fb8 sp=0xc000087f98 pc=0x561d427a5a85
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0x28 fp=0xc000087fe0 sp=0xc000087fb8 pc=0x561d427ab208
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x561d422c3481
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
time=2025-06-29T02:00:15.819-07:00 level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2"
time=2025-06-29T02:00:15.842-07:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: error:fault"
[GIN] 2025/06/29 - 02:00:15 | 500 |  702.439409ms |       127.0.0.1 | POST     "/api/chat"
time=2025-06-29T02:00:20.982-07:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.140254431 runner.size="11.2 GiB" runner.vram="8.6 GiB" runner.parallel=1 runner.pid=2786 runner.model=/mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de
time=2025-06-29T02:00:21.233-07:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.390768099 runner.size="11.2 GiB" runner.vram="8.6 GiB" runner.parallel=1 runner.pid=2786 runner.model=/mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de
time=2025-06-29T02:00:21.482-07:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.640005646 runner.size="11.2 GiB" runner.vram="8.6 GiB" runner.parallel=1 runner.pid=2786 runner.model=/mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de
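In the trace above, the only goroutine marked `[running]` faults inside `runtime.gostring`, reached through `_Cfunc_GoString` from `ggml.New`. That suggests the crash happens while a C string pointer is converted to a Go string during backend setup. Most of the remaining goroutines are idle runtime workers. The Python sketch below pulls out just the signal line and the call chain of the running goroutine. It assumes the Go traceback layout seen in this log, and the sample is an abbreviated excerpt of it. `crash_summary` is a hypothetical helper name.

```python
import re

# e.g. [signal SIGSEGV: segmentation violation code=0x2 addr=0x2a9a50000 pc=0x561d4224b780]
SIGNAL_RE = re.compile(r"\[signal (\w+): .*? addr=(0x[0-9a-f]+) pc=(0x[0-9a-f]+)\]")
# e.g. goroutine 6 gp=0xc000187c00 m=10 mp=0xc0003ac008 [running]:
GOROUTINE_RE = re.compile(r"^goroutine (\d+) .*\[(.+)\]:$")


def crash_summary(log_text):
    """Return (signal_info, frames) for the goroutine that was running at the fault.

    signal_info is a dict with name/addr/pc, or None if no signal line is found.
    frames lists function names from the innermost call outward.
    """
    signal = None
    frames = []
    in_running = False
    for line in log_text.splitlines():
        match = SIGNAL_RE.search(line)
        if match:
            signal = {"name": match.group(1), "addr": match.group(2), "pc": match.group(3)}
            continue
        header = GOROUTINE_RE.match(line)
        if header:
            in_running = header.group(2) == "running"
            continue
        if not in_running:
            continue
        if not line.strip():
            in_running = False  # a blank line ends the goroutine's block
        elif not line[0].isspace() and not line.startswith("created by"):
            # Function lines are unindented; drop the trailing argument list.
            paren = line.rfind("(")
            frames.append(line[:paren] if paren != -1 else line)
    return signal, frames


# Abbreviated excerpt of the trace in this comment.
SAMPLE = """\
unexpected fault address 0x2a9a50000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x2a9a50000 pc=0x561d4224b780]

goroutine 6 gp=0xc000187c00 m=10 mp=0xc0003ac008 [running]:
runtime.throw({0x561d43164479?, 0xc000187c00?})
        runtime/panic.go:1096 +0x4a fp=0xc000055110 sp=0xc0000550e0 pc=0x561d422bbc2a
runtime.gostring(0x2a9a50000)
        runtime/string.go:363 +0x1c fp=0xc000055208 sp=0xc0000551d0 pc=0x561d422bef1c
github.com/ollama/ollama/ml/backend/ggml._Cfunc_GoString(...)
        _cgo_gotypes.go:311

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000605650 sp=0xc000605630 pc=0x561d422bbd4e
"""

if __name__ == "__main__":
    signal, frames = crash_summary(SAMPLE)
    print(signal)
    for frame in frames:
        print("  ", frame)
```

Posting only this summary plus the `load_backend` lines keeps a report short while preserving the parts of the log that identify the fault.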

@sol8712 commented on GitHub (Jun 29, 2025): Here are the logs from my Nvidia machine getting the error with the Gemma models. On my Raspberry pi's cpu i can run both models with no problem. Ignore the weird paths to ollama, manual rootless install. ``` time=2025-06-29T02:00:15.548-07:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.5 GiB" free_swap="0 B" time=2025-06-29T02:00:15.549-07:00 level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=48 layers.split="" memory.available="[10.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.2 GiB" memory.required.partial="8.6 GiB" memory.required.kv="736.0 MiB" memory.required.allocations="[8.6 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-06-29T02:00:15.591-07:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/home/ollama/bin/ollama runner --ollama-engine --model /mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 4096 --batch-size 512 --n-gpu-layers 48 --threads 6 --parallel 1 --port 38619" time=2025-06-29T02:00:15.591-07:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 time=2025-06-29T02:00:15.591-07:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" time=2025-06-29T02:00:15.591-07:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" time=2025-06-29T02:00:15.597-07:00 level=INFO source=runner.go:925 msg="starting ollama engine" time=2025-06-29T02:00:15.598-07:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38619" time=2025-06-29T02:00:15.638-07:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" 
description="" num_tensors=1065 num_key_values=37 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /home/ollama/lib/ollama/libggml-cuda.so load_backend: loaded CPU backend from /home/ollama/lib/ollama/libggml-cpu-alderlake.so ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /home/ollama/lib/ollama/cuda_v12/libggml-cuda.so time=2025-06-29T02:00:15.679-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) unexpected fault address 0x2a9a50000 fatal error: fault [signal SIGSEGV: segmentation violation code=0x2 addr=0x2a9a50000 pc=0x561d4224b780] goroutine 6 gp=0xc000187c00 m=10 mp=0xc0003ac008 [running]: runtime.throw({0x561d43164479?, 0xc000187c00?}) runtime/panic.go:1096 +0x4a fp=0xc000055110 sp=0xc0000550e0 pc=0x561d422bbc2a runtime.sigpanic() runtime/signal_unix.go:939 +0x26c fp=0xc000055170 sp=0xc000055110 pc=0x561d422be0ac indexbytebody() internal/bytealg/indexbyte_amd64.s:131 +0xe0 fp=0xc000055178 sp=0xc000055170 pc=0x561d4224b780 runtime.findnull(0xc0000551f8?) runtime/string.go:577 +0x79 fp=0xc0000551d0 sp=0xc000055178 pc=0x561d422a38b9 runtime.gostring(0x2a9a50000) runtime/string.go:363 +0x1c fp=0xc000055208 sp=0xc0000551d0 pc=0x561d422bef1c github.com/ollama/ollama/ml/backend/ggml._Cfunc_GoString(...) 
_cgo_gotypes.go:311 github.com/ollama/ollama/ml/backend/ggml.New({0x7ffe1dde6db2, 0x60}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:158 +0x1336 fp=0xc000055c18 sp=0xc000055208 pc=0x561d426f8376 github.com/ollama/ollama/ml.NewBackend({0x7ffe1dde6db2, 0x60}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0}) github.com/ollama/ollama/ml/backend.go:209 +0xb1 fp=0xc000055c70 sp=0xc000055c18 pc=0x561d426e9e11 github.com/ollama/ollama/model.New({0x7ffe1dde6db2?, 0x0?}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0}) github.com/ollama/ollama/model/model.go:102 +0x8f fp=0xc000055d68 sp=0xc000055c70 pc=0x561d42707a2f github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc000034900, {0x7ffe1dde6db2?, 0x0?}, {0x6, 0x0, 0x30, {0x0, 0x0, 0x0}, 0x0}, ...) github.com/ollama/ollama/runner/ollamarunner/runner.go:841 +0x8d fp=0xc000055dc8 sp=0xc000055d68 pc=0x561d427a9b6d github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc000034900, {0x561d4361c880, 0xc0000f4460}, {0x7ffe1dde6db2?, 0x0?}, {0x6, 0x0, 0x30, {0x0, 0x0, ...}, ...}, ...) github.com/ollama/ollama/runner/ollamarunner/runner.go:878 +0xb8 fp=0xc000055f20 sp=0xc000055dc8 pc=0x561d427a9ed8 github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1() github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xc7 fp=0xc000055fe0 sp=0xc000055f20 pc=0x561d427ab307 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000055fe8 sp=0xc000055fe0 pc=0x561d422c3481 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:959 +0xa11 goroutine 1 gp=0xc000002380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000605650 sp=0xc000605630 pc=0x561d422bbd4e runtime.netpollblock(0x561d422b9ad3?, 0x42254b46?, 0x1d?) 
runtime/netpoll.go:575 +0xf7 fp=0xc000605688 sp=0xc000605650 pc=0x561d42280837 internal/poll.runtime_pollWait(0x7fdf70020de0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0006056a8 sp=0xc000605688 pc=0x561d422baf65 internal/poll.(*pollDesc).wait(0xc00001a500?, 0x380016?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0006056d0 sp=0xc0006056a8 pc=0x561d423423a7 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc00001a500) internal/poll/fd_unix.go:620 +0x295 fp=0xc000605778 sp=0xc0006056d0 pc=0x561d42347775 net.(*netFD).accept(0xc00001a500) net/fd_unix.go:172 +0x29 fp=0xc000605830 sp=0xc000605778 pc=0x561d423b9c89 net.(*TCPListener).accept(0xc000379240) net/tcpsock_posix.go:159 +0x1b fp=0xc000605880 sp=0xc000605830 pc=0x561d423cf63b net.(*TCPListener).Accept(0xc000379240) net/tcpsock.go:380 +0x30 fp=0xc0006058b0 sp=0xc000605880 pc=0x561d423ce4f0 net/http.(*onceCloseListener).Accept(0x561d4361c810?) <autogenerated>:1 +0x24 fp=0xc0006058c8 sp=0xc0006058b0 pc=0x561d425e5c44 net/http.(*Server).Serve(0xc00026f600, {0x561d4361a408, 0xc000379240}) net/http/server.go:3424 +0x30c fp=0xc0006059f8 sp=0xc0006058c8 pc=0x561d425bd50c github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0001ac030, 0xe, 0xf}) github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc000605d08 sp=0xc0006059f8 pc=0x561d427aaf69 github.com/ollama/ollama/runner.Execute({0xc0001ac010?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000605d30 sp=0xc000605d08 pc=0x561d427ab869 github.com/ollama/ollama/cmd.NewCLI.func2(0xc00026f400?, {0x561d43163075?, 0x4?, 0x561d43163079?}) github.com/ollama/ollama/cmd/cmd.go:1529 +0x45 fp=0xc000605d58 sp=0xc000605d30 pc=0x561d42f08645 github.com/spf13/cobra.(*Command).execute(0xc000020f08, {0xc0000b8ff0, 0xf, 0xf}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000605e78 sp=0xc000605d58 pc=0x561d424332dc github.com/spf13/cobra.(*Command).ExecuteC(0xc0000e0908) 
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000605f30 sp=0xc000605e78 pc=0x561d42433b25 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000605f50 sp=0xc000605f30 pc=0x561d42f090cd runtime.main() runtime/proc.go:283 +0x29d fp=0xc000605fe0 sp=0xc000605f50 pc=0x561d42287ebd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000605fe8 sp=0xc000605fe0 pc=0x561d422c3481 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x561d422bbd4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x561d422881f8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x561d422c3481 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x561d422bbd4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc000042080) runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x561d4227299f runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x561d42266d85 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x561d422c3481 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x561d43322c38?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x561d422bbd4e runtime.goparkunlock(...) 
runtime/proc.go:441 runtime.(*scavengerState).park(0x561d43eaa8e0) runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x561d422703e9 runtime.bgscavenge(0xc000042080) runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x561d42270979 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x561d42266d25 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x561d422c3481 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 18 gp=0xc000186380 m=nil [finalizer wait]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?) runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x561d422bbd4e runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x561d42265d47 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x561d422c3481 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 19 gp=0xc000186e00 m=nil [chan receive]: runtime.gopark(0xc0002a17c0?, 0xc000598018?, 0x60?, 0x7?, 0x561d423a09c8?) runtime/proc.go:435 +0xce fp=0xc000080718 sp=0xc0000806f8 pc=0x561d422bbd4e runtime.chanrecv(0xc000182310, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000080790 sp=0xc000080718 pc=0x561d42257725 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc0000807b8 sp=0xc000080790 pc=0x561d422572b2 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc0000807e0 sp=0xc0000807b8 pc=0x561d42269f2f runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x561d422c3481 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 20 gp=0xc000187180 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000118738 sp=0xc000118718 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc0001187c8 sp=0xc000118738 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001187e0 sp=0xc0001187c8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001187e8 sp=0xc0001187e0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000118f38 sp=0xc000118f18 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc000118fc8 sp=0xc000118f38 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000118fe0 sp=0xc000118fc8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000118fe8 sp=0xc000118fe0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 36 gp=0xc000102700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000119738 sp=0xc000119718 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc0001197c8 sp=0xc000119738 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001197e0 sp=0xc0001197c8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001197e8 sp=0xc0001197e0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 37 gp=0xc0001028c0 m=nil [GC worker (idle)]: runtime.gopark(0xbc121deb01?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000119f38 sp=0xc000119f18 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc000119fc8 sp=0xc000119f38 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000119fe0 sp=0xc000119fc8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 38 gp=0xc000102a80 m=nil [GC worker (idle)]: runtime.gopark(0xbc1247c9c3?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 39 gp=0xc000102c40 m=nil [GC worker (idle)]: runtime.gopark(0xbc121dec6a?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 40 gp=0xc000102e00 m=nil [GC worker (idle)]: runtime.gopark(0x561d43f59120?, 0x1?, 0x9e?, 0x4c?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 41 gp=0xc000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0xbc121e079f?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011bf38 sp=0xc00011bf18 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc00011bfc8 sp=0xc00011bf38 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011bfe0 sp=0xc00011bfc8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011bfe8 sp=0xc00011bfe0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 42 gp=0xc000103180 m=nil [GC worker (idle)]: runtime.gopark(0xbc121e39b5?, 0x1?, 0x7c?, 0x49?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000114738 sp=0xc000114718 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc0001147c8 sp=0xc000114738 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001147e0 sp=0xc0001147c8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001147e8 sp=0xc0001147e0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 43 gp=0xc000103340 m=nil [GC worker (idle)]: runtime.gopark(0x561d43f59120?, 0x1?, 0x5a?, 0x33?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000114f38 sp=0xc000114f18 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc000114fc8 sp=0xc000114f38 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000114fe0 sp=0xc000114fc8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000114fe8 sp=0xc000114fe0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 5 gp=0xc000003a40 m=nil [GC worker (idle)]: runtime.gopark(0xbc121e05f4?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000086738 sp=0xc000086718 pc=0x561d422bbd4e runtime.gcBgMarkWorker(0xc000183730) runtime/mgc.go:1423 +0xe9 fp=0xc0000867c8 sp=0xc000086738 pc=0x561d42269249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000867e0 sp=0xc0000867c8 pc=0x561d42269125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x561d422c3481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 7 gp=0xc000187dc0 m=nil [sync.WaitGroup.Wait]: runtime.gopark(0x0?, 0x0?, 0x80?, 0x1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000087ed0 sp=0xc000087eb0 pc=0x561d422bbd4e runtime.goparkunlock(...) 
runtime/proc.go:441 runtime.semacquire1(0xc000034908, 0x0, 0x1, 0x0, 0x18) runtime/sema.go:188 +0x229 fp=0xc000087f38 sp=0xc000087ed0 pc=0x561d4229b489 sync.runtime_SemacquireWaitGroup(0x0?) runtime/sema.go:110 +0x25 fp=0xc000087f70 sp=0xc000087f38 pc=0x561d422bd765 sync.(*WaitGroup).Wait(0x0?) sync/waitgroup.go:118 +0x48 fp=0xc000087f98 sp=0xc000087f70 pc=0x561d422cedc8 github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000034900, {0x561d4361c880, 0xc0000f4460}) github.com/ollama/ollama/runner/ollamarunner/runner.go:355 +0x25 fp=0xc000087fb8 sp=0xc000087f98 pc=0x561d427a5a85 github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0x28 fp=0xc000087fe0 sp=0xc000087fb8 pc=0x561d427ab208 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x561d422c3481 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74 time=2025-06-29T02:00:15.819-07:00 level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2" time=2025-06-29T02:00:15.842-07:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: error:fault" [GIN] 2025/06/29 - 02:00:15 | 500 | 702.439409ms | 127.0.0.1 | POST "/api/chat" time=2025-06-29T02:00:20.982-07:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.140254431 runner.size="11.2 GiB" runner.vram="8.6 GiB" runner.parallel=1 runner.pid=2786 runner.model=/mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de time=2025-06-29T02:00:21.233-07:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.390768099 runner.size="11.2 GiB" runner.vram="8.6 GiB" runner.parallel=1 runner.pid=2786 
runner.model=/mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de time=2025-06-29T02:00:21.482-07:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.640005646 runner.size="11.2 GiB" runner.vram="8.6 GiB" runner.parallel=1 runner.pid=2786 runner.model=/mnt/HDD/AI/ollama/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de ````

GiteaMirror commented

2026-04-12 19:29:01 -05:00

@ThomasRocha82 commented on GitHub (Jun 29, 2025):

I'm getting this:

gpu VRAM usage didn't recover within timeout. It's only Qwen 3 - as well as some of the other errors people are reporting.

I guess it's broken for that LLM. Other models work fine.

4080 Nvidia GPU, 13th Gen i9 and 64GB RAM.

Reinstall didn't work for me. Downgrading to the previous version (before this last update) did though.


GiteaMirror commented

2026-04-12 19:29:01 -05:00

@rick-github commented on GitHub (Jun 29, 2025):

> gpu VRAM usage didn't recover within timeout.

This is not an error, it's a status message saying that ollama is waiting for the GPU driver to release VRAM.

It would be easier to debug whatever issue you are having if you post a full log.


GiteaMirror commented

2026-04-12 19:29:02 -05:00

@ThomasRocha82 commented on GitHub (Jun 29, 2025):

It was a similar error to what @zerowxt posted in their error log. Same runner error code and all.

I uninstalled it via add/remove programs (0.9.3) and re-installed the previous version via the installer (0.9.2). Prior to that, I had uninstalled it in the same way (0.9.3) and re-installed the current version (0.9.3) - no dice.

I saw https://github.com/ollama/ollama/issues/11211 but I don't do manual installs. Upgrades are through the tray icon and installs via installer and uninstalls through the uninstaller. Windows 11 Pro.

I'll get a log once I have some downtime, as I need the AI working on some stuff, but I have a feeling it's likely the same scenario (Qwen3, similar log entries, only happened after the update: "llama runner terminated" error="exit status 2", "error loading llama server" error="llama runner process has terminated: error:fault", etc.).

But for now, I'll stick with what works (0.9.2) so I can continue my work.


GiteaMirror commented

2026-04-12 19:29:02 -05:00

@sol8712 commented on GitHub (Jun 29, 2025):

> load_backend: loaded CUDA backend from D:\programmer\ai\ollama\lib\ollama\ggml-cuda.dll
> load_backend: loaded CUDA backend from D:\programmer\ai\ollama\lib\ollama\cuda_v12\ggml-cuda.dll
>
> Your runner is loading the CUDA backend twice from different locations, so this is probably [#11211](https://github.com/ollama/ollama/issues/11211). See [here](https://github.com/ollama/ollama/issues/11211#issuecomment-3014572069) for details.

Initially this seemed to fix the issue, and I know it was working. However, I've started getting new errors running gemma3 after applying the solution (deleted the files, reinstalled, and confirmed that only one CUDA backend is loaded, not two):

400: "gemma3:12b" does not support chat


GiteaMirror commented

2026-04-12 19:29:03 -05:00

@rick-github commented on GitHub (Jun 29, 2025):

> 400: "gemma3:12b" does not support chat

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.


GiteaMirror commented

2026-04-12 19:29:04 -05:00

@sol8712 commented on GitHub (Jun 29, 2025):

> > 400: "gemma3:12b" does not support chat
>
> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.

I got to the logs and... my bad, this was user error. The original issue with multiple CUDA backends from a manual install is fixed for me. Hopefully this also fixes it for @zerowxt, and the topic can be resolved with https://github.com/ollama/ollama/issues/11229#issuecomment-3015242276


GiteaMirror commented

2026-04-12 19:29:04 -05:00

@CrazyBunQnQ commented on GitHub (Jun 30, 2025):

> I also cannot run Gemma3:12b right after latest update on Thursday on Windows. Other models are fine. Version after the update is: ollama version is 0.9.3

I have the same error.


GiteaMirror commented

2026-04-12 19:29:05 -05:00

@hkbb2014 commented on GitHub (Jun 30, 2025):

me too.

ollama run gemma3n:e4b
⠇ Error: llama runner process has terminated: error:fault

My [server.log](https://github.com/user-attachments/files/20977518/server.log)


GiteaMirror commented

2026-04-12 19:29:05 -05:00

@Amigos-Flipado commented on GitHub (Jun 30, 2025):

I have completely uninstalled Ollama (except for the models, of course) and when I reinstalled clean everything has returned to normal and all the models now work.


GiteaMirror commented

2026-04-12 19:29:06 -05:00

@rick-github commented on GitHub (Jun 30, 2025):

@hkbb2014

load_backend: loaded CUDA backend from Y:\ollama\lib\ollama\ggml-cuda.dll
load_backend: loaded CUDA backend from Y:\ollama\lib\ollama\cuda_v12\ggml-cuda.dll

https://github.com/ollama/ollama/issues/11229#issuecomment-3015242276
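
The duplicate-backend symptom quoted above can be checked mechanically. The following is a minimal sketch, assuming the server log contains `load_backend: loaded <NAME> backend from <PATH>` lines in the form shown in this thread; the function name `duplicate_backends` is hypothetical and is not part of Ollama.

```python
import re
from collections import defaultdict

# Matches lines in the form quoted in this thread, e.g.
#   load_backend: loaded CUDA backend from Y:\ollama\lib\ollama\ggml-cuda.dll
LOAD_RE = re.compile(r"load_backend: loaded (\S+) backend from (.+)$")


def duplicate_backends(log_text):
    """Return {backend: [paths]} for every backend loaded from more than one path."""
    paths = defaultdict(list)
    for line in log_text.splitlines():
        match = LOAD_RE.search(line.strip())
        if match is None:
            continue
        backend, path = match.groups()
        if path not in paths[backend]:
            paths[backend].append(path)
    return {backend: found for backend, found in paths.items() if len(found) > 1}


# The two lines quoted from the reporter's server.log.
sample = "\n".join([
    r"load_backend: loaded CUDA backend from Y:\ollama\lib\ollama\ggml-cuda.dll",
    r"load_backend: loaded CUDA backend from Y:\ollama\lib\ollama\cuda_v12\ggml-cuda.dll",
])

for backend, found in duplicate_backends(sample).items():
    print(f"{backend} backend loaded from {len(found)} locations:")
    for path in found:
        print("  " + path)
```

An empty result means each backend was loaded from a single location, which is the state several commenters reached after a clean uninstall and reinstall.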


GiteaMirror commented

2026-04-12 19:29:06 -05:00

@vgrechin commented on GitHub (Jun 30, 2025):

I removed Ollama via Control Panel and reinstalled from the latest package of Ollama v0.9.3. That basically solved the issue; now I can run gemma3:12b again. Thanks!


GiteaMirror commented

2026-04-12 19:29:07 -05:00

@jeepshop commented on GitHub (Jun 30, 2025):

I'm curious as well, I'm running 0.9.3 and while I can run gemma3, it's extremely CPU intensive so not practical.

I've tried both of the following with similar results on my V100: 60% VRAM usage, 50% GPU, 1400% CPU (I assume that means 14 of my 32 cores?), yielding 11 tok/s.

hf.co/unsloth/gemma-3-27b-it-qat-GGUF:Q4_K_XL
gemma3:27b-it-q4_K_M

Also having similar performance issues with

mistral-small3.2:24b-instruct-2506-q4_K_M


GiteaMirror commented

2026-04-12 19:29:07 -05:00

@rick-github commented on GitHub (Jul 1, 2025):

High CPU usage would imply that some of the layers are being offloaded to system RAM where the slower CPU does inference. [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will show layer assignment.

| model | tps all layers in GPU | tps 50% layers in GPU |
| -- | -- | -- |
| hf.co/unsloth/gemma-3-27b-it-qat-GGUF:Q4_K_XL | 40.61 | 9.72 |
| gemma3:27b-it-q4_K_M | 41.70 | 9.90 |
| mistral-small3.2:24b-instruct-2506-q4_K_M | 51.02 | 12.79 |
| deepseek-r1:32b | 39.31 | 8.87 |

Note that vision model quants from HF usually don't have the format that ollama expects, so hf.co/unsloth/gemma-3-27b-it-qat-GGUF:Q4_K_XL will not be able to process images.


GiteaMirror commented

2026-04-12 19:29:07 -05:00

@jeepshop commented on GitHub (Jul 1, 2025):

I should have mentioned earlier that I have no performance issues running other similar-sized and larger models. I'm only having this issue with the gemma3 and mistral3 models. Deepseek, Devstral, Phi4 all run at expected generation speeds.

Here is the script I'm running (thanks to you actually :))

# Model and context size under test; reused in the request and in the jq summary.
MODEL="gemma3:27b-it-q4_K_M"
CONTEXT=16384

# keep_alive is a top-level request field rather than a model option.
curl -s http://localhost:11434/api/generate -d "{
  \"model\": \"$MODEL\",
  \"prompt\": \"Tell me a detailed story about the rise and fall of an ancient civilization, including cultural, technological, and political aspects.\",
  \"options\": {\"num_ctx\": $CONTEXT},
  \"keep_alive\": \"1m\",
  \"stream\": false
}" | jq --arg model "$MODEL" --argjson context "$CONTEXT" '{
  model: $model,
  context_size: $context,
  tokens: .eval_count,
  duration_seconds: (.eval_duration / 1e9),
  tokens_per_second: (.eval_count / (.eval_duration / 1e9))
}'
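
For anyone who wants the same tokens-per-second figure without jq, here is a rough Python equivalent of the summary step. It assumes, as the script above does, that the response carries `eval_count` and an `eval_duration` in nanoseconds; the sample numbers are made up for illustration and are not measurements from this thread.

```python
import json


def generation_stats(response_json):
    """Summarise a non-streaming /api/generate response body (a JSON string)."""
    data = json.loads(response_json)
    seconds = data["eval_duration"] / 1e9  # nanoseconds to seconds
    return {
        "tokens": data["eval_count"],
        "duration_seconds": seconds,
        "tokens_per_second": data["eval_count"] / seconds,
    }


# Hypothetical response: 540 tokens generated in 20 seconds.
sample = json.dumps({"eval_count": 540, "eval_duration": 20_000_000_000})
stats = generation_stats(sample)
print(stats)
```

Only the generation phase is measured here; prompt evaluation time is reported separately by the server and is not included in this figure.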

I see these in the log, which seems to imply that everything is in VRAM and not any in system RAM.

Jul 01 08:42:34 Darkstar ollama[1086]: time=2025-07-01T08:42:34.396-04:00 level=INFO source=sched.go:788 msg="new model **will fit in available VRAM** in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 gpu=GPU-2229ba74-dbe4-5c32-e086-3ca1f0ab65b4 parallel=1 available=33754841088 required="19.9 GiB"

Jul 01 08:42:34 Darkstar ollama[1086]: time=2025-07-01T08:42:34.823-04:00 level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=63 **layers.offload=63** layers.split="" **memory.available="[31.4 GiB]"** memory.gpu_overhead="0 B" **memory.required.full="19.9 GiB"** memory.required.partial="19.9 GiB" memory.required.kv="952.0 MiB" memory.required.allocations="[19.9 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.6 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB
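
The `msg=offload` line above is a flat list of `key=value` fields, so the planned layer split can be pulled out with a small parser. This is a sketch only: the helper names are hypothetical, the sample is a shortened copy of the line above, and the `**` markers (emphasis added when posting) are stripped before parsing.

```python
import re

# key=value pairs where the value is either a quoted string or a run of non-spaces.
FIELD_RE = re.compile(r'([\w.]+)=("[^"]*"|\S+)')


def parse_fields(line):
    """Parse key=value fields from one log line into a dict of strings."""
    cleaned = line.replace("**", "")  # drop markdown emphasis added in the post
    return {key: value.strip('"') for key, value in FIELD_RE.findall(cleaned)}


def planned_full_offload(line):
    """True when the server planned to offload every model layer to the GPU."""
    fields = parse_fields(line)
    return int(fields["layers.offload"]) >= int(fields["layers.model"])


# Shortened copy of the offload line quoted above.
sample = (
    'level=INFO source=server.go:175 msg=offload library=cuda '
    'layers.requested=-1 layers.model=63 **layers.offload=63** layers.split="" '
    'memory.available="[31.4 GiB]" memory.required.full="19.9 GiB"'
)

fields = parse_fields(sample)
print(fields["layers.offload"], "of", fields["layers.model"], "layers planned for GPU")
print("planned full offload:", planned_full_offload(sample))
```

This only reports what the server planned; the runner's own log lines show where the weights actually ended up.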

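Here is the log during that run of gemma3: [Gemma3.log](https://github.com/user-attachments/files/20998781/Gemma3.log)

And here is a grab of nvtop during the run, showing 1400% CPU.

![Image](https://github.com/user-attachments/assets/27bd3b19-c34d-438d-8089-f8be87f0e4e9)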


GiteaMirror commented

2026-04-12 19:29:08 -05:00

@rick-github commented on GitHub (Jul 1, 2025):

What are the expected and actual generation speeds? I ran your script on an RTX6000 with 48G and got these results, with CPU usage ~100%:

| model | tps |
| -- | -- |
| hf.co/unsloth/gemma-3-27b-it-qat-GGUF:Q4_K_XL | 38.02 |
| gemma3:27b-it-q4_K_M | 38.27 |
| mistral-small3.2:24b-instruct-2506-q4_K_M | 47.09 |
| deepseek-r1:32b-qwen-distill-q4_K_M | 33.78 |
| devstral:24b-small-2505-q4_K_M | 46.37 |
| phi4:14b-q4_K_M | 72.26 |

> I see these in the log, which seems to imply that everything is in VRAM and not any in system RAM.

These are aspirational. The ollama server has calculated that everything can fit in VRAM, but it's up to the runner to do the actual layer loading. Some layers will not be loaded into VRAM because the shape of a tensor won't fit or a necessary operation is not supported by the hardware. The log snippet shows:

Jul 01 08:42:35 Darkstar ollama[1086]: time=2025-07-01T08:42:35.205-04:00 level=INFO source=ggml.go:359 msg="model weights" buffer=CUDA0 size="16.2 GiB"
Jul 01 08:42:35 Darkstar ollama[1086]: time=2025-07-01T08:42:35.205-04:00 level=INFO source=ggml.go:359 msg="model weights" buffer=CPU size="1.1 GiB"

so some of the model weights are ending up in system RAM. This is usually the output layer, and its impact on token generation should be minimal, so might not be the cause of the problem you are seeing. Logging with OLLAMA_DEBUG=2 will show the actual layer assignment.
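
To see at a glance how much of the model actually landed on each device, the `model weights` lines can be summed per buffer. The sketch below assumes the line format shown in the snippet above and that sizes are printed in KiB, MiB or GiB (only GiB appears in this thread); the function names are hypothetical.

```python
import re

# Matches e.g.  msg="model weights" buffer=CUDA0 size="16.2 GiB"
WEIGHTS_RE = re.compile(r'msg="model weights" buffer=(\S+) size="([\d.]+) (KiB|MiB|GiB)"')
TO_GIB = {"KiB": 1 / 1024 ** 2, "MiB": 1 / 1024, "GiB": 1.0}


def weights_by_buffer(log_text):
    """Sum the reported model-weight sizes per buffer, in GiB."""
    totals = {}
    for buffer, size, unit in WEIGHTS_RE.findall(log_text):
        totals[buffer] = totals.get(buffer, 0.0) + float(size) * TO_GIB[unit]
    return totals


def cpu_share(totals):
    """Fraction of the reported weights held in CPU (system RAM) buffers."""
    total = sum(totals.values())
    cpu = sum(size for name, size in totals.items() if name.upper().startswith("CPU"))
    return cpu / total if total else 0.0


# The two lines from the log snippet above.
sample = "\n".join([
    'time=2025-07-01T08:42:35.205-04:00 level=INFO source=ggml.go:359 msg="model weights" buffer=CUDA0 size="16.2 GiB"',
    'time=2025-07-01T08:42:35.205-04:00 level=INFO source=ggml.go:359 msg="model weights" buffer=CPU size="1.1 GiB"',
])

totals = weights_by_buffer(sample)
print(totals)
print(f"share of weights in system RAM: {cpu_share(totals):.1%}")
```

For the snippet above this comes to roughly 6% of the weights in system RAM, which fits the reading that a small part of the model, usually the output layer, stays on the CPU.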


GiteaMirror commented

2026-04-12 19:29:09 -05:00

@jeepshop commented on GitHub (Jul 1, 2025):

It's hard to say what I expect. I was thinking it would be in the same ballpark as devstral:24b-small-2505-q4_K_M and deepseek-r1:32b-q4_K_M, which give me 39 and 26 tok/s respectively with very little CPU use.

What I get with gemma3:27b-it-q4_K_M is 11-14 tok/s.

Attached is a fresh run with OLLAMA_DEBUG=2:

Gemma3.log: https://github.com/user-attachments/files/21003342/Gemma3.log

GiteaMirror commented

2026-04-12 19:29:09 -05:00

@rick-github commented on GitHub (Jul 1, 2025):

Try changing the flash attention settings. https://github.com/ollama/ollama/issues/9683

GiteaMirror commented

2026-04-12 19:29:10 -05:00

@jeepshop commented on GitHub (Jul 2, 2025):

I have verified that disabling flash attention does in fact solve the problem for me. Now I'm getting ~27 tok/s with almost 0 CPU load. So #9683 was definitely my issue.
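For anyone who wants to repeat this before/after comparison, generation speed can be derived from the `eval_count` and `eval_duration` (nanoseconds) fields that the Ollama API returns with a completed response. The sketch below assumes you already have such a response as a dict; the function name and the sample numbers are hypothetical and are not taken from the logs in this thread.

```python
def tokens_per_second(response):
    """Generation speed from a completed Ollama API response dict.

    Uses eval_count (tokens generated) and eval_duration (nanoseconds).
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)


# Hypothetical responses for illustration only: one run with flash attention
# enabled and one with it disabled.
with_flash_attention = {"eval_count": 250, "eval_duration": 20_000_000_000}
without_flash_attention = {"eval_count": 270, "eval_duration": 10_000_000_000}

for label, resp in [
    ("flash attention on", with_flash_attention),
    ("flash attention off", without_flash_attention),
]:
    print(f"{label}: {tokens_per_second(resp):.1f} tok/s")
```

Measuring the same prompt under both settings keeps the comparison fair, since prompt length and output length both affect the reported rate.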

GiteaMirror referenced this issue

2026-04-22 10:05:52 -05:00

[GH-ISSUE #7396] llava response inconcistency #30463

GiteaMirror referenced this issue

2026-04-28 18:56:03 -05:00

[GH-ISSUE #7396] llava response inconcistency #51214

GiteaMirror referenced this issue

2026-05-04 08:06:30 -05:00

[GH-ISSUE #7396] llava response inconcistency #66759

GiteaMirror referenced this issue

2026-05-09 14:00:03 -05:00

[GH-ISSUE #7396] llava response inconcistency #82383


Reference: github-starred/ollama#7396