[GH-ISSUE #13250] Cannot load model even though I have enough VRAM #55271

Closed
opened 2026-04-29 08:41:25 -05:00 by GiteaMirror · 11 comments

Originally created by @YetheSamartaka on GitHub (Nov 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13250

What is the issue?

I have two instances of Ollama running in Docker, one on port 11434 and the other on 11439. I have 3x RTX 4090, where the first is at 18/24 GB VRAM and the other two at 17/24 GB each. I want to load another model in the second instance, which will take 5.2 GB of VRAM (plenty of room to fit, since I have OLLAMA_SCHED_SPREAD enabled). The second Ollama instance does not have any models loaded. For both Docker containers, I'm using these env variables:

-e OLLAMA_MAX_LOADED_MODELS=3 -e OLLAMA_KEEP_ALIVE=-1 -e OLLAMA_SCHED_SPREAD=1 -e OLLAMA_FLASH_ATTENTION=1 -e OLLAMA_KV_CACHE_TYPE=q8_0 -e OLLAMA_LOAD_TIMEOUT=60m -e OLLAMA_ORIGINS="*"
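
For reference, a minimal sketch of how the second container might be started with these flags. Only the -e flags come from this report; the container name, volume, and port mapping are assumptions:

```shell
# Hypothetical launch of the second Ollama instance (host port 11439 -> 11434).
# Container name "ollama2" and volume "ollama2" are assumed for illustration.
docker run -d --gpus=all \
  -e OLLAMA_MAX_LOADED_MODELS=3 -e OLLAMA_KEEP_ALIVE=-1 \
  -e OLLAMA_SCHED_SPREAD=1 -e OLLAMA_FLASH_ATTENTION=1 \
  -e OLLAMA_KV_CACHE_TYPE=q8_0 -e OLLAMA_LOAD_TIMEOUT=60m \
  -e OLLAMA_ORIGINS="*" \
  -v ollama2:/root/.ollama -p 11439:11434 \
  --name ollama2 ollama/ollama
```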

This happens when I try to run the model qwen3-embedding-0.6b-q8_0 using the command:
ollama run qwen3-embedding-0.6b-q8_0:latest "Init"

I get this error:
Error: do load request: Post "http://127.0.0.1:38057/load": EOF

But I cannot load any other models either. I should note that I already have one instance of this model running on the first Ollama Docker instance on port 11434, and it works as expected. And if I free up memory, I can load two of these models across both Docker containers without any issues, and they work for my use case as expected (I then use [Nomyo Router](https://github.com/nomyo-ai/nomyo-router) to serve multiple instances of the same model for load balancing).

It seems that memory estimation is not working properly, even though the model should fit comfortably. More details in the log.
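
A quick way to cross-check the scheduler's estimates against what the driver actually reports, assuming nvidia-smi is available on the host or inside the containers:

```shell
# Per-GPU memory as seen by the NVIDIA driver; compare with the
# "gpu memory" lines in the Ollama log below.
nvidia-smi --query-gpu=index,name,memory.used,memory.free,memory.total --format=csv
```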

Relevant log output

time=2025-11-26T07:59:53.433Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35507"
time=2025-11-26T07:59:56.436Z level=INFO source=runner.go:449 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2025-11-26T07:59:56.437Z level=WARN source=runner.go:341 msg="unable to refresh free memory, using old values"
time=2025-11-26T08:00:04.012Z level=INFO source=sched.go:443 msg="system memory" total="125.6 GiB" free="125.4 GiB" free_swap="32.0 GiB"
time=2025-11-26T08:00:04.012Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-46e43120-44ac-f5d2-97b9-b220f8578118 library=CUDA available="8.6 GiB" free="9.1 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-26T08:00:04.012Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-9ee3a284-3456-57ff-f595-6f2e0b1db136 library=CUDA available="8.0 GiB" free="8.4 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-26T08:00:04.012Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-0978d7da-f602-d3ba-aedb-b466066378ac library=CUDA available="8.8 GiB" free="9.2 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-26T08:00:04.012Z level=INFO source=server.go:702 msg="loading model" "model layers"=29 requested=-1
time=2025-11-26T08:00:04.036Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-26T08:00:04.039Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:34863"
time=2025-11-26T08:00:04.047Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:16384 KvCacheType: NumThreads:64 GPULayers:29[ID:GPU-9ee3a284-3456-57ff-f595-6f2e0b1db136 Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-26T08:00:04.109Z level=INFO source=ggml.go:136 msg="" architecture=qwen3 file_type=Q8_0 name="Qwen3 Embedding 0.6b" description="" num_tensors=310 num_key_values=37
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, ID: GPU-46e43120-44ac-f5d2-97b9-b220f8578118
  Device 1: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, ID: GPU-9ee3a284-3456-57ff-f595-6f2e0b1db136
  Device 2: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, ID: GPU-0978d7da-f602-d3ba-aedb-b466066378ac
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-11-26T08:00:04.360Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 CUDA.2.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.2.USE_GRAPHS=1 CUDA.2.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
CUDA error: out of memory
current device: 1, in function ggml_cuda_set_device at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:101
  cudaSetDevice(device)
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:88: CUDA error
/usr/lib/ollama/libggml-base.so(+0x1a858)[0x73e63c125858]
/usr/lib/ollama/libggml-base.so(ggml_print_backtrace+0x1e6)[0x73e63c125c26]
/usr/lib/ollama/libggml-base.so(ggml_abort+0x11d)[0x73e63c125dad]
/usr/lib/ollama/cuda_v12/libggml-cuda.so(+0x1223a2)[0x73e5b24a63a2]
/usr/lib/ollama/cuda_v12/libggml-cuda.so(+0x1226fe)[0x73e5b24a66fe]
/usr/lib/ollama/cuda_v12/libggml-cuda.so(+0x122fd0)[0x73e5b24a6fd0]
/usr/bin/ollama(+0x10ea0de)[0x5781890ef0de]
/usr/bin/ollama(+0x10ecdd8)[0x5781890f1dd8]
/usr/bin/ollama(+0x1081fbb)[0x578189086fbb]
/usr/bin/ollama(+0x371aa1)[0x578188376aa1]
SIGABRT: abort
PC=0x73e684907b2c m=9 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 147 gp=0xc000525880 m=9 mp=0xc000580808 [syscall]:
runtime.cgocall(0x578189086fa0, 0xc0000490f8)
        runtime/cgocall.go:167 +0x4b fp=0xc0000490d0 sp=0xc000049098 pc=0x57818836bb0b
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_reserve(0x73e6187044f0, 0x73e613f00d20)
        _cgo_gotypes.go:996 +0x47 fp=0xc0000490f8 sp=0xc0000490d0 pc=0x5781887a0467
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve.func2(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0xc000ece3c0)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850 +0x125 fp=0xc000049370 sp=0xc0000490f8 pc=0x5781887ae165
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc00033f0e0, 0x1)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1162 +0xade fp=0xc0000496a0 sp=0xc000049370 pc=0x578188885c9e
github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0xc00033f0e0, {0x7ffef2403d54?, 0x5781886635fa?}, {0x0, 0x40, {0xc0001b9ac0, 0x1, 0x1}, 0x0}, {0x0, ...}, ...)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1219 +0x2b1 fp=0xc000049730 sp=0xc0000496a0 pc=0x5781888863d1
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc00033f0e0, {0x578189897140, 0xc0004715e0}, 0xc0005e5cc0)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1298 +0x54d fp=0xc000049ac0 sp=0xc000049730 pc=0x578188886e0d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0x578189897140?, 0xc0004715e0?}, 0xc00065bb40?)
        <autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x5781888891b6
net/http.HandlerFunc.ServeHTTP(0xc0005d8780?, {0x578189897140?, 0xc0004715e0?}, 0xc00065bb60?)
        net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x57818866e2c9
net/http.(*ServeMux).ServeHTTP(0x578188313ce5?, {0x578189897140, 0xc0004715e0}, 0xc0005e5cc0)
        net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x5781886701c4
net/http.serverHandler.ServeHTTP({0x578189893730?}, {0x578189897140?, 0xc0004715e0?}, 0x1?)
        net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x57818868dc4e
net/http.(*conn).serve(0xc0005a06c0, {0x578189899548, 0xc0006131d0})
        net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x57818866c7c5
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x578188672088
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x578188376e21
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3454 +0x485

...

goroutine 146 gp=0xc0005256c0 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x60?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00019fa90 sp=0xc00019fa70 pc=0x57818836ef8e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0xc00033f198, 0x0, 0x1, 0x0, 0x18)
        runtime/sema.go:188 +0x229 fp=0xc00019faf8 sp=0xc00019fa90 pc=0x57818834ef09
sync.runtime_SemacquireWaitGroup(0x0?)
        runtime/sema.go:110 +0x25 fp=0xc00019fb30 sp=0xc00019faf8 pc=0x5781883708c5
sync.(*WaitGroup).Wait(0xc00033f190?)
        sync/waitgroup.go:118 +0x48 fp=0xc00019fb58 sp=0xc00019fb30 pc=0x578188382768
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00033f0e0, {0x578189899580, 0xc0005dac30})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:441 +0x45 fp=0xc00019ffb8 sp=0xc00019fb58 pc=0x57818887f725
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1411 +0x28 fp=0xc00019ffe0 sp=0xc00019ffb8 pc=0x578188888dc8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00019ffe8 sp=0xc00019ffe0 pc=0x578188376e21
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1411 +0x4c9

goroutine 149 gp=0xc000525c00 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
        runtime/proc.go:435 +0xce fp=0xc00022f5d8 sp=0xc00022f5b8 pc=0x57818836ef8e
runtime.netpollblock(0x578188392638?, 0x883086c6?, 0x81?)
        runtime/netpoll.go:575 +0xf7 fp=0xc00022f610 sp=0xc00022f5d8 pc=0x5781883342b7
internal/poll.runtime_pollWait(0x73e63dd9ed58, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc00022f630 sp=0xc00022f610 pc=0x57818836e1a5
internal/poll.(*pollDesc).wait(0xc0005dd400?, 0xc0006132d1?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00022f658 sp=0xc00022f630 pc=0x5781883f60e7
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0005dd400, {0xc0006132d1, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc00022f6f0 sp=0xc00022f658 pc=0x5781883f73da
net.(*netFD).Read(0xc0005dd400, {0xc0006132d1?, 0x0?, 0x0?})
        net/fd_posix.go:55 +0x25 fp=0xc00022f738 sp=0xc00022f6f0 pc=0x57818846c3e5
net.(*conn).Read(0xc000190ab0, {0xc0006132d1?, 0x0?, 0x0?})
        net/net.go:194 +0x45 fp=0xc00022f780 sp=0xc00022f738 pc=0x57818847a7a5
net/http.(*connReader).backgroundRead(0xc0006132c0)
        net/http/server.go:690 +0x37 fp=0xc00022f7c8 sp=0xc00022f780 pc=0x578188666697
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x25 fp=0xc00022f7e0 sp=0xc00022f7c8 pc=0x5781886665c5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00022f7e8 sp=0xc00022f7e0 pc=0x578188376e21
created by net/http.(*connReader).startBackgroundRead in goroutine 147
        net/http/server.go:686 +0xb6

rax    0x0
rbx    0xb9
rcx    0x73e684907b2c
rdx    0x6
rdi    0xb1
rsi    0xb9
rbp    0x73e635ffa310
rsp    0x73e635ffa2d0
r8     0x0
r9     0x7
r10    0x8
r11    0x246
r12    0x6
r13    0x73e5b2adaa88
r14    0x16
r15    0xc000616b20
rip    0x73e684907b2c
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
time=2025-11-26T08:00:04.512Z level=INFO source=sched.go:470 msg="Load failed" model=/root/.ollama/models/blobs/sha256-06507c7b42688469c4e7298b0a1e16deff06caf291cf0a5b278c308249c3e439 error="do load request: Post \"http://127.0.0.1:34863/load\": EOF"
time=2025-11-26T08:00:04.513Z level=ERROR source=server.go:265 msg="llama runner terminated" error="exit status 2"

OS

WSL2

GPU

Nvidia

CPU

AMD

Ollama version

0.13.0

GiteaMirror added the "bug" and "needs more info" labels 2026-04-29 08:41:25 -05:00

@rick-github commented on GitHub (Nov 26, 2025):

$ ollama pull qwen3-embedding-0.6b-q8_0:latest
pulling manifest 
Error: pull model manifest: file does not exist

Is it the same model as qwen3-embedding:0.6b-q8_0? What changes are in the Modelfile?


@YetheSamartaka commented on GitHub (Nov 26, 2025):

@rick-github
Yes, it is the same model: https://ollama.com/library/qwen3-embedding:0.6b-q8_0. I also have a variant with 16K ctx, with this in the Modelfile:

FROM qwen3-embedding:0.6b-q8_0
PARAMETER num_ctx 16384
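
For context, a sketch of how such a variant would typically be built from that Modelfile (the target tag is assumed to match the one used in the original report):

```shell
# Build the 16K-context variant from the Modelfile above.
ollama create qwen3-embedding-0.6b-q8_0 -f Modelfile
```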

@rick-github commented on GitHub (Nov 26, 2025):

What version of ollama (ollama -v)?


@YetheSamartaka commented on GitHub (Nov 26, 2025):

@rick-github
edit: I mistakenly put 1 instead of 0 at the beginning xD

0.13.0, as stated in the opening message.


@adex345 commented on GitHub (Nov 27, 2025):

I have the same issue, but I use the default port and no Docker. After the update I can finally use Llama 3B with ROCm 7.1 (it was using the CPU before), but Qwen3 30B and Qwen3 Coder 30B don't work:

lis 27 19:25:54 cachyos-x64 ollama[1144524]: time=2025-11-27T19:25:54.665+01:00 level=INFO source=sched.go:470 msg="Load failed" model=/var/lib/ollama/blobs/sha256-78b329e716e7e9775973d392cd132b1f1ff1c8287a992887caeb6fd6c56ba9cc error="do load request: Post \"http://127.0.0.1:34163/load\": EOF"
lis 27 19:25:54 cachyos-x64 ollama[1144524]: time=2025-11-27T19:25:54.665+01:00 level=DEBUG source=server.go:1755 msg="stopping llama server" pid=1144796
lis 27 19:25:54 cachyos-x64 ollama[1144524]: time=2025-11-27T19:25:54.665+01:00 level=DEBUG source=server.go:1761 msg="waiting for llama server to exit" pid=1144796
lis 27 19:25:54 cachyos-x64 ollama[1144524]: time=2025-11-27T19:25:54.666+01:00 level=ERROR source=server.go:265 msg="llama runner terminated" error="exit status 2"
lis 27 19:25:54 cachyos-x64 ollama[1144524]: time=2025-11-27T19:25:54.666+01:00 level=DEBUG source=server.go:1765 msg="llama server stopped" pid=1144796
lis 27 19:25:54 cachyos-x64 ollama[1144524]: [GIN] 2025/11/27 - 19:25:54 | 500 |  613.560082ms |       127.0.0.1 | POST     "/api/generate"

@YetheSamartaka commented on GitHub (Dec 8, 2025):

Issue is still present on version 0.13.1


@adex345 commented on GitHub (Dec 8, 2025):

> Issue is still present on version 0.13.1

I can confirm. A quick workaround is to use CPU only or Vulkan, but CPU only is quicker on my system (Vulkan 9 t/s, CPU only 15 t/s, 50% ROCm with 50% CPU 30 t/s).
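
As a sketch of the CPU-only workaround: GPU offload can be disabled per request via the num_gpu option, which sets the number of layers placed on the GPU (the endpoint, model tag, and prompt here are assumptions for this setup):

```shell
# Force CPU-only inference for one request by offloading zero layers.
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen3:30b",
  "prompt": "Init",
  "options": { "num_gpu": 0 }
}'
```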


@rick-github commented on GitHub (Jan 14, 2026):

Unable to repro. Set OLLAMA_DEBUG=2 in the server environment and post the full log. Note that this will include the prompt, so be aware of PII.
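
For the setups in this thread, enabling debug logging might look like the following (container and service names are assumptions):

```shell
# Docker: recreate the container with OLLAMA_DEBUG=2 and capture the log.
# (Re-add the original -e flags from the issue as needed.)
docker rm -f ollama2
docker run -d --gpus=all -e OLLAMA_DEBUG=2 \
  -v ollama2:/root/.ollama -p 11439:11434 --name ollama2 ollama/ollama
docker logs -f ollama2 2>&1 | tee ollama-debug.log

# systemd: add the variable via an override, restart, and follow the journal.
sudo systemctl edit ollama    # add: Environment="OLLAMA_DEBUG=2"
sudo systemctl restart ollama
journalctl -u ollama -f
```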


@YetheSamartaka commented on GitHub (Jan 14, 2026):

@rick-github with the new 0.14.0 version of Ollama?


@rick-github commented on GitHub (Jan 14, 2026):

Does it fail with 0.14.0?


@YetheSamartaka commented on GitHub (Jan 15, 2026):

@rick-github I confirm that with 0.14.1 it is working much better, and I was able to load all those models into VRAM. Thank you very much for fixing it.
