[GH-ISSUE #5359] Both Gemma2 models fail with cudaMalloc error despite available GPU memory, while other models run successfully. #29116

Closed
opened 2026-04-22 07:46:07 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @chirag-urb on GitHub (Jun 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5359

What is the issue?

Archlinux 6.6.35-2-lts
Ollama version 0.1.47
Latest ollama-cuda installed via pacman. The Ollama system service is active. All other models I have work as expected. Both gemma2 9b and 27b give me the same error. RAM is not the issue; I can run mixtral8x7b.

Hardware

  • CPU: 5800HS
  • GPU: RTX 3050 mobile 4GB
  • RAM: 40GB
  • SWAP: 25GB
$ ollama run gemma2    
Error: llama runner process has terminated: signal: aborted (core dumped) error:failed to create context with model '/var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5'
$ ollama run gemma2:27b 
Error: llama runner process has terminated: signal: aborted (core dumped) error:failed to create context with model '/var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72'

GiteaMirror added the bug label 2026-04-22 07:46:07 -05:00
Author
Owner

@chirag-urb commented on GitHub (Jun 28, 2024):

It seems like there's an issue with VRAM. But why can't this model be run on my system when it's possible to run Mixtral with reasonable performance? Is it possible to change some parameters to run it?

Jun 28 13:53:46 archlinux ollama[1036]: ggml_cuda_init: found 1 CUDA devices:
Jun 28 13:53:46 archlinux ollama[1036]:   Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
Jun 28 13:53:46 archlinux ollama[1036]: llm_load_tensors: ggml ctx size =    0.49 MiB
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors: offloading 8 repeating layers to GPU
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors: offloaded 8/47 layers to GPU
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors:        CPU buffer size = 14898.60 MiB
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors:      CUDA0 buffer size =  2430.56 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: n_ctx      = 2048
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: n_batch    = 512
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: n_ubatch   = 512
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: flash_attn = 0
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: freq_base  = 10000.0
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: freq_scale = 1
Jun 28 13:53:55 archlinux ollama[1036]: llama_kv_cache_init:  CUDA_Host KV buffer size =   608.00 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_kv_cache_init:      CUDA0 KV buffer size =   128.00 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: KV self size  =  736.00 MiB, K (f16):  368.00 MiB, V (f16):  368.00 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model:  CUDA_Host  output buffer size =     0.99 MiB
Jun 28 13:53:55 archlinux ollama[1036]: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1431.85 MiB on device 0: cudaMalloc failed: out of memory
Jun 28 13:53:55 archlinux ollama[1036]: ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 1501405184
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: failed to allocate compute buffers
Jun 28 13:53:55 archlinux ollama[1036]: llama_init_from_gpt_params: error: failed to create context with model '/var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72'
Jun 28 13:53:56 archlinux ollama[16132]: ERROR [load_model] unable to load model | model="/var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72" tid="134527161749504" timestamp=1719575636
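The failure arithmetic is visible in the log above: the weights already offloaded to the GPU, the GPU's share of the KV cache, and the compute buffer nearly fill the 4 GB card before any CUDA context overhead is counted. A rough sanity check of those numbers (the overhead figure is an assumption, not from the log):

```python
# VRAM figures taken from the log above (MiB)
weights_gpu = 2430.56   # llm_load_tensors: CUDA0 buffer size
kv_gpu = 128.00         # llama_kv_cache_init: CUDA0 KV buffer size
compute_buf = 1431.85   # the cudaMalloc that failed (compute buffer)
vram_total = 4096.0     # RTX 3050 mobile

# Assumed, not from the log: the CUDA context/runtime itself typically
# costs a few hundred MiB of VRAM.
assumed_overhead = 300.0

needed = weights_gpu + kv_gpu + compute_buf + assumed_overhead
print(f"needed ~ {needed:.2f} MiB of {vram_total:.0f} MiB")
# The compute-buffer allocation is the one that pushes the total past
# the card's capacity, so that is where cudaMalloc reports OOM.
```

This is why the card looks nearly empty before the load: the failure happens partway through allocation, not at steady state.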
Author
Owner

@chirag-urb commented on GitHub (Jun 28, 2024):

gemma2s

FROM gemma2
PARAMETER num_ctx 1024

The model does not load with a smaller num_ctx of 1024 either; same error.
It says out of memory, but it's not: nearly the entire 4096 MB is available (13/4096 MB used).

Author
Owner

@rick-github commented on GitHub (Jun 28, 2024):

https://github.com/ollama/ollama/commit/1ed4f521c403025050c509394fb4ac3ca2466865 resolves (for me) the problem of OOM during model load. You can get the model to load without this patch by setting num_gpu lower (search the logs for --n-gpu-layers to see the default value for your config).

Author
Owner

@chirag-urb commented on GitHub (Jun 28, 2024):

Thanks. It worked.
And where can I find all the parameters? The parameter you mentioned is not listed here: https://github.com/ollama/ollama/blob/main/docs/modelfile.md

Defaults

  • 8 GPU layers for gemma2:27b
  • 22 GPU layers for gemma2:9b

What worked

  • 6 GPU layers for gemma2:27b
  • 19 GPU layers for gemma2:9b
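The gap between the default and the working layer counts is small because each offloaded layer of the 27b model costs roughly 300 MiB of VRAM at this quantization. A back-of-envelope estimate from the earlier log figures (a sketch; the per-layer cost is inferred from the log, not reported directly):

```python
# From the earlier log: offloading 8 layers of gemma2:27b used
# 2430.56 MiB of weight memory on the GPU.
weights_for_8_layers = 2430.56
per_layer = weights_for_8_layers / 8   # ~304 MiB per layer

# Dropping from the default 8 layers to the working 6 frees roughly
# two layers' worth of VRAM for the compute buffer:
freed = 2 * per_layer
print(f"per layer ~ {per_layer:.1f} MiB, freed by 8 -> 6 ~ {freed:.1f} MiB")
```

That ~600 MiB, together with the smaller per-layer compute footprint, is enough headroom for the allocation that previously failed.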

gemma2l

FROM gemma2:27b
PARAMETER num_gpu 6
ollama create gemma2l -f pathtofile/gemma2l

gemma2s

FROM gemma2
PARAMETER num_gpu 19
ollama create gemma2s -f pathtofile/gemma2s

Logs

Jun 28 13:40:14 archlinux ollama[211707]: time=2024-06-28T13:40:14.215+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama4270095903/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 33021"
Jun 28 13:40:36 archlinux ollama[211707]: time=2024-06-28T13:40:36.915+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama4270095903/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 18 --parallel 1 --port 36953"
Jun 28 13:40:51 archlinux ollama[211707]: time=2024-06-28T13:40:51.293+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama4270095903/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 8 --parallel 1 --port 33949"
Jun 28 13:44:57 archlinux ollama[1036]: time=2024-06-28T13:44:57.033+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 41111"
Jun 28 13:45:39 archlinux ollama[1036]: time=2024-06-28T13:45:39.994+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 41445"
Jun 28 13:53:46 archlinux ollama[1036]: time=2024-06-28T13:53:46.235+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 8 --parallel 1 --port 45083"
Jun 28 14:08:13 archlinux ollama[1036]: time=2024-06-28T14:08:13.101+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-b26e6713dc749dda35872713fa19a568040f475cc71cb132cff332fe7e216462 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 31 --parallel 1 --port 41101"
Jun 28 14:11:10 archlinux ollama[1036]: time=2024-06-28T14:11:10.351+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 42557"
Jun 28 14:23:09 archlinux ollama[1036]: time=2024-06-28T14:23:09.476+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 43707"
Jun 28 14:26:33 archlinux ollama[1036]: time=2024-06-28T14:26:33.030+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 1024 --batch-size 512 --embedding --log-disable --n-gpu-layers 25 --parallel 1 --port 45287"
Jun 28 14:27:35 archlinux ollama[1036]: time=2024-06-28T14:27:35.763+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 18 --parallel 1 --port 43549"
Jun 28 15:24:47 archlinux ollama[1036]: time=2024-06-28T15:24:47.997+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 35471"
Author
Owner

@rick-github commented on GitHub (Jun 28, 2024):

The available options are listed in the API doc: https://github.com/ollama/ollama/blob/1ed4f521c403025050c509394fb4ac3ca2466865/docs/api.md?plain=1#L288
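For a one-off run, the same knob can also be supplied per request through the API's options field instead of baking it into a Modelfile. A sketch of the request body for /api/generate, using the layer count that worked in this thread (assuming num_gpu is honored per-request as that options list describes):

```json
{
  "model": "gemma2:27b",
  "prompt": "Hello",
  "options": { "num_gpu": 6 }
}
```

This avoids creating a separate model variant when experimenting with different layer counts.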

Author
Owner

@chirag-urb commented on GitHub (Jun 28, 2024):

Thanks.

Reference: github-starred/ollama#29116