[GH-ISSUE #3940] GPU offloading with little CPU RAM #64481

Closed
opened 2026-05-03 17:48:55 -05:00 by GiteaMirror · 17 comments

Originally created by @dcfidalgo on GitHub (Apr 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3940

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

Thanks for this amazing project, I really enjoy the simple, concise and easy-to-start interface! Keep up the fantastic work!

I have the following issue: I have a compute instance in the cloud with one NVIDIA A100 80GB and 16GB of CPU memory running Ubuntu.

When I try to run the llama3:70b model, it takes the ollama server a long time to load the model onto the GPU, and as a result I get an "Error: timed out waiting for llama runner to start" on the `ollama run llama3:70b` command after 10 min (I could not figure out how to increase this timeout).

I noticed that ollama first tries to load the whole model into the page cache; in my case, however, it does not fit entirely. Offloading to the GPU only begins after the entire model has been read once. My guess is that, since the initial pages get overwritten, it has to read the entire model again from disk.

I was wondering if there is a way to start the offloading right from the beginning. Not sure if this is even possible, but I think in my case it would help.
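
To check whether the page cache really is being filled and evicted, one way would be to print the kernel's `Cached` figure from `/proc/meminfo` every few seconds while the model loads; if the theory holds, it should climb to roughly all free RAM and then plateau while the disk keeps reading. A minimal Go sketch, just for observation (not part of ollama, Linux-only):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// Prints the page-cache size from /proc/meminfo every 5 seconds.
func main() {
	for {
		f, err := os.Open("/proc/meminfo")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			// The "Cached:" line reports page-cache usage in kB.
			if strings.HasPrefix(sc.Text(), "Cached:") {
				fmt.Printf("%s  %s\n", time.Now().Format("15:04:05"), sc.Text())
			}
		}
		f.Close()
		time.Sleep(5 * time.Second)
	}
}
```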

This is the log of the server:

```shell
...
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: ssm_d_state      = 0
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: ssm_dt_rank      = 0
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: model type       = 70B
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: model ftype      = Q4_0
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: model params     = 70.55 B
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: model size       = 37.22 GiB (4.53 BPW)
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: general.name     = Meta-Llama-3-70B-Instruct
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_print_meta: LF token         = 128 'Ä'
Apr 26 10:29:40 qa-mpcdf ollama[7668]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   yes
Apr 26 10:29:40 qa-mpcdf ollama[7668]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
Apr 26 10:29:40 qa-mpcdf ollama[7668]: ggml_cuda_init: found 1 CUDA devices:
Apr 26 10:29:40 qa-mpcdf ollama[7668]:   Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Apr 26 10:29:40 qa-mpcdf ollama[7668]: llm_load_tensors: ggml ctx size =    0.55 MiB
Apr 26 10:32:26 qa-mpcdf ollama[7668]: time=2024-04-26T10:32:26.839Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:35651/health\>
Apr 26 10:32:27 qa-mpcdf ollama[7668]: time=2024-04-26T10:32:27.049Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:35:11 qa-mpcdf ollama[7668]: time=2024-04-26T10:35:11.913Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:35651/health\>
Apr 26 10:35:12 qa-mpcdf ollama[7668]: time=2024-04-26T10:35:12.122Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:35:52 qa-mpcdf ollama[7668]: time=2024-04-26T10:35:52.419Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:35651/health\>
Apr 26 10:35:52 qa-mpcdf ollama[7668]: time=2024-04-26T10:35:52.620Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:36:07 qa-mpcdf ollama[7668]: llm_load_tensors: offloading 80 repeating layers to GPU
Apr 26 10:36:07 qa-mpcdf ollama[7668]: llm_load_tensors: offloading non-repeating layers to GPU
Apr 26 10:36:07 qa-mpcdf ollama[7668]: llm_load_tensors: offloaded 81/81 layers to GPU
Apr 26 10:36:07 qa-mpcdf ollama[7668]: llm_load_tensors:        CPU buffer size =   563.62 MiB
Apr 26 10:36:07 qa-mpcdf ollama[7668]: llm_load_tensors:      CUDA0 buffer size = 37546.98 MiB
Apr 26 10:36:18 qa-mpcdf ollama[7668]: .....time=2024-04-26T10:36:18.482Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:35651/he>
Apr 26 10:36:18 qa-mpcdf ollama[7668]: time=2024-04-26T10:36:18.683Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:36:51 qa-mpcdf ollama[7668]: .........time=2024-04-26T10:36:51.360Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:3565>
Apr 26 10:36:51 qa-mpcdf ollama[7668]: time=2024-04-26T10:36:51.561Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:38:43 qa-mpcdf ollama[7668]: ............................time=2024-04-26T10:38:43.051Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"ht>
Apr 26 10:38:43 qa-mpcdf ollama[7668]: time=2024-04-26T10:38:43.251Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:39:07 qa-mpcdf ollama[7668]: .......time=2024-04-26T10:39:07.311Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:35651/>
Apr 26 10:39:07 qa-mpcdf ollama[7668]: time=2024-04-26T10:39:07.513Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:39:24 qa-mpcdf ollama[7668]: ....time=2024-04-26T10:39:24.763Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:35651/hea>
Apr 26 10:39:24 qa-mpcdf ollama[7668]: time=2024-04-26T10:39:24.964Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
Apr 26 10:39:39 qa-mpcdf ollama[7668]: ....time=2024-04-26T10:39:39.396Z level=ERROR source=routes.go:120 msg="error loading llama server" error="timed out waiting for llama runner to start>
Apr 26 10:39:39 qa-mpcdf ollama[7668]: time=2024-04-26T10:39:39.396Z level=DEBUG source=server.go:832 msg="stopping llama server"
Apr 26 10:39:39 qa-mpcdf ollama[7668]: [GIN] 2024/04/26 - 10:39:39 | 500 |         10m1s |       127.0.0.1 | POST     "/api/chat"
```

Thanks again and have a great day!

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.1.32

GiteaMirror added the feature request label 2026-05-03 17:48:56 -05:00

@dcfidalgo commented on GitHub (Apr 26, 2024):

On Discord, lewismac pointed out that one could modify the `expiresAt` variable:

> Looks like `expiresAt` may be too short in your case, in the `server.go` function `WaitUntilRunning`:
>
> ```go
> func (s *LlamaServer) WaitUntilRunning() error {
> 	start := time.Now()
> 	// TODO we need to wire up a better way to detect hangs during model load and startup of the server
> 	expiresAt := time.Now().Add(10 * time.Minute) // be generous with timeout, large models can take a while to load
> 	ticker := time.NewTicker(50 * time.Millisecond)
> 	defer ticker.Stop()
> ```
>
> I guess you could try building your own with a larger timeout?

Would you be interested in a PR making the `expiresAt` variable configurable via an environment variable, like `debug` for example? I would be more than happy to provide one.


@dhiltgen commented on GitHub (May 1, 2024):

I think we'd be open to allowing users to override this via env var. Something like `OLLAMA_LOAD_TIMEOUT` perhaps? Go for it!
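
For reference, a minimal sketch of what such an override could look like, based on the `WaitUntilRunning` snippet quoted above; the variable name follows the suggestion here, and everything else is an assumption rather than the shipped implementation:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// loadTimeout returns the model-load timeout. It honors the (hypothetical)
// OLLAMA_LOAD_TIMEOUT environment variable, e.g. "30m" or "1h", and falls
// back to the 10-minute default currently hard-coded in WaitUntilRunning.
func loadTimeout() time.Duration {
	if v := os.Getenv("OLLAMA_LOAD_TIMEOUT"); v != "" {
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			return d
		}
	}
	return 10 * time.Minute
}

func main() {
	// In server.go this would replace the hard-coded expiresAt line.
	expiresAt := time.Now().Add(loadTimeout())
	fmt.Println("llama runner must start before", expiresAt.Format(time.RFC3339))
}
```

With a change along these lines, a slow-disk setup could presumably be started with something like `OLLAMA_LOAD_TIMEOUT=30m ollama serve`.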


@dhiltgen commented on GitHub (May 31, 2024):

@dcfidalgo can you try llama3:70b on your setup with 0.1.39 and see how it behaves? If it still has difficulty, please run the server with OLLAMA_DEBUG=1 and share the logs so I can see how it's hitting the timeout and if any additional adjustments are necessary.


@pierocor commented on GitHub (Jun 6, 2024):

@dhiltgen I replicated @dcfidalgo's setup using ollama 0.1.41 in debug mode. Unfortunately, the issue persists: after 5 minutes the timeout kills the process.

```shell
$ time ollama run llama3:70b
Error: timed out waiting for llama runner to start - progress 0.00 -

real	5m1.851s
user	0m0.399s
sys	0m0.628s
```

Here are the full logs:

```shell
Jun 06 11:17:03 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:17:03 | 200 |    3.588541ms |       127.0.0.1 | HEAD     "/"
Jun 06 11:17:03 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:17:03 | 200 |   56.794324ms |       127.0.0.1 | POST     "/api/show"
Jun 06 11:17:03 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:17:03 | 200 |    1.760612ms |       127.0.0.1 | POST     "/api/show"
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.749Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.749Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.749Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.755Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:17:03 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.755Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.755Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:17:03 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:17:03 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:17:03 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:17:04 mygpu ollama[243665]: time=2024-06-06T11:17:04.033Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:17:04 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:17:04 mygpu ollama[243665]: time=2024-06-06T11:17:04.034Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc00004ac40), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.131Z level=DEBUG source=sched.go:153 msg="loading first model" model=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.132Z level=DEBUG source=memory.go:44 msg=evaluating library=cuda gpu_count=1 available="78.7 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.134Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=81 memory.available="78.7 GiB" memory.required.full="38.5 GiB" memory.required.partial="38.5 GiB" memory.required.kv="640.0 MiB" memory.weights.total="36.7 GiB" memory.weights.repeating="35.9 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="324.0 MiB" memory.graph.partial="1.1 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.134Z level=DEBUG source=sched.go:565 msg="new model will fit in available VRAM in single GPU, loading" model=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d gpu=GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a available=84495237120 required="38.5 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.134Z level=DEBUG source=memory.go:44 msg=evaluating library=cuda gpu_count=1 available="78.7 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.135Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=81 memory.available="78.7 GiB" memory.required.full="38.5 GiB" memory.required.partial="38.5 GiB" memory.required.kv="640.0 MiB" memory.weights.total="36.7 GiB" memory.weights.repeating="35.9 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="324.0 MiB" memory.graph.partial="1.1 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx2
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cuda_v11
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/rocm_v60002
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx2
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cuda_v11
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/rocm_v60002
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama1403192506/runners/cuda_v11/ollama_llama_server --model /data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 81 --verbose --parallel 1 --port 39905"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=server.go:356 msg=subprocess environment="[PATH=/home/pierocor/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/tmp/ollama1403192506/runners/cuda_v11 CUDA_VISIBLE_DEVICES=GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a]"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.139Z level=INFO source=sched.go:338 msg="loaded runners" count=1
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.141Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.141Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
Jun 06 11:17:05 mygpu ollama[244275]: INFO [main] build info | build=1 commit="5921b8f" tid="139772047642624" timestamp=1717672625
Jun 06 11:17:05 mygpu ollama[244275]: INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139772047642624" timestamp=1717672625 total_threads=8
Jun 06 11:17:05 mygpu ollama[244275]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="39905" tid="139772047642624" timestamp=1717672625
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: loaded meta data with 22 key-value pairs and 723 tensors from /data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d (version GGUF V3 (latest))
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-70B-Instruct
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   2:                          llama.block_count u32              = 80
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   4:                     llama.embedding_length u32              = 8192
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 28672
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 64
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  10:                          general.file_type u32              = 2
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.393Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv  21:               general.quantization_version u32              = 2
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - type  f32:  161 tensors
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - type q4_0:  561 tensors
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - type q6_K:    1 tensors
Jun 06 11:17:05 mygpu ollama[243665]: llm_load_vocab: special tokens cache size = 256
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_vocab: token to piece cache size = 1.5928 MB
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: format           = GGUF V3 (latest)
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: arch             = llama
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: vocab type       = BPE
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_vocab          = 128256
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_merges         = 280147
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_ctx_train      = 8192
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd           = 8192
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_head           = 64
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_head_kv        = 8
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_layer          = 80
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_rot            = 128
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_head_k    = 128
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_head_v    = 128
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_gqa            = 8
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_k_gqa     = 1024
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_v_gqa     = 1024
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_ff             = 28672
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_expert         = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_expert_used    = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: causal attn      = 1
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: pooling type     = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: rope type        = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: rope scaling     = linear
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: freq_base_train  = 500000.0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: freq_scale_train = 1
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_yarn_orig_ctx  = 8192
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: rope_finetuned   = unknown
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_d_conv       = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_d_inner      = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_d_state      = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_dt_rank      = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model type       = 70B
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model ftype      = Q4_0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model params     = 70.55 B
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model size       = 37.22 GiB (4.53 BPW)
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: general.name     = Meta-Llama-3-70B-Instruct
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: LF token         = 128 'Ä'
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
Jun 06 11:17:06 mygpu ollama[243665]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   yes
Jun 06 11:17:06 mygpu ollama[243665]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
Jun 06 11:17:06 mygpu ollama[243665]: ggml_cuda_init: found 1 CUDA devices:
Jun 06 11:17:06 mygpu ollama[243665]:   Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_tensors: ggml ctx size =    0.74 MiB
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.192Z level=ERROR source=sched.go:344 msg="error loading llama server" error="timed out waiting for llama runner to start - progress 0.00 - "
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.207Z level=DEBUG source=sched.go:347 msg="triggering expiration for failed load" model=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.210Z level=DEBUG source=sched.go:258 msg="runner expired event received" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.212Z level=DEBUG source=sched.go:274 msg="got lock to unload" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.215Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.215Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.217Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:05 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:22:05 | 500 |          5m1s |       127.0.0.1 | POST     "/api/chat"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.235Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:05 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.260Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.260Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:05 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:05 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80157 mb
Jun 06 11:22:05 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.597Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:05 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.597Z level=DEBUG source=server.go:990 msg="stopping llama server"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.598Z level=DEBUG source=server.go:996 msg="waiting for llama server to exit"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.847Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.848Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.848Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.854Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:05 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.855Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.855Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.090Z level=DEBUG source=server.go:1000 msg="llama server stopped"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.090Z level=DEBUG source=sched.go:279 msg="runner released" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.179Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:06 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.179Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.179Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.179Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.181Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:06 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.181Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.181Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.437Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:06 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.437Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.437Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.437Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.438Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:06 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.439Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.439Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.689Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:06 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.689Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.689Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.689Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.690Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:06 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.690Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.690Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.939Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:06 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.939Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.939Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.939Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.946Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:06 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.946Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.946Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.199Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:07 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.199Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.199Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.199Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.202Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:07 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.202Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.202Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.450Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:07 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.450Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.450Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.450Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.452Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:07 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.452Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.452Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.699Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:07 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.699Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.699Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.699Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.701Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:07 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.701Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.701Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:07 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.951Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:07 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.951Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.951Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.951Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.952Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:07 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.952Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:07 mygpu ollama[243665]: time=2024-06-06T11:22:07.952Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:08 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:08 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:08 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:08 mygpu ollama[243665]: time=2024-06-06T11:22:08.210Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:08 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:08 mygpu ollama[243665]: time=2024-06-06T11:22:08.210Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:08 mygpu ollama[243665]: time=2024-06-06T11:22:08.210Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:08 mygpu ollama[243665]: time=2024-06-06T11:22:08.210Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:08 mygpu ollama[243665]: time=2024-06-06T11:22:08.211Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:08 mygpu ollama[243665]: CUDA driver version: 11.4
<!-- gh-comment-id:2152181864 -->
@pierocor commented on GitHub (Jun 6, 2024):

@dhiltgen I replicated the setup by @dcfidalgo using ollama 0.1.41 in debug mode. Unfortunately, it seems the issue persists. After 5 minutes the timeout kills the process.

```
$ time ollama run llama3:70b
Error: timed out waiting for llama runner to start - progress 0.00 -

real    5m1.851s
user    0m0.399s
sys     0m0.628s
```

Here are the full logs (the GPU-detection block that the scheduler polls in a tight loop after the failure is shown once and then elided):

```
Jun 06 11:17:03 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:17:03 | 200 | 3.588541ms | 127.0.0.1 | HEAD "/"
Jun 06 11:17:03 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:17:03 | 200 | 56.794324ms | 127.0.0.1 | POST "/api/show"
Jun 06 11:17:03 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:17:03 | 200 | 1.760612ms | 127.0.0.1 | POST "/api/show"
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.749Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.749Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.749Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.755Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:17:03 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.755Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:17:03 mygpu ollama[243665]: time=2024-06-06T11:17:03.755Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:17:03 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:17:03 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:17:03 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:17:04 mygpu ollama[243665]: time=2024-06-06T11:17:04.033Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:17:04 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:17:04 mygpu ollama[243665]: time=2024-06-06T11:17:04.034Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc00004ac40), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.131Z level=DEBUG source=sched.go:153 msg="loading first model" model=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.132Z level=DEBUG source=memory.go:44 msg=evaluating library=cuda gpu_count=1 available="78.7 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.134Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=81 memory.available="78.7 GiB" memory.required.full="38.5 GiB" memory.required.partial="38.5 GiB" memory.required.kv="640.0 MiB" memory.weights.total="36.7 GiB" memory.weights.repeating="35.9 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="324.0 MiB" memory.graph.partial="1.1 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.134Z level=DEBUG source=sched.go:565 msg="new model will fit in available VRAM in single GPU, loading" model=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d gpu=GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a available=84495237120 required="38.5 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.134Z level=DEBUG source=memory.go:44 msg=evaluating library=cuda gpu_count=1 available="78.7 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.135Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=81 memory.available="78.7 GiB" memory.required.full="38.5 GiB" memory.required.partial="38.5 GiB" memory.required.kv="640.0 MiB" memory.weights.total="36.7 GiB" memory.weights.repeating="35.9 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="324.0 MiB" memory.graph.partial="1.1 GiB"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx2
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.136Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cuda_v11
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/rocm_v60002
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cpu_avx2
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/cuda_v11
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1403192506/runners/rocm_v60002
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama1403192506/runners/cuda_v11/ollama_llama_server --model /data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 81 --verbose --parallel 1 --port 39905"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.137Z level=DEBUG source=server.go:356 msg=subprocess environment="[PATH=/home/pierocor/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/tmp/ollama1403192506/runners/cuda_v11 CUDA_VISIBLE_DEVICES=GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a]"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.139Z level=INFO source=sched.go:338 msg="loaded runners" count=1
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.141Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.141Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
Jun 06 11:17:05 mygpu ollama[244275]: INFO [main] build info | build=1 commit="5921b8f" tid="139772047642624" timestamp=1717672625
Jun 06 11:17:05 mygpu ollama[244275]: INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139772047642624" timestamp=1717672625 total_threads=8
Jun 06 11:17:05 mygpu ollama[244275]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="39905" tid="139772047642624" timestamp=1717672625
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: loaded meta data with 22 key-value pairs and 723 tensors from /data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d (version GGUF V3 (latest))
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 0: general.architecture str = llama
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 1: general.name str = Meta-Llama-3-70B-Instruct
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 2: llama.block_count u32 = 80
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 3: llama.context_length u32 = 8192
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 4: llama.embedding_length u32 = 8192
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 5: llama.feed_forward_length u32 = 28672
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 6: llama.attention.head_count u32 = 64
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 10: general.file_type u32 = 2
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 06 11:17:05 mygpu ollama[243665]: time=2024-06-06T11:17:05.393Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - kv 21: general.quantization_version u32 = 2
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - type f32: 161 tensors
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - type q4_0: 561 tensors
Jun 06 11:17:05 mygpu ollama[243665]: llama_model_loader: - type q6_K: 1 tensors
Jun 06 11:17:05 mygpu ollama[243665]: llm_load_vocab: special tokens cache size = 256
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_vocab: token to piece cache size = 1.5928 MB
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: format = GGUF V3 (latest)
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: arch = llama
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: vocab type = BPE
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_vocab = 128256
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_merges = 280147
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_ctx_train = 8192
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd = 8192
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_head = 64
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_head_kv = 8
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_layer = 80
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_rot = 128
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_head_k = 128
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_head_v = 128
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_gqa = 8
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_k_gqa = 1024
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_embd_v_gqa = 1024
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_norm_eps = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: f_logit_scale = 0.0e+00
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_ff = 28672
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_expert = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_expert_used = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: causal attn = 1
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: pooling type = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: rope type = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: rope scaling = linear
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: freq_base_train = 500000.0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: freq_scale_train = 1
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: n_yarn_orig_ctx = 8192
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: rope_finetuned = unknown
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_d_conv = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_d_inner = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_d_state = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: ssm_dt_rank = 0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model type = 70B
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model ftype = Q4_0
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model params = 70.55 B
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: model size = 37.22 GiB (4.53 BPW)
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: general.name = Meta-Llama-3-70B-Instruct
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: LF token = 128 'Ä'
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
Jun 06 11:17:06 mygpu ollama[243665]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
Jun 06 11:17:06 mygpu ollama[243665]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
Jun 06 11:17:06 mygpu ollama[243665]: ggml_cuda_init: found 1 CUDA devices:
Jun 06 11:17:06 mygpu ollama[243665]:   Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Jun 06 11:17:06 mygpu ollama[243665]: llm_load_tensors: ggml ctx size = 0.74 MiB
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.192Z level=ERROR source=sched.go:344 msg="error loading llama server" error="timed out waiting for llama runner to start - progress 0.00 - "
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.207Z level=DEBUG source=sched.go:347 msg="triggering expiration for failed load" model=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.210Z level=DEBUG source=sched.go:258 msg="runner expired event received" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.212Z level=DEBUG source=sched.go:274 msg="got lock to unload" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.215Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.215Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.217Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:05 mygpu ollama[243665]: [GIN] 2024/06/06 - 11:22:05 | 500 | 5m1s | 127.0.0.1 | POST "/api/chat"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.235Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:05 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.260Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.260Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:05 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:05 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80157 mb
Jun 06 11:22:05 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.597Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:05 mygpu ollama[243665]: releasing nvcuda library
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.597Z level=DEBUG source=server.go:990 msg="stopping llama server"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.598Z level=DEBUG source=server.go:996 msg="waiting for llama server to exit"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.847Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.848Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.848Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.854Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06]
Jun 06 11:22:05 mygpu ollama[243665]: CUDA driver version: 11.4
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.855Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06
Jun 06 11:22:05 mygpu ollama[243665]: time=2024-06-06T11:22:05.855Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.090Z level=DEBUG source=server.go:1000 msg="llama server stopped"
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.090Z level=DEBUG source=sched.go:279 msg="runner released" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb
Jun 06 11:22:06 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0
Jun 06 11:22:06 mygpu ollama[243665]: time=2024-06-06T11:22:06.179Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 06 11:22:06 mygpu ollama[243665]: releasing nvcuda library
[... the GPU detection block above ("Detecting GPUs" / "gpu library search" / "detected GPUs" / CUDA totalMem 80994 mb / CUDA freeMem 80580 mb / Compute Capability 8.0 / "releasing nvcuda library") repeats roughly every 250 ms from 11:22:06.179 to 11:22:10 while the scheduler polls for VRAM to be freed; the remaining distinct lines from that window are: ...]
Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.348Z level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" seconds=5.135480727
Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.350Z level=DEBUG source=sched.go:283 msg="sending an unloaded event" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d
Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.350Z level=DEBUG source=sched.go:206 msg="ignoring unload event with no pending requests"
Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.597Z level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" seconds=5.385178735
Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.847Z level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" seconds=5.635239402
```
Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.104Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu" Jun 06 11:22:10 mygpu ollama[243665]: releasing nvcuda library Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.104Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.104Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.104Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.106Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06] Jun 06 11:22:10 mygpu ollama[243665]: CUDA driver version: 11.4 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.106Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.106Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.334Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu" Jun 06 11:22:10 mygpu ollama[243665]: releasing nvcuda library Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.348Z level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" seconds=5.135480727 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.350Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.350Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.350Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.350Z level=DEBUG source=sched.go:283 msg="sending an unloaded event" modelPath=/data/ollama-models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.350Z level=DEBUG source=sched.go:206 msg="ignoring unload event with no pending requests" Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.356Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" 
paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06] Jun 06 11:22:10 mygpu ollama[243665]: CUDA driver version: 11.4 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.356Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.356Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.576Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu" Jun 06 11:22:10 mygpu ollama[243665]: releasing nvcuda library Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.597Z level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" seconds=5.385178735 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.597Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.597Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.598Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.604Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06] Jun 06 11:22:10 mygpu ollama[243665]: CUDA driver version: 11.4 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.604Z level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.470.239.06 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.604Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA totalMem 80994 mb Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] CUDA freeMem 80580 mb Jun 06 11:22:10 mygpu ollama[243665]: [GPU-e2dfead1-6232-e5c1-efb8-f20bf39d937a] Compute Capability 8.0 Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.828Z level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu" Jun 06 11:22:10 mygpu ollama[243665]: releasing nvcuda library Jun 06 11:22:10 mygpu ollama[243665]: time=2024-06-06T11:22:10.847Z level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" seconds=5.635239402 ```

@dhiltgen commented on GitHub (Jun 6, 2024):

This might be a result of mmap thrashing system memory.

Can you try loading with mmap turned off?

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"use_mmap": false}
}'
```
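
A quick way to confirm the option actually reached the runner is to look for `--no-mmap` on the spawned llama server command line in the service log (a minimal sketch, assuming the default systemd service name `ollama`):

```
# The runner invocation is logged when a model loads; "--no-mmap" on the
# ollama_llama_server command line confirms use_mmap=false took effect.
journalctl -u ollama --no-pager | grep -- "--no-mmap"
```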

@pierocor commented on GitHub (Jun 10, 2024):

Thanks, disabling mmap works nicely!


@ProjectMoon commented on GitHub (Jun 17, 2024):

Disabling mmap works wonderfully, but I do have some issues with certain models crashing with it disabled, and not always due to an out-of-memory error. What logs would be helpful?


@dhiltgen commented on GitHub (Jun 18, 2024):

@ProjectMoon depending on the nature of the out-of-memory scenario, it can sometimes be a little confusing in the logs. I would try forcing a smaller number of layers (by setting `"num_gpu": <number>` along with `"use_mmap": false`) and see if that resolves it (which would confirm a more subtle out-of-memory scenario), but if that doesn't resolve it, then I'd open a new issue with a repro scenario and server logs so we can take a look.
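
For example, both options can go into a single request's `options` (the layer count here is illustrative):

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {"num_gpu": 20, "use_mmap": false}
}'
```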


@ProjectMoon commented on GitHub (Jun 20, 2024):

OK, so one example is `deepseek-v2:16b-lite-chat-q5_K_M`. It crashes with the error below:

```
  Device 0: AMD Radeon Graphics, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size =    0.35 MiB
llm_load_tensors: offloading 27 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 28/28 layers to GPU
llm_load_tensors:      ROCm0 buffer size = 11160.99 MiB
llm_load_tensors:  ROCm_Host buffer size =   137.50 MiB
llama_new_context_with_model: n_ctx      = 16384
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 0.025
llama_kv_cache_init:      ROCm0 KV buffer size =  4320.00 MiB
llama_new_context_with_model: KV self size  = 4320.00 MiB, K (f16): 2592.00 MiB, V (f16): 1728.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.80 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   568.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    36.01 MiB
llama_new_context_with_model: graph nodes  = 1924
llama_new_context_with_model: graph splits = 2
CUDA error: CUBLAS_STATUS_ALLOC_FAILED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda/common.cuh:826
  hipblasCreate(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:100: !"CUDA error"
```

My GPU has 16 GB of VRAM, so I'd expect it to fit.

When disabling mmap, it still crashes.

Setting number of GPU layers to 20 allows it to run.

This is with the latest rc4 of 0.1.45.
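
For anyone wanting to pin that workaround, a minimal sketch using a Modelfile (the custom model name is arbitrary, and `use_mmap false` mirrors the option discussed above, assuming the parameter is honored from a Modelfile):

```
# Sketch: persist the layer cap so every run of this model uses it.
cat > Modelfile <<'EOF'
FROM deepseek-v2:16b-lite-chat-q5_K_M
PARAMETER num_gpu 20
PARAMETER use_mmap false
EOF
ollama create deepseek-v2-capped -f Modelfile   # "deepseek-v2-capped" is an arbitrary name
ollama run deepseek-v2-capped
```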


@dhiltgen commented on GitHub (Jun 20, 2024):

@ProjectMoon let's track the deepseek v2 memory prediction glitch via issue #5136.


@ProjectMoon commented on GitHub (Jun 26, 2024):

OK, so here are some logs involving [Beyonder V3 4x7b, i1 Q5_K_M](https://huggingface.co/mradermacher/Beyonder-4x7B-v3-i1-GGUF). I have the `num_ctx` parameter set to 8192 here. The model loads, and then crashes after a while with an out-of-memory error, but I still have around 10 GB of system RAM left that the llama runner isn't using. This happens on ollama 0.1.46.

Edit: I also tried setting `use_mmap` to false in the modelfile. Same result. Though I'm not entirely sure it was actually passed to the runner, perhaps due to issues with the parameter in the current version?

```
time=2024-06-26T08:50:37.145+02:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1030 driver=0.0 name=1002:73bf total="16.0 GiB" available="16.0 GiB"
2024/06/26 08:50:55 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:2 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/ollama OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:2 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-06-26T08:50:55.944+02:00 level=INFO source=images.go:730 msg="total blobs: 108"
time=2024-06-26T08:50:55.947+02:00 level=INFO source=images.go:737 msg="total unused blobs removed: 0"
time=2024-06-26T08:50:55.948+02:00 level=INFO source=routes.go:1111 msg="Listening on [::]:11434 (version 0.1.46)"
time=2024-06-26T08:50:55.948+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1305881221/runners
time=2024-06-26T08:50:59.090+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60101]"
time=2024-06-26T08:50:59.112+02:00 level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-26T08:50:59.120+02:00 level=INFO source=amd_linux.go:330 msg="amdgpu is supported" gpu=0 gpu_type=gfx1030
time=2024-06-26T08:50:59.121+02:00 level=INFO source=types.go:98 msg="inference compute" id=0 library=rocm compute=gfx1030 driver=0.0 name=1002:73bf total="16.0 GiB" available="16.0 GiB"

time=2024-06-26T08:51:10.626+02:00 level=WARN source=types.go:430 msg="invalid option provided" option=""
time=2024-06-26T08:51:10.640+02:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=23 layers.split="" memory.available="[16.0 GiB]" memory.required.full="20.9 GiB" memory.required.partial="15.6 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[15.6 GiB]" memory.weights.total="17.8 GiB" memory.weights.repeating="17.5 GiB" memory.weights.nonrepeating="250.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.8 GiB"
time=2024-06-26T08:51:10.645+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama1305881221/runners/rocm_v60101/ollama_llama_server --model /ollama/blobs/sha256-cd9ae73d8328beff1097f313f8787302d476647ed9f56deba7000d6d4633d277 --ctx-size 16384 --batch-size 512 --embedding --log-disable --n-gpu-layers 23 --no-mmap --parallel 2 --port 40153"
time=2024-06-26T08:51:10.646+02:00 level=INFO source=sched.go:382 msg="loaded runners" count=1
time=2024-06-26T08:51:10.646+02:00 level=INFO source=server.go:556 msg="waiting for llama runner to start responding"
time=2024-06-26T08:51:10.646+02:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server error"
llama_model_loader: loaded meta data with 26 key-value pairs and 611 tensors from /ollama/blobs/sha256-cd9ae73d8328beff1097f313f8787302d476647ed9f56deba7000d6d4633d277 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = .
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:                         llama.expert_count u32              = 4
llama_model_loader: - kv  10:                    llama.expert_used_count u32              = 2
llama_model_loader: - kv  11:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  12:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  13:                          general.file_type u32              = 17
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...

llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% for message in messages %}{{bos_to...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:   33 tensors
llama_model_loader: - type q5_K:  433 tensors
llama_model_loader: - type q6_K:   80 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.1637 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 4
llm_load_print_meta: n_expert_used    = 2
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q5_K - Medium
llm_load_print_meta: model params     = 24.15 B
llm_load_print_meta: model size       = 16.10 GiB (5.73 BPW)
llm_load_print_meta: general.name     = .
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 1 '<s>'
llm_load_print_meta: LF token         = 13 '<0x0A>'




time=2024-06-26T08:51:10.898+02:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server loading model"


/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size =    0.54 MiB
time=2024-06-26T08:51:13.611+02:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server not responding"
llm_load_tensors: offloading 23 repeating layers to GPU
llm_load_tensors: offloaded 23/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size = 11593.03 MiB
llm_load_tensors:  ROCm_Host buffer size =  4893.42 MiB
time=2024-06-26T08:51:13.863+02:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server loading model"
llama_new_context_with_model: n_ctx      = 16384
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =  1472.00 MiB
llama_kv_cache_init:  ROCm_Host KV buffer size =   576.00 MiB
llama_new_context_with_model: KV self size  = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.28 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =  1147.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    40.01 MiB
llama_new_context_with_model: graph nodes  = 1510
llama_new_context_with_model: graph splits = 112
time=2024-06-26T08:51:46.061+02:00 level=INFO source=server.go:599 msg="llama runner started in 35.42 seconds"
CUDA error: out of memory
  current device: 0, in function alloc at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:290
  ggml_cuda_device_malloc(&ptr, look_ahead_size, device)
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:100: !"CUDA error"
```

@dhiltgen commented on GitHub (Jul 3, 2024):

For folks hitting slow model loads leading to a timeout on systems with little memory, this should be resolved in the last few releases: we now detect when the model is larger than system memory and turn off mmap.

The model OOM scenarios are unrelated to the mmap behavior on systems with small amounts of system memory, so let's track those via other issues. If it involves models converted from huggingface repos, they may be new architectures that need additional work to fully support, including adjusting our [memory prediction for the given architecture](https://github.com/ollama/ollama/blob/main/llm/ggml.go#L342-L428).
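
As a rough way to see which path a given model will take, the size check can be approximated by hand; a minimal sketch, assuming a Linux host and the default systemd install's model directory (the `sha256-XXXX` blob name is a placeholder):

```
# Compare the model blob size against MemAvailable; if the blob is bigger,
# recent releases should fall back to use_mmap=false automatically.
MODEL_BLOB=/usr/share/ollama/.ollama/models/blobs/sha256-XXXX   # placeholder blob
MODEL_BYTES=$(stat -c%s "$MODEL_BLOB")
AVAIL_BYTES=$(( $(awk '/MemAvailable/ {print $2}' /proc/meminfo) * 1024 ))
if [ "$MODEL_BYTES" -gt "$AVAIL_BYTES" ]; then
  echo "model is larger than available RAM; expect mmap to be disabled"
fi
```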


@accqaz commented on GitHub (Jan 10, 2025):

> @ProjectMoon depending on the nature of the out-of-memory scenario, it can sometimes be a little confusing in the logs. I would try forcing a smaller number of layers (by setting `"num_gpu": <number>` along with `"use_mmap": false`) and see if that resolves it (which would confirm a more subtle out-of-memory scenario), but if that doesn't resolve it, then I'd open a new issue with a repro scenario and server logs so we can take a look.

Hello! I want to ask how to set `num_gpu: <number>`. I have two GPUs, but when I run `ollama serve`, I found it only uses one GPU. Maybe this is why I hit the timeout error.
Thank you very much!


@ProjectMoon commented on GitHub (Jan 10, 2025):

> > @ProjectMoon depending on the nature of the out-of-memory scenario, it can sometimes be a little confusing in the logs. I would try forcing a smaller number of layers (by setting `"num_gpu": <number>` along with `"use_mmap": false`) and see if that resolves it (which would confirm a more subtle out-of-memory scenario), but if that doesn't resolve it, then I'd open a new issue with a repro scenario and server logs so we can take a look.
>
> Hello! I want to ask how to set `num_gpu: <number>`. I have two GPUs, but when I run `ollama serve`, I found it only uses one GPU. Maybe this is why I hit the timeout error. Thank you very much!

The `num_gpu` parameter is how many layers to attempt to load onto a GPU. It can be set via a model file or an API call. It's not related to which GPUs ollama has access to.
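
Concretely, a per-request example via the API looks like this (the layer count is illustrative); the persistent equivalent is `PARAMETER num_gpu 20` in a Modelfile, as in the sketch further up:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "hello",
  "stream": false,
  "options": {"num_gpu": 20}
}'
```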


@accqaz commented on GitHub (Jan 10, 2025):

> > > @ProjectMoon depending on the nature of the out-of-memory scenario, it can sometimes be a little confusing in the logs. I would try forcing a smaller number of layers (by setting `"num_gpu": <number>` along with `"use_mmap": false`) and see if that resolves it (which would confirm a more subtle out-of-memory scenario), but if that doesn't resolve it, then I'd open a new issue with a repro scenario and server logs so we can take a look.
> >
> > Hello! I want to ask how to set `num_gpu: <number>`. I have two GPUs, but when I run `ollama serve`, I found it only uses one GPU. Maybe this is why I hit the timeout error. Thank you very much!
>
> The `num_gpu` parameter is how many layers to attempt to load onto a GPU. It can be set via a model file or an API call. It's not related to which GPUs ollama has access to.

I see! If I want to use two GPUs, do you know how to set that up? I've done a lot of research on this and I still don't know how. Thank you very much!


@ProjectMoon commented on GitHub (Jan 10, 2025):

> > > > @ProjectMoon depending on the nature of the out-of-memory scenario, it can sometimes be a little confusing in the logs. I would try forcing a smaller number of layers (by setting `"num_gpu": <number>` along with `"use_mmap": false`) and see if that resolves it (which would confirm a more subtle out-of-memory scenario), but if that doesn't resolve it, then I'd open a new issue with a repro scenario and server logs so we can take a look.
> > >
> > > Hello! I want to ask how to set `num_gpu: <number>`. I have two GPUs, but when I run `ollama serve`, I found it only uses one GPU. Maybe this is why I hit the timeout error. Thank you very much!
> >
> > The `num_gpu` parameter is how many layers to attempt to load onto a GPU. It can be set via a model file or an API call. It's not related to which GPUs ollama has access to.
>
> I see! If I want to use two GPUs, do you know how to set that up? I've done a lot of research on this and I still don't know how. Thank you very much!

Ollama will print out whatever GPUs it finds on startup. If it finds both, it will use both as needed. I'm not sure there's a way to force it to use one or the other, aside from the HIP/CUDA environment variables that control the visibility of GPUs to an application. But I think this is outside the scope of this issue. You should probably open a discussion.
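
Those visibility variables show up in the server config dump earlier in this thread; a minimal sketch of using them (device indices are illustrative):

```
# NVIDIA: expose only GPU 0 to the server process.
CUDA_VISIBLE_DEVICES=0 ollama serve
# AMD/ROCm equivalent (ROCR_VISIBLE_DEVICES is also honored on ROCm setups):
HIP_VISIBLE_DEVICES=0 ollama serve
```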


@accqaz commented on GitHub (Jan 10, 2025):

> > > > > @ProjectMoon depending on the nature of the out-of-memory scenario, it can sometimes be a little confusing in the logs. I would try forcing a smaller number of layers (by setting `"num_gpu": <number>` along with `"use_mmap": false`) and see if that resolves it (which would confirm a more subtle out-of-memory scenario), but if that doesn't resolve it, then I'd open a new issue with a repro scenario and server logs so we can take a look.
> > > >
> > > > Hello! I want to ask how to set `num_gpu: <number>`. I have two GPUs, but when I run `ollama serve`, I found it only uses one GPU. Maybe this is why I hit the timeout error. Thank you very much!
> > >
> > > The `num_gpu` parameter is how many layers to attempt to load onto a GPU. It can be set via a model file or an API call. It's not related to which GPUs ollama has access to.
> >
> > I see! If I want to use two GPUs, do you know how to set that up? I've done a lot of research on this and I still don't know how. Thank you very much!
>
> Ollama will print out whatever GPUs it finds on startup. If it finds both, it will use both as needed. I'm not sure there's a way to force it to use one or the other, aside from the HIP/CUDA environment variables that control the visibility of GPUs to an application. But I think this is outside the scope of this issue. You should probably open a discussion.

Understood! Thank you!


Reference: github-starred/ollama#64481