[GH-ISSUE #7393] EOF error on pull with different model #51211

Closed
opened 2026-04-28 18:55:31 -05:00 by GiteaMirror · 13 comments

Originally created by @bdytx5 on GitHub (Oct 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7393

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

brett@brett:~$ ollama pull llama3.2

Error: registry.ollama.ai/library/phi3:latest: EOF

Really confused. This is not an out-of-memory error. I also tried resetting the systemd service as suggested in https://github.com/ollama/ollama/issues/1859, with no luck.

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.14

GiteaMirror added the bug label 2026-04-28 18:55:31 -05:00

@rick-github commented on GitHub (Oct 28, 2024):

Your `pull` command shows llama3.2 but the error message is for phi3, so I assume this is failing for multiple models, which implies a connectivity issue. Have you ever been able to pull a model, or is this a recent problem? What's in your [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues)?


@bdytx5 commented on GitHub (Oct 29, 2024):

It worked fine before, but it has suddenly stopped working.

(base) brett@brett:~$ journalctl -u ollama --no-pager
-- Logs begin at Fri 2024-08-16 10:40:36 CDT, end at Mon 2024-10-28 22:12:45 CDT. --
Aug 28 18:14:57 brett systemd[1]: Started Ollama Service.
Aug 28 18:14:57 brett ollama[1480]: 2024/08/28 18:14:57 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.835-05:00 level=INFO source=images.go:704 msg="total blobs: 35"
Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.847-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.847-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.848-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3896581571/runners
Aug 28 18:15:03 brett ollama[1480]: time=2024-08-28T18:15:03.688-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60002 cpu]"
Aug 28 18:15:03 brett ollama[1480]: time=2024-08-28T18:15:03.916-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.7 GiB"
Sep 09 20:58:24 brett systemd[1]: Stopping Ollama Service...
Sep 09 20:58:24 brett systemd[1]: ollama.service: Succeeded.
Sep 09 20:58:24 brett systemd[1]: Stopped Ollama Service.
-- Reboot --
Sep 09 20:59:04 brett systemd[1]: Started Ollama Service.
Sep 09 20:59:04 brett ollama[1001]: 2024/09/09 20:59:04 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.257-05:00 level=INFO source=images.go:704 msg="total blobs: 35"
Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.267-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.267-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.268-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2167072295/runners
Sep 09 20:59:08 brett ollama[1001]: time=2024-09-09T20:59:08.759-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
Sep 09 20:59:08 brett ollama[1001]: time=2024-09-09T20:59:08.963-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Sep 23 01:53:49 brett systemd[1]: Stopping Ollama Service...
Sep 23 01:53:49 brett systemd[1]: ollama.service: Succeeded.
Sep 23 01:53:49 brett systemd[1]: Stopped Ollama Service.
-- Reboot --
Sep 23 01:54:28 brett systemd[1]: Started Ollama Service.
Sep 23 01:54:29 brett ollama[978]: 2024/09/23 01:54:29 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.156-05:00 level=INFO source=images.go:704 msg="total blobs: 35"
Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.164-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.165-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.165-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2714411270/runners
Sep 23 01:54:33 brett ollama[978]: time=2024-09-23T01:54:33.346-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
Sep 23 01:54:33 brett ollama[978]: time=2024-09-23T01:54:33.604-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
-- Reboot --
Sep 25 09:59:29 brett systemd[1]: Started Ollama Service.
Sep 25 09:59:29 brett ollama[1454]: 2024/09/25 09:59:29 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.352-05:00 level=INFO source=images.go:704 msg="total blobs: 35"
Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.361-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.362-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.363-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1341669304/runners
Sep 25 09:59:33 brett ollama[1454]: time=2024-09-25T09:59:33.170-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
Sep 25 09:59:33 brett ollama[1454]: time=2024-09-25T09:59:33.429-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
-- Reboot --
Oct 03 07:02:51 brett systemd[1]: Started Ollama Service.
Oct 03 07:02:51 brett ollama[1190]: 2024/10/03 07:02:51 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.291-05:00 level=INFO source=images.go:704 msg="total blobs: 35"
Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.306-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.306-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.308-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1192105597/runners
Oct 03 07:02:57 brett ollama[1190]: time=2024-10-03T07:02:57.018-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
Oct 03 07:02:57 brett ollama[1190]: time=2024-10-03T07:02:57.175-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
-- Reboot --
Oct 14 09:56:26 brett systemd[1]: Started Ollama Service.
Oct 14 09:56:26 brett ollama[1665]: 2024/10/14 09:56:26 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.195-05:00 level=INFO source=images.go:704 msg="total blobs: 35"
Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.204-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.205-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.206-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama822189088/runners
Oct 14 09:56:30 brett ollama[1665]: time=2024-10-14T09:56:30.014-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
Oct 14 09:56:30 brett ollama[1665]: time=2024-10-14T09:56:30.200-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 14 22:26:27 brett ollama[1665]: [GIN] 2024/10/14 - 22:26:27 | 200 |     787.415µs |       127.0.0.1 | HEAD     "/"
Oct 14 22:26:27 brett ollama[1665]: time=2024-10-14T22:26:27.068-05:00 level=WARN source=routes.go:762 msg="bad manifest config" name=registry.ollama.ai/library/phi3:instruct error="invalid character '\\x00' looking for beginning of value"
Oct 14 22:26:27 brett ollama[1665]: time=2024-10-14T22:26:27.068-05:00 level=WARN source=routes.go:749 msg="bad manifest" name=registry.ollama.ai/library/phi3:latest error=EOF
Oct 14 22:26:27 brett ollama[1665]: [GIN] 2024/10/14 - 22:26:27 | 200 |    3.181247ms |       127.0.0.1 | GET      "/api/tags"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |      23.713µs |       127.0.0.1 | HEAD     "/"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |   22.012957ms |       127.0.0.1 | DELETE   "/api/delete"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |      20.164µs |       127.0.0.1 | HEAD     "/"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |    9.784876ms |       127.0.0.1 | DELETE   "/api/delete"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |       20.68µs |       127.0.0.1 | HEAD     "/"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |    2.472006ms |       127.0.0.1 | DELETE   "/api/delete"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |      18.509µs |       127.0.0.1 | HEAD     "/"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |    3.000684ms |       127.0.0.1 | DELETE   "/api/delete"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |      19.154µs |       127.0.0.1 | HEAD     "/"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |    9.288327ms |       127.0.0.1 | DELETE   "/api/delete"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |      19.664µs |       127.0.0.1 | HEAD     "/"
Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 |    1.804939ms |       127.0.0.1 | DELETE   "/api/delete"
-- Reboot --
Oct 17 08:27:59 brett systemd[1]: Started Ollama Service.
Oct 17 08:27:59 brett ollama[1634]: 2024/10/17 08:27:59 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.823-05:00 level=INFO source=images.go:704 msg="total blobs: 5"
Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.829-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.830-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.831-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3093056487/runners
Oct 17 08:28:03 brett ollama[1634]: time=2024-10-17T08:28:03.684-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60002 cpu cpu_avx cpu_avx2]"
Oct 17 08:28:03 brett ollama[1634]: time=2024-10-17T08:28:03.937-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
-- Reboot --
Oct 27 03:37:43 brett systemd[1]: Started Ollama Service.
Oct 27 03:37:44 brett ollama[1128]: 2024/10/27 03:37:44 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.152-05:00 level=INFO source=images.go:704 msg="total blobs: 5"
Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.156-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.156-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)"
Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.158-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1884882316/runners
Oct 27 03:37:48 brett ollama[1128]: time=2024-10-27T03:37:48.873-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
Oct 27 03:37:49 brett ollama[1128]: time=2024-10-27T03:37:49.135-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 27 03:45:22 brett ollama[1128]: [GIN] 2024/10/27 - 03:45:22 | 200 |      867.06µs |       127.0.0.1 | HEAD     "/"
Oct 27 03:45:22 brett ollama[1128]: time=2024-10-27T03:45:22.502-05:00 level=ERROR source=images.go:969 msg="jwt token does not contain 3 parts"
Oct 27 03:45:23 brett ollama[1128]: time=2024-10-27T03:45:23.452-05:00 level=INFO source=download.go:136 msg="downloading 74701a8c35f6 in 14 100 MB part(s)"
Oct 27 03:46:22 brett ollama[1128]: time=2024-10-27T03:46:22.089-05:00 level=INFO source=download.go:136 msg="downloading 966de95ca8a6 in 1 1.4 KB part(s)"
Oct 27 03:46:23 brett ollama[1128]: time=2024-10-27T03:46:23.702-05:00 level=INFO source=download.go:136 msg="downloading fcc5a6bec9da in 1 7.7 KB part(s)"
Oct 27 03:46:25 brett ollama[1128]: time=2024-10-27T03:46:25.324-05:00 level=INFO source=download.go:136 msg="downloading a70ff7e570d9 in 1 6.0 KB part(s)"
Oct 27 03:46:26 brett ollama[1128]: time=2024-10-27T03:46:26.930-05:00 level=INFO source=download.go:136 msg="downloading 4f659a1e86d7 in 1 485 B part(s)"
Oct 27 03:46:31 brett ollama[1128]: [GIN] 2024/10/27 - 03:46:31 | 200 |          1m8s |       127.0.0.1 | POST     "/api/pull"
Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.142-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB"
Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.142-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB"
Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.142-05:00 level=INFO source=server.go:318 msg="starting llama server" cmd="/tmp/ollama1884882316/runners/cuda_v11/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 17 --parallel 1 --port 38269"
Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.143-05:00 level=INFO source=sched.go:333 msg="loaded runners" count=1
Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.143-05:00 level=INFO source=server.go:488 msg="waiting for llama runner to start responding"
Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.143-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error"
Oct 27 03:48:29 brett ollama[3108]: INFO [main] build info | build=1 commit="952d03d" tid="140487417716736" timestamp=1730018909
Oct 27 03:48:29 brett ollama[3108]: INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140487417716736" timestamp=1730018909 total_threads=4
Oct 27 03:48:29 brett ollama[3108]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="3" port="38269" tid="140487417716736" timestamp=1730018909
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   1:                               general.type str              = model
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   5:                         general.size_label str              = 1B
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   8:                          llama.block_count u32              = 16
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  18:                          general.file_type u32              = 7
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv  29:               general.quantization_version u32              = 2
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - type  f32:   34 tensors
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - type q8_0:  113 tensors
Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.394-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server loading model"
Oct 27 03:48:29 brett ollama[1128]: llm_load_vocab: special tokens definition check successful ( 256/128256 ).
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: format           = GGUF V3 (latest)
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: arch             = llama
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: vocab type       = BPE
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_vocab          = 128256
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_merges         = 280147
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_ctx_train      = 131072
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd           = 2048
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_head           = 32
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_head_kv        = 8
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_layer          = 16
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_rot            = 64
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_head_k    = 64
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_head_v    = 64
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_gqa            = 4
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_k_gqa     = 512
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_v_gqa     = 512
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_ff             = 8192
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_expert         = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_expert_used    = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: causal attn      = 1
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: pooling type     = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: rope type        = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: rope scaling     = linear
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: freq_base_train  = 500000.0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: freq_scale_train = 1
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_yarn_orig_ctx  = 131072
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: rope_finetuned   = unknown
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_d_conv       = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_d_inner      = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_d_state      = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_dt_rank      = 0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model type       = ?B
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model ftype      = Q8_0
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model params     = 1.24 B
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW)
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: LF token         = 128 'Ä'
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
Oct 27 03:48:29 brett ollama[1128]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   yes
Oct 27 03:48:29 brett ollama[1128]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
Oct 27 03:48:29 brett ollama[1128]: ggml_cuda_init: found 1 CUDA devices:
Oct 27 03:48:29 brett ollama[1128]:   Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Oct 27 03:48:29 brett ollama[1128]: llm_load_tensors: ggml ctx size =    0.15 MiB
Oct 27 03:48:29 brett ollama[1128]: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 03:48:29 brett ollama[1128]: llama_load_model_from_file: exception loading model
Oct 27 03:48:30 brett ollama[1128]: terminate called after throwing an instance of 'std::runtime_error'
Oct 27 03:48:30 brett ollama[1128]:   what():  done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 03:48:30 brett ollama[1128]: time=2024-10-27T03:48:30.183-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error"
Oct 27 03:48:30 brett ollama[1128]: time=2024-10-27T03:48:30.434-05:00 level=ERROR source=sched.go:339 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) "
Oct 27 03:48:30 brett ollama[1128]: [GIN] 2024/10/27 - 03:48:30 | 500 |  2.507157839s |       127.0.0.1 | POST     "/api/chat"
Oct 27 03:48:35 brett ollama[1128]: time=2024-10-27T03:48:35.526-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.091533509
Oct 27 03:48:35 brett ollama[1128]: time=2024-10-27T03:48:35.776-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.341143575
Oct 27 03:48:36 brett ollama[1128]: time=2024-10-27T03:48:36.026-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.591589563
Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.876-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB"
Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.876-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB"
Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.876-05:00 level=INFO source=server.go:318 msg="starting llama server" cmd="/tmp/ollama1884882316/runners/cuda_v11/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 17 --parallel 1 --port 40261"
Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.877-05:00 level=INFO source=sched.go:333 msg="loaded runners" count=1
Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.877-05:00 level=INFO source=server.go:488 msg="waiting for llama runner to start responding"
Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.877-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error"
Oct 27 03:50:55 brett ollama[3202]: INFO [main] build info | build=1 commit="952d03d" tid="139704163123200" timestamp=1730019055
Oct 27 03:50:55 brett ollama[3202]: INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139704163123200" timestamp=1730019055 total_threads=4
Oct 27 03:50:55 brett ollama[3202]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="3" port="40261" tid="139704163123200" timestamp=1730019055
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   1:                               general.type str              = model
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   5:                         general.size_label str              = 1B
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   8:                          llama.block_count u32              = 16
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  18:                          general.file_type u32              = 7
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv  29:               general.quantization_version u32              = 2
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - type  f32:   34 tensors
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - type q8_0:  113 tensors
Oct 27 03:50:56 brett ollama[1128]: time=2024-10-27T03:50:56.128-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server loading model"
Oct 27 03:50:56 brett ollama[1128]: llm_load_vocab: special tokens definition check successful ( 256/128256 ).
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: format           = GGUF V3 (latest)
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: arch             = llama
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: vocab type       = BPE
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_vocab          = 128256
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_merges         = 280147
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_ctx_train      = 131072
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd           = 2048
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_head           = 32
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_head_kv        = 8
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_layer          = 16
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_rot            = 64
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_head_k    = 64
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_head_v    = 64
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_gqa            = 4
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_k_gqa     = 512
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_v_gqa     = 512
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_ff             = 8192
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_expert         = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_expert_used    = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: causal attn      = 1
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: pooling type     = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: rope type        = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: rope scaling     = linear
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: freq_base_train  = 500000.0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: freq_scale_train = 1
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_yarn_orig_ctx  = 131072
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: rope_finetuned   = unknown
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_d_conv       = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_d_inner      = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_d_state      = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_dt_rank      = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model type       = ?B
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model ftype      = Q8_0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model params     = 1.24 B
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW)
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: LF token         = 128 'Ä'
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
Oct 27 03:50:56 brett ollama[1128]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   yes
Oct 27 03:50:56 brett ollama[1128]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
Oct 27 03:50:56 brett ollama[1128]: ggml_cuda_init: found 1 CUDA devices:
Oct 27 03:50:56 brett ollama[1128]:   Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Oct 27 03:50:56 brett ollama[1128]: llm_load_tensors: ggml ctx size =    0.15 MiB
Oct 27 03:50:56 brett ollama[1128]: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 03:50:56 brett ollama[1128]: llama_load_model_from_file: exception loading model
Oct 27 03:50:56 brett ollama[1128]: terminate called after throwing an instance of 'std::runtime_error'
Oct 27 03:50:56 brett ollama[1128]:   what():  done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 03:50:56 brett ollama[1128]: time=2024-10-27T03:50:56.919-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error"
Oct 27 03:50:57 brett ollama[1128]: time=2024-10-27T03:50:57.170-05:00 level=ERROR source=sched.go:339 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) "
Oct 27 03:50:57 brett ollama[1128]: [GIN] 2024/10/27 - 03:50:57 | 500 |  2.498784739s |       127.0.0.1 | POST     "/api/chat"
Oct 27 03:51:02 brett ollama[1128]: time=2024-10-27T03:51:02.300-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.129768007
Oct 27 03:51:02 brett ollama[1128]: time=2024-10-27T03:51:02.550-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.379840996
Oct 27 03:51:02 brett ollama[1128]: time=2024-10-27T03:51:02.800-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.629716515
[Oct 27 23:42:17 to 23:42:24: the same load attempt and failure repeat verbatim: "done_getting_tensors: wrong number of tensors; expected 147, got 146", core dump, and the VRAM-recovery warnings]
Oct 27 23:44:29 brett systemd[1]: Stopping Ollama Service...
Oct 27 23:44:29 brett systemd[1]: ollama.service: Succeeded.
Oct 27 23:44:29 brett systemd[1]: Stopped Ollama Service.
Oct 27 23:44:29 brett systemd[1]: Started Ollama Service.
Oct 27 23:44:29 brett ollama[22362]: 2024/10/27 23:44:29 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.804-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.804-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.804-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.805-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama528511600/runners
Oct 27 23:44:39 brett ollama[22362]: time=2024-10-27T23:44:39.271-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[rocm_v60102 cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12]"
Oct 27 23:44:39 brett ollama[22362]: time=2024-10-27T23:44:39.271-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 27 23:44:39 brett ollama[22362]: time=2024-10-27T23:44:39.457-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
-- Reboot --
Oct 27 23:45:52 brett systemd[1]: Started Ollama Service.
Oct 27 23:45:52 brett ollama[1210]: 2024/10/27 23:45:52 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.288-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.295-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.301-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.308-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama58727328/runners
Oct 27 23:46:05 brett ollama[1210]: time=2024-10-27T23:46:05.719-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
Oct 27 23:46:05 brett ollama[1210]: time=2024-10-27T23:46:05.719-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 27 23:46:05 brett ollama[1210]: time=2024-10-27T23:46:05.947-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 27 23:47:05 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:05 | 200 |      36.093µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:47:05 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:05 | 404 |     163.693µs |       127.0.0.1 | POST     "/api/show"
Oct 27 23:47:05 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:05 | 400 |     239.022µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:47:08 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:08 | 200 |      18.161µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:47:08 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:08 | 404 |       37.82µs |       127.0.0.1 | POST     "/api/show"
Oct 27 23:47:08 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:08 | 400 |      226.35µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:47:17 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:17 | 200 |      19.667µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:47:17 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:17 | 404 |      50.438µs |       127.0.0.1 | POST     "/api/show"
Oct 27 23:47:17 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:17 | 400 |     261.805µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:47:24 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:24 | 200 |      16.313µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:47:24 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:24 | 404 |      45.966µs |       127.0.0.1 | POST     "/api/show"
Oct 27 23:47:24 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:24 | 400 |     247.012µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:47:38 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:38 | 200 |      22.788µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:47:38 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:38 | 400 |     306.724µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:49:28 brett systemd[1]: Stopping Ollama Service...
Oct 27 23:49:28 brett systemd[1]: ollama.service: Succeeded.
Oct 27 23:49:28 brett systemd[1]: Stopped Ollama Service.
Oct 27 23:49:28 brett systemd[1]: Started Ollama Service.
Oct 27 23:49:28 brett ollama[2954]: 2024/10/27 23:49:28 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.482-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.483-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.483-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.483-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama2690884310/runners
Oct 27 23:49:37 brett ollama[2954]: time=2024-10-27T23:49:37.979-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu]"
Oct 27 23:49:37 brett ollama[2954]: time=2024-10-27T23:49:37.979-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 27 23:49:38 brett ollama[2954]: time=2024-10-27T23:49:38.118-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 27 23:49:38 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:38 | 200 |       49.64µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:49:38 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:38 | 404 |     106.443µs |       127.0.0.1 | POST     "/api/show"
Oct 27 23:49:38 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:38 | 400 |     227.448µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:49:50 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:50 | 200 |      18.509µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:49:50 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:50 | 200 |   55.612435ms |       127.0.0.1 | POST     "/api/show"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.362-05:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 gpu=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 parallel=4 available=11552161792 required="2.5 GiB"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.443-05:00 level=INFO source=server.go:105 msg="system memory" total="31.3 GiB" free="29.7 GiB" free_swap="2.0 GiB"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.443-05:00 level=INFO source=memory.go:326 msg="offload to cuda" layers.requested=-1 layers.model=17 layers.offload=17 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.5 GiB" memory.required.partial="2.5 GiB" memory.required.kv="256.0 MiB" memory.required.allocations="[2.5 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="976.1 MiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="544.0 MiB" memory.graph.partial="554.3 MiB"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama2690884310/runners/cuda_v12/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 8192 --batch-size 512 --embedding --n-gpu-layers 17 --threads 4 --parallel 4 --port 46533"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=server.go:587 msg="waiting for llama runner to start responding"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error"
Oct 27 23:49:50 brett ollama[3022]: INFO [main] starting c++ runner | tid="139693392822272" timestamp=1730090990
Oct 27 23:49:50 brett ollama[3022]: INFO [main] build info | build=10 commit="3a8c75e" tid="139693392822272" timestamp=1730090990
Oct 27 23:49:50 brett ollama[3022]: INFO [main] system info | n_threads=4 n_threads_batch=4 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139693392822272" timestamp=1730090990 total_threads=4
Oct 27 23:49:50 brett ollama[3022]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="6" port="46533" tid="139693392822272" timestamp=1730090990
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   1:                               general.type str              = model
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   5:                         general.size_label str              = 1B
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   8:                          llama.block_count u32              = 16
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  18:                          general.file_type u32              = 7
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv  29:               general.quantization_version u32              = 2
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - type  f32:   34 tensors
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - type q8_0:  113 tensors
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.696-05:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model"
Oct 27 23:49:50 brett ollama[2954]: llm_load_vocab: special tokens cache size = 256
Oct 27 23:49:51 brett ollama[2954]: llm_load_vocab: token to piece cache size = 0.7999 MB
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: format           = GGUF V3 (latest)
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: arch             = llama
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: vocab type       = BPE
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_vocab          = 128256
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_merges         = 280147
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: vocab_only       = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_ctx_train      = 131072
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd           = 2048
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_layer          = 16
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_head           = 32
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_head_kv        = 8
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_rot            = 64
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_swa            = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_head_k    = 64
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_head_v    = 64
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_gqa            = 4
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_k_gqa     = 512
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_v_gqa     = 512
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_ff             = 8192
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_expert         = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_expert_used    = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: causal attn      = 1
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: pooling type     = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: rope type        = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: rope scaling     = linear
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: freq_base_train  = 500000.0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: freq_scale_train = 1
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_ctx_orig_yarn  = 131072
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: rope_finetuned   = unknown
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_d_conv       = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_d_inner      = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_d_state      = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_dt_rank      = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: model type       = 1B
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: model ftype      = Q8_0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: model params     = 1.24 B
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW)
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: LF token         = 128 'Ä'
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: max token length = 256
Oct 27 23:49:51 brett ollama[2954]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Oct 27 23:49:51 brett ollama[2954]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Oct 27 23:49:51 brett ollama[2954]: ggml_cuda_init: found 1 CUDA devices:
Oct 27 23:49:51 brett ollama[2954]:   Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Oct 27 23:49:51 brett ollama[2954]: llm_load_tensors: ggml ctx size =    0.14 MiB
Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: offloading 16 repeating layers to GPU
Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: offloading non-repeating layers to GPU
Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: offloaded 17/17 layers to GPU
Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors:        CPU buffer size =   266.16 MiB
Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors:      CUDA0 buffer size =  1252.42 MiB
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: n_ctx      = 8192
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: n_batch    = 512
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: n_ubatch   = 512
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: flash_attn = 0
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: freq_base  = 500000.0
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: freq_scale = 1
Oct 27 23:49:54 brett ollama[2954]: llama_kv_cache_init:      CUDA0 KV buffer size =   256.00 MiB
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model:  CUDA_Host  output buffer size =     1.99 MiB
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model:      CUDA0 compute buffer size =   544.00 MiB
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model:  CUDA_Host compute buffer size =    20.01 MiB
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: graph nodes  = 518
Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: graph splits = 2
Oct 27 23:49:54 brett ollama[3022]: INFO [main] model loaded | tid="139693392822272" timestamp=1730090994
Oct 27 23:49:54 brett ollama[2954]: time=2024-10-27T23:49:54.467-05:00 level=INFO source=server.go:626 msg="llama runner started in 4.02 seconds"
Oct 27 23:49:54 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:54 | 200 |  4.257026632s |       127.0.0.1 | POST     "/api/generate"
Oct 27 23:50:01 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:01 | 200 |      17.756µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:50:01 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:01 | 404 |       50.26µs |       127.0.0.1 | POST     "/api/show"
Oct 27 23:50:01 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:01 | 400 |       243.5µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:50:07 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:07 | 200 |      16.663µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:50:07 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:07 | 200 |   18.372207ms |       127.0.0.1 | POST     "/api/show"
Oct 27 23:50:07 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:07 | 200 |   17.780019ms |       127.0.0.1 | POST     "/api/generate"
Oct 27 23:50:20 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:20 | 200 |      17.006µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:50:20 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:20 | 404 |      49.617µs |       127.0.0.1 | POST     "/api/show"
Oct 27 23:50:20 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:20 | 400 |      280.05µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:50:28 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:28 | 200 |      18.864µs |       127.0.0.1 | HEAD     "/"
Oct 27 23:50:28 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:28 | 400 |     273.697µs |       127.0.0.1 | POST     "/api/pull"
Oct 27 23:52:52 brett systemd[1]: Stopping Ollama Service...
Oct 27 23:52:53 brett systemd[1]: ollama.service: Succeeded.
Oct 27 23:52:53 brett systemd[1]: Stopped Ollama Service.
-- Reboot --
Oct 28 00:04:27 brett systemd[1]: Started Ollama Service.
Oct 28 00:04:27 brett ollama[2832]: 2024/10/28 00:04:27 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.622-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.651-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.652-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)"
Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.653-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama3229709321/runners
Oct 28 00:04:37 brett ollama[2832]: time=2024-10-28T00:04:37.194-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu cpu_avx]"
Oct 28 00:04:37 brett ollama[2832]: time=2024-10-28T00:04:37.194-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 28 00:04:38 brett ollama[2832]: time=2024-10-28T00:04:38.822-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 28 00:04:39 brett ollama[2832]: [GIN] 2024/10/28 - 00:04:39 | 200 |      52.965µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:04:39 brett ollama[2832]: [GIN] 2024/10/28 - 00:04:39 | 404 |     124.537µs |       127.0.0.1 | POST     "/api/show"
Oct 28 00:04:39 brett ollama[2832]: [GIN] 2024/10/28 - 00:04:39 | 400 |     227.166µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:05:22 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:22 | 200 |      29.093µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:05:22 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:22 | 404 |      58.268µs |       127.0.0.1 | POST     "/api/show"
Oct 28 00:05:22 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:22 | 400 |      229.23µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:05:29 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:29 | 200 |      20.293µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:05:29 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:29 | 400 |     279.554µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:10:00 brett systemd[1]: Stopping Ollama Service...
Oct 28 00:10:00 brett systemd[1]: ollama.service: Succeeded.
Oct 28 00:10:00 brett systemd[1]: Stopped Ollama Service.
Oct 28 00:10:00 brett systemd[1]: Started Ollama Service.
Oct 28 00:10:00 brett ollama[3082]: 2024/10/28 00:10:00 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.490-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.490-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.490-05:00 level=INFO source=routes.go:1205 msg="Listening on [::]:11434 (version 0.3.14)"
Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.491-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama175581295/runners
Oct 28 00:10:10 brett ollama[3082]: time=2024-10-28T00:10:10.011-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cuda_v11 cuda_v12 rocm_v60102 cpu cpu_avx cpu_avx2]"
Oct 28 00:10:10 brett ollama[3082]: time=2024-10-28T00:10:10.011-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 28 00:10:10 brett ollama[3082]: time=2024-10-28T00:10:10.149-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 28 00:10:13 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:13 | 200 |      34.452µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:10:13 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:13 | 404 |     125.382µs |       127.0.0.1 | POST     "/api/show"
Oct 28 00:10:13 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:13 | 400 |     232.582µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:10:33 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:33 | 404 |     125.819µs |       127.0.0.1 | POST     "/api/generate"
Oct 28 00:10:53 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:53 | 200 |      16.439µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:10:53 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:53 | 400 |     251.805µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:12:50 brett systemd[1]: /etc/systemd/system/ollama.service:12: Invalid environment assignment, ignoring: “OLLAMA_DEBUG=“1”
Oct 28 00:13:15 brett systemd[1]: Stopping Ollama Service...
Oct 28 00:13:15 brett systemd[1]: ollama.service: Succeeded.
Oct 28 00:13:15 brett systemd[1]: Stopped Ollama Service.
Oct 28 00:13:15 brett systemd[1]: Started Ollama Service.
Oct 28 00:13:15 brett ollama[3208]: 2024/10/28 00:13:15 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=INFO source=routes.go:1205 msg="Listening on [::]:11434 (version 0.3.14)"
Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama996491480/runners
Oct 28 00:13:25 brett ollama[3208]: time=2024-10-28T00:13:25.415-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[rocm_v60102 cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12]"
Oct 28 00:13:25 brett ollama[3208]: time=2024-10-28T00:13:25.416-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 28 00:13:25 brett ollama[3208]: time=2024-10-28T00:13:25.554-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 28 00:13:25 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:25 | 200 |      26.804µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:13:25 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:25 | 400 |      326.32µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:13:49 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:49 | 200 |      16.045µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:13:49 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:49 | 400 |     277.728µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:14:19 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:19 | 200 |      16.386µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:14:19 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:19 | 400 |     257.092µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:14:59 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:59 | 200 |      16.534µs |       127.0.0.1 | HEAD     "/"
Oct 28 00:14:59 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:59 | 400 |     270.238µs |       127.0.0.1 | POST     "/api/pull"
Oct 28 00:16:37 brett ollama[3208]: [GIN] 2024/10/28 - 00:16:37 | 200 |      42.386µs |       127.0.0.1 | GET      "/api/version"
Oct 28 22:11:41 brett ollama[3208]: [GIN] 2024/10/28 - 22:11:41 | 200 |      17.622µs |       127.0.0.1 | HEAD     "/"
Oct 28 22:11:41 brett ollama[3208]: [GIN] 2024/10/28 - 22:11:41 | 404 |      92.283µs |       127.0.0.1 | POST     "/api/show"
Oct 28 22:11:41 brett ollama[3208]: [GIN] 2024/10/28 - 22:11:41 | 400 |     249.791µs |       127.0.0.1 | POST     "/api/pull"
(base) brett@brett:~$ 
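[Editor's note] Two details in this log stand out. First, the `Invalid environment assignment, ignoring: “OLLAMA_DEBUG=“1”` line at 00:12:50 shows curly (smart) quotes in the systemd unit, so that override was silently dropped; rewriting it with plain ASCII quotes (`Environment="OLLAMA_DEBUG=1"`) lets it take effect. Second, the bare `EOF` suffix in `couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF` is what Go's `encoding/json` decoder returns when asked to parse an empty stream, which is consistent with a zero-byte or truncated phi3 manifest under the `OLLAMA_MODELS` directory shown in the config dump. A minimal sketch of that failure mode (illustrative only, not Ollama's actual code; the manifest path below is inferred from the logs):

```python
import json

# Simulate the server reading a manifest file that exists but is empty.
# Go's encoding/json reports this condition as io.EOF, which surfaces as
# the bare "EOF" in the log line; Python raises JSONDecodeError instead.
def parse_manifest(raw: str):
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        return f"manifest unreadable: {err.msg}"

print(parse_manifest(""))                # zero-byte manifest -> parse error
print(parse_manifest('{"layers": []}'))  # well-formed manifest parses fine
```

If that is the cause here, checking whether `/usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest` is zero bytes (`ls -l`) and, if so, removing that one manifest file before restarting the service has been a common workaround for this class of error.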

<!-- gh-comment-id:2443110399 --> @bdytx5 commented on GitHub (Oct 29, 2024): It worked fine before, but now it is suddenly not working ``` (base) brett@brett:~$ journalctl -u ollama --no-pager -- Logs begin at Fri 2024-08-16 10:40:36 CDT, end at Mon 2024-10-28 22:12:45 CDT. -- Aug 28 18:14:57 brett systemd[1]: Started Ollama Service. Aug 28 18:14:57 brett ollama[1480]: 2024/08/28 18:14:57 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.835-05:00 level=INFO source=images.go:704 msg="total blobs: 35" Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.847-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.847-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Aug 28 18:14:57 brett ollama[1480]: time=2024-08-28T18:14:57.848-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3896581571/runners Aug 28 18:15:03 brett ollama[1480]: time=2024-08-28T18:15:03.688-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60002 cpu]" Aug 28 18:15:03 brett ollama[1480]: time=2024-08-28T18:15:03.916-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.7 GiB" Sep 09 20:58:24 brett systemd[1]: Stopping Ollama Service... 
Sep 09 20:58:24 brett systemd[1]: ollama.service: Succeeded. Sep 09 20:58:24 brett systemd[1]: Stopped Ollama Service. -- Reboot -- Sep 09 20:59:04 brett systemd[1]: Started Ollama Service. Sep 09 20:59:04 brett ollama[1001]: 2024/09/09 20:59:04 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.257-05:00 level=INFO source=images.go:704 msg="total blobs: 35" Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.267-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.267-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Sep 09 20:59:04 brett ollama[1001]: time=2024-09-09T20:59:04.268-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2167072295/runners Sep 09 20:59:08 brett ollama[1001]: time=2024-09-09T20:59:08.759-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]" Sep 09 20:59:08 brett ollama[1001]: time=2024-09-09T20:59:08.963-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" Sep 23 01:53:49 brett systemd[1]: Stopping Ollama Service... Sep 23 01:53:49 brett systemd[1]: ollama.service: Succeeded. Sep 23 01:53:49 brett systemd[1]: Stopped Ollama Service. -- Reboot -- Sep 23 01:54:28 brett systemd[1]: Started Ollama Service. 
Sep 23 01:54:29 brett ollama[978]: 2024/09/23 01:54:29 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.156-05:00 level=INFO source=images.go:704 msg="total blobs: 35" Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.164-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.165-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Sep 23 01:54:29 brett ollama[978]: time=2024-09-23T01:54:29.165-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2714411270/runners Sep 23 01:54:33 brett ollama[978]: time=2024-09-23T01:54:33.346-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]" Sep 23 01:54:33 brett ollama[978]: time=2024-09-23T01:54:33.604-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" -- Reboot -- Sep 25 09:59:29 brett systemd[1]: Started Ollama Service. 
Sep 25 09:59:29 brett ollama[1454]: 2024/09/25 09:59:29 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.352-05:00 level=INFO source=images.go:704 msg="total blobs: 35" Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.361-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.362-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Sep 25 09:59:29 brett ollama[1454]: time=2024-09-25T09:59:29.363-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1341669304/runners Sep 25 09:59:33 brett ollama[1454]: time=2024-09-25T09:59:33.170-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]" Sep 25 09:59:33 brett ollama[1454]: time=2024-09-25T09:59:33.429-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" -- Reboot -- Oct 03 07:02:51 brett systemd[1]: Started Ollama Service. 
Oct 03 07:02:51 brett ollama[1190]: 2024/10/03 07:02:51 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.291-05:00 level=INFO source=images.go:704 msg="total blobs: 35" Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.306-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.306-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Oct 03 07:02:51 brett ollama[1190]: time=2024-10-03T07:02:51.308-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1192105597/runners Oct 03 07:02:57 brett ollama[1190]: time=2024-10-03T07:02:57.018-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]" Oct 03 07:02:57 brett ollama[1190]: time=2024-10-03T07:02:57.175-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" -- Reboot -- Oct 14 09:56:26 brett systemd[1]: Started Ollama Service. 
Oct 14 09:56:26 brett ollama[1665]: 2024/10/14 09:56:26 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.195-05:00 level=INFO source=images.go:704 msg="total blobs: 35" Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.204-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.205-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Oct 14 09:56:26 brett ollama[1665]: time=2024-10-14T09:56:26.206-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama822189088/runners Oct 14 09:56:30 brett ollama[1665]: time=2024-10-14T09:56:30.014-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]" Oct 14 09:56:30 brett ollama[1665]: time=2024-10-14T09:56:30.200-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" Oct 14 22:26:27 brett ollama[1665]: [GIN] 2024/10/14 - 22:26:27 | 200 | 787.415µs | 127.0.0.1 | HEAD "/" Oct 14 22:26:27 brett ollama[1665]: time=2024-10-14T22:26:27.068-05:00 level=WARN source=routes.go:762 msg="bad manifest config" name=registry.ollama.ai/library/phi3:instruct error="invalid character '\\x00' looking for beginning of value" Oct 14 22:26:27 brett ollama[1665]: time=2024-10-14T22:26:27.068-05:00 level=WARN 
source=routes.go:749 msg="bad manifest" name=registry.ollama.ai/library/phi3:latest error=EOF Oct 14 22:26:27 brett ollama[1665]: [GIN] 2024/10/14 - 22:26:27 | 200 | 3.181247ms | 127.0.0.1 | GET "/api/tags" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 23.713µs | 127.0.0.1 | HEAD "/" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 22.012957ms | 127.0.0.1 | DELETE "/api/delete" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 20.164µs | 127.0.0.1 | HEAD "/" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 9.784876ms | 127.0.0.1 | DELETE "/api/delete" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 20.68µs | 127.0.0.1 | HEAD "/" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 2.472006ms | 127.0.0.1 | DELETE "/api/delete" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 18.509µs | 127.0.0.1 | HEAD "/" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 3.000684ms | 127.0.0.1 | DELETE "/api/delete" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 19.154µs | 127.0.0.1 | HEAD "/" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 9.288327ms | 127.0.0.1 | DELETE "/api/delete" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 19.664µs | 127.0.0.1 | HEAD "/" Oct 14 22:27:06 brett ollama[1665]: [GIN] 2024/10/14 - 22:27:06 | 200 | 1.804939ms | 127.0.0.1 | DELETE "/api/delete" -- Reboot -- Oct 17 08:27:59 brett systemd[1]: Started Ollama Service. 
Oct 17 08:27:59 brett ollama[1634]: 2024/10/17 08:27:59 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.823-05:00 level=INFO source=images.go:704 msg="total blobs: 5" Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.829-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.830-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Oct 17 08:27:59 brett ollama[1634]: time=2024-10-17T08:27:59.831-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3093056487/runners Oct 17 08:28:03 brett ollama[1634]: time=2024-10-17T08:28:03.684-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60002 cpu cpu_avx cpu_avx2]" Oct 17 08:28:03 brett ollama[1634]: time=2024-10-17T08:28:03.937-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" -- Reboot -- Oct 27 03:37:43 brett systemd[1]: Started Ollama Service. 
Oct 27 03:37:44 brett ollama[1128]: 2024/10/27 03:37:44 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]" Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.152-05:00 level=INFO source=images.go:704 msg="total blobs: 5" Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.156-05:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.156-05:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.36)" Oct 27 03:37:44 brett ollama[1128]: time=2024-10-27T03:37:44.158-05:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1884882316/runners Oct 27 03:37:48 brett ollama[1128]: time=2024-10-27T03:37:48.873-05:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]" Oct 27 03:37:49 brett ollama[1128]: time=2024-10-27T03:37:49.135-05:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" Oct 27 03:45:22 brett ollama[1128]: [GIN] 2024/10/27 - 03:45:22 | 200 | 867.06µs | 127.0.0.1 | HEAD "/" Oct 27 03:45:22 brett ollama[1128]: time=2024-10-27T03:45:22.502-05:00 level=ERROR source=images.go:969 msg="jwt token does not contain 3 parts" Oct 27 03:45:23 brett ollama[1128]: time=2024-10-27T03:45:23.452-05:00 level=INFO source=download.go:136 msg="downloading 74701a8c35f6 in 14 100 MB part(s)" Oct 27 03:46:22 brett ollama[1128]: 
time=2024-10-27T03:46:22.089-05:00 level=INFO source=download.go:136 msg="downloading 966de95ca8a6 in 1 1.4 KB part(s)" Oct 27 03:46:23 brett ollama[1128]: time=2024-10-27T03:46:23.702-05:00 level=INFO source=download.go:136 msg="downloading fcc5a6bec9da in 1 7.7 KB part(s)" Oct 27 03:46:25 brett ollama[1128]: time=2024-10-27T03:46:25.324-05:00 level=INFO source=download.go:136 msg="downloading a70ff7e570d9 in 1 6.0 KB part(s)" Oct 27 03:46:26 brett ollama[1128]: time=2024-10-27T03:46:26.930-05:00 level=INFO source=download.go:136 msg="downloading 4f659a1e86d7 in 1 485 B part(s)" Oct 27 03:46:31 brett ollama[1128]: [GIN] 2024/10/27 - 03:46:31 | 200 | 1m8s | 127.0.0.1 | POST "/api/pull" Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.142-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB" Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.142-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB" Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.142-05:00 level=INFO source=server.go:318 msg="starting llama server" cmd="/tmp/ollama1884882316/runners/cuda_v11/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 17 --parallel 1 
--port 38269" Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.143-05:00 level=INFO source=sched.go:333 msg="loaded runners" count=1 Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.143-05:00 level=INFO source=server.go:488 msg="waiting for llama runner to start responding" Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.143-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error" Oct 27 03:48:29 brett ollama[3108]: INFO [main] build info | build=1 commit="952d03d" tid="140487417716736" timestamp=1730018909 Oct 27 03:48:29 brett ollama[3108]: INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140487417716736" timestamp=1730018909 total_threads=4 Oct 27 03:48:29 brett ollama[3108]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="3" port="38269" tid="140487417716736" timestamp=1730018909 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest)) Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 0: general.architecture str = llama Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 1: general.type str = model Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 2: general.name str = Llama 3.2 1B Instruct Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 3: general.finetune str = Instruct Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 4: general.basename str = Llama-3.2 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 5: general.size_label str = 1B Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam... Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ... Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 8: llama.block_count u32 = 16 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 9: llama.context_length u32 = 131072 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 10: llama.embedding_length u32 = 2048 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 12: llama.attention.head_count u32 = 32 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 16: llama.attention.key_length u32 = 64 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 17: llama.attention.value_length u32 = 64 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 18: general.file_type u32 = 7 Oct 27 03:48:29 brett ollama[1128]: 
llama_model_loader: - kv 19: llama.vocab_size u32 = 128256 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 64 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - kv 29: general.quantization_version u32 = 2 Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - type f32: 34 tensors Oct 27 03:48:29 brett ollama[1128]: llama_model_loader: - type q8_0: 113 tensors Oct 27 03:48:29 brett ollama[1128]: time=2024-10-27T03:48:29.394-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server loading model" Oct 27 03:48:29 brett ollama[1128]: llm_load_vocab: special tokens definition check successful ( 256/128256 ). 
Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: format = GGUF V3 (latest) Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: arch = llama Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: vocab type = BPE Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_vocab = 128256 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_merges = 280147 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_ctx_train = 131072 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd = 2048 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_head = 32 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_head_kv = 8 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_layer = 16 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_rot = 64 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_head_k = 64 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_head_v = 64 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_gqa = 4 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_k_gqa = 512 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_embd_v_gqa = 512 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_norm_eps = 0.0e+00 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_clamp_kqv = 0.0e+00 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: f_logit_scale = 0.0e+00 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_ff = 8192 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_expert = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_expert_used = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: causal attn = 1 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: pooling type = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: 
rope type = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: rope scaling = linear Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: freq_base_train = 500000.0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: freq_scale_train = 1 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: n_yarn_orig_ctx = 131072 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: rope_finetuned = unknown Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_d_conv = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_d_inner = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_d_state = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: ssm_dt_rank = 0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model type = ?B Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model ftype = Q8_0 Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model params = 1.24 B Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: model size = 1.22 GiB (8.50 BPW) Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: general.name = Llama 3.2 1B Instruct Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: EOS token = 128009 '<|eot_id|>' Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: LF token = 128 'Ä' Oct 27 03:48:29 brett ollama[1128]: llm_load_print_meta: EOT token = 128009 '<|eot_id|>' Oct 27 03:48:29 brett ollama[1128]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes Oct 27 03:48:29 brett ollama[1128]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no Oct 27 03:48:29 brett ollama[1128]: ggml_cuda_init: found 1 CUDA devices: Oct 27 03:48:29 brett ollama[1128]: Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes Oct 27 03:48:29 brett ollama[1128]: llm_load_tensors: ggml ctx size = 0.15 MiB Oct 27 03:48:29 brett ollama[1128]: llama_model_load: error loading model: done_getting_tensors: wrong number of 
tensors; expected 147, got 146 Oct 27 03:48:29 brett ollama[1128]: llama_load_model_from_file: exception loading model Oct 27 03:48:30 brett ollama[1128]: terminate called after throwing an instance of 'std::runtime_error' Oct 27 03:48:30 brett ollama[1128]: what(): done_getting_tensors: wrong number of tensors; expected 147, got 146 Oct 27 03:48:30 brett ollama[1128]: time=2024-10-27T03:48:30.183-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error" Oct 27 03:48:30 brett ollama[1128]: time=2024-10-27T03:48:30.434-05:00 level=ERROR source=sched.go:339 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) " Oct 27 03:48:30 brett ollama[1128]: [GIN] 2024/10/27 - 03:48:30 | 500 | 2.507157839s | 127.0.0.1 | POST "/api/chat" Oct 27 03:48:35 brett ollama[1128]: time=2024-10-27T03:48:35.526-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.091533509 Oct 27 03:48:35 brett ollama[1128]: time=2024-10-27T03:48:35.776-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.341143575 Oct 27 03:48:36 brett ollama[1128]: time=2024-10-27T03:48:36.026-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.591589563 Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.876-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB" Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.876-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" 
memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB" Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.876-05:00 level=INFO source=server.go:318 msg="starting llama server" cmd="/tmp/ollama1884882316/runners/cuda_v11/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 17 --parallel 1 --port 40261" Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.877-05:00 level=INFO source=sched.go:333 msg="loaded runners" count=1 Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.877-05:00 level=INFO source=server.go:488 msg="waiting for llama runner to start responding" Oct 27 03:50:55 brett ollama[1128]: time=2024-10-27T03:50:55.877-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error" Oct 27 03:50:55 brett ollama[3202]: INFO [main] build info | build=1 commit="952d03d" tid="139704163123200" timestamp=1730019055 Oct 27 03:50:55 brett ollama[3202]: INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139704163123200" timestamp=1730019055 total_threads=4 Oct 27 03:50:55 brett ollama[3202]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="3" port="40261" tid="139704163123200" timestamp=1730019055 Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from 
/usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 0: general.architecture str = llama
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 1: general.type str = model
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 2: general.name str = Llama 3.2 1B Instruct
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 3: general.finetune str = Instruct
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 4: general.basename str = Llama-3.2
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 5: general.size_label str = 1B
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 8: llama.block_count u32 = 16
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 9: llama.context_length u32 = 131072
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 10: llama.embedding_length u32 = 2048
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 12: llama.attention.head_count u32 = 32
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 16: llama.attention.key_length u32 = 64
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 17: llama.attention.value_length u32 = 64
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 18: general.file_type u32 = 7
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 19: llama.vocab_size u32 = 128256
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 64
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 27 03:50:55 brett ollama[1128]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - kv 29: general.quantization_version u32 = 2
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - type f32: 34 tensors
Oct 27 03:50:56 brett ollama[1128]: llama_model_loader: - type q8_0: 113 tensors
Oct 27 03:50:56 brett ollama[1128]: time=2024-10-27T03:50:56.128-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server loading model"
Oct 27 03:50:56 brett ollama[1128]: llm_load_vocab: special tokens definition check successful ( 256/128256 ).
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: format = GGUF V3 (latest)
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: arch = llama
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: vocab type = BPE
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_vocab = 128256
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_merges = 280147
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_ctx_train = 131072
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd = 2048
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_head = 32
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_head_kv = 8
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_layer = 16
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_rot = 64
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_head_k = 64
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_head_v = 64
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_gqa = 4
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_k_gqa = 512
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_embd_v_gqa = 512
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_norm_eps = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: f_logit_scale = 0.0e+00
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_ff = 8192
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_expert = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_expert_used = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: causal attn = 1
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: pooling type = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: rope type = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: rope scaling = linear
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: freq_base_train = 500000.0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: freq_scale_train = 1
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: n_yarn_orig_ctx = 131072
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: rope_finetuned = unknown
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_d_conv = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_d_inner = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_d_state = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: ssm_dt_rank = 0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model type = ?B
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model ftype = Q8_0
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model params = 1.24 B
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: model size = 1.22 GiB (8.50 BPW)
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: general.name = Llama 3.2 1B Instruct
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: LF token = 128 'Ä'
Oct 27 03:50:56 brett ollama[1128]: llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
Oct 27 03:50:56 brett ollama[1128]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
Oct 27 03:50:56 brett ollama[1128]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
Oct 27 03:50:56 brett ollama[1128]: ggml_cuda_init: found 1 CUDA devices:
Oct 27 03:50:56 brett ollama[1128]: Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Oct 27 03:50:56 brett ollama[1128]: llm_load_tensors: ggml ctx size = 0.15 MiB
Oct 27 03:50:56 brett ollama[1128]: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 03:50:56 brett ollama[1128]: llama_load_model_from_file: exception loading model
Oct 27 03:50:56 brett ollama[1128]: terminate called after throwing an instance of 'std::runtime_error'
Oct 27 03:50:56 brett ollama[1128]: what(): done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 03:50:56 brett ollama[1128]: time=2024-10-27T03:50:56.919-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error"
Oct 27 03:50:57 brett ollama[1128]: time=2024-10-27T03:50:57.170-05:00 level=ERROR source=sched.go:339 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) "
Oct 27 03:50:57 brett ollama[1128]: [GIN] 2024/10/27 - 03:50:57 | 500 | 2.498784739s | 127.0.0.1 | POST "/api/chat"
Oct 27 03:51:02 brett ollama[1128]: time=2024-10-27T03:51:02.300-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.129768007
Oct 27 03:51:02 brett ollama[1128]: time=2024-10-27T03:51:02.550-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.379840996
Oct 27 03:51:02 brett ollama[1128]: time=2024-10-27T03:51:02.800-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.629716515
Oct 27 23:42:17 brett ollama[1128]: time=2024-10-27T23:42:17.170-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB"
Oct 27 23:42:17 brett ollama[1128]: time=2024-10-27T23:42:17.170-05:00 level=INFO source=memory.go:127 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="10.8 GiB" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="64.0 MiB" memory.weights.total="1.3 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="464.0 MiB"
Oct 27 23:42:17 brett ollama[1128]: time=2024-10-27T23:42:17.170-05:00 level=INFO source=server.go:318 msg="starting llama server" cmd="/tmp/ollama1884882316/runners/cuda_v11/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 17 --parallel 1 --port 38805"
Oct 27 23:42:17 brett ollama[1128]: time=2024-10-27T23:42:17.170-05:00 level=INFO source=sched.go:333 msg="loaded runners" count=1
Oct 27 23:42:17 brett ollama[1128]: time=2024-10-27T23:42:17.170-05:00 level=INFO source=server.go:488 msg="waiting for llama runner to start responding"
Oct 27 23:42:17 brett ollama[1128]: time=2024-10-27T23:42:17.171-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error"
Oct 27 23:42:17 brett ollama[22115]: INFO [main] build info | build=1 commit="952d03d" tid="140548346470400" timestamp=1730090537
Oct 27 23:42:17 brett ollama[22115]: INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140548346470400" timestamp=1730090537 total_threads=4
Oct 27 23:42:17 brett ollama[22115]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="3" port="38805" tid="140548346470400" timestamp=1730090537
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 0: general.architecture str = llama
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 1: general.type str = model
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 2: general.name str = Llama 3.2 1B Instruct
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 3: general.finetune str = Instruct
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 4: general.basename str = Llama-3.2
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 5: general.size_label str = 1B
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 8: llama.block_count u32 = 16
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 9: llama.context_length u32 = 131072
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 10: llama.embedding_length u32 = 2048
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 12: llama.attention.head_count u32 = 32
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 16: llama.attention.key_length u32 = 64
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 17: llama.attention.value_length u32 = 64
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 18: general.file_type u32 = 7
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 19: llama.vocab_size u32 = 128256
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 64
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - kv 29: general.quantization_version u32 = 2
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - type f32: 34 tensors
Oct 27 23:42:17 brett ollama[1128]: llama_model_loader: - type q8_0: 113 tensors
Oct 27 23:42:17 brett ollama[1128]: time=2024-10-27T23:42:17.422-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server loading model"
Oct 27 23:42:17 brett ollama[1128]: llm_load_vocab: special tokens definition check successful ( 256/128256 ).
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: format = GGUF V3 (latest)
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: arch = llama
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: vocab type = BPE
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_vocab = 128256
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_merges = 280147
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_ctx_train = 131072
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_embd = 2048
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_head = 32
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_head_kv = 8
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_layer = 16
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_rot = 64
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_embd_head_k = 64
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_embd_head_v = 64
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_gqa = 4
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_embd_k_gqa = 512
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_embd_v_gqa = 512
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: f_norm_eps = 0.0e+00
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: f_logit_scale = 0.0e+00
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_ff = 8192
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_expert = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_expert_used = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: causal attn = 1
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: pooling type = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: rope type = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: rope scaling = linear
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: freq_base_train = 500000.0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: freq_scale_train = 1
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: n_yarn_orig_ctx = 131072
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: rope_finetuned = unknown
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: ssm_d_conv = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: ssm_d_inner = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: ssm_d_state = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: ssm_dt_rank = 0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: model type = ?B
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: model ftype = Q8_0
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: model params = 1.24 B
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: model size = 1.22 GiB (8.50 BPW)
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: general.name = Llama 3.2 1B Instruct
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: LF token = 128 'Ä'
Oct 27 23:42:17 brett ollama[1128]: llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
Oct 27 23:42:17 brett ollama[1128]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
Oct 27 23:42:17 brett ollama[1128]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
Oct 27 23:42:17 brett ollama[1128]: ggml_cuda_init: found 1 CUDA devices:
Oct 27 23:42:17 brett ollama[1128]: Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes
Oct 27 23:42:17 brett ollama[1128]: llm_load_tensors: ggml ctx size = 0.15 MiB
Oct 27 23:42:17 brett ollama[1128]: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 23:42:17 brett ollama[1128]: llama_load_model_from_file: exception loading model
Oct 27 23:42:18 brett ollama[1128]: terminate called after throwing an instance of 'std::runtime_error'
Oct 27 23:42:18 brett ollama[1128]: what(): done_getting_tensors: wrong number of tensors; expected 147, got 146
Oct 27 23:42:18 brett ollama[1128]: time=2024-10-27T23:42:18.206-05:00 level=INFO source=server.go:524 msg="waiting for server to become available" status="llm server error"
Oct 27 23:42:18 brett ollama[1128]: time=2024-10-27T23:42:18.456-05:00 level=ERROR source=sched.go:339 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) "
Oct 27 23:42:18 brett ollama[1128]: [GIN] 2024/10/27 - 23:42:18 | 500 | 2.520228455s | 127.0.0.1 | POST "/api/chat"
Oct 27 23:42:23 brett ollama[1128]: time=2024-10-27T23:42:23.554-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.097619973
Oct 27 23:42:23 brett ollama[1128]: time=2024-10-27T23:42:23.804-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.3474481990000005
Oct 27 23:42:24 brett ollama[1128]: time=2024-10-27T23:42:24.053-05:00 level=WARN source=sched.go:507 msg="gpu VRAM usage didn't recover within timeout" seconds=5.596719041
Oct 27 23:44:29 brett systemd[1]: Stopping Ollama Service...
Oct 27 23:44:29 brett systemd[1]: ollama.service: Succeeded.
Oct 27 23:44:29 brett systemd[1]: Stopped Ollama Service.
Oct 27 23:44:29 brett systemd[1]: Started Ollama Service.
Oct 27 23:44:29 brett ollama[22362]: 2024/10/27 23:44:29 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.804-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.804-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.804-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)"
Oct 27 23:44:29 brett ollama[22362]: time=2024-10-27T23:44:29.805-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama528511600/runners
Oct 27 23:44:39 brett ollama[22362]: time=2024-10-27T23:44:39.271-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[rocm_v60102 cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12]"
Oct 27 23:44:39 brett ollama[22362]: time=2024-10-27T23:44:39.271-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 27 23:44:39 brett ollama[22362]: time=2024-10-27T23:44:39.457-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
-- Reboot --
Oct 27 23:45:52 brett systemd[1]: Started Ollama Service.
Oct 27 23:45:52 brett ollama[1210]: 2024/10/27 23:45:52 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.288-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.295-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.301-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)"
Oct 27 23:45:52 brett ollama[1210]: time=2024-10-27T23:45:52.308-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama58727328/runners
Oct 27 23:46:05 brett ollama[1210]: time=2024-10-27T23:46:05.719-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
Oct 27 23:46:05 brett ollama[1210]: time=2024-10-27T23:46:05.719-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 27 23:46:05 brett ollama[1210]: time=2024-10-27T23:46:05.947-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 27 23:47:05 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:05 | 200 | 36.093µs | 127.0.0.1 | HEAD "/"
Oct 27 23:47:05 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:05 | 404 | 163.693µs | 127.0.0.1 | POST "/api/show"
Oct 27 23:47:05 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:05 | 400 | 239.022µs | 127.0.0.1 | POST "/api/pull"
Oct 27 23:47:08 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:08 | 200 | 18.161µs | 127.0.0.1 | HEAD "/"
Oct 27 23:47:08 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:08 | 404 | 37.82µs | 127.0.0.1 | POST "/api/show"
Oct 27 23:47:08 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:08 | 400 | 226.35µs | 127.0.0.1 | POST "/api/pull"
Oct 27 23:47:17 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:17 | 200 | 19.667µs | 127.0.0.1 | HEAD "/"
Oct 27 23:47:17 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:17 | 404 | 50.438µs | 127.0.0.1 | POST "/api/show"
Oct 27 23:47:17 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:17 | 400 | 261.805µs | 127.0.0.1 | POST "/api/pull"
Oct 27 23:47:24 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:24 | 200 | 16.313µs | 127.0.0.1 | HEAD "/"
Oct 27 23:47:24 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:24 | 404 | 45.966µs | 127.0.0.1 | POST "/api/show"
Oct 27 23:47:24 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:24 | 400 | 247.012µs | 127.0.0.1 | POST "/api/pull"
Oct 27 23:47:38 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:38 | 200 | 22.788µs | 127.0.0.1 | HEAD "/"
Oct 27 23:47:38 brett ollama[1210]: [GIN] 2024/10/27 - 23:47:38 | 400 | 306.724µs | 127.0.0.1 | POST "/api/pull"
Oct 27 23:49:28 brett systemd[1]: Stopping Ollama Service...
Oct 27 23:49:28 brett systemd[1]: ollama.service: Succeeded.
Oct 27 23:49:28 brett systemd[1]: Stopped Ollama Service.
Oct 27 23:49:28 brett systemd[1]: Started Ollama Service.
Oct 27 23:49:28 brett ollama[2954]: 2024/10/27 23:49:28 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.482-05:00 level=INFO source=images.go:754 msg="total blobs: 10"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.483-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.483-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)"
Oct 27 23:49:28 brett ollama[2954]: time=2024-10-27T23:49:28.483-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama2690884310/runners
Oct 27 23:49:37 brett ollama[2954]: time=2024-10-27T23:49:37.979-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu]"
Oct 27 23:49:37 brett ollama[2954]: time=2024-10-27T23:49:37.979-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 27 23:49:38 brett ollama[2954]: time=2024-10-27T23:49:38.118-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB"
Oct 27 23:49:38 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:38 | 200 | 49.64µs | 127.0.0.1 | HEAD "/"
Oct 27 23:49:38 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:38 | 404 | 106.443µs | 127.0.0.1 | POST "/api/show"
Oct 27 23:49:38 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:38 | 400 | 227.448µs | 127.0.0.1 | POST "/api/pull"
Oct 27 23:49:50 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:50 | 200 | 18.509µs | 127.0.0.1 | HEAD "/"
Oct 27 23:49:50 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:50 | 200 | 55.612435ms | 127.0.0.1 | POST "/api/show"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.362-05:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 gpu=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 parallel=4 available=11552161792 required="2.5 GiB"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.443-05:00 level=INFO source=server.go:105 msg="system memory" total="31.3 GiB" free="29.7 GiB" free_swap="2.0 GiB"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.443-05:00 level=INFO source=memory.go:326 msg="offload to cuda" layers.requested=-1 layers.model=17 layers.offload=17 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.5 GiB" memory.required.partial="2.5 GiB" memory.required.kv="256.0 MiB" memory.required.allocations="[2.5 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="976.1 MiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="544.0 MiB" memory.graph.partial="554.3 MiB"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama2690884310/runners/cuda_v12/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 8192 --batch-size 512 --embedding --n-gpu-layers 17 --threads 4 --parallel 4 --port 46533"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=server.go:587 msg="waiting for llama runner to start responding"
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.445-05:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error"
Oct 27 23:49:50 brett ollama[3022]: INFO [main] starting c++ runner | tid="139693392822272" timestamp=1730090990
Oct 27 23:49:50 brett ollama[3022]: INFO [main] build info | build=10 commit="3a8c75e" tid="139693392822272" timestamp=1730090990
Oct 27 23:49:50 brett ollama[3022]: INFO [main] system info | n_threads=4 n_threads_batch=4 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139693392822272" timestamp=1730090990 total_threads=4
Oct 27 23:49:50 brett ollama[3022]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="6" port="46533" tid="139693392822272" timestamp=1730090990
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 0: general.architecture str = llama
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 1: general.type str = model
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 2: general.name str = Llama 3.2 1B Instruct
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 3: general.finetune str = Instruct
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 4: general.basename str = Llama-3.2
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 5: general.size_label str = 1B
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 8: llama.block_count u32 = 16
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 9: llama.context_length u32 = 131072
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 10: llama.embedding_length u32 = 2048
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 12: llama.attention.head_count u32 = 32
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 16: llama.attention.key_length u32 = 64
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 17: llama.attention.value_length u32 = 64
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 18: general.file_type u32 = 7
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 19: llama.vocab_size u32 = 128256
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 64
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - kv 29: general.quantization_version u32 = 2
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - type f32: 34 tensors
Oct 27 23:49:50 brett ollama[2954]: llama_model_loader: - type q8_0: 113 tensors
Oct 27 23:49:50 brett ollama[2954]: time=2024-10-27T23:49:50.696-05:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model"
Oct 27 23:49:50 brett ollama[2954]: llm_load_vocab: special tokens cache size = 256
Oct 27 23:49:51 brett ollama[2954]: llm_load_vocab: token to piece cache size = 0.7999 MB
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: format = GGUF V3 (latest)
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: arch = llama
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: vocab type = BPE
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_vocab = 128256
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_merges = 280147
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: vocab_only = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_ctx_train = 131072
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd = 2048
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_layer = 16
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_head = 32
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_head_kv = 8
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_rot = 64
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_swa = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_head_k = 64
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_head_v = 64
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_gqa = 4
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_k_gqa = 512
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_embd_v_gqa = 512
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_norm_eps = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: f_logit_scale = 0.0e+00
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_ff = 8192
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_expert = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_expert_used = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: causal attn = 1
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: pooling type = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: rope type = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: rope scaling = linear
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: freq_base_train = 500000.0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: freq_scale_train = 1
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: n_ctx_orig_yarn = 131072
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: rope_finetuned = unknown
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_d_conv = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_d_inner = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_d_state = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_dt_rank = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: ssm_dt_b_c_rms = 0
Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: model type = 1B
Oct 27 23:49:51 brett
ollama[2954]: llm_load_print_meta: model ftype = Q8_0 Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: model params = 1.24 B Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: model size = 1.22 GiB (8.50 BPW) Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: general.name = Llama 3.2 1B Instruct Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOS token = 128009 '<|eot_id|>' Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: LF token = 128 'Ä' Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOT token = 128009 '<|eot_id|>' Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOM token = 128008 '<|eom_id|>' Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOG token = 128008 '<|eom_id|>' Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: EOG token = 128009 '<|eot_id|>' Oct 27 23:49:51 brett ollama[2954]: llm_load_print_meta: max token length = 256 Oct 27 23:49:51 brett ollama[2954]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Oct 27 23:49:51 brett ollama[2954]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Oct 27 23:49:51 brett ollama[2954]: ggml_cuda_init: found 1 CUDA devices: Oct 27 23:49:51 brett ollama[2954]: Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes Oct 27 23:49:51 brett ollama[2954]: llm_load_tensors: ggml ctx size = 0.14 MiB Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: offloading 16 repeating layers to GPU Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: offloading non-repeating layers to GPU Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: offloaded 17/17 layers to GPU Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: CPU buffer size = 266.16 MiB Oct 27 23:49:54 brett ollama[2954]: llm_load_tensors: CUDA0 buffer size = 1252.42 MiB Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: n_ctx = 8192 Oct 27 23:49:54 brett ollama[2954]: 
llama_new_context_with_model: n_batch = 512 Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: n_ubatch = 512 Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: flash_attn = 0 Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: freq_base = 500000.0 Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: freq_scale = 1 Oct 27 23:49:54 brett ollama[2954]: llama_kv_cache_init: CUDA0 KV buffer size = 256.00 MiB Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: CUDA_Host output buffer size = 1.99 MiB Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: CUDA0 compute buffer size = 544.00 MiB Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: CUDA_Host compute buffer size = 20.01 MiB Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: graph nodes = 518 Oct 27 23:49:54 brett ollama[2954]: llama_new_context_with_model: graph splits = 2 Oct 27 23:49:54 brett ollama[3022]: INFO [main] model loaded | tid="139693392822272" timestamp=1730090994 Oct 27 23:49:54 brett ollama[2954]: time=2024-10-27T23:49:54.467-05:00 level=INFO source=server.go:626 msg="llama runner started in 4.02 seconds" Oct 27 23:49:54 brett ollama[2954]: [GIN] 2024/10/27 - 23:49:54 | 200 | 4.257026632s | 127.0.0.1 | POST "/api/generate" Oct 27 23:50:01 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:01 | 200 | 17.756µs | 127.0.0.1 | HEAD "/" Oct 27 23:50:01 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:01 | 404 | 50.26µs | 127.0.0.1 | POST "/api/show" Oct 27 23:50:01 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:01 | 400 | 243.5µs | 127.0.0.1 | POST "/api/pull" Oct 27 23:50:07 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:07 | 200 | 16.663µs | 127.0.0.1 | HEAD "/" Oct 27 23:50:07 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:07 | 200 | 18.372207ms | 127.0.0.1 | POST 
"/api/show" Oct 27 23:50:07 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:07 | 200 | 17.780019ms | 127.0.0.1 | POST "/api/generate" Oct 27 23:50:20 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:20 | 200 | 17.006µs | 127.0.0.1 | HEAD "/" Oct 27 23:50:20 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:20 | 404 | 49.617µs | 127.0.0.1 | POST "/api/show" Oct 27 23:50:20 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:20 | 400 | 280.05µs | 127.0.0.1 | POST "/api/pull" Oct 27 23:50:28 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:28 | 200 | 18.864µs | 127.0.0.1 | HEAD "/" Oct 27 23:50:28 brett ollama[2954]: [GIN] 2024/10/27 - 23:50:28 | 400 | 273.697µs | 127.0.0.1 | POST "/api/pull" Oct 27 23:52:52 brett systemd[1]: Stopping Ollama Service... Oct 27 23:52:53 brett systemd[1]: ollama.service: Succeeded. Oct 27 23:52:53 brett systemd[1]: Stopped Ollama Service. -- Reboot -- Oct 28 00:04:27 brett systemd[1]: Started Ollama Service. Oct 28 00:04:27 brett ollama[2832]: 2024/10/28 00:04:27 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.622-05:00 level=INFO source=images.go:754 
msg="total blobs: 10" Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.651-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF" Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.652-05:00 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.14)" Oct 28 00:04:27 brett ollama[2832]: time=2024-10-28T00:04:27.653-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama3229709321/runners Oct 28 00:04:37 brett ollama[2832]: time=2024-10-28T00:04:37.194-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu cpu_avx]" Oct 28 00:04:37 brett ollama[2832]: time=2024-10-28T00:04:37.194-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs" Oct 28 00:04:38 brett ollama[2832]: time=2024-10-28T00:04:38.822-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" Oct 28 00:04:39 brett ollama[2832]: [GIN] 2024/10/28 - 00:04:39 | 200 | 52.965µs | 127.0.0.1 | HEAD "/" Oct 28 00:04:39 brett ollama[2832]: [GIN] 2024/10/28 - 00:04:39 | 404 | 124.537µs | 127.0.0.1 | POST "/api/show" Oct 28 00:04:39 brett ollama[2832]: [GIN] 2024/10/28 - 00:04:39 | 400 | 227.166µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:05:22 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:22 | 200 | 29.093µs | 127.0.0.1 | HEAD "/" Oct 28 00:05:22 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:22 | 404 | 58.268µs | 127.0.0.1 | POST "/api/show" Oct 28 00:05:22 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:22 | 400 | 229.23µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:05:29 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:29 | 200 | 20.293µs | 127.0.0.1 | HEAD "/" Oct 28 00:05:29 brett ollama[2832]: [GIN] 2024/10/28 - 00:05:29 | 400 | 279.554µs | 
127.0.0.1 | POST "/api/pull" Oct 28 00:10:00 brett systemd[1]: Stopping Ollama Service... Oct 28 00:10:00 brett systemd[1]: ollama.service: Succeeded. Oct 28 00:10:00 brett systemd[1]: Stopped Ollama Service. Oct 28 00:10:00 brett systemd[1]: Started Ollama Service. Oct 28 00:10:00 brett ollama[3082]: 2024/10/28 00:10:00 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.490-05:00 level=INFO source=images.go:754 msg="total blobs: 10" Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.490-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF" Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.490-05:00 level=INFO source=routes.go:1205 msg="Listening on [::]:11434 (version 0.3.14)" Oct 28 00:10:00 brett ollama[3082]: time=2024-10-28T00:10:00.491-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama175581295/runners Oct 28 00:10:10 brett ollama[3082]: time=2024-10-28T00:10:10.011-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" 
runners="[cuda_v11 cuda_v12 rocm_v60102 cpu cpu_avx cpu_avx2]" Oct 28 00:10:10 brett ollama[3082]: time=2024-10-28T00:10:10.011-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs" Oct 28 00:10:10 brett ollama[3082]: time=2024-10-28T00:10:10.149-05:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" Oct 28 00:10:13 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:13 | 200 | 34.452µs | 127.0.0.1 | HEAD "/" Oct 28 00:10:13 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:13 | 404 | 125.382µs | 127.0.0.1 | POST "/api/show" Oct 28 00:10:13 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:13 | 400 | 232.582µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:10:33 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:33 | 404 | 125.819µs | 127.0.0.1 | POST "/api/generate" Oct 28 00:10:53 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:53 | 200 | 16.439µs | 127.0.0.1 | HEAD "/" Oct 28 00:10:53 brett ollama[3082]: [GIN] 2024/10/28 - 00:10:53 | 400 | 251.805µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:12:50 brett systemd[1]: /etc/systemd/system/ollama.service:12: Invalid environment assignment, ignoring: “OLLAMA_DEBUG=“1” Oct 28 00:13:15 brett systemd[1]: Stopping Ollama Service... Oct 28 00:13:15 brett systemd[1]: ollama.service: Succeeded. Oct 28 00:13:15 brett systemd[1]: Stopped Ollama Service. Oct 28 00:13:15 brett systemd[1]: Started Ollama Service. 
Oct 28 00:13:15 brett ollama[3208]: 2024/10/28 00:13:15 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=INFO source=images.go:754 msg="total blobs: 10" Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/phi3:latest: EOF" Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=INFO source=routes.go:1205 msg="Listening on [::]:11434 (version 0.3.14)" Oct 28 00:13:15 brett ollama[3208]: time=2024-10-28T00:13:15.901-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama996491480/runners Oct 28 00:13:25 brett ollama[3208]: time=2024-10-28T00:13:25.415-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[rocm_v60102 cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12]" Oct 28 00:13:25 brett ollama[3208]: time=2024-10-28T00:13:25.416-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs" Oct 28 00:13:25 brett ollama[3208]: time=2024-10-28T00:13:25.554-05:00 
level=INFO source=types.go:123 msg="inference compute" id=GPU-d7fc4e85-d6b0-19c0-0feb-e70772adb097 library=cuda variant=v12 compute=6.1 driver=12.5 name="NVIDIA GeForce GTX 1080 Ti" total="10.9 GiB" available="10.8 GiB" Oct 28 00:13:25 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:25 | 200 | 26.804µs | 127.0.0.1 | HEAD "/" Oct 28 00:13:25 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:25 | 400 | 326.32µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:13:49 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:49 | 200 | 16.045µs | 127.0.0.1 | HEAD "/" Oct 28 00:13:49 brett ollama[3208]: [GIN] 2024/10/28 - 00:13:49 | 400 | 277.728µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:14:19 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:19 | 200 | 16.386µs | 127.0.0.1 | HEAD "/" Oct 28 00:14:19 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:19 | 400 | 257.092µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:14:59 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:59 | 200 | 16.534µs | 127.0.0.1 | HEAD "/" Oct 28 00:14:59 brett ollama[3208]: [GIN] 2024/10/28 - 00:14:59 | 400 | 270.238µs | 127.0.0.1 | POST "/api/pull" Oct 28 00:16:37 brett ollama[3208]: [GIN] 2024/10/28 - 00:16:37 | 200 | 42.386µs | 127.0.0.1 | GET "/api/version" Oct 28 22:11:41 brett ollama[3208]: [GIN] 2024/10/28 - 22:11:41 | 200 | 17.622µs | 127.0.0.1 | HEAD "/" Oct 28 22:11:41 brett ollama[3208]: [GIN] 2024/10/28 - 22:11:41 | 404 | 92.283µs | 127.0.0.1 | POST "/api/show" Oct 28 22:11:41 brett ollama[3208]: [GIN] 2024/10/28 - 22:11:41 | 400 | 249.791µs | 127.0.0.1 | POST "/api/pull" (base) brett@brett:~$ ```
Author
Owner

@rick-github commented on GitHub (Oct 29, 2024):

Please wrap your logs in a markdown code block: three backticks (```) on a line at the start and again at the end.

<!-- gh-comment-id:2443115430 --> @rick-github commented on GitHub (Oct 29, 2024): Please wrap your logs in a markdown block , three backticks (\`\`\`) on a line at the start and again at the end.
Author
Owner

@rick-github commented on GitHub (Oct 29, 2024):

Do you have a proxy or a captive portal between your machine and the internet? What's the output when you run:

curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest
<!-- gh-comment-id:2443123696 --> @rick-github commented on GitHub (Oct 29, 2024): Do you have a proxy or a captive portal between your machine and the internet? What's the output when you run: ``` curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest ```
Author
Owner

@bdytx5 commented on GitHub (Oct 29, 2024):

(base) brett@brett:~$ curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest
HTTP/1.1 200 OK
Date: Tue, 29 Oct 2024 03:40:27 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 1006
Connection: keep-alive
via: 1.1 google
alt-svc: h3=":443"; ma=86400
cf-cache-status: DYNAMIC
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=35vTU6tEAwoD85%2B%2FyFCatK1AO%2BrVX9uLhO%2FQjyEZxJnjo0lv0k92hvATA2jVz6jWy3XI8DiFQe67jOPOZA14X%2FHmjP3Oco7knf2PuJtUNYfKwuP5d9k%2FhxLeuoNG5Kdfim97Foc%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Server: cloudflare
CF-RAY: 8da024f048185dbb-OKC
server-timing: cfL4;desc="?proto=TCP&rtt=10984&sent=4&recv=7&lost=0&retrans=0&sent_bytes=2833&recv_bytes=737&delivery_rate=252484&cwnd=251&unsent_bytes=0&cid=0e8ef6855e58c816&ts=368&x=0"

{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"digest":"sha256:34bb5ab01051a11372a91f95f3fbbc51173eed8e7f13ec395b9ae9b8bd0e242b","mediaType":"application/vnd.docker.container.image.v1+json","size":561},"layers":[{"digest":"sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff","mediaType":"application/vnd.ollama.image.model","size":2019377376},{"digest":"sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396","mediaType":"application/vnd.ollama.image.template","size":1429},{"digest":"sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d","mediaType":"application/vnd.ollama.image.license","size":7711},{"digest":"sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd","mediaType":"application/vnd.ollama.image.license","size":6016},{"digest":"sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb","mediaType":"application/vnd.ollama.image.params","size":96}]}

<!-- gh-comment-id:2443135676 --> @bdytx5 commented on GitHub (Oct 29, 2024): ``` (base) brett@brett:~$ curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest HTTP/1.1 200 OK Date: Tue, 29 Oct 2024 03:40:27 GMT Content-Type: text/plain; charset=utf-8 Content-Length: 1006 Connection: keep-alive via: 1.1 google alt-svc: h3=":443"; ma=86400 cf-cache-status: DYNAMIC Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=35vTU6tEAwoD85%2B%2FyFCatK1AO%2BrVX9uLhO%2FQjyEZxJnjo0lv0k92hvATA2jVz6jWy3XI8DiFQe67jOPOZA14X%2FHmjP3Oco7knf2PuJtUNYfKwuP5d9k%2FhxLeuoNG5Kdfim97Foc%3D"}],"group":"cf-nel","max_age":604800} NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800} Server: cloudflare CF-RAY: 8da024f048185dbb-OKC server-timing: cfL4;desc="?proto=TCP&rtt=10984&sent=4&recv=7&lost=0&retrans=0&sent_bytes=2833&recv_bytes=737&delivery_rate=252484&cwnd=251&unsent_bytes=0&cid=0e8ef6855e58c816&ts=368&x=0" {"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"digest":"sha256:34bb5ab01051a11372a91f95f3fbbc51173eed8e7f13ec395b9ae9b8bd0e242b","mediaType":"application/vnd.docker.container.image.v1+json","size":561},"layers":[{"digest":"sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff","mediaType":"application/vnd.ollama.image.model","size":2019377376},{"digest":"sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396","mediaType":"application/vnd.ollama.image.template","size":1429},{"digest":"sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d","mediaType":"application/vnd.ollama.image.license","size":7711},{"digest":"sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd","mediaType":"application/vnd.ollama.image.license","size":6016},{"digest":"sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb","mediaType":"application/vnd.ollama.image.params","size":96}]} ```
Author
Owner

@bdytx5 commented on GitHub (Oct 29, 2024):

I'm not aware of a proxy or a captive portal

<!-- gh-comment-id:2443135877 --> @bdytx5 commented on GitHub (Oct 29, 2024): I'm not aware of a proxy or a captive portal
Author
Owner

@rick-github commented on GitHub (Oct 29, 2024):

Do you get the same result for:

curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest --noproxy '*'
<!-- gh-comment-id:2443146273 --> @rick-github commented on GitHub (Oct 29, 2024): Do you get the same result for: ``` curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest --noproxy '*' ```
Author
Owner

@bdytx5 commented on GitHub (Oct 29, 2024):

(base) brett@brett:~$ curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest --noproxy '*'
HTTP/1.1 200 OK
Date: Tue, 29 Oct 2024 04:28:11 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 1006
Connection: keep-alive
via: 1.1 google
alt-svc: h3=":443"; ma=86400
cf-cache-status: DYNAMIC
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=d%2B4ASF34%2Faetdm9%2BTs%2B8BxbSEfy%2Bqrn4ZxHA0XY2o2i%2F2edhQfmix4ikknPCgUOA8soCwDLCTun%2F8UD1dfs05%2B5HWMKAwMrh1opN7u8XkwVap7wfyv%2F%2B95EmjrQIWYEMz2fCtyk%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Server: cloudflare
CF-RAY: 8da06adcfd9c5dbd-OKC
server-timing: cfL4;desc="?proto=TCP&rtt=10975&sent=4&recv=7&lost=0&retrans=0&sent_bytes=2833&recv_bytes=737&delivery_rate=259777&cwnd=251&unsent_bytes=0&cid=e0fa34b6a0c43b03&ts=293&x=0"

{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"digest":"sha256:34bb5ab01051a11372a91f95f3fbbc51173eed8e7f13ec395b9ae9b8bd0e242b","mediaType":"application/vnd.docker.container.image.v1+json","size":561},"layers":[{"digest":"sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff","mediaType":"application/vnd.ollama.image.model","size":2019377376},{"digest":"sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396","mediaType":"application/vnd.ollama.image.template","size":1429},{"digest":"sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d","mediaType":"application/vnd.ollama.image.license","size":7711},{"digest":"sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd","mediaType":"application/vnd.ollama.image.license","size":6016},{"digest":"sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb","mediaType":"application/vnd.ollama.image.params","size":96}]}

<!-- gh-comment-id:2443178811 --> @bdytx5 commented on GitHub (Oct 29, 2024): ``` (base) brett@brett:~$ curl -D - https://registry.ollama.ai/v2/library/llama3.2/manifests/latest --noproxy '*' HTTP/1.1 200 OK Date: Tue, 29 Oct 2024 04:28:11 GMT Content-Type: text/plain; charset=utf-8 Content-Length: 1006 Connection: keep-alive via: 1.1 google alt-svc: h3=":443"; ma=86400 cf-cache-status: DYNAMIC Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=d%2B4ASF34%2Faetdm9%2BTs%2B8BxbSEfy%2Bqrn4ZxHA0XY2o2i%2F2edhQfmix4ikknPCgUOA8soCwDLCTun%2F8UD1dfs05%2B5HWMKAwMrh1opN7u8XkwVap7wfyv%2F%2B95EmjrQIWYEMz2fCtyk%3D"}],"group":"cf-nel","max_age":604800} NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800} Server: cloudflare CF-RAY: 8da06adcfd9c5dbd-OKC server-timing: cfL4;desc="?proto=TCP&rtt=10975&sent=4&recv=7&lost=0&retrans=0&sent_bytes=2833&recv_bytes=737&delivery_rate=259777&cwnd=251&unsent_bytes=0&cid=e0fa34b6a0c43b03&ts=293&x=0" {"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"digest":"sha256:34bb5ab01051a11372a91f95f3fbbc51173eed8e7f13ec395b9ae9b8bd0e242b","mediaType":"application/vnd.docker.container.image.v1+json","size":561},"layers":[{"digest":"sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff","mediaType":"application/vnd.ollama.image.model","size":2019377376},{"digest":"sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396","mediaType":"application/vnd.ollama.image.template","size":1429},{"digest":"sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d","mediaType":"application/vnd.ollama.image.license","size":7711},{"digest":"sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd","mediaType":"application/vnd.ollama.image.license","size":6016},{"digest":"sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb","mediaType":"application/vnd.ollama.image.params","size":96}]} ```
Author
Owner

@rick-github commented on GitHub (Oct 29, 2024):

It looks like you pulled llama3.2:1b-instruct-q8_0 on Oct 27 03:46:31. That was with v0.1.36, you upgraded to v0.3.14 on Oct 27 23:44:29, and there hasn't been a successful pull since.

I think this is a separate issue from the Error: registry.ollama.ai/library/phi3:latest: EOF error. That is just local cleanup, and I suspect that the manifest file in /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest is corrupted or zero length. What's the output of:

ls -l /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest
jq . /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest

Back to the failed pulls: what's the exact command you type, and what's the response printed by ollama pull?

<!-- gh-comment-id:2444744117 --> @rick-github commented on GitHub (Oct 29, 2024): It looks like you pulled llama3.2:1b-instruct-q8_0 on Oct 27 03:46:31. That was with v0.1.36, and you upgraded to v0.3,14 on Oct 27 23:44:29, and there hasn't been a successful pull since. I think this is a separate issue from the `Error: registry.ollama.ai/library/phi3:latest: EOF` error. That is just local cleanup, and I suspect that the manifest file in /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest is corrupted or zero length. What's the output of: ``` ls -l /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest jq . /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest ``` Back to the failed pulls: what's the exact command you type, and what's the response printed by `ollama pull`?
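For anyone else hitting this: the `ls`/`jq` check above can be generalized to scan every cached manifest at once. This is a sketch, not part of ollama itself; it assumes the default Linux models path and that `jq` is installed (the `check_manifests` helper name is made up for illustration):

```shell
# check_manifests DIR: report every manifest under DIR that is empty or
# not valid JSON. Either condition is enough to produce the
# "registry.ollama.ai/...: EOF" error described in this issue.
check_manifests() {
    find "$1" -type f | while read -r f; do
        if [ ! -s "$f" ]; then
            echo "EMPTY:   $f"
        elif ! jq -e . "$f" >/dev/null 2>&1; then
            echo "INVALID: $f"
        fi
    done
}

# Default location on a Linux systemd install:
# check_manifests /usr/share/ollama/.ollama/models/manifests
```

Any file it reports can be deleted; ollama recreates the manifest on the next successful pull of that model.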
Author
Owner

@bdytx5 commented on GitHub (Oct 30, 2024):

Any ollama pull command gives me the EOF error:


(base) brett@brett:~$ ls -l /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest
-rw-r--r-- 1 ollama ollama 0 May 11 15:57 /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest
(base) brett@brett:~$ 
(base) brett@brett:~$ jq . /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest
(base) brett@brett:~$ 

(base) brett@brett:~$ ollama pull llama3.1

Error: registry.ollama.ai/library/phi3:latest: EOF
(base) brett@brett:~$ ollama pull llama3.2

Error: registry.ollama.ai/library/phi3:latest: EOF
(base) brett@brett:~$ ollama pull phi3

Error: registry.ollama.ai/library/phi3:latest: EOF
(base) brett@brett:~$ 
<!-- gh-comment-id:2445857118 --> @bdytx5 commented on GitHub (Oct 30, 2024): just any ollama pull command, and gives me the EOF error ``` (base) brett@brett:~$ ls -l /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest -rw-r--r-- 1 ollama ollama 0 May 11 15:57 /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest (base) brett@brett:~$ (base) brett@brett:~$ jq . /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest (base) brett@brett:~$ ``` ``` (base) brett@brett:~$ ollama pull llama3.1 Error: registry.ollama.ai/library/phi3:latest: EOF (base) brett@brett:~$ ollama pull llama3.2 Error: registry.ollama.ai/library/phi3:latest: EOF (base) brett@brett:~$ ollama pull phi3 Error: registry.ollama.ai/library/phi3:latest: EOF (base) brett@brett:~$ ```
Author
Owner

@rick-github commented on GitHub (Oct 30, 2024):

sudo rm /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest
<!-- gh-comment-id:2446935589 --> @rick-github commented on GitHub (Oct 30, 2024): ``` sudo rm /usr/share/ollama/.ollama/models/manifests/registry.ollama.ai/library/phi3/latest ````
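If more than one manifest was truncated, the same fix can be applied in one pass. A sketch, assuming the default Linux models path (the `prune_empty_manifests` helper name is made up here; run it as root, since the files are owned by the ollama user):

```shell
# prune_empty_manifests DIR: delete every zero-length manifest under DIR.
# A single empty manifest is enough to make `ollama pull` and `ollama ls`
# fail with "registry.ollama.ai/...: EOF" until it is removed.
prune_empty_manifests() {
    [ -d "$1" ] || return 0
    find "$1" -type f -size 0 -print -delete
}

# Default location on a Linux systemd install (typically via sudo):
# prune_empty_manifests /usr/share/ollama/.ollama/models/manifests
```

Deleted manifests are recreated on the next successful pull, so nothing permanent is lost beyond the tag entries themselves.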
Author
Owner

@bdytx5 commented on GitHub (Oct 31, 2024):

you solved my issue. keep up the good stuff, thanks

Author
Owner

@dhiltgen commented on GitHub (Nov 5, 2024):

I've reproduced this on a test system, and it does not appear to be a networking problem but a local manifest that is somehow corrupted; the current code doesn't handle this scenario well. What's unfortunate is that pulls of unrelated models fail once the system gets into this state:

% ollama pull x/llama3.2-vision

Error: registry.ollama.ai/library/llava:7b: EOF

Server log shows:

time=2024-11-05T16:53:28.266Z level=INFO source=images.go:754 msg="total blobs: 32"
time=2024-11-05T16:53:28.266Z level=ERROR source=images.go:757 msg="couldn't remove unused layers: registry.ollama.ai/library/llava:7b: EOF"
...
[GIN] 2024/11/05 - 16:54:18 | 400 |    1.446558ms | 100.126.204.152 | POST     "/api/pull"

Listing also fails

% ollama ls
Error: registry.ollama.ai/library/llava:7b: EOF
[GIN] 2024/11/05 - 16:55:57 | 500 |     962.279µs | 100.126.204.152 | GET      "/api/tags"
Reference: github-starred/ollama#51211