[GH-ISSUE #6408] 404 POST "/api/chat" #29786

Closed
opened 2026-04-22 09:01:18 -05:00 by GiteaMirror · 12 comments
Owner

Originally created by @turndown on GitHub (Aug 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6408

What is the issue?

At first it ran normally, but after a while it started returning 404 and couldn't run any model.
Can you help me solve it? Thanks.

Installed via: curl -fsSL https://ollama.com/install.sh

Log below:
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_print_meta: LF token = 148848 'ÄĬ'
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_print_meta: EOT token = 151643 '<|endoftext|>'
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_print_meta: max token length = 256
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: ggml_cuda_init: found 1 CUDA devices:
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: ggml ctx size = 0.30 MiB
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: offloading 28 repeating layers to GPU
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: offloading non-repeating layers to GPU
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: offloaded 29/29 layers to GPU
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: CPU buffer size = 292.36 MiB
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: CUDA0 buffer size = 3928.07 MiB
Aug 19 10:26:01 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:01 | 404 | 185.499µs | ::1 | POST "/api/chat"
Aug 19 10:26:02 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:02 | 200 | 1.273346ms | 172.17.0.2 | GET "/api/tags"
Aug 19 10:26:02 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:02 | 200 | 88.559µs | 172.17.0.2 | GET "/api/vers>
Aug 19 10:26:26 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:26 | 200 | 207.009µs | 127.0.0.1 | HEAD "/"
Aug 19 10:26:26 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:26 | 200 | 1.100698ms | 127.0.0.1 | GET "/api/tags"
Aug 19 10:26:33 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:33 | 200 | 46.933µs | 127.0.0.1 | HEAD "/"
Aug 19 10:26:33 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:33 | 200 | 23.522263ms | 127.0.0.1 | POST "/api/show"
Aug 19 10:26:44 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:26:44.502+08:00 level=INFO source=server.go:627 msg="waiting for serve>
Aug 19 10:26:44 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:26:44.780+08:00 level=INFO source=server.go:627 msg="waiting for serve>
Aug 19 10:27:01 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:27:01 | 404 | 7.051455ms | ::1 | POST "/api/chat"
Aug 19 10:28:01 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:28:01 | 404 | 367.924µs | ::1 | POST "/api/chat"
Aug 19 10:28:55 ecs-lcdsj systemd[1]: Stopping Ollama Service...
Aug 19 10:28:55 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:28:55.817+08:00 level=WARN source=server.go:600 msg="client connection>
Aug 19 10:28:55 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:28:55.818+08:00 level=ERROR source=sched.go:451 msg="error loading lla>
Aug 19 10:28:55 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:28:55 | 499 | 3m0s | 172.17.0.2 | POST "/api/chat"
Aug 19 10:28:56 ecs-lcdsj systemd[1]: ollama.service: Succeeded.
Aug 19 10:28:56 ecs-lcdsj systemd[1]: Stopped Ollama Service.
Aug 19 10:28:56 ecs-lcdsj systemd[1]: Started Ollama Service.
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: 2024/08/19 10:28:56 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU>
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.246+08:00 level=INFO source=images.go:782 msg="total blobs: 15"
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.249+08:00 level=INFO source=images.go:790 msg="total unused blob>
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.249+08:00 level=INFO source=routes.go:1172 msg="Listening on [::>
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.250+08:00 level=INFO source=payload.go:30 msg="extracting embedd>
Aug 19 10:29:01 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:01.035+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libra>
Aug 19 10:29:01 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:01.037+08:00 level=INFO source=gpu.go:204 msg="looking for compatib>
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.605+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.606+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.606+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.606+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:29:10 | 404 | 13.419583ms | ::1 | POST "/api/chat"
Aug 19 10:30:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:30:01 | 404 | 990.349µs | ::1 | POST "/api/chat"
Aug 19 10:31:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:31:01 | 404 | 224.61µs | ::1 | POST "/api/chat"
Aug 19 10:32:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:32:01 | 404 | 15.250541ms | ::1 | POST "/api/chat"
Aug 19 10:32:27 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:32:27 | 200 | 46.654µs | 127.0.0.1 | GET "/api/vers>
Aug 19 10:33:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:33:01 | 404 | 959.34µs | ::1 | POST "/api/chat"
Aug 19 10:34:02 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:34:02 | 404 | 18.592866ms | ::1 | POST "/api/chat"
Aug 19 10:35:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:35:01 | 404 | 284.394µs | ::1 | POST "/api/chat"

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.3.6

GiteaMirror added the needs more info, bug labels 2026-04-22 09:01:19 -05:00
Author
Owner

@turndown commented on GitHub (Aug 19, 2024):

I think the key issue here is that the model cannot be loaded, even though it can be found locally:
![image](https://github.com/user-attachments/assets/d4864ed4-d2b8-4976-b290-2d001e0f3ad4)

0.560+08:00 level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server not responding"
0.855+08:00 level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server loading model"
9:01 | 404 | 8.989131ms | ::1 | POST "/api/chat"
0:01 | 404 | 191.365µs | ::1 | POST "/api/chat"

Author
Owner

@turndown commented on GitHub (Aug 19, 2024):

I tried printing more detailed debug logs, but it still just reports [GIN] 2024/08/19 - 14:13:01 | 404 | 414.019µs | 127.0.0.1 | POST "/api/chat"
(base) [root@ecs-lcdsj ~]# OLLAMA_DEBUG=1 ollama serve 2>&1 | tee server.log
2024/08/19 14:09:56 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T14:09:56.520+08:00 level=INFO source=images.go:782 msg="total blobs: 0"
time=2024-08-19T14:09:56.520+08:00 level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-19T14:09:56.520+08:00 level=INFO source=routes.go:1170 msg="Listening on 127.0.0.1:11434 (version 0.3.5)"
time=2024-08-19T14:09:56.522+08:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama565513732/runners
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cpu/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cpu_avx/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cpu_avx2/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cuda_v11/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/rocm_v60102/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60102 cpu cpu_avx cpu_avx2]"
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-08-19T14:10:01.179+08:00 level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-19T14:10:01.180+08:00 level=DEBUG source=gpu.go:90 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-08-19T14:10:01.180+08:00 level=DEBUG source=gpu.go:472 msg="Searching for GPU library" name=libcuda.so*
time=2024-08-19T14:10:01.180+08:00 level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/cuda-12.5/lib64/libcuda.so** /root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-08-19T14:10:01.187+08:00 level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths="[/usr/lib/libcuda.so.550.90.07 /usr/lib64/libcuda.so.550.90.07]"
library /usr/lib/libcuda.so.550.90.07 load err: /usr/lib/libcuda.so.550.90.07: wrong ELF class: ELFCLASS32
time=2024-08-19T14:10:01.188+08:00 level=DEBUG source=gpu.go:566 msg="skipping 32bit library" library=/usr/lib/libcuda.so.550.90.07
CUDA driver version: 12.4
time=2024-08-19T14:10:01.548+08:00 level=DEBUG source=gpu.go:123 msg="detected GPUs" count=4 library=/usr/lib64/libcuda.so.550.90.07
[GPU-220df675-5d27-88e7-0958-f62f77a1e82a] CUDA totalMem 40326 mb
[GPU-220df675-5d27-88e7-0958-f62f77a1e82a] CUDA freeMem 38836 mb
[GPU-220df675-5d27-88e7-0958-f62f77a1e82a] Compute Capability 8.0
[GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff] CUDA totalMem 40326 mb
[GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff] CUDA freeMem 39903 mb
[GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff] Compute Capability 8.0
[GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac] CUDA totalMem 40326 mb
[GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac] CUDA freeMem 39903 mb
[GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac] Compute Capability 8.0
[GPU-7aabff4a-5756-eee1-b793-880410188e85] CUDA totalMem 40326 mb
[GPU-7aabff4a-5756-eee1-b793-880410188e85] CUDA freeMem 39903 mb
[GPU-7aabff4a-5756-eee1-b793-880410188e85] Compute Capability 8.0
time=2024-08-19T14:10:02.846+08:00 level=DEBUG source=amd_linux.go:371 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing cuda driver library
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-220df675-5d27-88e7-0958-f62f77a1e82a library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="37.9 GiB"
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="39.0 GiB"
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="39.0 GiB"
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-7aabff4a-5756-eee1-b793-880410188e85 library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="39.0 GiB"
[GIN] 2024/08/19 - 14:10:02 | 404 | 2.731353ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/08/19 - 14:11:01 | 404 | 205.613µs | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/08/19 - 14:12:02 | 404 | 15.933031ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/08/19 - 14:13:01 | 404 | 414.019µs | 127.0.0.1 | POST "/api/chat"

Author
Owner

@rick-github commented on GitHub (Aug 19, 2024):

What's in the body of the 404 response that the client receives?

Author
Owner

@turndown commented on GitHub (Aug 19, 2024):

> What's in the body of the 404 response that the client receives?

Hi, do you mean this below? I use Open WebUI to connect to Ollama, and it gets no response.
But sometimes it works fine, and sometimes this happens when switching models.
I can't find a pattern in when it happens.
Thanks for your reply.
![image](https://github.com/user-attachments/assets/5a3f596c-c015-40a0-a7cc-c3b13a4b1a11)
![image](https://github.com/user-attachments/assets/e4d9eb7d-de89-4d30-896f-9879e9d9794b)

Author
Owner

@rick-github commented on GitHub (Aug 19, 2024):

The most likely problem is that the request that is being sent to ollama has a bad model name:

$ curl -s -D - localhost:11434/api/chat -d '{"model":"unknown"}'
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=utf-8
Date: Mon, 19 Aug 2024 11:15:04 GMT
Content-Length: 61

{"error":"model \"unknown\" not found, try pulling it first"}

If you can get the contents of the 404 response that ollama sent, it will probably have information about why the request failed, whether a bad model name or some other reason.

The HTTP tracer extension won't help because that's looking at the traffic between the browser and the open-webui port, not between open-webui and ollama.

Run this and use open-webui; when an error occurs you should be able to find the error message in the packet trace:

sudo tcpdump -X -i lo port 11434
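
If the raw hex is hard to read, a possible variation (not part of the original suggestion) prints the payload as plain ASCII instead:

sudo tcpdump -A -i lo port 11434
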
Author
Owner

@turndown commented on GitHub (Aug 19, 2024):

I tried this command and found messages showing the model name "qwen2:72b".
But I no longer use that model; I deleted it. I will pull the model and try again.
tcpdump -X -i lo port 11434
19:41:01.470424 IP6 localhost.11434 > localhost.spremotetablet: Flags [P.], seq 1:194, ack 199, win 512, options [nop,nop,TS val 1269460090 ecr 1269460088], length 193
0x0000: 6002 1aed 00e1 0640 0000 0000 0000 0000 `......@........
0x0010: 0000 0000 0000 0001 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0001 2caa b796 97e3 4236 ........,.....B6
0x0030: 8bb0 bc8e 8018 0200 00e9 0000 0101 080a ................
0x0040: 4baa 6c7a 4baa 6c78 4854 5450 2f31 2e31 K.lzK.lxHTTP/1.1
0x0050: 2034 3034 204e 6f74 2046 6f75 6e64 0d0a .404.Not.Found..
0x0060: 436f 6e74 656e 742d 5479 7065 3a20 6170 Content-Type:.ap
0x0070: 706c 6963 6174 696f 6e2f 6a73 6f6e 3b20 plication/json;.
0x0080: 6368 6172 7365 743d 7574 662d 380d 0a44 charset=utf-8..D
0x0090: 6174 653a 204d 6f6e 2c20 3139 2041 7567 ate:.Mon,.19.Aug
0x00a0: 2032 3032 3420 3131 3a34 313a 3031 2047 .2024.11:41:01.G
0x00b0: 4d54 0d0a 436f 6e74 656e 742d 4c65 6e67 MT..Content-Leng
0x00c0: 7468 3a20 3633 0d0a 0d0a 7b22 6572 726f th:.63....{"erro
0x00d0: 7222 3a22 6d6f 6465 6c20 5c22 7177 656e r":"model.\"qwen
0x00e0: 323a 3732 625c 2220 6e6f 7420 666f 756e 2:72b\".not.foun
0x00f0: 642c 2074 7279 2070 756c 6c69 6e67 2069 d,.try.pulling.i
0x0100: 7420 6669 7273 7422 7d t.first"}
But if it's a model name issue, why do I get a 404 error and get stuck when I run this command in the terminal?
I'm quite confused; thanks for pointing me in the right direction.
![image](https://github.com/user-attachments/assets/2de479b2-4d50-47cf-a1b7-fb1429617bbd)

Author
Owner

@rick-github commented on GitHub (Aug 19, 2024):

I think you have multiple problems. The 404 that you captured with tcpdump is different from the ollama run llama3:latest issue because the models are not the same. You need to separate out the problems and post server logs that clearly show the issue you are trying to fix.
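
For the systemd install shown in the first log, one way to collect a clean server log is with journalctl (a sketch; the output file name is arbitrary):

journalctl -u ollama --no-pager > ollama-server.log

Reproduce a single failing request, then post the matching slice of that file.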

Author
Owner

@turndown commented on GitHub (Aug 20, 2024):

> I think you have multiple problems. The 404 that you captured with tcpdump is different from the ollama run llama3:latest issue because the models are not the same. You need to separate out the problems and post server logs that clearly show the issue you are trying to fix.

Today I stopped Open WebUI and tested the Docker ollama 0.3.5 image. I just ran docker exec -it ollama ollama run svjack/qwen1_5_14b in one terminal, and in another terminal the packet capture still shows "model": "qwen2:72b".
I don't know why. Could the names of these models conflict? For example, if everything starts with 'qwen', such as 'qwen2', 'qwen:72b', etc., will this cause a problem?

10:57:01.184426 IP6 localhost.52764 > localhost.11434: Flags [P.], seq 1:199, ack 1, win 512, options [nop,nop,TS val 1324419804 ecr 1324419803], length 198
        0x0000:  600b efaa 00e6 0640 0000 0000 0000 0000  `......@........
        0x0010:  0000 0000 0000 0001 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0001 ce1c 2caa ec00 525f  ..........,...R_
        0x0030:  248f 1099 8018 0200 00ee 0000 0101 080a  $...............
        0x0040:  4ef1 0adc 4ef1 0adb 504f 5354 202f 6170  N...N...POST./ap
        0x0050:  692f 6368 6174 2048 5454 502f 312e 310d  i/chat.HTTP/1.1.
        0x0060:  0a48 6f73 743a 206c 6f63 616c 686f 7374  .Host:.localhost
        0x0070:  3a31 3134 3334 0d0a 5573 6572 2d41 6765  :11434..User-Age
        0x0080:  6e74 3a20 6375 726c 2f37 2e37 312e 310d  nt:.curl/7.71.1.
        0x0090:  0a41 6363 6570 743a 202a 2f2a 0d0a 436f  .Accept:.*/*..Co
        0x00a0:  6e74 656e 742d 4c65 6e67 7468 3a20 3431  ntent-Length:.41
        0x00b0:  0d0a 436f 6e74 656e 742d 5479 7065 3a20  ..Content-Type:.
        0x00c0:  6170 706c 6963 6174 696f 6e2f 782d 7777  application/x-ww
        0x00d0:  772d 666f 726d 2d75 726c 656e 636f 6465  w-form-urlencode
        0x00e0:  640d 0a0d 0a7b 2022 6d6f 6465 6c22 3a20  d....{."model":.
        0x00f0:  2271 7765 6e32 3a37 3262 222c 2022 6b65  "qwen2:72b",."ke
        0x0100:  6570 5f61 6c69 7665 223a 202d 317d       ep_alive":.-1}

Also, may I ask whether the packet capture command works in real time? Why wasn't the request I ran in the terminal captured, as shown below?
Thank you very much for your help.
![image](https://github.com/user-attachments/assets/2b9bc423-80d4-45f9-b28c-06d1828cd8ed)
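
tcpdump prints packets as it sees them, so the capture is real-time. One possible explanation for the missing request (an assumption, since the exact setup isn't shown here): a client started with docker exec talks to the server inside the container's own network namespace, so that traffic never crosses the host's lo interface. Capturing on all interfaces, or running tcpdump inside the container, should show it:

sudo tcpdump -A -i any port 11434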

Author
Owner

@pdevine commented on GitHub (Aug 30, 2024):

I don't see qwen2:72b in the ollama list output. Can you ollama pull qwen2:72b and then try to use the API with that model again?
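
A minimal sketch of that check (the prompt text is just a placeholder):

ollama pull qwen2:72b
curl -s localhost:11434/api/chat -d '{"model":"qwen2:72b","messages":[{"role":"user","content":"hi"}],"stream":false}'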

Author
Owner

@pdevine commented on GitHub (Sep 2, 2024):

I'm going to go ahead and close the issue. I'm pretty certain that you just need to pull the correct model. I'll reopen it if you're still having the issue.

Author
Owner

@TheLillin commented on GitHub (Oct 16, 2024):

This worked for me: in WSL I entered docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui and then I could interact with the models in Open WebUI.

Author
Owner

@deepanshu-prajapati01 commented on GitHub (Nov 5, 2024):

As for me, I was also encountering the following issue in Open WebUI:

Ollama: 500, message='Internal Server Error', url='http://127.0.0.1:11434/api/chat'

But somehow trying another model (latest) works for me.
Hope someone finds this helpful!

Reference: github-starred/ollama#29786