Mirror of https://github.com/ollama/ollama.git (synced 2026-05-07 16:40:08 -05:00)
Closed · opened 2026-04-28 16:15:58 -05:00 by GiteaMirror · 12 comments
Originally created by @turndown on GitHub (Aug 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6408
What is the issue?
At first it ran normally, but after a while every request returned 404 and it couldn't run any model.
Can you help me solve it? Thanks.
Installed with: curl -fsSL https://ollama.com/install.sh
Log below:
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_print_meta: LF token = 148848 'ÄĬ'
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_print_meta: EOT token = 151643 '<|endoftext|>'
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_print_meta: max token length = 256
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: ggml_cuda_init: found 1 CUDA devices:
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: ggml ctx size = 0.30 MiB
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: offloading 28 repeating layers to GPU
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: offloading non-repeating layers to GPU
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: offloaded 29/29 layers to GPU
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: CPU buffer size = 292.36 MiB
Aug 19 10:25:57 ecs-lcdsj ollama[1026502]: llm_load_tensors: CUDA0 buffer size = 3928.07 MiB
Aug 19 10:26:01 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:01 | 404 | 185.499µs | ::1 | POST "/api/chat"
Aug 19 10:26:02 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:02 | 200 | 1.273346ms | 172.17.0.2 | GET "/api/tags"
Aug 19 10:26:02 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:02 | 200 | 88.559µs | 172.17.0.2 | GET "/api/vers>
Aug 19 10:26:26 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:26 | 200 | 207.009µs | 127.0.0.1 | HEAD "/"
Aug 19 10:26:26 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:26 | 200 | 1.100698ms | 127.0.0.1 | GET "/api/tags"
Aug 19 10:26:33 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:33 | 200 | 46.933µs | 127.0.0.1 | HEAD "/"
Aug 19 10:26:33 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:26:33 | 200 | 23.522263ms | 127.0.0.1 | POST "/api/show"
Aug 19 10:26:44 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:26:44.502+08:00 level=INFO source=server.go:627 msg="waiting for serve>
Aug 19 10:26:44 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:26:44.780+08:00 level=INFO source=server.go:627 msg="waiting for serve>
Aug 19 10:27:01 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:27:01 | 404 | 7.051455ms | ::1 | POST "/api/chat"
Aug 19 10:28:01 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:28:01 | 404 | 367.924µs | ::1 | POST "/api/chat"
Aug 19 10:28:55 ecs-lcdsj systemd[1]: Stopping Ollama Service...
Aug 19 10:28:55 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:28:55.817+08:00 level=WARN source=server.go:600 msg="client connection>
Aug 19 10:28:55 ecs-lcdsj ollama[1026502]: time=2024-08-19T10:28:55.818+08:00 level=ERROR source=sched.go:451 msg="error loading lla>
Aug 19 10:28:55 ecs-lcdsj ollama[1026502]: [GIN] 2024/08/19 - 10:28:55 | 499 | 3m0s | 172.17.0.2 | POST "/api/chat"
Aug 19 10:28:56 ecs-lcdsj systemd[1]: ollama.service: Succeeded.
Aug 19 10:28:56 ecs-lcdsj systemd[1]: Stopped Ollama Service.
Aug 19 10:28:56 ecs-lcdsj systemd[1]: Started Ollama Service.
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: 2024/08/19 10:28:56 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU>
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.246+08:00 level=INFO source=images.go:782 msg="total blobs: 15"
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.249+08:00 level=INFO source=images.go:790 msg="total unused blob>
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.249+08:00 level=INFO source=routes.go:1172 msg="Listening on [::>
Aug 19 10:28:56 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:28:56.250+08:00 level=INFO source=payload.go:30 msg="extracting embedd>
Aug 19 10:29:01 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:01.035+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libra>
Aug 19 10:29:01 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:01.037+08:00 level=INFO source=gpu.go:204 msg="looking for compatib>
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.605+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.606+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.606+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: time=2024-08-19T10:29:10.606+08:00 level=INFO source=types.go:105 msg="inference compute">
Aug 19 10:29:10 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:29:10 | 404 | 13.419583ms | ::1 | POST "/api/chat"
Aug 19 10:30:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:30:01 | 404 | 990.349µs | ::1 | POST "/api/chat"
Aug 19 10:31:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:31:01 | 404 | 224.61µs | ::1 | POST "/api/chat"
Aug 19 10:32:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:32:01 | 404 | 15.250541ms | ::1 | POST "/api/chat"
Aug 19 10:32:27 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:32:27 | 200 | 46.654µs | 127.0.0.1 | GET "/api/vers>
Aug 19 10:33:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:33:01 | 404 | 959.34µs | ::1 | POST "/api/chat"
Aug 19 10:34:02 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:34:02 | 404 | 18.592866ms | ::1 | POST "/api/chat"
Aug 19 10:35:01 ecs-lcdsj ollama[1032507]: [GIN] 2024/08/19 - 10:35:01 | 404 | 284.394µs | ::1 | POST "/api/chat"
OS
Linux
GPU
Nvidia
CPU
Other
Ollama version
0.3.6
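A 404 from POST /api/chat alongside 200s from GET /api/tags (as in the log above) usually means the model name the client sends isn't in the server's local list. A minimal sketch for checking that list with Python's standard library; the helper names and base URL are illustrative assumptions, not part of the original report:

```python
import json
import urllib.request


def model_names(tags_json: str) -> list[str]:
    """Extract model names from a GET /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def list_local_models(base_url: str = "http://127.0.0.1:11434") -> list[str]:
    """Fetch the names of the models Ollama actually has locally."""
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return model_names(resp.read().decode("utf-8"))


# Usage (with a running server): compare list_local_models() against the
# exact model string your client sends to /api/chat.
```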
@turndown commented on GitHub (Aug 19, 2024):
I think the key issue here is that the model cannot be loaded, even though it can be found locally:

0.560+08:00 level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server not responding"
0.855+08:00 level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server loading model"
9:01 | 404 | 8.989131ms | ::1 | POST "/api/chat"
0:01 | 404 | 191.365µs | ::1 | POST "/api/chat"
@turndown commented on GitHub (Aug 19, 2024):
I tried printing more detailed debug logs, but it just reports [GIN] 2024/08/19 - 14:13:01 | 404 | 414.019µs | 127.0.0.1 | POST "/api/chat"
(base) [root@ecs-lcdsj ~]# OLLAMA_DEBUG=1 ollama serve 2>&1 | tee server.log
2024/08/19 14:09:56 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T14:09:56.520+08:00 level=INFO source=images.go:782 msg="total blobs: 0"
time=2024-08-19T14:09:56.520+08:00 level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-19T14:09:56.520+08:00 level=INFO source=routes.go:1170 msg="Listening on 127.0.0.1:11434 (version 0.3.5)"
time=2024-08-19T14:09:56.522+08:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama565513732/runners
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
time=2024-08-19T14:09:56.522+08:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cpu/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cpu_avx/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cpu_avx2/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/cuda_v11/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama565513732/runners/rocm_v60102/ollama_llama_server
time=2024-08-19T14:10:01.179+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60102 cpu cpu_avx cpu_avx2]"
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-08-19T14:10:01.179+08:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-08-19T14:10:01.179+08:00 level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-19T14:10:01.180+08:00 level=DEBUG source=gpu.go:90 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-08-19T14:10:01.180+08:00 level=DEBUG source=gpu.go:472 msg="Searching for GPU library" name=libcuda.so
time=2024-08-19T14:10:01.180+08:00 level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/cuda-12.5/lib64/libcuda.so** /root/libcuda.so** /usr/local/cuda*/targets//lib/libcuda.so /usr/lib/-linux-gnu/nvidia/current/libcuda.so /usr/lib/-linux-gnu/libcuda.so /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers//libcuda.so /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-08-19T14:10:01.187+08:00 level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths="[/usr/lib/libcuda.so.550.90.07 /usr/lib64/libcuda.so.550.90.07]"
library /usr/lib/libcuda.so.550.90.07 load err: /usr/lib/libcuda.so.550.90.07: wrong ELF class: ELFCLASS32
time=2024-08-19T14:10:01.188+08:00 level=DEBUG source=gpu.go:566 msg="skipping 32bit library" library=/usr/lib/libcuda.so.550.90.07
CUDA driver version: 12.4
time=2024-08-19T14:10:01.548+08:00 level=DEBUG source=gpu.go:123 msg="detected GPUs" count=4 library=/usr/lib64/libcuda.so.550.90.07
[GPU-220df675-5d27-88e7-0958-f62f77a1e82a] CUDA totalMem 40326 mb
[GPU-220df675-5d27-88e7-0958-f62f77a1e82a] CUDA freeMem 38836 mb
[GPU-220df675-5d27-88e7-0958-f62f77a1e82a] Compute Capability 8.0
[GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff] CUDA totalMem 40326 mb
[GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff] CUDA freeMem 39903 mb
[GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff] Compute Capability 8.0
[GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac] CUDA totalMem 40326 mb
[GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac] CUDA freeMem 39903 mb
[GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac] Compute Capability 8.0
[GPU-7aabff4a-5756-eee1-b793-880410188e85] CUDA totalMem 40326 mb
[GPU-7aabff4a-5756-eee1-b793-880410188e85] CUDA freeMem 39903 mb
[GPU-7aabff4a-5756-eee1-b793-880410188e85] Compute Capability 8.0
time=2024-08-19T14:10:02.846+08:00 level=DEBUG source=amd_linux.go:371 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing cuda driver library
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-220df675-5d27-88e7-0958-f62f77a1e82a library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="37.9 GiB"
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-0509be8c-c34b-4e94-ccc8-3d06d7a287ff library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="39.0 GiB"
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-238d50b9-e2e6-8bf5-cf29-8a98895db3ac library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="39.0 GiB"
time=2024-08-19T14:10:02.846+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-7aabff4a-5756-eee1-b793-880410188e85 library=cuda compute=8.0 driver=12.4 name="NVIDIA A100-PCIE-40GB" total="39.4 GiB" available="39.0 GiB"
[GIN] 2024/08/19 - 14:10:02 | 404 | 2.731353ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/08/19 - 14:11:01 | 404 | 205.613µs | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/08/19 - 14:12:02 | 404 | 15.933031ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/08/19 - 14:13:01 | 404 | 414.019µs | 127.0.0.1 | POST "/api/chat"
@rick-github commented on GitHub (Aug 19, 2024):
What's in the body of the 404 response that the client receives?
@turndown commented on GitHub (Aug 19, 2024):
Hi, do you mean this below? I use Open WebUI to connect to Ollama, and it got no response.


But sometimes it works fine; it often happens when switching models.
I can't find a pattern in when it happens.
Thx for your reply.
@rick-github commented on GitHub (Aug 19, 2024):
The most likely problem is that the request that is being sent to ollama has a bad model name:
If you can get the contents of the 404 response that ollama sent, it will probably have information about why the request failed, whether a bad model name or some other reason.
The HTTP tracer extension won't help because that's looking at the traffic between the browser and the open-webui port, not between open-webui and ollama.
Run this while using open-webui; when an error occurs you should be able to find the error message in the packet trace:
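The 404 body can also be fetched directly, without a packet capture. A minimal sketch with Python's urllib; the model name and prompt are illustrative, and since urllib raises on 4xx responses, the handler below recovers the error body:

```python
import json
import urllib.error
import urllib.request


def chat_payload(model: str, prompt: str) -> bytes:
    """Build a minimal JSON body for POST /api/chat."""
    return json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")


def post_chat(model: str, prompt: str, base_url: str = "http://127.0.0.1:11434"):
    """POST to /api/chat and return (status, body), including 4xx replies."""
    req = urllib.request.Request(base_url + "/api/chat",
                                 data=chat_payload(model, prompt))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on 404, but the error object still carries the body.
        return err.code, err.read().decode("utf-8")


# Usage (with a running server):
#   status, body = post_chat("qwen2:72b", "hi")
#   print(status, body)
```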
@turndown commented on GitHub (Aug 19, 2024):
I tried this command and found messages showing the model name "qwen2:72b".

But I haven't used this model since I deleted it. I will pull the model and try again.
tcpdump -X -i lo port 11434
19:41:01.470424 IP6 localhost.11434 > localhost.spremotetablet: Flags [P.], seq 1:194, ack 199, win 512, options [nop,nop,TS val 1269460090 ecr 1269460088], length 193
0x0000: 6002 1aed 00e1 0640 0000 0000 0000 0000 ......@........
0x0010: 0000 0000 0000 0001 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0001 2caa b796 97e3 4236 ........,.....B6
0x0030: 8bb0 bc8e 8018 0200 00e9 0000 0101 080a ................
0x0040: 4baa 6c7a 4baa 6c78 4854 5450 2f31 2e31 K.lzK.lxHTTP/1.1
0x0050: 2034 3034 204e 6f74 2046 6f75 6e64 0d0a .404.Not.Found..
0x0060: 436f 6e74 656e 742d 5479 7065 3a20 6170 Content-Type:.ap
0x0070: 706c 6963 6174 696f 6e2f 6a73 6f6e 3b20 plication/json;.
0x0080: 6368 6172 7365 743d 7574 662d 380d 0a44 charset=utf-8..D
0x0090: 6174 653a 204d 6f6e 2c20 3139 2041 7567 ate:.Mon,.19.Aug
0x00a0: 2032 3032 3420 3131 3a34 313a 3031 2047 .2024.11:41:01.G
0x00b0: 4d54 0d0a 436f 6e74 656e 742d 4c65 6e67 MT..Content-Leng
0x00c0: 7468 3a20 3633 0d0a 0d0a 7b22 6572 726f th:.63....{"erro
0x00d0: 7222 3a22 6d6f 6465 6c20 5c22 7177 656e r":"model."qwen
0x00e0: 323a 3732 625c 2220 6e6f 7420 666f 756e 2:72b".not.foun
0x00f0: 642c 2074 7279 2070 756c 6c69 6e67 2069 d,.try.pulling.i
0x0100: 7420 6669 7273 7422 7d t.first"}
But if it's a model name issue, why do I get a 404 error and get stuck when I execute this command on the terminal?
I'm quite confused; thanks for your direction.
@rick-github commented on GitHub (Aug 19, 2024):
I think you have multiple problems. The 404 that you captured with tcpdump is different from the ollama run llama3:latest issue, because the models are not the same. You need to separate out the problems and post server logs that clearly show the issue you are trying to fix.
@turndown commented on GitHub (Aug 20, 2024):
Today I stopped open-webui and tested the docker ollama 0.3.5 image. I just ran docker exec -it ollama ollama run svjack/qwen1_5_14b in one terminal, but the capture in another terminal still shows "model": "qwen2:72b".
I don't know why. Could these model names conflict? For example, if everything starts with "qwen", such as "qwen2" or "qwen:72b", would that cause a problem?
Also, is the packet capture command real-time? Why wasn't the query I executed on the terminal captured, like below?

Thank you very much for your help.
@pdevine commented on GitHub (Aug 30, 2024):
I don't see qwen2:72b in the ollama list output. Can you ollama pull qwen2:72b and then try to use the API with that model again?
@pdevine commented on GitHub (Sep 2, 2024):
I'm going to go ahead and close the issue. I'm pretty certain that you just need to pull the correct model. I'll reopen it if you're still having the issue.
@TheLillin commented on GitHub (Oct 16, 2024):
This worked for me: in WSL I entered "docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui" and then I could interact with the models on Open WebUI.
@deepanshu-prajapati01 commented on GitHub (Nov 5, 2024):
As for me, I was also encountering the following issue in Open WebUI:
Ollama: 500, message='Internal Server Error', url='http://127.0.0.1:11434/api/chat'
But somehow trying another model (latest) worked for me.
Hope someone finds this helpful!