[GH-ISSUE #14619] Error 499 and CUDA error #55986

Closed
opened 2026-04-29 10:06:34 -05:00 by GiteaMirror · 2 comments

Originally created by @ScaryBeats01 on GitHub (Mar 4, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14619

What is the issue?

I was trying this command: npx llm-checker toolcheck --all
Ollama responded with the errors shown in the log output below. I think the error is related to https://github.com/ollama/ollama/issues/14615.

Here is half of the output of the command:

╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ llama3.2:1b                              │ SUPPORTED   │ 100   │ Model emitted structured tool_calls.                                                                                                                 ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ phi4-mini:latest                         │ UNSUPPORTED │ 0     │ Failed to run chat request: This operation was aborted                                                                                               ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ jobautomation/OpenEuroLLM-Catalan:latest │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/jobautomation/OpenEuroLLM-Catalan:latest does not support tools"}   ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ nomic-embed-text:latest                  │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"\"nomic-embed-text:latest\" does not support chat"}                                    ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ mxbai-embed-large:latest                 │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"\"mxbai-embed-large:latest\" does not support chat"}                                   ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ deepseek-r1:latest                       │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/deepseek-r1:latest does not support tools"}                 ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ gemma3:latest                            │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/gemma3:latest does not support tools"}                      ║
╟──────────────────────────────────────────┼─────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ codellama:latest                         │ UNSUPPORTED │ 0     │ Failed to run chat request: HTTP 400: Bad Request - {"error":"registry.ollama.ai/library/codellama:latest does not support tools"}                   ║
╚══════════════════════════════════════════╧═════════════╧═══════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
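The per-model verdicts in the table map mechanically onto Ollama's error responses. A minimal sketch of that classification (the `classify_failure` helper is hypothetical, not llm-checker's actual code):

```python
import json

def classify_failure(status: int, body: str) -> str:
    """Map an Ollama /api/chat error response to a rough verdict.

    Hypothetical helper; llm-checker's real logic may differ.
    """
    if status == 400:
        err = json.loads(body).get("error", "")
        if "does not support tools" in err:
            return "UNSUPPORTED: model has no tool-calling support"
        if "does not support chat" in err:
            return "UNSUPPORTED: embedding model, no chat endpoint"
    if status == 499:
        return "ABORTED: client gave up before the server responded"
    return "UNKNOWN"
```

This matches the rows above: the embedding models (nomic-embed-text, mxbai-embed-large) fail with "does not support chat", the others with "does not support tools", and phi4-mini's "This operation was aborted" corresponds to the 499 in the log.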

Relevant log output

print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-03-04T12:17:09.811+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\user\\.ollama\\models\\blobs\\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --port 45930"
time=2026-03-04T12:17:09.829+01:00 level=INFO source=sched.go:489 msg="system memory" total="15.7 GiB" free="1.3 GiB" free_swap="19.8 GiB"
time=2026-03-04T12:17:09.830+01:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-a4f6355b-902f-14e3-2b28-2189eb9ad638 library=CUDA available="0 B" free="430.5 MiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-04T12:17:09.915+01:00 level=INFO source=server.go:497 msg="loading model" "model layers"=29 requested=-1
time=2026-03-04T12:17:10.059+01:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="4.1 GiB"
time=2026-03-04T12:17:10.395+01:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="224.0 MiB"
time=2026-03-04T12:17:10.395+01:00 level=INFO source=device.go:272 msg="total memory" size="4.3 GiB"
time=2026-03-04T12:17:18.807+01:00 level=INFO source=runner.go:965 msg="starting go runner"
load_backend: loaded CPU backend from C:\Users\user\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2026-03-04T12:17:27.038+01:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\user\.ollama\models\blobs\sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 error="context canceled"
[GIN] 2026/03/04 - 12:17:27 | 200 |    156.1757ms |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/04 - 12:17:27 | 499 |   45.5433371s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-04T12:17:30.611+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 8626"
time=2026-03-04T12:17:33.243+01:00 level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2026-03-04T12:17:33.245+01:00 level=WARN source=runner.go:356 msg="unable to refresh free memory, using old values"
time=2026-03-04T12:17:33.247+01:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-04T12:17:33.248+01:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-03-04T12:17:33.248+01:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=14 efficiency=8 threads=20
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from C:\Users\user\.ollama\models\blobs\sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
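Side note on the log: the "gpu memory" line reports free="430.5 MiB" against minimum="457.0 MiB", so the scheduler places the model weights and KV cache on CPU, which is why the first /api/chat on a 7B model runs past 45 seconds. The placement decision visible in the log amounts to this (a sketch, not Ollama's actual sched.go logic):

```python
def pick_device(gpu_free_mib: float, gpu_minimum_mib: float) -> str:
    # Sketch of the placement decision visible in the log above:
    # with only 430.5 MiB free against a 457.0 MiB minimum, the model
    # falls back to CPU. Not Ollama's actual scheduler code.
    return "CUDA" if gpu_free_mib >= gpu_minimum_mib else "CPU"
```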

OS

Windows

GPU

Intel, Nvidia

CPU

Intel

Ollama version

0.17.6

GiteaMirror added the bug label 2026-04-29 10:06:34 -05:00

@rick-github commented on GitHub (Mar 4, 2026):

The error messages are pretty self-explanatory. Embedding models don't support chat and the other models don't support tools, which llm-checker is reporting. The 499 error is because the model took longer than 45 seconds to respond and the client has a 45 second timeout.
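The 499 behavior can be reproduced in miniature: a client that enforces its own deadline abandons the request, and the server then records status 499 for the hung-up connection. A rough Python sketch (hypothetical names, not llm-checker's code):

```python
import concurrent.futures
import time

def slow_chat_request(delay_s: float) -> str:
    # Stand-in for a /api/chat call that is slow because the model
    # is loading and running on CPU.
    time.sleep(delay_s)
    return "response"

def call_with_client_timeout(delay_s: float, timeout_s: float) -> str:
    # If the server takes longer than the client's timeout, the client
    # aborts; the server side logs the abandoned request as 499.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_chat_request, delay_s)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return "aborted"
```

Raising the client-side timeout, or warming the model with a preliminary request before testing, would avoid the abort.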


@ScaryBeats01 commented on GitHub (Mar 7, 2026):

Yes, I know that, thanks, but it works well now, without errors or timeouts.

