github-starred / ollama-ollama
Mirror of https://github.com/ollama/ollama.git (synced 2026-04-29 17:29:05 -05:00)
Issues labeled "performance": 21 open, 33 closed
Performance Regression on Apple Silicon M1: GPU → CPU Fallback in v0.12.9 (works correctly in v0.12.5)
#8606 · labels: bug, macos, performance · opened 2025-11-12 14:47:00 -06:00 by GiteaMirror · 26 comments
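A quick way to check for the GPU → CPU fallback described here is Ollama's documented GET /api/ps endpoint, which reports how much of each loaded model sits in VRAM. A minimal sketch, assuming the default localhost:11434 server and the requests package:

    import requests

    # GET /api/ps lists the models currently loaded by the Ollama server.
    resp = requests.get("http://localhost:11434/api/ps", timeout=10)
    resp.raise_for_status()

    for m in resp.json().get("models", []):
        size = m.get("size", 0)            # total bytes resident
        size_vram = m.get("size_vram", 0)  # bytes of that resident in GPU memory
        placement = "fully on GPU" if size_vram >= size else "partial or CPU"
        print(f"{m['name']}: {size_vram}/{size} bytes in VRAM ({placement})")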
DeepSeek R1 671b is faster than 70b
#6584 · labels: performance · opened 2025-11-12 13:38:30 -06:00 by GiteaMirror · 30 comments
CPU inference much slower than expected
#6579 · labels: performance · opened 2025-11-12 13:38:23 -06:00 by GiteaMirror · 5 comments
Ollama not running with ROCm backend?
#5984 · labels: amd, bug, performance · opened 2025-11-12 13:18:21 -06:00 by GiteaMirror · 14 comments
Creating embeddings using the REST API is much slower than performing the same operation using Sentence Transformers
#4709 · labels: bug, performance · opened 2025-11-12 12:28:30 -06:00 by GiteaMirror · 12 comments
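The comparison in this report is straightforward to reproduce: time one call through Ollama's documented POST /api/embeddings endpoint against an in-process sentence-transformers encode. A minimal sketch; the model names are illustrative stand-ins, and since the two models differ this measures round-trip latency rather than a like-for-like benchmark:

    import time

    import requests
    from sentence_transformers import SentenceTransformer

    TEXT = "The quick brown fox jumps over the lazy dog."

    # Ollama: REST round trip to a locally served embedding model.
    t0 = time.perf_counter()
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": TEXT})
    r.raise_for_status()
    ollama_s = time.perf_counter() - t0

    # Sentence Transformers: local in-process encoding (model loaded once).
    st_model = SentenceTransformer("all-MiniLM-L6-v2")
    t0 = time.perf_counter()
    st_model.encode([TEXT])
    st_s = time.perf_counter() - t0

    print(f"ollama: {ollama_s:.3f}s  sentence-transformers: {st_s:.3f}s")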
Using embedding models is much slower than with xinference
#4190 · labels: feature request, performance · opened 2025-11-12 12:10:48 -06:00 by GiteaMirror
Optimize NUMA behavior for large models with GPU and CPU inference: numa_balancing on GPU causes excessively slow load times
#4043 · labels: feature request, linux, performance · opened 2025-11-12 12:05:02 -06:00 by GiteaMirror · 14 comments
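The numa_balancing setting named in the title is the Linux sysctl kernel.numa_balancing, which migrates pages between NUMA nodes automatically. A minimal Linux-only sketch for checking it before a large model load; the sysctl path is the standard procfs location, and the disable suggestion in the comment is my assumption, not a fix confirmed in the issue:

    from pathlib import Path

    # kernel.numa_balancing: "1" means automatic NUMA page migration is on,
    # which this issue reports slowing down large model loads.
    knob = Path("/proc/sys/kernel/numa_balancing")
    if knob.exists():
        state = knob.read_text().strip()
        print(f"kernel.numa_balancing = {state}")
        if state != "0":
            print("try 'sysctl -w kernel.numa_balancing=0' while benchmarking")
    else:
        print("knob absent: kernel built without NUMA balancing")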
Enable speculative decoding
#3619 · labels: feature request, performance · opened 2025-11-12 11:43:22 -06:00 by GiteaMirror · 55 comments
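For readers unfamiliar with the request: speculative decoding has a cheap draft model propose several tokens which the full model then verifies, accepting them up to the first disagreement. A toy sketch of the greedy variant, with hypothetical draft_next/target_next callables standing in for real model calls (not an Ollama API):

    def speculative_step(prefix, draft_next, target_next, k=4):
        # 1. The draft model proposes k tokens autoregressively (cheap).
        ctx = list(prefix)
        draft = []
        for _ in range(k):
            tok = draft_next(ctx)
            draft.append(tok)
            ctx.append(tok)

        # 2. The target model checks each position; a real engine scores all
        #    k positions in one batched forward pass, which is where the win is.
        accepted = []
        ctx = list(prefix)
        for tok in draft:
            expected = target_next(ctx)
            accepted.append(expected)
            if expected != tok:
                break  # first disagreement: keep the target's token and stop
            ctx.append(tok)
        return list(prefix) + accepted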
gemma2 27b is too slow
#3459 · labels: bug, gpu, nvidia, performance · opened 2025-11-12 11:37:23 -06:00 by GiteaMirror · 4 comments
CPU-based Ollama doesn't run in an LXC (host kernel 6.8.4-3)
#3458 · labels: bug, linux, needs more info, performance · opened 2025-11-12 11:37:17 -06:00 by GiteaMirror · 14 comments
Improve Ollama's output speed
#3392 · labels: feature request, performance · opened 2025-11-12 11:35:01 -06:00 by GiteaMirror · 5 comments
Ollama keeps randomly re-evaluating the whole prompt, making chats impossible
#3329 · labels: bug, performance · opened 2025-11-12 11:32:47 -06:00 by GiteaMirror · 20 comments
Performance degrades over time when running in Docker with Nvidia GPU
#3068 · labels: bug, docker, nvidia, performance · opened 2025-11-12 11:24:08 -06:00 by GiteaMirror · 12 comments
Slower performance on Arm64 with Phi3 and Lexi-Llama on 1.39
#2980 · labels: bug, performance · opened 2025-11-12 11:21:04 -06:00 by GiteaMirror · 1 comment
Ollama's chat generation speed slows down tenfold when switching the chat format to JSON
#2730 · labels: api, bug, performance · opened 2025-11-12 11:09:38 -06:00 by GiteaMirror · 12 comments
Degraded response quality on v0.1.33
#2640 · labels: bug, performance · opened 2025-11-12 11:06:50 -06:00 by GiteaMirror · 13 comments
Why is Ollama so terribly slow when I set format="json"?
#2389 · labels: api, bug, performance · opened 2025-11-12 10:57:48 -06:00 by GiteaMirror · 6 comments
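This report and #2730 above point at the same parameter: the documented "format": "json" field on POST /api/generate, which constrains sampling to valid JSON. A minimal sketch for timing a prompt with and without it; the model name is an illustrative placeholder:

    import time

    import requests

    def timed_generate(fmt=None):
        payload = {"model": "llama3",  # placeholder: any pulled model
                   "prompt": "List three colors as JSON.",
                   "stream": False}
        if fmt is not None:
            payload["format"] = fmt    # constrains output to valid JSON
        t0 = time.perf_counter()
        r = requests.post("http://localhost:11434/api/generate", json=payload)
        r.raise_for_status()
        return time.perf_counter() - t0

    plain_s = timed_generate()
    json_s = timed_generate("json")
    print(f"plain: {plain_s:.2f}s  format=json: {json_s:.2f}s")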
Ollama is not using 100% of RTX 4000 VRAM (18 of 20 GB)
#1896 · labels: nvidia, performance · opened 2025-11-12 10:37:07 -06:00 by GiteaMirror · 29 comments
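One knob relevant to reports like this is Ollama's documented num_gpu option, the number of layers offloaded to the GPU, which can be passed per request in the options object. A minimal sketch; the model name and layer count are illustrative, not values taken from the issue:

    import requests

    # Ask the server to offload more layers than its automatic estimate.
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3",            # placeholder: any pulled model
        "prompt": "Hello",
        "stream": False,
        "options": {"num_gpu": 99},   # request up to 99 layers on the GPU
    })
    r.raise_for_status()
    print(r.json()["response"])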
Ollama only using half of available CPU cores with NUMA multi-socket systems
#1796 · labels: bug, linux, performance · opened 2025-11-12 10:33:46 -06:00 by GiteaMirror · 37 comments
If you have multiple GPUs, then the new default split_mode = "layer" option in the wrapped llama.cpp server may affect you a lot!
#1253 · labels: nvidia, performance · opened 2025-11-12 10:07:25 -06:00 by GiteaMirror
Page 1 of 2