What's Changed
- Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
- Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
- Fixed an issue where `keep_alive` in the API would accept different values for the `/api/chat` and `/api/generate` endpoints (see the request sketch after this list)
- Fixed tool calling rendering with `qwen3-coder`
- More reliable and accurate VRAM detection
- `OLLAMA_FLASH_ATTENTION` can now be overridden to `0` for models that have flash attention enabled by default (see the override sketch after this list)
- macOS 12 Monterey and macOS 13 Ventura are no longer supported
- Fixed a crash when templates were not correctly defined
- Fixed memory calculations on NVIDIA iGPUs
- AMD gfx900 and gfx906 (MI50, MI60, etc) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.
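The `keep_alive` fix applies to both generation endpoints. Below is a minimal sketch of sending the same `keep_alive` value to `/api/generate` and `/api/chat`; it assumes a local server on the default port and a model tagged `qwen3` that has already been pulled, and after this fix both endpoints should interpret the value the same way.

```python
import requests

BASE = "http://localhost:11434"  # default Ollama server address
KEEP_ALIVE = "5m"                # keep the model loaded for five minutes after the request

# /api/generate: single-prompt completion with keep_alive
generate = requests.post(f"{BASE}/api/generate", json={
    "model": "qwen3",
    "prompt": "Hello",
    "keep_alive": KEEP_ALIVE,
    "stream": False,
})

# /api/chat: chat-style request carrying the same keep_alive value
chat = requests.post(f"{BASE}/api/chat", json={
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Hello"}],
    "keep_alive": KEEP_ALIVE,
    "stream": False,
})

print(generate.json()["response"])
print(chat.json()["message"]["content"])
```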
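The `OLLAMA_FLASH_ATTENTION` override should only need the variable to be set in the server's environment. A minimal sketch, assuming the `ollama` binary is on your PATH, of starting the server with flash attention forced off:

```python
import os
import subprocess

# Start the Ollama server with flash attention disabled, overriding the
# new per-model defaults (e.g. Qwen 3). Assumes `ollama` is on PATH.
env = dict(os.environ, OLLAMA_FLASH_ATTENTION="0")
subprocess.run(["ollama", "serve"], env=env, check=True)
```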
New Contributors
- @Fachep made their first contribution in https://github.com/ollama/ollama/pull/12412
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.3...v0.12.4-rc3