v0.12.10

Released 2025-11-05 14:33:01 -06:00 | 110 commits to main since this release
📅 Originally published on GitHub: Wed, 05 Nov 2025 21:41:21 GMT
🏷️ Git tag created: Wed, 05 Nov 2025 20:33:01 GMT

`ollama run` now works with embedding models

`ollama run` can now run embedding models to generate vector embeddings from text:

```
ollama run embeddinggemma "Hello world"
```

Content can also be provided to `ollama run` via standard input:

```
echo "Hello world" | ollama run embeddinggemma
```

What's Changed
- Fixed errors when running `qwen3-vl:235b` and `qwen3-vl:235b-instruct`
- Enable flash attention for Vulkan (currently needs to be built from source)
- Add Vulkan memory detection for Intel GPUs using DXGI+PDH
- Ollama now returns tool call IDs from the `/api/chat` API
- Fixed hanging due to CPU discovery
- Ollama now shows login instructions when switching to a cloud model in interactive mode
- Fixed reading stale VRAM data
- `ollama run` now works with embedding models
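The embeddings that `ollama run` now produces are also available over HTTP. A minimal sketch of building a request body for the `/api/embed` endpoint (endpoint shape per Ollama's API docs; the helper function name is my own):

```python
import json

def build_embed_request(model: str, inputs: list[str]) -> str:
    """Return the JSON body for POST http://localhost:11434/api/embed."""
    # "input" may be a single string or a list of strings to embed.
    return json.dumps({"model": model, "input": inputs})

body = build_embed_request("embeddinggemma", ["Hello world"])
print(body)
```

The response contains an `embeddings` array with one vector per input string.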
New Contributors
- @ryanycoleman made their first contribution in https://github.com/ollama/ollama/pull/11740
- @Rajathbail made their first contribution in https://github.com/ollama/ollama/pull/12929
- @virajwad made their first contribution in https://github.com/ollama/ollama/pull/12664
- @AXYZdong made their first contribution in https://github.com/ollama/ollama/pull/8601
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.9...v0.12.10
Downloads
v0.12.9

Released 2025-10-31 17:23:28 -05:00 | 128 commits to main since this release
📅 Originally published on GitHub: Fri, 31 Oct 2025 23:33:13 GMT
🏷️ Git tag created: Fri, 31 Oct 2025 22:23:28 GMT

What's Changed
- Fix performance regression on CPU-only systems
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.8...v0.12.9
Downloads
v0.12.8

Released 2025-10-30 17:12:14 -05:00 | 132 commits to main since this release
📅 Originally published on GitHub: Thu, 30 Oct 2025 23:22:27 GMT
🏷️ Git tag created: Thu, 30 Oct 2025 22:12:14 GMT

What's Changed
- `qwen3-vl` performance improvements, including flash attention support by default
- `qwen3-vl` now outputs less leading whitespace in the response when thinking
- Fixed issue where `deepseek-v3.1` thinking could not be disabled in Ollama's new app
- Fixed issue where `qwen3-vl` would fail to interpret images with transparent backgrounds
- Ollama now stops a running model before removing it via `ollama rm`
- Fixed issue where prompt processing would be slower on Ollama's engine
- Ignore unsupported iGPUs during device discovery on Windows
New Contributors
- @athshh made their first contribution in https://github.com/ollama/ollama/pull/12822
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.7...v0.12.8
Downloads
v0.12.7

Released 2025-10-29 13:50:56 -05:00 | 144 commits to main since this release
📅 Originally published on GitHub: Wed, 29 Oct 2025 02:07:54 GMT
🏷️ Git tag created: Wed, 29 Oct 2025 18:50:56 GMT

New models
- Qwen3-VL: now available in all parameter sizes ranging from 2B to 235B
- MiniMax-M2: a 230 billion parameter model built for coding and agentic workflows, available on Ollama's cloud
Add files and adjust thinking levels in Ollama's new app

Ollama's new app now includes a way to add one or many files when prompting the model. For better responses, thinking levels can now be adjusted for the gpt-oss models.
New API documentation
New API documentation is available for Ollama's API: https://docs.ollama.com/api
What's Changed
- Model load failures now include more information on Windows
- Fixed embedding results being incorrect when running `embeddinggemma`
- Fixed gemma3n on the Vulkan backend
- Increased time allocated for ROCm to discover devices
- Fixed truncation error when generating embeddings
- Fixed request status code when running cloud models
- The OpenAI-compatible `/v1/embeddings` endpoint now supports the `encoding_format` parameter
- Ollama will now parse tool calls that don't conform to `{"name": name, "arguments": args}` (thanks @rick-github!)
- Fixed prompt processing reporting in the llama runner
- Increased speed when scheduling models
- Fixed issue where `FROM <model>` would not inherit `RENDERER` or `PARSER` commands
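The new `encoding_format` parameter on the OpenAI-compatible `/v1/embeddings` endpoint mirrors OpenAI's parameter of the same name. A minimal sketch of building such a request body (the helper function is my own; the parameter values follow OpenAI's API, where `"float"` and `"base64"` are the accepted formats):

```python
import json

def build_embeddings_request(model: str, text: str,
                             encoding_format: str = "float") -> str:
    """JSON body for POST http://localhost:11434/v1/embeddings."""
    # encoding_format: "float" returns raw numbers, "base64" a packed string.
    return json.dumps({
        "model": model,
        "input": text,
        "encoding_format": encoding_format,
    })

body = build_embeddings_request("embeddinggemma", "Hello world",
                                encoding_format="base64")
```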
New Contributors
- @npardal made their first contribution in https://github.com/ollama/ollama/pull/12715
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.6...v0.12.7
Downloads
v0.12.6

Released 2025-10-16 15:07:41 -05:00 | 185 commits to main since this release
📅 Originally published on GitHub: Wed, 15 Oct 2025 23:02:31 GMT
🏷️ Git tag created: Thu, 16 Oct 2025 20:07:41 GMT

What's Changed
- Ollama's app now supports searching when running DeepSeek-V3.1, Qwen3, and other models that support tool calling
- Flash attention is now enabled by default for Gemma 3, improving performance and memory utilization
- Fixed issue where Ollama would hang while generating responses
- Fixed issue where `qwen3-coder` would act in raw mode when using `/api/generate` or `ollama run qwen3-coder <prompt>`
- Fixed `qwen3-embedding` providing invalid results
- Ollama now evicts models correctly when `num_gpu` is set
- Fixed issue where a `tool_index` value of `0` would not be sent to the model
Experimental Vulkan Support
Experimental support for Vulkan is now available when building locally from source, enabling additional AMD and Intel GPUs that are not currently supported by Ollama. To build locally, install the Vulkan SDK, set `VULKAN_SDK` in your environment, and follow the developer instructions. In a future release, Vulkan support will be included in the binary releases as well. Please file issues if you run into any problems.
New Contributors
- @yajianggroup made their first contribution in https://github.com/ollama/ollama/pull/12377
- @inforithmics made their first contribution in https://github.com/ollama/ollama/pull/11835
- @sbhavani made their first contribution in https://github.com/ollama/ollama/pull/12619
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.5...v0.12.6
Downloads
v0.12.5

Released 2025-10-09 21:08:21 -05:00 | 224 commits to main since this release
📅 Originally published on GitHub: Fri, 10 Oct 2025 16:30:53 GMT
🏷️ Git tag created: Fri, 10 Oct 2025 02:08:21 GMT

What's Changed
- Thinking models now support structured outputs when using the `/api/chat` API
- Ollama's app now waits until Ollama is running before a conversation can be started
- Fixed issue where `"think": false` would show an error instead of being silently ignored
- Fixed `deepseek-r1` output issues
- macOS 12 Monterey and macOS 13 Ventura are no longer supported
- AMD gfx900 and gfx906 (MI50, MI60, etc.) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.
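Structured outputs on `/api/chat` are requested via the `format` field, which takes a JSON Schema. A minimal sketch of a request payload combining it with thinking (the model name and schema here are illustrative assumptions, not from the notes above):

```python
import json

# Hypothetical example schema: constrain the answer to an object with
# a single string field "capital".
schema = {
    "type": "object",
    "properties": {"capital": {"type": "string"}},
    "required": ["capital"],
}

payload = {
    "model": "deepseek-r1",  # assumed thinking model for illustration
    "messages": [{"role": "user",
                  "content": "What is the capital of France?"}],
    "think": True,     # keep thinking enabled alongside structured output
    "format": schema,  # JSON Schema the final answer must conform to
    "stream": False,
}
body = json.dumps(payload)  # POST to http://localhost:11434/api/chat
```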
New Contributors
- @shengxinjing made their first contribution in https://github.com/ollama/ollama/pull/12415
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.4...v0.12.5-rc0
Downloads
v0.12.4

Released 2025-10-09 12:37:47 -05:00 | 230 commits to main since this release
📅 Originally published on GitHub: Fri, 03 Oct 2025 16:38:12 GMT
🏷️ Git tag created: Thu, 09 Oct 2025 17:37:47 GMT

What's Changed
- Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
- Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
- Fixed an issue where `keep_alive` in the API would accept different values for the `/api/chat` and `/api/generate` endpoints
- Fixed tool calling rendering with `qwen3-coder`
- More reliable and accurate VRAM detection
- `OLLAMA_FLASH_ATTENTION` can now be overridden to `0` for models that have flash attention enabled by default
- macOS 12 Monterey and macOS 13 Ventura are no longer supported
- Fixed crash where templates were not correctly defined
- Fixed memory calculations on NVIDIA iGPUs
- AMD gfx900 and gfx906 (MI50, MI60, etc.) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.
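The `OLLAMA_FLASH_ATTENTION` override is an environment variable read by the server process; a minimal configuration sketch, assuming a local install started from a shell:

```shell
# Disable flash attention for this server instance, overriding the
# per-model default (e.g. models where it is now enabled by default).
OLLAMA_FLASH_ATTENTION=0 ollama serve
```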
New Contributors
- @Fachep made their first contribution in https://github.com/ollama/ollama/pull/12412
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.3...v0.12.4-rc3
Downloads
v0.12.3

Released 2025-09-25 20:30:45 -05:00 | 270 commits to main since this release
📅 Originally published on GitHub: Fri, 26 Sep 2025 05:08:26 GMT
🏷️ Git tag created: Fri, 26 Sep 2025 01:30:45 GMT

New models
- DeepSeek-V3.1-Terminus: a hybrid model that supports both thinking mode and non-thinking mode, delivering more stable and reliable outputs across benchmarks compared to the previous version.

  Run on Ollama's cloud:

  ```
  ollama run deepseek-v3.1:671b-cloud
  ```

  Run locally (requires 500GB+ of VRAM):

  ```
  ollama run deepseek-v3.1
  ```

- Kimi-K2-Instruct-0905: the latest, most capable version of Kimi K2, a state-of-the-art mixture-of-experts (MoE) language model featuring 32 billion activated parameters and a total of 1 trillion parameters.

  ```
  ollama run kimi-k2:1t-cloud
  ```
What's Changed
- Fixed issue where tool calls provided as stringified JSON would not be parsed correctly
- `ollama push` now provides a URL to follow to sign in
- Fixed issues where qwen3-coder would output Unicode characters incorrectly
- Fixed issue where loading a model with `/load` would crash
New Contributors
- @gr4ceG made their first contribution in https://github.com/ollama/ollama/pull/12385
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.2...v0.12.3
Downloads
v0.12.2

Released 2025-09-24 13:21:32 -05:00 | 276 commits to main since this release
📅 Originally published on GitHub: Wed, 24 Sep 2025 21:19:20 GMT
🏷️ Git tag created: Wed, 24 Sep 2025 18:21:32 GMT

Web search
A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud. This web search capability can augment models with the latest information from the web to reduce hallucinations and improve accuracy.
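As a hedged sketch of what a web search request might look like: the endpoint URL, bearer-token auth, and `max_results` parameter below are assumptions drawn from Ollama's documentation rather than from these notes.

```python
import json

def build_web_search_request(query: str, max_results: int = 5) -> str:
    # Assumed usage: POST this body to https://ollama.com/api/web_search
    # with an "Authorization: Bearer <OLLAMA_API_KEY>" header.
    # max_results is an assumed optional parameter capping result count.
    return json.dumps({"query": query, "max_results": max_results})

body = build_web_search_request("latest Ollama release")
```

The returned results (title, URL, snippet) can then be placed into the model's context to ground its answer in current information.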
What's Changed
- Models with Qwen3's architecture, including MoE, now run in Ollama's new engine
- Fixed issue where built-in tools for gpt-oss were not being rendered correctly
- Support multi-regex pretokenizers in Ollama's new engine
- Ollama's new engine can now load tensors by matching a prefix or suffix
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.1...v0.12.2
Downloads
v0.12.1

Released 2025-09-23 01:20:20 -05:00 | 282 commits to main since this release
📅 Originally published on GitHub: Sun, 21 Sep 2025 23:19:05 GMT
🏷️ Git tag created: Tue, 23 Sep 2025 06:20:20 GMT

New models
- Qwen3 Embedding: state of the art open embedding model by the Qwen team
What's Changed
- Qwen3-Coder now supports tool calling
- Ollama's app no longer shows "connection lost" in error when connecting to cloud models
- Fixed issue where Gemma3 QAT models would not output correct tokens
- Fixed issue where `&` characters in Qwen3-Coder would not be parsed correctly when function calling
- Fixed issues where `ollama signin` would not work properly on Linux
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.0...v0.12.1
Downloads