v0.12.0
Released 2025-09-18 18:11:08 -05:00 | 291 commits to main since this release
📅 Originally published on GitHub: Thu, 18 Sep 2025 17:29:57 GMT
🏷️ Git tag created: Thu, 18 Sep 2025 23:11:08 GMT

Cloud models
Cloud models are now available in preview, allowing you to run a group of larger models with fast, datacenter-grade hardware.
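Cloud models are addressed through the same local API as other models; as a hedged sketch (assuming the standard /api/chat endpoint and the `-cloud` model tag, as used in the run command below), a request body can be built like this:

```python
import json

def chat_request(model, prompt):
    """Build the JSON body for a POST to http://localhost:11434/api/chat.
    Assumption: cloud models are selected simply by their `-cloud` tag."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })

body = chat_request("qwen3-coder:480b-cloud", "Write a binary search in Go")
print(body)
```

The request itself is still served locally; Ollama forwards the cloud model's work to datacenter hardware.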
To run a cloud model, use:
```shell
ollama run qwen3-coder:480b-cloud
```

What's Changed
- Models with the Bert architecture now run on Ollama's engine
- Models with the Qwen 3 architecture now run on Ollama's engine
- Fixed issue where older NVIDIA GPUs would not be detected if newer drivers were installed
- Fixed issue where models would not be imported correctly with `ollama create`
- Ollama will skip parsing the initial `<think>` tag if provided in the prompt for /api/generate by @rick-github
New Contributors
- @egyptianbman made their first contribution in https://github.com/ollama/ollama/pull/12300
- @russcoss made their first contribution in https://github.com/ollama/ollama/pull/12280
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.11...v0.12.0
v0.11.11
Released 2025-09-12 18:40:14 -05:00 | 314 commits to main since this release
📅 Originally published on GitHub: Thu, 11 Sep 2025 21:02:41 GMT
🏷️ Git tag created: Fri, 12 Sep 2025 23:40:14 GMT

What's Changed
- Support for CUDA 13
- Improved memory usage when using gpt-oss in Ollama's app
- Better scrolling in Ollama's app when submitting long prompts
- Cmd +/- will now zoom and shrink text in Ollama's app
- Assistant messages can now be copied in Ollama's app
- Fixed error that would occur when attempting to import safetensors files by @rick-github in https://github.com/ollama/ollama/pull/12176
- Improved memory estimates for hybrid and recurrent models by @gabe-l-hart in https://github.com/ollama/ollama/pull/12186
- Fixed error that would occur when batch size was greater than context length
- Flash attention & KV cache quantization validation fixes by @jessegross in https://github.com/ollama/ollama/pull/12231
- Add `dimensions` field to embed requests by @mxyng in https://github.com/ollama/ollama/pull/12242
- Enable new memory estimates in Ollama's new engine by default by @jessegross in https://github.com/ollama/ollama/pull/12252
- Ollama will no longer load split vision models in the Ollama engine by @jessegross in https://github.com/ollama/ollama/pull/12241
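The new `dimensions` field above can be exercised with a request body like the following; this is a minimal sketch assuming the standard /api/embed endpoint, with the semantics of `dimensions` (returning embeddings of the requested size, model permitting) taken from the PR title rather than tested behavior:

```python
import json

def embed_request(model, inputs, dimensions=None):
    """Build the JSON body for a POST to http://localhost:11434/api/embed.
    `dimensions` is the new optional field; when omitted, the model's
    native embedding size is used."""
    body = {"model": model, "input": inputs}
    if dimensions is not None:
        body["dimensions"] = dimensions
    return json.dumps(body)

body = embed_request("embeddinggemma", ["why is the sky blue?"], dimensions=256)
print(body)
```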
New Contributors
- @KashyapTan made their first contribution in https://github.com/ollama/ollama/pull/12188
- @carbonatedWaterOrg made their first contribution in https://github.com/ollama/ollama/pull/12230
- @fengyuchuanshen made their first contribution in https://github.com/ollama/ollama/pull/12249
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.10...v0.11.11
v0.11.10
Released 2025-09-04 11:09:07 -05:00 | 339 commits to main since this release
📅 Originally published on GitHub: Thu, 04 Sep 2025 17:27:40 GMT
🏷️ Git tag created: Thu, 04 Sep 2025 16:09:07 GMT

New models
- EmbeddingGemma: a new open embedding model that delivers best-in-class performance for its size
What's Changed
- Support for EmbeddingGemma
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.9...v0.11.10
v0.11.9
Released 2025-09-02 15:09:12 -05:00 | 341 commits to main since this release
📅 Originally published on GitHub: Tue, 02 Sep 2025 20:14:18 GMT
🏷️ Git tag created: Tue, 02 Sep 2025 20:09:12 GMT

What's Changed
- Improved performance via overlapping GPU and CPU computations
- Fixed issue where an unrecognized AMD GPU would cause an error
- Reduced crashes due to unhandled errors in some Mac and Linux installations of Ollama
New Contributors
- @alpha-nerd-nomyo made their first contribution in https://github.com/ollama/ollama/pull/12129
- @pxwanglu made their first contribution in https://github.com/ollama/ollama/pull/12123
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.8...v0.11.9-rc0
v0.11.8
Released 2025-08-28 14:27:13 -05:00 | 348 commits to main since this release
📅 Originally published on GitHub: Wed, 27 Aug 2025 18:43:44 GMT
🏷️ Git tag created: Thu, 28 Aug 2025 19:27:13 GMT

What's Changed
- `gpt-oss` now has flash attention enabled by default on systems that support it
- Improved load times for `gpt-oss`
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.7...v0.11.8
v0.11.7
Released 2025-08-22 18:26:48 -05:00 | 355 commits to main since this release
📅 Originally published on GitHub: Mon, 25 Aug 2025 18:04:05 GMT
🏷️ Git tag created: Fri, 22 Aug 2025 23:26:48 GMT

DeepSeek-V3.1
DeepSeek-V3.1 is now available to run via Ollama.
This model supports hybrid thinking, meaning thinking can be enabled or disabled by setting `think` in Ollama's API:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-v3.1",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ],
  "think": true
}'
```
In Ollama's CLI, thinking can be enabled or disabled by running the `/set think` or `/set nothink` commands.

Turbo (in preview)
DeepSeek-V3.1 has 671B parameters, so a large amount of VRAM is required to run it. Ollama's Turbo mode (in preview) provides access to powerful hardware in the cloud that you can use to run the model.
Turbo via Ollama's app
- Download Ollama for macOS or Windows
- Select `deepseek-v3.1:671b` from the model selector
- Enable Turbo
Turbo via Ollama's CLI and libraries
- Create an account on ollama.com/signup
- Follow the docs for Ollama's CLI to authenticate your Ollama installation
- Run the following:
```shell
OLLAMA_HOST=ollama.com ollama run deepseek-v3.1
```

For instructions on using Turbo with Ollama's Python and JavaScript libraries, see the docs.
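The `OLLAMA_HOST` mechanism used above can be sketched as follows; `resolve_host` is a hypothetical helper that mimics the convention (an assumption) that Ollama's CLI and client libraries read `OLLAMA_HOST` from the environment to pick the server address:

```python
import os

def resolve_host(default="http://localhost:11434"):
    """Pick the Ollama server address: OLLAMA_HOST, when set, wins;
    otherwise fall back to the local default."""
    return os.environ.get("OLLAMA_HOST", default)

os.environ.pop("OLLAMA_HOST", None)
print(resolve_host())                      # the local default
os.environ["OLLAMA_HOST"] = "ollama.com"
print(resolve_host())                      # Turbo: requests go to ollama.com
```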
What's Changed
- Fixed issue where multiple models would not be loaded on CPU-only systems
- Ollama will now work with models that skip outputting the initial `<think>` tag (e.g. DeepSeek-V3.1)
- Fixed issue where text would be emitted when there is no opening `<think>` tag from a model
- Fixed issue where tool calls containing `{` or `}` would not be parsed correctly
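The `<think>` fixes above amount to splitting model output into a thinking section and the final content even when the opening tag is missing. A toy sketch of that idea (not Ollama's actual parser, which lives in its Go codebase):

```python
def split_thinking(text):
    """Toy sketch: split model output into (thinking, content).
    Tolerates output that starts mid-thought without an opening <think>
    tag, as DeepSeek-V3.1 emits, as well as fully tagged output."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in text:
        return "", text                      # no thinking section at all
    head, _, tail = text.partition(close_tag)
    if head.startswith(open_tag):            # strip the opening tag if present
        head = head[len(open_tag):]
    return head.strip(), tail.strip()

print(split_thinking("The sky scatters...</think>Blue light scatters more."))
```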
New Contributors
- @zoupingshi made their first contribution in https://github.com/ollama/ollama/pull/12028
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.6...v0.11.7
v0.11.6
Released 2025-08-20 00:58:33 -05:00 | 365 commits to main since this release
📅 Originally published on GitHub: Wed, 20 Aug 2025 21:00:13 GMT
🏷️ Git tag created: Wed, 20 Aug 2025 05:58:33 GMT

What's Changed
- Ollama's app will now switch between chats faster
- Improved layout of messages in Ollama's app
- Fixed issue where a command prompt window would appear when Ollama's app detected an old version of Ollama running
- Improved performance when using flash attention
- Fixed boundary case when encoding text using BPE
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.5...v0.11.6
v0.11.5
Released 2025-08-18 19:45:40 -05:00 | 369 commits to main since this release
📅 Originally published on GitHub: Fri, 15 Aug 2025 02:38:31 GMT
🏷️ Git tag created: Tue, 19 Aug 2025 00:45:40 GMT

What's Changed
- Performance improvements for the `gpt-oss` models
- New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, better model performance, and fewer out-of-memory errors. These new memory estimates can be enabled with `OLLAMA_NEW_ESTIMATES=1 ollama serve` and will soon be enabled by default.
- Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
- Ollama's new app will now remember default selections for model, Turbo and Web Search between restarts
- Fixed error when parsing bad harmony tool calls
- `OLLAMA_FLASH_ATTENTION=1` will also enable flash attention for pure-CPU models
- Fixed OpenAI-compatible API not supporting `reasoning_effort`
- Reduced size of installation on Windows and Linux
New Contributors
- @vorburger made their first contribution in https://github.com/ollama/ollama/pull/11755
- @dan-and made their first contribution in https://github.com/ollama/ollama/pull/10678
- @youzichuan made their first contribution in https://github.com/ollama/ollama/pull/11880
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.4...v0.11.5
v0.11.4
Released 2025-08-07 16:23:55 -05:00 | 406 commits to main since this release
📅 Originally published on GitHub: Thu, 07 Aug 2025 17:17:41 GMT
🏷️ Git tag created: Thu, 07 Aug 2025 21:23:55 GMT

What's Changed
- openai: allow for content and tool calls in the same message by @drifkin in https://github.com/ollama/ollama/pull/11759
- openai: when converting role=tool messages, propagate the tool name by @drifkin in https://github.com/ollama/ollama/pull/11761
- openai: always provide reasoning by @drifkin in https://github.com/ollama/ollama/pull/11765
New Contributors
- @gao-feng made their first contribution in https://github.com/ollama/ollama/pull/11170
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.3...v0.11.4
v0.11.3
Released 2025-08-05 19:29:08 -05:00 | 416 commits to main since this release
📅 Originally published on GitHub: Wed, 06 Aug 2025 01:29:59 GMT
🏷️ Git tag created: Wed, 06 Aug 2025 00:29:08 GMT

What's Changed
- Fixed issue where `gpt-oss` would consume too much VRAM when split across GPU & CPU or multiple GPUs
- Statically link C++ libraries on Windows for better compatibility
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.2...v0.11.3