v0.9.0

Released 2025-05-28 21:38:52 -05:00
📅 Originally published on GitHub: Thu, 29 May 2025 05:41:01 GMT
🏷️ Git tag created: Thu, 29 May 2025 02:38:52 GMT

New models
- DeepSeek-R1-0528: DeepSeek-R1 has received a minor version upgrade to DeepSeek-R1-0528, covering both the 8 billion parameter distilled model and the full 671 billion parameter model. This update significantly improves DeepSeek R1's reasoning and inference capabilities.
Thinking
Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.
When thinking is enabled, the model's thinking is returned separately from its output. When thinking is disabled, the model skips the thinking step and outputs its response directly.
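As a minimal sketch of what this separation looks like to a client (the response shape follows the chat API example in these notes; the `split_thinking` helper is a hypothetical illustration, not part of Ollama):

```python
# Hypothetical helper: split an Ollama chat response into its thinking
# and answer parts. Not part of the Ollama API itself.
def split_thinking(response: dict) -> tuple[str, str]:
    """Return (thinking, content) from a non-streaming chat response."""
    message = response.get("message", {})
    return message.get("thinking", ""), message.get("content", "")

# Sample payload shaped like the API response shown below.
response = {
    "message": {
        "role": "assistant",
        "thinking": "First, I count every occurrence of the letter 'R'...",
        "content": "There are **3** Rs in **\"strawberry\"**.",
    }
}

thinking, content = split_thinking(response)
print(content)  # prints only the final answer; the reasoning stays separate
```

When thinking is disabled, the `thinking` key is simply absent and the helper returns an empty string for it.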
Models that support thinking:
- DeepSeek R1
- Qwen 3
- More thinking models will be added.
When running a model that supports thinking, Ollama will now display the model's thoughts:
```
% ollama run deepseek-r1
>>> How many Rs are in strawberry
Thinking...
First, I need to understand what the question is asking. It's asking how many
letters 'R' are present in the word "strawberry."

Next, I'll examine each letter in the word individually. I'll start from the
beginning and count every occurrence of the letter 'R.'

After reviewing all the letters, I determine that there are three instances
where the letter 'R' appears in the word "strawberry."
...done thinking.

There are three **Rs** in the word **"strawberry"**.
```

In Ollama's API, a model's thinking is now returned as a separate `thinking` field for easy parsing:

```json
{
  "message": {
    "role": "assistant",
    "thinking": "First, I need to understand what the question is asking. It's asking how many letters 'R' are present in the word \"strawberry...",
    "content": "There are **3** instances of the letter **R** in the word **\"strawberry.\"**"
  }
}
```

Turning thinking on and off
In the API, thinking can be enabled by passing `"think": true` and disabled by passing `"think": false`:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "think": true
}'
```

In Ollama's CLI, use `/set think` and `/set nothink` to enable and disable thinking.

What's Changed
- Add thinking support to Ollama
Full Changelog: https://github.com/ollama/ollama/compare/v0.8.0...v0.9.0
Downloads
v0.8.0

Released 2025-05-27 18:50:57 -05:00
📅 Originally published on GitHub: Tue, 27 May 2025 19:55:15 GMT
🏷️ Git tag created: Tue, 27 May 2025 23:50:57 GMT

What's Changed
- Ollama will now stream responses with tool calls (see the blog post)
- Logs will now include better memory estimate debug information when running models in Ollama's engine.
New Contributors
- @hellotunamayo made their first contribution in https://github.com/ollama/ollama/pull/10790
Full Changelog: https://github.com/ollama/ollama/compare/v0.7.1...v0.8.0
Downloads
v0.7.1

Released 2025-05-22 20:53:31 -05:00
📅 Originally published on GitHub: Wed, 21 May 2025 22:17:35 GMT
🏷️ Git tag created: Fri, 23 May 2025 01:53:31 GMT

What's Changed
- Improved model memory management to allocate sufficient memory to prevent crashes when running multimodal models in certain situations
- Enhanced memory estimation for models to prevent unintended memory offloading
- `ollama show` will now show `...` when data is truncated
- Fixed crash that would occur with `qwen2.5vl`
- Fixed crash on NVIDIA CUDA for `llama3.2-vision`
- Support for Alibaba's Qwen 3 and Qwen 2 architectures in Ollama's new multimodal engine
New Contributors
- @ronxldwilson made their first contribution in https://github.com/ollama/ollama/pull/10763
- @DarkCaster made their first contribution in https://github.com/ollama/ollama/pull/10779
Full Changelog: https://github.com/ollama/ollama/compare/v0.7.0...v0.7.1
Downloads
v0.7.0

Released 2025-05-14 18:42:30 -05:00
📅 Originally published on GitHub: Tue, 13 May 2025 00:10:33 GMT
🏷️ Git tag created: Wed, 14 May 2025 23:42:30 GMT

Ollama now supports multimodal models via Ollama's new engine, starting with new vision multimodal models:
What's Changed
- Ollama now supports providing WebP images as input to multimodal models
- Fixed issue where a blank terminal window would appear when running models on Windows
- Fixed error that would occur when running `llama4` on NVIDIA GPUs
- Reduced log level of the `key not found` message
- Ollama will now correctly remove quotes from image paths when sending images as input with `ollama run`
- Improved performance of importing safetensors models via `ollama create`
ollama create - Improved prompt processing speeds of Qwen3 MoE on macOS
- Fixed issue where providing large JSON schemas in structured output requests would result in an error
- Ollama's API will now return code 405 instead of 404 for methods that are not allowed
- Fixed issue where `ollama` processes would continue to run after a model was unloaded
New Contributors
- @ashokgelal made their first contribution in https://github.com/ollama/ollama/pull/8668
- @Aharon-Bensadoun made their first contribution in https://github.com/ollama/ollama/pull/9719
- @HardCodeDev777 made their first contribution in https://github.com/ollama/ollama/pull/10664
Full Changelog: https://github.com/ollama/ollama/compare/v0.6.8...v0.7.0
Downloads
v0.6.8

Released 2025-05-03 15:11:48 -05:00
📅 Originally published on GitHub: Sat, 03 May 2025 22:56:44 GMT
🏷️ Git tag created: Sat, 03 May 2025 20:11:48 GMT

What's Changed
- Performance improvements for Qwen 3 MoE models (`30b-a3b` and `235b-a22b`) on NVIDIA and AMD GPUs
- Fixed `GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed` issue caused by conflicting installations
- Fixed a memory leak that occurred when providing images as input
- `ollama show` will now correctly label older vision models such as `llava`
- Reduced out-of-memory errors by improving worst-case memory estimations
- Fixed an issue that resulted in a `context canceled` error
New Contributors
- @AliAhmedNada made their first contribution in https://github.com/ollama/ollama/pull/10522
- @HarshNevse made their first contribution in https://github.com/ollama/ollama/pull/10465
Full Changelog: https://github.com/ollama/ollama/compare/v0.6.7...v0.6.8
Downloads
v0.6.7

Released 2025-04-30 19:59:31 -05:00
📅 Originally published on GitHub: Sat, 26 Apr 2025 09:16:02 GMT
🏷️ Git tag created: Thu, 01 May 2025 00:59:31 GMT

New models
- Qwen 3: Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
- Phi 4 reasoning and Phi-4-mini-reasoning: New state-of-the-art reasoning models from Microsoft
- Llama 4: state-of-the-art multi-modal models from Meta
What's Changed
- Add support for Meta's Llama 4 multimodal models
- Add support for Microsoft's Phi 4 reasoning models, and Phi 4 mini reasoning model
- Increased default context window to 4096 tokens
- Fixed issue where image paths with `~` would not be recognized when provided to `ollama run`
- Improved output quality when using JSON mode in certain scenarios
- Fixed `tensor->op == GGML_OP_UNARY` errors when running a model due to conflicting inference libraries
- Fixed issue where a model would be stuck in the `Stopping...` state
New Contributors
- @greengrass821 made their first contribution in https://github.com/ollama/ollama/pull/10339
- @richardshiue made their first contribution in https://github.com/ollama/ollama/pull/10335
- @aduermael made their first contribution in https://github.com/ollama/ollama/pull/10386
Full Changelog: https://github.com/ollama/ollama/compare/v0.6.6...v0.6.7
Downloads
v0.6.6

Released 2025-04-18 20:13:05 -05:00
📅 Originally published on GitHub: Thu, 17 Apr 2025 04:34:58 GMT
🏷️ Git tag created: Sat, 19 Apr 2025 01:13:05 GMT

New models
- IBM Granite 3.3: 2B and 8B models with 128K context length that have been fine-tuned for improved reasoning and instruction-following capabilities.
- DeepCoder: a fully open-source 14B coder model at O3-mini level, with a 1.5B version also available.
What's Changed
- New, faster model downloading: `OLLAMA_EXPERIMENT=client2 ollama serve` will run Ollama using a new downloader with improved performance and reliability when running `ollama pull`. Please share feedback here!
- Fixed memory leak issues when running Gemma 3, Mistral Small 3.1, and other models on Ollama
- Improved performance of `ollama create` when importing models from Safetensors
- Ollama will now allow tool function parameters with either a single type or an array of types by @rozgo
- Fixed certain out of memory issues from not reserving enough memory at startup
- Fix nondeterministic model unload order by @IreGaddr
- Include the `items` and `$defs` fields to properly handle `array` types in the API by @sheffler
- `OpenAI-Beta` headers are now included in the CORS safelist by @drifkin
- Fixed issue where model tensor data would be corrupted when importing models from Safetensors
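The tool-parameter change above follows JSON Schema, where a parameter's `type` may be either a single string or an array of strings. A minimal sketch of what that means for a client (the `allowed_types` helper and the sample schema are hypothetical illustrations, not Ollama API definitions):

```python
# Hypothetical helper: normalize a JSON Schema "type" that may be either
# a single string or an array of strings, as tool parameters now allow.
def allowed_types(param_schema: dict) -> list[str]:
    t = param_schema.get("type", [])
    return [t] if isinstance(t, str) else list(t)

# A tool parameter using an array of types (a nullable string).
param = {"type": ["string", "null"], "description": "optional city name"}
print(allowed_types(param))               # ['string', 'null']
print(allowed_types({"type": "number"}))  # ['number']
```

Both forms now validate, so nullable parameters like the one above no longer need to be flattened to a single type.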
New Contributors
- @drifkin made their first contribution in https://github.com/ollama/ollama/pull/10169
- @rozgo made their first contribution in https://github.com/ollama/ollama/pull/9434
- @qwerty108109 made their first contribution in https://github.com/ollama/ollama/pull/10168
- @IreGaddr made their first contribution in https://github.com/ollama/ollama/pull/10185
- @sheffler made their first contribution in https://github.com/ollama/ollama/pull/10091
Full Changelog: https://github.com/ollama/ollama/compare/v0.6.5...v0.6.6
Downloads
v0.6.5

Released 2025-04-04 19:04:24 -05:00
📅 Originally published on GitHub: Sun, 06 Apr 2025 00:15:39 GMT
🏷️ Git tag created: Sat, 05 Apr 2025 00:04:24 GMT

New models
- Mistral Small 3.1: the best performing vision model in its weight class.
What's Changed
- Support for Mistral Small 3.1
- Improved model loading times for Gemma 3 on network-backed filesystems such as Google Cloud Storage FUSE
New Contributors
- @danhipke made their first contribution in https://github.com/ollama/ollama/pull/10133
Full Changelog: https://github.com/ollama/ollama/compare/v0.6.4...v0.6.5
Downloads
v0.6.4

Released 2025-04-02 15:22:56 -05:00
📅 Originally published on GitHub: Wed, 02 Apr 2025 22:14:24 GMT
🏷️ Git tag created: Wed, 02 Apr 2025 20:22:56 GMT

What's Changed
- `/api/show` will now include model capabilities such as `vision`
- Fixed certain out-of-memory errors that would occur with parallel requests with Gemma 3
- Gemma 3 will now properly understand and output certain multilingual characters
- Fixed context shifting issues with models using the DeepSeek architecture
- Fixed issues with 0.6.3 where Gemma 3's output quality would worsen after 512 or 1024 tokens
- Added AMD RDNA4 support on Linux
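A minimal sketch of using the new capabilities field (the response dict mimics `/api/show` output as described above; the `has_capability` helper is a hypothetical example, not part of Ollama):

```python
# Hypothetical helper: check whether an /api/show-style response
# reports a given capability, e.g. "vision".
def has_capability(show_response: dict, capability: str) -> bool:
    return capability in show_response.get("capabilities", [])

# Sample fragment shaped like an /api/show response for a vision model.
info = {"capabilities": ["completion", "vision"]}
print(has_capability(info, "vision"))  # True
```

A client could use a check like this to decide whether to attach images to a request instead of hard-coding a list of vision models.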
New Contributors
- @saman-amd made their first contribution in https://github.com/ollama/ollama/pull/9878
- @leandroBorgesFerreira made their first contribution in https://github.com/ollama/ollama/pull/10042
- @Abyss-c0re made their first contribution in https://github.com/ollama/ollama/pull/9955
- @uggrock made their first contribution in https://github.com/ollama/ollama/pull/9983
- @IsAurora6 made their first contribution in https://github.com/ollama/ollama/pull/10057
Full Changelog: https://github.com/ollama/ollama/compare/v0.6.3...v0.6.4
Downloads
v0.6.3

Released 2025-03-26 15:39:01 -05:00
📅 Originally published on GitHub: Sat, 22 Mar 2025 02:56:08 GMT
🏷️ Git tag created: Wed, 26 Mar 2025 20:39:01 GMT

What's Changed
- New sliding window attention optimizations for Gemma 3, improving inference speed and memory allocation for long context windows.
- Improved loading speed of Gemma 3
- `ollama create` will now return the name of unsupported architectures
- Fixed error `talloc->buffer_id >= 0` when running a model
- Fixed `(int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs` error when running a model
- `ollama create` will now automatically select the right template when importing Gemma 3 from safetensors
- `ollama show -v` will now correctly render boolean values as `true` or `false`
New Contributors
- @rylativity made their first contribution in https://github.com/ollama/ollama/pull/9874
Full Changelog: https://github.com/ollama/ollama/compare/v0.6.2...v0.6.3
Downloads