[GH-ISSUE #13049] When will minimax-m2 be supported? #34406

Closed
opened 2026-04-22 17:55:44 -05:00 by GiteaMirror · 24 comments

Originally created by @ghost on GitHub (Nov 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13049

What is the issue?

I would really like to test minimax-m2. I am getting an unknown model architecture: 'minimax-m2' error.

Relevant log output

time=2025-11-11T23:49:01.672+11:00 level=INFO source=routes.go:1525 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\user\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-11-11T23:49:01.678+11:00 level=INFO source=images.go:522 msg="total blobs: 49"
time=2025-11-11T23:49:01.680+11:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-11T23:49:01.682+11:00 level=INFO source=routes.go:1578 msg="Listening on 127.0.0.1:11434 (version 0.12.10)"
time=2025-11-11T23:49:01.682+11:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-11T23:49:01.691+11:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 49732"
time=2025-11-11T23:49:02.137+11:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 49745"
time=2025-11-11T23:49:03.406+11:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 49753"
time=2025-11-11T23:49:03.713+11:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-3f8ecd83-47d0-81a2-ed3c-5c41dd1507fc filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=13.0 pci_id=0000:15:00.0 type=discrete total="12.0 GiB" available="11.8 GiB"
time=2025-11-11T23:49:03.713+11:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-cbe06948-85e0-6899-f43f-21e1865c0283 filter_id="" library=CUDA compute=8.6 name=CUDA1 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=13.0 pci_id=0000:21:00.0 type=discrete total="12.0 GiB" available="11.4 GiB"
[GIN] 2025/11/11 - 23:49:03 | 200 |      1.1656ms |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/11/11 - 23:49:03 | 404 |      5.3881ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/11/11 - 23:49:03 | 200 |      9.7446ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/11/11 - 23:49:05 | 200 |        34.5µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/11/11 - 23:49:05 | 200 |     86.9558ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/11/11 - 23:49:05 | 200 |    166.3243ms |       127.0.0.1 | POST     "/api/show"
time=2025-11-11T23:49:05.858+11:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\user\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 49762"
time=2025-11-11T23:49:06.390+11:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2025-11-11T23:49:06.390+11:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=6 efficiency=0 threads=12
llama_model_loader: loaded meta data with 41 key-value pairs and 809 tensors from C:\Users\user\.ollama\models\blobs\sha256-aaf82562594aa1a9ee8f47ca3e666d4401d7e6157fa79d76ca3f56ed2c5d9b9e (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = minimax-m2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Minimax-M2
llama_model_loader: - kv   3:                           general.basename str              = Minimax-M2
llama_model_loader: - kv   4:                       general.quantized_by str              = Unsloth
llama_model_loader: - kv   5:                         general.size_label str              = 256x4.9B
llama_model_loader: - kv   6:                            general.license str              = mit
llama_model_loader: - kv   7:                           general.repo_url str              = https://huggingface.co/unsloth
llama_model_loader: - kv   8:                               general.tags arr[str,1]       = ["text-generation"]
llama_model_loader: - kv   9:                     minimax-m2.block_count u32              = 62
llama_model_loader: - kv  10:                  minimax-m2.context_length u32              = 196608
llama_model_loader: - kv  11:                minimax-m2.embedding_length u32              = 3072
llama_model_loader: - kv  12:             minimax-m2.feed_forward_length u32              = 1536
llama_model_loader: - kv  13:            minimax-m2.attention.head_count u32              = 48
llama_model_loader: - kv  14:         minimax-m2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                  minimax-m2.rope.freq_base f32              = 5000000.000000
llama_model_loader: - kv  16: minimax-m2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  17:                    minimax-m2.expert_count u32              = 256
llama_model_loader: - kv  18:               minimax-m2.expert_used_count u32              = 8
llama_model_loader: - kv  19:            minimax-m2.attention.key_length u32              = 128
llama_model_loader: - kv  20:          minimax-m2.attention.value_length u32              = 128
llama_model_loader: - kv  21:              minimax-m2.expert_gating_func u32              = 2
llama_model_loader: - kv  22:      minimax-m2.expert_feed_forward_length u32              = 1536
llama_model_loader: - kv  23:            minimax-m2.rope.dimension_count u32              = 64
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = minimax-m2
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,200064]  = ["Ā", "ā", "Ă", "ă", "Ą", "ą", ...
llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,200064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,199744]  = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "e r...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 200034
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 200020
llama_model_loader: - kv  31:            tokenizer.ggml.unknown_token_id u32              = 200021
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 200004
llama_model_loader: - kv  33:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {# Unsloth & community template fixes...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - kv  36:                          general.file_type u32              = 24
llama_model_loader: - kv  37:                      quantize.imatrix.file str              = MiniMax-M2-GGUF/imatrix_unsloth.gguf
llama_model_loader: - kv  38:                   quantize.imatrix.dataset str              = unsloth_calibration_MiniMax-M2.txt
llama_model_loader: - kv  39:             quantize.imatrix.entries_count u32              = 496
llama_model_loader: - kv  40:              quantize.imatrix.chunks_count u32              = 697
llama_model_loader: - type  f32:  373 tensors
llama_model_loader: - type q4_K:    1 tensors
llama_model_loader: - type q5_K:   18 tensors
llama_model_loader: - type q6_K:   11 tensors
llama_model_loader: - type iq2_xxs:   31 tensors
llama_model_loader: - type iq3_xxs:  106 tensors
llama_model_loader: - type iq1_s:  131 tensors
llama_model_loader: - type iq3_s:  103 tensors
llama_model_loader: - type iq4_xs:   35 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = IQ1_S - 1.5625 bpw
print_info: file size   = 52.49 GiB (1.97 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'minimax-m2'
llama_model_load_from_file_impl: failed to load model
time=2025-11-11T23:49:06.704+11:00 level=INFO source=sched.go:425 msg="NewLlamaServer failed" model=C:\Users\user\.ollama\models\blobs\sha256-aaf82562594aa1a9ee8f47ca3e666d4401d7e6157fa79d76ca3f56ed2c5d9b9e error="unable to load model: C:\\Users\\user\\.ollama\\models\\blobs\\sha256-aaf82562594aa1a9ee8f47ca3e666d4401d7e6157fa79d76ca3f56ed2c5d9b9e"
[GIN] 2025/11/11 - 23:49:06 | 500 |    1.0296521s |       127.0.0.1 | POST     "/api/generate"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.12.10

GiteaMirror added the model and bug labels 2026-04-22 17:55:44 -05:00

@rick-github commented on GitHub (Nov 11, 2025):

minimax-m2 has been added to llama.cpp (https://github.com/ggml-org/llama.cpp/pull/16831), so the next vendor sync will pull that in via the llama.cpp engine. It's also possible that the ollama team will add support for the native engine, but that will take longer.


@BarachielFallen commented on GitHub (Nov 16, 2025):

Also seeing this failure with the Ollama:ROCM branch


@rick-github commented on GitHub (Nov 16, 2025):

Vendor sync needs to move past b6906 (https://github.com/ggml-org/llama.cpp/releases/tag/b6906) before ollama can use the llama.cpp engine for minimax-m2.


@rick-github commented on GitHub (Nov 28, 2025):

https://github.com/ollama/ollama/pull/12992


@gkjpettet commented on GitHub (Nov 28, 2025):

+1 for this. Disappointing that it's on Ollama cloud but not available to run locally. It's currently state of the art for local models and can be run on an M3 Ultra with LM Studio.


@rick-github commented on GitHub (Dec 10, 2025):

$ ollama -v
ollama version is 0.13.2
$ ollama run minimax-m2:q4_K_M
>>> hello
Hello! How can I assist you today? Feel free to ask anything you'd like—whether it's about 
technology, life advice, or even random fun facts! 😊✨

$ ollama ps
NAME                 ID              SIZE      PROCESSOR    CONTEXT    UNTIL   
minimax-m2:q4_k_m    d84f43c16ffd    141 GB    100% GPU     4096       Forever    
$ hf download --include Q4_K_M/* --local-dir unsloth/MiniMax-M2-GGUF unsloth/MiniMax-M2-GGUF
$ cd unsloth/MiniMax-M2-GGUF/Q4_K_M
$ llama-gguf-split --merge MiniMax-M2-Q4_K_M-00001-of-00003.gguf MiniMax-M2-Q4_K_M.gguf
$ ollama create minimax-m2:q4_K_M -f Modelfile

FROM MiniMax-M2-Q4_K_M.gguf
TEMPLATE """
{{- $lastUserIdx := -1 -}}
{{- range $idx, $msg := .Messages -}}
{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end }}
{{- end -}}

]~!b[]~b]system
{{- if .System }}
{{ .System }}
{{- else }}
You are a helpful assistant.
{{- end -}}
[e~[
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}

{{- if eq .Role "assistant" }}
]~b]ai
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>
{{ .Thinking }}
</think>
{{ end }}
{{- if .Content }}
{{ .Content }}
{{- end -}}
[e~[
{{- end }}

{{- if eq .Role "user" }}
]~b]user
{{ .Content -}}
[e~[
{{- end }}

{{- if and (ne .Role "assistant") $last }}
]~b]ai
<think>
{{ end }}
{{- end }}
"""

The model supports tool calling but the format is too complex to encode in a template, so that will need a renderer/parser like qwen3-coder.
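
A rough illustration of what such a renderer/parser has to do, assuming the generic <tool_call>{"name": ..., "arguments": ...}</tool_call> convention that comes up later in this thread (a sketch only, not Ollama's internal API):

import json
import re

# Sketch: split model output into plain text and tool calls, assuming the
# generic <tool_call> JSON convention discussed in this thread. The tag name
# and call shape are assumptions for illustration, not Ollama's parser.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def parse_tool_calls(output: str):
    """Return (text_without_tool_calls, list_of_tool_call_dicts)."""
    calls = []
    for block in TOOL_CALL_RE.findall(output):
        # Each block may hold one JSON object per line.
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # leave malformed blocks for the caller to handle
            if isinstance(obj, dict) and "name" in obj:
                calls.append(obj)
    return TOOL_CALL_RE.sub("", output).strip(), calls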


@BarachielFallen commented on GitHub (Dec 10, 2025):

There is no way you can backport this to version 0.12.11, is there? That is the last version of Ollama that recognizes GTT size for VRAM on an AMD Strix Halo iGPU.


@gkjpettet commented on GitHub (Dec 10, 2025):

Getting tool use with Minimax M2 would make my Christmas! Benchmarks suggest this is one of the most powerful agentic models you can run on a Mac Studio.


@rick-github commented on GitHub (Dec 10, 2025):

> There is no way you can backport this to version 0.12.11, is there? That is the last version of Ollama that recognizes GTT size for VRAM on an AMD Strix Halo iGPU.

I have an evo-x2, which I believe has the same chipset as the Strix Halo, and it runs minimax-m2 on Ollama 0.13.2, although not very quickly.

$ ollama run minimax-m2:Q4_K_M
>>> hello
Hello! 😊 How can I help you today?

$ ollama ps
NAME                 ID              SIZE      PROCESSOR          CONTEXT    UNTIL   
minimax-m2:Q4_K_M    d84f43c16ffd    140 GB    29%/71% CPU/GPU    4096       Forever    

> Getting tool use with Minimax M2 would make my Christmas! Benchmarks suggest this is one of the most powerful agentic models you can run on a Mac Studio.

The minimax-m2 tool calling format is too complex for the template but I'm experimenting with adding tool calls using a more generic format.


@BarachielFallen commented on GitHub (Dec 10, 2025):

What version of Linux and ROCm did you use to get it to recognize GTT values instead of BIOS-defined VRAM?


@BarachielFallen commented on GitHub (Dec 10, 2025):

Also, have you already tried one of the smaller REAPs, which fit the entire model into VRAM instead of the CPU offload your metrics show for the Q4 version, to see what speed increase full GPU offload gives?


@rick-github commented on GitHub (Dec 10, 2025):

$ ollama run minimax-m2:162b-a10b-q4_K_M hello --verbose
...
eval rate:            7.20 tokens/s
$ ollama ps
NAME                           ID              SIZE      PROCESSOR          CONTEXT    UNTIL   
minimax-m2:162b-a10b-q4_K_M    d84f43c16ffd    140 GB    29%/71% CPU/GPU    4096       Forever    
$ ollama run minimax-m2:162b-a10b-reap-bartowski-q4_K_S hello --verbose
...
eval rate:            21.27 tokens/s
$ ollama ps
NAME                                          ID              SIZE     PROCESSOR    CONTEXT    UNTIL   
minimax-m2:162b-a10b-reap-bartowski-q4_K_S    e5c9633fc4a9    96 GB    100% GPU     4096       Forever    

@BarachielFallen commented on GitHub (Dec 10, 2025):

Could you please tell me your Linux kernel, OS, and ROCm versions, in order to get ollama:latest to recognize the full 128 GB as VRAM through gttsize GRUB settings instead of the BIOS-set VRAM limit of 96 GB? Or alternatively amdgttsize, if that is the parameter that works.


@BarachielFallen commented on GitHub (Dec 10, 2025):

I am loving those REAP tk/s


@rick-github commented on GitHub (Dec 10, 2025):

The machine is currently configured with 96 GB VRAM for ROCm. Using Vulkan increases the available resources, but there have been issues and it doesn't seem production-ready just yet, so for the moment I'm sticking with ROCm.

ollama  | time=2025-12-10T16:49:29.775Z level=INFO source=types.go:42 msg="inference compute" id=00000000-c600-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="AMD Radeon 8060S (RADV GFX1151)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:c6:00.0 type=iGPU total="111.5 GiB" available="111.3 GiB"


@BarachielFallen commented on GitHub (Dec 10, 2025):

I am able to utilize the full 128 GB as VRAM using Ollama 0.12.11 on ROCm 7.0.2, Linux kernel 6.15, Ubuntu 24.04.3 LTS, using the ROCm (not Vulkan) backend. I was hoping you had a solution for the newer versions of Ollama, since I can't use the newer models like MiniMax and Qwen Next past 0.12 with full VRAM. gttsize / amdgttsize is completely broken in the newer builds; it only reports 1 GB of VRAM, which isn't even what my BIOS is set to (512 MB). I even tried the new Ollama engine with no luck.
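
For reference, the GRUB setting being discussed is normally an amdgpu module parameter on the kernel command line. A sketch of the kind of entry involved, with sizes that are purely illustrative for a 128 GB machine (amdgpu.gttsize is in MiB, ttm.pages_limit in 4 KiB pages; adjust for your system and kernel):

# /etc/default/grub — illustrative values targeting ~120 GiB of GTT
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.gttsize=122880 ttm.pages_limit=31457280"
# then: sudo update-grub && reboot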


@BarachielFallen commented on GitHub (Dec 10, 2025):

"time=2025-12-02T04:37:28.716Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs /sha256-090a1569019c7b110eaaf505792bdbed02b7ea5b54783e18a38a1747e4de96eb --port 34237"
time=2025-12-02T04:37:28.717Z level=INFO source=sched.go:443 msg="system memory" total="124.9 GiB" free="122.5 GiB" free_swap="7.7 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=sched.go:450 msg="gpu memory" id=0 library=ROCm available="121.0 GiB" free="121.5 GiB" minimum="45 7.0 MiB" overhead="0 B"
time=2025-12-02T04:37:28.717Z level=INFO source=server.go:459 msg="loading model" "model layers"=48 requested=-1
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="67.2 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="17.2 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="34.4 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:272 msg="total memory" size="118.8 GiB"
time=2025-12-02T04:37:28.725Z level=INFO source=runner.go:963 msg="starting go runner"
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
time=2025-12-02T04:37:29.385Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA= 1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SI ZE=128 compiler=cgo(gcc)
time=2025-12-02T04:37:29.385Z level=INFO source=runner.go:999 msg="Server listening on 127.0.0.1:34237"
time=2025-12-02T04:37:29.392Z level=INFO source=runner.go:893 msg=load request="{Operation:commit LoraPath:[] Parallel:3 BatchSize:512 FlashAttent ion:false KvSize:96000 KvCacheType: NumThreads:16 GPULayers:48[ID:0 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
time=2025-12-02T04:37:29.393Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) (0000:c5:00.0) - 124397 MiB free
time=2025-12-02T04:37:29.393Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 50 key-value pairs and 803 tensors from /root/.ollama/models/blobs/sha256-090a1569019c7b110eaaf505792bdb ed02b7ea5b54783e18a38a1747e4de96eb (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
"

My entire setup is sitting in a Docker Compose stack on the bare-metal Ubuntu LTS.


@BarachielFallen commented on GitHub (Dec 10, 2025):

I have seen the broken VRAM behavior under discussion on my AMD 780M iGPU builds with 96 GB of VRAM as well (both 8945HS and 8745H APUs). After Ollama 0.12.11, my three 780M 96 GB builds and my Strix Halo with 128 GB of RAM all have broken GRUB gttsize functionality.


@BarachielFallen commented on GitHub (Dec 11, 2025):

Should I open a separate bug report for this?


@BarachielFallen commented on GitHub (Dec 11, 2025):

> I have an evo-x2 which I believe has the same chipset as the Strix Halo and it runs minimax-m2 on Ollama 0.13.2, although not very quickly.

BTW, I have an evo-x2 mini-PC that I am running this on too. Strix Halo is the AMD chipset line for the evo-x2's AI MAX+ 395 APU.


@rick-github commented on GitHub (Dec 13, 2025):

Added tool support. Considering it's non-native, it does a pretty good job: chained tool calls, callbacks to previous results, inferring required arguments, etc.

TEMPLATE """
{{- $lastUserIdx := -1 -}}
{{- range $idx, $msg := .Messages -}}
{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end }}
{{- end -}}

]~!b[]~b]system
{{- if .System }}
{{ .System }}
{{- else }}
You are a helpful assistant.
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more tools to assist with the user query.

Here are the tools available in JSONSchema format:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

When making tool calls, use a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}
[e~[
{{- $inTool := false }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}

{{- if $inTool }}
{{- if ne .Role "tool" }}
{{- $inTool = false -}}
[e~[
{{- end }}
{{- end }}

{{- if eq .Role "assistant" }}
]~b]ai
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>
{{ .Thinking }}
</think>
{{ end }}
{{- if .Content }}
{{ .Content }}
{{- end }}
{{- if .ToolCalls -}}
<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end -}}
</tool_call>
{{- end -}}
[e~[
{{- end }}

{{- if eq .Role "user" }}
]~b]user
{{ .Content -}}
[e~[
{{- end }}

{{- if eq .Role "tool" }}
{{- if not $inTool }}{{ $inTool = true -}}
]~b]tool
{{- end }}
<response>
{{ .Content }}
</response>
{{- if $last }}
{{- $inTool = false -}}
[e~[
{{- end }}
{{- end }}

{{- if and (ne .Role "assistant") $last }}
]~b]ai
<think>
{{ end }}
{{- end }}
"""
$ ollama-run.py minimax-m2:162b-a10b-tools-q4_K_M  --tools all --test -i

>>> what time is it?

How would you like me to find out the current time for you?

  1. I can show you my system's current date and time (which will be in UTC timezone)
  2. Or, if you're looking for the current time somewhere specific, just let me know where (city/country) you'd like to check!

Let me try option 1 first:

calling get_datetime({})
It's currently Saturday, December 13, 2025 at 9:52 AM (UTC).

If you need the current time for a different timezone or location, just let me know where you are and I can help with that!

>>> what is 10 ^ 1.34?

I'll calculate that for you using the power function:

calling power({'x': 10, 'y': 1.34})
10^1.34 ≈ 21.88

So 10 raised to the power of 1.34 is approximately 21.88.

>>> what operating system are you running on?

I'll check what operating system I'm running on:

calling get_operating_system({})
I'm running on a Linux operating system with the kernel version 6.14.0-36-generic for x86_64 architecture, using glibc version 2.39.

This appears to be a Linux-based environment - likely Ubuntu or another Debian-based distribution given the "generic" kernel designation and the specific package naming convention.

>>> what is my ip address?

I'll check your IP address for you:

calling get_ip_address({})
Your current IP address is: 77.x.x.x

This is your public IP address that identifies your connection on the internet.

Note: If you're using a VPN, proxy server, or are behind a corporate network, this might show the IP address of the VPN/proxy/gateway rather than your direct connection to the internet.

>>> what is my location?

I'll check what I can determine about your approximate location based on your IP address:

calling get_location({})
Based on your IP address (77.x.x.x), I can determine the following approximate location:

Your Location:

  • Country: Switzerland
  • Region/Canton: Zurich
  • City: Zurich
  • Latitude/Longitude: 47.3976, 8.5442
  • Organization: Init7

This information is based on your IP address geolocation and shows you're likely in or near Zurich, Switzerland.

Important note: This is only an approximate location based on where your internet service provider's network infrastructure is registered. It might not be exactly where you are physically located - for instance, if you're using a VPN, traveling, or if the IP database has outdated information.

If you'd like more precise location data, your browser could ask permission to share GPS coordinates (browser geolocation), but that's something you'd need to explicitly authorize on your device.

>>> What's the weather where I am?

I'll check the current weather for Zurich, Switzerland based on your IP location:

calling get_weather({'city': 'Zurich', 'country': 'Switzerland', 'unit': 'celsius'})
Based on your location in Zurich, Switzerland, here's the current weather:

Current Weather:

  • Conditions: Foggy
  • Temperature: Around 2°C (feels like)
  • Today's forecast: High of 7°C / Low of 1°C
  • Wind: 14 km/h from Southwest

Upcoming days look chilly with temperatures ranging from about 6-9°C highs and around 0-5°C lows.

The high humidity (91%) combined with the foggy conditions is making it feel quite cool. If you're going out, you'll definitely want a warm jacket and maybe an umbrella since there seems to be some drizzle/mist in the forecast too!

>>> count the number of files in the current directory

I'll count the files in your current working directory:

calling get_current_directory({})

Now let me list and count all the files in that directory:

calling list_directory({'directory': '/home/rick/minimax-m2'})

Let me count the number of items in that directory:

calling count({'l': ['0.11.5' ,... 'test.png', 'ttft.py']})
There are 61 files in your current directory (/home/rick/minimax-m2).

The directory contains a variety of file types including:

  • Configuration files (docker-compose.yaml, Modelfile.*)
  • Python scripts (.py files like ollama-run.py, ttft.py, etc.)
  • Images (.jpg, .png files - lots of them!)
  • Other data files

Would you like me to count only specific types of files or get more details about the contents?

>>> when is sunrise in New Zealand tomorrow?

I'll look up when sunrise occurs in New Zealand tomorrow. Let me search for this information:

calling web_search({'query': 'sunrise time New Zealand tomorrow date December 14 2025'})
Based on the search results, here are the sunrise times for different parts of New Zealand tomorrow (December 14, 2025):

General New Zealand:

  • Sunrise: 7:45 AM

Auckland specifically:

  • Sunrise: 5:55 AM

Note: The significant difference between these times is due to New Zealand's time zones. Auckland uses NZST (UTC+12) during December, while some other parts of New Zealand may be in different time zones or observe daylight saving time differently.

Since you didn't specify which city in New Zealand you're interested in, the most comprehensive answer would be that sunrise tomorrow ranges from 5:55 AM in Auckland to 7:45 AM elsewhere in New Zealand, depending on your specific location within the country.

Would you like me to look up the sunrise time for a specific city in New Zealand?

>>> summarize the contents of the file /etc/os-release

I'll read and summarize the contents of the /etc/os-release file:

calling read_file({'filename': '/etc/os-release'})
Here's a summary of the /etc/os-release file contents:

Operating System Information

Distribution: Linux Mint 22.2 "Zara"

Key Details:

  • OS Family: Based on Ubuntu/Debian (Linux Mint is built on top of Ubuntu)
  • Version: 22.2 with codename "Zara"
  • Ubuntu Base: Built on Ubuntu Noble (24.04 LTS)
  • Website: linuxmint.com

What this means:

This shows you're running a Linux Mint system, which is a popular Linux distribution known for being user-friendly and beginner-oriented. Version 22.2 "Zara" is based on Ubuntu Noble (24.04), meaning you have access to modern packages while benefiting from Linux Mint's custom desktop environment and software management tools.

The system identifies as both Linux Mint and compatible with Ubuntu/Debian package formats, which means it can run most Ubuntu-compatible software.

>>> what happened this week in AI?

I'll search for recent developments in AI from this week:

calling web_search({'query': 'AI news December 2025 this week latest developments artificial intelligence'})
Based on my search, here are the major AI developments from this week (December 2025):

🚀 Major Model Releases & Competition

OpenAI GPT-5.2 "Garlic" Launch

  • Released December 12th as a direct response to Google's advances
  • Claims to perform at or above human expert level in 70.9% of business tasks vs. GPT-5.1's 38.8%
  • Available in Instant, Thinking, and Pro tiers with improved real-world task performance
  • Pricing: $1.75 per million input tokens, $14 per million output tokens

Google Gemini Deep Research Agent

  • Launched same day as GPT-5.2 (December 11th) - clearly strategic timing
  • Now allows developers to embed Google's research capabilities into their apps via new Interactions API
  • Based on Gemini 3 Pro and designed to minimize hallucinations in complex reasoning tasks
  • Plans integration with Google Search, Finance, Gemini App, and NotebookLM

🏢 Enterprise & Industry Moves

Major Partnerships & Acquisitions:

  • IBM acquiring Confluent for $31/share to build smart data platforms for enterprise AI
  • Accenture-Anthropic partnership launching business group to train 30,000 professionals on Claude
  • ServiceNow investing CA$110 million in Canadian public sector AI adoption

Enterprise AI Control & Governance:

  • New research shows 96% of organizations lose cost control when deploying GenAI at scale
  • Multiple companies launched agentic AI platforms: Couchbase, Databricks, Thales (AI Security Fabric)
  • Growing focus on "agent orchestration" and enterprise-grade AI governance

🔧 Technical Developments

Agentic AI Focus:

  • Visual Studio Code added multi-agent orchestration capabilities
  • New open-source platform "AgentField" launched as "Kubernetes + Okta for AI Agents"
  • Progress Software integrated agentic AI into development tools (Telerik & Kendo UI)

Security & Trust Issues:

  • 96% of organizations struggle with securing human elements against AI-enabled threats
  • Deepfake-enabled social engineering and AI-assisted phishing on the rise

The big theme this week was "agentic AI" - AI systems that can act autonomously to complete complex tasks. However, two critical challenges emerged:

  1. Cost control issues as deployments scale
  2. Data governance gaps preventing enterprises from achieving full AI potential

This represents a pivotal moment where the focus shifts from basic language models to practical, enterprise-ready autonomous AI systems, but with growing concerns about cost overruns and security risks.

>>> describe what's in ./filename.png

I'll analyze the image for you:

calling describe_contents_of_image_file({'filename': './filename.png', 'query': 'Describe what is shown in this image'})
Based on my analysis of the image filename.png, here's what I can see:

Image Description

Main Subject:

  • An extremely cute, fluffy white puppy with distinctive features
  • The breed appears to be a Shiba Inu or similar spitz-type dog
  • Has the characteristic curled tail typical of Shiba Inus
  • Small in size, sitting upright

Setting & Props:

  • The puppy is positioned on what appears to be concrete steps/staircase
  • The surface looks weathered and worn
  • Good natural lighting that highlights the puppy's fluffy coat

Accessories:

  • Wearing a red collar around its neck
  • Has a small gold bell attached to the collar

Overall Aesthetic:
The photo has a very heartwarming, charming quality that showcases the adorable features of this little dog. The contrast between the soft, fluffy white fur and the industrial concrete surface creates an appealing visual dynamic.

It's definitely the kind of image that would make people say "aww!" - the combination of the innocent puppy, cute accessories, and well-composed shot makes for a very endearing photograph.

>>> What are the 3 most recent posts in mastodon?

I'll fetch the 3 most recent Mastodon posts for you:

calling get_recent_mastodon_posts({'count': 3})
Here are the 3 most recent Mastodon posts:

📱 Most Recent Posts (as of December 13, 2025)

1. DrSprockets (@mas.to) - Just now (09:55)

  • Shared a Substack link to what appears to be content by Zev Shalev
  • User profile: Peter Lichota from Ontario, Canada - describes himself as fact-focused and conspiracy-theory skeptical

2. YouTube (@mastodon.jwsoat.com) - Just now (09:54:59)

  • Simple post stating "Jwsoat TV Is Playing By"
  • Appears to be an automated account with minimal engagement (only 2 followers)

3. delainfo (Railway Delay Info) - Just now (09:54:59)

  • Posted in Japanese about train service updates
  • Content translates roughly to: "Takasaki Line [Normal Operations] Currently operating nearly normally. ([Original info]: Due to in-car inspection on the Utsunomiya Line, some down-line trains between Ueno-Kagahara stations are experiencing delays.)"
  • This is a bot account that provides real-time Japanese railway delay information

Note: All three posts were made within seconds of each other at 09:54:59-55:00 UTC on December 13th, suggesting these might be automated or scheduled posts. The content spans from personal sharing to transportation updates across different languages (English and Japanese).

>>> plot x^2

I'll create a plot of the function f(x) = x² for you:

calling run_python_script_sandbox({'script': "import matplotlib.pyplot as plt\nimport numpy as np\n\n# Create x values from -5 to 5\nx = np.linspace(-5, 5, 100)\ny = x**2\n\n# Create the plot\nplt.figure(figsize=(10, 6))\nplt.plot(x, y, 'b-', linewidth=2, label='f(x) = x²')\n\n# Customize the plot\nplt.title('Plot of f(x) = x²', fontsize=16, fontweight='bold')\nplt.xlabel('x', fontsize=12)\nplt.ylabel('y', fontsize=12)\nplt.grid(True, alpha=0.3)\nplt.legend(fontsize=12)\n\n# Add some styling\nplt.axhline(y=0, color='k', linewidth=0.5)\nplt.axvline(x=0, color='k', linewidth=0.5)\n\n# Set axis limits for better visualization\nplt.xlim(-5.5, 5.5)\nplt.ylim(-1, 26)\n\n# Show the plot\nplt.tight_layout()\nplt.show()"})

Let me try using a different approach to create the plot:

calling run_python_script_nosandbox({'script': 'import matplotlib.pyplot as plt\nimport numpy as np\n\n# Create x values from -5 to 5\nx = np.linspace(-5, 5, 100)\ny = x**2\n\n# Create the plot\nplt.figure(figsize=(10, 6))\nplt.plot(x, y, \'b-\', linewidth=2, label=\'f(x) = x²\')\n\n# Customize the plot\nplt.title(\'Plot of f(x) = x²\', fontsize=16, fontweight=\'bold\')\nplt.xlabel(\'x\', fontsize=12)\nplt.ylabel(\'y\', fontsize=12)\nplt.grid(True, alpha=0.3)\nplt.legend(fontsize=12)\n\n# Add some styling\nplt.axhline(y=0, color=\'k\', linewidth=0.5)\nplt.axvline(x=0, color=\'k\', linewidth=0.5)\n\n# Set axis limits for better visualization\nplt.xlim(-5.5, 5.5)\nplt.ylim(-1, 26)\n\n# Save the plot to a file\nplt.tight_layout()\nplt.savefig(\'x_squared_plot.png\', dpi=300, bbox_inches=\'tight\')\nprint("Plot saved as \'x_squared_plot.png\'")\n\n# Also display some key points\nprint("\\nKey characteristics of f(x) = x²:")\nprint(f"• Domain: All real numbers (-∞ to +∞)")\nprint(f"• Range: [0, ∞) (y ≥ 0)")\nprint(f"• Vertex: (0, 0)")\nprint(f"• Axis of symmetry: x = 0 (the y-axis)")\n\n# Show some sample values\nprint(f"\\nSample points:")\nsample_x = [-3, -2, -1, 0, 1, 2, 3]\nfor val in sample_x:\n    print(f"f({val}) = {val}² = {val**2}")'})
Perfect! I've created a plot of the function f(x) = x² and saved it as x_squared_plot.png. Here's what you get:

📈 Plot of f(x) = x²

The graph shows the classic parabolic curve with these key characteristics:

Mathematical Properties:

  • Domain: All real numbers (-∞ to +∞)
  • Range: [0, ∞) - The function never produces negative values
  • Vertex: (0, 0) - The minimum point of the parabola
  • Axis of symmetry: x = 0 (the y-axis)

Key Points:

The curve passes through points like:

  • (-3, 9), (-2, 4), (-1, 1)
  • (0, 0) ← Vertex/minimum
  • (1, 1), (2, 4), (3, 9)

This is a fundamental quadratic function that creates a U-shaped curve opening upward from the origin. The plot has been saved to your current directory and shows the characteristic parabolic shape with proper axis labels, grid lines, and styling for clear visualization.

The graph demonstrates why x² is always non-negative - any real number squared results in a positive value (except 0² = 0).

minimax-m2:162b-a10b-tools-q4_K_M >>> !ls -l x_squared_plot.png
-rw-rw-r-- 1 rick rick 114698 Dec 13 10:55 x_squared_plot.png

@gkjpettet commented on GitHub (Dec 13, 2025):

OMG! This is great! Amazing work :)

How would I get this up and running? Do I need to wait for a new update to Ollama to support the model or can this be retroactively fitted to the current public release (0.13.3)?


@c0008 commented on GitHub (Dec 13, 2025):

Is interleaved thinking supported? It can improve results by a lot.
https://aigazine.com/industry/minimax-m2-gets-40-performance-boost-with-interleaved-thinking--ms


@rick-github commented on GitHub (Dec 14, 2025):

> How would I get this up and running? Do I need to wait for a new update to Ollama to support the model or can this be retroactively fitted to the current public release (0.13.3)?

It's just a template, so it will work with any ollama release 0.13.2+. A native implementation of the model may perform better at tool calling, and that would require a new ollama release, but this generic template works with anything from 0.13.2 onwards.
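
If it helps, the rough workflow is to paste the TEMPLATE from my earlier comment into a Modelfile and rebuild with `ollama create`. A minimal sketch (the FROM target and the new model name are placeholders; point them at whatever minimax-m2 weights you already have locally):

```dockerfile
# Modelfile (hypothetical names; reuse the TEMPLATE block from the comment above)
FROM minimax-m2:162b-a10b-q4_K_M
TEMPLATE """
...paste the full template here...
"""
```

```console
$ ollama create minimax-m2-tools -f Modelfile
$ ollama run minimax-m2-tools
```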

> Is interleaved thinking supported? It can improve results by a lot.

The template only replays thinking traces from assistant turns that come after the last user message. That's handled by this line ($lastUserIdx holds the index of the final user message, so thinking from earlier turns is dropped):

```
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
```

This is in line with the chat template supplied by the model authors, but you can override that and keep all thinking traces by replacing that line with:

```
{{- if and $.IsThinkSet .Thinking -}}
```
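
As a quick end-to-end check that doesn't depend on my ollama-run.py harness, you can POST to ollama's standard /api/chat endpoint with a tools array and look for tool_calls in the response message. A sketch (the model name and the get_weather schema are just illustrative):

```console
$ curl http://localhost:11434/api/chat -d '{
  "model": "minimax-m2-tools",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Zurich?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string"},
          "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["city"]
      }
    }
  }]
}'
```

If the template is working, the response's message should contain a tool_calls array with the function name and inferred arguments rather than plain content.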
Reference: github-starred/ollama#34406