ollama-ollama

mirror of https://github.com/ollama/ollama.git synced 2026-03-11 20:23:55 -05:00

Files

Jesse Gross 4d5ff25724 mlxrunner: Report actual memory usage from runner

The MLX runner previously reported a static VRAM estimate that was
computed at load time and consisted only of the weights. This is
strictly less than the actual memory usage, as it does not include
the KV cache or compute graph.
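
As a rough illustration of the gap described above, the following sketch compares a weights-only estimate with a total that also counts the KV cache and compute graph. The `MemoryReport` type, its fields, and its methods are hypothetical names invented for this example. They are not taken from the mlxrunner source, and the byte counts in `main` are made-up values.

```go
package main

import "fmt"

// MemoryReport is a hypothetical breakdown of runner memory in bytes.
// The field names are assumptions for illustration only.
type MemoryReport struct {
	Weights      uint64 // model weights, known at load time
	KVCache      uint64 // grows with context length during inference
	ComputeGraph uint64 // scratch buffers used while evaluating the graph
}

// StaticEstimate models the earlier behaviour described in the commit
// message: only the weights are counted.
func (m MemoryReport) StaticEstimate() uint64 {
	return m.Weights
}

// Actual sums every component, so it is never smaller than StaticEstimate.
func (m MemoryReport) Actual() uint64 {
	return m.Weights + m.KVCache + m.ComputeGraph
}

// Underreported is the amount the static estimate leaves out.
func (m MemoryReport) Underreported() uint64 {
	return m.Actual() - m.StaticEstimate()
}

func main() {
	// Made-up sizes: 4 GiB of weights, 512 MiB KV cache, 256 MiB graph.
	r := MemoryReport{Weights: 4 << 30, KVCache: 512 << 20, ComputeGraph: 256 << 20}
	fmt.Printf("static=%d actual=%d missing=%d\n",
		r.StaticEstimate(), r.Actual(), r.Underreported())
}
```

Whenever the KV cache or compute graph is non-empty, the weights-only figure falls short of the real total, which is the discrepancy the commit addresses.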

2026-02-25 15:06:37 -08:00

cache

mlxrunner: Fix duplicate log prefixes and reduce log noise

2026-02-23 14:09:20 -08:00

mlx

update mlx-c bindings to 0.5.0 (#14380)

2026-02-23 16:44:29 -08:00

model

mlx: don't default to affine quantization for unquantized models

2026-02-23 15:03:53 -08:00

sample

mlxrunner fixes (#14247)

2026-02-13 22:30:42 -08:00

cache.go

mlxrunner: Simplify pipeline memory and cache management

2026-02-25 14:00:42 -08:00

client.go

mlxrunner: Report actual memory usage from runner

2026-02-25 15:06:37 -08:00

imports.go

model: add qwen3 support to mlxrunner (#14293)

2026-02-17 13:58:49 -08:00

pipeline.go

mlxrunner: Cancel in-flight requests when the client disconnects

2026-02-25 14:00:42 -08:00

runner.go

mlxrunner: Cancel in-flight requests when the client disconnects

2026-02-25 14:00:42 -08:00

server_stub.go

Add MLX runner with GLM4-MoE-Lite model support (#14185)

2026-02-10 14:57:57 -08:00

server.go

mlxrunner: Report actual memory usage from runner

2026-02-25 15:06:37 -08:00

utf8_buffer_test.go

consolidate the tokenizer (#14327)

2026-02-19 15:55:45 -08:00

utf8_buffer.go

consolidate the tokenizer (#14327)

2026-02-19 15:55:45 -08:00