ollama

mirror of https://github.com/ollama/ollama.git synced 2026-03-09 07:16:38 -05:00

Files

Jesse Gross a60b9adcce mlxrunner: Fix prompt eval timing and count metrics

Only the last token's processing time was included in the prompt
processing time, giving an artificially high rate. In addition, the
token count included only the tokens that missed the cache, rather than
the total number of prompt tokens we have historically reported.

2026-02-27 17:29:47 -08:00

cache

mlxrunner: Fix duplicate log prefixes and reduce log noise

2026-02-23 14:09:20 -08:00

mlx

show peak memory usage (#14485)

2026-02-26 18:38:27 -08:00

model

mlxrunner: Enforce model context limit

2026-02-27 17:29:47 -08:00

sample

mlxrunner fixes (#14247)

2026-02-13 22:30:42 -08:00

cache.go

mlxrunner: Fix panic on full KV cache hit

2026-02-27 11:07:03 -08:00

client.go

mlxrunner: Enforce model context limit

2026-02-27 17:29:47 -08:00

imports.go

model: add qwen3 support to mlxrunner (#14293)

2026-02-17 13:58:49 -08:00

pipeline.go

mlxrunner: Fix prompt eval timing and count metrics

2026-02-27 17:29:47 -08:00

runner.go

mlxrunner: Enforce model context limit

2026-02-27 17:29:47 -08:00

server_stub.go

Add MLX runner with GLM4-MoE-Lite model support (#14185)

2026-02-10 14:57:57 -08:00

server.go

mlxrunner: Enforce model context limit

2026-02-27 17:29:47 -08:00

utf8_buffer_test.go

consolidate the tokenizer (#14327)

2026-02-19 15:55:45 -08:00

utf8_buffer.go

consolidate the tokenizer (#14327)

2026-02-19 15:55:45 -08:00