Mirror of https://github.com/ollama/ollama.git (synced 2026-04-28 08:48:22 -05:00)
* mlx: Support NVIDIA TensorRT Model Optimizer import

* x/create: support FP8 safetensors import

  Decode HF F8_E4M3 safetensors with block-scale companions into MLX-importable tensor blobs, including compressed-tensors weight_scale metadata, packed NVFP4 layouts, and mixed-precision tensor headers.

  Use that source-precision metadata during create quantization: default FP8-sourced imports to mxfp8, allow source FP8 to target MLX low-bit formats, preserve source-quantized NVFP4 layouts, selectively keep or promote tensors based on their source precision, and detect the quantized dtype from mixed-precision safetensors manifests.

* review comments
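The F8_E4M3 decode with block-scale companions described above can be sketched as follows. This is a minimal illustration of the format, not ollama's implementation: the bit layout is the standard E4M3 encoding (1 sign, 4 exponent, 3 mantissa bits, bias 7, no infinities), while the one-scale-per-contiguous-block layout and the function names are assumptions for the example.

```python
import math

def decode_f8_e4m3(byte: int) -> float:
    """Decode one F8_E4M3 value: 1 sign, 4 exponent, 3 mantissa bits,
    exponent bias 7, no infinities; S.1111.111 encodes NaN."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:      # all-ones payload is NaN in this variant
        return math.nan
    if exp == 0:                         # subnormal: 2^-6 * (man / 8)
        return sign * math.ldexp(man / 8.0, -6)
    return sign * math.ldexp(1.0 + man / 8.0, exp - 7)

def dequantize_fp8_block(raw: bytes, scales: list[float], block: int) -> list[float]:
    """Dequantize FP8 bytes using one weight_scale per contiguous block of
    `block` values (the block layout here is an assumption for illustration)."""
    return [decode_f8_e4m3(b) * scales[i // block] for i, b in enumerate(raw)]
```

For example, byte 0x38 decodes to 1.0 and 0x40 to 2.0, so dequantizing `b"\x38\x40"` with a single block scale of 0.5 yields `[0.5, 1.0]`.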