[GH-ISSUE #14156] FP8 (F8_E4M3) tensor type support for text models #9230

Open
opened 2026-04-12 22:05:59 -05:00 by GiteaMirror · 3 comments

Originally created by @rjmalagon on GitHub (Feb 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14156

More than a plain feature request, this is an inquiry about eventual FP8 support in Ollama.

Despite the limited hardware (and software) support, some models are being published in FP8 (F8_E4M3).

Notable MoE models like Qwen/Qwen3-Coder-Next-FP8 would be very welcome if FP8 retains better precision than Q8 (INT8) for the active parameters.

It is actually difficult to compare the two types for an informed discussion, because the only FP8 support I have found is in vLLM, and building vLLM for my current setup has been challenging (what a Python mess, even in containers).
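
As a rough way to frame the comparison, here is a small Python sketch (not Ollama or vLLM code) that round-trips weight-like values through the FP8 E4M3 value grid and through a simple per-tensor INT8 absmax quantizer. Real schemes differ in important ways (Q8_0 in llama.cpp uses per-block scales, and FP8 deployments typically carry per-tensor scaling factors), so treat this only as an illustration of the two number formats; the helper names here are made up for the example.

```python
import numpy as np

def e4m3_grid():
    """All finite FP8 E4M3 (OCP) values: 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    vals = []
    for e in range(16):           # exponent field
        for m in range(8):        # mantissa field
            if e == 15 and m == 7:
                continue          # this encoding is NaN in E4M3; the format has no infinities
            if e == 0:
                v = (m / 8.0) * 2.0 ** -6             # subnormals (and zero)
            else:
                v = (1.0 + m / 8.0) * 2.0 ** (e - 7)  # normals, up to 448
            vals.append(v)
    vals = np.array(sorted(set(vals)))
    return np.concatenate([-vals[::-1], vals])

def round_to_grid(x, grid):
    """Round each element of x to the nearest representable value in the grid."""
    idx = np.abs(x[:, None] - grid[None, :]).argmin(axis=1)
    return grid[idx]

def fp8_e4m3_roundtrip(x, grid):
    """Per-tensor scaled FP8: map the tensor absmax onto E4M3's max normal (448), then round."""
    scale = np.abs(x).max() / 448.0
    return round_to_grid(x / scale, grid) * scale

def int8_absmax_roundtrip(x):
    """Simple per-tensor absmax INT8 quantization (a stand-in for Q8-style schemes)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096)   # weight-like values clustered near zero

grid = e4m3_grid()
err_fp8 = np.abs(fp8_e4m3_roundtrip(w, grid) - w).mean()
err_q8 = np.abs(int8_absmax_roundtrip(w) - w).mean()
print(f"mean abs round-trip error  FP8 E4M3: {err_fp8:.2e}   INT8 absmax: {err_q8:.2e}")
```

On near-Gaussian weights the uniform INT8 grid can come out ahead; FP8's advantage tends to show up when the distribution has outliers or a wide dynamic range, which is part of why the question is hard to settle in the abstract.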

GiteaMirror added the feature request label 2026-04-12 22:05:59 -05:00

@rick-github commented on GitHub (Feb 8, 2026):

It's possible that FP8 support will come via the MLX backend; jmorgan mentioned fp8 and nvfp4 in a [Discord post](https://discord.com/channels/1128867683291627614/1464814269479780458/1464863039106056213).


@rjmalagon commented on GitHub (Feb 8, 2026):

Sounds good @rick-github, but how far along is MLX backend support for Linux+ROCm or Linux+Vulkan? As far as I know, MLX is an Apple thing and is only used for image generation models.


@rick-github commented on GitHub (Feb 8, 2026):

Support for text-to-text models on the MLX backend is progressing, e.g. [0.15.5](https://github.com/ollama/ollama/releases/tag/v0.15.5) has support for glm-4.7-flash. Currently only macOS and Linux+CUDA are supported, but there is an open [PR](https://github.com/ollama/ollama/pull/13806) in ollama for Windows support, and [discussion](https://github.com/ml-explore/mlx/pull/3098) in mlx regarding a Vulkan backend (among others).
