[GH-ISSUE #12725] Optimization with Nvidia DGX Spark #8443

Closed
opened 2026-04-12 21:07:27 -05:00 by GiteaMirror · 4 comments
Originally created by @VistritPandey on GitHub (Oct 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12725

What is the issue?

I am not sure if this is a bug or just a performance optimization issue, but Ollama seems to be performing slower on the new DGX Spark.

Tested the same example on a Mac Studio with an M3 Ultra, and it was 2.5 times faster. In theory, DGX Spark should be fast, but I am unsure whether it's an Ollama issue or a Spark issue.

Fwiw, Ollama does appear to be using the GPU when I check GPU usage with nvidia-smi. Models I tried:

  • Llama 3.2 (3b)
  • Llama 3.2:vision (11b)
  • Llama 3.3 (70b)
  • deepseek-r1 (with thinking enabled) (8b)

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.12.6

GiteaMirror added the bug label 2026-04-12 21:07:27 -05:00

@rick-github commented on GitHub (Oct 21, 2025):

By what measure is ollama slower?


@VistritPandey commented on GitHub (Oct 21, 2025):

I meant the time it takes to process a request. For example, if a request takes 30 seconds on the Mac Studio, the same request with the same settings and model takes roughly 70 seconds on the DGX Spark.

Everything (prompt, settings, model, ctx, etc.) was the same, except the environment and hardware
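To make "time per request" comparable across the two machines, it can help to split it into prompt processing and token generation rates. The sketch below assumes the timing fields that Ollama's generate API returns (`total_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The `sample` values are made up for illustration and are not measurements from either machine; `tokens_per_second` and `summarize` are hypothetical helper names.

```python
def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens/second."""
    return count / (duration_ns / 1e9)


def summarize(response: dict) -> dict:
    """Split one response's timing into prompt and decode rates."""
    return {
        "prompt_tok_s": tokens_per_second(
            response["prompt_eval_count"], response["prompt_eval_duration"]
        ),
        "decode_tok_s": tokens_per_second(
            response["eval_count"], response["eval_duration"]
        ),
        "total_s": response["total_duration"] / 1e9,
    }


# Illustrative numbers only (assumption), not taken from this issue.
sample = {
    "total_duration": 32_000_000_000,
    "prompt_eval_count": 200,
    "prompt_eval_duration": 2_000_000_000,
    "eval_count": 300,
    "eval_duration": 30_000_000_000,
}

print(summarize(sample))
```

Running the same prompt on both machines and comparing `decode_tok_s` separately from `prompt_tok_s` would show which phase accounts for the gap.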


@rick-github commented on GitHub (Oct 21, 2025):

DGX Spark has a memory bandwidth of 273 GB/s; an M3 Ultra has 819 GB/s. The DGX has a better GPU, but processing time is going to be dominated by how long it takes to get the weights and activations into the GPU.
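As a rough illustration of the bandwidth argument, the sketch below estimates an upper bound on generation speed under the simplifying assumption that every generated token requires reading all model weights from memory once. It ignores prompt processing, the KV cache, and compute limits. The two bandwidth figures are the ones quoted above; the 40 GB weight size is an arbitrary illustrative value, not a measured model size, and `max_tokens_per_second` is a hypothetical helper name.

```python
def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling: one full pass over the weights per token."""
    return bandwidth_gb_s / weights_gb


DGX_SPARK_GB_S = 273.0  # memory bandwidth quoted in this thread
M3_ULTRA_GB_S = 819.0   # memory bandwidth quoted in this thread
WEIGHTS_GB = 40.0       # assumption: illustrative weight size only

spark = max_tokens_per_second(DGX_SPARK_GB_S, WEIGHTS_GB)
m3 = max_tokens_per_second(M3_ULTRA_GB_S, WEIGHTS_GB)

print(f"DGX Spark ceiling: {spark:.2f} tok/s")
print(f"M3 Ultra ceiling:  {m3:.2f} tok/s")
print(f"Ratio (M3 / Spark): {m3 / spark:.2f}x")
```

Under this assumption the ratio depends only on the two bandwidth figures (819 / 273 = 3), which is in the same range as the roughly 2.5x difference reported above.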


@VistritPandey commented on GitHub (Oct 21, 2025):

Thanks, that makes sense, I was just flabbergasted by the "TFLOPS" lol


Reference: github-starred/ollama#8443