[GH-ISSUE #3732] num_gpu is not working in modelfile based on another model. #2298

Closed
opened 2026-04-12 12:34:15 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @chigkim on GitHub (Apr 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3732

What is the issue?

First I downloaded wizardlm2:8x22b:

```
ollama pull wizardlm2:8x22b
```

I'm trying to offload only 30 layers to the GPU using this Modelfile with `PARAMETER num_gpu 30`:

```
FROM wizardlm2:8x22b
TEMPLATE """{{ if .System }}{{ .System }} {{ end }}{{ if .Prompt }}USER: {{ .Prompt }} {{ end }}ASSISTANT: {{ .Response }}"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""
PARAMETER stop "USER:"
PARAMETER stop "ASSISTANT:"
PARAMETER num_gpu 30
```

However, `server.log` indicates 0 layers are offloaded.
If I create a model from a .gguf file that I downloaded from HF, it works.
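As a cross-check when debugging this kind of problem, `num_gpu` can also be set per-request through the `options` field of Ollama's REST API, bypassing the Modelfile entirely. A minimal sketch of the request payload, assuming a local Ollama server on the default port (actually sending it requires the server to be running; the model name is the one from this issue):

```python
import json

# Per-request options override Modelfile parameters on Ollama's
# /api/generate endpoint; num_gpu controls how many layers are
# offloaded to the GPU.
payload = {
    "model": "wizardlm2:8x22b",
    "prompt": "Hello",
    "options": {"num_gpu": 30},
    "stream": False,
}

body = json.dumps(payload)
# POST this body to http://localhost:11434/api/generate, e.g.:
#   requests.post("http://localhost:11434/api/generate", data=body)
print(body)
```

If the request-level override changes the offload reported in `server.log` while the Modelfile parameter does not, that narrows the bug to Modelfile parameter inheritance.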

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.2

GiteaMirror added the bug label 2026-04-12 12:34:15 -05:00

@pdevine commented on GitHub (Jul 11, 2024):

This worked fine for me in `0.2.1`.

```
% ollama ps
NAME                      	ID          	SIZE 	PROCESSOR      	UNTIL
pdevine/wizard-test:latest	306b7caca5bf	83 GB	45%/55% CPU/GPU	4 minutes from now
```

I went ahead and pushed the model, so you can pull it with `ollama pull pdevine/wizard-test`. It should reuse your weights, so you won't have to download them again. My test was on an M3 MBP.

I'm going to go ahead and close the issue (and sorry for the slow response!)
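To confirm offload without digging through `server.log`, the CPU/GPU split in the `ollama ps` PROCESSOR column can be checked programmatically. A minimal sketch, assuming the two field formats visible in Ollama's output (`45%/55% CPU/GPU` for a split load, `100% GPU` or `100% CPU` for a full one):

```python
import re

def parse_processor(field: str) -> tuple[int, int]:
    """Parse an `ollama ps` PROCESSOR field into (cpu_pct, gpu_pct)."""
    # Split load, e.g. "45%/55% CPU/GPU"
    m = re.match(r"(\d+)%/(\d+)%\s+CPU/GPU", field)
    if m:
        return int(m.group(1)), int(m.group(2))
    # Full load on one processor, e.g. "100% GPU"
    m = re.match(r"(\d+)%\s+(CPU|GPU)", field)
    if m:
        pct = int(m.group(1))
        return (pct, 0) if m.group(2) == "CPU" else (0, pct)
    raise ValueError(f"unrecognized PROCESSOR field: {field!r}")

print(parse_processor("45%/55% CPU/GPU"))  # (45, 55)
print(parse_processor("100% GPU"))         # (0, 100)
```

A nonzero GPU percentage here is a quick sign that at least some layers were offloaded, which is the behavior the original report says was missing.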

Reference: github-starred/ollama#2298