[GH-ISSUE #5271] Low VRAM Utilization on RTX 3090 When Models are Split Across Multiple CUDA Devices (separate ollama serve) #3301

Closed
opened 2026-04-12 13:51:59 -05:00 by GiteaMirror · 4 comments

Originally created by @chrisoutwright on GitHub (Jun 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5271

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Environment

  • Ollama Version: 0.1.45

  • Operating System: Win10

  • GPU Type: NVIDIA RTX 3090, GTX 1080Ti

Issue Description

I am experiencing an issue with VRAM utilization in Ollama 0.1.45. When using the codestral example to split models across different CUDA devices on an RTX 3090 and a GTX 1080 Ti (one GPU per model, that is), it appears that only about 10 GB of VRAM is now being used for Codestral. That is roughly what one would expect on a GTX 1080 Ti, suggesting a misconfiguration or a bug in how VRAM is allocated or detected for the RTX 3090.

I am setting `$env:CUDA_VISIBLE_DEVICES=0` for the RTX 3090 and `$env:CUDA_VISIBLE_DEVICES=1` for the GTX 1080 Ti (matching their device identifiers), and running a separate `ollama serve` for each.
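
Concretely, this amounts to one PowerShell session per GPU, along these lines (the ports are illustrative here; the full startup script is in the first comment below):

```powershell
# Session 1: serve on the RTX 3090 (device index 0)
$env:CUDA_VISIBLE_DEVICES = '0'
$env:OLLAMA_HOST = '0.0.0.0:11436'   # illustrative port
ollama serve

# Session 2, in a separate window: serve on the GTX 1080 Ti (device index 1)
$env:CUDA_VISIBLE_DEVICES = '1'
$env:OLLAMA_HOST = '0.0.0.0:11439'   # illustrative port
ollama serve
```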

![Capture](https://github.com/ollama/ollama/assets/27736055/d0362621-136f-47c0-b3fe-1bcc619d4893)

OS

Windows

GPU

Nvidia

CPU

No response

Ollama version

0.1.45

GiteaMirror added the nvidia and bug labels 2026-04-12 13:51:59 -05:00

@chrisoutwright commented on GitHub (Jun 25, 2024):

I did not experience this with 0.1.43.

Here is the script I use to configure the nodes:

```powershell
# Define the environments for each node
$envNode6 = @{
    HostAddress     = "0.0.0.0:11436"
    CUDA            = "0"
    OllamaPath      = "J:\Ollama\Ollama_node6"
    MaxLoadedModels = "1"
    NumParallel     = "1"
}

$envNode2 = @{
    HostAddress     = "0.0.0.0:11439"
    CUDA            = "1"
    OllamaPath      = "J:\Ollama\Ollama_node2"
    MaxLoadedModels = "1"
    NumParallel     = "1"
}

# Set environment variables for each node and start the ollama serve command.
# Note: the HKCU:\Environment writes persist the last node's values for the
# current user, and the Start-Process command below launches the ollama.exe
# from the Ollama_node2 directory for every node.
foreach ($env in @($envNode2, $envNode6)) {
    Set-ItemProperty -Path 'HKCU:\Environment' -Name 'OLLAMA_HOST' -Value $env.HostAddress
    Set-ItemProperty -Path 'HKCU:\Environment' -Name 'CUDA_VISIBLE_DEVICES' -Value $env.CUDA
    Set-ItemProperty -Path 'HKCU:\Environment' -Name 'OLLAMA_MODELS' -Value $env.OllamaPath
    Set-ItemProperty -Path 'HKCU:\Environment' -Name 'OLLAMA_MAX_LOADED_MODELS' -Value $env.MaxLoadedModels
    Set-ItemProperty -Path 'HKCU:\Environment' -Name 'OLLAMA_NUM_PARALLEL' -Value $env.NumParallel

    Start-Process powershell -ArgumentList "-Command `"`$env:OLLAMA_HOST='$($env.HostAddress)'; echo `$env:OLLAMA_HOST; `$env:OLLAMA_MAX_LOADED_MODELS='$($env.MaxLoadedModels)'; echo `$env:OLLAMA_MAX_LOADED_MODELS; `$env:OLLAMA_NUM_PARALLEL='$($env.NumParallel)'; echo `$env:OLLAMA_NUM_PARALLEL; `$env:CUDA_VISIBLE_DEVICES='$($env.CUDA)'; echo `$env:CUDA_VISIBLE_DEVICES; `$env:OLLAMA_MODELS='$($env.OllamaPath)'; echo `$env:OLLAMA_MODELS; J:\Ollama\Ollama_node2\ollama.exe serve; Read-Host 'Press any key to close the instance.'`"" -WindowStyle Normal -Verb RunAs
}
```
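
For comparison, a minimal variant that keeps all settings process-scoped (skipping the persistent `HKCU:\Environment` writes) and derives the executable path from each node's own directory might look like this sketch; it assumes each node folder contains its own `ollama.exe`:

```powershell
# Minimal sketch: launch one ollama serve per node with process-scoped
# environment variables only (assumes ollama.exe exists in each node folder).
foreach ($node in @($envNode2, $envNode6)) {
    $exe = Join-Path $node.OllamaPath 'ollama.exe'
    $cmd = "`$env:OLLAMA_HOST='$($node.HostAddress)'; " +
           "`$env:CUDA_VISIBLE_DEVICES='$($node.CUDA)'; " +
           "`$env:OLLAMA_MODELS='$($node.OllamaPath)'; " +
           "`$env:OLLAMA_MAX_LOADED_MODELS='$($node.MaxLoadedModels)'; " +
           "`$env:OLLAMA_NUM_PARALLEL='$($node.NumParallel)'; " +
           "& '$exe' serve"
    Start-Process powershell -ArgumentList "-NoExit -Command `"$cmd`"" -WindowStyle Normal
}
```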

@chrisoutwright commented on GitHub (Jun 29, 2024):

Any ideas why this would happen even when `CUDA_VISIBLE_DEVICES` is explicitly set to a separate device for each of the two models?


@dhiltgen commented on GitHub (Jul 2, 2024):

Numeric ordering can be ambiguous in some contexts, although we might have a bug. Can you use the UUIDs of the GPUs? `nvidia-smi -L`

https://github.com/ollama/ollama/blob/main/docs/gpu.md#gpu-selection
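
For reference, `nvidia-smi -L` prints one line per GPU with its UUID, which can then be assigned to `CUDA_VISIBLE_DEVICES` in place of a numeric index (the UUIDs below are placeholders):

```
PS> nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy)

PS> $env:CUDA_VISIBLE_DEVICES = 'GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
PS> ollama serve
```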


@dhiltgen commented on GitHub (Aug 1, 2024):

If you're still having trouble with multiple CUDA GPUs, please make sure to upgrade to the latest version, and if that doesn't clear it up, please share an updated server log and I'll reopen the issue.
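
If it helps to locate it: when Ollama runs via the Windows tray app, the server log is typically written under `%LOCALAPPDATA%\Ollama` (per the troubleshooting docs); a manual `ollama serve` prints the same output to its own console instead:

```powershell
# Tail the most recent server log entries
# (path per the Windows troubleshooting docs; adjust if your install differs)
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 100
```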
