[GH-ISSUE #10089] Exceeding GPU memory even though I have 2 GPUs #68671

Closed
opened 2026-05-04 14:47:56 -05:00 by GiteaMirror · 1 comment

Originally created by @aymanelbacha on GitHub (Apr 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10089

What is the issue?

When I run the following:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.api_server --model /models/llama --tensor-parallel-size 2
```

I get the error shown below.

Following the suggested workaround to avoid fragmentation, I tried exporting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True and re-running the command, but there was no improvement.
I would appreciate your support.
I am limited to 2 GPUs in my setup, each with 16 GB of VRAM; I am using RunPod for simulation.
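
One commonly suggested mitigation for this class of OOM, sketched here with illustrative values rather than verified settings: vLLM reserves a fraction of each GPU for weights plus KV cache via --gpu-memory-utilization (default 0.9), and the KV cache size scales with --max-model-len, so capping the context length is often the more effective lever on 16 GB cards.

```shell
# Sketch only: same launch, with the per-GPU memory fraction capped and the
# maximum context length reduced so the KV cache fits on 16 GB GPUs.
# 0.85 and 4096 are illustrative assumptions, not verified values.
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.api_server \
    --model /models/llama \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.85 \
    --max-model-len 4096
```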

Relevant log output

```shell
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB. GPU 0 has a total capacity of 15.60 GiB of which 700.88 MiB is free. Process 2514 has 14.91 GiB memory in use. Of the allocated memory 14.62 GiB is allocated by PyTorch, and 87.96 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
INFO 04-02 14:50:40 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
[rank0]:[W402 14:50:41.497221295 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
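
Reading the numbers in the trace: GPU 0 has 15.60 GiB total, 14.91 GiB already in use, and the failed allocation asked for another 1.25 GiB, i.e. roughly 14.91 + 1.25 ≈ 16.16 GiB demanded against 15.60 GiB of capacity. The "reserved by PyTorch but unallocated" figure is only 87.96 MiB, which is why the expandable_segments fragmentation workaround had no effect. A quick way to watch per-GPU usage while the server starts (plain nvidia-smi, nothing vLLM-specific):

```shell
# Poll both GPUs once per second to see whether GPU 0 fills up while
# GPU 1 stays underused, i.e. whether tensor-parallel sharding is uneven.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```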

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

phi4:14b-q8_0

GiteaMirror added the bug label 2026-05-04 14:47:56 -05:00

@rick-github commented on GitHub (Apr 2, 2025):

Issue tracker for vLLM is [here](https://github.com/vllm-project/vllm/issues).


Reference: github-starred/ollama#68671