[GH-ISSUE #6764] llama3.1:70B 16fp not working on nvidia H100 #4263

Closed
opened 2026-04-12 15:11:50 -05:00 by GiteaMirror · 4 comments

Originally created by @AliAhmedNada on GitHub (Sep 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6764

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Hello

I am trying to run llama3.1 70B fp16, but it does not work properly for some reason:

root@xxxxx:/home/ollama/models# ollama run llama3.1:70b-instruct-fp16
Error: llama runner process no longer running: -1 

How can I investigate this?

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.10

GiteaMirror added the nvidia, memory, needs more info, and bug labels 2026-04-12 15:11:51 -05:00

@pdevine commented on GitHub (Sep 11, 2024):

Can you provide the server logs (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues)? Also, you're running this as root?

@pdevine commented on GitHub (Sep 11, 2024):

I tried reproducing this using Paperspace's ML-in-a-box image and everything seems to be working fine:

paperspace@pshp0xdbcmic:~$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%#=#=#
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed.
paperspace@pshp0xdbcmic:~$ ollama run llama3.1:70b-instruct-fp16
pulling manifest
pulling f3da77bf16bc... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 141 GB
pulling 948af2743fc7... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏  12 KB
pulling 56bb8bd477a5... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏   96 B
pulling 44ad5c5c5b2c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
success
>>> hi there
Hello! How can I assist you today?

>>>
paperspace@pshp0xdbcmic:~$ nvidia-smi
Wed Sep 11 23:55:17 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000000:00:05.0 Off |                    0 |
| N/A   26C    P0             115W / 700W |  80202MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1953      G   /usr/lib/xorg/Xorg                           70MiB |
|    0   N/A  N/A      2086      G   /usr/bin/gnome-shell                         82MiB |
|    0   N/A  N/A      5341      C   ...unners/cuda_v12/ollama_llama_server    80032MiB |
+---------------------------------------------------------------------------------------+
paperspace@pshp0xdbcmic:~$ ollama ps
NAME                      	ID          	SIZE  	PROCESSOR      	UNTIL
llama3.1:70b-instruct-fp16	80d34437631f	143 GB	41%/59% CPU/GPU	4 minutes from now
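
Editor's note on the `ollama ps` line above: the PROCESSOR column shows that the model is split between system memory and the GPU, so a single 80 GB H100 cannot hold it alone. A minimal sketch of the arithmetic, assuming the percentages apply to the reported SIZE (the helper name `split_footprint` is made up for illustration and is not part of Ollama):

```python
import re
from typing import Tuple


def split_footprint(size_gb: float, processor: str) -> Tuple[float, float]:
    """Return (cpu_gb, gpu_gb) for a PROCESSOR value like '41%/59% CPU/GPU'."""
    match = re.fullmatch(r"(\d+)%/(\d+)% CPU/GPU", processor.strip())
    if match is None:
        raise ValueError(f"unrecognised PROCESSOR value: {processor!r}")
    cpu_pct, gpu_pct = (int(group) for group in match.groups())
    # Assumption: each percentage is a share of the total SIZE column.
    return size_gb * cpu_pct / 100, size_gb * gpu_pct / 100


# Values copied from the `ollama ps` output above.
cpu_gb, gpu_gb = split_footprint(143, "41%/59% CPU/GPU")
print(f"system RAM: ~{cpu_gb:.1f} GB, GPU memory: ~{gpu_gb:.1f} GB")
# prints: system RAM: ~58.6 GB, GPU memory: ~84.4 GB
```

The GPU share is close to the 80032 MiB that `nvidia-smi` reports for the runner, and the remaining ~59 GB has to fit in system RAM.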

@pdevine commented on GitHub (Sep 12, 2024):

@AliAhmedNada honestly I think you're probably just out of memory and the OOM killer is nuking the runner.
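
Editor's note: one way to check the OOM-killer theory is to search the kernel log for kill records. The sketch below assumes the "Killed process <pid> (<name>)" wording that recent Linux kernels write; the exact text can differ between kernel versions. The function name `find_oom_kills` and the sample log line are invented for illustration and do not come from the reporter's machine.

```python
import re
from typing import List, Tuple

# Assumed kernel wording: "... Killed process 4242 (ollama_llama_se) ..."
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_KILL.findall(kernel_log)]


# Hypothetical sample, NOT taken from the issue. The kernel truncates
# process names to 15 characters, hence "ollama_llama_se".
sample = (
    "[12345.678901] Out of memory: Killed process 4242 (ollama_llama_se) "
    "total-vm:150000000kB, anon-rss:60000000kB\n"
    "[12346.000000] usb 1-1: new high-speed USB device\n"
)
print(find_oom_kills(sample))
# prints: [(4242, 'ollama_llama_se')]
```

On a real host, the text would typically come from `dmesg` or `journalctl -k`, which may require root.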


@AliAhmedNada commented on GitHub (Sep 15, 2024):

Hey @pdevine, correct: the error was a memory problem. I've increased the memory size and it worked. Thanks!!
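
Editor's note: a rough pre-flight check for this situation is to compare the kernel's `MemAvailable` figure against the system-RAM share the model needs (about 59 GB on the reproduction machine, i.e. 41% of 143 GB). This is only a sketch: the helper names and the sample `/proc/meminfo` text are hypothetical, and the real requirement depends on how many layers Ollama offloads to the GPU.

```python
import re
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most fields are in kB."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"([^:]+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def has_headroom(meminfo_text: str, required_gb: float) -> bool:
    """True if MemAvailable covers required_gb (decimal gigabytes)."""
    available_kib = parse_meminfo(meminfo_text)["MemAvailable"]
    return available_kib * 1024 >= required_gb * 1e9


# Hypothetical sample; on Linux, pass open("/proc/meminfo").read() instead.
sample_meminfo = (
    "MemTotal:       65536000 kB\n"
    "MemFree:         1000000 kB\n"
    "MemAvailable:   32000000 kB\n"
)
print(has_headroom(sample_meminfo, 59))
# prints: False
```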

Reference: github-starred/ollama#4263