[GH-ISSUE #5024] Multiple GPU HI00 #65218

Closed
opened 2026-05-03 20:03:00 -05:00 by GiteaMirror · 19 comments
Owner

Originally created by @sksdev27 on GitHub (Jun 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5024

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

I have multiple NVIDIA H100s with NVLink, but Ollama seems to only use one NVIDIA GPU. I tried various deployments; here is the current one:

nvidia-smi

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 PCIe Off | 00000000:17:00.0 Off | 0 |
| N/A 35C P0 49W / 310W | 7MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA H100 PCIe Off | 00000000:65:00.0 Off | 0 |
| N/A 33C P0 47W / 310W | 7MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA H100 PCIe Off | 00000000:CA:00.0 Off | 0 |
| N/A 32C P0 47W / 310W | 7MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA H100 PCIe Off | 00000000:E3:00.0 Off | 0 |
| N/A 33C P0 49W / 310W | 7MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
```

Dockerfile

```dockerfile
# NVIDIA CUDA 12.2
FROM nvcr.io/nvidia/ai-workbench/python-cuda122:1.0.3

# Set up environment variables
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV CUDA_VISIBLE_DEVICES=0,1,2,3
ENV OLLAMA_CONFIG_PATH=/opt/ollama/ollama.yaml

# Install dependencies
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*

# Install Ollama
RUN wget https://ollama.com/install.sh -O - | bash

# Copy the configuration file to the expected location
COPY ollama.yaml /opt/ollama/ollama.yaml

# Set working directory
WORKDIR /opt/ollama

# Expose port for Ollama
EXPOSE 5000

# Default command to start Ollama
CMD ["ollama", "start"]
```

```yaml
version: '3.8'

services:
  ollama:
    image: ollama-cuda122
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - CUDA_VISIBLE_DEVICES=0,1,2,3
    ports:
      - "5000:5000"
    volumes:
      - ./models:/opt/ollama/models # Mount the models directory
    restart: unless-stopped
```

### OS

Linux

### GPU

Nvidia

### CPU

Other

### Ollama version

0.1.43

GiteaMirror added the bug label 2026-05-03 20:03:00 -05:00
Author
Owner

@dhiltgen commented on GitHub (Jun 18, 2024):

Can you share your server log?

My suspicion is that we do see all the GPUs, but you are loading a model that fits in one GPU's VRAM, so we're only loading it on one. If you attempt to load a large model, it will spread; or, on newer versions, you can set OLLAMA_SCHED_SPREAD to force it to spread over multiple GPUs.
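For reference, a minimal sketch (in Python purely for illustration) of building the server environment with spreading forced; the `ollama serve` launch itself is commented out since it needs a local Ollama install, and OLLAMA_SCHED_SPREAD is only honored on newer versions:

```python
import os

# Sketch: environment for an Ollama server that should spread a single
# model across all visible GPUs (OLLAMA_SCHED_SPREAD is honored in 0.1.45+).
env = dict(
    os.environ,
    OLLAMA_SCHED_SPREAD="1",
    CUDA_VISIBLE_DEVICES="0,1,2,3",  # the four H100s from the report above
)

# Launching the server requires an Ollama install, e.g.:
# import subprocess
# subprocess.run(["ollama", "serve"], env=env)

print(env["OLLAMA_SCHED_SPREAD"])
```

The same effect is achieved in Docker by passing `-e OLLAMA_SCHED_SPREAD=1` to `docker run`, as shown later in this thread.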

@sksdev27 commented on GitHub (Jun 19, 2024):

This time, when I tried the 70B, it couldn't load on one GPU, so it failed.
Here are the Docker logs:

[docker_logs_ollama.log](https://github.com/user-attachments/files/15895114/docker_logs_ollama.log)

I also tried setting OLLAMA_SCHED_SPREAD: `docker run --gpus all -p 11434:11434 -e OLLAMA_SCHED_SPREAD=1 -it --rm ollama/ollama:latest`
Here are its logs:
[OLLAMA_SCHED_SPREAD.log](https://github.com/user-attachments/files/15895227/OLLAMA_SCHED_SPREAD.log)

@dhiltgen commented on GitHub (Jun 19, 2024):

From the looks of the first log, your client gave up after ~2 minutes and we aborted the load as a result:

```
time=2024-06-19T03:33:26.815Z level=WARN source=server.go:536 msg="client connection closed before server finished loading, aborting load"
```

You may see better load performance by disabling mmap:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"num_gpu": 21 }
}'
```

I forgot that OLLAMA_SCHED_SPREAD is new in 0.1.45 which explains why 0.1.44 didn't respect it.
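For anyone scripting this rather than using curl, the same request can be sent from Python's standard library. A minimal sketch, assuming the default server address (the `OLLAMA_URL` constant is an assumption; adjust it to your deployment):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address

def generate_payload(model: str, prompt: str, options: dict) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": options,
    }).encode("utf-8")

def generate(model: str, prompt: str, options: dict) -> dict:
    """Send a non-streaming generate request (needs a running server)."""
    req = urllib.request.Request(
        OLLAMA_URL + "/api/generate",
        data=generate_payload(model, prompt, options),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Mirrors the curl example; not executed here since it needs a live server:
# print(generate("llama3", "Why is the sky blue?", {"num_gpu": 21})["response"])
```

Per-request options like `num_gpu` or `use_mmap` go in the `options` object, exactly as in the curl examples in this thread.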

@sksdev27 commented on GitHub (Jun 20, 2024):

[ollama_45_rc3.log](https://github.com/user-attachments/files/15919215/ollama_45_rc3.log)
[ollama_45_rc2.log](https://github.com/user-attachments/files/15919216/ollama_45_rc2.log)
[ollama_45_rc4_rom.log](https://github.com/user-attachments/files/15919217/ollama_45_rc4_rom.log)

Not sure what crashed the NVIDIA GPU, but after running Ollama, the GPU crashes.

I wanted to load this with Open WebUI; not sure disabling mmap would be possible through Open WebUI.

@dhiltgen commented on GitHub (Jun 20, 2024):

In the next release (0.1.46) we'll have [automatic mmap logic](https://github.com/ollama/ollama/pull/5194), so if the model is larger than the free memory on the system, we'll revert to regular file reads instead of mmap. From your logs, though, it looks like this system has a lot of memory, so we'd still default to mmap for the model you're trying to load.

You didn't mention what model you're trying to load, however I see the load timed out before the cuda error happened, so it's possible this was a race of trying to shutdown while it was still loading. I'd suggest trying to load this model with mmap disabled using curl (see above) and see if that at least gets it to load, or if there's still some other bug lurking in here.

If switching to regular file reads solves the problem, then I may be able to adjust the algorithm to set some upper threshold where we disable mmap for extremely large models, but I don't want to do that until we can confirm it actually solves the problem.
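The decision being described can be illustrated with a toy heuristic. To be clear, this is not Ollama's actual code (the real logic is in the PR linked above); the hard cap is a hypothetical value standing in for the "upper threshold" idea:

```python
def should_use_mmap(model_bytes: int, free_ram_bytes: int,
                    hard_cap_bytes: int = 64 * 2**30) -> bool:
    """Toy version of the mmap decision discussed in this thread.

    - If the model doesn't fit in free RAM, mmap tends to thrash the
      page cache, so fall back to regular file reads.
    - hard_cap_bytes models the hypothetical upper threshold mentioned
      above for disabling mmap on extremely large models.
    """
    if model_bytes > free_ram_bytes:
        return False  # automatic fallback landing in 0.1.46
    if model_bytes > hard_cap_bytes:
        return False  # speculative cap for very large models
    return True

# A 39 GB llama3:70b blob on a box with 200 GB of free RAM would still
# default to mmap without some such cap:
print(should_use_mmap(39 * 2**30, 200 * 2**30, hard_cap_bytes=float("inf")))
```

Under the sketch, the reporter's system (lots of free RAM) keeps mmap on unless a size cap like `hard_cap_bytes` is introduced, which matches the behavior described above.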

@sksdev27 commented on GitHub (Jun 21, 2024):

I tested 0.1.45-rc4 with the curl command. Here are the logs:

[ollama_45_rc4_mmap_dis.log](https://github.com/user-attachments/files/15929587/ollama_45_rc4_mmap_dis.log)

NVIDIA SMI during exit:

```
Fri Jun 21 09:24:16 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 PCIe Off | 00000000:17:00.0 Off | 0 |
| N/A 47C P0 86W / 310W | 4469MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA H100 PCIe Off | 00000000:65:00.0 Off | 0 |
| N/A 48C P0 82W / 310W | 3151MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA H100 PCIe Off | 00000000:CA:00.0 Off | 0 |
| N/A 45C P0 80W / 310W | 3151MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA H100 PCIe Off | 00000000:E3:00.0 Off | 0 |
| N/A 45C P0 84W / 310W | 3151MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 44047 C ...unners/cuda_v11/ollama_llama_server 4456MiB |
| 1 N/A N/A 44047 C ...unners/cuda_v11/ollama_llama_server 3138MiB |
| 2 N/A N/A 44047 C ...unners/cuda_v11/ollama_llama_server 3138MiB |
| 3 N/A N/A 44047 C ...unners/cuda_v11/ollama_llama_server 3138MiB |
+---------------------------------------------------------------------------------------+
```

@dhiltgen commented on GitHub (Jun 21, 2024):

Hmm... those logs don't seem to indicate `use_mmap=false` was passed. It's still using the mmap logic to load the model.

The subprocess was started with the following:

```
time=2024-06-21T15:20:56.247Z level=INFO source=server.go:359 msg="starting llama server" cmd="/tmp/ollama3491629577/runners/cuda_v11/ollama_llama_server --model /root/.ollama/models/blobs/sha256-0bd51f8f0c975ce910ed067dcb962a9af05b77bafcdc595ef02178387f10e51d --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 21 --verbose --parallel 1 --tensor-split 6,5,5,5 --tensor-split 6,5,5,5 --port 44779"
```

There should be an additional `--no-mmap` flag passed in there if `use_mmap=false` was passed in.
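As an aside, the `--tensor-split 6,5,5,5` in that command line is how the 21 offloaded layers get apportioned across the four GPUs. A rough sketch of how such a split can be derived, dividing layers proportionally to free VRAM per device (illustrative only, not Ollama's actual scheduler):

```python
def layer_split(num_layers: int, free_vram: list[int]) -> list[int]:
    """Assign layers to GPUs proportionally to each GPU's free VRAM."""
    total = sum(free_vram)
    # Proportional share, rounded down...
    split = [num_layers * v // total for v in free_vram]
    # ...then hand any leftover layers to the GPUs with the most free VRAM.
    leftover = num_layers - sum(split)
    order = sorted(range(len(free_vram)), key=lambda i: -free_vram[i])
    for i in order[:leftover]:
        split[i] += 1
    return split

# 21 layers over four GPUs, the first with slightly more free VRAM:
print(layer_split(21, [6, 5, 5, 5]))  # → [6, 5, 5, 5]
```

With VRAM in the ratio 6:5:5:5, the sketch reproduces the `6,5,5,5` split seen in the log above.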

@sksdev27 commented on GitHub (Jun 21, 2024):

Hmm, I used the curl command:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"num_gpu": 21 }
}'
```

Is there another way to pass the `--no-mmap` argument?

@dhiltgen commented on GitHub (Jun 21, 2024):

Oops, sorry, I cut-and-pasted the wrong curl example. Try this:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"use_mmap": false }
}'
```
@sksdev27 commented on GitHub (Jun 21, 2024):

It went a little bit further but still resulted in a server crash; then I relaunched the same thing and it was up and running. Here are the logs:
[ollama_45_rc4_mmap_dis_latest.log](https://github.com/user-attachments/files/15932078/ollama_45_rc4_mmap_dis_latest.log)

@dhiltgen commented on GitHub (Jun 21, 2024):

The latest log seems somewhat truncated, so I can't see the loading portion, but good to hear you got it working by adding `use_mmap=false` - I'm curious how long the load took.

I'm not sure what the threshold should be to toggle off mmap. I'll try to run some more experiments to see if I can find what the deciding factor(s) should be, but if you have the ability to experiment with different sized models and single vs. multi-GPU in this same environment, that might help us understand when we should switch loading strategy.

@sksdev27 commented on GitHub (Jun 21, 2024):

So it crashed after a while; here are the latest logs:
[log.txt](https://github.com/user-attachments/files/15932944/log.txt)

I will try to load it again and get back to you with the loading logs, and also try different-sized models. I have a single-H100 PC as well and can test single GPU vs. multi GPU.

@sksdev27 commented on GitHub (Jun 22, 2024):

So I relaunched it: the first time it failed, the second time it failed, the third time it failed, and the fourth time it started working. Then I ran two curl commands similar to this:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"use_mmap": false }
}'
```

Then I started Open WebUI and sent it a couple of questions. After two questions the GPU crashed. I think it's because I am not passing `"use_mmap": false` through the web UI.

[ollama_1_45_mmap_dis_all.log](https://github.com/user-attachments/files/15934808/ollama_1_45_mmap_dis_all.log)

I will do more testing and try out small models

@sksdev27 commented on GitHub (Jun 22, 2024):

[ollama_46.log](https://github.com/user-attachments/files/15937507/ollama_46.log)
I tried 0.1.46 and it works, but if I leave the GPU idle for a while, it breaks. Don't know why; maybe it's the GPU or something else. Trying to figure that out, but loading seems to be working fine. I will try other models.

@sksdev27 commented on GitHub (Jun 25, 2024):

So the GPU crash was because my NVIDIA drivers weren't updated: they were supposed to be 535.183 instead of 535.163.

Bottom line: I assume this ticket can be closed. However, here is one issue that I do want to highlight. During the launch of llama3:70b with 0.1.46, unless I run this first:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"use_mmap": false }
}'
```

if I did `ollama pull llama3:70b` and followed up with the run command, it would fail to load the server.

Also, as for your other questions, here is the current setup with a 1-GPU server and a 4-GPU server.

1 GPU, running the following models with Ollama 0.1.46:

```
root@4cdbe351ed8b:/# ollama list
NAME                   ID            SIZE    MODIFIED
mistral:latest         2ae6f6dd7a3d  4.1 GB  About a minute ago
starcoder2:7b          0679cedc1189  4.0 GB  About a minute ago
gemma:7b               a72c7f4d0a15  5.0 GB  About a minute ago
llama3:latest          365c0bd3c000  4.7 GB  About a minute ago
command-r:latest       b8cdfff0263c  20 GB   About a minute ago
```

4 GPUs, running the following models with Ollama 0.1.46:

```
root@c1e628e9c647:/# ollama list
NAME                   ID            SIZE    MODIFIED
starcoder2:15b         20cdb0f709c2  9.1 GB  33 seconds ago
mistral:latest         2ae6f6dd7a3d  4.1 GB  34 seconds ago
command-r-plus:latest  c9c6cc6d20c7  59 GB   35 seconds ago
llama3:70b             786f3184aec0  39 GB   34 seconds ago
openchat:latest        537a4e03b649  4.1 GB  About a minute ago
```

Testing with an Open WebUI client.

@sksdev27 commented on GitHub (Jun 25, 2024):

Here is a comparison of loading Ollama version 0.1.46, launched using the following docker command:

```
docker run --gpus all -p 11434:11434 -e OLLAMA_SCHED_SPREAD=true -e OLLAMA_DEBUG=true -it --rm ollama/ollama:0.1.46
```

Loading logs for `ollama run llama3:70b`:
[ollama_1_46_ollama_run_llama70b.log](https://github.com/user-attachments/files/15976212/ollama_1_46_ollama_run_llama70b.log)
nvidia-smi log:
[log-nvidia-smi.log](https://github.com/user-attachments/files/15976234/log-nvidia-smi.log)

Loading logs for the curl command:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false, "options": {"use_mmap": false }
}'
```
The command had to be run twice:

```
root@f03fa0d6d2bd:/# curl http://localhost:11434/api/generate -d '{
"model": "llama3:70b",
"prompt": "Why is the sky blue?",
"stream": false, "options": {"use_mmap": false }
}'
{"error":"timed out waiting for llama runner to start - progress 1.00 - "}root@f03fa0d6d2bd:/# curl http://localhost:11434/api/generate -d '{
"model": "llama3:70b",
"prompt": "Why is the sky blue?",
"stream": false, "options": {"use_mmap": false }
}'
{"model":"llama3:70b","created_at":"2024-06-25T18:55:09.266770179Z","response":"One of the most popular and intriguing questions in all of science!\n\nThe sky appears blue because of a phenomenon called Rayleigh scattering, which is named after the British physicist Lord Rayleigh. In 1871, he discovered that shorter (blue) wavelengths of light are scattered more than longer (red) wavelengths by the tiny molecules of gases in the atmosphere.\n\nHere's what happens:\n\n1. Sunlight enters Earth's atmosphere: When sunlight enters our atmosphere, it encounters tiny molecules of gases like nitrogen (N2) and oxygen (O2). These molecules are much smaller than the wavelength of light.\n2. Scattering occurs: The shorter wavelengths of light, such as blue and violet, are more easily deflected by these small molecules due to their smaller size. This is known as Rayleigh scattering.\n3. Blue light is scattered in all directions: As a result of this scattering, the blue light is dispersed throughout the atmosphere, reaching our eyes from all directions.\n4. Red light continues its path: The longer wavelengths of light, like red and orange, are less affected by the small molecules and continue to travel in a more direct path to our eyes.\n\nThis combination of scattered blue light and direct red light creates the blue color we see in the sky during the daytime. The exact shade of blue can vary depending on atmospheric conditions, such as pollution, dust, and water vapor, which can scatter light in different ways.\n\nAdditionally, the following factors can influence the apparent color of the sky:\n\n* Time of day: During sunrise and sunset, the sun's rays have to travel through more of the atmosphere, scattering shorter wavelengths and making the sky appear more red or orange.\n* Atmospheric conditions: Dust, pollution, and water vapor can scatter light in different ways, changing the apparent color of the sky.\n* Altitude and atmospheric pressure: At higher elevations, there is less air to scatter the light, resulting in a deeper blue color.\n\nSo, to summarize, the sky appears blue because of the scattering of shorter (blue) wavelengths of light by the tiny molecules in our atmosphere, while longer (red) wavelengths continue their path directly to our eyes.","done":true,"done_reason":"stop","context":[128006,882,128007,271,10445,374,279,13180,6437,30,128009,128006,78191,128007,271,4054,315,279,1455,5526,323,41765,4860,304,682,315,8198,2268,791,13180,8111,6437,1606,315,264,25885,2663,13558,64069,72916,11,902,374,7086,1306,279,8013,83323,10425,13558,64069,13,763,220,9674,16,11,568,11352,430,24210,320,12481,8,93959,315,3177,527,38067,810,1109,5129,320,1171,8,93959,555,279,13987,35715,315,45612,304,279,16975,382,8586,596,1148,8741,1473,16,13,3146,31192,4238,29933,9420,596,16975,96618,3277,40120,29933,1057,16975,11,433,35006,13987,35715,315,45612,1093,47503,320,45,17,8,323,24463,320,46,17,570,4314,35715,527,1790,9333,1109,279,46406,315,3177,627,17,13,3146,3407,31436,13980,96618,578,24210,93959,315,3177,11,1778,439,6437,323,80836,11,527,810,6847,711,2258,555,1521,2678,35715,4245,311,872,9333,1404,13,1115,374,3967,439,13558,64069,72916,627,18,13,3146,10544,3177,374,38067,304,682,18445,96618,1666,264,1121,315,420,72916,11,279,6437,3177,374,77810,6957,279,16975,11,19261,1057,6548,505,682,18445,627,19,13,3146,6161,3177,9731,1202,1853,96618,578,5129,93959,315,3177,11,1093,2579,323,19087,11,527,2753,11754,555,279,2678,35715,323,3136,311,5944,304,264,810,2167,1853,311,1057,6548,382,2028,10824,315,38067,6437,3177,323,2167,2579,3177,11705,279,6437,1933,584,1518,304,279,13180,2391,279,62182,13,578,4839,28601,315,6437,649,13592,11911,389,45475,4787,11,1778,439,25793,11,16174,11,323,3090,38752,11,902,649,45577,3177,304,2204,5627,382,50674,11,279,2768,9547,649,10383,279,10186,1933,315,279,13180,1473,9,3146,1489,315,1938,96618,12220,64919,323,44084,11,279,7160,596,45220,617,311,5944,1555,810,315,279,16975,11,72916,24210,93959,323,3339,279,13180,5101,810,2579,477,19087,627,9,3146,1688,8801,33349,4787,96618,33093,11,25793,11,323,3090,38752,649,45577,3177,304,2204,5627,11,10223,279,10186,1933,315,279,13180,627,9,3146,27108,3993,323,45475,7410,96618,2468,5190,12231,811,11,1070,374,2753,3805,311,45577,279,3177,11,13239,304,264,19662,6437,1933,382,4516,11,311,63179,11,279,13180,8111,6437,1606,315,279,72916,315,24210,320,12481,8,93959,315,3177,555,279,13987,35715,304,1057,16975,11,1418,5129,320,1171,8,93959,3136,872,1853,6089,311,1057,6548,13,128009],"total_duration":55703871735,"load_duration":39062954205,"prompt_eval_count":16,"prompt_eval_duration":90332000,"eval_count":443,"eval_duration":16548221000}root@f03fa0d6d2bd:/
```
logs:
Load logs:
[ollama_1_46_ollama_run_llama70b_curl.log](https://github.com/user-attachments/files/15976479/ollama_1_46_ollama_run_llama70b_curl.log)
nvidia-smi logs:
[log-nvidia-smi_curl.log](https://github.com/user-attachments/files/15976486/log-nvidia-smi_curl.log)
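For reference, the duration fields in the `/api/generate` JSON response above are reported in nanoseconds, which makes the raw figures hard to read at a glance. A minimal sketch that converts the numbers from the successful curl attempt above into human-readable stats (the `summarize` helper is illustrative, not part of ollama):

```python
# Convert the nanosecond duration fields of an ollama /api/generate
# response (as in the log above) into human-readable statistics.

NS_PER_S = 1_000_000_000

def summarize(resp: dict) -> dict:
    """Return load time, total time, and eval rate from a response dict."""
    return {
        "load_s": resp["load_duration"] / NS_PER_S,
        "total_s": resp["total_duration"] / NS_PER_S,
        "tokens_per_s": resp["eval_count"] * NS_PER_S / resp["eval_duration"],
    }

# Figures taken from the second (successful) curl attempt above.
resp = {
    "total_duration": 55703871735,
    "load_duration": 39062954205,
    "eval_count": 443,
    "eval_duration": 16548221000,
}

stats = summarize(resp)
print(f"load: {stats['load_s']:.1f}s, total: {stats['total_s']:.1f}s, "
      f"eval: {stats['tokens_per_s']:.1f} tok/s")
# → load: 39.1s, total: 55.7s, eval: 26.8 tok/s
```

Here `load_duration` alone accounts for roughly 39 of the 55.7 total seconds, i.e. model loading dominates the request time.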

@dhiltgen commented on GitHub (Jul 5, 2024):

That's great that you have a working setup.

Looking at that last log, even without mmap, we're still taking a really long time to initialize on your 4-GPU setup. It looks like the loading progress hit 100% in ~14 seconds, but the runner was still initializing for over 5 minutes and triggered our timeout. On the second attempt the caches were warm, and it only took 36s to load overall.

While we could increase the timeout, taking more than 5 minutes to fully load the model still feels problematic. I'm working on another change to add CUDA v12 support, with the intent of improving performance on more modern GPUs, which might wind up solving this load lag. #5049
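To check whether layers are actually being spread across all four H100s during a slow load, one option is to sample per-GPU memory usage. A minimal sketch, assuming the standard `nvidia-smi --query-gpu` CSV interface (the `parse_gpu_memory` helper and the sample reading are illustrative, not part of ollama):

```python
# Verify multi-GPU spread by sampling per-GPU memory from nvidia-smi.
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text: str) -> dict:
    """Parse nvidia-smi CSV output into {gpu_index: MiB used}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, mib = line.split(",")
        usage[int(index)] = int(mib)
    return usage

def snapshot() -> dict:
    """Take one live reading (requires the NVIDIA driver to be present)."""
    return parse_gpu_memory(subprocess.check_output(QUERY, text=True))

# Hypothetical reading taken while a 70B model is loading: with
# OLLAMA_SCHED_SPREAD=true we would expect all four GPUs to fill up.
sample = "0, 20480\n1, 20480\n2, 20480\n3, 20480"
busy = [i for i, mib in parse_gpu_memory(sample).items() if mib > 1000]
print(f"GPUs holding model weights: {busy}")
# → GPUs holding model weights: [0, 1, 2, 3]
```

If only one index shows up while the others stay at idle levels (the ~7 MiB seen in the nvidia-smi output above), the scheduler is not spreading the model.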

@sksdev27 commented on GitHub (Jul 5, 2024):

So it would eventually start up, but then fail after a day or two. It seems there was an issue with one of the GPUs. From my research, the model I am working with also has NVLink installed, so it should have been treated as one GPU. We are currently working on replacing it, either the GPU or one of the components around it.

@sksdev27 commented on GitHub (Jul 25, 2024):

@dhiltgen So when I launch the latest ollama 0.2.8 it uses one GPU, but when I use ollama version 0.1.30 it uses all the GPUs. The fix that you applied here didn't make it to 0.2.8.

Reference: github-starred/ollama#65218