[GH-ISSUE #1727] ollama doesn't use system RAM #63019

Closed
opened 2026-05-03 11:16:42 -05:00 by GiteaMirror · 29 comments

Originally created by @DrGood01 on GitHub (Dec 27, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1727

Originally assigned to: @dhiltgen on GitHub.

I'm running Ollama on an Ubuntu 22 Linux laptop with 32 GB of RAM and an NVIDIA GTX 1650. Ollama loads models exclusively into the graphics card's VRAM and doesn't use any of the system RAM at all. Very frustrating, as it exits with "Error: llama runner exited, you may not have enough available memory to run this model" as soon as I try to chat...

GiteaMirror added the nvidia label 2026-05-03 11:16:42 -05:00

@iplayfast commented on GitHub (Dec 27, 2023):

I ran into this as well. The way to get around it is to tell Ollama you have no GPU; then it will load into system memory.
My mixtralcpu model is as follows:

```
FROM mixtral:latest
TEMPLATE """ [INST] {{ .System }} {{ .Prompt }} [/INST]"""
PARAMETER num_gpu 0
PARAMETER num_ctx 32768
PARAMETER stop "</s>"
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER temperature .9
SYSTEM "You are an intelligent AI that is always helpful"
```

Modify it for the model you are trying to run, and create the new model:

ollama create mixtralcpu -f Modelfile
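
For anyone who'd rather not maintain a separate Modelfile, the same override can also be passed per request. A minimal sketch, assuming a local server on the default port and using mixtral only as a placeholder model name; `num_gpu 0` asks the runner to offload zero layers to the GPU:

```
# Hedged sketch: force CPU-only inference for a single request via the REST API.
# Model name and prompt are placeholders; adjust to whatever you have pulled.
curl http://localhost:11434/api/generate -d '{
  "model": "mixtral:latest",
  "prompt": "Why is the sky blue?",
  "options": { "num_gpu": 0 }
}'
```

Inside an interactive `ollama run` session, `/set parameter num_gpu 0` should have the same effect before you send a prompt.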


@PollastreGH commented on GitHub (Dec 27, 2023):

Can confirm that I'm running into this issue as well: EndeavourOS Linux desktop with 64 GB of RAM and an RTX 3080.

Update: For me this seems to only be happening on 13b models. All 7b models I've tried and a 70b model (dolphin-mixtral) do not have this issue. Strange. Additionally, this didn't happen for me when I was on WSL2, but it does now that I'm on native Linux.


@DrGood01 commented on GitHub (Dec 30, 2023):

iplayfast, thank you so much! I'm now running mixtralcpu on my laptop! It's loading into RAM, which is nice. But it also fills the swap space. Is there a way to tell it not to fill swap? Thanks again.
Edit: I'm now wondering if there's a way to tell the model that it should also use the compute capacity of the graphics card?


@easp commented on GitHub (Jan 2, 2024):

> But it also fills the swap space. Is there a way to tell it not to fill swap?

If you don't have enough RAM, your system will use swap. The solution is to either get more RAM and/or reduce the RAM demands of your computer by closing files, quitting apps, or using smaller models.
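
A quick way to check whether it's genuine memory pressure or just eager swapping is to watch RAM and swap together while the model is answering; a rough sketch with standard Linux tools (the sample interval is arbitrary):

```
# Snapshot of memory and swap state
free -h           # compare the "available" column against the swap row
swapon --show     # which swap devices exist and how full they are

# Ongoing swap activity: non-zero si/so columns mean active swap-in/swap-out
vmstat 5 3
```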


@DrGood01 commented on GitHub (Jan 3, 2024):

Thanks easp. I've got 32 GB of RAM, and while working, my mixtralcpu uses only 7 or 8 GB of it while rapidly filling swap. Any idea?


@Nantris commented on GitHub (Jan 4, 2024):

It seems like, for me, Ollama never uses system memory at all, which doesn't make any sense to me. It reads from disk at 140 MB/s nonstop while it generates, though, and takes up to 15 minutes for a brief response, so maybe it really isn't using system memory.

No GPU involvement.

Specifically, I'm running via WSL1 (which I know is not officially supported, but it's the only option I have).

I hope there might be a Windows version soon! LLMs are just too heavy to boot up in traditional VMs.


@gbrohammer commented on GitHub (Jan 9, 2024):

Same problem: an Ubuntu laptop with 64 GB RAM and an RTX 3050 Ti (4 GB VRAM) fails to load the llama2 model.


@bsu3338 commented on GitHub (Jan 19, 2024):

I am having the same problem. I am using the Docker image. The solution from @iplayfast did not work for me. I tried q5_K_M models of mixtral, mistral, and llama2. I am also running within a VM.


@bsu3338 commented on GitHub (Jan 28, 2024):

My problem was caused by the Hyper-V VM running with Dynamic Memory. After disabling that option, everything worked as designed. I don't know where it belongs, but it would be good to make a note of this somewhere in the documentation.


@pdevine commented on GitHub (Mar 11, 2024):

This should be working better now, in that Ollama should offload a portion of the model to the GPU and a portion to the CPU. Can you test again with Ollama version 0.1.28?

There is also a change coming in 0.1.29 where you will be able to set the amount of VRAM that you want to use, which should force it to use system memory instead.


@Nantris commented on GitHub (Mar 19, 2024):

It works here on Windows now that WSL is no longer involved.


@mzpqnxow commented on GitHub (Apr 11, 2024):

> > But it also fills the swap space. Is there a way to tell it not to fill swap?
>
> If you don't have enough RAM, your system will use swap. The solution is to either get more RAM and/or reduce the RAM demands of your computer by closing files, quitting apps, or using smaller models.

Important note on this, specifically for most Linux distributions; arguably the most important thing for Linux desktop users with more than 16 GB of RAM.

Most popular Linux distributions (all Debian-based distros, at least) advise the kernel to use swap for an unreasonably large portion of memory allocations, **even when there's still plenty of physical RAM available**.

This is a really nasty default setting that, in my opinion, should be adjusted, determined dynamically, or set by asking the user at installation time. For most workloads, a system with 32 GB of RAM should never proactively swap.

You can tell the kernel not to swap so aggressively by setting the swappiness value lower. It's a scale of 0-100; Debian sets it to 60 by default. I reduce it to 1 (effectively, "hardly ever swap"):

$ sudo sysctl -w vm.swappiness=1

As usual, the [Arch docs](https://wiki.archlinux.org/title/Swap#Swappiness) are the best on the subject. There is a [link](https://chrisdown.name/2018/01/02/in-defence-of-swap.html) there with counterpoints that you may want to consider over my suggestion. I prefer to reduce disk I/O; YMMV.
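
Worth noting that `sysctl -w` only lasts until reboot; to keep a lower swappiness across reboots, you can drop the setting into a sysctl config file. A small sketch; the file name below is just a common convention, not a requirement:

```
# Persist the lower swappiness setting across reboots
echo 'vm.swappiness=1' | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system    # reload all sysctl configuration files
```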


@mzpqnxow commented on GitHub (Apr 11, 2024):

> Thanks easp. I've got 32 GB of RAM, and while working, my mixtralcpu uses only 7 or 8 GB of it while rapidly filling swap. Any idea?

Check swappiness (see my previous comment; I should have replied to your comment directly, sorry).


@ConfoundedHermit commented on GitHub (Apr 12, 2024):

I have this same issue on Windows native v0.1.31 with any model. It loads models into GPU VRAM, but larger models obviously run like molasses once VRAM is maxed out. System RAM is not touched by the model at all, and there is 100+ GB free.


@dhiltgen commented on GitHub (Apr 12, 2024):

I've lost track of what this issue is tracking. It sounds like the initial problem was that we miscalculated the number of layers of the model to load on the GPU, ran out of VRAM, and crashed. In general, Ollama is going to try to use the GPU and VRAM before system memory. We've been improving our prediction algorithms to get closer to fully utilizing the GPU's VRAM without exceeding it, so I'd definitely encourage you to try the latest release.


@Nantris commented on GitHub (Apr 12, 2024):

I am pretty confident that when I tested, it was properly using system RAM, as I don't have nearly enough VRAM to store an entire model, and yet response times were pretty reasonable (a few seconds, as opposed to a few minutes previously).

Thanks for the great work.


@Louden7 commented on GitHub (Apr 14, 2024):

I still think this is an issue with Linux, or more specifically Ubuntu.

Troubleshooting steps taken:

  1. updated Ollama
  2. Removed all other LLMs from the local server
  3. Restarted service
  4. Set the default swappiness to 5 (from 60) as suggested above in this thread.

I am running Ollama 0.1.31 locally on Ubuntu 22.04.4 LTS with 16 GB RAM, a 12 GB RTX 3080 Ti, and an old Ryzen 1800X. Any LLM smaller than 12 GB runs flawlessly since it all fits in the GPU's memory. However, when I tried testing with the 19 GB codellama:34b, it loads ~10 GB on the GPU but then puts nothing in the available 16 GB of RAM, resulting in extremely slow response times.

Screenshots below:

  1. tmux split screen of htop (top half) and nvtop (bottom half). Note: average GPU utilization was ~7%.
     ![htop-nvtop](https://github.com/ollama/ollama/assets/22922778/5c39d777-431b-4ac8-9e78-f2876f4b09a4)

  2. Open WebUI with details on the prompt response.
     ![OpenwebUI](https://github.com/ollama/ollama/assets/22922778/c41d044c-5b4a-442f-9c3d-bd8800e53629)
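
For anyone reproducing this, one way to watch GPU VRAM and system RAM side by side while the model generates; a rough sketch assuming the NVIDIA driver tools are installed (nvtop/htop show the same thing interactively):

```
# Refresh every second: GPU memory use from nvidia-smi, system memory from free
watch -n 1 'nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader; echo; free -h'
```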


@joshwkearney commented on GitHub (Apr 24, 2024):

I'll second this; I'm having the same problem running on Zorin 17.1 (based on Ubuntu). Hardware is a Ryzen 3800X, a 1080 Ti with 12 GB, and 32 GB of RAM. If I run models much larger than 8b, they can't all fit into VRAM, but it doesn't use my system memory at all. I tried the troubleshooting above, but no luck.


@siakc commented on GitHub (Apr 30, 2024):

Is [this](https://github.com/ollama/ollama/issues/3837) related?


@Louden7 commented on GitHub (Apr 30, 2024):

Yes, both of these issues seem related. Trying to run a larger model that does not fully fit in GPU VRAM _should_ store the remainder in system RAM, but as the images I shared above show, it does not.

Ideally (and I may be wrong), in this case it would fill up GPU VRAM, then system RAM, and share the compute load on both GPU and CPU, favoring the GPU for performance.


@Nantris commented on GitHub (Apr 30, 2024):

Is it fair to think this only affects Linux at this point? I haven't re-tested Windows as it's kind of a pain and I haven't much use for it, but it worked for me last time I tried.


@easp commented on GitHub (May 1, 2024):

@Louden7

> Ideally (and I may be wrong), in this case it would fill up GPU VRAM, then system RAM, and share the compute load on both GPU and CPU, favoring the GPU for performance.

The portion in VRAM is computed on the GPU, the portion in system RAM is computed by the CPU. The bottleneck is memory bandwidth, not compute. Transferring data from system RAM to the GPU is slower than transferring it to the CPU.

Model weights are memory mapped. They are accounted for in buffer/file cache, which is generally counted as available memory. Performance with 19GB of model weights is bad because the portion that doesn't fit in VRAM is processed by the CPU, which is much slower than the GPU. Your GPU utilization is low because it's spending most of its time waiting for the CPU.
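
That also explains why htop's per-process numbers can look misleadingly small: the mapped weights are charged to the page cache rather than to the runner's resident set. A rough way to see it (the process name pattern and the default model path are assumptions and may differ between Ollama versions):

```
# "buff/cache" grows as the weights are mapped in, even though "used" barely moves
free -h

# The mapped model blobs show up in the runner's address space rather than in its RSS;
# depending on the version, the process you want may be a child such as ollama_llama_server
grep -i 'ollama/models' "/proc/$(pgrep -f ollama | head -n1)/maps" | head
```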


@Louden7 commented on GitHub (May 2, 2024):

@easp
That makes sense. Thank you for the detailed explanation!

I am still curious about htop not showing the correct system RAM utilization.


@siakc commented on GitHub (May 3, 2024):

If you like, I can run some more commands for you to see what is going on.


@pdevine commented on GitHub (May 16, 2024):

I'm going to go ahead and close this. Models should work with hybrid CPU/GPU. If you want to see what portion is offloaded, you can now use the new `ollama ps` command.
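
For anyone landing here later, the command looks roughly like this (the output row is illustrative, not a real capture):

```
# Shows loaded models and how much of each sits on the CPU vs. the GPU
ollama ps
# NAME             ID              SIZE     PROCESSOR          UNTIL
# llama3:latest    365c0bd3c000    6.7 GB   25%/75% CPU/GPU    4 minutes from now
```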


@kwikiel commented on GitHub (Sep 8, 2024):

The issue seems to be that some people would expect Ollama to load models into RAM first, keep them there as long as possible, and then, when a request comes in, load them from RAM into VRAM.

I have 128 GB RAM and 72 GB VRAM (3x3090), so I could keep the models in RAM instead of loading them from disk each time they're dropped from the GPU.

This seems like a somewhat non-standard use case, and maybe it can be handled by using a RAM disk for storing models, so it could be addressed without changing anything in the Ollama code.
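
If anyone wants to try the RAM-disk route, a rough sketch; the mount point and size are placeholders, and it assumes your install honors the OLLAMA_MODELS environment variable and keeps its models in the default ~/.ollama/models directory (everything in the tmpfs is lost on reboot):

```
# Create a tmpfs-backed RAM disk and copy the existing model store into it
sudo mkdir -p /mnt/ollama-ram
sudo mount -t tmpfs -o size=80G tmpfs /mnt/ollama-ram
cp -r ~/.ollama/models/. /mnt/ollama-ram/

# Point the server at the RAM-backed store for this session
OLLAMA_MODELS=/mnt/ollama-ram ollama serve
```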


@summersonnn commented on GitHub (Oct 21, 2024):

Hey.
I have 12 GB VRAM and 64 GB RAM.
When I run a 53 GB model, I observe that my VRAM is almost full, but my RAM and swap do not change. So where is the model loaded?

    qwen2.5:72b    424bad2cc13f    53 GB    78%/22% CPU/GPU

It feels like my model is partially loaded onto the GPU and processed by the CPU (otherwise, why would I see a spike in CPU usage?). My GPU usage seldom exceeds 30% and is probably around 10% on average. What's going on?

Shouldn't I be seeing something like 11 GB of VRAM usage (as now) plus 42 GB of RAM usage?


@easp commented on GitHub (Oct 21, 2024):

@summersonnn https://github.com/ollama/ollama/issues/1727#issuecomment-2087971975


@cyberluke commented on GitHub (Jan 15, 2025):

> The issue seems to be that some people would expect Ollama to load models into RAM first, keep them there as long as possible, and then, when a request comes in, load them from RAM into VRAM.
>
> I have 128 GB RAM and 72 GB VRAM (3x3090), so I could keep the models in RAM instead of loading them from disk each time they're dropped from the GPU.
>
> This seems like a somewhat non-standard use case, and maybe it can be handled by using a RAM disk for storing models, so it could be addressed without changing anything in the Ollama code.

Yes, I am expecting exactly this behavior! It seems Ollama is not that efficient, but OpenVINO can do it.

Reference: github-starred/ollama#63019