[GH-ISSUE #2006] Rate limit download speed on pulling new models #26920

Open
opened 2026-04-22 03:39:55 -05:00 by GiteaMirror · 86 comments

Originally created by @donuts-are-good on GitHub (Jan 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2006

Originally assigned to: @mxyng on GitHub.

Is there interest in implementing a rate limiter in the pull command? I'm open to working on this; this is the syntax I have in mind for now:

ollama pull modelname --someflagname 1024 <-- this would limit to 1024 kbps

I took a look at the code in server/download.go, and I think I can do this with the x/time/rate applied to the downloadChunk method of the blob downloader.

This feature, or something like it that accomplishes the same thing, would be quite useful for me. Ollama is able to saturate my network faster than BitTorrent or anything else I've tried.
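
For illustration, here is a minimal Go sketch of the proposed approach - wrapping the blob reader with golang.org/x/time/rate so each read is charged against a token bucket. The names (limitedReader, the KB/s reading of the flag value) are hypothetical, not Ollama's actual downloadChunk code:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"

	"golang.org/x/time/rate"
)

// limitedReader wraps an io.Reader so every Read waits on a token
// bucket. This is an illustrative sketch, not Ollama's downloader.
type limitedReader struct {
	r   io.Reader
	lim *rate.Limiter
	ctx context.Context
}

func (l *limitedReader) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	if n > 0 {
		// Charge n tokens (bytes); blocks until the bucket allows it.
		// n must not exceed the burst, set below to one second's quota.
		if werr := l.lim.WaitN(l.ctx, n); werr != nil {
			return n, werr
		}
	}
	return n, err
}

func main() {
	const kb = 1024 // hypothetical flag value, read here as KB/s
	bps := kb * 1024
	lr := &limitedReader{
		r:   strings.NewReader(strings.Repeat("x", 8192)),
		lim: rate.NewLimiter(rate.Limit(bps), bps),
		ctx: context.Background(),
	}
	n, _ := io.Copy(io.Discard, lr)
	fmt.Println("copied", n, "bytes")
}
```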

GiteaMirror added the networking label 2026-04-22 03:39:55 -05:00

@tkafka commented on GitHub (Jan 23, 2024):

Yes, definitely! Same here - when I download models, everyone in the office gets a really slow internet.

How about rate-limit?

@jukofyork commented on GitHub (Jan 24, 2024):

Yeah, same here!

I'm finding ollama pull is really killing my connection and I have to limit myself to just using it at night now...

I assume it's using multiple threads to download multiple chunks at the same time or something, as it seems a lot more lag-inducing than either wget or curl? If so, then it might be good to have control over these parameter(s) too.

@escaroda commented on GitHub (Jan 31, 2024):

I would do the same as wget:

‘--limit-rate=amount’

Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the ‘k’ suffix, or megabytes with the ‘m’ suffix. For example, ‘--limit-rate=20k’ will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don’t want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, ‘--limit-rate=2.5k’ is a legal value.

Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don’t be surprised if limiting the rate doesn’t work well with very small files.
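For illustration, a small Go sketch of parsing wget-style rate values as described above; parseRate is a hypothetical helper, not part of wget or Ollama:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRate converts a wget-style limit ("20k", "2.5k", "1m", "4096")
// into bytes per second, allowing decimal values with power suffixes.
func parseRate(s string) (int64, error) {
	mult := 1.0
	switch {
	case strings.HasSuffix(s, "k"), strings.HasSuffix(s, "K"):
		mult, s = 1024, s[:len(s)-1]
	case strings.HasSuffix(s, "m"), strings.HasSuffix(s, "M"):
		mult, s = 1024*1024, s[:len(s)-1]
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid rate %q: %w", s, err)
	}
	return int64(v * mult), nil
}

func main() {
	for _, in := range []string{"20k", "2.5k", "1m", "4096"} {
		n, err := parseRate(in)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %d bytes/s\n", in, n)
	}
}
```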

@easp commented on GitHub (Feb 1, 2024):

I think this is coming. I saw either a branch or a pull request to provide rate limiting by one of the maintainers.

@akulbe commented on GitHub (Feb 16, 2024):

I would LOVE to see this implemented. It reliably and repeatedly kills my connection, on anything larger than a 13b model. I think it's the sustained speed (I have a 1G/1G connection, and downloads get up to 115M/s) when it happens.

@BruceMacD commented on GitHub (Feb 16, 2024):

Behavior here will be improved by #2221, working on getting that unblocked now

@donuts-are-good commented on GitHub (Feb 17, 2024):

We want to define an arbitrary download speed limit. It'd be great if #2221 could address that somehow.

@pablo-01 commented on GitHub (Mar 17, 2024):

I've had this issue every day since I started using Ollama a few days ago. In my case - Pop!_OS 22.04 LTS - it freezes randomly and eventually freezes completely, leaving no choice but to hard reset.

@simmonsm commented on GitHub (Mar 20, 2024):

I agree. This is definitely a good idea as, for instance, pulling down the 7b 39Gb model without rate limiting is very antisocial network behaviour. I did install and play with the trickle command but couldn't figure out how to use it with the ollama run command, as it isn't that process that needs limiting.

@fermuch commented on GitHub (Mar 20, 2024):

@simmonsm I think trickle wouldn't work anyways since go doesn't use libc (and trickle uses LD_PRELOAD for its magic).

@simmonsm commented on GitHub (Mar 20, 2024):

> @simmonsm I think trickle wouldn't work anyways since go doesn't use libc (and trickle uses LD_PRELOAD for its magic).

Fair enough. In the meantime I'm using a VM connected via a virtual traffic shaping network switch.

@LagSlug commented on GitHub (Apr 14, 2024):

You might be able to accomplish this with a docker container

https://stackoverflow.com/questions/25497523/how-can-i-rate-limit-network-traffic-on-a-docker-container

@supercurio commented on GitHub (Apr 19, 2024):

I'm downloading a bunch of Llama 3 models at the moment, and last night my upstairs neighbor, with whom I'm sharing a 300/100 fiber connection, asked for help because he couldn't use the internet anymore. Indeed, I ran a speedtest on another machine connected over Ethernet and the bandwidth left was 1.6 Mbit/s download, with a whopping ~1000 ms of ping latency.

My Ollama instance is running on macOS as a native app.
For now I found a workaround for my neighbor using Xcode's Network Link Conditioner, but I'm still essentially unable to browse the web on my primary machine when pulling models.

I appreciate that Ollama maximizes the bandwidth to download large models as quickly as possible, but the default behavior does not run with sane parameters at all.

My suggestion as a simple solution that can be implemented quickly:

  • run by default with conservative settings to be a good network citizen: 2, maybe 3 concurrent connections, not more than that.
  • offer a more aggressive preset as a command-line argument when downloading via ollama pull or ollama run
  • expose a custom "max concurrent download connections" parameter on the command line and API.

Then later, sure - an adaptive algorithm can try to optimize the concurrent connection count based on latency and throughput. But it might never work that well on shared and mobile connections, where the available bandwidth and latency vary based on external factors.
This GitHub issue suggests a rate limit, which would be helpful as well, but selecting an appropriate number of concurrent connections should do the trick just fine without resorting to manual tuning.
If Ollama is competing with something else for bandwidth, like a neighbor trying to watch Netflix, it should respect TCP's congestion control standards instead of trying to game them to grab all the bandwidth.

I hope this can be addressed shortly.
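For illustration, a Go sketch of the "max concurrent download connections" idea from the list above, using a counting semaphore; downloadChunks, fetch, and maxConns are hypothetical names, not Ollama's actual downloader API:

```go
package main

import (
	"fmt"
	"sync"
)

// downloadChunks runs fetch for every chunk, but never more than
// maxConns at once: a buffered channel acts as a counting semaphore.
func downloadChunks(chunks []int, maxConns int, fetch func(int) error) error {
	sem := make(chan struct{}, maxConns)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConns fetches are in flight
		go func(c int) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := fetch(c); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return firstErr
}

func main() {
	// With maxConns = 2, at most two "connections" run concurrently.
	err := downloadChunks([]int{0, 1, 2, 3, 4, 5}, 2, func(c int) error {
		fmt.Println("fetching chunk", c)
		return nil
	})
	if err != nil {
		fmt.Println("download failed:", err)
	}
}
```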

@mcraveiro commented on GitHub (May 19, 2024):

I've been hit by this issue when downloading large models; it literally hogs the entire device. It would be nice to be able to limit the rate in some way, à la BitTorrent clients.

@strangehelix commented on GitHub (May 24, 2024):

Same issue. The downloader uses up the entire bandwidth (the internet becomes unusable) and eventually crashes because of the timeouts. A rate limiter seems like a critically important feature.

@FeyNyXx commented on GitHub (May 28, 2024):

Same issue. I'm killing my (not exactly mine, but I have to deal with it...) router using ollama pull :<

@metamec commented on GitHub (May 30, 2024):

Love Ollama but this is murdering the end user experience for me. I'm having to ctrl+c just to post this comment. I have a 2Gb/s connection too. It's not a limited bandwidth issue. It's simply downloading too many chunks simultaneously, deprioritising internet bandwidth to every other process on the system. (Just realised it's network-wide, not system-wide. When downloading large models, it feels like my home network is being DDoSed.)

@mcraveiro commented on GitHub (May 30, 2024):

Yes, same here, I can only download models at night. Machine is unusable.

@MihailCosmin commented on GitHub (May 30, 2024):

I had the same problem at work: not only was my computer getting slow, but so was the internet for all my colleagues.
I had to find a solution to rate limit the download speed; older tools (wondershaper, trickle or tc) did not work for me. The only one that worked was FireQOS (https://firehol.org/tutorial/fireqos-new-user/), in case anyone else needs it.

@LutzFassl commented on GitHub (Jun 27, 2024):

+1

@robins commented on GitHub (Jul 6, 2024):

My linux box (i5) got reliably stuck every single time I pulled a model... so +1 for the --rate-limit feature.

Two solutions, that did help me limp on for now:

  1. As soon as I started the fetch, I used iotop to change the ionice priority (using i) to idle. That made the issue completely go away in that, although the downloads were still fast, the linux system was quite usable. However, this was still frustrating since one had to type the PIDs when trying to set ionice for them (and there were a few)!

  2. Now since Ollama spun up multiple downloads, the ionice tool didn't work for me - IIUC that's because ionice needs to be run for each process. So it ended up being far simpler to just get the parent PID, and then set ionice for each of the child processes, each time I was downloading a model.

pid=$(pgrep -of "ollama run")  # oldest matching process; avoids grep matching itself
sudo ionice -c3 -p $(ps -T -p "$pid" | awk 'NR > 1 {print $2}' | tr '\n' ' ')  # set every thread to the idle I/O class

@treibholz commented on GitHub (Jul 12, 2024):

I use this horrible "workaround", to not consume the whole internet bandwidth, so I can still work on my other machine, while pulling a model:

$ sudo ethtool -s eth0  autoneg on speed 10 duplex full

This negotiates the link speed of my network interface down to 10 Mbit.

Yes, the pulling machine itself isn't usable either, and sometimes the download is interrupted because (and I'm not kidding you!) there is not enough bandwidth left for DNS. But at least others on my local network are not angry anymore.

@supercurio commented on GitHub (Jul 13, 2024):

I'm preparing a patch and will submit a PR to address this soon.

@Netzvamp commented on GitHub (Jul 13, 2024):

My solution for now; works fine. This docker-tc can also simulate packet loss 😂

version: '3'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - 11434:11434
    restart: unless-stopped
    labels:
      - "com.docker-tc.enabled=1"
      - "com.docker-tc.limit=30mbit"

  docker-tc:
    image: lukaszlach/docker-tc
    cap_add:
      - NET_ADMIN
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/docker-tc:/var/docker-tc

@supercurio commented on GitHub (Jul 13, 2024):

To everyone in this thread, I'd encourage you to build your own Ollama from my branch for testing purposes and report how the issue is solved. I'm curious how much of your available bandwidth is used during downloads with the new default.

For me, on macOS, making a build was easy following https://github.com/ollama/ollama/blob/main/docs/development.md
I see that many PRs are awaiting review and merging, so I don't know how long it'll take.
However, using Ollama is so annoying until this issue is solved that I'm determined to make this fix happen.

@treibholz commented on GitHub (Jul 14, 2024):

@supercurio Works great here. I can still download at 11 MB/s on my 100 Mbit line, the machine is still responsive, AND I can still watch something in HD over a streaming service.

@binarynoise commented on GitHub (Jul 18, 2024):

I can confirm as well that reducing the number of parallel connections restores network usability (even though I patched the source by hand rather than using your feature branch).
Interestingly, the download speed is unaffected (it maxes out at 20 MB/s, which I think is a server-side limit).

@scscgit commented on GitHub (Jul 23, 2024):

I'm adding a vote; it's really disruptive when you can't even join a meeting at work due to slow internet, and some users may not be able to figure out the root cause. I ran it in Docker; even after pausing the container/engine it still consumed the entire bandwidth, so I had to either wait some more or completely shut down Docker. Note that our scenario could be using a tool like Open WebUI to download the models, so it's not enough to provide hidden CLI parameters, and we need a quick solution that we won't need to spend time googling; Ollama should properly display a warning if using it may cause such disruptions.

@enrico3 commented on GitHub (Jul 30, 2024):

I am using @Netzvamp's solution. I added these parameters, maybe it helps someone:

 ollama:
    [...]
    volumes:
    #with this bind the container downloads new models into my existing model directory on my host
     - /home/USERNAME/.ollama:/root/.ollama
    environment:
    #do not prune partly downloaded models when starting the container. 
    #This way downloads do not need to be completed in one session.
     - OLLAMA_NOPRUNE=true

The docker-tc container needs to be started before the ollama container I think, because it listens to container:start events (https://github.com/lukaszlach/docker-tc#usage)

Sometimes my download stopped because of a TLS handshake timeout. So I put the command in a loop to immediately resume it after an error:

#!/bin/bash
ollama pull MODELNAME
while [ $? -ne 0 ]; do
    ollama pull MODELNAME
done

With this command on the host, the speed limit can be changed while the download is running (https://github.com/lukaszlach/docker-tc#post):
curl -d'rate=20Mbit' localhost:4080/ollama

@Fluffkin commented on GitHub (Jul 31, 2024):

For those on Linux, "Traffic Toll" (https://github.com/cryzed/TrafficToll) sort of works. But I gave up on Ollama because even with that, some segments get zero data for long enough to trigger timeouts, so even with fully saturated bandwidth the download fails. :)

I'm puzzled by two things:
Why hasn't Huggingface politely asked them to stop opening stupid amounts of connections? It's generally seen as bad internet etiquette to hog an unnecessary number of connections for a download.
What's the reasoning / use case behind the way the download threads are currently handled in Ollama? Is the dev using a near-backbone-speed connection where the bandwidth used doesn't affect anything or anybody else?

@Kisaragi-ng commented on GitHub (Aug 1, 2024):

> (...)
>
> I'm puzzled by two things: Why hasn't Huggingface politely asked them to stop opening stupid amounts of connections? It's generally seen as bad internet etiquette to hog an unnecessary number of connections for a download. What's the reasoning / use case behind the way the download threads are currently handled in Ollama? Is the dev using a near-backbone-speed connection where the bandwidth used doesn't affect anything or anybody else?

AFAIK, ollama pull doesn't retrieve data from Hugging Face; the URL being used to download models is Cloudflare R2.

When a download fails you can see its URL, for example:

→ docker exec ollama ollama pull mannix/llama3.1-8b-abliterated:q4_k_m
pulling manifest 
pulling 6a6aadebda16...  62% ▕█████████       ▏ 3.0 GB/4.9 GB  510 KB/s  52m25s
Error: max retries exceeded: Get "https://dd20bc07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/6a/6a6aadebda1698955067154814797e7d248c5a5f7ad123b39c11110d47439e9c/data?very-long-string-here": net/http: TLS handshake timeout

compared to huggingface model download:

Resolving huggingface.co (huggingface.co)... 13.33.30.49, 13.33.30.114, 13.33.30.23, ...
Connecting to huggingface.co (huggingface.co)|13.33.30.49|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs-us-1.huggingface.co/repos/0a/10/0a10d4e7fae2db98335d1f5d4074cadf96dbd70a58275232c9bce0ba4568f72d/84c5818397e877afe4945b97581332d27dbfb0cec547b2575d0132719abb6dd6?very-long-string-here

@joelanman commented on GitHub (Aug 6, 2024):

just to note, as I don't think it's stated clearly in this thread - the issue isn't rate limiting per se - it downloads at 10mbps for me. It's that it is setting up 64 concurrent connections to do so, as per the PR here: https://github.com/ollama/ollama/pull/5683

@igorschlum commented on GitHub (Aug 11, 2024):

@joelanman I agree with you. From the same location, I was able to download a 3K model but not the 405b model. For the 3K model, the 3GB were downloaded without any connection issues, whereas with the 405b model, the download kept stopping after about every 200MB.

@ShayBox commented on GitHub (Aug 17, 2024):

This reliably crashes my router and causes it to restart, it's too fast.

@numbermaniac commented on GitHub (Aug 19, 2024):

> I'm having to ctrl+c just to post this comment.

Same here. I literally can't even search Google while it's downloading something. I can download files that are multiple gigabytes in my web browser or in macOS Homebrew and still use the internet just fine, but when Ollama is downloading a model, my entire internet becomes unusable.

It doesn't even work well for Ollama either because the download speed starts at around 6MB/s and then just keeps gradually reducing, going down to like 1MB/s, then 500 KB/s, then 350 KB/s, gradually getting slower and slower, so Ollama is practically sabotaging itself with the way it downloads files.

@supercurio commented on GitHub (Aug 19, 2024):

I wonder how to get some progress going on this issue: the description of how badly it affects the user experience should bump its priority to critical in my opinion, and I already submitted a PR with a fix a month ago.

After testing Ollama on a high-bandwidth VPS, I appreciated that its aggressive strategy led to 300 MB/s (megabytes) downloads, so I get the point of keeping the many-concurrent-connections capability available. Which is the case with my patch.

How do we move forward from here?
Routers crashing, people unable to use their computers: that's not the expected result when using any kind of software.

@igorschlum commented on GitHub (Aug 19, 2024):

Hi @supercurio (bonjour François), your patch is here:
https://github.com/ollama/ollama/pull/5683
I know that there were issues that were more important during those weeks, like CUDA fixes, Memory and function calling.
Ollama can run easily without this patch, but if it's just a matter of approving your patch, @jmorganca could decide to do it.

@supercurio commented on GitHub (Aug 19, 2024):

Salut @igorschlum 😌
All of Ollama's core functionalities are important, that's for sure.
Downloading model(s) is still the first action every Ollama user will take.

I've solved it for my individual use case already, but I'm hoping to use Ollama as the LLM runtime for the app I'm developing at the moment. It's a non-starter until this issue is solved, sadly.
Fortunately, llamafile provides a good enough alternative in that case.

@joelanman commented on GitHub (Aug 20, 2024):

Is it fixed by this? https://github.com/ollama/ollama/pull/6347

@Fluffkin commented on GitHub (Aug 20, 2024):

It'll help in many cases, but it's probably still too high for some domestic broadband users with low bandwidth. ¯\_(ツ)_/¯

@igorschlum commented on GitHub (Aug 20, 2024):

@Fluffkin I think it would work for any type of connection, because currently Ollama is downloading 64 files simultaneously, leaving very little bandwidth for other computers. With this modification, it will use only one download at a time.

@joelanman commented on GitHub (Aug 20, 2024):

@igorschlum no it's changed from 64 to 16, so still a lot of connections

@robins commented on GitHub (Aug 20, 2024):

My 2c: unless there's a way for customers to "request" more speed, the default shouldn't be hurting low-end users.

I'd take Git's example here. Albeit its downloads are not split (unlike this tool), when churning through a repo - even on a 20-core machine - it doesn't spin up an obscene number of processes - only 4.

So yes, we shouldn't cut down from 64 to 1 just to accommodate everyone - but settling for a saner default of 4 threads sounds like a good middle ground here... If / when there's a feature to request more threads, customers with more resources can always go back to 64 concurrent threads!

@mrtysn commented on GitHub (Sep 2, 2024):

Reporting from Türkiye, I am unable to run ollama pull during the day due to it causing nearly all other connections on my shared Wi-Fi network to almost come to a halt. A download speed rate limit would be greatly appreciated.

@igorschlum commented on GitHub (Sep 2, 2024):

@mrtysn what version of Ollama are you using? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads.
What version of Ollama do you use? On which OS?

@mrtysn commented on GitHub (Sep 2, 2024):

> @mrtysn what version of Ollama are you using? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads. What version of Ollama do you use? On which OS?

@igorschlum Apologies for the lack of details.

  • I am on a MacBook Pro M2 Max with Sonoma 14.6.1.
  • My ollama is installed from homebrew, and it is currently on version 0.3.9.

However, I believe I did ~90% of my model pulls while on version 0.3.8; the homebrew formula was updated to 0.3.9 ~2 days ago (https://github.com/Homebrew/homebrew-core/commits/87cbc9cf2ccc2b83ae27821cb7203e3d416d71e1/Formula/o/ollama.rb).

@igorschlum commented on GitHub (Sep 2, 2024):

@mrtysn I installed Go on my Mac and was able to build Ollama from the source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then, you could change the 4 simultaneous downloads to 1 to see if this fixes the issue you're facing.

@joelanman commented on GitHub (Sep 2, 2024):

@igorschlum it's 16, which is still very high

@igorschlum commented on GitHub (Sep 2, 2024):

@joelanman Sorry, I anticipated that it could be 4 :-)

@mrtysn commented on GitHub (Sep 3, 2024):

> @mrtysn I installed Go on my Mac and was able to build Ollama from the source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then, you could change the 4 simultaneous downloads to 1 to see if this fixes the issue you're facing.

I've installed the prerequisites. Unfortunately, however, I won't be able to give this issue more attention with my current workload. Scheduling model downloads for the night has been an okay workaround so far, and I currently have most of the models that I would like to utilize.

Nevertheless, I still think a CLI flag with reasonable defaults to control the number of concurrent download threads would be helpful for all ollama users, especially new users who would be downloading models for the first time or users from regions with slower internet speeds.

@donuts-are-good commented on GitHub (Sep 5, 2024):

Hello all,

I appreciate everyone's input in this thread, and I do hope this eventually gets solved. When I initially created this issue as a place to discuss and outline a solution to the network saturation issue, I assumed it would be welcomed with open arms. In the time since, the software has gotten bigger and more complex, it's no longer a simple fix, and it doesn't appear to be a priority.

With that in mind, life must go on. I'm withdrawing my offer to implement this, as I no longer have the resources to write and test a feature that isn't a concern of the developers. I don't think anybody's going to worry about it, but I wanted to be clear about my intentions. I look forward to seeing this issue fixed some day.

@ShayBox commented on GitHub (Sep 5, 2024):

The issue was fixed 2 weeks ago in 0.3.7...

@mdlmarkham commented on GitHub (Sep 7, 2024):

I'm still having the issue.

@devrandom commented on GitHub (Oct 6, 2024):

It would be best if the number of threads were configurable. Most of these issues can be mitigated by setting the number of concurrent downloads to 1.

@augusto-rehfeldt commented on GitHub (Dec 18, 2024):

I'm still having this issue. I'm from Argentina, and 16 concurrent connections kill my Internet connection, hang my machine, and downloads over 1 GB never finish.

Any idea about how to rate limit this on Windows?

@jimbothegrey commented on GitHub (Jan 7, 2025):

used wondershare to slow down the connection. looks like it is working....

@mrtysn commented on GitHub (Jan 10, 2025):

> used wondershare to slow down the connection. looks like it is working....

@jimbothegrey what might wondershare be? all I'm finding is a file converter

@TiddlyWiddly commented on GitHub (Jan 12, 2025):

Still experiencing this on a pretty quick connection; it knocks all my devices off.

@ankh2054 commented on GitHub (Jan 14, 2025):

Yeah, same here. It's hard to work on anything else while models download, and very large models take multiple days to download at my 3.5 MB/s.

@xtareq commented on GitHub (Jan 17, 2025):

Same issue here. But in my case it was fine on my Windows 11 machine for llama3.1; the issue arises when I try to pull phi4. Any specific reason why this happens?

@ading2210 commented on GitHub (Jan 21, 2025):

As a rudimentary workaround, I wrote a bash script that monitors ollama's network usage and constantly suspends/resumes the ollama process.

https://gist.github.com/ading2210/882565526f7e1f2b9b14a022ac3741ac

Make sure you have nethogs installed (sudo apt install nethogs).

For example, if you want to limit downloads to 5000 KB/s:

$ sudo ./ollama-limiter.sh 5000
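
For illustration, the suspend/resume idea boils down to duty-cycling SIGSTOP/SIGCONT against the downloading process. A toy Go sketch of just that mechanism follows; the real script additionally measures actual bandwidth with nethogs, and the throttle name and 1s/1s cycle here are arbitrary:

```go
//go:build unix

package main

import (
	"fmt"
	"os"
	"strconv"
	"syscall"
	"time"
)

// throttle duty-cycles a process: run for onTime, pause for offTime.
// Pausing half the time roughly halves the average download rate.
func throttle(pid int, onTime, offTime time.Duration) error {
	for {
		time.Sleep(onTime)
		if err := syscall.Kill(pid, syscall.SIGSTOP); err != nil {
			return err // the process has likely exited
		}
		time.Sleep(offTime)
		if err := syscall.Kill(pid, syscall.SIGCONT); err != nil {
			return err
		}
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: throttle <pid>")
		os.Exit(1)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad pid:", err)
		os.Exit(1)
	}
	// 1s on / 1s off is an arbitrary 50% duty cycle.
	if err := throttle(pid, time.Second, time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```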

@digitalextremist commented on GitHub (Feb 15, 2025):

Seems this is a well-known problem... Short of handling this at the router level, the @ading2210 solution seems to help a lot! Thanks very much for sharing that.

Still looking forward to a long-term solution that does not chop the process, though that approach is a great idea.

@donuts-are-good commented on GitHub (Feb 18, 2025):

> The issue was fixed 2 weeks ago in 0.3.7...

It appears that is not the case...

@martinoturrina commented on GitHub (Mar 26, 2025):

bump

@martinoturrina commented on GitHub (Mar 26, 2025):

bump

@digitalextremist commented on GitHub (Mar 26, 2025):

My permanent solution for this is lukaszlach/docker-tc ... I no longer see this as an Ollama issue to resolve.

Example docker-compose.yml that works

services:
  traffic_control:
    image: lukaszlach/docker-tc
    container_name: traffic_control
    network_mode: host
    cap_add:
      - NET_ADMIN
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/docker-tc:/var/docker-tc

And then labels added to Ollama container:

    labels:
      com.docker-tc.enabled: "1"
      com.docker-tc.limit: "2mbps"

@joelanman commented on GitHub (Mar 27, 2025):

That wouldn't fix it for people not using Docker?

@digitalextremist commented on GitHub (Mar 28, 2025):

> That wouldn't fix it for people not using Docker?

No; but if you do not use Docker but still use Linux, the nethogs solution still works: https://github.com/ollama/ollama/issues/2006#issuecomment-2603584766

@StdLogicTrig commented on GitHub (May 7, 2025):

Bump

@whjvenyl commented on GitHub (Jun 20, 2025):

A Mac version of a rate limiter, based on the previous script from @ading2210:

#!/bin/bash
#
# ollama-limiter.sh (v6 - Rate Calculation Method)
#
# This version calculates bandwidth as a rate (KB/s) between two snapshots,
# which is the most accurate method. It correctly parses the comma-delimited
# output from `nettop -L 1` and uses `tput` to prevent display issues.
#

# --- Configuration ---
MAX_BANDWIDTH_KBPS=8000
CHECK_INTERVAL=2 # Seconds between measurements
SUSPEND_TIME=4
RESUME_TIME=1

# --- Colors ---
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

# --- Functions ---

# This function runs nettop and extracts the PID and total cumulative bytes
# for the ollama process with the most network activity.
get_ollama_stats() {
    # -L 1: Run for one snapshot.
    # -J bytes_in,bytes_out: Output only these columns for easier parsing.
    nettop -L 1 -J bytes_in,bytes_out | grep 'ollama\.' | awk -F, '
    {
        # $1 is process.pid, $2 is bytes_in, $3 is bytes_out
        split($1, p, ".");
        pid = p[2];
        gsub(/[^0-9]/, "", pid); # Sanitize PID to ensure it is only numeric

        total_bytes = $2 + $3;

        # Print total bytes and pid, so we can sort to find the one with max traffic
        printf "%d %s\n", total_bytes, pid;
    }' | sort -rn | head -n 1
}

# --- Main ---

echo -e "${GREEN}Ollama Bandwidth Limiter is running.${NC}"
echo -e "Limit set to: ${YELLOW}${MAX_BANDWIDTH_KBPS} KB/s${NC}"
echo "Press Ctrl+C to stop."
echo ""

# Initialize stats for rate calculation
last_bytes=0
last_pid=""
last_time=$(date +%s.%N)

while true; do
    # Use tput to prevent terminal display issues.
    # `cr` = carriage return, `el` = erase line.
    tput cr
    tput el

    # Get current stats
    current_stats=$(get_ollama_stats)
    current_time=$(date +%s.%N)

    if [ -z "$current_stats" ]; then
        echo -n -e "${BLUE}[$(date '+%H:%M:%S')] Waiting for an active 'ollama' process...${NC}"
        sleep "$CHECK_INTERVAL"
        continue
    fi

    current_bytes=$(echo "$current_stats" | awk '{print $1}')
    current_pid=$(echo "$current_stats" | awk '{print $2}')

    # If the PID changed (e.g., new pull command) or this is the first run,
    # reset the baseline to start a new measurement.
    if [ "$current_pid" != "$last_pid" ]; then
        echo -n -e "${YELLOW}[$(date '+%H:%M:%S')] New active PID: ${current_pid}. Initializing...${NC}"
        last_pid=$current_pid
        last_bytes=$current_bytes
        last_time=$current_time
        sleep "$CHECK_INTERVAL"
        continue
    fi

    # --- Calculate Bandwidth Rate ---
    time_diff=$(echo "$current_time - $last_time" | bc)
    byte_diff=$(echo "$current_bytes - $last_bytes" | bc)

    # Avoid division by zero if the time difference is too small
    if (( $(echo "$time_diff <= 0.001" | bc -l) )); then
        kbps=0
    else
        # Rate in Bytes/sec, then convert to KB/s
        bytes_per_second=$(echo "$byte_diff / $time_diff" | bc)
        kbps=$(echo "$bytes_per_second / 1024" | bc)
    fi

    # Ensure kbps is a non-decimal integer
    kbps=${kbps%.*}
    if [ -z "$kbps" ]; then kbps=0; fi

    echo -n -e "${GREEN}[$(date '+%H:%M:%S')] Active PID: ${current_pid}, Bandwidth: ${kbps} KB/s${NC}"

    # --- Throttle if needed ---
    if (( kbps > MAX_BANDWIDTH_KBPS )); then
        echo "" # Newline for the suspend message
        echo -e "${RED}Limit exceeded! Suspending process ${current_pid} for ${SUSPEND_TIME}s...${NC}"
        kill -STOP "$current_pid"
        sleep "$SUSPEND_TIME"

        echo -e "${GREEN}Resuming process ${current_pid}...${NC}"
        kill -CONT "$current_pid"

        # Reset baseline after resuming to avoid a huge artificial spike on the next check
        last_pid=""
        sleep "$RESUME_TIME"
    else
        # Update baseline for the next iteration
        last_pid=$current_pid
        last_bytes=$current_bytes
        last_time=$current_time
        sleep "$CHECK_INTERVAL"
    fi
done
```
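To try it, save the script as ollama-limiter.sh (the name used in its header) and run:

```
chmod +x ollama-limiter.sh
./ollama-limiter.sh
```

Note that kill -STOP/-CONT only works on processes you own, so run it as the same user as the ollama process, or with sudo.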

<!-- gh-comment-id:2991255045 -->

@codeisnotcode commented on GitHub (Jun 27, 2025):

Rate-limiting is a really important feature! With the source code available, it should be simple to implement, and it would provide high value. It is super annoying to have Ollama kill other network activity every time I download a new model.

<!-- gh-comment-id:3011794262 -->

@sotander commented on GitHub (Jul 22, 2025):

I have this issue too. I pull at ca. 200 MB/s, and the admins are sending me emails telling me not to slow down the network.

<!-- gh-comment-id:3102759967 -->

@Mondonno commented on GitHub (Aug 29, 2025):

I have this issue too; it especially hurts on slower networks in my case. Seems related to #3741 as well.

<!-- gh-comment-id:3235800515 -->

@anton-karlovskiy commented on GitHub (Sep 16, 2025):

@jmorganca

👋 I’ve been trying to download llama3.2 on Windows 11 (ollama run llama3.2) and keep running into issues:

- An existing connection was forcibly closed by the remote host
- Error: max retries exceeded

What I’ve tried so far:

- https://github.com/ollama/ollama/issues/3769#issuecomment-2076767384: closed VPN → no change
- https://github.com/ollama/ollama/issues/6211#issuecomment-2626567037: changed DNS → made some progress (got to ~37%) but stuck again

![Image](https://github.com/user-attachments/assets/774acce1-b0af-4b66-b2ae-415baf6f0fe0)

From what I can tell, the main culprit seems to be an unstable connection. The workaround right now is to just re-run the command until it eventually completes, but this gets tedious.

To automate retries, I wrapped it in a small script. Sharing here in case it helps others or the devs want to consider integrating retry logic directly:

Windows

Open a new Notepad file and put this code in it:

```
@echo off
:loop
echo Running ollama run llama3.2...
ollama run llama3.2

if %errorlevel% neq 0 (
    echo The command failed. Retrying... Press Ctrl+C to exit.
    goto loop
)
echo The command completed successfully!
```

Next, go to Save as…, pick All files as the file type, and name the file with a .bat extension.
Then you can run it.
To edit the code above, just change the model name from llama3.2 to any other model you like.

Linux
I didn’t test the code for Linux, but this should hopefully work.

```
#!/bin/bash

# Function to handle Ctrl+C
trap 'echo "Script terminated by user. Exiting..."; exit' SIGINT
while true; do
    echo "Running ollama run llama3.2..."
    ollama run llama3.2
    if [ $? -eq 0 ]; then
        echo "The command completed successfully!"
        break
    else
        echo "The command failed. Retrying... Press Ctrl+C to exit."
    fi
done
```

Save this as run_loop.sh, then run:

```
chmod +x run_loop.sh
./run_loop.sh
```

Re: https://medium.com/@timnirmal/ollama-max-retries-exceeded-error-de6e0f86383e

Good luck 🤞

<!-- gh-comment-id:3295870365 -->

@ading2210 commented on GitHub (Sep 16, 2025):

@anton-karlovskiy On Linux there is a [retry command](https://manpages.debian.org/trixie/retry/retry.1.en.html) already, so there's no need for a shell script.

You would just use it like:

```
retry ollama run llama3
```
<!-- gh-comment-id:3295926528 -->

@anton-karlovskiy commented on GitHub (Sep 16, 2025):

> @anton-karlovskiy On Linux there is a [retry command](https://manpages.debian.org/trixie/retry/retry.1.en.html) already, so there's no need for a shell script.
>
> You would just use it like:
>
> ```
> retry ollama run llama3
> ```

Thank you for the tip, @ading2210.
Let's keep in touch. :)

<!-- gh-comment-id:3295954882 -->

@digitalextremist commented on GitHub (Sep 16, 2025):

Agreed with @anton-karlovskiy, that was a real-life Pro Tip ™️ @ading2210 ... extra rtfm points there.

And nice work on the meticulousness and DIY solutions, @anton-karlovskiy. Do you ever use containers?

<!-- gh-comment-id:3296452662 -->

@PeaStew commented on GitHub (Oct 15, 2025):

Coming up on two years now, and still no proper solution, just hacks.

<!-- gh-comment-id:3407803837 -->

@apassi commented on GitHub (Mar 10, 2026):

I am using trickle to limit the bandwidth:

```
trickle -s -d 50mb ollama pull xxxxx
```
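A hedged note on that invocation: per the trickle man page, -d takes a download rate in KB/s, so the mb suffix in 50mb may not parse as 50 MB/s; an explicit KB/s value should be safer. Also, trickle works via LD_PRELOAD socket interposition, so it may have no effect on binaries that don't make socket calls through libc; worth verifying it actually slows the pull. Something like:

```
# -d is KB/s in trickle; 51200 KB/s is roughly 50 MB/s (modelname is a placeholder)
trickle -s -d 51200 ollama pull modelname
```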

<!-- gh-comment-id:4029084828 -->

@floriandotorg commented on GitHub (Mar 20, 2026):

Super important feature; not sure why it's not here yet.

<!-- gh-comment-id:4098176547 -->

@dhirajlochib commented on GitHub (Apr 2, 2026):

I've opened a PR to address this: #15219

The implementation adds a new OLLAMA_MAX_DOWNLOAD_SPEED environment variable that lets you cap download bandwidth when pulling models. It uses a shared token-bucket rate limiter (golang.org/x/time/rate) across all 16 concurrent download chunks, so the aggregate bandwidth stays within the specified limit.

Usage examples:

- OLLAMA_MAX_DOWNLOAD_SPEED=10m — limit to 10 MB/s
- OLLAMA_MAX_DOWNLOAD_SPEED=500k — limit to 500 KB/s
- OLLAMA_MAX_DOWNLOAD_SPEED=1g — limit to 1 GB/s

When unset, downloads run at full speed with zero overhead.
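For anyone curious how that fits together, here is a minimal sketch of the general technique, not the PR's actual code: one rate.Limiter shared by every chunk's reader, with a one-token-per-byte bucket (the type and field names are illustrative):

```
package download

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

// rateLimitedReader wraps an io.Reader so every chunk's reads draw tokens
// (1 token = 1 byte) from one shared bucket, capping aggregate throughput.
type rateLimitedReader struct {
	r   io.Reader
	lim *rate.Limiter // shared across all concurrent download chunks
}

func (rr *rateLimitedReader) Read(p []byte) (int, error) {
	n, err := rr.r.Read(p)
	if n > 0 {
		// Block until n tokens are available. WaitN errors if n exceeds
		// the burst size, so burst must be at least the largest read.
		if werr := rr.lim.WaitN(context.Background(), n); werr != nil {
			return n, werr
		}
	}
	return n, err
}

// Example: cap aggregate downloads at ~10 MB/s (names are illustrative):
//   shared := rate.NewLimiter(rate.Limit(10<<20), 10<<20)
//   chunk := &rateLimitedReader{r: resp.Body, lim: shared}
```

Because the limiter is shared, the 16 chunks collectively stay under the cap instead of each getting its own budget.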

<!-- gh-comment-id:4178753034 -->

@donuts-are-good commented on GitHub (Apr 13, 2026):

It's crazy that this issue is still open, still being replied to, and still no solution.

<!-- gh-comment-id:4240302747 -->

@phamngocduy98 commented on GitHub (Apr 15, 2026):

Yes, I have the same issue when uploading as well. My internet connection fails with 16 concurrent connections.

<!-- gh-comment-id:4253745461 -->

@gregs007 commented on GitHub (Apr 16, 2026):

@dhirajlochib Thanks for developing a solution for this problem for us. It's been a nagging issue for quite some time. Really appreciate the contribution. Looking forward to it!

<!-- gh-comment-id:4263962203 -->

@Legendary-Lava commented on GitHub (Apr 19, 2026):

A hack I did was isolating traffic by source/server on an IFB (Intermediate Functional Block) device that is shaped below my max bandwidth.

Modifying anything on ingress is always a hack, but it's a little more universal than addressing Ollama's 16 connections specifically. It is more prone to issues with many distinct "server" connections, like when torrenting. Make sure to change eth2 to the correct interface and set the bandwidth somewhere below your actual speed:

```
ip link add name ifb4eth2 type ifb
tc qdisc del dev eth2 ingress
tc qdisc add dev eth2 handle ffff: ingress
tc qdisc del dev ifb4eth2 root
tc qdisc add dev ifb4eth2 root cake bandwidth 200Mbit dual-srchost besteffort egress
ip link set ifb4eth2 up # if you don't bring the device up, your connection will lock up on the next step.
tc filter add dev eth2 parent ffff: matchall action mirred egress redirect dev ifb4eth2
```

To test, go to fast.com, set it to 30 streams, extend the test duration, and try to do anything else throughout the download.
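If you later want to undo the shaping, a teardown sketch assuming the same interface names as above:

```
tc qdisc del dev eth2 ingress   # removes the ingress hook and its redirect filter
ip link del ifb4eth2            # removes the IFB device and its cake qdisc
```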

<!-- gh-comment-id:4276924150 -->

@gregs007 commented on GitHub (Apr 20, 2026):

Maybe this will help someone until this change gets into main.

I use a tool called tc to limit the Ollama container after it starts. I'm no developer, so I'm sure someone could make this more universal by automagically parsing out the right interface, burst size, etc., but this limits my Ollama network speed to 500mbit (I have 1gbit internet).

```
INTERFACE_ID=$(docker exec ollama cat /sys/class/net/eth0/iflink)
VETH=$(ip link show | grep "^${INTERFACE_ID}:" | sed -n 's/.*\(veth[a-z0-9]*\).*/\1/p')

tc qdisc add dev $VETH root tbf rate 500mbit burst 1mb latency 50ms
```
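One caveat, as far as I can tell: the veth pair is recreated whenever the container restarts, so the qdisc has to be reapplied each time. To change or remove the limit on a live container (rate value here is just an example):

```
# replace swaps the existing qdisc in place; del removes the limit entirely
tc qdisc replace dev $VETH root tbf rate 250mbit burst 1mb latency 50ms
tc qdisc del dev $VETH root
```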
<!-- gh-comment-id:4281277669 -->
Reference: github-starred/ollama#26920