[GH-ISSUE #2006] Rate limit download speed on pulling new models #26920

Open
opened 2026-04-22 03:39:55 -05:00 by GiteaMirror · 86 comments

Originally created by @donuts-are-good on GitHub (Jan 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2006

Originally assigned to: @mxyng on GitHub.

Is there interest in implementing a rate limiter in the pull command? I'm open to working on this; this is the syntax I have in mind for now:

ollama pull modelname --someflagname 1024 <-- this would limit to 1024 kbps

I took a look at the code in server/download.go, and I think I can do this with the x/time/rate applied to the downloadChunk method of the blob downloader.

This feature, or something like it that accomplishes the same thing, would be quite useful for me. Ollama is able to saturate my network faster than BitTorrent or anything else I've tried.
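
For illustration, here is a minimal Go sketch of the proposed approach - wrapping the blob reader with golang.org/x/time/rate so each read is charged against a token bucket. The names (limitedReader, the KB/s reading of the flag value) are hypothetical, not Ollama's actual downloadChunk code:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"

	"golang.org/x/time/rate"
)

// limitedReader wraps an io.Reader so every Read waits on a token
// bucket. This is an illustrative sketch, not Ollama's downloader.
type limitedReader struct {
	r   io.Reader
	lim *rate.Limiter
	ctx context.Context
}

func (l *limitedReader) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	if n > 0 {
		// Charge n tokens (bytes); blocks until the bucket allows it.
		// n must not exceed the burst, set below to one second's quota.
		if werr := l.lim.WaitN(l.ctx, n); werr != nil {
			return n, werr
		}
	}
	return n, err
}

func main() {
	const kb = 1024 // hypothetical flag value, read here as KB/s
	bps := kb * 1024
	lr := &limitedReader{
		r:   strings.NewReader(strings.Repeat("x", 8192)),
		lim: rate.NewLimiter(rate.Limit(bps), bps),
		ctx: context.Background(),
	}
	n, _ := io.Copy(io.Discard, lr)
	fmt.Println("copied", n, "bytes")
}
```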

GiteaMirror added the networking label 2026-04-22 03:39:55 -05:00

@tkafka commented on GitHub (Jan 23, 2024):

Yes, definitely! Same here - when I download models, everyone in the office gets a really slow internet.

How about rate-limit?

@jukofyork commented on GitHub (Jan 24, 2024):

Yeah, same here!

I'm finding ollama pull is really killing my connection and I have to limit myself to just using it at night now...

I assume it's using multiple threads to download multiple chunks at the same time or something, as it seems a lot more lag-inducing than either wget or curl? If so, then it might be good to have control over these parameter(s) too.

@escaroda commented on GitHub (Jan 31, 2024):

I would do the same as wget:

‘--limit-rate=amount’

Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the ‘k’ suffix, or megabytes with the ‘m’ suffix. For example, ‘--limit-rate=20k’ will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don’t want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, ‘--limit-rate=2.5k’ is a legal value.

Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don’t be surprised if limiting the rate doesn’t work well with very small files.
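For illustration, a small Go sketch of parsing wget-style rate values as described above; parseRate is a hypothetical helper, not part of wget or Ollama:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRate converts a wget-style limit ("20k", "2.5k", "1m", "4096")
// into bytes per second, allowing decimal values with power suffixes.
func parseRate(s string) (int64, error) {
	mult := 1.0
	switch {
	case strings.HasSuffix(s, "k"), strings.HasSuffix(s, "K"):
		mult, s = 1024, s[:len(s)-1]
	case strings.HasSuffix(s, "m"), strings.HasSuffix(s, "M"):
		mult, s = 1024*1024, s[:len(s)-1]
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid rate %q: %w", s, err)
	}
	return int64(v * mult), nil
}

func main() {
	for _, in := range []string{"20k", "2.5k", "1m", "4096"} {
		n, err := parseRate(in)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %d bytes/s\n", in, n)
	}
}
```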

@easp commented on GitHub (Feb 1, 2024):

I think this is coming. I saw either a branch or a pull request to provide rate limiting by one of the maintainers.

@akulbe commented on GitHub (Feb 16, 2024):

I would LOVE to see this implemented. It reliably and repeatedly kills my connection, on anything larger than a 13b model. I think it's the sustained speed (I have a 1G/1G connection, and downloads get up to 115M/s) when it happens.

@BruceMacD commented on GitHub (Feb 16, 2024):

Behavior here will be improved by #2221, working on getting that unblocked now

@donuts-are-good commented on GitHub (Feb 17, 2024):

We want to define an arbitrary download speed limit. It'd be great if #2221 could address that somehow.

@pablo-01 commented on GitHub (Mar 17, 2024):

I've had this issue every day since I started using Ollama a few days ago. In my case - Pop!_OS 22.04 LTS - it freezes randomly and eventually freezes completely, leaving no choice but to hard reset.

@simmonsm commented on GitHub (Mar 20, 2024):

I agree. This is definitely a good idea as, for instance, pulling down the 7b 39Gb model without rate limiting is very antisocial network behaviour. I did install and play with the trickle command but couldn't figure out how to use it with the ollama run command, as it isn't that process that needs limiting.

@fermuch commented on GitHub (Mar 20, 2024):

@simmonsm I think trickle wouldn't work anyways since go doesn't use libc (and trickle uses LD_PRELOAD for its magic).

@simmonsm commented on GitHub (Mar 20, 2024):

> @simmonsm I think trickle wouldn't work anyways since go doesn't use libc (and trickle uses LD_PRELOAD for its magic).

Fair enough. In the meantime I'm using a VM connected via a virtual traffic shaping network switch.

@LagSlug commented on GitHub (Apr 14, 2024):

You might be able to accomplish this with a docker container

https://stackoverflow.com/questions/25497523/how-can-i-rate-limit-network-traffic-on-a-docker-container

@supercurio commented on GitHub (Apr 19, 2024):

I'm downloading a bunch of Llama 3 models at the moment, and last night my upstairs neighbor, with whom I'm sharing a 300/100 fiber connection, asked for help because he couldn't use the internet anymore. Indeed, I ran a speedtest on another machine connected over Ethernet and the bandwidth left was 1.6 Mbit/s download, with a whopping ~1000 ms of ping latency.

My Ollama instance is running on macOS as a native app.
For now I found a workaround for my neighbor using Xcode's Network Link Conditioner, but I'm still essentially unable to browse the web on my primary machine when pulling models.

I appreciate that Ollama maximizes the bandwidth to download large models as quickly as possible, but the default behavior does not run with sane parameters at all.

My suggestion as a simple solution that can be implemented quickly:

  • run by default with conservative settings to be a good network citizen: 2, maybe 3 concurrent connections, not more than that.
  • offer a more aggressive preset as a command-line argument when downloading via ollama pull or ollama run
  • expose a custom "max concurrent download connections" parameter on the command line and API.

Then later, sure - an adaptive algorithm can try to optimize the concurrent connection count based on latency and throughput. But it might never work that well on shared and mobile connections, where the available bandwidth and latency vary based on external factors.
This GitHub issue suggests a rate limit, which would be helpful as well, but selecting an appropriate number of concurrent connections should do the trick just fine without resorting to manual tuning.
If Ollama is competing with something else for bandwidth, like a neighbor trying to watch Netflix, it should respect TCP's congestion control standards instead of trying to game them to grab all the bandwidth.

I hope this can be addressed shortly.
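For illustration, a Go sketch of the "max concurrent download connections" idea from the list above, using a counting semaphore; downloadChunks, fetch, and maxConns are hypothetical names, not Ollama's actual downloader API:

```go
package main

import (
	"fmt"
	"sync"
)

// downloadChunks runs fetch for every chunk, but never more than
// maxConns at once: a buffered channel acts as a counting semaphore.
func downloadChunks(chunks []int, maxConns int, fetch func(int) error) error {
	sem := make(chan struct{}, maxConns)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConns fetches are in flight
		go func(c int) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := fetch(c); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return firstErr
}

func main() {
	// With maxConns = 2, at most two "connections" run concurrently.
	err := downloadChunks([]int{0, 1, 2, 3, 4, 5}, 2, func(c int) error {
		fmt.Println("fetching chunk", c)
		return nil
	})
	if err != nil {
		fmt.Println("download failed:", err)
	}
}
```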

@mcraveiro commented on GitHub (May 19, 2024):

I've been hit by this issue when downloading large models; it literally hogs the entire device. It would be nice to be able to limit the rate in some way, à la BitTorrent clients.

@strangehelix commented on GitHub (May 24, 2024):

Same issue. The downloader uses up the entire bandwidth (the internet becomes unusable) and eventually crashes because of the timeouts. A rate limiter seems like a critically important feature.

@FeyNyXx commented on GitHub (May 28, 2024):

Same issue. I'm killing my (not exactly mine, but I have to deal with it...) router using ollama pull :<

@metamec commented on GitHub (May 30, 2024):

Love Ollama but this is murdering the end user experience for me. I'm having to ctrl+c just to post this comment. I have a 2Gb/s connection too. It's not a limited bandwidth issue. It's simply downloading too many chunks simultaneously, deprioritising internet bandwidth to every other process on the system. (Just realised it's network-wide, not system-wide. When downloading large models, it feels like my home network is being DDoSed.)

@mcraveiro commented on GitHub (May 30, 2024):

Yes, same here, I can only download models at night. Machine is unusable.

@MihailCosmin commented on GitHub (May 30, 2024):

I had the same problem at work: not only was my computer getting slow, but so was the internet for all my colleagues.
I had to find a solution to rate limit the download speed; older tools (wondershaper, trickle or tc) did not work for me. The only one that worked was FireQOS (https://firehol.org/tutorial/fireqos-new-user/), in case anyone else needs it.

@LutzFassl commented on GitHub (Jun 27, 2024):

+1

@robins commented on GitHub (Jul 6, 2024):

My linux box (i5) got reliably stuck every single time I pulled a model... so +1 for the --rate-limit feature.

Two solutions, that did help me limp on for now:

  1. As soon as I started the fetch, I used iotop to change the ionice priority (using i) to idle. That made the issue completely go away in that, although the downloads were still fast, the linux system was quite usable. However, this was still frustrating since one had to type the PIDs when trying to set ionice for them (and there were a few)!

  2. Now since Ollama spun up multiple downloads, the ionice tool didn't work for me - IIUC that's because ionice needs to be run for each process. So it ended up being far simpler to just get the parent PID, and then set ionice for each of the child processes, each time I was downloading a model.

pid=$(pgrep -of "ollama run")  # oldest matching process; avoids grep matching itself
sudo ionice -c3 -p $(ps -T -p "$pid" | awk 'NR > 1 {print $2}' | tr '\n' ' ')  # set every thread to the idle I/O class

@treibholz commented on GitHub (Jul 12, 2024):

I use this horrible "workaround", to not consume the whole internet bandwidth, so I can still work on my other machine, while pulling a model:

$ sudo ethtool -s eth0  autoneg on speed 10 duplex full

This negotiates the link speed of my network interface down to 10 Mbit.

Yes, the pulling machine itself isn't usable either, and sometimes the download is interrupted because (and I'm not kidding you!) there is not enough bandwidth left for DNS. But at least others on my local network are not angry anymore.

@supercurio commented on GitHub (Jul 13, 2024):

I'm preparing a patch and will submit a PR to address this soon.

@Netzvamp commented on GitHub (Jul 13, 2024):

My solution for now; works fine. This docker-tc can also simulate packet loss 😂

version: '3'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - 11434:11434
    restart: unless-stopped
    labels:
      - "com.docker-tc.enabled=1"
      - "com.docker-tc.limit=30mbit"

  docker-tc:
    image: lukaszlach/docker-tc
    cap_add:
      - NET_ADMIN
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/docker-tc:/var/docker-tc

@supercurio commented on GitHub (Jul 13, 2024):

To everyone in this thread, I'd encourage you to build your own Ollama from my branch for testing purposes and report how the issue is solved. I'm curious how much of your available bandwidth is used during downloads with the new default.

For me, on macOS, making a build was easy following https://github.com/ollama/ollama/blob/main/docs/development.md
I see that many PRs are awaiting review and merging, so I don't know how long it'll take.
However, using Ollama is so annoying until this issue is solved that I'm determined to make this fix happen.

@treibholz commented on GitHub (Jul 14, 2024):

@supercurio Works great here. I can still download at 11 MB/s on my 100 Mbit line, the machine is still responsive, AND I can still watch something in HD over a streaming service.

@binarynoise commented on GitHub (Jul 18, 2024):

I can confirm as well that reducing the number of parallel connections restores network usability (even though I patched the source by hand rather than using your feature branch).
Interestingly, the download speed is unaffected (it maxes out at 20 MB/s, which I think is a server-side limit).

@scscgit commented on GitHub (Jul 23, 2024):

I'm adding a vote; it's really disruptive when you can't even join a meeting at work due to slow internet, and some users may not be able to figure out the root cause. I ran it in Docker; even after pausing the container/engine it still consumed the entire bandwidth, so I had to either wait some more or completely shut down Docker. Note that our scenario could be using a tool like Open WebUI to download the models, so it's not enough to provide hidden CLI parameters, and we need a quick solution that we won't need to spend time googling; Ollama should properly display a warning if using it may cause such disruptions.

@enrico3 commented on GitHub (Jul 30, 2024):

I am using @Netzvamp's solution. I added these parameters, maybe it helps someone:

 ollama:
    [...]
    volumes:
    #with this bind the container downloads new models into my existing model directory on my host
     - /home/USERNAME/.ollama:/root/.ollama
    environment:
    #do not prune partly downloaded models when starting the container. 
    #This way downloads do not need to be completed in one session.
     - OLLAMA_NOPRUNE=true

The docker-tc container needs to be started before the ollama container I think, because it listens to container:start events (https://github.com/lukaszlach/docker-tc#usage)

Sometimes my download stopped because of a TLS handshake timeout. So I put the command in a loop to immediately resume it after an error:

#!/bin/bash
ollama pull MODELNAME
while [ $? -ne 0 ]; do
    ollama pull MODELNAME
done

With this command on the host, the speed limit can be changed while the download is running (https://github.com/lukaszlach/docker-tc#post):
curl -d'rate=20Mbit' localhost:4080/ollama

@Fluffkin commented on GitHub (Jul 31, 2024):

For those on Linux, "Traffic Toll" (https://github.com/cryzed/TrafficToll) sort of works. But I gave up on Ollama because even with that, some segments get zero data for long enough to trigger timeouts, so even with fully saturated bandwidth the download fails. :)

I'm puzzled by two things:
Why hasn't Huggingface politely asked them to stop opening stupid amounts of connections? It's generally seen as bad internet etiquette to hog an unnecessary number of connections for a download.
What's the reasoning / use case behind the way the download threads are currently handled in Ollama? Is the dev using a near-backbone-speed connection where the bandwidth used doesn't affect anything or anybody else?

@Kisaragi-ng commented on GitHub (Aug 1, 2024):

> (...)
>
> I'm puzzled by two things: Why hasn't Huggingface politely asked them to stop opening stupid amounts of connections? It's generally seen as bad internet etiquette to hog an unnecessary number of connections for a download. What's the reasoning / use case behind the way the download threads are currently handled in Ollama? Is the dev using a near-backbone-speed connection where the bandwidth used doesn't affect anything or anybody else?

AFAIK, ollama pull doesn't retrieve data from Hugging Face; the URL being used to download models is Cloudflare R2.

When a download fails you can see its URL, for example:

→ docker exec ollama ollama pull mannix/llama3.1-8b-abliterated:q4_k_m
pulling manifest 
pulling 6a6aadebda16...  62% ▕█████████       ▏ 3.0 GB/4.9 GB  510 KB/s  52m25s
Error: max retries exceeded: Get "https://dd20bc07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/6a/6a6aadebda1698955067154814797e7d248c5a5f7ad123b39c11110d47439e9c/data?very-long-string-here": net/http: TLS handshake timeout

compared to huggingface model download:

Resolving huggingface.co (huggingface.co)... 13.33.30.49, 13.33.30.114, 13.33.30.23, ...
Connecting to huggingface.co (huggingface.co)|13.33.30.49|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs-us-1.huggingface.co/repos/0a/10/0a10d4e7fae2db98335d1f5d4074cadf96dbd70a58275232c9bce0ba4568f72d/84c5818397e877afe4945b97581332d27dbfb0cec547b2575d0132719abb6dd6?very-long-string-here

@joelanman commented on GitHub (Aug 6, 2024):

just to note, as I don't think it's stated clearly in this thread - the issue isn't rate limiting per se - it downloads at 10mbps for me. It's that it is setting up 64 concurrent connections to do so, as per the PR here: https://github.com/ollama/ollama/pull/5683

@igorschlum commented on GitHub (Aug 11, 2024):

@joelanman I agree with you. From the same location, I was able to download a 3K model but not the 405b model. For the 3K model, the 3GB were downloaded without any connection issues, whereas with the 405b model, the download kept stopping after about every 200MB.

@ShayBox commented on GitHub (Aug 17, 2024):

This reliably crashes my router and causes it to restart, it's too fast.

@numbermaniac commented on GitHub (Aug 19, 2024):

> I'm having to ctrl+c just to post this comment.

Same here. I literally can't even search Google while it's downloading something. I can download files that are multiple gigabytes in my web browser or in macOS Homebrew and still use the internet just fine, but when Ollama is downloading a model, my entire internet becomes unusable.

It doesn't even work well for Ollama either because the download speed starts at around 6MB/s and then just keeps gradually reducing, going down to like 1MB/s, then 500 KB/s, then 350 KB/s, gradually getting slower and slower, so Ollama is practically sabotaging itself with the way it downloads files.

@supercurio commented on GitHub (Aug 19, 2024):

I wonder how to get some progress going on this issue: the description of how badly it affects the user experience should bump its priority to critical in my opinion, and I already submitted a PR with a fix a month ago.

After testing Ollama on a high-bandwidth VPS, I appreciated that its aggressive strategy led to 300 MB/s (megabytes) downloads, so I get the point of keeping the many-concurrent-connections capability available. Which is the case with my patch.

How do we move forward from here?
Routers crashing, people unable to use their computers: that's not the expected result when using any kind of software.

@igorschlum commented on GitHub (Aug 19, 2024):

Hi @supercurio (bonjour François), your patch is here:
https://github.com/ollama/ollama/pull/5683
I know that there were issues that were more important during those weeks, like CUDA fixes, Memory and function calling.
Ollama can run easily without this patch, but if it's just a matter of approving your patch, @jmorganca could decide to do it.

@supercurio commented on GitHub (Aug 19, 2024):

Salut @igorschlum 😌
All of Ollama's core functionalities are important, that's for sure.
Downloading model(s) is still the first action every Ollama user will take.

I've solved it for my individual use case already, but I'm hoping to use Ollama as the LLM runtime for the app I'm developing at the moment. It's a non-starter until this issue is solved, sadly.
Fortunately, llamafile provides a good enough alternative in that case.

@joelanman commented on GitHub (Aug 20, 2024):

Is it fixed by this? https://github.com/ollama/ollama/pull/6347

@Fluffkin commented on GitHub (Aug 20, 2024):

It'll help in many cases, but it's probably still too high for some domestic broadband users with low bandwidth. ¯\_(ツ)_/¯

@igorschlum commented on GitHub (Aug 20, 2024):

@Fluffkin I think it would work for any type of connection, because currently Ollama is downloading 64 files simultaneously, leaving very little bandwidth for other computers. With this modification, it will use only one download at a time.

@joelanman commented on GitHub (Aug 20, 2024):

@igorschlum no it's changed from 64 to 16, so still a lot of connections

@robins commented on GitHub (Aug 20, 2024):

My 2c: unless there's a way for customers to "request" more speed, the default shouldn't be hurting low-end users.

I'd take Git's example here. Albeit its downloads are not split (unlike this tool), when churning through a repo - even on a 20-core machine - it doesn't spin up an obscene number of processes - only 4.

So yes, we shouldn't cut down from 64 to 1 just to accommodate everyone - but settling for a saner default of 4 threads sounds like a good middle ground here... If / when there's a feature to request more threads, customers with more resources can always go back to 64 concurrent threads!

@mrtysn commented on GitHub (Sep 2, 2024):

Reporting from Türkiye, I am unable to run ollama pull during the day due to it causing nearly all other connections on my shared Wi-Fi network to almost come to a halt. A download speed rate limit would be greatly appreciated.

@igorschlum commented on GitHub (Sep 2, 2024):

@mrtysn what version of Ollama are you using? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads.
What version of Ollama do you use? On which OS?

@mrtysn commented on GitHub (Sep 2, 2024):

> @mrtysn what version of Ollama are you using? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads. What version of Ollama do you use? On which OS?

@igorschlum Apologies for the lack of details.

  • I am on a MacBook Pro M2 Max with Sonoma 14.6.1.
  • My ollama is installed from homebrew, and it is currently on version 0.3.9.

However, I believe I did ~90% of my model pulls while on version 0.3.8; the homebrew formula was updated to 0.3.9 ~2 days ago (https://github.com/Homebrew/homebrew-core/commits/87cbc9cf2ccc2b83ae27821cb7203e3d416d71e1/Formula/o/ollama.rb).

@igorschlum commented on GitHub (Sep 2, 2024):

@mrtysn I installed Go on my Mac and was able to build Ollama from the source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then, you could change the 4 simultaneous downloads to 1 to see if this fixes the issue you're facing.

@joelanman commented on GitHub (Sep 2, 2024):

@igorschlum it's 16, which is still very high

@igorschlum commented on GitHub (Sep 2, 2024):

@joelanman Sorry, I anticipated that it could be 4 :-)

@mrtysn commented on GitHub (Sep 3, 2024):

> @mrtysn I installed Go on my Mac and was able to build Ollama from the source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then, you could change the 4 simultaneous downloads to 1 to see if this fixes the issue you're facing.

I've installed the prerequisites. Unfortunately, however, I won't be able to give this issue more attention with my current workload. Scheduling model downloads for the night has been an okay workaround so far, and I currently have most of the models that I would like to utilize.

Nevertheless, I still think a CLI flag with reasonable defaults to control the number of concurrent download threads would be helpful for all ollama users, especially new users who would be downloading models for the first time or users from regions with slower internet speeds.

@donuts-are-good commented on GitHub (Sep 5, 2024):

Hello all,

I appreciate everyone's input in this thread, and I do hope this eventually gets solved. When I initially created this issue as a place to discuss and outline a solution to the network saturation issue, I assumed it would be welcomed with open arms. In the time since, the software has gotten bigger and more complex, it's no longer a simple fix, and it doesn't appear to be a priority.

With that in mind, life must go on. I'm withdrawing my offer to implement this, as I no longer have the resources to write and test a feature that isn't a concern of the developers. I don't think anybody's going to worry about it, but I wanted to be clear about my intentions. I look forward to seeing this issue fixed some day.

@ShayBox commented on GitHub (Sep 5, 2024):

The issue was fixed 2 weeks ago in 0.3.7...

@mdlmarkham commented on GitHub (Sep 7, 2024):

I'm still having the issue.

@devrandom commented on GitHub (Oct 6, 2024):

It would be best if the number of threads were configurable. Most of these issues can be mitigated by setting the number of concurrent downloads to 1.

@augusto-rehfeldt commented on GitHub (Dec 18, 2024):

I'm still having this issue. I'm from Argentina, and 16 concurrent connections kill my Internet connection, hang my machine, and downloads over 1 GB never finish.

Any idea about how to rate limit this on Windows?

@jimbothegrey commented on GitHub (Jan 7, 2025):

used wondershare to slow down the connection. looks like it is working....

@mrtysn commented on GitHub (Jan 10, 2025):

> used wondershare to slow down the connection. looks like it is working....

@jimbothegrey what might wondershare be? all I'm finding is a file converter

@TiddlyWiddly commented on GitHub (Jan 12, 2025):

Still experiencing this on a pretty quick connection; it knocks all my devices off.

@ankh2054 commented on GitHub (Jan 14, 2025):

Yeah, same here. It's hard to work on anything else while models download, and very large models take multiple days to download at my 3.5 MB/s.

@xtareq commented on GitHub (Jan 17, 2025):

Same issue here. But in my case it was fine on my Windows 11 machine for llama3.1; the issue arises when I try to pull phi4. Any specific reason why this happens?

@ading2210 commented on GitHub (Jan 21, 2025):

As a rudimentary workaround, I wrote a bash script that monitors ollama's network usage and constantly suspends/resumes the ollama process.

https://gist.github.com/ading2210/882565526f7e1f2b9b14a022ac3741ac

Make sure you have nethogs installed (sudo apt install nethogs).

For example, if you want to limit downloads to 5000 KB/s:

$ sudo ./ollama-limiter.sh 5000
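
For illustration, the suspend/resume idea boils down to duty-cycling SIGSTOP/SIGCONT against the downloading process. A toy Go sketch of just that mechanism follows; the real script additionally measures actual bandwidth with nethogs, and the throttle name and 1s/1s cycle here are arbitrary:

```go
//go:build unix

package main

import (
	"fmt"
	"os"
	"strconv"
	"syscall"
	"time"
)

// throttle duty-cycles a process: run for onTime, pause for offTime.
// Pausing half the time roughly halves the average download rate.
func throttle(pid int, onTime, offTime time.Duration) error {
	for {
		time.Sleep(onTime)
		if err := syscall.Kill(pid, syscall.SIGSTOP); err != nil {
			return err // the process has likely exited
		}
		time.Sleep(offTime)
		if err := syscall.Kill(pid, syscall.SIGCONT); err != nil {
			return err
		}
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: throttle <pid>")
		os.Exit(1)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad pid:", err)
		os.Exit(1)
	}
	// 1s on / 1s off is an arbitrary 50% duty cycle.
	if err := throttle(pid, time.Second, time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```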

@digitalextremist commented on GitHub (Feb 15, 2025):

Seems this is a well-known problem... Short of handling this at the router level, the @ading2210 solution seems to help a lot! Thanks very much for sharing that.

Still looking forward to a long-term solution that does not chop the process, though that approach is a great idea.

@donuts-are-good commented on GitHub (Feb 18, 2025):

> The issue was fixed 2 weeks ago in 0.3.7...

It appears that is not the case...

@martinoturrina commented on GitHub (Mar 26, 2025):

bump

@martinoturrina commented on GitHub (Mar 26, 2025):

bump

@digitalextremist commented on GitHub (Mar 26, 2025):

My permanent solution for this is lukaszlach/docker-tc ... I no longer see this as an Ollama issue to resolve.

Example docker-compose.yml that works

services:
  traffic_control:
    image: lukaszlach/docker-tc
    container_name: traffic_control
    network_mode: host
    cap_add:
      - NET_ADMIN
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/docker-tc:/var/docker-tc

And then labels added to Ollama container:

    labels:
      com.docker-tc.enabled: "1"
      com.docker-tc.limit: "2mbps"

@joelanman commented on GitHub (Mar 27, 2025):

That wouldn't fix it for people not using Docker?

@digitalextremist commented on GitHub (Mar 28, 2025):

> That wouldn't fix it for people not using Docker?

No; but if you do not use Docker but still use Linux, the nethogs solution still works: https://github.com/ollama/ollama/issues/2006#issuecomment-2603584766

@StdLogicTrig commented on GitHub (May 7, 2025):

Bump

@whjvenyl commented on GitHub (Jun 20, 2025):

A Mac version of a rate limiter, based on the previous script from @ading2210:

#!/bin/bash
#
# ollama-limiter.sh (v6 - Rate Calculation Method)
#
# This version calculates bandwidth as a rate (KB/s) between two snapshots,
# which is the most accurate method. It correctly parses the comma-delimited
# output from `nettop -L 1` and uses `tput` to prevent display issues.
#

# --- Configuration ---
MAX_BANDWIDTH_KBPS=8000
CHECK_INTERVAL=2 # Seconds between measurements
SUSPEND_TIME=4
RESUME_TIME=1

# --- Colors ---
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

# --- Functions ---

# This function runs nettop and extracts the PID and total cumulative bytes
# for the ollama process with the most network activity.
get_ollama_stats() {
    # -L 1: Run for one snapshot.
    # -J bytes_in,bytes_out: Output only these columns for easier parsing.
    nettop -L 1 -J bytes_in,bytes_out | grep 'ollama\.' | awk -F, '
    {
        # $1 is process.pid, $2 is bytes_in, $3 is bytes_out
        split($1, p, ".");
        pid = p[2];
        gsub(/[^0-9]/, "", pid); # Sanitize PID to ensure it is only numeric

        total_bytes = $2 + $3;

        # Print total bytes and pid, so we can sort to find the one with max traffic
        printf "%d %s\n", total_bytes, pid;
    }' | sort -rn | head -n 1
}

# --- Main ---

echo -e "${GREEN}Ollama Bandwidth Limiter is running.${NC}"
echo -e "Limit set to: ${YELLOW}${MAX_BANDWIDTH_KBPS} KB/s${NC}"
echo "Press Ctrl+C to stop."
echo ""

# Initialize stats for rate calculation
last_bytes=0
last_pid=""
last_time=$(date +%s.%N)

while true; do
    # Use tput to prevent terminal display issues.
    # `cr` = carriage return, `el` = erase line.
    tput cr
    tput el

    # Get current stats
    current_stats=$(get_ollama_stats)
    current_time=$(date +%s.%N)

    if [ -z "$current_stats" ]; then
        echo -n -e "${BLUE}[$(date '+%H:%M:%S')] Waiting for an active 'ollama' process...${NC}"
        sleep "$CHECK_INTERVAL"
        continue
    fi

    current_bytes=$(echo "$current_stats" | awk '{print $1}')
    current_pid=$(echo "$current_stats" | awk '{print $2}')

    # If the PID changed (e.g., new pull command) or this is the first run,
    # reset the baseline to start a new measurement.
    if [ "$current_pid" != "$last_pid" ]; then
        echo -n -e "${YELLOW}[$(date '+%H:%M:%S')] New active PID: ${current_pid}. Initializing...${NC}"
        last_pid=$current_pid
        last_bytes=$current_bytes
        last_time=$current_time
        sleep "$CHECK_INTERVAL"
        continue
    fi

    # --- Calculate Bandwidth Rate ---
    time_diff=$(echo "$current_time - $last_time" | bc)
    byte_diff=$(echo "$current_bytes - $last_bytes" | bc)

    # Avoid division by zero if the time difference is too small
    if (( $(echo "$time_diff <= 0.001" | bc -l) )); then
        kbps=0
    else
        # Rate in Bytes/sec, then convert to KB/s
        bytes_per_second=$(echo "$byte_diff / $time_diff" | bc)
        kbps=$(echo "$bytes_per_second / 1024" | bc)
    fi

    # Ensure kbps is a non-decimal integer
    kbps=${kbps%.*}
    if [ -z "$kbps" ]; then kbps=0; fi

    echo -n -e "${GREEN}[$(date '+%H:%M:%S')] Active PID: ${current_pid}, Bandwidth: ${kbps} KB/s${NC}"

    # --- Throttle if needed ---
    if (( kbps > MAX_BANDWIDTH_KBPS )); then
        echo "" # Newline for the suspend message
        echo -e "${RED}Limit exceeded! Suspending process ${current_pid} for ${SUSPEND_TIME}s...${NC}"
        kill -STOP "$current_pid"
        sleep "$SUSPEND_TIME"

        echo -e "${GREEN}Resuming process ${current_pid}...${NC}"
        kill -CONT "$current_pid"

        # Reset baseline after resuming to avoid a huge artificial spike on the next check
        last_pid=""
        sleep "$RESUME_TIME"
    else
        # Update baseline for the next iteration
        last_pid=$current_pid
        last_bytes=$current_bytes
        last_time=$current_time
        sleep "$CHECK_INTERVAL"
    fi
done
```
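To try it, save the script as ollama-limiter.sh (the name used in its header) and run:

```
chmod +x ollama-limiter.sh
./ollama-limiter.sh
```

Note that kill -STOP/-CONT only works on processes you own, so run it as the same user as the ollama process, or with sudo.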

<!-- gh-comment-id:2991255045 -->

@codeisnotcode commented on GitHub (Jun 27, 2025):

Rate-limiting is a really important feature! With the source code available, it should be simple to implement, and it would provide high value. It is super annoying to have Ollama kill other network activity every time I download a new model.

<!-- gh-comment-id:3011794262 -->

@sotander commented on GitHub (Jul 22, 2025):

I have this issue too. I pull at ca. 200 MB/s, and the admins are sending me emails telling me not to slow down the network.

<!-- gh-comment-id:3102759967 -->

@Mondonno commented on GitHub (Aug 29, 2025):

I have this issue too; it especially hurts on slower networks in my case. Seems related to #3741 as well.

<!-- gh-comment-id:3235800515 -->

@anton-karlovskiy commented on GitHub (Sep 16, 2025):

@jmorganca

👋 I’ve been trying to download llama3.2 on Windows 11 (ollama run llama3.2) and keep running into issues:

- An existing connection was forcibly closed by the remote host
- Error: max retries exceeded

What I’ve tried so far:

- https://github.com/ollama/ollama/issues/3769#issuecomment-2076767384: closed VPN → no change
- https://github.com/ollama/ollama/issues/6211#issuecomment-2626567037: changed DNS → made some progress (got to ~37%) but stuck again

![Image](https://github.com/user-attachments/assets/774acce1-b0af-4b66-b2ae-415baf6f0fe0)

From what I can tell, the main culprit seems to be an unstable connection. The workaround right now is to just re-run the command until it eventually completes, but this gets tedious.

To automate retries, I wrapped it in a small script. Sharing here in case it helps others or the devs want to consider integrating retry logic directly:

Windows

Open a new Notepad file and put this code in it:

```
@echo off
:loop
echo Running ollama run llama3.2...
ollama run llama3.2

if %errorlevel% neq 0 (
    echo The command failed. Retrying... Press Ctrl+C to exit.
    goto loop
)
echo The command completed successfully!
```

Next, go to Save as…, pick All files as the file type, and name the file with a .bat extension.
Then you can run it.
To edit the code above, just change the model name from llama3.2 to any other model you like.

Linux
I didn’t test the code for Linux, but this should hopefully work.

```
#!/bin/bash

# Function to handle Ctrl+C
trap 'echo "Script terminated by user. Exiting..."; exit' SIGINT
while true; do
    echo "Running ollama run llama3.2..."
    ollama run llama3.2
    if [ $? -eq 0 ]; then
        echo "The command completed successfully!"
        break
    else
        echo "The command failed. Retrying... Press Ctrl+C to exit."
    fi
done
```

Save this as run_loop.sh, then run:

```
chmod +x run_loop.sh
./run_loop.sh
```

Re: https://medium.com/@timnirmal/ollama-max-retries-exceeded-error-de6e0f86383e

Good luck 🤞

<!-- gh-comment-id:3295870365 -->

@ading2210 commented on GitHub (Sep 16, 2025):

@anton-karlovskiy On Linux there is a [retry command](https://manpages.debian.org/trixie/retry/retry.1.en.html) already, so there's no need for a shell script.

You would just use it like:

```
retry ollama run llama3
```
<!-- gh-comment-id:3295926528 -->

@anton-karlovskiy commented on GitHub (Sep 16, 2025):

> @anton-karlovskiy On Linux there is a [retry command](https://manpages.debian.org/trixie/retry/retry.1.en.html) already, so there's no need for a shell script.
>
> You would just use it like:
>
> ```
> retry ollama run llama3
> ```

Thank you for the tip, @ading2210.
Let's keep in touch. :)

<!-- gh-comment-id:3295954882 -->

@digitalextremist commented on GitHub (Sep 16, 2025):

Agreed with @anton-karlovskiy, that was a real-life Pro Tip ™️ @ading2210 ... extra rtfm points there.

And nice work on the meticulousness and DIY solutions, @anton-karlovskiy. Do you ever use containers?

<!-- gh-comment-id:3296452662 -->

@PeaStew commented on GitHub (Oct 15, 2025):

Coming up on two years now, and still no proper solution, just hacks.

<!-- gh-comment-id:3407803837 -->

@apassi commented on GitHub (Mar 10, 2026):

I am using trickle to limit the bandwidth:

```
trickle -s -d 50mb ollama pull xxxxx
```
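A hedged note on that invocation: per the trickle man page, -d takes a download rate in KB/s, so the mb suffix in 50mb may not parse as 50 MB/s; an explicit KB/s value should be safer. Also, trickle works via LD_PRELOAD socket interposition, so it may have no effect on binaries that don't make socket calls through libc; worth verifying it actually slows the pull. Something like:

```
# -d is KB/s in trickle; 51200 KB/s is roughly 50 MB/s (modelname is a placeholder)
trickle -s -d 51200 ollama pull modelname
```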

<!-- gh-comment-id:4029084828 -->

@floriandotorg commented on GitHub (Mar 20, 2026):

Super important feature; not sure why it's not here yet.

<!-- gh-comment-id:4098176547 -->

@dhirajlochib commented on GitHub (Apr 2, 2026):

I've opened a PR to address this: #15219

The implementation adds a new OLLAMA_MAX_DOWNLOAD_SPEED environment variable that lets you cap download bandwidth when pulling models. It uses a shared token-bucket rate limiter (golang.org/x/time/rate) across all 16 concurrent download chunks, so the aggregate bandwidth stays within the specified limit.

Usage examples:

- OLLAMA_MAX_DOWNLOAD_SPEED=10m — limit to 10 MB/s
- OLLAMA_MAX_DOWNLOAD_SPEED=500k — limit to 500 KB/s
- OLLAMA_MAX_DOWNLOAD_SPEED=1g — limit to 1 GB/s

When unset, downloads run at full speed with zero overhead.
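For anyone curious how that fits together, here is a minimal sketch of the general technique, not the PR's actual code: one rate.Limiter shared by every chunk's reader, with a one-token-per-byte bucket (the type and field names are illustrative):

```
package download

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

// rateLimitedReader wraps an io.Reader so every chunk's reads draw tokens
// (1 token = 1 byte) from one shared bucket, capping aggregate throughput.
type rateLimitedReader struct {
	r   io.Reader
	lim *rate.Limiter // shared across all concurrent download chunks
}

func (rr *rateLimitedReader) Read(p []byte) (int, error) {
	n, err := rr.r.Read(p)
	if n > 0 {
		// Block until n tokens are available. WaitN errors if n exceeds
		// the burst size, so burst must be at least the largest read.
		if werr := rr.lim.WaitN(context.Background(), n); werr != nil {
			return n, werr
		}
	}
	return n, err
}

// Example: cap aggregate downloads at ~10 MB/s (names are illustrative):
//   shared := rate.NewLimiter(rate.Limit(10<<20), 10<<20)
//   chunk := &rateLimitedReader{r: resp.Body, lim: shared}
```

Because the limiter is shared, the 16 chunks collectively stay under the cap instead of each getting its own budget.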

<!-- gh-comment-id:4178753034 -->

@donuts-are-good commented on GitHub (Apr 13, 2026):

It's crazy that this issue is still open, still being replied to, and still no solution.

<!-- gh-comment-id:4240302747 -->

@phamngocduy98 commented on GitHub (Apr 15, 2026):

Yes, I have the same issue when uploading as well. My internet connection fails with 16 concurrent connections.

<!-- gh-comment-id:4253745461 -->

@gregs007 commented on GitHub (Apr 16, 2026):

@dhirajlochib Thanks for developing a solution for this problem for us. It's been a nagging issue for quite some time. Really appreciate the contribution. Looking forward to it!

<!-- gh-comment-id:4263962203 -->

@Legendary-Lava commented on GitHub (Apr 19, 2026):

A hack I did was isolating traffic by source/server on an IFB (Intermediate Functional Block) device that is shaped below my max bandwidth.

Modifying anything on ingress is always a hack, but it's a little more universal than addressing Ollama's 16 connections specifically. It is more prone to issues with many distinct "server" connections, like when torrenting. Make sure to change eth2 to the correct interface and set the bandwidth somewhere below your actual speed:

```
ip link add name ifb4eth2 type ifb
tc qdisc del dev eth2 ingress
tc qdisc add dev eth2 handle ffff: ingress
tc qdisc del dev ifb4eth2 root
tc qdisc add dev ifb4eth2 root cake bandwidth 200Mbit dual-srchost besteffort egress
ip link set ifb4eth2 up # if you don't bring the device up, your connection will lock up on the next step.
tc filter add dev eth2 parent ffff: matchall action mirred egress redirect dev ifb4eth2
```

To test, go to fast.com, set it to 30 streams, extend the test duration, and try to do anything else throughout the download.
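If you later want to undo the shaping, a teardown sketch assuming the same interface names as above:

```
tc qdisc del dev eth2 ingress   # removes the ingress hook and its redirect filter
ip link del ifb4eth2            # removes the IFB device and its cake qdisc
```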

<!-- gh-comment-id:4276924150 -->

@gregs007 commented on GitHub (Apr 20, 2026):

Maybe this will help someone until this change gets into main.

I use a tool called tc to limit the Ollama container after it starts. I'm no developer, so I'm sure someone could make this more universal by automagically parsing out the right interface, burst size, etc., but this limits my Ollama network speed to 500mbit (I have 1gbit internet).

```
INTERFACE_ID=$(docker exec ollama cat /sys/class/net/eth0/iflink)
VETH=$(ip link show | grep "^${INTERFACE_ID}:" | sed -n 's/.*\(veth[a-z0-9]*\).*/\1/p')

tc qdisc add dev $VETH root tbf rate 500mbit burst 1mb latency 50ms
```
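One caveat, as far as I can tell: the veth pair is recreated whenever the container restarts, so the qdisc has to be reapplied each time. To change or remove the limit on a live container (rate value here is just an example):

```
# replace swaps the existing qdisc in place; del removes the limit entirely
tc qdisc replace dev $VETH root tbf rate 250mbit burst 1mb latency 50ms
tc qdisc del dev $VETH root
```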
<!-- gh-comment-id:4281277669 -->
Reference: github-starred/ollama#26920