[GH-ISSUE #5852] Downloading a model with Ollama causes my hard drive to stay at 100% usage #29411

Closed
opened 2026-04-22 08:16:00 -05:00 by GiteaMirror · 17 comments

Originally created by @rentianxiang on GitHub (Jul 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5852

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

When I try to download a new model, for example by running: ollama run gemma2:27b

The model download gets stuck, and according to Task Manager, my C: drive SSD stays at 100% usage.

I cannot kill ollama's process, and my only option is to force-restart my PC.

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.2.7

GiteaMirror added the bug, windows labels 2026-04-22 08:16:00 -05:00

@rick-github commented on GitHub (Jul 22, 2024):

Delete stuff to make room for the model.


@rentianxiang commented on GitHub (Jul 22, 2024):

> Delete stuff to make room for the model.

Hi Rick, I have 800 GB free on C:\ and 3 TB on D:\, so lack of room shouldn't be the issue here.


@rick-github commented on GitHub (Jul 22, 2024):

What does 100% mean?


@rentianxiang commented on GitHub (Jul 22, 2024):

> What does 100% mean?

Thank you for looking into it!
This is what happens when I start to run: ollama run gemma2:27b
The write speed is crazy high, and I believe my network is not that good.
Even after I stop the download by pressing Ctrl+C, it remains at 100%.
[Task Manager screenshot: https://github.com/user-attachments/assets/457e56d4-e16a-435c-a552-886a2ce122e1]


@rick-github commented on GitHub (Jul 22, 2024):

Add server logs; it may make it easier to debug.

Also look in the server logs for OLLAMA_MODELS. Then do a dir /s of that directory and report what you see.


@rentianxiang commented on GitHub (Jul 22, 2024):

Server logs for my last failed run:
2024/07/22 22:24:55 routes.go:1096: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:C:\Users\rtx\.ollama\models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR:C:\Users\rtx\AppData\Local\Programs\Ollama\ollama_runners OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-22T22:24:55.726+08:00 level=INFO source=images.go:778 msg="total blobs: 5"
time=2024-07-22T22:24:55.732+08:00 level=INFO source=images.go:785 msg="total unused blobs removed: 0"
time=2024-07-22T22:24:55.734+08:00 level=INFO source=routes.go:1143 msg="Listening on 127.0.0.1:11434 (version 0.2.7)"
time=2024-07-22T22:24:55.736+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11.3 rocm_v6.1 cpu]"
time=2024-07-22T22:24:55.737+08:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-22T22:24:55.877+08:00 level=INFO source=gpu.go:287 msg="detected OS VRAM overhead" id=GPU-e3ce22d3-ac09-e72f-5795-3c3f0a60b4d2 library=cuda compute=8.9 driver=12.5 name="NVIDIA GeForce RTX 4080 SUPER" overhead="410.6 MiB"
time=2024-07-22T22:24:55.878+08:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-e3ce22d3-ac09-e72f-5795-3c3f0a60b4d2 library=cuda compute=8.9 driver=12.5 name="NVIDIA GeForce RTX 4080 SUPER" total="16.0 GiB" available="14.7 GiB"

==========

dir -s on OLLAMA_MODELS

    Directory: C:\Users\rtx\.ollama\models

Mode    LastWriteTime      Length      Name
----    -------------      ------      ----
d-----  2024/7/22 22:24                blobs
d-----  2024/7/22 20:51                manifests

    Directory: C:\Users\rtx\.ollama\models\blobs

Mode    LastWriteTime      Length      Name
----    -------------      ------      ----
-a----  2024/7/22 20:51    485         sha256-3f8eb4da87fa7a3c9da615036b0dc418d31fef2a30b115ff33562588b32c691d
-a----  2024/7/22 20:51    12403       sha256-4fa551d4f938f68b8c1e6afa9d28befb70e3f33f75d0753248d530364aeea40f
-a----  2024/7/22 20:51    110         sha256-577073ffcc6ce95b9981eacc77d1039568639e5638e83044994560d9ef82ce1b
-a----  2024/7/22 20:51    4661211424  sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
-a----  2024/7/22 20:51    254         sha256-8ab4849b038cf0abc5b1c9b8ee1443dca6b93a045c2272180d985126eb40bf6f

    Directory: C:\Users\rtx\.ollama\models\manifests

Mode    LastWriteTime      Length      Name
----    -------------      ------      ----
d-----  2024/7/22 20:51                registry.ollama.ai

    Directory: C:\Users\rtx\.ollama\models\manifests\registry.ollama.ai

Mode    LastWriteTime      Length      Name
----    -------------      ------      ----
d-----  2024/7/22 20:51                library

    Directory: C:\Users\rtx\.ollama\models\manifests\registry.ollama.ai\library

Mode    LastWriteTime      Length      Name
----    -------------      ------      ----
d-----  2024/7/22 20:51                llama3

    Directory: C:\Users\rtx\.ollama\models\manifests\registry.ollama.ai\library\llama3

Mode    LastWriteTime      Length      Name
----    -------------      ------      ----
-a----  2024/7/22 20:51    858         8b


@MaxJa4 commented on GitHub (Jul 22, 2024):

This behavior (high usage and low throughput) usually means either lots of small read requests (stalling the drive controller) or a partial drive failure, afaik.
Can you verify with an SSD benchmark tool (like "ATTO Disk Benchmark" or "CrystalDiskMark") that the drive itself is performing OK?
Also, since it doesn't stop after you abort the download, the IO queue is probably still being processed. But just to verify, you could also open "Resource Monitor" on your PC (Windows) and check in the disk tab which process is causing such high activity on your SSD.


@rick-github commented on GitHub (Jul 22, 2024):

I'm not a Windows user, but this looks fine. One suspicion I had was that you might have had failed downloads in your OLLAMA_MODELS directory and that the high disk usage was from ollama trying to process them, but that appears not to be the case. If you look at the processes when the high disk activity is happening, is ollama at the top? (See https://answers.microsoft.com/en-us/windows/forum/all/windows-event-log-100-disk-usage/e101dd6a-6d67-412f-ad02-01db349f97d5 for an example display.) Another possible cause of high disk usage is re-reads due to read failure; do you see anything relevant in the system logs?


@rentianxiang commented on GitHub (Jul 23, 2024):

I used CrystalDiskMark to test my C:\ drive; it looks fine to me.
[CrystalDiskMark screenshot: https://github.com/user-attachments/assets/076faf8b-5762-4804-b68b-52087f950601]

When I start downloading, it gets stuck after about 10 seconds, and this is Resource Monitor: ollama.exe is writing like crazy.
The situation stays the same when I press Ctrl+C in the terminal and quit Ollama.
[Resource Monitor screenshot: https://github.com/user-attachments/assets/671c02bf-290a-4f97-b922-e945b389ee49]

Sorry, my system is in Chinese, so it's hard for you to read.
What I found is that Ollama is not reading a lot, but writing a lot.


@rentianxiang commented on GitHub (Jul 23, 2024):

Do you know of any other place where I could download the model and let Ollama use it? A workaround is also welcome!


@rick-github commented on GitHub (Jul 23, 2024):

You can change where ollama stores models by changing the OLLAMA_MODELS environment variable. So you can try stopping ollama, changing OLLAMA_MODELS to D:\models, starting ollama, and running ollama pull gemma2:27b. This will save the model to your D: drive. When the download is finished, stop ollama, unset OLLAMA_MODELS, recursively copy D:\models to C:\Users\rtx\.ollama\, and restart ollama. A sketch of how those steps might look in PowerShell follows.
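
A possible PowerShell version of the steps above (assumptions: the Ollama tray app is stopped first so ollama serve can run in the session, and the $env: assignment only lasts for that session):

```powershell
# In one PowerShell window (Ollama tray app stopped first):
$env:OLLAMA_MODELS = "D:\models"   # session-scoped override
ollama serve                       # server now writes models to D:

# In a second window:
ollama pull gemma2:27b

# Afterwards: stop the server, then copy the models back to the default
# location and restart Ollama normally (OLLAMA_MODELS unset again).
Copy-Item -Recurse "D:\models\*" "C:\Users\rtx\.ollama\models\"
```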

This is a bit of a kludge, and I don't think it's very different from what's already happening, but it will let you test whether the problem is really ollama writing to C:.


@dhiltgen commented on GitHub (Jul 24, 2024):

Models are large, so by default Ollama uses multiple threads to download chunks of the model in parallel. Ollama ramps up to try to find the optimal number of threads to maximize download speed. This also results in a large number of writes to the disk drive where models are stored. We have an open issue to make this more configurable: #2006.
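
For illustration only (this is not Ollama's actual downloader), a minimal Go sketch of the pattern described above: a few goroutines each fetch a byte range and write it at the matching offset of one preallocated file. The URL and fixed chunk count are made up; the real downloader tunes concurrency dynamically.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"sync"
)

func main() {
	const url = "https://example.com/model.bin" // hypothetical blob URL
	const chunks = 4                            // fixed here; Ollama tunes this

	head, err := http.Head(url)
	if err != nil {
		log.Fatal(err)
	}
	head.Body.Close()
	size := head.ContentLength

	f, err := os.Create("model.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// Preallocate the full file so each worker can write at its own offset.
	if err := f.Truncate(size); err != nil {
		log.Fatal(err)
	}

	part := size / chunks
	var wg sync.WaitGroup
	for i := int64(0); i < chunks; i++ {
		wg.Add(1)
		go func(i int64) {
			defer wg.Done()
			start, end := i*part, i*part+part-1
			if i == chunks-1 {
				end = size - 1 // last worker takes the remainder
			}
			req, _ := http.NewRequest("GET", url, nil)
			req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return // a real downloader would retry
			}
			defer resp.Body.Close()
			// WriteAt-based writer: the ranges never overlap, so no locking.
			io.Copy(io.NewOffsetWriter(f, start), resp.Body)
		}(i)
	}
	wg.Wait()
}
```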

If you cancel the client request to download a model, the server should cancel the request, although it may take some time to terminate the pending threads/writes. @rentianxiang when you cancel the client, does it eventually stop writing to your drive, or are you seeing it continue to download?


@MaxJa4 commented on GitHub (Jul 26, 2024):

To add to this: I saw this behavior myself today. When starting the pull, Ollama seems to first allocate space for the full model and creates many small partial files (I suppose one per thread). If you download a 40 GB+ model, this may take a few seconds (NVMe drives are fast, but it still takes some time), and Ollama will only show a few KB/s in the meantime. Maybe that is unclear to the user.
Regardless, it should show several GB/s in Task Manager instead of the ~10 MB/s the issue author is seeing.
I don't know if that's connected to the issue here, though; just adding it in case it is relevant.


@rentianxiang commented on GitHub (Jul 27, 2024):

Hi guys, thanks for looking into it!
I tried again this morning: llama3.1:8b works fine, the writing lasted a few seconds before the download started, and the model works like a charm.
But for llama3.1:70b, I waited 30 minutes and it stayed stuck here; there is a cursor at the end of the line that keeps flashing, though.
The write speed after 30 minutes is about 5 MB/s.

PS C:\Users\rtx> ollama run llama3.1:70b
pulling manifest
pulling aa81b541aae6... 0% ▕ ▏ 103 KB/ 39 GB

After I terminated the Ollama download, my disk usage stayed at 100% for about 30 minutes without dropping, so I decided to reboot my PC.


@MaxJa4 commented on GitHub (Jul 27, 2024):

If you abort the pull, the ollama process will still continue to allocate the placeholder files; that's why. You'd need to kill the process manually.
Maybe try CrystalDiskMark again, but this time with a larger test size (it was 1 GB above) that is more similar to the 70B model size; it may give us more info. If, for example, some sectors of your SSD are struggling, it may only show when writing really large files.


@vulpes2 commented on GitHub (Jul 28, 2024):

I decided to investigate because I'm concerned that ollama is wearing out people's SSDs by not using sparse files. This has nothing to do with the SSD not being able to keep up with sustained writes, because those sustained writes should not be happening during disk allocation in the first place. It is caused by an oversight that is exclusive to Windows. I can't send a patch because I couldn't get ollama to compile on Windows even in the VS Developer shell, but I have a fix that @dhiltgen can likely implement.

In server/download.go, ollama correctly expands the file by calling file.Truncate(), which updates the file size on the filesystem without writing zeroes to the disk on all supported filesystems (including NTFS). This part works fine, but on Windows you must mark the file as sparse explicitly with the FSCTL_SET_SPARSE ioctl, or call fsutil sparse setflag FILE; otherwise the file will immediately be expanded the first time it is written to, regardless of the offset.

Since I couldn't compile ollama on Windows, here's a simple PoC to reproduce the issue. On Linux and macOS it should return instantly and leave you with a 10 GiB sparse file, but on Windows it will stall at io.CopyN() until the entire file is expanded. If you delete the file on Windows, replace it with an empty text file, run fsutil sparse setflag testfile, and run the Go program, it will behave as it should and only write a single byte to the end of the file.

package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	fmt.Println("Opening file")
	file, err := os.OpenFile("testfile", os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		fmt.Println("File can't be opened")
		os.Exit(1)
	}
	fmt.Println("Expanding file to 10GiB")
	size := int64(1024*1024*1024*10)
	_ = file.Truncate(size)
	fmt.Println("New OffsetWriter")
	r := strings.NewReader("a")
	w := io.NewOffsetWriter(file, size-1)
	fmt.Println("Copying 1 byte")
	n, err := io.CopyN(w, r, 1)
	if err != nil {
		fmt.Println("Write failed")
		os.Exit(1)
	}
	fmt.Printf("Wrote %d byte, closing file\n", n)
	file.Close()
	fmt.Println("File closed")
}
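
For completeness, here is a minimal sketch (not a patch from the thread) of how the sparse flag could be set from Go using the golang.org/x/sys/windows DeviceIoControl wrapper and its FSCTL_SET_SPARSE constant:

```go
//go:build windows

package main

import (
	"log"
	"os"

	"golang.org/x/sys/windows"
)

// setSparse issues FSCTL_SET_SPARSE on an open file, so that a later
// Truncate plus offset writes stay sparse instead of materializing zeroes.
func setSparse(f *os.File) error {
	var bytesReturned uint32
	// A nil input buffer means "set the sparse flag" (the default).
	return windows.DeviceIoControl(
		windows.Handle(f.Fd()),
		windows.FSCTL_SET_SPARSE,
		nil, 0,
		nil, 0,
		&bytesReturned,
		nil,
	)
}

func main() {
	f, err := os.OpenFile("testfile", os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := setSparse(f); err != nil {
		log.Fatal(err)
	}
	// With the flag set, the PoC above completes instantly on Windows too.
}
```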

@dhiltgen commented on GitHub (Aug 6, 2024):

Thanks @vulpes2!

Reference: github-starred/ollama#29411