[GH-ISSUE #4225] gguf unable load #2635

Closed
opened 2026-04-12 12:59:00 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @bambooqj on GitHub (May 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4225

.........................................................................................
llama_new_context_with_model: n_ctx = 65536
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA_Host KV buffer size = 8192.00 MiB
llama_new_context_with_model: KV self size = 8192.00 MiB, K (f16): 4096.00 MiB, V (f16): 4096.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.50 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4507.00 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 4725934080
llama_new_context_with_model: failed to allocate compute buffers
llama_init_from_gpt_params: error: failed to create context with model 'C:\Users\Administrator\.ollama\models\blobs\sha256-912687f2b75ca31331bfcf8b55a34a366dbb0f31df6bf65bc464c1d2431b92be'
{"function":"load_model","level":"ERR","line":410,"model":"C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-912687f2b75ca31331bfcf8b55a34a366dbb0f31df6bf65bc464c1d2431b92be","msg":"unable to load model","tid":"11992","timestamp":1715077029}


https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF/tree/main
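The sizes in the log explain the failure: at n_ctx = 65536 the f16 KV cache for Llama-3-8B (32 layers, 8 KV heads with 128-dim heads, the standard config for this model) is 2 tensors × 2 bytes × 32 × 8 × 128 × 65536 = 8192 MiB, exactly the "KV self size" reported above (held here in host memory as the CUDA_Host buffer), and llama.cpp then asks for a further ~4.5 GiB compute buffer on the GPU, which is the cudaMalloc that fails. A minimal workaround sketch, assuming the GPU simply does not have room for the 64k-context buffers, is to request a smaller context window in the Modelfile; the 8192 value and the mymodel-8k name below are illustrative, not taken from the thread:

```
# Hypothetical Modelfile: same GGUF, but a smaller context than the 64k in the failing load
FROM Llama-3-8B-Instruct-64k.Q4_K_M.gguf
PARAMETER num_ctx 8192
```

```
% ollama create mymodel-8k -f Modelfile
% ollama run mymodel-8k
```

If that loads cleanly, the GGUF itself is fine and the problem is memory pressure from the 64k context rather than a corrupt or unsupported file.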

GiteaMirror added the bug, memory labels 2026-04-12 12:59:00 -05:00
Author
Owner

@pdevine commented on GitHub (Oct 23, 2024):

This issue seems to have fallen through the cracks. I did pull the image, and I created a Modelfile which looks like:

FROM Llama-3-8B-Instruct-64k.Q4_K_M.gguf

Creating and running it all worked fine:

% ollama create mymodel
transferring model data 100%
using existing layer sha256:83ddb9e0d2f98446c124506c21a792a482594f77ea7e4552c087357d11a2e0d3
using autodetected template llama3-instruct
using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb
creating new layer sha256:0a89d7e996a21838bb37a0d45ca46c509f267bf7ac42a0ce1655819d93f00566
writing manifest
success
% ollama run mymodel
>>> hi there
Hi there! It's nice to meet you. Is there something I can help you with or would you like to chat about something in particular?

I'm going to go ahead and close the issue as it looks like it's working. We can reopen it if you're still having problems, but the issue is pretty old (sorry again!).
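A caveat worth noting (an editorial sketch, not something from the thread): the Modelfile above sets no PARAMETER, so this test ran at Ollama's default context length rather than the 64k context that produced the original cudaMalloc failure. The failing configuration can be exercised from the interactive session by raising num_ctx before prompting:

```
% ollama run mymodel
>>> /set parameter num_ctx 65536
>>> hi there
```

If the reload then aborts with the same out-of-memory error, the original report reproduces, and the realistic options are a smaller num_ctx or more free VRAM; ollama ps in another terminal shows how the loaded model was split between GPU and CPU.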

Reference: github-starred/ollama#2635