[GH-ISSUE #1492] 7b model on Colab: CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:8001: out of memory #47316

Closed
opened 2026-04-28 03:34:48 -05:00 by GiteaMirror · 3 comments

Originally created by @nnWhisperer on GitHub (Dec 12, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1492

Hello,
On a Google Colab instance with 50 GB of RAM and a 16 GB T4 GPU (the problem persisted on a V100 instance), I install Ollama as follows:

```bash
!sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama
!sudo chmod +x /usr/bin/ollama
!ollama serve
```

In a terminal I run:

```bash
ollama run yarn-mistral:7b-128k
```

The log shows the following error, even though only 4.3 GB of the 16 GB of VRAM was in use:

```
CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:8001: out of memory
```
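For context on why a 7B model with a 128k window can OOM on a 16 GB card, here is a back-of-envelope KV-cache estimate. The architecture numbers (32 layers, 8 KV heads via GQA, head dimension 128, fp16 cache) are Mistral-7B's published specs, not values taken from the logs in this issue:

```bash
# Rough KV-cache cost for yarn-mistral:7b-128k (assumed Mistral-7B layout:
# 32 layers, 8 KV heads, head dim 128; fp16 cache = 2 bytes per element).
layers=32; kv_heads=8; head_dim=128; bytes_per_elem=2
per_token=$((2 * layers * kv_heads * head_dim * bytes_per_elem))  # factor 2 = K and V
echo "KV cache per token: ${per_token} bytes"
ctx=131072  # the advertised 128k context
total_gib=$((per_token * ctx / 1024 / 1024 / 1024))
echo "KV cache at full context: ${total_gib} GiB"
```

Under these assumptions the cache alone at full context is about 16 GiB, before counting the model weights at all, so an allocation-time OOM with most of the VRAM still unused is plausible rather than paradoxical.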

The full logs are attached: [output.log](https://github.com/jmorganca/ollama/files/13654181/output.log)

GiteaMirror added the bug label 2026-04-28 03:34:48 -05:00

@phalexo commented on GitHub (Dec 15, 2023):

@BruceMacD

The OOM bug is hiding somewhere in the ./ollama/llm/llama.cpp/gguf folder.

When I copied this folder from tag v0.1.11 into a clone of tag v0.1.12, the problem in v0.1.12 went away.


@phalexo commented on GitHub (Dec 15, 2023):

```bash
git clone --recursive https://github.com/jmorganca/ollama.git
cd ollama/llm/llama.cpp
vi generate_linux.go
```

```go
//go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_CUDA_FORCE_MMQ=on
//go:generate cmake --build ggml/build/cuda --target server --config Release
//go:generate mv ggml/build/cuda/bin/server ggml/build/cuda/bin/ollama-runner
//go:generate cmake -S gguf -B gguf/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_NATIVE=off -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off -DLLAMA_CUDA_PEER_MAX_BATCH_SIZE=0 -DLLAMA_CUDA_FORCE_MMQ=on
//go:generate cmake --build gguf/build/cuda --target server --config Release
//go:generate mv gguf/build/cuda/bin/server gguf/build/cuda/bin/ollama-runner
```

```bash
cd ../..
go generate ./...
go build .
```

@jmorganca commented on GitHub (Jan 14, 2024):

Hi @nnWhisperer, this should be fixed as of version 0.1.20. Note that filling large context windows can still cause OOM errors; this is being worked on in https://github.com/jmorganca/ollama/issues/1952
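While issue #1952 is open, one way to avoid filling the full 128k window is to derive a model with a smaller context via a Modelfile. A minimal sketch — `num_ctx` is a standard Modelfile parameter, but the 4096-token cap and the derived model name are illustrative choices, not from this thread:

```bash
# Cap the context window by deriving a new model from the 128k one.
cat > Modelfile <<'EOF'
FROM yarn-mistral:7b-128k
PARAMETER num_ctx 4096
EOF
# Then build and run the derived model (requires a running ollama server):
#   ollama create yarn-mistral-4k -f Modelfile
#   ollama run yarn-mistral-4k
```

A smaller `num_ctx` shrinks the KV cache the runner has to allocate up front, at the cost of losing the long-context capability that motivated choosing this model.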

Reference: github-starred/ollama#47316