[GH-ISSUE #7867] Deepseek (various) 236b crashes on run #67089

GiteaMirror · 2026-05-04T09:26:26-05:00

GiteaMirror commented

2026-05-04 09:26:26 -05:00

Originally created by @Maltz42 on GitHub (Nov 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7867

What is the issue?

Deepseek V2, V2.5, and V2-coder all crash with an OOM error when loading the 236b size. Other versions of Deepseek may as well; those are all I've tested. Hardware is dual A6000s with 48GB each.

Error: llama runner process has terminated: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 882903040
llama_new_context_with_model: failed to allocate compute buffers

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

v0.4.5

GiteaMirror added the "bug" and "needs more info" labels 2026-05-04 09:26:27 -05:00

GiteaMirror closed this issue

2026-05-04 09:26:28 -05:00

@igorschlum commented on GitHub (Nov 28, 2024):

@Maltz42, what behavior were you expecting? To run a 236B model, you would need at least 236GB of VRAM on your system. If you're encountering an out-of-memory error, that's expected behavior due to insufficient VRAM.

I recommend using the 16B model, which should fit within your available VRAM and avoid memory allocation errors.

You can consider closing the issue, as the "out of memory" error is a result of the hardware limitations and not a bug.

@Maltz42 commented on GitHub (Nov 29, 2024):

@igorschlum - Ollama (normally) falls back to using system RAM when it runs out of VRAM. You don't need a GPU at all to run models; you can run them 100% on CPU, 100% on GPU, or a combination of both. I've installed llama3.1:405b-instruct-q6_K, which is a 336GB quantization, and it runs just fine. Only about 0.75 t/s, but otherwise fine. System resources are not the issue.

@igorschlum commented on GitHub (Nov 29, 2024):

@Maltz42 You're right about the fallback. How much RAM do you have on your computer? I have a Mac Station with 192GB and could not run anything larger than llama3.1:405b-instruct-q2_K.

@Maltz42 commented on GitHub (Nov 29, 2024):

A total of 96GB VRAM plus 256GB of system RAM. q6_K fills it all to the brim, but it runs!
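A rough way to sanity-check these figures: the weights of a quantized GGUF model take about (parameters times bits per weight) / 8 bytes. The sketch below assumes llama.cpp's block layouts (4.5 bits per weight for Q4_0, 6.5625 for Q6_K) and assumes the 236b Deepseek tags are Q4_0; it ignores the KV cache and compute buffers, so real usage is higher. Both models exceed the 96GB of VRAM, which is why ollama has to split layers between GPU and system RAM.

```python
# Approximate bits per weight for two llama.cpp quantization types
# (Q4_0: 18-byte blocks of 32 weights; Q6_K: 210-byte blocks of 256 weights).
BITS_PER_WEIGHT = {
    "q4_0": 18 * 8 / 32,    # 4.5
    "q6_K": 210 * 8 / 256,  # 6.5625
}


def weight_size_gb(params_billions: float, quant: str) -> float:
    """Rough size of the weights alone, in GB (10**9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fits(total_gb: float, vram_gb: float, ram_gb: float) -> bool:
    """True if the weights fit in combined VRAM + system RAM (ignores overhead)."""
    return total_gb <= vram_gb + ram_gb


if __name__ == "__main__":
    vram, ram = 2 * 48, 256  # dual 48GB A6000s plus 256GB system RAM
    for name, params, quant in [
        ("llama3.1:405b q6_K", 405, "q6_K"),
        ("deepseek 236b, assumed q4_0", 236, "q4_0"),
    ]:
        size = weight_size_gb(params, quant)
        print(f"{name}: ~{size:.1f} GB, fits in VRAM+RAM: {fits(size, vram, ram)}")
```

The 405b estimate comes out at roughly 332GB, close to the 336GB quoted above; the small gap is plausibly due to some tensors being stored at higher precision, which this estimate ignores.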

@rick-github commented on GitHub (Nov 30, 2024):

deepseek architecture is different from most other models, and ollama sometimes misjudges how much of the model can be loaded into VRAM. You can compensate by decreasing the number of layers offloaded (https://github.com/ollama/ollama/issues/6950#issuecomment-2373663650). You can also try enabling fallback memory (https://github.com/ollama/ollama/issues/7584#issuecomment-2466715900) with GGML_CUDA_ENABLE_UNIFIED_MEMORY=1; just be aware of the performance impact mentioned in that comment. Lastly, you can set OLLAMA_GPU_OVERHEAD (https://github.com/ollama/ollama/blob/39e29ae5ddb9ff710c0e28652b61850f458e1205/envconfig/config.go#L237) to reserve some space on the GPUs for llama.cpp to allocate as required, although sometimes it doesn't seem to help.
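Both environment-variable mitigations have to be present in the environment of the ollama server process, not the client. A minimal sketch, assuming the `ollama` binary is on PATH and that you launch `ollama serve` yourself rather than through a systemd unit; the helper names are hypothetical and the 512 MiB overhead is only an example value.

```python
import os
import subprocess


def server_env(gpu_overhead_bytes: int, unified_memory: bool) -> dict:
    """Copy the current environment and add the mitigation variables."""
    env = dict(os.environ)
    # VRAM (in bytes) per GPU that ollama should leave unallocated for llama.cpp.
    env["OLLAMA_GPU_OVERHEAD"] = str(gpu_overhead_bytes)
    if unified_memory:
        # Lets CUDA allocations spill into system RAM, possibly at a speed cost.
        env["GGML_CUDA_ENABLE_UNIFIED_MEMORY"] = "1"
    return env


def start_server(env: dict) -> subprocess.Popen:
    """Start `ollama serve` with the given environment (hypothetical helper)."""
    return subprocess.Popen(["ollama", "serve"], env=env)


# Example (not run here, since it needs ollama installed):
# proc = start_server(server_env(512 * 1024**2, unified_memory=False))
```

If ollama runs as a systemd service instead, the same variables would need to go into that service's environment.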

@rick-github commented on GitHub (Dec 2, 2024):

Has this been resolved?

@Maltz42 commented on GitHub (Dec 2, 2024):

I'm going to say it has not been resolved. I haven't had a chance to try the mitigations listed above yet, but I have verified that the model still crashes out of the box on v0.4.7.

@rick-github commented on GitHub (Dec 2, 2024):

Serve logs (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

@Maltz42 commented on GitHub (Dec 2, 2024):

@rick-github Oh yes, of course... See attached. The log covers everything from server launch through the crash.

deepseek_crash.log (https://github.com/user-attachments/files/17985039/deepseek_crash.log)

@rick-github commented on GitHub (Dec 3, 2024):

Yes, this looks like ollama being too optimistic about how many layers it can offload. It's figuring it can use [46.8 GiB, 46.4 GiB] of [47.3 GiB, 47.3 GiB], i.e., a margin of [0.5 GiB, 0.9 GiB]. This comes unstuck when it tries to allocate 0.8 GiB on GPU0. As mentioned, there are some mitigation techniques:

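1. Set OLLAMA_GPU_OVERHEAD (https://github.com/ollama/ollama/blob/5f8051180e3b9aeafc153f6b5056e7358a939c88/envconfig/config.go#L237) to give llama.cpp a buffer to grow into (e.g., OLLAMA_GPU_OVERHEAD=536870912 to reserve 512 MiB).
2. Enable GGML_CUDA_ENABLE_UNIFIED_MEMORY=1, allowing the GPU to overflow into system RAM. This has a potential performance impact; see https://github.com/ollama/ollama/issues/7584#issuecomment-2466715900.
3. Reduce the number of layers that ollama thinks it can offload to the GPU; see https://github.com/ollama/ollama/issues/6950#issuecomment-2373663650. Ollama is currently offloading 41 of 61 layers; try setting num_gpu to 35.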
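The arithmetic behind this diagnosis can be checked directly. The failed request in the original error was 882903040 bytes, and the margins are the gaps between the estimated and available VRAM quoted above. The GiB figures are the rounded values from the log, so the margins are approximate; the helper names are only for illustration.

```python
GIB = 1024**3

# Figures quoted from the serve log, in GiB (rounded to one decimal).
estimated = {"GPU0": 46.8, "GPU1": 46.4}
available = {"GPU0": 47.3, "GPU1": 47.3}

# Size of the compute buffer that failed to allocate on CUDA0, in bytes.
failed_request_bytes = 882903040


def margins(est: dict, avail: dict) -> dict:
    """Free VRAM (GiB) that ollama expects to leave on each GPU."""
    return {gpu: round(avail[gpu] - est[gpu], 1) for gpu in est}


def overflows(margin_gib: float, request_bytes: int) -> bool:
    """True if a single allocation is larger than the remaining margin."""
    return request_bytes / GIB > margin_gib


if __name__ == "__main__":
    print(f"failed request: {failed_request_bytes / GIB:.2f} GiB")
    for gpu, m in margins(estimated, available).items():
        print(f"{gpu}: margin {m} GiB, request fits: {not overflows(m, failed_request_bytes)}")
    # OLLAMA_GPU_OVERHEAD takes bytes; 512 MiB expressed in bytes:
    print(512 * 1024**2)
```

The roughly 0.82 GiB buffer is larger than GPU0's 0.5 GiB margin but would have fit in GPU1's 0.9 GiB, which matches the failure being reported on CUDA0.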

@Maltz42 commented on GitHub (Dec 5, 2024):

Yeah, such miscalculations have been a problem for a very long time, to varying degrees. But I haven't seen it happen on an out-of-the-box model before. Usually it happens when increasing the context size. But I don't think Ollama has ever calculated memory usage really properly.
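Why larger contexts tend to trip the estimate: in a conventional transformer the KV cache grows linearly with the context length, so raising num_ctx adds memory on top of the weights. The sketch below uses the usual estimate for standard multi-head or grouped-query attention with an f16 cache. The layer and head counts are hypothetical, and Deepseek V2's multi-head latent attention lays out its cache differently, so treat this only as an illustration of the scaling.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size: one K and one V vector per layer, token and KV head."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shape, not Deepseek's real configuration.
    shape = dict(n_layers=60, n_kv_heads=8, head_dim=128)
    for ctx in (2048, 8192, 32768):
        gib = kv_cache_bytes(n_ctx=ctx, **shape) / 1024**3
        print(f"num_ctx={ctx}: ~{gib:.2f} GiB of KV cache")
```

Each doubling of num_ctx doubles this term, so an estimate that is already tight at the default context can tip over once the context is raised.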

@rick-github commented on GitHub (Dec 23, 2024):

Did the mitigations help?

@Maltz42 commented on GitHub (Jan 13, 2025):

While the mitigations work as expected, this is still broken in 0.5.5, so I'm not sure why it should be closed as completed.

@rick-github commented on GitHub (Jan 13, 2025):

Which mitigations helped?

@Maltz42 commented on GitHub (Jan 13, 2025):

num_gpu 35 did the trick in my case, but that's the only one I tried. I can test the others (or other values of num_gpu) if it would be helpful. I've also seen errors occur with very large context windows, so I guess there's something fundamentally wrong with the way ollama calculates GPU memory usage.
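For reference, num_gpu can be set per request through the "options" field of Ollama's REST API. A minimal sketch using only the Python standard library, assuming a server on the default http://localhost:11434 and assuming the model tag deepseek-v2.5:236b; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local server address


def build_payload(model: str, prompt: str, num_gpu: int) -> dict:
    """Request body that caps the number of layers offloaded to the GPU(s)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {"num_gpu": num_gpu},
    }


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload and return the generated text (requires a running server)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (needs a running ollama server with the model pulled):
# print(generate(build_payload("deepseek-v2.5:236b", "Hello", num_gpu=35)))
```

Setting the option per request keeps the workaround local to this model instead of changing server-wide behavior.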

@Maltz42 commented on GitHub (Jan 13, 2025):

Actually, while tinkering with it just now, I found there might be more to it. After a few back-and-forth responses, the output froze mid-sentence for a few seconds, and then ollama exited with the following error:

Error: an error was encountered while running the model: unexpected EOF

@rick-github commented on GitHub (Jan 13, 2025):

Can you add logs for that failure?

@Maltz42 commented on GitHub (Jan 13, 2025):

Jan 12 22:09:45 daisy ollama[590872]: llama.cpp:11968: The current context does not support K-shift
Jan 12 22:09:45 daisy ollama[590872]: SIGSEGV: segmentation violation
Jan 12 22:09:45 daisy ollama[590872]: PC=0x7c9ff2824c47 m=0 sigcode=1 addr=0x20ae03fc8
Jan 12 22:09:45 daisy ollama[590872]: signal arrived during cgo execution

There's a lot after that, but those are the first few lines at the moment the output froze - I suspect that's the relevant bit. I've also reproduced it just now.

@rick-github commented on GitHub (Jan 13, 2025):

OK, this is different from the OOM issues. The deepseek class of models doesn't support shifting the context window when the buffer fills up; see https://github.com/ollama/ollama/issues/5975.
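Context shifting is what normally happens when a conversation reaches num_ctx: the runner keeps the first num_keep tokens, drops a chunk of the oldest remaining ones, and re-positions the cached keys for the tokens that survive (the "K-shift"). That is consistent with the crash only appearing after a few back-and-forth responses. The sketch below shows only the token bookkeeping, under the simplifying assumption that half of the non-kept tokens are discarded; it is a hypothetical illustration, not ollama's or llama.cpp's actual code, and it does not model the KV-cache update that the deepseek models could not perform.

```python
def needs_shift(n_cached: int, n_new: int, num_ctx: int) -> bool:
    """True if appending n_new tokens would overflow the context window."""
    return n_cached + n_new > num_ctx


def shift_context(tokens: list, num_ctx: int, num_keep: int) -> list:
    """Keep the first num_keep tokens, then drop the oldest half of the rest.

    Hypothetical simplification: a real runner must also update the KV cache
    for the surviving tokens, which is the step that failed for deepseek.
    """
    if len(tokens) < num_ctx:
        return list(tokens)  # window not full yet, nothing to discard
    discard = (len(tokens) - num_keep) // 2
    return tokens[:num_keep] + tokens[num_keep + discard:]


if __name__ == "__main__":
    window = list(range(16))  # stand-in token ids for a full 16-token window
    print(shift_context(window, num_ctx=16, num_keep=4))
```

The kept prefix usually holds the system prompt, which is why it is preserved while older conversation turns are dropped.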

@rick-github commented on GitHub (Apr 13, 2025):

Deepseek k-shift problem resolved with #9433.

Reference: github-starred/ollama#67089