[GH-ISSUE #2821] Can we have the newest 1-bit model #1713

Open
opened 2026-04-12 11:41:16 -05:00 by GiteaMirror · 20 comments

Originally created by @chuangtc on GitHub (Feb 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2821

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
https://thegenerality.com/agi/
https://arxiv.org/abs/2402.17764
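
For context, "b1.58" refers to ternary weights in {-1, 0, +1}, i.e. log2(3) ≈ 1.585 bits of information per weight. Below is a minimal sketch of absmean-style ternary rounding in the spirit of the paper; it is illustrative only (not the training recipe and not a GGUF format), and the function names and tensor shapes are made up:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Round weights to {-1, 0, +1} with a per-tensor absmean scale.
    Illustrative sketch of BitNet b1.58-style quantization."""
    scale = np.mean(np.abs(w)) + 1e-8            # absmean scale
    q = np.clip(np.round(w / scale), -1, 1)      # ternary values
    return q.astype(np.int8), float(scale)

def ternary_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = ternary_quantize(w)
print(np.unique(q))                               # subset of [-1, 0, 1]
print(np.abs(w - ternary_dequantize(q, s)).mean())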

GiteaMirror added the model label 2026-04-12 11:41:16 -05:00

@josharian commented on GitHub (Mar 3, 2024):

IIUC the model hasn't been released yet. When it is, I believe it will appear at https://github.com/microsoft/unilm/tree/master/bitnet. Then there'll be some work to get llama.cpp support. Then Ollama can pull it in.

@unclemusclez commented on GitHub (Oct 14, 2024):

I have this GGUF'd and ready to be pushed to Ollama, but I am getting

`Error: invalid file magic`

I need my wizards.
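
For reference, `invalid file magic` generally means the file does not start with the GGUF magic bytes (the ASCII string `GGUF`), for example because the conversion produced an older GGML-format file or the file was truncated. A quick way to check (the path below is a placeholder):

```python
# Sanity check: a valid GGUF file starts with the 4-byte magic b"GGUF".
# "model.gguf" is a placeholder; point it at the file you tried to push.
with open("model.gguf", "rb") as f:
    magic = f.read(4)
print(magic)  # expect b'GGUF'
print("looks like GGUF" if magic == b"GGUF" else "not recognized as GGUF")
```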

@FGRibreau commented on GitHub (Oct 20, 2024):

It's now released https://github.com/microsoft/BitNet \o/

@akashAD98 commented on GitHub (Oct 20, 2024):

It's released.

@ozbillwang commented on GitHub (Oct 22, 2024):

This request is great! Upvote it.

I managed to create a 70MB CPU-only Ollama Docker image (#7184). However, when testing it in real environments like MacBooks and EC2 instances, the response time was too slow, always with high CPU usage. It struggled to handle requests efficiently.

If we could implement a 1-bit model optimized for CPU inference, it would significantly improve performance and allow us to deploy it widely.

@kth8 commented on GitHub (Oct 23, 2024):

@ozbillwang I did some testing regarding AVX for CPU inferencing. With AVX2 CPU runner:

```
ollama run llama3.2:1b-instruct-q4_K_M
>>> /set verbose
Set 'verbose' mode.
>>> /set parameter temperature 0
Set parameter 'temperature' to '0'
>>> /set parameter seed 0
Set parameter 'seed' to '0'
>>> why is the sky blue?
The sky appears blue because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh. He discovered that shorter (blue) 
wavelengths of light are scattered more than longer (red) wavelengths by the tiny molecules of gases in the atmosphere.

Here's what happens:

1. Sunlight enters the Earth's atmosphere.
2. The sunlight is made up of a spectrum of colors, including red, orange, yellow, green, blue, indigo, and violet.
3. The shorter (blue) wavelengths are scattered by the tiny molecules of gases such as nitrogen (N2) and oxygen (O2) in the atmosphere.
4. This scattering effect gives the sky its blue color.

The amount of scattering that occurs depends on several factors, including:

* The altitude of the atmosphere: Scattering decreases with increasing altitude.
* The concentration of atmospheric gases: More gas molecules scatter shorter wavelengths.
* The angle of the sunlight: The more direct the sunlight, the more it is scattered.

As a result, the sky appears blue during the daytime when the sun is overhead and the light has to travel through a longer distance in the atmosphere. At 
sunrise and sunset, the light has to travel through a shorter distance, which scatters the shorter wavelengths, making the sky appear redder or orange.

It's worth noting that the color of the sky can also be affected by other factors, such as pollution, dust, and water vapor, but Rayleigh scattering is the 
primary reason for the blue color we see.

total duration:       18.680602651s
load duration:        26.641548ms
prompt eval count:    31 token(s)
prompt eval duration: 712.777ms
prompt eval rate:     43.49 tokens/s
eval count:           306 token(s)
eval duration:        17.899705s
eval rate:            17.10 tokens/s
```

with just the CPU runner:

```
ollama run llama3.2:1b-instruct-q4_K_M
>>> /set verbose
Set 'verbose' mode.
>>> /set parameter temperature 0
Set parameter 'temperature' to '0'
>>> /set parameter seed 0
Set parameter 'seed' to '0'
>>> why is the sky blue?
The sky appears blue because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh. He discovered that shorter (blue) 
wavelengths of light are scattered more than longer (red) wavelengths by the tiny molecules of gases in the atmosphere.
...
total duration:       1m55.741991018s
load duration:        38.39719ms
prompt eval count:    31 token(s)
prompt eval duration: 2.987612s
prompt eval rate:     10.38 tokens/s
eval count:           319 token(s)
eval duration:        1m52.674415s
eval rate:            2.83 tokens/s
```

The threads you linked to were using GPU for inference, so the lack of AVX may not have been a big deal, but for CPU inferencing it makes a massive difference. If you don't use AVX, your performance is going to remain terrible regardless of whether the model is 1-bit or not.
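
As an aside, one quick way to see whether the host CPU exposes AVX/AVX2 at all is the sketch below; it is Linux-only (it reads /proc/cpuinfo), so it will not work on macOS or Windows:

```python
# Rough AVX/AVX2 capability check on Linux via /proc/cpuinfo flags.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break
print("AVX :", "avx" in flags)
print("AVX2:", "avx2" in flags)
```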

@ozbillwang commented on GitHub (Oct 23, 2024):

> I did some testing regarding AVX for CPU inferencing. With AVX2 CPU runner:

Could you share how to test with the AVX and AVX2 CPU runners?

I got this result; the speed is similar to your second test:

```
total duration:       1m1.460351303s
load duration:        24.006432ms
prompt eval count:    71 token(s)
prompt eval duration: 1.398321s
prompt eval rate:     50.78 tokens/s
eval count:           309 token(s)
eval duration:        59.907969s
eval rate:            5.16 tokens/s
```

@kth8 commented on GitHub (Oct 23, 2024):

If you include

```
COPY --from=ollama /usr/lib/ollama/runners/cpu_avx /usr/lib/ollama/runners/cpu_avx
COPY --from=ollama /usr/lib/ollama/runners/cpu_avx2 /usr/lib/ollama/runners/cpu_avx2
```

in your Docker image, then if you run `ps aux` inside the container after loading a model, you will see it being used.

@ozbillwang commented on GitHub (Oct 23, 2024):

Thanks, @kth8. It is faster, but CPU usage is still high:

(screenshot of CPU usage: https://github.com/user-attachments/assets/e0a15bce-690f-40b2-bc14-f309efab3f77)
```
total duration:       29.888580911s
load duration:        31.737842ms
prompt eval count:    346 token(s)
prompt eval duration: 151.134ms
prompt eval rate:     2289.36 tokens/s
eval count:           454 token(s)
eval duration:        29.573488s
eval rate:            15.35 tokens/s
```

Compared to BitNet, the CPU usage is lower:

(screenshot of CPU usage with BitNet: https://github.com/user-attachments/assets/d11116ad-4b55-467c-b174-2e27337296de)
```
llama_perf_sampler_print:    sampling time =      17.98 ms /   137 runs   (    0.13 ms per token,  7621.27 tokens per second)
llama_perf_context_print:        load time =    1257.96 ms
llama_perf_context_print: prompt eval time =    1363.48 ms /     9 tokens (  151.50 ms per token,     6.60 tokens per second)
llama_perf_context_print:        eval time =   19435.12 ms /   127 runs   (  153.03 ms per token,     6.53 tokens per second)
llama_perf_context_print:       total time =   20850.38 ms /   136 tokens
```

Seems we still need to wait for this feature in Ollama.

@kth8 commented on GitHub (Oct 24, 2024):

The new granite3-moe 3B model could be good for CPU inferencing since it was designed for low-latency usage and has fewer active parameters than Llama3.2 1B.

@YangWang92 commented on GitHub (Oct 25, 2024):

From https://github.com/ollama/ollama/issues/7289

Hi all,

We recently developed a fully open-source quantization method called VPTQ (Vector Post-Training Quantization) https://github.com/microsoft/VPTQ which enables fast quantization of large language models (LLMs) down to 1-4 bits. The community has also helped release several models using this method https://huggingface.co/VPTQ-community. I am personally very interested in integrating VPTQ into ollama/llama.cpp.

One of the key advantages of VPTQ is that the dequantization method is very straightforward, relying only on a simple lookup table.
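
To illustrate what "only a simple lookup table" means, here is a rough sketch of vector-quantized dequantization; it is not the actual VPTQ kernel, and the codebook size and vector dimension below are made-up numbers:

```python
import numpy as np

# Hypothetical shapes: 256 learned centroids, each a 4-dim vector,
# and one stored index per group of 4 weights.
codebook = np.random.randn(256, 4).astype(np.float32)   # learned centroids
indices = np.random.randint(0, 256, size=1024)           # quantized weights

# Dequantization is a table lookup plus a reshape back to the weight layout.
weights = codebook[indices].reshape(-1)                   # 4096 float32 weights
print(weights.shape)
```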

I would like to ask for guidance on how best to support this quantization method within Ollama, even if it's on my own fork. Specifically, which approach should I take?

1. Define a series of new models (e.g., vptq-llama3.1) using existing data types (int32, fp16), and hide the model dequantization within a separate dequant op.
2. Define a new quantization data type (e.g., a custom lookup table data structure)?

I’d love to hear your thoughts or any suggestions on how to proceed!

Thank you!
Yang

@teaalltr commented on GitHub (Oct 25, 2024):

@YangWang92 @kth8 isn't it already supported in llama.cpp?
https://github.com/ggerganov/llama.cpp/pull/8151

@YangWang92 commented on GitHub (Oct 26, 2024):

I'm still trying to integrate VPTQ into llama.cpp. https://github.com/ggerganov/llama.cpp/discussions/9974 :)

@Y-PLONI commented on GitHub (Nov 23, 2024):

Has there been any progress with this?
Did ollama or llama.cpp do this?
And if so, is there any good model that works with it?
Thanks!

@raymond-infinitecode commented on GitHub (Jan 4, 2025):

Still no progress?

@southwolf commented on GitHub (Jan 14, 2025):

Still not working. `ollama run hf.co/mradermacher/phi-4-i1-GGUF:i1-IQ1_M` fails with:
`Error: llama runner process has terminated: GGML_ASSERT(hparams.n_swa > 0) failed`

@HKMV commented on GitHub (Apr 24, 2025):

Any progress?

@electriquo commented on GitHub (Apr 24, 2025):

relates to #10337

@borja-rojo-ilvento commented on GitHub (Apr 24, 2025):

> relates to #10337

Yup, this is what I want!!!

@gordan-bobic commented on GitHub (Jul 6, 2025):

Given that what seem to be (partly) BitNet ternary-encoded models already appear to be supported in llama.cpp:
https://unsloth.ai/blog/deepseekr1-dynamic
should this already be working in Ollama as is?
