[GH-ISSUE #7656] ollama 0.4 stops answering on granite3-dense but works with 0.3 #30646

Closed
opened 2026-04-22 10:30:37 -05:00 by GiteaMirror · 22 comments

Originally created by @maxandersen on GitHub (Nov 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7656

### What is the issue?

With ollama 0.3.x, running `ollama run granite-3-dense` I can ask multiple questions and get responses.

With ollama 0.4.1, I get an answer to the first question, but then mostly blank responses to any follow-up.

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

0.4.1

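To make the follow-up failure easier to reproduce outside the interactive CLI, here is a minimal sketch (not part of the original report) that drives the same kind of multi-turn chat through the local REST API. It assumes a server on the default `127.0.0.1:11434` and the `granite3-dense` tag; the questions mirror the transcripts later in this thread.

```python
# Hedged repro sketch (assumption, not from the report): replay a
# multi-turn chat against a local ollama server and flag blank
# follow-up answers. Assumes the default endpoint and model tag.
import json
import urllib.request

BASE = "http://127.0.0.1:11434"

def post_chat(messages):
    body = json.dumps({
        "model": "granite3-dense",
        "messages": messages,
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()
    req = urllib.request.Request(f"{BASE}/api/chat", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Print the server version first, since the bug tracks 0.3.x vs 0.4.x.
with urllib.request.urlopen(f"{BASE}/api/version") as resp:
    print("server version:", json.load(resp)["version"])

messages = []
for question in ["what is java?", "what is python?", "what is ruby?"]:
    messages.append({"role": "user", "content": question})
    answer = post_chat(messages)
    messages.append({"role": "assistant", "content": answer})
    print(f">>> {question}\n{answer.strip() or '(blank response)'}\n")
```

Based on the transcripts below, on 0.4.1 one of the later turns should print `(blank response)`, while on 0.3.x every turn gets an answer.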
GiteaMirror added the bug label 2026-04-22 10:30:37 -05:00

@fbricon commented on GitHub (Nov 13, 2024):

Same here


@maxandersen commented on GitHub (Nov 13, 2024):

example run:

```
ollama run granite3-dense
>>> what is void?
In programming, "void" is a keyword that indicates a function or method doesn't return any value. It's used to define functions that perform actions
but don't produce a result that can be assigned to a variable.

>>> what is python?
Python is a high-level, interpreted programming language known for its readability and simplicity. It supports multiple programming paradigms,
including procedural, object-oriented, and functional programming. Python is widely used for web development, data analysis, machine learning,
artificial intelligence, and scientific computing.

>>> what is java?
Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It
is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can
run on all platforms that support Java without the need for recompilation.

>>> what is ruby?


>>> what is xxx
I'm sorry, I didn't understand your question. Could you please provide more context or clarify what you're asking about? I'm here to help with
information on programming languages and related topics.

>>> what is ruby?


>>> what is python?


```
with 0.3 this does not seem to happen.


@deboer-tim commented on GitHub (Nov 13, 2024):

... and same here. Sometimes it looks OK for a while on v0.4 but then gives no reply to some questions; other times it just 'hangs' after the first few questions and there are no more responses. On 0.3 I can chat for a long time with no obvious failures.


@maxandersen commented on GitHub (Nov 13, 2024):

here is the output and debug log for a run similar to the one above:

```
ollama run granite3-dense
>>> what is java?
Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It
is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can
run on all platforms that support Java without the need for recompilation.

>>> what is python?


>>> what is void?
In programming, "void" is a keyword used to indicate that a function or method does not return a value. It is often used in languages like C, C++,
and Java to define functions that perform an action but do not produce a result that can be assigned to a variable. In some languages, such as
Python, the concept of "void" is not explicitly supported, but similar functionality can be achieved using the `None` value or by not returning
anything from the function.

>>> what is ruby?


>>> Send a message (/? for help)
```

debug log:

```
OLLAMA_DEBUG=1 ollama serve
2024/11/13 20:12:27 routes.go:1189: INFO server config env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/manderse/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: http_proxy: https_proxy: no_proxy:]"
time=2024-11-13T20:12:27.664+01:00 level=INFO source=images.go:755 msg="total blobs: 42"
time=2024-11-13T20:12:27.666+01:00 level=INFO source=images.go:762 msg="total unused blobs removed: 0"
time=2024-11-13T20:12:27.668+01:00 level=INFO source=routes.go:1240 msg="Listening on 127.0.0.1:11434 (version 0.4.1)"
time=2024-11-13T20:12:27.670+01:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners
time=2024-11-13T20:12:27.671+01:00 level=DEBUG source=common.go:168 msg=extracting runner=metal payload=darwin/arm64/metal/ollama_llama_server.gz
time=2024-11-13T20:12:27.741+01:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server
time=2024-11-13T20:12:27.741+01:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[metal]
time=2024-11-13T20:12:27.741+01:00 level=DEBUG source=common.go:50 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-11-13T20:12:27.741+01:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-11-13T20:12:27.812+01:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="21.3 GiB" available="21.3 GiB"
[GIN] 2024/11/13 - 20:12:52 | 200 |    1.187083ms |       127.0.0.1 | HEAD     "/"
[GIN] 2024/11/13 - 20:12:52 | 200 |    9.720459ms |       127.0.0.1 | POST     "/api/show"
time=2024-11-13T20:12:52.986+01:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x100d5b460 gpu_count=1
time=2024-11-13T20:12:52.994+01:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:12:52.994+01:00 level=DEBUG source=memory.go:107 msg=evaluating library=metal gpu_count=1 available="[21.3 GiB]"
time=2024-11-13T20:12:52.995+01:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce gpu=0 parallel=4 available=22906503168 required="3.0 GiB"
time=2024-11-13T20:12:52.995+01:00 level=INFO source=server.go:105 msg="system memory" total="32.0 GiB" free="7.6 GiB" free_swap="0 B"
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=memory.go:107 msg=evaluating library=metal gpu_count=1 available="[21.3 GiB]"
time=2024-11-13T20:12:52.995+01:00 level=INFO source=memory.go:343 msg="offload to metal" layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[21.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.0 GiB" memory.required.partial="3.0 GiB" memory.required.kv="640.0 MiB" memory.required.allocations="[3.0 GiB]" memory.weights.total="2.0 GiB" memory.weights.repeating="1.9 GiB" memory.weights.nonrepeating="78.8 MiB" memory.graph.full="426.7 MiB" memory.graph.partial="426.7 MiB"
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=common.go:168 msg=extracting runner=metal payload=darwin/arm64/metal/ollama_llama_server.gz
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server
time=2024-11-13T20:12:52.997+01:00 level=INFO source=server.go:383 msg="starting llama server" cmd="/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server --model /Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce --ctx-size 8192 --batch-size 512 --n-gpu-layers 41 --verbose --threads 8 --parallel 4 --port 53441"
time=2024-11-13T20:12:52.997+01:00 level=DEBUG source=server.go:400 msg=subprocess environment="[PATH=/Users/manderse/.rvm/gems/ruby-2.7.1/bin:/Users/manderse/.rvm/gems/ruby-2.7.1@global/bin:/Users/manderse/.rvm/rubies/ruby-2.7.1/bin:/Users/manderse/Library/pnpm:/Users/manderse/.jbang/bin:/Users/manderse/bin/google-cloud-sdk/bin:/Users/manderse/.docker/bin:/Users/manderse/perl5/bin:/Users/manderse/Library/Python/3.10/bin:/Users/manderse/.yarn/bin:/Users/manderse/.config/yarn/global/node_modules/.bin:/Users/manderse/.sdkman/candidates/quarkus/current/bin:/Users/manderse/.sdkman/candidates/mvnd/current/bin:/Users/manderse/.sdkman/candidates/maven/current/bin:/Users/manderse/.sdkman/candidates/kscript/current/bin:/Users/manderse/.sdkman/candidates/kotlin/current/bin:/Users/manderse/.sdkman/candidates/jbang/current/bin:/Users/manderse/.sdkman/candidates/java/current/bin:/Users/manderse/.sdkman/candidates/gradle/current/bin:/Users/manderse/.local/bin:/Users/manderse/.krew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/manderse/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Applications/Little Snitch.app/Contents/Components:/usr/local/munki:/opt/podman/bin:/Applications/iTerm.app/Contents/Resources/utilities:/Users/manderse/.rvm/bin LD_LIBRARY_PATH=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal]"
time=2024-11-13T20:12:53.003+01:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-11-13T20:12:53.004+01:00 level=INFO source=server.go:562 msg="waiting for llama runner to start responding"
time=2024-11-13T20:12:53.004+01:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error"
time=2024-11-13T20:12:53.960+01:00 level=INFO source=runner.go:863 msg="starting go runner"
time=2024-11-13T20:12:53.961+01:00 level=INFO source=runner.go:864 msg=system info="AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = -1 | SVE = -1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = -1 | LLAMAFILE = 1 | cgo(clang)" threads=8
time=2024-11-13T20:12:53.961+01:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:53441"
llama_model_loader: loaded meta data with 35 key-value pairs and 363 tensors from /Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = granite
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Granite 3.0 2b Instruct
llama_model_loader: - kv   3:                           general.finetune str              = instruct
llama_model_loader: - kv   4:                           general.basename str              = granite-3.0
llama_model_loader: - kv   5:                         general.size_label str              = 2B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,3]       = ["language", "granite-3.0", "text-gen...
llama_model_loader: - kv   8:                        granite.block_count u32              = 40
llama_model_loader: - kv   9:                     granite.context_length u32              = 4096
llama_model_loader: - kv  10:                   granite.embedding_length u32              = 2048
llama_model_loader: - kv  11:                granite.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:               granite.attention.head_count u32              = 32
llama_model_loader: - kv  13:            granite.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                     granite.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  15:   granite.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 15
llama_model_loader: - kv  17:                         granite.vocab_size u32              = 49155
llama_model_loader: - kv  18:               granite.rope.dimension_count u32              = 64
llama_model_loader: - kv  19:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  20:                    granite.attention.scale f32              = 0.015625
llama_model_loader: - kv  21:                    granite.embedding_scale f32              = 12.000000
llama_model_loader: - kv  22:                     granite.residual_scale f32              = 0.220000
llama_model_loader: - kv  23:                        granite.logit_scale f32              = 8.000000
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = refact
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,49155]   = ["<|end_of_text|>", "<fim_prefix>", "...
llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,49155]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,48891]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|start_of_r...
llama_model_loader: - kv  34:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q4_K:  241 tensors
llama_model_loader: - type q6_K:   41 tensors
time=2024-11-13T20:12:54.018+01:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.2826 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = granite
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 49155
llm_load_print_meta: n_merges         = 48891
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 40
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 8.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 2.63 B
llm_load_print_meta: model size       = 1.49 GiB (4.86 BPW)
llm_load_print_meta: general.name     = Granite 3.0 2b Instruct
llm_load_print_meta: BOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: EOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: PAD token        = 0 '<|end_of_text|>'
llm_load_print_meta: LF token         = 145 'Ä'
llm_load_print_meta: EOG token        = 0 '<|end_of_text|>'
llm_load_print_meta: max token length = 512
llm_load_print_meta: f_embedding_scale = 12.000000
llm_load_print_meta: f_residual_scale  = 0.220000
llm_load_print_meta: f_attention_scale = 0.015625
llm_load_tensors: ggml ctx size =    0.34 MiB
ggml_backend_metal_log_allocated_size: allocated buffer, size =  1472.06 MiB, ( 1472.12 / 21845.34)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors:        CPU buffer size =    54.00 MiB
llm_load_tensors:      Metal buffer size =  1472.05 MiB
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: using embedded metal library
time=2024-11-13T20:12:54.275+01:00 level=DEBUG source=server.go:607 msg="model load progress 1.00"
time=2024-11-13T20:12:54.526+01:00 level=DEBUG source=server.go:610 msg="model load completed, waiting for server to become available" status="llm server loading model"
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
llama_kv_cache_init:      Metal KV buffer size =   640.00 MiB
llama_new_context_with_model: KV self size  =  640.00 MiB, K (f16):  320.00 MiB, V (f16):  320.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.78 MiB
llama_new_context_with_model:      Metal compute buffer size =   544.00 MiB
llama_new_context_with_model:        CPU compute buffer size =    20.01 MiB
llama_new_context_with_model: graph nodes  = 1368
llama_new_context_with_model: graph splits = 2
time=2024-11-13T20:12:55.782+01:00 level=INFO source=server.go:601 msg="llama runner started in 2.78 seconds"
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
[GIN] 2024/11/13 - 20:12:55 | 200 |  2.802275167s |       127.0.0.1 | POST     "/api/generate"
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:466 msg="context for request finished"
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
time=2024-11-13T20:13:02.049+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:02.050+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:02.051+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=0 prompt=12 used=0 remaining=12
[GIN] 2024/11/13 - 20:13:04 | 200 |   2.21301825s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:04.257+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:04.257+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:04.257+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
time=2024-11-13T20:13:08.706+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:08.707+01:00 level=DEBUG source=server.go:955 msg="new runner detected, loading model for cgo tokenization"
llama_model_loader: loaded meta data with 35 key-value pairs and 363 tensors from /Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = granite
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Granite 3.0 2b Instruct
llama_model_loader: - kv   3:                           general.finetune str              = instruct
llama_model_loader: - kv   4:                           general.basename str              = granite-3.0
llama_model_loader: - kv   5:                         general.size_label str              = 2B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,3]       = ["language", "granite-3.0", "text-gen...
llama_model_loader: - kv   8:                        granite.block_count u32              = 40
llama_model_loader: - kv   9:                     granite.context_length u32              = 4096
llama_model_loader: - kv  10:                   granite.embedding_length u32              = 2048
llama_model_loader: - kv  11:                granite.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:               granite.attention.head_count u32              = 32
llama_model_loader: - kv  13:            granite.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                     granite.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  15:   granite.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 15
llama_model_loader: - kv  17:                         granite.vocab_size u32              = 49155
llama_model_loader: - kv  18:               granite.rope.dimension_count u32              = 64
llama_model_loader: - kv  19:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  20:                    granite.attention.scale f32              = 0.015625
llama_model_loader: - kv  21:                    granite.embedding_scale f32              = 12.000000
llama_model_loader: - kv  22:                     granite.residual_scale f32              = 0.220000
llama_model_loader: - kv  23:                        granite.logit_scale f32              = 8.000000
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = refact
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,49155]   = ["<|end_of_text|>", "<fim_prefix>", "...
llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,49155]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,48891]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|start_of_r...
llama_model_loader: - kv  34:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q4_K:  241 tensors
llama_model_loader: - type q6_K:   41 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.2826 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = granite
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 49155
llm_load_print_meta: n_merges         = 48891
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 2.63 B
llm_load_print_meta: model size       = 1.49 GiB (4.86 BPW)
llm_load_print_meta: general.name     = Granite 3.0 2b Instruct
llm_load_print_meta: BOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: EOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: PAD token        = 0 '<|end_of_text|>'
llm_load_print_meta: LF token         = 145 'Ä'
llm_load_print_meta: EOG token        = 0 '<|end_of_text|>'
llm_load_print_meta: max token length = 512
llm_load_print_meta: f_embedding_scale = 0.000000
llm_load_print_meta: f_residual_scale  = 0.000000
llm_load_print_meta: f_attention_scale = 0.000000
llama_model_load: vocab only - skipping tensors
time=2024-11-13T20:13:08.775+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:08.777+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=82 prompt=96 used=82 remaining=14
[GIN] 2024/11/13 - 20:13:08 | 200 |  265.054833ms |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:08.965+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:08.965+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:08.965+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
what time=2024-11-13T20:13:19.431+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:19.433+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:19.435+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=96 prompt=110 used=96 remaining=14
[GIN] 2024/11/13 - 20:13:21 | 200 |  1.742458875s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
time=2024-11-13T20:13:26.226+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:26.232+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>In programming, \"void\" is a keyword used to indicate that a function or method does not return a value. It is often used in languages like C, C++, and Java to define functions that perform an action but do not produce a result that can be assigned to a variable. In some languages, such as Python, the concept of \"void\" is not explicitly supported, but similar functionality can be achieved using the `None` value or by not returning anything from the function.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:26.234+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=207 prompt=221 used=207 remaining=14
[GIN] 2024/11/13 - 20:13:26 | 200 |  429.776042ms |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
```
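One detail visible in the `chat request` lines above: the blank answer to "what is python?" is replayed into later prompts as an empty assistant turn, i.e. `<|start_of_role|>assistant<|end_of_role|><|end_of_text|>` with nothing in between. A small hedged helper along these lines can scan a debug log for that pattern; the marker string is taken from the granite chat template as it appears in this log, and the script itself is not part of the original report:

```python
# Hedged helper (assumption, not from the report): scan OLLAMA_DEBUG=1
# `ollama serve` output for chat prompts that replay an empty assistant
# turn. The marker is the granite template's assistant header immediately
# followed by end-of-text, as seen in the log above.
import re
import sys

EMPTY_TURN = re.compile(
    r"<\|start_of_role\|>assistant<\|end_of_role\|><\|end_of_text\|>"
)

for lineno, line in enumerate(sys.stdin, 1):
    # Only look at the router's "chat request" debug lines, which include
    # the fully rendered prompt.
    if 'msg="chat request"' in line and EMPTY_TURN.search(line):
        print(f"log line {lineno}: prompt contains an empty assistant turn")
```

Feeding the log above through it (e.g. `python scan.py < serve.log`, filenames hypothetical) would flag the requests at 20:13:19 and 20:13:26, both of which carry the empty python turn in their history.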
It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>" time=2024-11-13T20:13:19.435+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=96 prompt=110 used=96 remaining=14 [GIN] 2024/11/13 - 20:13:21 | 200 | 1.742458875s | 127.0.0.1 | POST "/api/chat" time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:407 msg="context for request finished" time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0 time=2024-11-13T20:13:26.226+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce time=2024-11-13T20:13:26.232+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>In programming, \"void\" is a keyword used to indicate that a function or method does not return a value. It is often used in languages like C, C++, and Java to define functions that perform an action but do not produce a result that can be assigned to a variable. 
In some languages, such as Python, the concept of \"void\" is not explicitly supported, but similar functionality can be achieved using the `None` value or by not returning anything from the function.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>" time=2024-11-13T20:13:26.234+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=207 prompt=221 used=207 remaining=14 [GIN] 2024/11/13 - 20:13:26 | 200 | 429.776042ms | 127.0.0.1 | POST "/api/chat" time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:407 msg="context for request finished" time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0 ```
Author
Owner

@fbricon commented on GitHub (Nov 13, 2024):

Maybe @gabe-l-hart, @kellyaa or @bjhargrave can help?

Author
Owner

@jessegross commented on GitHub (Nov 13, 2024):

@fbricon @deboer-tim Are you also seeing this on Macs?

Author
Owner

@deboer-tim commented on GitHub (Nov 13, 2024):

Yes. I have an M1 Max with 32 GB; Fred is on an M3, I think.

Author
Owner

@fbricon commented on GitHub (Nov 13, 2024):

Yup, M3 Pro 36GB here

Author
Owner

@derek-assurity commented on GitHub (Nov 14, 2024):

Seeing the same thing on Linux with a non-Granite model; this is running `llama3.2:1b`:

ollama --version
ollama version is 0.4.1
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.484Z level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error"
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.488Z level=INFO source=runner.go:863 msg="starting go runner"
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.489Z level=INFO source=runner.go:864 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.489Z level=INFO source=.:0 msg="Server listening on 127.0.0.1:42781"
Nov 14 03:53:11  ollama[2812]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Nov 14 03:53:11  ollama[2812]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   1:                               general.type str              = model
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   5:                         general.size_label str              = 1B
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   8:                          llama.block_count u32              = 16
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  18:                          general.file_type u32              = 7
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  29:               general.quantization_version u32              = 2
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - type  f32:   34 tensors
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - type q8_0:  113 tensors
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.736Z level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model"
Nov 14 03:53:12  ollama[2812]: llm_load_vocab: special tokens cache size = 256
Nov 14 03:53:12  ollama[2812]: llm_load_vocab: token to piece cache size = 0.7999 MB
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: format           = GGUF V3 (latest)
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: arch             = llama
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: vocab type       = BPE
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_vocab          = 128256
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_merges         = 280147
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: vocab_only       = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_ctx_train      = 131072
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd           = 2048
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_layer          = 16
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_head           = 32
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_head_kv        = 8
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_rot            = 64
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_swa            = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_head_k    = 64
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_head_v    = 64
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_gqa            = 4
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_k_gqa     = 512
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_v_gqa     = 512
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_ff             = 8192
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_expert         = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_expert_used    = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: causal attn      = 1
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: pooling type     = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: rope type        = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: rope scaling     = linear
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: freq_base_train  = 500000.0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: freq_scale_train = 1
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_ctx_orig_yarn  = 131072
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: rope_finetuned   = unknown
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_d_conv       = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_d_inner      = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_d_state      = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_dt_rank      = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model type       = 1B
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model ftype      = Q8_0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model params     = 1.24 B
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW)
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: LF token         = 128 'Ä'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: max token length = 256
Nov 14 03:53:12  ollama[2812]: llm_load_tensors: ggml ctx size =    0.07 MiB
Nov 14 03:53:12  ollama[2812]: llm_load_tensors:        CPU buffer size =  1518.57 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: n_ctx      = 8192
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: n_batch    = 2048
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: n_ubatch   = 512
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: flash_attn = 0
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: freq_base  = 500000.0
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: freq_scale = 1
Nov 14 03:53:13  ollama[2812]: llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model:        CPU  output buffer size =     1.99 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model:        CPU compute buffer size =   544.01 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: graph nodes  = 518
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: graph splits = 1
Author
Owner

@jessegross commented on GitHub (Nov 14, 2024):

@derek-assurity Is that the full log? I don't see any requests coming in; if so, it may not be the same issue.

Author
Owner

@bjhargrave commented on GitHub (Nov 14, 2024):

I ran `OLLAMA_DEBUG=1 ollama serve` to get more output. Then ran `ollama run granite3-dense:2b` in another window.

> what is python?

time=2024-11-14T09:32:32.548-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:32:32.549-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:32:32.551-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=276 prompt=12 used=11 remaining=1
[GIN] 2024/11/14 - 09:32:37 | 200 |  4.513843834s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:32:37.048-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:32:37.048-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:32:37.048-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0

> what is java?

We can see the prompt includes the previous chat round (a minimal API-level repro is sketched after the log below).

time=2024-11-14T09:32:45.182-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:32:45.192-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Python is a high-level, interpreted programming language that was created by Guido van Rossum and first released in 1991. It is known for its simplicity and readability, which makes it a great choice for beginners. Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It has a large standard library and a vast ecosystem of third-party packages, making it suitable for a wide range of applications, such as web development, data analysis, machine learning, artificial intelligence, and scientific computing.\n\nSome key features of Python include:\n\n1. **Easy to learn and read**: Python's syntax is clean and easy to understand, which makes it an excellent choice for beginners and experienced programmers alike.\n2. **Cross-platform compatibility**: Python runs on various operating systems, including Windows, macOS, Linux, and others, without requiring any modifications to the code.\n3. **Rich standard library**: Python comes with a vast collection of built-in modules and functions that cover many common programming tasks, reducing the need for external libraries.\n4. **Large community and ecosystem**: Python has a massive and active community, which contributes to its continuous development and improvement. This also means you can find numerous resources, tutorials, and third-party packages to help you with your projects.\n5. **Versatile applications**: Python is used in various domains, such as web development (Django, Flask), data analysis (Pandas, NumPy), machine learning (TensorFlow, PyTorch), scientific computing (SciPy, Matplotlib), and automation tasks (Python's built-in libraries like os, shutil, and subprocess).\n\nIf you're interested in learning Python, there are many online resources and tutorials available to help you get started. Some popular ones include Codecademy, Udemy, Coursera, and the official Python documentation. Additionally, there are numerous books and e-books available for both beginners and advanced users.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:32:45.199-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=429 prompt=442 used=137 remaining=305
[GIN] 2024/11/14 - 09:32:51 | 200 |  6.233008583s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:32:51.407-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:32:51.407-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:32:51.407-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
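
For anyone trying to narrow this down, here is a minimal repro sketch against the local HTTP API. It assumes a default ollama server on 127.0.0.1:11434 with granite3-dense already pulled, and the question list is only illustrative; it carries the chat history forward the way `ollama run` does and flags the turn where the reply comes back empty:

```python
# Repro sketch (assumptions: local ollama on the default port, granite3-dense
# pulled; the questions are illustrative). Replays a multi-turn chat via
# /api/chat, carrying history forward like `ollama run`, and flags empty replies.
import requests

URL = "http://127.0.0.1:11434/api/chat"
messages = []

for question in ["what is python?", "what is java?", "what is ruby?"]:
    messages.append({"role": "user", "content": question})
    resp = requests.post(URL, json={
        "model": "granite3-dense",
        "messages": messages,
        "stream": False,
    }).json()
    answer = resp["message"]["content"]
    messages.append(resp["message"])  # keep the assistant turn in the history
    print(f"{question!r}: {len(answer)} chars, "
          f"prompt_eval_count={resp.get('prompt_eval_count')}")
    if not answer.strip():
        print("  empty reply; history length at failure:", len(messages))
```

On 0.4.1 this should go blank on the same turn the CLI does, while 0.3.x keeps answering.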

> what is ruby?

At this point, with the prompt including the previous chats, we may have blown the 4K context window of Granite 3.0. I also got no response here (a quick way to check this from the debug log is sketched after the log below).

time=2024-11-14T09:33:05.834-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:33:05.858-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Python is a high-level, interpreted programming language that was created by Guido van Rossum and first released in 1991. It is known for its simplicity and readability, which makes it a great choice for beginners. Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It has a large standard library and a vast ecosystem of third-party packages, making it suitable for a wide range of applications, such as web development, data analysis, machine learning, artificial intelligence, and scientific computing.\n\nSome key features of Python include:\n\n1. **Easy to learn and read**: Python's syntax is clean and easy to understand, which makes it an excellent choice for beginners and experienced programmers alike.\n2. **Cross-platform compatibility**: Python runs on various operating systems, including Windows, macOS, Linux, and others, without requiring any modifications to the code.\n3. **Rich standard library**: Python comes with a vast collection of built-in modules and functions that cover many common programming tasks, reducing the need for external libraries.\n4. **Large community and ecosystem**: Python has a massive and active community, which contributes to its continuous development and improvement. This also means you can find numerous resources, tutorials, and third-party packages to help you with your projects.\n5. **Versatile applications**: Python is used in various domains, such as web development (Django, Flask), data analysis (Pandas, NumPy), machine learning (TensorFlow, PyTorch), scientific computing (SciPy, Matplotlib), and automation tasks (Python's built-in libraries like os, shutil, and subprocess).\n\nIf you're interested in learning Python, there are many online resources and tutorials available to help you get started. Some popular ones include Codecademy, Udemy, Coursera, and the official Python documentation. Additionally, there are numerous books and e-books available for both beginners and advanced users.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, object-oriented programming language that was developed by Sun Microsystems (now owned by Oracle) in the early 1990s. It was initially designed to create interactive television systems but quickly evolved into a versatile platform for developing various applications. Java is known for its \"write once, run anywhere\" (WORA) principle, which means that compiled Java code can run on any device with a Java Runtime Environment (JRE), without the need for recompilation.\n\nSome key features of Java include:\n\n1. **Object-oriented programming**: Java follows the object-oriented programming paradigm, allowing you to create reusable and modular code using classes and objects.\n2. **Platform independence**: Java's WORA principle enables developers to write code that can run on multiple platforms without modification.\n3. **Strong memory management**: Java has built-in garbage collection, which automatically manages memory allocation and deallocation, reducing the risk of memory leaks and other issues.\n4. **Multithreading support**: Java provides built-in support for multithreading, allowing developers to create concurrent and responsive applications.\n5. 
**Large standard library**: Java comes with a rich set of libraries and tools, including the Java Collections Framework, which simplifies common programming tasks.\n6. **Extensive ecosystem**: Java has a vast ecosystem of third-party packages, frameworks, and libraries, such as Spring, Hibernate, and JavaFX, which can help you build complex applications more efficiently.\n7. **Strong community and support**: Java has a large and active community, with numerous resources, tutorials, and forums available to help developers learn and troubleshoot issues.\n\nJava is widely used in various domains, such as web development (Spring Framework, Hibernate), enterprise applications, mobile apps (Android), and embedded systems. If you're interested in learning Java, there are many online resources and tutorials available, including the official Oracle Java documentation, Codecademy, Udemy, Coursera, and numerous books and e-books.\n\nTo get started with Java, you'll need to install a Java Development Kit (JDK) on your computer. Once installed, you can use an Integrated Development Environment (IDE) like IntelliJ IDEA, Eclipse, or NetBeans to write, compile, and run your Java code.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:33:05.865-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=929 prompt=943 used=929 remaining=14
[GIN] 2024/11/14 - 09:33:06 | 200 |  179.072125ms |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:33:06.003-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:33:06.003-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:33:06.003-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
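
If it is useful, here is a rough sketch for checking the context theory straight from the `OLLAMA_DEBUG=1` output; the log filename is illustrative, and 4096 is the Granite 3.0 context window mentioned above:

```python
# Sketch: extract the cache-slot numbers from an OLLAMA_DEBUG=1 server log
# to see how close each request gets to the model's 4K training context.
# The filename is illustrative.
import re

PATTERN = re.compile(
    r'loading cache slot.*?cache=(\d+) prompt=(\d+) used=(\d+) remaining=(\d+)')
N_CTX_TRAIN = 4096  # Granite 3.0's trained context length

with open("ollama-serve.log") as f:
    for line in f:
        m = PATTERN.search(line)
        if m:
            cache, prompt, used, remaining = map(int, m.groups())
            print(f"prompt={prompt:5d} tokens "
                  f"({prompt / N_CTX_TRAIN:.1%} of n_ctx_train), "
                  f"used(cached)={used}, remaining={remaining}")
```

For what it is worth, the failing request above only reports prompt=943 tokens, well under 4096, so a hard context overflow may not be the whole story.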

> /clear
> what is ruby?

Clearing the session context gets me back to a smaller prompt and the model responds (a client-side history-trimming workaround is sketched after the log below).

time=2024-11-14T09:35:25.193-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:35:25.194-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:35:25.195-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=943 prompt=12 used=5 remaining=7
[GIN] 2024/11/14 - 09:35:29 | 200 |    4.0158345s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:35:29.194-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:35:29.194-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:35:29.194-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
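
Based on that, a possible stop-gap until the regression is tracked down is to trim history client-side, which mimics `/clear`. This sketch assumes the same local server and model; the one-pair window is arbitrary:

```python
# Workaround sketch based on the /clear observation: keep only the most
# recent user/assistant pairs when calling /api/chat so the prompt stays
# small. The window size is arbitrary; max_turns=0 behaves like /clear.
import requests

URL = "http://127.0.0.1:11434/api/chat"

def ask(question, history, max_turns=1):
    trimmed = history[-2 * max_turns:] if max_turns else []
    resp = requests.post(URL, json={
        "model": "granite3-dense",
        "messages": trimmed + [{"role": "user", "content": question}],
        "stream": False,
    }).json()
    history.append({"role": "user", "content": question})
    history.append(resp["message"])
    return resp["message"]["content"]

history = []
print(ask("what is python?", history))
print(ask("what is ruby?", history))
```

Obviously this loses conversational context, so it is only a mitigation, not a fix.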
Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

Alright, I'm going to dig into this this morning. I'm not able to repro with the 8b model, but I was able to repro quickly with 2b:

granite3-dense:8b

$ ollama pull granite3-dense:8b
pulling manifest
pulling 1a9a38833607... 100% ▕████████████████████████████████████████████████████████████████████████▏ 4.9 GB
pulling 604785e698e9... 100% ▕████████████████████████████████████████████████████████████████████████▏ 664 B
pulling 43070e2d4e53... 100% ▕████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling d77201a4f430... 100% ▕████████████████████████████████████████████████████████████████████████▏ 417 B
verifying sha256 digest
writing manifest
success
(torchchat2) ghart@Mac [torchchat TokenizersCpp-1251 ?~]$ ollama run granite3-dense:8b

>>> Write a python funciton to compute the fibonocci sequence
Here is a simple Python function that computes the Fibonacci sequence:

def fibonacci(n):
    if n <= 0:
        return "Input should be a positive integer"
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        a, b = 0, 1
        for _ in range(n - 2):
            a, b = b, a + b
        return b

This function takes an integer n as input and returns the nth number in the Fibonacci sequence. The Fibonacci sequence is a
series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1.

>>> Refactor that to golang
Here is the equivalent code in Go:

package main

import "fmt"

func fibonacci(n int) int {
    if n <= 0 {
        return -1 // Return an error value
    } else if n == 1 {
        return 0
    } else if n == 2 {
        return 1
    } else {
        a, b := 0, 1
        for i := 3; i <= n; i++ {
            a, b = b, a + b
        }
        return b
    }
}

func main() {
    fmt.Println(fibonacci(10)) // Replace 10 with the desired input
}

This Go function does the same thing as the Python function: it computes the nth number in the Fibonacci sequence. The function
returns an error value (-1) if the input is less than or equal to 0, and otherwise it returns the nth Fibonacci number.

>>> refactor that to c++
Here is the equivalent code in C++:

#include <iostream>

int fibonacci(int n) {
    if (n <= 0) {
        throw std::invalid_argument("Input should be a positive integer");
    } else if (n == 1) {
        return 0;
    } else if (n == 2) {
        return 1;
    } else {
        int a = 0, b = 1;
        for (int i = 3; i <= n; i++) {
            int temp = a + b;
            a = b;
            b = temp;
        }
        return b;
    }
}

int main() {
    try {
        std::cout << fibonacci(10) << std::endl; // Replace 10 with the desired input
    } catch (const std::invalid_argument& e) {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

This C++ function does the same thing as the Python and Go functions: it computes the nth number in the Fibonacci sequence. The
function throws an exception if the input is less than or equal to 0, and otherwise it returns the nth Fibonacci number.

>>> now write it in rust
Here is the equivalent code in Rust:

fn fibonacci(n: u32) -> Result<u64, &'static str> {
    if n <= 0 {
        return Err("Input should be a positive integer");
    } else if n == 1 {
        return Ok(0);
    } else if n == 2 {
        return Ok(1);
    } else {
        let mut a = 0;
        let mut b = 1;
        for _ in 3..=n {
            let temp = a + b;
            a = b;
            b = temp;
        }
        Ok(b)
    }
}

fn main() {
    match fibonacci(10) {
        Ok(result) => println!("{}", result),
        Err(e) => println!("{}", e),
    }
}

This Rust function does the same thing as the Python, Go, and C++ functions: it computes the nth number in the Fibonacci
sequence. The function returns a Result type, which can be either an Ok value containing the nth Fibonacci number or an
Err value containing an error message. The function returns an error if the input is less than or equal to 0, and otherwise it
returns the nth Fibonacci number.

>>> let's add java to the mix
Here is the equivalent code in Java:

public class Main {
    public static int fibonacci(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("Input should be a positive integer");
        } else if (n == 1) {
            return 0;
        } else if (n == 2) {
            return 1;
        } else {
            int a = 0, b = 1;
            for (int i = 3; i <= n; i++) {
                int temp = a + b;
                a = b;
                b = temp;
            }
            return b;
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(fibonacci(10)); // Replace 10 with the desired input
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}

This Java function does the same thing as the Python, Go, C++, and Rust functions: it computes the nth number in the Fibonacci
sequence. The function throws an exception if the input is less than or equal to 0, and otherwise it returns the nth Fibonacci
number.

>>> typescript too?
Here is the equivalent code in TypeScript:

function fibonacci(n: number): number | string {
    if (n <= 0) {
        return "Input should be a positive integer";
    } else if (n === 1) {
        return 0;
    } else if (n === 2) {
        return 1;
    } else {
        let a = 0, b = 1;
        for (let i = 3; i <= n; i++) {
            const temp = a + b;
            a = b;
            b = temp;
        }
        return b;
    }
}

const result = fibonacci(10); // Replace 10 with the desired input
console.log(result);

This TypeScript function does the same thing as the Python, Go, C++, Rust, and Java functions: it computes the nth number in the
Fibonacci sequence. The function returns a number or a string, depending on whether the input is valid or not. If the input is
less than or equal to 0, the function returns a string containing an error message. Otherwise, it returns the nth Fibonacci
number.

>>> oooh, here's a good one: do elixir next!
Here is the equivalent code in Elixir:

defmodule Fibonacci do
  def fib(n) when n <= 0 do
    {:error, "Input should be a positive integer"}
  end

  def fib(1), do: 0
  def fib(2), do: 1
  def fib(n), do: fib_helper(n, 0, 1)

  defp fib_helper(n, a, b) when n == 3 do
    b
  end

  defp fib_helper(n, a, b) do
    fib_helper(n - 1, b, a + b)
  end
end

result = Fibonacci.fib(10) # Replace 10 with the desired input
IO.puts result

This Elixir code defines a module called Fibonacci with a single public function fib/1 that computes the nth number in the
Fibonacci sequence. The function uses pattern matching to handle different cases: if the input is less than or equal to 0, it
returns an error tuple; otherwise, it calls a helper function fib_helper/3 to compute the nth Fibonacci number.

The fib_helper/3 function is a recursive function that takes three arguments: the current value of n, the previous Fibonacci
number a, and the current Fibonacci number b. The function uses pattern matching to handle different cases: if n is equal to
3, it returns b; otherwise, it calls itself with the updated values of n, a, and b.

Finally, the code calls the fib/1 function with the desired input and prints the result using the IO.puts/1 function.

>>> Alright alright, I hear you know cobol too. Do cobol next
I'm sorry for any confusion, but as a helpful assistant, I can provide information and assistance on a wide range of programming
languages, including popular ones like Python, Java, JavaScript, C++, and others. However, I don't have specific knowledge or
capabilities related to COBOL (Common Business-Oriented Language), which is a older programming language primarily used for
business applications.

While I can certainly help you understand the basics of COBOL and provide some examples of how it works, I won't be able to
provide more advanced assistance or write code in COBOL for you. If you have any questions about COBOL or need help understanding
its concepts, I'd be happy to assist you with that!

>>> Nice! I stumped you finally. How about c#. can you do C#?
Yes, I can certainly help you with C# (C Sharp)! Here's an example of how you might write a function to compute the nth number
in the Fibonacci sequence in C#:

using System;

public class Fibonacci
{
    public static int Fibonacci(int n)
    {
        if (n <= 0)
        {
            throw new ArgumentException("Input should be a positive integer");
        }
        else if (n == 1)
        {
            return 0;
        }
        else if (n == 2)
        {
            return 1;
        }
        else
        {
            int a = 0;
            int b = 1;
            for (int i = 3; i <= n; i++)
            {
                int temp = a + b;
                a = b;
                b = temp;
            }
            return b;
        }
    }

    public static void Main()
    {
        int result = Fibonacci(10); // Replace 10 with the desired input
        Console.WriteLine(result);
    }
}

This C# code defines a class called Fibonacci with a single public method Fibonacci/int that computes the nth number in the
Fibonacci sequence. The method uses an if-else statement to handle different cases: if the input is less than or equal to 0, it
throws an exception; otherwise, it uses a loop to compute the nth Fibonacci number and returns the result.

The Main/void method is the entry point of the program, which calls the Fibonacci/int method with the desired input and prints
the result using the Console.WriteLine/string function.

>>> let's see about some gorpy old web programming: do php next
Sure! Here's an example of how you might write a PHP script to compute the nth number in the Fibonacci sequence and display it
on a web page:

<?php
function fibonacci($n) {
    if ($n <= 0) {
        throw new Exception("Input should be a positive integer");
    } elseif ($n === 1) {
        return 0;
    } elseif ($n === 2) {
        return 1;
    } else {
        $a = 0;
        $b = 1;
        for ($i = 3; $i <= $n; $i++) {
            $temp = $a + $b;
            $a = $b;
            $b = $temp;
        }
        return $b;
    }
}

$n = 10; // Replace 10 with the desired input
try {
    $result = fibonacci($n);
    echo "The $n-th number in the Fibonacci sequence is: $result";
} catch (Exception $e) {
    echo $e->getMessage();
}
?>

This PHP script defines a function fibonacci/int that computes the nth number in the Fibonacci sequence, similar to the
previous examples. The function uses an if-else statement to handle different cases and throws an exception if the input is less
than or equal to 0.

The script then calls the fibonacci/int function with the desired input and displays the result using the echo/string
statement. If an exception is thrown, the script catches it and displays the error message using the catch/Exception block.

>>> can you do a minified javascript version?
Sure! Here's a minified version of the JavaScript code I provided earlier:

function f(n){if(n<=0)throw new Error("Input should be a positive integer");else if(n===1)return 0;else if(n===2)return 1;else{let 
a=0,b=1;for(let i=3;i<=n;++i){[a,b]=[b,a+b];}return b;}}
try{console.log("The "+f(10)+"-th number in the Fibonacci sequence is: ",f(10));}catch(e){console.error(e.message);}

This version of the code uses shorthand syntax and omits some whitespace to make it more compact. It still defines a function
f/int that computes the nth number in the Fibonacci sequence, and then calls that function with the input 10 and logs the
result to the console using console.log/string. If an exception is thrown, the script catches it and logs the error message to
the console using console.error/string.

granite3-dense

$ ollama run granite3-dense

>>> Write a python funciton to compute the fibonocci sequence
Here is a simple Python function that computes the Fibonacci sequence:

def fibonacci(n):
    if n <= 0:
        return "Input should be a positive integer."
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        a, b = 0, 1
        for i in range(2, n):
            a, b = b, a + b
        return b

This function takes an integer n as input and returns the nth number in the Fibonacci sequence. The Fibonacci sequence is a
series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1.

>>> Refactor that to golang

>>>
Author
Owner

@fbricon commented on GitHub (Nov 14, 2024):

Yup it fails right on the 2nd request.

/// why is the sky blue?
The sky appears blue because of a process called Rayleigh scattering. When the sun's light reaches Earth's atmosphere, it collides with molecules and particles in the air.
Blue light is scattered in all directions more than other colors because it travels in shorter, smaller waves. This scattered blue light is what we see when we look up at
the sky.

/// what is java?

I don't believe we're reaching the context length on that next request. The fact the same model works fine in 0.3.14 tells me there's something really fishy in ollama 0.4.x

Author
Owner

@deboer-tim commented on GitHub (Nov 14, 2024):

+1. Sometimes goes longer, but I've had it fail on the first request a few times.

Author
Owner

@fbricon commented on GitHub (Nov 14, 2024):

The :8b model failed for me on the 4th request on 0.4.1. I tried 0.3.14 on a session spanning multiple Q&As (way over 4k tokens overall) and it worked fine.

Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

I was able to repro on v0.4.0 as well, so it's not in the v0.4.1 delta.

Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

Ok, in trying to isolate this, I'm hitting the ollama_llama_runner subprocess directly. It's definitely something that can be repro'ed without the client-side code in the main ollama server.

req1.json
{"prompt":"\u003c|start_of_role|\u003euser\u003c|end_of_role|\u003ewhat is python\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003eassistant\u003c|end_of_role|\u003e","image_data":null,"grammar":"","cache_prompt":true,"num_ctx":2048,"num_batch":512,"num_gpu":-1,"n_keep":4,"seed":-1,"n_predict":81920,"top_k":40,"top_p":0.9,"min_p":0,"tfs_z":1,"typical_p":1,"repeat_last_n":64,"temperature":0.8,"repeat_penalty":1.1,"presence_penalty":0,"frequency_penalty":0,"mirostat":0,"mirostat_tau":5,"mirostat_eta":0.1,"penalize_nl":true,"stop":null}
req2.json
{"prompt":"\u003c|start_of_role|\u003euser\u003c|end_of_role|\u003ewhat is python\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003eassistant\u003c|end_of_role|\u003ePython is a high-level, interpreted programming language that was created by Guido van Rossum and first released in 1991. It's known for its simplicity and readability, which makes it a great language for beginners. Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It's widely used in various domains such as web development, data analysis, machine learning, artificial intelligence, and scientific computing.\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003euser\u003c|end_of_role|\u003ewhat is node\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003eassistant\u003c|end_of_role|\u003e","image_data":null,"grammar":"","cache_prompt":true,"num_ctx":2048,"num_batch":512,"num_gpu":-1,"n_keep":4,"seed":-1,"n_predict":81920,"top_k":40,"top_p":0.9,"min_p":0,"tfs_z":1,"typical_p":1,"repeat_last_n":64,"temperature":0.8,"repeat_penalty":1.1,"presence_penalty":0,"frequency_penalty":0,"mirostat":0,"mirostat_tau":5,"mirostat_eta":0.1,"penalize_nl":true,"stop":null}

NOTE: I'm using a local build and redirecting to a temp "install" with OLLAMA_HOST, OLLAMA_MODELS, and OLLAMA_TMPDIR

# Terminal 1
./ollama serve

# Terminal 2
./ollama run granite3-dense

# Terminal 3
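# (the ps/grep/sed pipeline below pulls the runner's port number out of its command line)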
curl -X POST http://localhost:$(ps aux | grep ollama_llama_server | grep -v grep | sed 's,.*port ,,g')/completion -H "Content-Type: application/json" -d @req1.json

curl -X POST http://localhost:$(ps aux | grep ollama_llama_server | grep -v grep | sed 's,.*port ,,g')/completion -H "Content-Type: application/json" -d @req2.json
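
For repeated runs (e.g. the repeated-sends experiment mentioned in the next comment), a small driver like the following is convenient — a sketch of mine, not part of the original report; the port argument is whatever the ps/grep pipeline above prints, and the script simply reports how many bytes each call streams back, so a blank follow-up answer shows up as a near-empty body:

```go
// repro.go: POST the same completion request to the runner N times and
// report the response sizes; a bad run shows up as a near-empty body.
// Usage (port found via the ps/grep pipeline above): go run repro.go <port> req2.json
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	if len(os.Args) != 3 {
		log.Fatal("usage: repro <runner-port> <request.json>")
	}
	body, err := os.ReadFile(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}
	url := "http://localhost:" + os.Args[1] + "/completion"
	for i := 0; i < 20; i++ {
		resp, err := http.Post(url, "application/json", bytes.NewReader(body))
		if err != nil {
			log.Fatal(err)
		}
		out, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("run %2d: %d bytes streamed\n", i, len(out))
	}
}
```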
Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

Some more data points:

  • If I manually run a different GGUF file (e.g. llama3.1), I am not able to repro the failure at all, so this does seem to be specific to granite (and possibly granitemoe, but haven't confirmed)
  • If I make the calls to the runner with "cache_prompt": false, it's much easier to trigger the repro by just sending the same request over-and-over
  • If I set the number of GPU layers below the max (i.e. keep the last layer off the GPU), I'm not yet able to repro
Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

I'm unable to repro this with the full-precision version ollama run granite3-dense:2b-instruct-fp16, so that points further towards a loss of precision in one of the matrix ops in llama.cpp.

Author
Owner

@fbricon commented on GitHub (Nov 18, 2024):

FTR, I still see the same issue with ollama 0.4.2

Author
Owner

@gabe-l-hart commented on GitHub (Nov 19, 2024):

I spent most of today digging into this, but still haven't found the root cause. Some observations from today:

  • You can repro the issue by running the ollama_llama_server (the "runner") directly, so it's definitely not something in the parent server
  • When run with --batch-size set to 1, 2, or 10, the problem "magically" goes away, but any other value exhibits the problem of randomly producing a bad EOS on the first token
  • When a bad EOS is produced, the logits are exactly the same as the first token on "good" results except that the value in index 0 (the value of EOS for Granite) looks like an uninitialized value (some random very large number)
    • The random very-large number is static across all occurrences of "bad" calls for a given running server, but bringing the server down and back up changes the value of the random big number
  • I was able to repro this with the version of the ollama_llama_server built off of 0.3.14 (NB: In this release, the go-based runner existed but was disabled by default)
    • This means it's NOT something that was changed in the go-based server between 0.3.14 and 0.4.0 (i.e. it was not added with support for mllama which was the most invasive changeset there)
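
(Editorial note: the "uninitialized value" observation suggests a cheap way to spot the bad case. Below is a minimal, self-contained sketch — all names mine, not code from the ollama runner — of the kind of check that would flag a garbage EOS logit: NaN, Inf, or a magnitude wildly out of range of the rest of the vector.)

```go
// logitcheck.go: flag an uninitialized-looking EOS logit. For Granite,
// EOS is token id 0, per the observation above. Hypothetical helper,
// not code from the ollama runner.
package main

import (
	"fmt"
	"math"
)

func suspiciousEOS(logits []float32, eosID int) bool {
	v := float64(logits[eosID])
	if math.IsNaN(v) || math.IsInf(v, 0) {
		return true
	}
	// compare against the largest magnitude among the other logits
	var max float64
	for i, l := range logits {
		if i == eosID {
			continue
		}
		if a := math.Abs(float64(l)); a > max {
			max = a
		}
	}
	return math.Abs(v) > 100*max // real logits are O(10); garbage is astronomically larger
}

func main() {
	good := []float32{2.1, -3.4, 0.7, 5.2}
	bad := []float32{3.4e11, -3.4, 0.7, 5.2} // index 0 looks like memory garbage
	fmt.Println(suspiciousEOS(good, 0), suspiciousEOS(bad, 0)) // false true
}
```

When a bad value at index 0 wins sampling, the model emits EOS as its very first token, which is exactly the blank-response symptom reported above.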

Reference: github-starred/ollama#30646