[GH-ISSUE #7656] ollama 0.4 stops answering on granite3-dense but works with 0.3 #30646

Closed
opened 2026-04-22 10:30:37 -05:00 by GiteaMirror · 22 comments

Originally created by @maxandersen on GitHub (Nov 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7656

### What is the issue?

With ollama 0.3.x, running `ollama run granite-3-dense` I can ask multiple questions and get responses.

With ollama 0.4.1, I get an answer to the first question, but then mostly blank responses to any follow-up.

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

0.4.1

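To make the follow-up failure easier to reproduce outside the interactive CLI, here is a minimal sketch (not part of the original report) that drives the same kind of multi-turn chat through the local REST API. It assumes a server on the default `127.0.0.1:11434` and the `granite3-dense` tag; the questions mirror the transcripts later in this thread.

```python
# Hedged repro sketch (assumption, not from the report): replay a
# multi-turn chat against a local ollama server and flag blank
# follow-up answers. Assumes the default endpoint and model tag.
import json
import urllib.request

BASE = "http://127.0.0.1:11434"

def post_chat(messages):
    body = json.dumps({
        "model": "granite3-dense",
        "messages": messages,
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()
    req = urllib.request.Request(f"{BASE}/api/chat", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Print the server version first, since the bug tracks 0.3.x vs 0.4.x.
with urllib.request.urlopen(f"{BASE}/api/version") as resp:
    print("server version:", json.load(resp)["version"])

messages = []
for question in ["what is java?", "what is python?", "what is ruby?"]:
    messages.append({"role": "user", "content": question})
    answer = post_chat(messages)
    messages.append({"role": "assistant", "content": answer})
    print(f">>> {question}\n{answer.strip() or '(blank response)'}\n")
```

Based on the transcripts below, on 0.4.1 one of the later turns should print `(blank response)`, while on 0.3.x every turn gets an answer.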
GiteaMirror added the bug label 2026-04-22 10:30:37 -05:00

@fbricon commented on GitHub (Nov 13, 2024):

Same here


@maxandersen commented on GitHub (Nov 13, 2024):

example run:

```
ollama run granite3-dense
>>> what is void?
In programming, "void" is a keyword that indicates a function or method doesn't return any value. It's used to define functions that perform actions
but don't produce a result that can be assigned to a variable.

>>> what is python?
Python is a high-level, interpreted programming language known for its readability and simplicity. It supports multiple programming paradigms,
including procedural, object-oriented, and functional programming. Python is widely used for web development, data analysis, machine learning,
artificial intelligence, and scientific computing.

>>> what is java?
Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It
is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can
run on all platforms that support Java without the need for recompilation.

>>> what is ruby?


>>> what is xxx
I'm sorry, I didn't understand your question. Could you please provide more context or clarify what you're asking about? I'm here to help with
information on programming languages and related topics.

>>> what is ruby?


>>> what is python?


```
with 0.3 this does not seem to happen.


@deboer-tim commented on GitHub (Nov 13, 2024):

... and same here. Sometimes it looks OK for a while on v0.4 but then gives no reply to some questions; other times it just 'hangs' after the first few questions and there are no more responses. On 0.3 I can chat for a long time with no obvious failures.


@maxandersen commented on GitHub (Nov 13, 2024):

here is the output and debug log for a run similar to the one above:

```
ollama run granite3-dense
>>> what is java?
Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It
is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can
run on all platforms that support Java without the need for recompilation.

>>> what is python?


>>> what is void?
In programming, "void" is a keyword used to indicate that a function or method does not return a value. It is often used in languages like C, C++,
and Java to define functions that perform an action but do not produce a result that can be assigned to a variable. In some languages, such as
Python, the concept of "void" is not explicitly supported, but similar functionality can be achieved using the `None` value or by not returning
anything from the function.

>>> what is ruby?


>>> Send a message (/? for help)
```

debug log:

```
OLLAMA_DEBUG=1 ollama serve
2024/11/13 20:12:27 routes.go:1189: INFO server config env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/manderse/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: http_proxy: https_proxy: no_proxy:]"
time=2024-11-13T20:12:27.664+01:00 level=INFO source=images.go:755 msg="total blobs: 42"
time=2024-11-13T20:12:27.666+01:00 level=INFO source=images.go:762 msg="total unused blobs removed: 0"
time=2024-11-13T20:12:27.668+01:00 level=INFO source=routes.go:1240 msg="Listening on 127.0.0.1:11434 (version 0.4.1)"
time=2024-11-13T20:12:27.670+01:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners
time=2024-11-13T20:12:27.671+01:00 level=DEBUG source=common.go:168 msg=extracting runner=metal payload=darwin/arm64/metal/ollama_llama_server.gz
time=2024-11-13T20:12:27.741+01:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server
time=2024-11-13T20:12:27.741+01:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[metal]
time=2024-11-13T20:12:27.741+01:00 level=DEBUG source=common.go:50 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-11-13T20:12:27.741+01:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-11-13T20:12:27.812+01:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="21.3 GiB" available="21.3 GiB"
[GIN] 2024/11/13 - 20:12:52 | 200 |    1.187083ms |       127.0.0.1 | HEAD     "/"
[GIN] 2024/11/13 - 20:12:52 | 200 |    9.720459ms |       127.0.0.1 | POST     "/api/show"
time=2024-11-13T20:12:52.986+01:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x100d5b460 gpu_count=1
time=2024-11-13T20:12:52.994+01:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:12:52.994+01:00 level=DEBUG source=memory.go:107 msg=evaluating library=metal gpu_count=1 available="[21.3 GiB]"
time=2024-11-13T20:12:52.995+01:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce gpu=0 parallel=4 available=22906503168 required="3.0 GiB"
time=2024-11-13T20:12:52.995+01:00 level=INFO source=server.go:105 msg="system memory" total="32.0 GiB" free="7.6 GiB" free_swap="0 B"
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=memory.go:107 msg=evaluating library=metal gpu_count=1 available="[21.3 GiB]"
time=2024-11-13T20:12:52.995+01:00 level=INFO source=memory.go:343 msg="offload to metal" layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[21.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.0 GiB" memory.required.partial="3.0 GiB" memory.required.kv="640.0 MiB" memory.required.allocations="[3.0 GiB]" memory.weights.total="2.0 GiB" memory.weights.repeating="1.9 GiB" memory.weights.nonrepeating="78.8 MiB" memory.graph.full="426.7 MiB" memory.graph.partial="426.7 MiB"
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=common.go:168 msg=extracting runner=metal payload=darwin/arm64/metal/ollama_llama_server.gz
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server
time=2024-11-13T20:12:52.995+01:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server
time=2024-11-13T20:12:52.997+01:00 level=INFO source=server.go:383 msg="starting llama server" cmd="/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal/ollama_llama_server --model /Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce --ctx-size 8192 --batch-size 512 --n-gpu-layers 41 --verbose --threads 8 --parallel 4 --port 53441"
time=2024-11-13T20:12:52.997+01:00 level=DEBUG source=server.go:400 msg=subprocess environment="[PATH=/Users/manderse/.rvm/gems/ruby-2.7.1/bin:/Users/manderse/.rvm/gems/ruby-2.7.1@global/bin:/Users/manderse/.rvm/rubies/ruby-2.7.1/bin:/Users/manderse/Library/pnpm:/Users/manderse/.jbang/bin:/Users/manderse/bin/google-cloud-sdk/bin:/Users/manderse/.docker/bin:/Users/manderse/perl5/bin:/Users/manderse/Library/Python/3.10/bin:/Users/manderse/.yarn/bin:/Users/manderse/.config/yarn/global/node_modules/.bin:/Users/manderse/.sdkman/candidates/quarkus/current/bin:/Users/manderse/.sdkman/candidates/mvnd/current/bin:/Users/manderse/.sdkman/candidates/maven/current/bin:/Users/manderse/.sdkman/candidates/kscript/current/bin:/Users/manderse/.sdkman/candidates/kotlin/current/bin:/Users/manderse/.sdkman/candidates/jbang/current/bin:/Users/manderse/.sdkman/candidates/java/current/bin:/Users/manderse/.sdkman/candidates/gradle/current/bin:/Users/manderse/.local/bin:/Users/manderse/.krew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/manderse/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Applications/Little Snitch.app/Contents/Components:/usr/local/munki:/opt/podman/bin:/Applications/iTerm.app/Contents/Resources/utilities:/Users/manderse/.rvm/bin LD_LIBRARY_PATH=/var/folders/mm/z7zzmyl15bd8byr8vsdxf1740000gn/T/ollama421280676/runners/metal]"
time=2024-11-13T20:12:53.003+01:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-11-13T20:12:53.004+01:00 level=INFO source=server.go:562 msg="waiting for llama runner to start responding"
time=2024-11-13T20:12:53.004+01:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error"
time=2024-11-13T20:12:53.960+01:00 level=INFO source=runner.go:863 msg="starting go runner"
time=2024-11-13T20:12:53.961+01:00 level=INFO source=runner.go:864 msg=system info="AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = -1 | SVE = -1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = -1 | LLAMAFILE = 1 | cgo(clang)" threads=8
time=2024-11-13T20:12:53.961+01:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:53441"
llama_model_loader: loaded meta data with 35 key-value pairs and 363 tensors from /Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = granite
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Granite 3.0 2b Instruct
llama_model_loader: - kv   3:                           general.finetune str              = instruct
llama_model_loader: - kv   4:                           general.basename str              = granite-3.0
llama_model_loader: - kv   5:                         general.size_label str              = 2B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,3]       = ["language", "granite-3.0", "text-gen...
llama_model_loader: - kv   8:                        granite.block_count u32              = 40
llama_model_loader: - kv   9:                     granite.context_length u32              = 4096
llama_model_loader: - kv  10:                   granite.embedding_length u32              = 2048
llama_model_loader: - kv  11:                granite.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:               granite.attention.head_count u32              = 32
llama_model_loader: - kv  13:            granite.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                     granite.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  15:   granite.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 15
llama_model_loader: - kv  17:                         granite.vocab_size u32              = 49155
llama_model_loader: - kv  18:               granite.rope.dimension_count u32              = 64
llama_model_loader: - kv  19:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  20:                    granite.attention.scale f32              = 0.015625
llama_model_loader: - kv  21:                    granite.embedding_scale f32              = 12.000000
llama_model_loader: - kv  22:                     granite.residual_scale f32              = 0.220000
llama_model_loader: - kv  23:                        granite.logit_scale f32              = 8.000000
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = refact
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,49155]   = ["<|end_of_text|>", "<fim_prefix>", "...
llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,49155]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,48891]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|start_of_r...
llama_model_loader: - kv  34:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q4_K:  241 tensors
llama_model_loader: - type q6_K:   41 tensors
time=2024-11-13T20:12:54.018+01:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.2826 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = granite
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 49155
llm_load_print_meta: n_merges         = 48891
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 40
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 8.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 2.63 B
llm_load_print_meta: model size       = 1.49 GiB (4.86 BPW)
llm_load_print_meta: general.name     = Granite 3.0 2b Instruct
llm_load_print_meta: BOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: EOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: PAD token        = 0 '<|end_of_text|>'
llm_load_print_meta: LF token         = 145 'Ä'
llm_load_print_meta: EOG token        = 0 '<|end_of_text|>'
llm_load_print_meta: max token length = 512
llm_load_print_meta: f_embedding_scale = 12.000000
llm_load_print_meta: f_residual_scale  = 0.220000
llm_load_print_meta: f_attention_scale = 0.015625
llm_load_tensors: ggml ctx size =    0.34 MiB
ggml_backend_metal_log_allocated_size: allocated buffer, size =  1472.06 MiB, ( 1472.12 / 21845.34)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors:        CPU buffer size =    54.00 MiB
llm_load_tensors:      Metal buffer size =  1472.05 MiB
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: using embedded metal library
time=2024-11-13T20:12:54.275+01:00 level=DEBUG source=server.go:607 msg="model load progress 1.00"
time=2024-11-13T20:12:54.526+01:00 level=DEBUG source=server.go:610 msg="model load completed, waiting for server to become available" status="llm server loading model"
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
llama_kv_cache_init:      Metal KV buffer size =   640.00 MiB
llama_new_context_with_model: KV self size  =  640.00 MiB, K (f16):  320.00 MiB, V (f16):  320.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.78 MiB
llama_new_context_with_model:      Metal compute buffer size =   544.00 MiB
llama_new_context_with_model:        CPU compute buffer size =    20.01 MiB
llama_new_context_with_model: graph nodes  = 1368
llama_new_context_with_model: graph splits = 2
time=2024-11-13T20:12:55.782+01:00 level=INFO source=server.go:601 msg="llama runner started in 2.78 seconds"
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
[GIN] 2024/11/13 - 20:12:55 | 200 |  2.802275167s |       127.0.0.1 | POST     "/api/generate"
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:466 msg="context for request finished"
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:12:55.782+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
time=2024-11-13T20:13:02.049+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:02.050+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:02.051+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=0 prompt=12 used=0 remaining=12
[GIN] 2024/11/13 - 20:13:04 | 200 |   2.21301825s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:04.257+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:04.257+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:04.257+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
time=2024-11-13T20:13:08.706+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:08.707+01:00 level=DEBUG source=server.go:955 msg="new runner detected, loading model for cgo tokenization"
llama_model_loader: loaded meta data with 35 key-value pairs and 363 tensors from /Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = granite
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Granite 3.0 2b Instruct
llama_model_loader: - kv   3:                           general.finetune str              = instruct
llama_model_loader: - kv   4:                           general.basename str              = granite-3.0
llama_model_loader: - kv   5:                         general.size_label str              = 2B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,3]       = ["language", "granite-3.0", "text-gen...
llama_model_loader: - kv   8:                        granite.block_count u32              = 40
llama_model_loader: - kv   9:                     granite.context_length u32              = 4096
llama_model_loader: - kv  10:                   granite.embedding_length u32              = 2048
llama_model_loader: - kv  11:                granite.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:               granite.attention.head_count u32              = 32
llama_model_loader: - kv  13:            granite.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                     granite.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  15:   granite.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 15
llama_model_loader: - kv  17:                         granite.vocab_size u32              = 49155
llama_model_loader: - kv  18:               granite.rope.dimension_count u32              = 64
llama_model_loader: - kv  19:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  20:                    granite.attention.scale f32              = 0.015625
llama_model_loader: - kv  21:                    granite.embedding_scale f32              = 12.000000
llama_model_loader: - kv  22:                     granite.residual_scale f32              = 0.220000
llama_model_loader: - kv  23:                        granite.logit_scale f32              = 8.000000
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = refact
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,49155]   = ["<|end_of_text|>", "<fim_prefix>", "...
llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,49155]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,48891]   = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 0
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|start_of_r...
llama_model_loader: - kv  34:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q4_K:  241 tensors
llama_model_loader: - type q6_K:   41 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.2826 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = granite
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 49155
llm_load_print_meta: n_merges         = 48891
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 2.63 B
llm_load_print_meta: model size       = 1.49 GiB (4.86 BPW)
llm_load_print_meta: general.name     = Granite 3.0 2b Instruct
llm_load_print_meta: BOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: EOS token        = 0 '<|end_of_text|>'
llm_load_print_meta: PAD token        = 0 '<|end_of_text|>'
llm_load_print_meta: LF token         = 145 'Ä'
llm_load_print_meta: EOG token        = 0 '<|end_of_text|>'
llm_load_print_meta: max token length = 512
llm_load_print_meta: f_embedding_scale = 0.000000
llm_load_print_meta: f_residual_scale  = 0.000000
llm_load_print_meta: f_attention_scale = 0.000000
llama_model_load: vocab only - skipping tensors
time=2024-11-13T20:13:08.775+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:08.777+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=82 prompt=96 used=82 remaining=14
[GIN] 2024/11/13 - 20:13:08 | 200 |  265.054833ms |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:08.965+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:08.965+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:08.965+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
what time=2024-11-13T20:13:19.431+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:19.433+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:19.435+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=96 prompt=110 used=96 remaining=14
[GIN] 2024/11/13 - 20:13:21 | 200 |  1.742458875s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
time=2024-11-13T20:13:26.226+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-13T20:13:26.232+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>In programming, \"void\" is a keyword used to indicate that a function or method does not return a value. It is often used in languages like C, C++, and Java to define functions that perform an action but do not produce a result that can be assigned to a variable. In some languages, such as Python, the concept of \"void\" is not explicitly supported, but similar functionality can be achieved using the `None` value or by not returning anything from the function.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-13T20:13:26.234+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=207 prompt=221 used=207 remaining=14
[GIN] 2024/11/13 - 20:13:26 | 200 |  429.776042ms |       127.0.0.1 | POST     "/api/chat"
time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
```
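One detail visible in the `chat request` lines above: the blank answer to "what is python?" is replayed into later prompts as an empty assistant turn, i.e. `<|start_of_role|>assistant<|end_of_role|><|end_of_text|>` with nothing in between. A small hedged helper along these lines can scan a debug log for that pattern; the marker string is taken from the granite chat template as it appears in this log, and the script itself is not part of the original report:

```python
# Hedged helper (assumption, not from the report): scan OLLAMA_DEBUG=1
# `ollama serve` output for chat prompts that replay an empty assistant
# turn. The marker is the granite template's assistant header immediately
# followed by end-of-text, as seen in the log above.
import re
import sys

EMPTY_TURN = re.compile(
    r"<\|start_of_role\|>assistant<\|end_of_role\|><\|end_of_text\|>"
)

for lineno, line in enumerate(sys.stdin, 1):
    # Only look at the router's "chat request" debug lines, which include
    # the fully rendered prompt.
    if 'msg="chat request"' in line and EMPTY_TURN.search(line):
        print(f"log line {lineno}: prompt contains an empty assistant turn")
```

Feeding the log above through it (e.g. `python scan.py < serve.log`, filenames hypothetical) would flag the requests at 20:13:19 and 20:13:26, both of which carry the empty python turn in their history.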
It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>" time=2024-11-13T20:13:19.435+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=96 prompt=110 used=96 remaining=14 [GIN] 2024/11/13 - 20:13:21 | 200 | 1.742458875s | 127.0.0.1 | POST "/api/chat" time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:407 msg="context for request finished" time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s time=2024-11-13T20:13:21.168+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0 time=2024-11-13T20:13:26.226+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce time=2024-11-13T20:13:26.232+01:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let application developers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|><|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is void?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>In programming, \"void\" is a keyword used to indicate that a function or method does not return a value. It is often used in languages like C, C++, and Java to define functions that perform an action but do not produce a result that can be assigned to a variable. 
In some languages, such as Python, the concept of \"void\" is not explicitly supported, but similar functionality can be achieved using the `None` value or by not returning anything from the function.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>" time=2024-11-13T20:13:26.234+01:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=207 prompt=221 used=207 remaining=14 [GIN] 2024/11/13 - 20:13:26 | 200 | 429.776042ms | 127.0.0.1 | POST "/api/chat" time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:407 msg="context for request finished" time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s time=2024-11-13T20:13:26.649+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/manderse/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0 ```
Author
Owner

@fbricon commented on GitHub (Nov 13, 2024):

Maybe @gabe-l-hart, @kellyaa or @bjhargrave can help?

Author
Owner

@jessegross commented on GitHub (Nov 13, 2024):

@fbricon @deboer-tim Are you also seeing this on Macs?

Author
Owner

@deboer-tim commented on GitHub (Nov 13, 2024):

Yes. I have an M1 Max with 32 GB; Fred is on an M3, I think.

Author
Owner

@fbricon commented on GitHub (Nov 13, 2024):

Yup, M3 Pro 36GB here

Author
Owner

@derek-assurity commented on GitHub (Nov 14, 2024):

Seeing the same thing on Linux with a non-Granite model; this is running `llama3.2:1b`:

ollama --version
ollama version is 0.4.1
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.484Z level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error"
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.488Z level=INFO source=runner.go:863 msg="starting go runner"
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.489Z level=INFO source=runner.go:864 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.489Z level=INFO source=.:0 msg="Server listening on 127.0.0.1:42781"
Nov 14 03:53:11  ollama[2812]: llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
Nov 14 03:53:11  ollama[2812]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   1:                               general.type str              = model
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   3:                           general.finetune str              = Instruct
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   5:                         general.size_label str              = 1B
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   8:                          llama.block_count u32              = 16
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  18:                          general.file_type u32              = 7
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - kv  29:               general.quantization_version u32              = 2
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - type  f32:   34 tensors
Nov 14 03:53:11  ollama[2812]: llama_model_loader: - type q8_0:  113 tensors
Nov 14 03:53:11  ollama[2812]: time=2024-11-14T03:53:11.736Z level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model"
Nov 14 03:53:12  ollama[2812]: llm_load_vocab: special tokens cache size = 256
Nov 14 03:53:12  ollama[2812]: llm_load_vocab: token to piece cache size = 0.7999 MB
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: format           = GGUF V3 (latest)
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: arch             = llama
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: vocab type       = BPE
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_vocab          = 128256
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_merges         = 280147
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: vocab_only       = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_ctx_train      = 131072
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd           = 2048
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_layer          = 16
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_head           = 32
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_head_kv        = 8
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_rot            = 64
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_swa            = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_head_k    = 64
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_head_v    = 64
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_gqa            = 4
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_k_gqa     = 512
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_embd_v_gqa     = 512
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_ff             = 8192
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_expert         = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_expert_used    = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: causal attn      = 1
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: pooling type     = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: rope type        = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: rope scaling     = linear
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: freq_base_train  = 500000.0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: freq_scale_train = 1
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: n_ctx_orig_yarn  = 131072
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: rope_finetuned   = unknown
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_d_conv       = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_d_inner      = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_d_state      = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_dt_rank      = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model type       = 1B
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model ftype      = Q8_0
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model params     = 1.24 B
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW)
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: LF token         = 128 'Ä'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
Nov 14 03:53:12  ollama[2812]: llm_load_print_meta: max token length = 256
Nov 14 03:53:12  ollama[2812]: llm_load_tensors: ggml ctx size =    0.07 MiB
Nov 14 03:53:12  ollama[2812]: llm_load_tensors:        CPU buffer size =  1518.57 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: n_ctx      = 8192
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: n_batch    = 2048
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: n_ubatch   = 512
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: flash_attn = 0
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: freq_base  = 500000.0
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: freq_scale = 1
Nov 14 03:53:13  ollama[2812]: llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model:        CPU  output buffer size =     1.99 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model:        CPU compute buffer size =   544.01 MiB
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: graph nodes  = 518
Nov 14 03:53:13  ollama[2812]: llama_new_context_with_model: graph splits = 1
Author
Owner

@jessegross commented on GitHub (Nov 14, 2024):

@derek-assurity Is that the full log? I don't see any requests coming in; if so, it may not be the same issue.

Author
Owner

@bjhargrave commented on GitHub (Nov 14, 2024):

I ran `OLLAMA_DEBUG=1 ollama serve` to get more output. Then ran `ollama run granite3-dense:2b` in another window.

> what is python?

time=2024-11-14T09:32:32.548-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:32:32.549-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:32:32.551-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=276 prompt=12 used=11 remaining=1
[GIN] 2024/11/14 - 09:32:37 | 200 |  4.513843834s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:32:37.048-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:32:37.048-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:32:37.048-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0

> what is java?

We can see the prompt includes the previous chat round (a minimal API-level repro is sketched after the log below).

time=2024-11-14T09:32:45.182-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:32:45.192-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Python is a high-level, interpreted programming language that was created by Guido van Rossum and first released in 1991. It is known for its simplicity and readability, which makes it a great choice for beginners. Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It has a large standard library and a vast ecosystem of third-party packages, making it suitable for a wide range of applications, such as web development, data analysis, machine learning, artificial intelligence, and scientific computing.\n\nSome key features of Python include:\n\n1. **Easy to learn and read**: Python's syntax is clean and easy to understand, which makes it an excellent choice for beginners and experienced programmers alike.\n2. **Cross-platform compatibility**: Python runs on various operating systems, including Windows, macOS, Linux, and others, without requiring any modifications to the code.\n3. **Rich standard library**: Python comes with a vast collection of built-in modules and functions that cover many common programming tasks, reducing the need for external libraries.\n4. **Large community and ecosystem**: Python has a massive and active community, which contributes to its continuous development and improvement. This also means you can find numerous resources, tutorials, and third-party packages to help you with your projects.\n5. **Versatile applications**: Python is used in various domains, such as web development (Django, Flask), data analysis (Pandas, NumPy), machine learning (TensorFlow, PyTorch), scientific computing (SciPy, Matplotlib), and automation tasks (Python's built-in libraries like os, shutil, and subprocess).\n\nIf you're interested in learning Python, there are many online resources and tutorials available to help you get started. Some popular ones include Codecademy, Udemy, Coursera, and the official Python documentation. Additionally, there are numerous books and e-books available for both beginners and advanced users.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:32:45.199-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=429 prompt=442 used=137 remaining=305
[GIN] 2024/11/14 - 09:32:51 | 200 |  6.233008583s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:32:51.407-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:32:51.407-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:32:51.407-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
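
For anyone trying to narrow this down, here is a minimal repro sketch against the local HTTP API. It assumes a default ollama server on 127.0.0.1:11434 with granite3-dense already pulled, and the question list is only illustrative; it carries the chat history forward the way `ollama run` does and flags the turn where the reply comes back empty:

```python
# Repro sketch (assumptions: local ollama on the default port, granite3-dense
# pulled; the questions are illustrative). Replays a multi-turn chat via
# /api/chat, carrying history forward like `ollama run`, and flags empty replies.
import requests

URL = "http://127.0.0.1:11434/api/chat"
messages = []

for question in ["what is python?", "what is java?", "what is ruby?"]:
    messages.append({"role": "user", "content": question})
    resp = requests.post(URL, json={
        "model": "granite3-dense",
        "messages": messages,
        "stream": False,
    }).json()
    answer = resp["message"]["content"]
    messages.append(resp["message"])  # keep the assistant turn in the history
    print(f"{question!r}: {len(answer)} chars, "
          f"prompt_eval_count={resp.get('prompt_eval_count')}")
    if not answer.strip():
        print("  empty reply; history length at failure:", len(messages))
```

On 0.4.1 this should go blank on the same turn the CLI does, while 0.3.x keeps answering.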

> what is ruby?

At this point, with the prompt including the previous chats, we may have blown the 4K context window of Granite 3.0. I also got no response here (a quick way to check this from the debug log is sketched after the log below).

time=2024-11-14T09:33:05.834-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:33:05.858-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is python?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Python is a high-level, interpreted programming language that was created by Guido van Rossum and first released in 1991. It is known for its simplicity and readability, which makes it a great choice for beginners. Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It has a large standard library and a vast ecosystem of third-party packages, making it suitable for a wide range of applications, such as web development, data analysis, machine learning, artificial intelligence, and scientific computing.\n\nSome key features of Python include:\n\n1. **Easy to learn and read**: Python's syntax is clean and easy to understand, which makes it an excellent choice for beginners and experienced programmers alike.\n2. **Cross-platform compatibility**: Python runs on various operating systems, including Windows, macOS, Linux, and others, without requiring any modifications to the code.\n3. **Rich standard library**: Python comes with a vast collection of built-in modules and functions that cover many common programming tasks, reducing the need for external libraries.\n4. **Large community and ecosystem**: Python has a massive and active community, which contributes to its continuous development and improvement. This also means you can find numerous resources, tutorials, and third-party packages to help you with your projects.\n5. **Versatile applications**: Python is used in various domains, such as web development (Django, Flask), data analysis (Pandas, NumPy), machine learning (TensorFlow, PyTorch), scientific computing (SciPy, Matplotlib), and automation tasks (Python's built-in libraries like os, shutil, and subprocess).\n\nIf you're interested in learning Python, there are many online resources and tutorials available to help you get started. Some popular ones include Codecademy, Udemy, Coursera, and the official Python documentation. Additionally, there are numerous books and e-books available for both beginners and advanced users.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is java?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>Java is a high-level, object-oriented programming language that was developed by Sun Microsystems (now owned by Oracle) in the early 1990s. It was initially designed to create interactive television systems but quickly evolved into a versatile platform for developing various applications. Java is known for its \"write once, run anywhere\" (WORA) principle, which means that compiled Java code can run on any device with a Java Runtime Environment (JRE), without the need for recompilation.\n\nSome key features of Java include:\n\n1. **Object-oriented programming**: Java follows the object-oriented programming paradigm, allowing you to create reusable and modular code using classes and objects.\n2. **Platform independence**: Java's WORA principle enables developers to write code that can run on multiple platforms without modification.\n3. **Strong memory management**: Java has built-in garbage collection, which automatically manages memory allocation and deallocation, reducing the risk of memory leaks and other issues.\n4. **Multithreading support**: Java provides built-in support for multithreading, allowing developers to create concurrent and responsive applications.\n5. 
**Large standard library**: Java comes with a rich set of libraries and tools, including the Java Collections Framework, which simplifies common programming tasks.\n6. **Extensive ecosystem**: Java has a vast ecosystem of third-party packages, frameworks, and libraries, such as Spring, Hibernate, and JavaFX, which can help you build complex applications more efficiently.\n7. **Strong community and support**: Java has a large and active community, with numerous resources, tutorials, and forums available to help developers learn and troubleshoot issues.\n\nJava is widely used in various domains, such as web development (Spring Framework, Hibernate), enterprise applications, mobile apps (Android), and embedded systems. If you're interested in learning Java, there are many online resources and tutorials available, including the official Oracle Java documentation, Codecademy, Udemy, Coursera, and numerous books and e-books.\n\nTo get started with Java, you'll need to install a Java Development Kit (JDK) on your computer. Once installed, you can use an Integrated Development Environment (IDE) like IntelliJ IDEA, Eclipse, or NetBeans to write, compile, and run your Java code.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:33:05.865-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=929 prompt=943 used=929 remaining=14
[GIN] 2024/11/14 - 09:33:06 | 200 |  179.072125ms |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:33:06.003-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:33:06.003-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:33:06.003-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
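
If it is useful, here is a rough sketch for checking the context theory straight from the `OLLAMA_DEBUG=1` output; the log filename is illustrative, and 4096 is the Granite 3.0 context window mentioned above:

```python
# Sketch: extract the cache-slot numbers from an OLLAMA_DEBUG=1 server log
# to see how close each request gets to the model's 4K training context.
# The filename is illustrative.
import re

PATTERN = re.compile(
    r'loading cache slot.*?cache=(\d+) prompt=(\d+) used=(\d+) remaining=(\d+)')
N_CTX_TRAIN = 4096  # Granite 3.0's trained context length

with open("ollama-serve.log") as f:
    for line in f:
        m = PATTERN.search(line)
        if m:
            cache, prompt, used, remaining = map(int, m.groups())
            print(f"prompt={prompt:5d} tokens "
                  f"({prompt / N_CTX_TRAIN:.1%} of n_ctx_train), "
                  f"used(cached)={used}, remaining={remaining}")
```

For what it is worth, the failing request above only reports prompt=943 tokens, well under 4096, so a hard context overflow may not be the whole story.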

> /clear
> what is ruby?

Clearing the session context gets me back to a smaller prompt and the model responds (a client-side history-trimming workaround is sketched after the log below).

time=2024-11-14T09:35:25.193-05:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-14T09:35:25.194-05:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_of_role|>user<|end_of_role|>what is ruby?<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
time=2024-11-14T09:35:25.195-05:00 level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=943 prompt=12 used=5 remaining=7
[GIN] 2024/11/14 - 09:35:29 | 200 |    4.0158345s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-14T09:35:29.194-05:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-14T09:35:29.194-05:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=5m0s
time=2024-11-14T09:35:29.194-05:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/Users/hargrave/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
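
Based on that, a possible stop-gap until the regression is tracked down is to trim history client-side, which mimics `/clear`. This sketch assumes the same local server and model; the one-pair window is arbitrary:

```python
# Workaround sketch based on the /clear observation: keep only the most
# recent user/assistant pairs when calling /api/chat so the prompt stays
# small. The window size is arbitrary; max_turns=0 behaves like /clear.
import requests

URL = "http://127.0.0.1:11434/api/chat"

def ask(question, history, max_turns=1):
    trimmed = history[-2 * max_turns:] if max_turns else []
    resp = requests.post(URL, json={
        "model": "granite3-dense",
        "messages": trimmed + [{"role": "user", "content": question}],
        "stream": False,
    }).json()
    history.append({"role": "user", "content": question})
    history.append(resp["message"])
    return resp["message"]["content"]

history = []
print(ask("what is python?", history))
print(ask("what is ruby?", history))
```

Obviously this loses conversational context, so it is only a mitigation, not a fix.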
Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

Alright, I'm going to dig into this this morning. I'm not able to repro with the 8b model, but I was able to repro quickly with 2b:

granite3-dense:8b

$ ollama pull granite3-dense:8b
pulling manifest
pulling 1a9a38833607... 100% ▕████████████████████████████████████████████████████████████████████████▏ 4.9 GB
pulling 604785e698e9... 100% ▕████████████████████████████████████████████████████████████████████████▏ 664 B
pulling 43070e2d4e53... 100% ▕████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling d77201a4f430... 100% ▕████████████████████████████████████████████████████████████████████████▏ 417 B
verifying sha256 digest
writing manifest
success
(torchchat2) ghart@Mac [torchchat TokenizersCpp-1251 ?~]$ ollama run granite3-dense:8b

>>> Write a python funciton to compute the fibonocci sequence
Here is a simple Python function that computes the Fibonacci sequence:

def fibonacci(n):
    if n <= 0:
        return "Input should be a positive integer"
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        a, b = 0, 1
        for _ in range(n - 2):
            a, b = b, a + b
        return b

This function takes an integer n as input and returns the nth number in the Fibonacci sequence. The Fibonacci sequence is a
series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1.

>>> Refactor that to golang
Here is the equivalent code in Go:

package main

import "fmt"

func fibonacci(n int) int {
    if n <= 0 {
        return -1 // Return an error value
    } else if n == 1 {
        return 0
    } else if n == 2 {
        return 1
    } else {
        a, b := 0, 1
        for i := 3; i <= n; i++ {
            a, b = b, a + b
        }
        return b
    }
}

func main() {
    fmt.Println(fibonacci(10)) // Replace 10 with the desired input
}

This Go function does the same thing as the Python function: it computes the nth number in the Fibonacci sequence. The function
returns an error value (-1) if the input is less than or equal to 0, and otherwise it returns the nth Fibonacci number.

>>> refactor that to c++
Here is the equivalent code in C++:

#include <iostream>

int fibonacci(int n) {
    if (n <= 0) {
        throw std::invalid_argument("Input should be a positive integer");
    } else if (n == 1) {
        return 0;
    } else if (n == 2) {
        return 1;
    } else {
        int a = 0, b = 1;
        for (int i = 3; i <= n; i++) {
            int temp = a + b;
            a = b;
            b = temp;
        }
        return b;
    }
}

int main() {
    try {
        std::cout << fibonacci(10) << std::endl; // Replace 10 with the desired input
    } catch (const std::invalid_argument& e) {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

This C++ function does the same thing as the Python and Go functions: it computes the nth number in the Fibonacci sequence. The
function throws an exception if the input is less than or equal to 0, and otherwise it returns the nth Fibonacci number.

>>> now write it in rust
Here is the equivalent code in Rust:

fn fibonacci(n: u32) -> Result<u64, &'static str> {
    if n <= 0 {
        return Err("Input should be a positive integer");
    } else if n == 1 {
        return Ok(0);
    } else if n == 2 {
        return Ok(1);
    } else {
        let mut a = 0;
        let mut b = 1;
        for _ in 3..=n {
            let temp = a + b;
            a = b;
            b = temp;
        }
        Ok(b)
    }
}

fn main() {
    match fibonacci(10) {
        Ok(result) => println!("{}", result),
        Err(e) => println!("{}", e),
    }
}

This Rust function does the same thing as the Python, Go, and C++ functions: it computes the nth number in the Fibonacci
sequence. The function returns a Result type, which can be either an Ok value containing the nth Fibonacci number or an
Err value containing an error message. The function returns an error if the input is less than or equal to 0, and otherwise it
returns the nth Fibonacci number.

>>> let's add java to the mix
Here is the equivalent code in Java:

public class Main {
    public static int fibonacci(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("Input should be a positive integer");
        } else if (n == 1) {
            return 0;
        } else if (n == 2) {
            return 1;
        } else {
            int a = 0, b = 1;
            for (int i = 3; i <= n; i++) {
                int temp = a + b;
                a = b;
                b = temp;
            }
            return b;
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(fibonacci(10)); // Replace 10 with the desired input
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}

This Java function does the same thing as the Python, Go, C++, and Rust functions: it computes the nth number in the Fibonacci
sequence. The function throws an exception if the input is less than or equal to 0, and otherwise it returns the nth Fibonacci
number.

>>> typescript too?
Here is the equivalent code in TypeScript:

function fibonacci(n: number): number | string {
    if (n <= 0) {
        return "Input should be a positive integer";
    } else if (n === 1) {
        return 0;
    } else if (n === 2) {
        return 1;
    } else {
        let a = 0, b = 1;
        for (let i = 3; i <= n; i++) {
            const temp = a + b;
            a = b;
            b = temp;
        }
        return b;
    }
}

const result = fibonacci(10); // Replace 10 with the desired input
console.log(result);

This TypeScript function does the same thing as the Python, Go, C++, Rust, and Java functions: it computes the nth number in the
Fibonacci sequence. The function returns a number or a string, depending on whether the input is valid or not. If the input is
less than or equal to 0, the function returns a string containing an error message. Otherwise, it returns the nth Fibonacci
number.

>>> oooh, here's a good one: do elixir next!
Here is the equivalent code in Elixir:

defmodule Fibonacci do
  def fib(n) when n <= 0 do
    {:error, "Input should be a positive integer"}
  end

  def fib(1), do: 0
  def fib(2), do: 1
  def fib(n), do: fib_helper(n, 0, 1)

  defp fib_helper(n, a, b) when n == 3 do
    b
  end

  defp fib_helper(n, a, b) do
    fib_helper(n - 1, b, a + b)
  end
end

result = Fibonacci.fib(10) # Replace 10 with the desired input
IO.puts result

This Elixir code defines a module called Fibonacci with a single public function fib/1 that computes the nth number in the
Fibonacci sequence. The function uses pattern matching to handle different cases: if the input is less than or equal to 0, it
returns an error tuple; otherwise, it calls a helper function fib_helper/3 to compute the nth Fibonacci number.

The fib_helper/3 function is a recursive function that takes three arguments: the current value of n, the previous Fibonacci
number a, and the current Fibonacci number b. The function uses pattern matching to handle different cases: if n is equal to
3, it returns b; otherwise, it calls itself with the updated values of n, a, and b.

Finally, the code calls the fib/1 function with the desired input and prints the result using the IO.puts/1 function.

>>> Alright alright, I hear you know cobol too. Do cobol next
I'm sorry for any confusion, but as a helpful assistant, I can provide information and assistance on a wide range of programming
languages, including popular ones like Python, Java, JavaScript, C++, and others. However, I don't have specific knowledge or
capabilities related to COBOL (Common Business-Oriented Language), which is a older programming language primarily used for
business applications.

While I can certainly help you understand the basics of COBOL and provide some examples of how it works, I won't be able to
provide more advanced assistance or write code in COBOL for you. If you have any questions about COBOL or need help understanding
its concepts, I'd be happy to assist you with that!

>>> Nice! I stumped you finally. How about c#. can you do C#?
Yes, I can certainly help you with C# (C Sharp)! Here's an example of how you might write a function to compute the nth number
in the Fibonacci sequence in C#:

using System;

public class Fibonacci
{
    public static int Fibonacci(int n)
    {
        if (n <= 0)
        {
            throw new ArgumentException("Input should be a positive integer");
        }
        else if (n == 1)
        {
            return 0;
        }
        else if (n == 2)
        {
            return 1;
        }
        else
        {
            int a = 0;
            int b = 1;
            for (int i = 3; i <= n; i++)
            {
                int temp = a + b;
                a = b;
                b = temp;
            }
            return b;
        }
    }

    public static void Main()
    {
        int result = Fibonacci(10); // Replace 10 with the desired input
        Console.WriteLine(result);
    }
}

This C# code defines a class called Fibonacci with a single public method Fibonacci/int that computes the nth number in the
Fibonacci sequence. The method uses an if-else statement to handle different cases: if the input is less than or equal to 0, it
throws an exception; otherwise, it uses a loop to compute the nth Fibonacci number and returns the result.

The Main/void method is the entry point of the program, which calls the Fibonacci/int method with the desired input and prints
the result using the Console.WriteLine/string function.

>>> let's see about some gorpy old web programming: do php next
Sure! Here's an example of how you might write a PHP script to compute the nth number in the Fibonacci sequence and display it
on a web page:

<?php
function fibonacci($n) {
    if ($n <= 0) {
        throw new Exception("Input should be a positive integer");
    } elseif ($n === 1) {
        return 0;
    } elseif ($n === 2) {
        return 1;
    } else {
        $a = 0;
        $b = 1;
        for ($i = 3; $i <= $n; $i++) {
            $temp = $a + $b;
            $a = $b;
            $b = $temp;
        }
        return $b;
    }
}

$n = 10; // Replace 10 with the desired input
try {
    $result = fibonacci($n);
    echo "The $n-th number in the Fibonacci sequence is: $result";
} catch (Exception $e) {
    echo $e->getMessage();
}
?>

This PHP script defines a function fibonacci/int that computes the nth number in the Fibonacci sequence, similar to the
previous examples. The function uses an if-else statement to handle different cases and throws an exception if the input is less
than or equal to 0.

The script then calls the fibonacci/int function with the desired input and displays the result using the echo/string
statement. If an exception is thrown, the script catches it and displays the error message using the catch/Exception block.

>>> can you do a minified javascript version?
Sure! Here's a minified version of the JavaScript code I provided earlier:

function f(n){if(n<=0)throw new Error("Input should be a positive integer");else if(n===1)return 0;else if(n===2)return 1;else{let 
a=0,b=1;for(let i=3;i<=n;++i){[a,b]=[b,a+b];}return b;}}
try{console.log("The "+f(10)+"-th number in the Fibonacci sequence is: ",f(10));}catch(e){console.error(e.message);}

This version of the code uses shorthand syntax and omits some whitespace to make it more compact. It still defines a function
f/int that computes the nth number in the Fibonacci sequence, and then calls that function with the input 10 and logs the
result to the console using console.log/string. If an exception is thrown, the script catches it and logs the error message to
the console using console.error/string.

granite3-dense

$ ollama run granite3-dense

>>> Write a python funciton to compute the fibonocci sequence
Here is a simple Python function that computes the Fibonacci sequence:

def fibonacci(n):
    if n <= 0:
        return "Input should be a positive integer."
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        a, b = 0, 1
        for i in range(2, n):
            a, b = b, a + b
        return b

This function takes an integer n as input and returns the nth number in the Fibonacci sequence. The Fibonacci sequence is a
series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1.

>>> Refactor that to golang

>>>
Author
Owner

@fbricon commented on GitHub (Nov 14, 2024):

Yup it fails right on the 2nd request.

/// why is the sky blue?
The sky appears blue because of a process called Rayleigh scattering. When the sun's light reaches Earth's atmosphere, it collides with molecules and particles in the air.
Blue light is scattered in all directions more than other colors because it travels in shorter, smaller waves. This scattered blue light is what we see when we look up at
the sky.

/// what is java?

I don't believe we're reaching the context length on that next request. The fact the same model works fine in 0.3.14 tells me there's something really fishy in ollama 0.4.x

Author
Owner

@deboer-tim commented on GitHub (Nov 14, 2024):

+1. Sometimes goes longer, but I've had it fail on the first request a few times.

Author
Owner

@fbricon commented on GitHub (Nov 14, 2024):

The :8b model failed for me on the 4th request on 0.4.1. I tried 0.3.14 on a session spanning multiple Q&As (way over 4k tokens overall) and it worked fine.

Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

I was able to repro on v0.4.0 as well, so it's not in the v0.4.1 delta.

Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

Ok, in trying to isolate this, I'm hitting the ollama_llama_runner subprocess directly. It's definitely something that can be repro'ed without the client-side code in the main ollama server.

req1.json
{"prompt":"\u003c|start_of_role|\u003euser\u003c|end_of_role|\u003ewhat is python\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003eassistant\u003c|end_of_role|\u003e","image_data":null,"grammar":"","cache_prompt":true,"num_ctx":2048,"num_batch":512,"num_gpu":-1,"n_keep":4,"seed":-1,"n_predict":81920,"top_k":40,"top_p":0.9,"min_p":0,"tfs_z":1,"typical_p":1,"repeat_last_n":64,"temperature":0.8,"repeat_penalty":1.1,"presence_penalty":0,"frequency_penalty":0,"mirostat":0,"mirostat_tau":5,"mirostat_eta":0.1,"penalize_nl":true,"stop":null}
req2.json
{"prompt":"\u003c|start_of_role|\u003euser\u003c|end_of_role|\u003ewhat is python\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003eassistant\u003c|end_of_role|\u003ePython is a high-level, interpreted programming language that was created by Guido van Rossum and first released in 1991. It's known for its simplicity and readability, which makes it a great language for beginners. Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It's widely used in various domains such as web development, data analysis, machine learning, artificial intelligence, and scientific computing.\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003euser\u003c|end_of_role|\u003ewhat is node\u003c|end_of_text|\u003e\n\u003c|start_of_role|\u003eassistant\u003c|end_of_role|\u003e","image_data":null,"grammar":"","cache_prompt":true,"num_ctx":2048,"num_batch":512,"num_gpu":-1,"n_keep":4,"seed":-1,"n_predict":81920,"top_k":40,"top_p":0.9,"min_p":0,"tfs_z":1,"typical_p":1,"repeat_last_n":64,"temperature":0.8,"repeat_penalty":1.1,"presence_penalty":0,"frequency_penalty":0,"mirostat":0,"mirostat_tau":5,"mirostat_eta":0.1,"penalize_nl":true,"stop":null}

NOTE: I'm using a local build and redirecting to a temp "install" with OLLAMA_HOST, OLLAMA_MODELS, and OLLAMA_TMPDIR

# Terminal 1
./ollama serve

# Terminal 2
./ollama run granite3-dense

# Terminal 3
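# (the ps/grep/sed pipeline below pulls the runner's port number out of its command line)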
curl -X POST http://localhost:$(ps aux | grep ollama_llama_server | grep -v grep | sed 's,.*port ,,g')/completion -H "Content-Type: application/json" -d @req1.json

curl -X POST http://localhost:$(ps aux | grep ollama_llama_server | grep -v grep | sed 's,.*port ,,g')/completion -H "Content-Type: application/json" -d @req2.json
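
For repeated runs (e.g. the repeated-sends experiment mentioned in the next comment), a small driver like the following is convenient — a sketch of mine, not part of the original report; the port argument is whatever the ps/grep pipeline above prints, and the script simply reports how many bytes each call streams back, so a blank follow-up answer shows up as a near-empty body:

```go
// repro.go: POST the same completion request to the runner N times and
// report the response sizes; a bad run shows up as a near-empty body.
// Usage (port found via the ps/grep pipeline above): go run repro.go <port> req2.json
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	if len(os.Args) != 3 {
		log.Fatal("usage: repro <runner-port> <request.json>")
	}
	body, err := os.ReadFile(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}
	url := "http://localhost:" + os.Args[1] + "/completion"
	for i := 0; i < 20; i++ {
		resp, err := http.Post(url, "application/json", bytes.NewReader(body))
		if err != nil {
			log.Fatal(err)
		}
		out, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("run %2d: %d bytes streamed\n", i, len(out))
	}
}
```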
Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

Some more data points:

  • If I manually run a different GGUF file (e.g. llama3.1), I am not able to repro the failure at all, so this does seem to be specific to granite (and possibly granitemoe, but haven't confirmed)
  • If I make the calls to the runner with "cache_prompt": false, it's much easier to trigger the repro by just sending the same request over-and-over
  • If I set the number of GPU layers below the max (i.e. keep the last layer off the GPU), I'm not yet able to repro
Author
Owner

@gabe-l-hart commented on GitHub (Nov 14, 2024):

I'm unable to repro this with the full-precision version ollama run granite3-dense:2b-instruct-fp16, so that points further towards a loss of precision in one of the matrix ops in llama.cpp.

Author
Owner

@fbricon commented on GitHub (Nov 18, 2024):

FTR, I still see the same issue with ollama 0.4.2

Author
Owner

@gabe-l-hart commented on GitHub (Nov 19, 2024):

I spent most of today digging into this, but still haven't found the root cause. Some observations from today:

  • You can repro the issue by running the ollama_llama_server (the "runner") directly, so it's definitely not something in the parent server
  • When run with --batch-size set to 1, 2, or 10, the problem "magically" goes away, but any other value exhibits the problem of randomly producing a bad EOS on the first token
  • When a bad EOS is produced, the logits are exactly the same as the first token on "good" results except that the value in index 0 (the value of EOS for Granite) looks like an uninitialized value (some random very large number)
    • The random very-large number is static across all occurrences of "bad" calls for a given running server, but bringing the server down and back up changes the value of the random big number
  • I was able to repro this with the version of the ollama_llama_server built off of 0.3.14 (NB: In this release, the go-based runner existed but was disabled by default)
    • This means it's NOT something that was changed in the go-based server between 0.3.14 and 0.4.0 (i.e. it was not added with support for mllama which was the most invasive changeset there)
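
(Editorial note: the "uninitialized value" observation suggests a cheap way to spot the bad case. Below is a minimal, self-contained sketch — all names mine, not code from the ollama runner — of the kind of check that would flag a garbage EOS logit: NaN, Inf, or a magnitude wildly out of range of the rest of the vector.)

```go
// logitcheck.go: flag an uninitialized-looking EOS logit. For Granite,
// EOS is token id 0, per the observation above. Hypothetical helper,
// not code from the ollama runner.
package main

import (
	"fmt"
	"math"
)

func suspiciousEOS(logits []float32, eosID int) bool {
	v := float64(logits[eosID])
	if math.IsNaN(v) || math.IsInf(v, 0) {
		return true
	}
	// compare against the largest magnitude among the other logits
	var max float64
	for i, l := range logits {
		if i == eosID {
			continue
		}
		if a := math.Abs(float64(l)); a > max {
			max = a
		}
	}
	return math.Abs(v) > 100*max // real logits are O(10); garbage is astronomically larger
}

func main() {
	good := []float32{2.1, -3.4, 0.7, 5.2}
	bad := []float32{3.4e11, -3.4, 0.7, 5.2} // index 0 looks like memory garbage
	fmt.Println(suspiciousEOS(good, 0), suspiciousEOS(bad, 0)) // false true
}
```

When a bad value at index 0 wins sampling, the model emits EOS as its very first token, which is exactly the blank-response symptom reported above.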

Reference: github-starred/ollama#30646