UI formatting bug #126

Closed
opened 2025-11-11 14:07:08 -06:00 by GiteaMirror · 2 comments
Owner

Originally created by @iukea1 on GitHub (Dec 27, 2023).

Bug Report

The UI is not correctly formatting the responses returned from several different LLMs

Description

Bug Summary:
[Provide a brief but clear summary of the bug]

Steps to Reproduce:
[Outline the steps to reproduce the bug. Be as detailed as possible.]

Expected Behavior:
[Describe what you expected to happen.]

Actual Behavior:
[Describe what actually happened.]

Environment

  • Operating System: [e.g., Windows 10, macOS Big Sur, Ubuntu 20.04]
  • Browser (if applicable): [e.g., Chrome 100.0, Firefox 98.0]

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I have reviewed the troubleshooting.md document.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

![image](https://github.com/ollama-webui/ollama-webui/assets/35882815/d8030d95-ead6-4168-add9-26f0cc61c2fe)

Browser Console Logs:
[Include relevant browser console logs, if applicable]

Docker Container Logs:

2023-12-27 00:45:40 ollama        | 
2023-12-27 00:45:40 ollama        | Help me study vocabulary: write a sentence for me to fill in the blank, and I'll try to pick the correct option. [/INST]
2023-12-27 00:45:40 ollama        |  repeat_last_n:64 repeat_penalty:1.1 seed:-1 stop:[</s> [INST] [/INST] <<SYS>> <</SYS>>] stream:true temperature:0.8 tfs_z:1 top_k:40 top_p:0.9 typical_p:1]
2023-12-27 00:45:40 ollama-webui  | chat
2023-12-27 00:45:40 ollama-webui  | INFO:     172.20.0.1:43068 - "POST /chat HTTP/1.1" 200 OK
2023-12-27 00:45:50 ollama        | {"timestamp":1703666750,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"127.0.0.1","remote_port":57810,"status":200,"method":"POST","path":"/completion","params":{}}
2023-12-27 00:45:50 ollama        | [GIN] 2023/12/27 - 08:45:50 | 200 |    13.648164s |      172.20.0.3 | POST     "/api/chat"
2023-12-27 00:45:50 ollama        | {"timestamp":1703666750,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"127.0.0.1","remote_port":43382,"status":200,"method":"HEAD","path":"/","params":{}}
2023-12-27 00:45:50 ollama        | 2023/12/27 08:45:50 routes.go:105: changing loaded model
2023-12-27 00:45:51 ollama        | 2023/12/27 08:45:51 llama.go:455: signal: killed
2023-12-27 00:45:51 ollama        | 2023/12/27 08:45:51 llama.go:529: llama runner stopped successfully
2023-12-27 00:45:51 ollama        | 2023/12/27 08:45:51 llama.go:300: 22003 MB VRAM available, loading up to 87 GPU layers
2023-12-27 00:45:51 ollama        | {{false 2048 512 0 -1 0 false true false false true false true 0 0 0} 0 -1 -1 40 0.9 1 1 64 0.8 1.1 0 0 0 5 0.1 true [</s> [INST] [/INST] <<SYS>> <</SYS>>]} 0 0
2023-12-27 00:45:51 ollama        | [--model /root/.ollama/models/blobs/sha256:a035193675f3ea5c5c97c143ba7be504ebbf002a2c53822819988ac689d70783 --ctx-size 2048 --batch-size 512 --n-gpu-layers 87 --embedding]
2023-12-27 00:45:51 ollama        | 2023/12/27 08:45:51 llama.go:440: starting llama runner
2023-12-27 00:45:51 ollama        | 2023/12/27 08:45:51 llama.go:498: waiting for llama runner to start responding
2023-12-27 00:45:51 ollama        | ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
2023-12-27 00:45:51 ollama        | ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
2023-12-27 00:45:51 ollama        | ggml_init_cublas: found 1 CUDA devices:
2023-12-27 00:45:51 ollama        |   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9
2023-12-27 00:45:51 ollama        | {"timestamp":1703666751,"level":"INFO","function":"main","line":2667,"message":"build info","build":468,"commit":"a7aee47"}
2023-12-27 00:45:51 ollama        | {"timestamp":1703666751,"level":"INFO","function":"main","line":2670,"message":"system info","n_threads":10,"n_threads_batch":-1,"total_threads":20,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
2023-12-27 00:45:51 ollama        | llama_model_loader: loaded meta data with 23 key-value pairs and 363 tensors from /root/.ollama/models/blobs/sha256:a035193675f3ea5c5c97c143ba7be504ebbf002a2c53822819988ac689d70783 (version GGUF V3 (latest))
2023-12-27 00:45:51 ollama        | llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
2023-12-27 00:45:51 ollama        | llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
2023-12-27 00:45:51 ollama        | llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 13824,  5120,     1,     1 ]
2023-12-27 00:45:51 ollama        | llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  5120, 13824,     1,     1 ]
2023-12-27 00:45:51 ollama        | llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q4_K     [  5120, 13824,     1,     1 ]
2
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = true
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
2023-12-27 00:45:51 ollama        | llama_model_loader: - kv  22:               general.quantization_version u32              = 2
2023-12-27 00:45:51 ollama        | llama_model_loader: - type  f32:   81 tensors
2023-12-27 00:45:51 ollama        | llama_model_loader: - type q4_K:  241 tensors
2023-12-27 00:45:51 ollama        | llama_model_loader: - type q6_K:   41 tensors
2023-12-27 00:45:51 ollama        | llm_load_vocab: special tokens definition check successful ( 259/32000 ).
2023-12-27 00:45:51 ollama        | llm_load_print_meta: format           = GGUF V3 (latest)
2023-12-27 00:45:51 ollama        | llm_load_print_meta: arch             = llama
2023-12-27 00:45:51 ollama        | llm_load_print_meta: vocab type       = SPM
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_vocab          = 32000
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_merges         = 0
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_ctx_train      = 4096
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_embd           = 5120
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_head           = 40
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_head_kv        = 40
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_layer          = 40
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_rot            = 128
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_gqa            = 1
2023-12-27 00:45:51 ollama        | llm_load_print_meta: f_norm_eps       = 0.0e+00
2023-12-27 00:45:51 ollama        | llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
2023-12-27 00:45:51 ollama        | llm_load_print_meta: f_clamp_kqv      = 0.0e+00
2023-12-27 00:45:51 ollama        | llm_load_print_meta: f_max_alibi_bias = 0.0e+00
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_ff             = 13824
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_expert         = 0
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_expert_used    = 0
2023-12-27 00:45:51 ollama        | llm_load_print_meta: rope scaling     = linear
2023-12-27 00:45:51 ollama        | llm_load_print_meta: freq_base_train  = 10000.0
2023-12-27 00:45:51 ollama        | llm_load_print_meta: freq_scale_train = 1
2023-12-27 00:45:51 ollama        | llm_load_print_meta: n_yarn_orig_ctx  = 4096
2023-12-27 00:45:51 ollama        | llm_load_print_meta: rope_finetuned   = unknown
2023-12-27 00:45:51 ollama        | llm_load_print_meta: model type       = 13B
2023-12-27 00:45:51 ollama        | llm_load_print_meta: model ftype      = Q4_K - Medium
2023-12-27 00:45:51 ollama        | llm_load_print_meta: model params     = 13.02 B
2023-12-27 00:45:51 ollama        | llm_load_print_meta: model size       = 7.33 GiB (4.83 BPW) 
2023-12-27 00:45:51 ollama        | llm_load_print_meta: general.name     = LLaMA v2
2023-12-27 00:45:51 ollama        | llm_load_print_meta: BOS token        = 1 '<s>'
2023-12-27 00:45:51 ollama        | llm_load_print_meta: EOS token        = 2 '</s>'
2023-12-27 00:45:51 ollama        | llm_load_print_meta: UNK token        = 0 '<unk>'
2023-12-27 00:45:51 ollama        | llm_load_print_meta: LF token         = 13 '<0x0A>'
2023-12-27 00:45:51 ollama        | llm_load_tensors: ggml ctx size =    0.14 MiB
2023-12-27 00:45:51 ollama        | llm_load_tensors: using CUDA for GPU acceleration
2023-12-27 00:45:51 ollama        | llm_load_tensors: mem required  =   88.03 MiB
2023-12-27 00:45:51 ollama        | llm_load_tensors: offloading 40 repeating layers to GPU
2023-12-27 00:45:51 ollama        | llm_load_tensors: offloading non-repeating layers to GPU
2023-12-27 00:45:51 ollama        | llm_load_tensors: offloaded 41/41 layers to GPU
2023-12-27 00:45:51 ollama        | llm_load_tensors: VRAM used: 7412.96 MiB
2023-12-27 00:45:52 ollama        | ....................................................................................................
2023-12-27 00:45:52 ollama        | llama_new_context_with_model: n_ctx      = 2048
2023-12-27 00:45:52 ollama        | llama_new_context_with_model: freq_base  = 10000.0
2023-12-27 00:45:52 ollama        | llama_new_context_with_model: freq_scale = 1
2023-12-27 00:45:53 ollama        | llama_kv_cache_init: VRAM kv self = 1600.00 MB
2023-12-27 00:45:53 ollama        | llama_new_context_with_model: KV self size  = 1600.00 MiB, K (f16):  800.00 MiB, V (f16):  800.00 MiB
2023-12-27 00:45:53 ollama        | llama_build_graph: non-view tensors processed: 844/844
2023-12-27 00:45:53 ollama        | llama_new_context_with_model: compute buffer total size = 197.19 MiB
2023-12-27 00:45:53 ollama        | llama_new_context_with_model: VRAM scratch buffer: 194.00 MiB
2023-12-27 00:45:53 ollama        | llama_new_context_with_model: total VRAM used: 9206.96 MiB (model: 7412.96 MiB, context: 1794.00 MiB)
2023-12-27 00:45:53 ollama        | {"timestamp":1703666753,"level":"INFO","function":"main","line":3097,"message":"HTTP server listening","port":"60163","hostname":"127.0.0.1"}
2023-12-27 00:45:53 ollama        | {"timestamp":1703666753,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"127.0.0.1","remote_port":43198,"status":200,"method":"HEAD","path":"/","params":{}}
2023-12-27 00:45:53 ollama        | 2023/12/27 08:45:53 llama.go:512: llama runner started in 1.800561 seconds
2023-12-27 00:45:53 ollama        | 2023/12/27 08:45:53 llama.go:581: loaded 0 images
2023-12-27 00:45:53 ollama        | map[frequency_penalty:0 image_data:[] main_gpu:0 mirostat:0 mirostat_eta:0.1 mirostat_tau:5 n_keep:0 n_predict:-1 penalize_nl:true presence_penalty:0 prompt:[INST] <<SYS>><</SYS>>
2023-12-27 00:45:53 ollama        | 
2023-12-27 00:45:53 ollama        | Generate a brief 3-5 word title for this question, excluding the term 'title.' Then, please reply with only the title: Help me study vocabulary: write a sentence for me to fill in the blank, and I'll try to pick the correct option. [/INST]
2023-12-27 00:45:53 ollama        |  repeat_last_n:64 repeat_penalty:1.1 seed:-1 stop:[</s> [INST] [/INST] <<SYS>> <</SYS>>] stream:true temperature:0.8 tfs_z:1 top_k:40 top_p:0.9 typical_p:1]
2023-12-27 00:45:53 ollama        | {"timestamp":1703666753,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"127.0.0.1","remote_port":43198,"status":200,"method":"POST","path":"/completion","params":{}}
2023-12-27 00:45:53 ollama        | {"timestamp":1703666753,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"127.0.0.1","remote_port":43202,"status":200,"method":"POST","path":"/tokenize","params":{}}
2023-12-27 00:45:53 ollama        | [GIN] 2023/12/27 - 08:45:53 | 200 |  3.096371564s |      172.20.0.3 | POST     "/api/generate"
2023-12-27 00:45:53 ollama-webui  | generate
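
For anyone reproducing this, it can help to rule out the backend by inspecting the raw streamed chunks before the UI renders them. A minimal sketch (assuming Ollama's documented NDJSON stream shape for `/api/chat`; the sample chunks are hard-coded here rather than taken from a live request) that reassembles the streamed content:

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the incremental message content from an Ollama
    /api/chat NDJSON stream into the full response text."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        if chunk.get("done"):
            break
        parts.append(chunk.get("message", {}).get("content", ""))
    return "".join(parts)

# Simulated chunks in the shape Ollama streams for /api/chat
sample = [
    '{"message":{"role":"assistant","content":"Fill in"},"done":false}',
    '{"message":{"role":"assistant","content":" the blank."},"done":false}',
    '{"done":true}',
]
print(assemble_stream(sample))  # -> Fill in the blank.
```

If the reassembled text already looks wrong here, the problem is upstream of the UI; if it looks fine, the bug is in how the frontend renders it.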

@tjbck commented on GitHub (Dec 27, 2023):

Hi! Were you using JSON mode? It seems like it's rendering the message as it should; could you elaborate on what exactly the issue is? Thanks!
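
On the JSON-mode theory: in that mode the model returns a bare JSON string, which a Markdown renderer shows as unstyled text, consistent with the screenshot. A hypothetical pre-processing step a chat UI could apply (a sketch only, not open-webui's actual code; `render_reply` is an invented name) to pretty-print such replies:

```python
import json

def render_reply(text: str) -> str:
    """If the model reply is a bare JSON document (as in JSON mode),
    wrap it pretty-printed in a fenced code block; otherwise pass the
    (presumed Markdown) text through unchanged."""
    fence = "`" * 3  # built programmatically to avoid clashing with this page's fences
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            return text  # not valid JSON after all; leave untouched
        return fence + "json\n" + json.dumps(parsed, indent=2) + "\n" + fence
    return text

print(render_reply('{"title": "Vocabulary Quiz"}'))
```

Ordinary Markdown replies pass through unchanged, so only JSON-mode output gets the code-block treatment.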


@iukea1 commented on GitHub (Dec 27, 2023):

> Hi! Were you using JSON mode? It seems like it's rendering the message as it should; could you elaborate on what exactly the issue is? Thanks!

I believe I was. I will be trying again here in about an hour and will have more details then.

Reference: github-starred/open-webui#126