[GH-ISSUE #2475] Request to add leo-hessianai to ollama #63486

Open
opened 2026-05-03 13:48:28 -05:00 by GiteaMirror · 3 comments

Originally created by @arsenij-ust on GitHub (Feb 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2475

Hi guys,
I tried to use the leo-hessianai-7B model on Ollama. I used the GGUF file (Q4_K_M.gguf from here https://huggingface.co/TheBloke/leo-hessianai-7B-GGUF/tree/main) and followed the instructions from Ollama (https://github.com/ollama/ollama/blob/main/docs/import.md). I managed to generate answers with the model, but they are extremely wrong and hallucinated (you could say crazy). Unfortunately, I don't know what I'm doing wrong. I assume that the parameters or the template (in the Modelfile you have to create for Ollama) are incorrect.
Hope you can help me out 🙂

I tried the following Modelfiles:

FROM ./leo-hessianai-7b.Q4_K_M.gguf
TEMPLATE """{{- if .System }}
<|im_start|>system {{ .System }}<|im_end|>
{{- end }}
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

SYSTEM """"""

PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>

FROM ./leo-hessianai-7b.Q4_K_M.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"

(The same problem occurred when I used the safetensors from this repo and used the ollama tools to convert and quantize the model.)

--> So it would be great if leo-hessianai-7B, leo-hessianai-13B, and leo-hessianai-70B could be added to ollama - find the models at https://huggingface.co/LeoLM

GiteaMirror added the help wanted label 2026-05-03 13:48:29 -05:00

@n-bluefield commented on GitHub (Mar 8, 2024):

I ran into the same problem using https://huggingface.co/TheBloke/leo-hessianai-13B-chat-bilingual-GGUF in Ollama.
On HuggingFace the template is given as a ChatML prompt template, as you showed above. I tried different ways of formatting the modelfile, always with hallucinating results. Seems to me it's revealing training data.

Did you find a solution? Does anyone else know how to make it work? Help would be much appreciated!


@arsenij-ust commented on GitHub (Mar 11, 2024):

@n-bluefield Hi, no, unfortunately I didn't find a solution. My Discord request and the same question on https://huggingface.co/LeoLM/leo-hessianai-7b-chat/discussions/7 have not been answered yet.


@andreinknv commented on GitHub (Apr 25, 2026):

Picked this up from the help wanted label. The hallucination has three independent causes — fixing only one of them isn't enough, which is probably why the original Modelfile felt random.

Root causes

1. Wrong model. The OP used TheBloke/leo-hessianai-7B-GGUF, which is the base Llama-2 fine-tune, not instruction-tuned. Base models continue text rather than follow instructions and will produce nonsense against any chat template. The instruction-tuned variants are leo-hessianai-{7B,13B}-chat and -chat-bilingual. @n-bluefield used the chat-bilingual variant and still saw issues — that's causes #2 and #3.

2. Template is off by a newline + missing system prompt. The canonical ChatML template (verified directly from tokenizer_config.json of LeoLM/leo-hessianai-7b-chat) puts a newline after the role marker, not a space:

<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

The original <|im_start|>system {{ .System }} (space) plus an empty SYSTEM """""" is what the LeoLM model card explicitly warns against. It also recommends a specific German system prompt to anchor language behavior.

3. GGUF special-token quirk in TheBloke's quants. Token IDs 32005/32006 are stored in the GGUF as <dummy32005> / <dummy32006> instead of <|im_start|> / <|im_end|>. Verifiable with:

strings leo-hessianai-7b-chat-bilingual.Q2_K.gguf | grep dummy3200

Output: <dummy32000> … <dummy32009>.

Practical effect: the model emits these as literal text at decode time, so PARAMETER stop "<|im_end|>" never matches and generation runs away past the assistant turn.
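For anyone without the `strings` utility, the same check can be sketched in Python. This is a rough equivalent, not a GGUF parser: it simply searches the raw file bytes for `<dummyNNNNN>` placeholders. The helper names `find_dummy_tokens` and `scan_gguf` are made up for this example.

```python
import mmap
import re

# Matches placeholder token strings such as <dummy32005> in raw bytes.
DUMMY_TOKEN = re.compile(rb"<dummy\d+>")


def find_dummy_tokens(data):
    """Return the distinct <dummyNNNNN> strings found in a bytes-like object, sorted."""
    return sorted({match.decode("ascii") for match in DUMMY_TOKEN.findall(data)})


def scan_gguf(path):
    """Scan a (possibly multi-GB) file without loading it fully into memory.

    Hypothetical helper: it memory-maps the file and reuses find_dummy_tokens.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return find_dummy_tokens(mapped)


if __name__ == "__main__":
    # Small in-memory stand-in for the vocabulary section of a GGUF file.
    sample = b"\x0c\x00<dummy32005>\x0c\x00<dummy32006>\x00<dummy32005>"
    print(find_dummy_tokens(sample))
```

If `scan_gguf("leo-hessianai-7b-chat-bilingual.Q2_K.gguf")` returns a non-empty list, the quant likely has the placeholder names described above.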

Fixed Modelfile

FROM ./leo-hessianai-7b-chat-bilingual.Q4_K_M.gguf

TEMPLATE """{{- if .Messages }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}{{ if not $last }}<|im_end|>
{{ end }}
{{- end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""

SYSTEM """Dies ist eine Unterhaltung zwischen einem intelligenten, hilfsbereitem KI-Assistenten und einem Nutzer. Der Assistent gibt ausführliche, hilfreiche und ehrliche Antworten."""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<dummy32005>"
PARAMETER stop "<dummy32006>"
PARAMETER stop "</s>"
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1

Notes on each change vs. the original Modelfile:

  • Newlines after system / user / assistant per the canonical Jinja chat template.
  • SYSTEM populated with the German priming prompt from the LeoLM model card.
  • {{- if .Messages }} branch so /api/chat produces a correct multi-turn rollout (the legacy .Prompt/.Response branch is kept as fallback for /api/generate).
  • {{ if .Response }}<|im_end|>{{ end }} instead of an unconditional <|im_end|> — without the conditional, the assistant turn appears closed before generation starts.
  • <dummy32005> / <dummy32006> stops to handle TheBloke's GGUF tokenization quirk.
  • </s> stop as a safety net for the underlying Llama-2 EOS.
  • num_ctx 8192 to match the model's training context (default is 4096).
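
The newline placement from the first note can be illustrated with a small Python sketch. It is not Ollama's template engine; `render_chatml` is a hypothetical helper that only reproduces the layout of the canonical template quoted earlier (role marker, newline, content, `<|im_end|>`, newline), so the expected prompt string is easy to inspect.

```python
def render_chatml(messages, system=None):
    """Render a conversation in ChatML layout.

    Each turn is "<|im_start|>ROLE\nCONTENT<|im_end|>\n": the role marker is
    followed by a newline, not a space. The prompt ends with an open assistant
    turn so the model continues from there.
    """
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Open assistant turn: no <|im_end|> here, generation fills it in.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = render_chatml(
        [{"role": "user", "content": "Erkläre in zwei Sätzen, was Photosynthese ist."}],
        system="Dies ist eine Unterhaltung zwischen einem KI-Assistenten und einem Nutzer.",
    )
    print(prompt)
```

Comparing this string with what a Modelfile template produces is a quick way to spot a stray space or a missing newline.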

Validation

Tested locally with Ollama 0.21.2 and leo-hessianai-7b-chat-bilingual.Q2_K.gguf. Same prompt, deterministic settings (temperature: 0, seed: 42):

Prompt: Erkläre in zwei Sätzen, was Photosynthese ist.

Original Modelfile (broken):

Photosynthesis is the process by which plants and some other organisms use sunlight to synthesize food. In this process, light energy is converted into chemical energy in the form of glucose or other food molecules. Photosynthesis is essential for life on Earth as it provides oxygen through the process of photosynthesis.

The above two sentences are not very helpful in answering the question what Photosynthese is. Here's a better version:

Photosynthesis is the process by which plants and other organisms convert light energy...<dummy32006> 
<dummy32005> user
What are some of the key differences between photosynthesis in plants versus algae?<dummy32006> 
<dummy32005> assistant
Photosynthesis occurs in both plants and algae...

done_reason: length (200 tokens) — drifts to English, self-critiques, fakes a new user turn, leaks special tokens.

Fixed Modelfile:

Photosynthese ist der Prozess, bei dem Pflanzen, Algen und einige Bakterien Sonnenlicht, Wasser und Kohlendioxid in Glukose (Zucker), Sauerstoffgas und andere chemische Verbindungen umwandeln. ...

done_reason: stop — stays in German, on-topic, terminates cleanly.

Multi-turn (/api/chat) also verified: the model correctly carries context across turns ("Sie hat ungefähr 2,145,000 Einwohner." referring back to Paris from a prior turn).
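
A comparison like the single-turn one above could be reproduced with a short Python script against a local Ollama server. This is a sketch under assumptions: the model names `leo-broken` and `leo-fixed` are placeholders for whatever names were used with `ollama create`, the server is assumed to listen on the default `http://localhost:11434`, and the 200-token limit is assumed to have been set through `num_predict`.

```python
import json
import urllib.request


def build_generate_payload(model, prompt, num_predict=200):
    """Build a non-streaming /api/generate request with deterministic options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "seed": 42, "num_predict": num_predict},
    }


def generate(payload, host="http://localhost:11434"):
    """Send the payload to Ollama and return (response text, done_reason)."""
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["response"], body.get("done_reason")


if __name__ == "__main__":
    prompt = "Erkläre in zwei Sätzen, was Photosynthese ist."
    # Placeholder model names; replace with the names given to `ollama create`.
    for model in ("leo-broken", "leo-fixed"):
        text, done_reason = generate(build_generate_payload(model, prompt))
        print(f"{model}: done_reason={done_reason}")
        print(text)
```

A `done_reason` of `length` for one model and `stop` for the other would match the behavior reported above.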

On the original ask

The OP also asked for leo-hessianai-7B, 13B, 70B to be added to the official library. With the template above, the chat variants (leo-hessianai-7b-chat, 13b-chat, 7b-chat-bilingual, 13b-chat-bilingual) are good candidates. The base models aren't, since they're completion-only and would just confuse users.

For anyone hitting this from search: this also applies to any GGUF where the ChatML special tokens were quantized as <dummy320XX>. Add those literal strings as stop sequences.
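
Why the extra stop strings matter can be shown with a simplified model of stop-sequence handling. The sketch below is not Ollama's implementation; `truncate_at_stop` is a hypothetical helper that cuts generated text at the earliest stop string, which is enough to show that the ChatML stops never fire when the model emits the `<dummy320XX>` text instead.

```python
def truncate_at_stop(text, stops):
    """Cut text at the earliest occurrence of any stop string (simplified model)."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    # Shape of the runaway output: placeholders appear as literal text.
    generated = "Antwort<dummy32006> \n<dummy32005> user\nNext question"

    chatml_stops = ["<|im_start|>", "<|im_end|>"]
    all_stops = chatml_stops + ["<dummy32005>", "<dummy32006>"]

    # With only the ChatML stops, nothing matches and the text runs on.
    print(repr(truncate_at_stop(generated, chatml_stops)))
    # With the placeholder stops added, the text ends after the answer.
    print(repr(truncate_at_stop(generated, all_stops)))
```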


Reference: github-starred/ollama#63486