[GH-ISSUE #9843] Add EXAONE Deep Reasoning Model Series #52956

Open
opened 2026-04-29 01:27:45 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @switiz on GitHub (Mar 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9843

https://huggingface.co/collections/LGAI-EXAONE/exaone-deep-67d119918816ec6efa79a4aa

We have announced the EXAONE Deep model today, which delivers excellent performance on par with top-tier models.

If possible, could you add it to the model library?

The GGUF links are as follows:
https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B-GGUF
https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-7.8B-GGUF
https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-32B-GGUF

![Benchmark results](https://github.com/user-attachments/assets/4c73163b-3dc8-4626-acf8-62393e3ab27f)
![Benchmark results](https://github.com/user-attachments/assets/1c9970bf-a8b3-43d8-88e3-483a6a26467e)
GiteaMirror added the model label 2026-04-29 01:27:46 -05:00
Author
Owner

@switiz commented on GitHub (Mar 18, 2025):

Here's our Ollama Modelfile script:

```
# Model path (choose the appropriate GGUF weights on your own)
FROM ./EXAONE-Deep-7.8B-BF16.gguf

# Parameter values
PARAMETER stop "[|endofturn|]"
PARAMETER repeat_penalty 1.0
PARAMETER num_ctx 32768
PARAMETER temperature 0.6
PARAMETER top_p 0.95

# Chat template
#   Note: there is currently no way to remove `<thought></thought>` steps from the
#   context, because Ollama does not support this yet. We will update this when the
#   corresponding feature becomes available.
TEMPLATE """{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{ if eq .Role "system" }}[|system|]{{ .Content }}[|endofturn|]
{{ continue }}
{{ else if eq .Role "user" }}[|user|]{{ .Content }}
{{ else if eq .Role "assistant" }}[|assistant|]{{ .Content }}[|endofturn|]
{{ end }}
{{- if and (ne .Role "assistant") $last }}[|assistant|]<thought>
{{ end }}
{{- end -}}"""

# System prompt
SYSTEM """"""
```
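Until Ollama can drop reasoning traces from the context itself (as the template comment above notes), the `<thought>` blocks can be stripped client-side before prior assistant turns are resent. A minimal sketch in plain Python, assuming nothing beyond the `<thought></thought>` markers used by this template:

```python
import re

# Matches a <thought>...</thought> block, including an unterminated one
# at the end of a message (the model may have been cut off mid-thought).
THOUGHT_RE = re.compile(r"<thought>.*?(?:</thought>|$)", re.DOTALL)

def strip_thoughts(messages):
    """Return a copy of the chat history with <thought> blocks removed
    from assistant messages, so they are not resent as context."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant":
            content = THOUGHT_RE.sub("", msg["content"]).strip()
            msg = {**msg, "content": content}
        cleaned.append(msg)
    return cleaned
```

Run this over the accumulated history before each new request; only the final answers are then carried forward, which is how the model card says the context should be managed.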
Author
Owner

@ALLMI78 commented on GitHub (Mar 19, 2025):

"which delivers excellent performance on par with top-tier models"

hf.co/LGAI-EXAONE/EXAONE-Deep-7.8B-GGUF:Q8_0

I have a system running with Qwen 14B models; now I checked your model...

The LLM is struggling with several key issues in its response:

  • Lack of Clarity in Instructions – It keeps re-reading and second-guessing the user's instructions, indicating that it hasn't fully understood or structured the task correctly.

  • Unstructured Thought Process – The response is highly chaotic, with constant backtracking ("Wait, no—the problem says..."), making it clear that the model is unable to form a structured, step-by-step approach to solving the problem.

  • Overcomplication of Simple Concepts – Instead of breaking down the task into smaller, logical steps, it overanalyzes and overexplains, making unnecessary assumptions about weighting, percentages, and aggregation methods.

  • Inability to Converge on a Solution – The model keeps looping in a state of uncertainty, questioning its own logic repeatedly without arriving at a definitive answer. It lacks confidence and keeps suggesting alternative methods without implementing any.

  • Syntax and Grammar Issues – The response contains awkward phrasing, incorrect sentence structure, and inconsistent terminology, suggesting that the model is either overthinking or struggling with coherence.

Main Problem

The LLM lacks a structured approach to problem-solving. Instead of logically processing the user's instructions and executing the required calculations, it gets stuck in a loop of hesitation and self-correction. It should first clearly define the inputs, processing steps, and expected output before attempting to solve the problem.

If I send the same prompts to a Qwen 14B, everything runs smoothly—I get proper responses, and it's also significantly faster.

Is this due to some parameters, or does anyone have an idea how to get better results?
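Parameters may well be part of it: the Modelfile earlier in the thread recommends `temperature 0.6`, `top_p 0.95`, and `repeat_penalty 1.0`, which differ from Ollama's defaults. If the model is run straight from the Hugging Face tag rather than that Modelfile, those values have to be passed per request. A sketch of building such a request body for Ollama's standard `/api/chat` endpoint (the model tag is the one quoted above):

```python
# Sampling options the EXAONE Deep Modelfile in this thread recommends,
# passed per request instead of relying on Ollama's defaults.
RECOMMENDED_OPTIONS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "repeat_penalty": 1.0,
    "num_ctx": 32768,
}

def build_chat_request(messages,
                       model="hf.co/LGAI-EXAONE/EXAONE-Deep-7.8B-GGUF:Q8_0"):
    """Return the JSON body for POST /api/chat with the recommended options."""
    return {
        "model": model,
        "messages": messages,
        "options": dict(RECOMMENDED_OPTIONS),
        "stream": False,
    }
```

POSTing this body to `http://localhost:11434/api/chat` applies the recommended sampling settings for that one request; whether they fully explain the looping behaviour is untested here.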

Author
Owner

@digitalextremist commented on GitHub (Mar 22, 2025):

Came here to report an issue also, noticing a kind of "false start" behavior ( at least with Open Web UI ) and also a strange behavior with Zed where it returns empty responses... or something... it is difficult to describe.

But on the "false start" behavior it seems to get stuck in the stopping... state for long periods then suddenly return. Or it will be scheduled to time out and expire from the running models, then suddenly revive itself and extend its time.

Also it shifts inexplicably in resource distribution when splitting ( i.e. with 32b across 16gb VRAM ) without any change to num_ctx or any other explainable reason, which then also puts it into the 'fugue' state if not careful.

Hard to isolate and cannot look further into this right now, but what I was able to see from the model is promising.

[ The troubling thing is it requires Ollama restarts and seems to block other models from loading for some reason. It does not seem to respond to stop signals very reliably, if at all, in the somewhat 'fugue' states it gets into. ]


Reference: github-starred/ollama#52956