[GH-ISSUE #8434] internlm3-8b-instruct #5422

Closed
opened 2026-04-12 16:39:39 -05:00 by GiteaMirror · 3 comments

Originally created by @vYLQs6 on GitHub (Jan 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8434

https://huggingface.co/internlm/internlm3-8b-instruct


llama.cpp commit:

https://github.com/ggerganov/llama.cpp/pull/11233


## Introduction

InternLM3 has open-sourced an 8-billion parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. This model has the following characteristics:

  • Enhanced performance at reduced cost:
    State-of-the-art performance on reasoning and knowledge-intensive tasks, surpassing models like Llama3.1-8B and Qwen2.5-7B. Remarkably, InternLM3 is trained on only 4 trillion high-quality tokens, saving more than 75% of the training cost compared to other LLMs of similar scale.
  • Deep thinking capability:
    InternLM3 supports both a deep thinking mode, which solves complicated reasoning tasks via long chain-of-thought, and a normal response mode for fluent user interactions.

## InternLM3-8B-Instruct

### Performance Evaluation

We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capability: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Some of the results are shown here; see the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.

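| Category | Benchmark | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini (closed source) |
| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | --------------------------- |
| General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0 |
| | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7 |
| | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1 |
| Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9 |
| | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2 |
| | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5 |
| | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2 |
| MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0 |
| | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3 |
| Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8 |
| | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6 |
| Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7 |
| Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7 |
| Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7 |
| | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3 |
| | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87 |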
  • The evaluation results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/). Results marked with * were evaluated in thinking mode. The evaluation configuration can be found in the configuration files provided by OpenCompass.
  • The numbers may differ between OpenCompass versions, so please refer to the latest OpenCompass evaluation results.
GiteaMirror added the model label 2026-04-12 16:39:39 -05:00

@rick-github commented on GitHub (Jan 16, 2025):

Already available from the developer: https://ollama.com/internlm/internlm3-8b-instruct


@ag2s20150909 commented on GitHub (Jan 17, 2025):

> Already available from the developer: https://ollama.com/internlm/internlm3-8b-instruct

But only fp16 is available.
https://huggingface.co/internlm/internlm3-8b-instruct-gguf


@rick-github commented on GitHub (Jan 17, 2025):

Until different quants show up, you can make your own:

```
ollama pull internlm/internlm3-8b-instruct
ollama show --modelfile internlm/internlm3-8b-instruct > Modelfile
ollama create --quantize q4_K_M internlm3:8b-instruct-q4_K_M
```

```console
$ ollama run internlm3:8b-instruct-q4_K_M hello
Hello! It's great to meet you. How can I assist you today? If you have any specific questions or topics you'd like to discuss, feel
free to let me know! 😊
$ ollama ps
NAME                            ID              SIZE      PROCESSOR    UNTIL
internlm3:8b-instruct-q4_K_M    47292956d918    5.5 GB    100% CPU     Forever
```
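
If you want to repeat those three steps for more than one quantization level, they can be wrapped in a small script. This is only a sketch: the model name, the tag naming scheme, and the `ollama` commands are taken from the comment above, while the function names (`target_tag`, `run`, `make_quant`) and the `DRY_RUN` switch are hypothetical helpers invented for illustration and are not part of ollama. Which quantization levels are accepted depends on your ollama version; `q4_K_M` is the only one shown in this thread.

```shell
#!/bin/sh
# Sketch only: wraps the pull / show / create steps from the thread.
# target_tag, run, make_quant and DRY_RUN are hypothetical helpers.

SOURCE_MODEL="internlm/internlm3-8b-instruct"

# Build the local tag for a quantization level, e.g. q4_K_M.
target_tag() {
    printf 'internlm3:8b-instruct-%s\n' "$1"
}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Pull the fp16 model, export its Modelfile into the current directory,
# and create a quantized copy from that Modelfile.
make_quant() {
    quant="$1"
    run ollama pull "$SOURCE_MODEL" || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # The redirection is printed rather than performed in dry-run mode.
        printf 'ollama show --modelfile %s > Modelfile\n' "$SOURCE_MODEL"
    else
        ollama show --modelfile "$SOURCE_MODEL" > Modelfile || return 1
    fi
    run ollama create --quantize "$quant" "$(target_tag "$quant")"
}
```

With the functions loaded, setting `DRY_RUN=1` and calling `make_quant q4_K_M` prints the three commands without running them, which is a cheap way to check the tag before starting a long quantization. Without `DRY_RUN`, the same call executes them and overwrites any `Modelfile` in the current directory.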
Reference: github-starred/ollama#5422