[GH-ISSUE #542] Creating new models #248

Closed
opened 2026-04-12 09:46:12 -05:00 by GiteaMirror · 2 comments

Originally created by @erlebach on GitHub (Sep 16, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/542

In the docs, we find:

### Customize a model

Pull a base model:

```
ollama pull llama2
```

Create a `Modelfile`:

```
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```

Next, create and run the model:

```
ollama create mario -f ./Modelfile
ollama run mario
```

When changing the context length and/or temperature of this model (i.e., llama2), what actually happens? Is the model downloaded again? (It seems that way.) Why is that? I want to create a series of models with different context lengths and temperatures, and I would like to modify the models already downloaded. Perhaps that is not possible?

When using `llama.cpp`, it is possible to modify the temperature and use shorter context lengths without recreating the model. What exactly is `ollama` doing? Thanks.

@mxyng commented on GitHub (Sep 18, 2023):

> When changing the context length and/or temperature of this model (i.e., llama2), what actually happens?

An Ollama model contains the different layers an LLM needs at runtime. These include the model weights, a base prompt template and system prompt, a license, and parameters such as temperature or context length. Each layer is content-addressable and automatically deduplicated by Ollama.

When you create a new model, the `FROM` model's layers (weights, template/prompt, license, parameters) are inherited. The layers defined in the new model are either merged with or overwrite the inherited layers, depending on the layer.

Since each layer is content-addressable, only new layers are created; existing layers reference the existing files on disk.

In this way, you can create a new model while using an existing model as a starting point without committing additional resources.
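
To make this concrete, here is a minimal sketch that observes the deduplication, assuming a local install with `llama2` already pulled and the default model store under `~/.ollama/models` (both of these, and the derived model names, are assumptions, not part of the thread):

```python
import os
import subprocess
import tempfile

def store_size(path=os.path.expanduser("~/.ollama/models")):
    # Total bytes under the model store (default location is an assumption).
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(path)
        for name in names
    )

before = store_size()
for t in (0.2, 1.2):
    # Two Modelfiles that differ only in the temperature parameter layer.
    with tempfile.NamedTemporaryFile("w", suffix=".Modelfile", delete=False) as f:
        f.write(f"FROM llama2\nPARAMETER temperature {t}\n")
    subprocess.run(["ollama", "create", f"llama2-t{t}", "-f", f.name], check=True)

# The weight layers are shared, so the store grows by only a small amount
# (new parameter layers and manifests), not by another ~3.8 GB per model.
print("extra bytes:", store_size() - before)
```

Note that `ollama list` will still report the full logical size for each derived model, which can make it look like a fresh download.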

> Is the model downloaded again? (It seems that way.) Why is that?

It's not, as long as the model hasn't changed.


@erlebach commented on GitHub (Sep 27, 2023):

A related question: here is a listing of three models:

```
model_codellama:7b-code_T0.8_ctx256       081e336db5c5    3.8 GB    11 days ago
model_codellama:7b-instruct_T0.2_ctx512   b54f718751a0    3.8 GB    11 days ago
model_codellama:7b-instruct_T1.2_ctx512   2164c5ea8a91    3.8 GB    11 days ago
```

Between the last two entries, the only difference is the temperature, yet each shows as 3.8 GB. That is why I thought the entire model was being downloaded again.
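
For what it's worth, `ollama list` reports each model's full logical size, not its unique on-disk footprint. A quick sketch to check what is actually stored, assuming the default blob directory (an assumption; adjust if `OLLAMA_MODELS` is set):

```python
import os

# Default blob store location is an assumption.
store = os.path.expanduser("~/.ollama/models/blobs")
for name in sorted(os.listdir(store)):
    size = os.path.getsize(os.path.join(store, name))
    print(f"{size:>14,}  {name}")
# Despite three ~3.8 GB rows in the listing above, only one multi-GB weight
# blob per base model should appear here; the rest are tiny parameter layers.
```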

Here is the code I wrote. Perhaps you can tell me why the full model seems to be downloaded again for each temperature, which is not consistent with what I understood from your explanation:

```python
import argparse
import os

models = [
    "codellama:7b-code",
    "codellama:7b-python",
    "codellama:7b-instruct",
    "llama2:chat",  # llama2:latest, 7b
    "llama2:text",
]

prompts = [
    "Write a function that sums the first 20 integers",
    "Create a synthetic pandas dataset with 5 columns: (data, first and last names, salary, age, gender). Create a scatterplot of age versus salary for the females.",
]

# ctx_lengths = [128, 256, 512]
# I think new models are downloaded if the context length differs from the default
ctx_lengths = [512]
temperatures = [0.2, 0.5, 0.8, 1.2]


def create_model_file(model, temperature=0.2, context_len=512):
    """Write a Modelfile deriving `model` with the given temperature and context length."""
    model_content = f"""FROM {model}
# set the temperature [higher is more creative, lower is more coherent]
PARAMETER temperature {temperature}
# set the context window size: how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx {context_len}"""
    model_name = f"model_{model}_T{temperature}_ctx{context_len}"
    with open(model_name, "w") as f:
        f.write(model_content)
    return model_name

for model in models:
    print(model)
    for T in temperatures:
        for ctx in ctx_lengths:
            create_model_file(model, temperature=T, context_len=ctx)

# Create a new model
# ollama create "model_name" -f "model_file"

# ----------------------------------------------------------------------
# Use argparse to accept a --model argument
parser = argparse.ArgumentParser()
parser.add_argument("--model", help="model to use", choices=models)
args = parser.parse_args()

for prompt in prompts:
    cmd = f"ollama run {args.model} '{prompt}'"
    print("cmd:", cmd)
    os.system(cmd)
```
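
As written, the script only writes the Modelfiles; the `ollama create` step is commented out, so the derived models are never actually registered. A minimal sketch of that missing step, reusing the `models`, `temperatures`, and `ctx_lengths` lists from the script above:

```python
import subprocess

# Register each generated Modelfile with Ollama. Only the small new parameter
# layers are stored; the multi-GB weight blobs are reused from the base model.
for model in models:
    for T in temperatures:
        for ctx in ctx_lengths:
            name = f"model_{model}_T{T}_ctx{ctx}"
            subprocess.run(["ollama", "create", name, "-f", name], check=True)
```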