[GH-ISSUE #159] Using already downloaded models #61

Closed
opened 2026-04-12 09:35:44 -05:00 by GiteaMirror · 13 comments

Originally created by @kartikwatwani on GitHub (Jul 21, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/159

I want to use the models I have already downloaded using the link provided via email from Meta, which are saved in a specific location on my PC. Is there any way to do that?

GiteaMirror added the question label 2026-04-12 09:35:44 -05:00

@pdevine commented on GitHub (Jul 21, 2023):

You would need to re-quantize the weights for the model (which is a pain) and convert it to ggml v3. Once you do that, you can create a new Modelfile and specify `FROM <path to file>`.


@smuskal commented on GitHub (Jul 24, 2023):

Can you provide a link with a stepwise procedure for "re-quantize the weights for the model (which is a pain), and convert it to ggml v3"? I am sure others will benefit from it. I asked Bard, but I'm not convinced it is accurate...

Also, once a local model is referenced, is there a way to leverage LangChain against it?


@mchiang0610 commented on GitHub (Jul 24, 2023):

@smuskal @kartikwatwani I don't have a specific script to convert it, but there is one from the llama.cpp repo:

https://github.com/ggerganov/llama.cpp/blob/master/convert.py


@tomasmcm commented on GitHub (Jul 31, 2023):

What if I download a ggml model from here?
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML

I've tried creating a new Modelfile, but when I do `ollama create` it gives an error about the manifest:

```
pulling manifest  Error: pull model manifest: Get "https:///v2///manifests/": http: no Host in request URL
```

Any suggestions?

edit:

Found the issue: the model (weights) file needs to be in the same directory as the Modelfile.
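
For reference, a minimal sketch of the layout that made this work (the weights file name is taken from the Hugging Face repo linked above; the model name passed to `ollama create` is arbitrary):

```
# Modelfile and weights side by side in the same directory:
#   ./Modelfile                            (contains: FROM ./llama-2-7b-chat.ggmlv3.q3_K_S.bin)
#   ./llama-2-7b-chat.ggmlv3.q3_K_S.bin    (the downloaded GGML weights)
ollama create llama-2-7b-chat-local -f ./Modelfile
```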


@pdevine commented on GitHub (Jul 31, 2023):

What does your Modelfile look like? You should be able to do this with:

```
FROM /path/to/modelfile
```

You'll need to create sections for the `SYSTEM` prompt and the `TEMPLATE`. The whole file should look something like:

```
FROM /path/to/modelfile
SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}

[INST] {{ .Prompt }} [/INST]
"""
```

@vaibhav1618 commented on GitHub (Aug 8, 2023):

> What if I download a ggml model from here?
> https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
>
> I've tried creating a new Modelfile, but when I do `ollama create` it gives an error about the manifest:
>
> ```
> pulling manifest  Error: pull model manifest: Get "https:///v2///manifests/": http: no Host in request URL
> ```
>
> Any suggestions?
>
> edit:
>
> Found the issue: the model (weights) file needs to be in the same directory as the Modelfile.

Hi, I'm using the same model from Hugging Face and I'm getting the exact same error with the manifest. Were you able to resolve this issue? I have the Modelfile and the model in the same directory, and I am providing the absolute path everywhere. Note that my model is placed in a different directory which is not under the ollama directory structure.
Please let me know, @tomasmcm.


@tomasmcm commented on GitHub (Aug 8, 2023):

@vaibhav1618 This is the Modelfile I used: [Modelfile](https://github.com/jmorganca/ollama/files/12289162/llama-2-7b-chat-q3_K_S.txt). The `llama-2-7b-chat.ggmlv3.q3_K_S.bin` file needs to be in the same folder. Then you run `ollama create llama-2-7b -f ./Modelfile`; it takes a bit to convert, and then you can use it.


@pdevine commented on GitHub (Aug 9, 2023):

@tomasmcm for that particular image, we do have it available (although not yet advertised!) if you do:

```
ollama pull llama2:7b-chat-q3_K_S
```

Or for the non-chat version:

```
ollama pull llama2:7b-q3_K_S
```

@kartikwatwani @smuskal after using the `convert.py` script to convert the PyTorch model, you will need to quantize the weights with the `quantize` binary, which you can build from llama.cpp. So the steps are:

```
python convert.py /path/to/pytorch/model/directory

cmake . && cmake --build . --target quantize
./quantize /path/to/pytorch/model/directory/ggml-model-f32.bin model-name-$quant.bin <quant level>
```

The quant level should be something like `Q4_0`, `Q4_K_S`, etc. We default to `Q4_0` for the default models.
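
Tying those steps together, a rough end-to-end sketch (the model directory, output file name, and `Q4_0` quant level are placeholders, and the exact `convert.py` arguments can vary between llama.cpp versions):

```
# 1. Convert the original PyTorch checkpoint to an f32 ggml file
python convert.py ~/models/llama-2-7b          # writes ggml-model-f32.bin into that directory

# 2. Build and run the quantize tool from a llama.cpp checkout
cmake . && cmake --build . --target quantize
./quantize ~/models/llama-2-7b/ggml-model-f32.bin llama-2-7b-q4_0.bin Q4_0
```

The resulting `llama-2-7b-q4_0.bin` can then be referenced from a Modelfile `FROM` line and imported with `ollama create`, as shown earlier in the thread.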


@mchiang0610 commented on GitHub (Aug 30, 2023):

This seems resolved. Closing this issue. Please feel free to reopen anytime!!

Thank you!


@benbot commented on GitHub (Jan 18, 2024):

Sorry if I'm missing something, but if I already have a GGUF-quantized model downloaded and I point to it with the `FROM` line in my Modelfile, it still seems to copy the entire model into ollama's directory.

I'm pretty tight on storage space right now and would like to use ollama, but without it copying my models around.

Is there any way to do that?


@pdevine commented on GitHub (Jan 18, 2024):

@benbot If you put it in the `FROM` line, it will copy the file to the blobs directory and name it something like `models/blobs/sha256:...`. Any _subsequent_ models based on those same weights will just reference the same blob without taking up any more disk space. The reason for doing this is that it makes the model content-addressable: you can `ollama push` it or `ollama pull` it to/from a registry and have it work the same way every time. It also deduplicates storage automatically.

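One way to see this content addressing in practice (not from the thread; assumes a Linux shell with `sha256sum`, or `shasum -a 256` on macOS, the default `~/.ollama/models` location, and an illustrative `.gguf` file name):

```
sha256sum ./llama-2-7b-chat.Q4_0.gguf    # digest of the weights file referenced by FROM
ls ~/.ollama/models/blobs/               # the imported blob should be named with that same digest
```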

@benbot commented on GitHub (Jan 18, 2024):

@pdevine Are those sha hashes deterministic?

Could I symlink the model to the blobs directory and name it a specific hash?

<!-- gh-comment-id:1899032747 --> @benbot commented on GitHub (Jan 18, 2024): @pdevine Are those sha hashes deterministic? Could I symlink the model to the blobs directory and name it a specific hash?
Author
Owner

@pdevine commented on GitHub (Jan 18, 2024):

That could probably work, but it will be a bit finicky. Just `sha2 -256 <filename>` to get the hash. Be careful of the blob-pruning routines, which get triggered on server startup; they will remove any unused blob. Pruning also gets triggered if you pull a newer version of the same model. You can turn it off with the `OLLAMA_NOPRUNE` env variable.

To be clear, though, I wouldn't _recommend_ doing it this way, just that it will probably work. If you're worried about disk space you can always `ollama push` your model back to `ollama.ai` and then pull it when you need it.

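For completeness, a sketch of the symlink idea being discussed (untested, and explicitly not recommended above; assumes a Linux shell, the default `~/.ollama/models/blobs` layout, and an illustrative `.gguf` file name):

```
# OLLAMA_NOPRUNE must be set in the environment of the ollama *server*,
# otherwise startup pruning may delete the hand-placed blob.
export OLLAMA_NOPRUNE=1

DIGEST=$(sha256sum ./llama-2-7b-chat.Q4_0.gguf | cut -d' ' -f1)
ln -s "$PWD/llama-2-7b-chat.Q4_0.gguf" ~/.ollama/models/blobs/sha256:$DIGEST
```

In principle, a subsequent `ollama create` whose `FROM` points at the same file should then find the blob already present and skip copying it, though as noted above this is finicky and not the recommended path.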

Reference: github-starred/ollama#61