[GH-ISSUE #1660] Docker image for quantize/convert no longer working #932

Closed
opened 2026-04-12 10:37:38 -05:00 by GiteaMirror · 2 comments

Originally created by @technovangelist on GitHub (Dec 21, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1660

I have an older version of the image on my Mac, and converting a model works fine. But I pulled the image on a new machine and I'm getting an error about protobuf:

```
Loading model file /model/pytorch_model-00001-of-00006.bin
Loading model file /model/pytorch_model-00001-of-00006.bin
Loading model file /model/pytorch_model-00002-of-00006.bin
Loading model file /model/pytorch_model-00003-of-00006.bin
Loading model file /model/pytorch_model-00004-of-00006.bin
Loading model file /model/pytorch_model-00005-of-00006.bin
Loading model file /model/pytorch_model-00006-of-00006.bin
params = Params(n_vocab=32001, n_embd=5120, n_layer=40, n_ctx=2048, n_ff=13824, n_head=40, n_head_kv=40, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=None, f_rope_freq_base=None, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('/model'))
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):
  File "/workdir/llama.cpp/convert.py", line 1279, in <module>
    main()
  File "/workdir/llama.cpp/convert.py", line 1255, in main
    vocab = VocabLoader(params, vocab_dir)
  File "/workdir/llama.cpp/convert.py", line 342, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(str(fname_tokenizer), trust_remote_code=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 787, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 124, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/usr/local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 1336, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/usr/local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 459, in __init__
    requires_backends(self, "protobuf")
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1276, in requires_backends
    raise ImportError("".join(failed))
ImportError:
LlamaConverter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.
```

The model is chavinlo/gpt4-x-alpaca, but an older image does the conversion just fine.
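
For anyone who lands on the same ImportError: the failing step is the `AutoTokenizer` call in the traceback, and the missing piece is the protobuf Python package inside the newer image. Below is a minimal diagnostic sketch, not part of convert.py itself; the `/model` path mirrors the traceback, and installing protobuf (e.g. via pip) inside the image is the usual remedy the error message points at.

```python
# Run inside the container: check the optional backends transformers needs to
# build a fast LLaMA tokenizer from the slow one, then repeat the exact call
# convert.py makes (line 342 in the traceback above).
import importlib

for label, module in [("protobuf", "google.protobuf"), ("sentencepiece", "sentencepiece")]:
    try:
        importlib.import_module(module)
        print(f"{label}: installed")
    except ImportError:
        print(f"{label}: MISSING")

# Same call as convert.py; this raises the ImportError above when protobuf is
# absent, and succeeds once it is installed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/model", trust_remote_code=True)
print(type(tokenizer).__name__)
```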

@technovangelist commented on GitHub (Dec 21, 2023):

I tried with llama.cpp directly and got other errors. Looking online, it appears this model is no longer supported.

@technovangelist commented on GitHub (Dec 21, 2023):

Just confirmed with HuggingFaceH4/zephyr-7b-beta and it works fine. Closing this issue.
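
For reference, that confirmation can be reproduced with the same tokenizer path outside the image; a minimal check, assuming network access to Hugging Face:

```python
# Loads the tokenizer for the model used above to confirm conversion works.
# With protobuf available this succeeds and returns a fast tokenizer class.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
print(type(tok).__name__)
```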

Reference: github-starred/ollama#932