[GH-ISSUE #4610] load_summarize_chain map_reduce with combine_prompt causes error "Can't load tokenizer for 'gpt2'" (latest and 0.1.39 both have the issue) #2895

Closed
opened 2026-04-12 13:15:06 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @stephenchen2000 on GitHub (May 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4610

What is the issue?

File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain\chains\combine_documents\base.py", line 137, in _call
    output, extra_return_dict = self.combine_docs(
                                ^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain\chains\combine_documents\map_reduce.py", line 237, in combine_docs
    result, extra_return_dict = self.reduce_documents_chain.combine_docs(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain\chains\combine_documents\reduce.py", line 243, in combine_docs
    result_docs, extra_return_dict = self._collapse(
                                     ^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain\chains\combine_documents\reduce.py", line 288, in _collapse
    num_tokens = length_func(result_docs, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain\chains\combine_documents\stuff.py", line 226, in prompt_length
    return self.llm_chain._get_num_tokens(prompt)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain\chains\llm.py", line 407, in _get_num_tokens
    return _get_language_model(self.llm).get_num_tokens(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain_core\language_models\base.py", line 335, in get_num_tokens
    return len(self.get_token_ids(text))
               ^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain_core\language_models\base.py", line 322, in get_token_ids
    return _get_token_ids_default_method(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain_core\language_models\base.py", line 57, in _get_token_ids_default_method
    tokenizer = get_tokenizer()
                ^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\langchain_core\language_models\base.py", line 51, in get_tokenizer
    return GPT2TokenizerFast.from_pretrained("gpt2")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PycharmProjects\codecheck\.venv\Lib\site-packages\transformers\tokenization_utils_base.py", line 2094, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'gpt2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'gpt2' is the correct path to a directory containing all relevant files for a GPT2TokenizerFast tokenizer.
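The bottom frame of the traceback shows the root cause: LangChain's default token counter (`_get_token_ids_default_method` in `langchain_core`) downloads the GPT-2 tokenizer from the Hugging Face Hub on first use, so it fails on a machine that cannot reach huggingface.co. One hedged workaround, assuming the `transformers` package is installed and the machine has brief one-time network access, is to populate the local Hugging Face cache in advance so later runs work offline:

```python
# Hypothetical pre-cache step (not the fix shown in the issue's screenshot):
# fetch the gpt2 tokenizer once while online, so subsequent runs load it
# from the local cache (~/.cache/huggingface) instead of the network.
cached = False
try:
    from transformers import GPT2TokenizerFast

    GPT2TokenizerFast.from_pretrained("gpt2")  # populates the local HF cache
    cached = True
except Exception as exc:
    # transformers missing, or no route to huggingface.co
    print(f"pre-cache skipped ({exc.__class__.__name__})")
```

After this one-time step, `GPT2TokenizerFast.from_pretrained("gpt2")` resolves from the cache and the `load_summarize_chain` map-reduce collapse no longer needs network access.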
![error](https://github.com/ollama/ollama/assets/154590092/8a7b6434-0db4-4ab7-ad78-83f9968c4982)

The llama3 model is running via Ollama on the Linux machine; the calling code runs on Windows.
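Since the model itself runs remotely, the GPT-2 tokenizer is only used for length bookkeeping during the map-reduce collapse step. A minimal sketch of avoiding the download entirely, assuming an approximate count is acceptable for collapse-length checks (`approx_num_tokens` and `OfflineOllama` are hypothetical names, not LangChain APIs):

```python
# Cheap token estimate: ~4 characters per token is a common rule of thumb
# for English text with BPE tokenizers. Good enough for deciding when the
# reduce step must collapse intermediate documents.
def approx_num_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# With langchain installed, a subclass can route token counting through
# this helper instead of the default GPT2TokenizerFast download, e.g.:
#
#   from langchain_community.llms import Ollama
#
#   class OfflineOllama(Ollama):
#       def get_num_tokens(self, text: str) -> int:
#           return approx_num_tokens(text)

print(approx_num_tokens("hello world, this is a test"))
```

The estimate is deliberately rough; if exact counts matter, caching the real tokenizer locally is the safer route.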

OS

Linux, Windows

GPU

AMD

CPU

AMD

Ollama version

0.1.38 or latest

GiteaMirror added the bug label 2026-04-12 13:15:06 -05:00
Author
Owner

@stephenchen2000 commented on GitHub (May 24, 2024):

I have fixed it with:
![image](https://github.com/ollama/ollama/assets/154590092/303d5039-ad45-4c83-8576-4fb2020bfce2)

Reference: github-starred/ollama#2895