[GH-ISSUE #305] OpenAI API compatibility #62173

Closed
opened 2026-05-03 07:44:38 -05:00 by GiteaMirror · 60 comments
Owner

Originally created by @handrew on GitHub (Aug 7, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/305

Originally assigned to: @jmorganca on GitHub.

Any chance you would consider mirroring OpenAI's API specs and output? e.g., /completions and /chat/completions. That way, it could be a drop-in replacement for the Python openai package by changing out the URL.
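
For context, the kind of drop-in swap being asked for would look roughly like this on the client side. This is a hedged sketch only: the `/v1` base URL, the model name, and the placeholder API key are assumptions for illustration, not an endpoint Ollama exposed at the time of this request.

```python
# Hypothetical: point the official openai Python client at a local,
# OpenAI-compatible endpoint instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local OpenAI-style endpoint
    api_key="unused",                      # local servers typically ignore this, but the client requires a value
)

resp = client.chat.completions.create(
    model="llama2",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

If the server mirrored /chat/completions faithfully, existing OpenAI-based tooling would only need this base-URL change.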

GiteaMirror added the feature request label 2026-05-03 07:44:38 -05:00

@priamai commented on GitHub (Aug 10, 2023):

That would be awesome and also embeddings!


@hakt0-r commented on GitHub (Aug 11, 2023):

yup I'll +1 on this too :-)


@kamuridesu commented on GitHub (Aug 11, 2023):

+1


@loyaliu commented on GitHub (Aug 30, 2023):

+1

@colindotfun commented on GitHub (Sep 1, 2023):

this would be a big win

prior work: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md and https://github.com/ggerganov/llama.cpp/blob/master/examples/server/api_like_OAI.py

@ValValu commented on GitHub (Sep 2, 2023):

yeah would be great!


@jmorganca commented on GitHub (Sep 7, 2023):

Thanks for the issue and comments, all! Sorry for not replying sooner. Which clients/use cases are you looking to use that require the OpenAI API? Quite a few folks have mentioned LlamaIndex (also: see #278!) Would love to know!


@kamuridesu commented on GitHub (Sep 7, 2023):

Interoperability with OpenAI projects, like Auto-GPT. If you check https://github.com/go-skynet/LocalAI, you can see that their API works with pretty much every project that uses the OpenAI endpoint, in most cases you just need to point an Environment Variable to it.


@colindotfun commented on GitHub (Sep 7, 2023):

www.galactus.ai also


@cori commented on GitHub (Sep 8, 2023):

I was looking to connect to it with both Continue.dev (which supports Ollama explicitly) and LocalAI, so interop was my hope as well.


@MchLrnX commented on GitHub (Sep 19, 2023):

I'd love to be able to do this. I'm specifically looking at running ToolBench, MetaGPT and ChatDEV. I have MetaGPT ready to test with this if we get this working.


@comalice commented on GitHub (Sep 28, 2023):

I'd like to throw in that Ironclad's Rivet application expects an OpenAI API endpoint as well: https://github.com/Ironclad/rivet


@mjtechguy commented on GitHub (Sep 29, 2023):

+1. I would like to use ollama as a target for LibreChat: https://github.com/danny-avila/LibreChat/tree/main


@jtoy commented on GitHub (Sep 29, 2023):

+1


@Anon2578 commented on GitHub (Sep 30, 2023):

Yes, this would be a plus one if we can get this working with the OpenAI API specs. Can someone notify me when this is done? I might forget, and this was one of the reasons I took a look at this project.


@shtrophic commented on GitHub (Oct 1, 2023):

This would be pretty cool since Nextcloud instances could use a locally running ollama server. Nextcloud itself ships with openai/localai compatibility (through a plugin).


@Nivek92 commented on GitHub (Oct 4, 2023):

AutoGen would be another use case - https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/


@rcalv002 commented on GitHub (Oct 7, 2023):

+1


@vividfog commented on GitHub (Oct 7, 2023):

I'm surprised [LiteLLM](https://github.com/BerriAI/litellm) hasn't been mentioned in the thread yet. Found it in the [README.md](https://github.com/jmorganca/ollama#community-integrations) of the Ollama repo today. "Call LLM APIs using the OpenAI format", 100+ of them, including Ollama. This worked for me:

`pip install litellm`

`ollama pull codellama`

`litellm --model ollama/codellama --api_base http://localhost:11434 --temperature 0.3 --max_tokens 2048`

Double-check that the port, model name and parameters match your configuration and VRAM situation.

As an example, [Continue.dev](https://github.com/continuedev/continue) configuration then goes like this, OpenAI style:

        default=OpenAI(
            api_key="IGNORED",
            model="ollama/codellama",
            context_length=2048,
            api_base="http://your_litellm_hostname:8000"
        ),

Set context_length and max_tokens as appropriate. 2048 is a conservative value if you're [gpu-poor](https://github.com/RahulSChand/gpu_poor) or aren't sure.

Note that LiteLLM/Uvicorn opens the API at 0.0.0.0:8000; it's not confined to localhost by default, and people can piggyback on your server if it's not on a private network. I believe you need to edit the litellm source code [here](https://github.com/BerriAI/litellm/blob/86a835f6fd174ef64c4cb41db5eae86c2fffa555/litellm/proxy/proxy_cli.py#L118) if you want to serve only localhost, then `pip install -e .` from that local clone before running `litellm`.


@ishaan-jaff commented on GitHub (Oct 7, 2023):

Thanks for mentioning us @vividfog ! (I'm the maintainer of LiteLLM) We allow you to create an OpenAI compatible proxy server for ollama

Here's a link to the section on our docs on how to do this: https://docs.litellm.ai/docs/proxy_server

Please let me know how we can make it better for the ollama community😃
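
For readers wondering what the client side looks like once the proxy is up, here is a minimal sketch using the openai Python package (0.x style, current at the time of this thread) against a LiteLLM proxy on its default port; the model name and dummy key below are placeholders, not values required by LiteLLM:

```python
# Sketch: send an OpenAI-style chat request to a local LiteLLM proxy,
# which forwards it to Ollama. Uses the legacy openai 0.x interface.
import openai

openai.api_base = "http://localhost:8000"  # the LiteLLM proxy, not api.openai.com
openai.api_key = "anything"                # placeholder; ignored unless the proxy enforces auth

response = openai.ChatCompletion.create(
    model="ollama/codellama",  # placeholder; match the model you started litellm with
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
)
print(response["choices"][0]["message"]["content"])
```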


@ghost commented on GitHub (Oct 8, 2023):

Hey @vividfog thanks for this incredible tutorial.

I added it to our docs and gave you credit for it.

Docs: https://docs.litellm.ai/docs/proxy_server#tutorial-use-with-aiderautogencontinue-dev

If you have a twitter/linkedin - happy to link to that instead!


@shtrophic commented on GitHub (Oct 8, 2023):

Wow, thanks for pointing to litellm @vividfog.

For anyone on Arch Linux (btw) and interested, I came up with a PKGBUILD that sets up litellm with ollama as a systemd service. You can check it out on the [AUR](https://aur.archlinux.org/packages/litellm-ollama). Feel free to get back to me with any feedback!


@vividfog commented on GitHub (Oct 8, 2023):

I learned today that my initial advice was not complete. Continue.dev sends two parallel queries, one for the user task and another to summarize the conversation, and the LiteLLM logs may show an error from Ollama after the second call. There's a fix for this client-side.

This Continue.dev configuration imports a wrapper that makes all calls sequential, queued:

1. Import the QueuedLLM wrapper near the top of `config.py`: `from continuedev.src.continuedev.libs.llm.queued import QueuedLLM`

2. The server calls can now be made sequential like this:

    models=Models(
        default=QueuedLLM(
            llm=OpenAI(
                api_key="IGNORED",
                model="ollama/codellama",
                context_length=2048,
                api_base="http://localhost:8000"
            )
        )
    ),

This may now be leaning off-topic vs. the original issue, but hope it helps those who used the previous advice. The friendly developers at [Continue.dev](https://github.com/continuedev/continue) GitHub/Discord are there if needed. I learned about the [QueuedLLM](https://continue.dev/docs/reference/Models/queuedllm) wrapper initially in their Discord.

What remains a little confusing is that previously I've seen Ollama handle parallel API calls in sequence, or was I hallucinating? Not sure why QueuedLLM() is then needed, but if the shoe fits, wear it I guess. Material for another issue if someone wants to drill down and verify.

What I really like is how these 3 projects work together without knowing about each other at code level, as if following the same plan. That indeed is the benefit of following the same API conventions, the topic of this issue.


@MilleniumDawn commented on GitHub (Nov 11, 2023):

I realise it's probably my lack of knowledge that is the problem, but my front end can use either LM Studio or oobabooga/text-generation-webui simply by changing the base_api.

I wanted to try Ollama because it seems to do a lot of things simpler/faster.

But not supporting what seems to be developing into the go-to format for APIs, the OpenAI API, is a big minus. (I realise this is free, I don't want to be a chooser/beggar, just trying to provide feedback.)

I tried LiteLLM, and it's not a drop-in replacement, and now what was supposed to be simple needs to be debugged.

So my feedback is: I hope Ollama will natively support the OpenAI API rather than rely on an external library that might seem easy for people who know their stuff, but not as easy for people who came to Ollama for its simplicity.

I'm leaving my error log from LiteLLM just as a reference; I know it's not this project.

@mac ~ % litellm --drop_params --debug --model ollama/dolphin --api_base http://localhost:11434
ollama called
INFO:     Started server process [42896]
INFO:     Waiting for application startup.

#------------------------------------------------------------#
#                                                            #
#            'The thing I wish you improved is...'            #
#        https://github.com/BerriAI/litellm/issues/new        #
#                                                            #
#------------------------------------------------------------#

 Thank you for using LiteLLM! - Krrish & Ishaan



Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new

Docs: https://docs.litellm.ai/docs/simple_proxy

LiteLLM: Test your local endpoint with: "litellm --test" [In a new terminal tab]


INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
litellm.caching: False; litellm.caching_with_models: False; litellm.cache: None
kwargs[caching]: False; litellm.cache: None

LiteLLM completion() model= dolphin

LiteLLM: Params passed to completion() {'functions': [], 'function_call': '', 'temperature': 0.7, 'top_p': 0.9, 'n': None, 'stream': False, 'stop': ['<.>'], 'max_tokens': 4096, 'presence_penalty': 0.5, 'frequency_penalty': 0.5, 'logit_bias': {}, 'user': '', 'model': 'dolphin', 'custom_llm_provider': 'ollama', 'repetition_penalty': 1.1, 'top_k': 20}

LiteLLM: Non-Default params passed to completion() {'temperature': 0.7, 'top_p': 0.9, 'stream': False, 'stop': ['<.>'], 'max_tokens': 4096, 'presence_penalty': 0.5, 'frequency_penalty': 0.5}
self.optional_params: {'num_predict': 4096, 'temperature': 0.7, 'top_p': 0.9, 'repeat_penalty': 0.5, 'stop_sequences': ['<.>'], 'repetition_penalty': 1.1, 'top_k': 20}
Logging Details Pre-API Call for call id b91948c3-ba26-4ebc-a140-c141a9e68764
MODEL CALL INPUT: {'model': 'dolphin', 'messages': [{'role': 'system', 'content': "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."}, {'role': 'user', 'content': 'USER : Tell me what you are in one phrase. ASSISTANT: '}], 'optional_params': {'num_predict': 4096, 'temperature': 0.7, 'top_p': 0.9, 'repeat_penalty': 0.5, 'stop_sequences': ['<.>'], 'repetition_penalty': 1.1, 'top_k': 20}, 'litellm_params': {'return_async': False, 'api_key': None, 'force_timeout': 600, 'logger_fn': None, 'verbose': False, 'custom_llm_provider': 'ollama', 'api_base': 'http://localhost:11434', 'litellm_call_id': 'b91948c3-ba26-4ebc-a140-c141a9e68764', 'model_alias_map': {}, 'completion_call_id': None, 'metadata': None, 'stream_response': {}}, 'start_time': datetime.datetime(2023, 11, 11, 10, 0, 17, 953683), 'input': "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER : Tell me what you are in one phrase. ASSISTANT: ", 'api_key': None, 'additional_args': {'complete_input_dict': {'text': "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER : Tell me what you are in one phrase. ASSISTANT: ", 'num_predict': 4096, 'temperature': 0.7, 'top_p': 0.9, 'repeat_penalty': 0.5, 'stop_sequences': ['<.>'], 'repetition_penalty': 1.1, 'top_k': 20}}, 'log_event_type': 'pre_api_call'}


Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
self.failure_callback: []
An error occurred: Failed to parse: http://localhost:11434dolphin/generation

 Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
INFO:     127.0.0.1:61413 - "POST /chat/completions HTTP/1.1" 200 OK

@PetrarcaBruto commented on GitHub (Nov 14, 2023):

I agree with the comment made by @MilleniumDawn about the speed of litellm vs. the ollama server. I may be wrong, but I have noticed in the native ollama server logs that my WSL GPU is being used, e.g. the following server message:

"ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti with Max-Q Design, compute capability 7.5"

I suspect that litellm server or workers are not using my GPU. If that is the case then it will explain the difference in speed.

Any comments/advice will be very welcomed.


@kylemclaren commented on GitHub (Nov 14, 2023):

@PetrarcaBruto nvidia-smi should show the ollama runner process if GPU is utilized, like this:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:00:06.0 Off |                    0 |
| N/A   37C    P0              38W / 250W |  15261MiB / 40960MiB |     16%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       501      C   ...p/gguf/build/cuda/bin/ollama-runner    15248MiB |
+---------------------------------------------------------------------------------------+

@ghost commented on GitHub (Nov 14, 2023):

+1


@ghost commented on GitHub (Nov 15, 2023):

Hey @MilleniumDawn i found the issue - it was being misrouted. Just pushed a fix - https://github.com/BerriAI/litellm/commit/1738341dcb16884bfff42a0b2004ba5afd856c5d

Should be live in v1.0.2 by EOD. I'm really sorry for that.

@PetrarcaBruto re: litellm workers

For ollama specifically - we check if you're making an ollama call, and run `ollama serve` in a separate worker - https://github.com/BerriAI/litellm/blob/c7780cbc40b6d34144677d7979ba4318f0a0d5a9/litellm/proxy/proxy_cli.py#L20

open to suggestions for how we can improve this further.


@PetrarcaBruto commented on GitHub (Nov 15, 2023):

@kylemclaren & @krrishdholakia thanks for the tips. I found that my GPU is also being used when running litellm, which is good news.


@patrickdobler commented on GitHub (Nov 16, 2023):

That would be a great addition. I would love to use Ollama with TypingMind.


@priamai commented on GitHub (Nov 16, 2023):

> Thanks for mentioning us @vividfog ! (I'm the maintainer of LiteLLM) We allow you to create an OpenAI compatible proxy server for ollama
>
> Here's a link to the section on our docs on how to do this: https://docs.litellm.ai/docs/proxy_server
>
> Please let me know how we can make it better for the ollama community😃

AMAZING, how did I not see this before! It would be useful to also add a simple API_TOKEN so at least I can put it on a cloud service without having to fiddle with additional proxy authenticators.


@ghost commented on GitHub (Nov 16, 2023):

@priamai we have that - https://docs.litellm.ai/docs/simple_proxy#example-config

you can add a master_key in the config.yaml, and this will require all calls to pass that key as part of the bearer token.

let me know if you end up using it, would love to know how we can improve it for you - https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
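
To make the bearer-token detail concrete, this is roughly what an authenticated request to such a proxy looks like on the wire; the host, key, and model below are placeholders, and the /chat/completions path matches what the proxy logs earlier in this thread show:

```python
# Sketch: call a LiteLLM proxy that has a master_key configured.
# The key travels as a standard "Authorization: Bearer <key>" header,
# which is exactly what the openai client would send as its api_key.
import requests

resp = requests.post(
    "http://my-cloud-host:8000/chat/completions",           # placeholder proxy address
    headers={"Authorization": "Bearer sk-my-master-key"},   # must match the proxy's configured master key
    json={
        "model": "ollama/mistral",  # placeholder model
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```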


@agonbina commented on GitHub (Nov 18, 2023):

embeddings with Ollama do not seem to be supported through the Litellm proxy.


@sottey commented on GitHub (Nov 25, 2023):

I, too, would love this. It would allow me to integrate in TypingMind. Thank you for your amazing stuff!


@iplayfast commented on GitHub (Nov 27, 2023):

Yeah, I'm trying to use litellm and it's a very weak crutch. If you want something done right you gotta do it yourself and build the openai api into ollama.


@kamuridesu commented on GitHub (Nov 27, 2023):

> If you want something done right you gotta do it yourself

So go ahead and contribute a PR to ollama or help improve litellm.


@flaviovs commented on GitHub (Nov 29, 2023):

Two things to be aware of when using LiteLLM:

- [LiteLLM does outbound network connections](https://github.com/BerriAI/litellm/issues/739), therefore it won't work in firewalled environments; and
- [By default their OpenAI API proxy does phone home](https://docs.litellm.ai/docs/simple_proxy#--telemetry) (you can turn this feature off).

I hope this saves people's time if their plan is to use Ollama+LiteLLM offline for privacy/compliance reasons.

@MARYAMJAHANIR commented on GitHub (Dec 15, 2023):

> AutoGen would be another use case - https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/

hey,
I was trying AutoGen with the ollama/litellm config, using the mistral and codellama models, but it gave me an error when the OpenAIWrapper attempts to handle the configuration, set up the same as in the video.

Error:
/home/maryam_linux/miniconda3/envs/autogen/bin/python /mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py
(autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$ /home/maryam_linux/miniconda3/envs/autogen/bin/python /mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py
Traceback (most recent call last):
File "/mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py", line 25, in <module>
assistant = autogen.AssistantAgent(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/agentchat/assistant_agent.py", line 61, in __init__
super().__init__(
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/agentchat/conversable_agent.py", line 121, in __init__
self.client = OpenAIWrapper(**self.llm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 83, in __init__
self._clients = [self._client(config, openai_config) for config in config_list] # could modify the config
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 83, in <listcomp>
self._clients = [self._client(config, openai_config) for config in config_list] # could modify the config
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 144, in _client
client = OpenAI(**openai_config)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/openai/_client.py", line 92, in __init__
raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
(autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$

If you can suggest something regarding this, it would be great.


@clevcode commented on GitHub (Dec 19, 2023):

> hey, I was trying AutoGen with the ollama/litellm config, using the mistral and codellama models, but it gave me an error when the OpenAIWrapper attempts to handle the configuration, set up the same as in the video.

...

> "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 144, in _client client = OpenAI(**openai_config) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/openai/_client.py", line 92, in __init__ raise OpenAIError( openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable (autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$
>
> If you can suggest something regarding this, it would be great.

The litellm proxy doesn't care about the value of the API key, or whether it is sent or not, but since the OpenAI package requires it to be set you can simply set it to anything in order to satisfy the requirements of the OpenAI module

Either use "export OPENAI_API_KEY=whatever" in the shell before you run your agent, or set "api_key": "whatever" in the llm_config dict that you pass to the *Agent() constructors
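
As a concrete illustration of that workaround, an AutoGen llm_config pointed at a local proxy might look like the sketch below. The model name and address are placeholders, and the base-URL field is `api_base` in the openai 0.x / older AutoGen convention (newer versions renamed it to `base_url`):

```python
# Sketch: an AutoGen llm_config that satisfies the OpenAI client's api_key
# requirement with a dummy value and routes requests to a local proxy.
config_list = [
    {
        "model": "ollama/mistral",            # placeholder model served through litellm/ollama
        "api_base": "http://localhost:8000",  # local LiteLLM proxy
        "api_key": "whatever",                # dummy value; it only needs to be non-empty
    }
]

llm_config = {"config_list": config_list, "temperature": 0.2}

# Then pass it to the agents, e.g.:
# assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
```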


@MARYAMJAHANIR commented on GitHub (Dec 22, 2023):

@clevcode thanks for your reply, I have sorted that out, but the thing is when I tried this with MetaGPT I was getting an error like this:

(metagpt) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt$ python startup.py "create a 2048 game in python"
2023-12-22 07:27:14.516 | INFO | metagpt.const:get_metagpt_package_root:32 - Package root set to /mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt
2023-12-22 07:27:15.188 | INFO | metagpt.config:get_default_llm_provider_enum:88 - OpenAI API Model: gpt-4-1106-preview
2023-12-22 07:27:15.869 | INFO | metagpt.team:invest:84 - Investment: $3.0.
2023-12-22 07:27:15.873 | INFO | metagpt.roles.role:_act:379 - Alice(Product Manager): ready to PrepareDocuments
2023-12-22 07:27:16.639 | INFO | metagpt.utils.file_repository:save:60 - save to: /mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt/workspace/20231222072715/docs/requirement.txt
2023-12-22 07:27:16.646 | INFO | metagpt.roles.role:_act:379 - Alice(Product Manager): ready to WritePRD
2023-12-22 07:27:16.960 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 0.293(s), this was the 1st time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:17.467 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 0.800(s), this was the 2nd time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:18.830 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 2.163(s), this was the 3rd time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:21.207 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 4.540(s), this was the 4th time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:21.955 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 5.288(s), this was the 5th time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:23.664 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 6.997(s), this was the 6th time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:23.668 | WARNING | metagpt.utils.common:wrapper:505 - There is a exception in role's execution, in order to resume, we delete the newest role communication message in the role's memory.
2023-12-22 07:27:23.698 | ERROR | metagpt.utils.common:wrapper:487 - Exception occurs, start to serialize the project, exp:
Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in call
result = await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 256, in _aask_v1
content = await self.llm.aask(prompt, system_msgs)
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 496, in wrapper
return await func(self, *args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 528, in run
rsp = await self.react()
tenacity.RetryError: RetryError[<Future at 0x7f00d4958dc0 state=finished raised NotFoundError>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 482, in wrapper
result = await func(self, *args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/team.py", line 124, in run
await self.env.run()
Exception: Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in call
result = await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 256, in _aask_v1
content = await self.llm.aask(prompt, system_msgs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/base_gpt_api.py", line 53, in aask
rsp = await self.acompletion_text(message, stream=stream)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in call
do = self.iter(retry_state=retry_state)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/init.py", line 314, in iter
return fut.result()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in call
result = await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/openai_api.py", line 274, in acompletion_text
return await self._achat_completion_stream(messages)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/openai_api.py", line 211, in _achat_completion_stream
response: AsyncStream[ChatCompletionChunk] = await self.async_client.chat.completions.create(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 1295, in create
return await self._post(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1536, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1315, in request
return await self._request(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1392, in _request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 496, in wrapper
return await func(self, *args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 528, in run
rsp = await self.react()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 479, in react
rsp = await self._react()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 459, in _react
rsp = await self._act() # 这个rsp是否需要publish_message?
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 380, in _act
response = await self._rc.todo.run(self._rc.important_memory)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 105, in run
prd_doc = await self._update_prd(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 146, in _update_prd
prd = await self._run_new_requirement(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 126, in _run_new_requirement
node = await WRITE_PRD_NODE.fill(context=context, llm=self.llm, schema=schema)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 314, in fill
return await self.simple_fill(schema, mode)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 288, in simple_fill
content, scontent = await self._aask_v1(prompt, class_name, mapping, schema=schema)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in call
do = self.iter(retry_state=retry_state)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/init.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f00d4958dc0 state=finished raised NotFoundError>]

I thought I had not configured it in the right way, but I don't know exactly what I should do about this.

Author
Owner

@MARYAMJAHANIR commented on GitHub (Dec 22, 2023):

@clevcode I am trying MetaGPT with the Ollama model codellama via LiteLLM so that I don't need an API key, but it has not worked yet. Here is the MetaGPT config.yaml file:

```yaml
# DO NOT MODIFY THIS FILE, create a new key.yaml, define OPENAI_API_KEY.
# The configuration of key.yaml has a higher priority and will not enter git

#### Project Path Setting
# WORKSPACE_PATH: "Path for placing output files"

#### if OpenAI
## The official OPENAI_BASE_URL is https://api.openai.com/v1
## If the official OPENAI_BASE_URL is not available, we recommend using the [openai-forward](https://github.com/beidongjiedeguang/openai-forward).
## Or, you can configure OPENAI_PROXY to access official OPENAI_BASE_URL.
OPENAI_BASE_URL: "http://0.0.0.0:8000"
#OPENAI_PROXY: "http://127.0.0.1:8118"
# OPENAI_API_KEY: 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 # set the value to sk-xxx if you host the openai interface for open llm model
OPENAI_API_MODEL: "ollama/codellama"
MAX_TOKENS: 4096
RPM: 10

#### if Spark
#SPARK_APPID : "YOUR_APPID"
#SPARK_API_SECRET : "YOUR_APISecret"
#SPARK_API_KEY : "YOUR_APIKey"
#DOMAIN : "generalv2"
#SPARK_URL : "ws://spark-api.xf-yun.com/v2.1/chat"

#### if Anthropic
#ANTHROPIC_API_KEY: "YOUR_API_KEY"

#### if AZURE, check https://github.com/openai/openai-cookbook/blob/main/examples/azure/chat.ipynb
#OPENAI_API_TYPE: "azure"
#OPENAI_BASE_URL: "YOUR_AZURE_ENDPOINT"
#OPENAI_API_KEY: "YOUR_AZURE_API_KEY"
#OPENAI_API_VERSION: "YOUR_AZURE_API_VERSION"
#DEPLOYMENT_NAME: "YOUR_DEPLOYMENT_NAME"

#### if zhipuai from `https://open.bigmodel.cn`. You can set here or export API_KEY="YOUR_API_KEY"
# ZHIPUAI_API_KEY: "YOUR_API_KEY"

#### if Google Gemini from `https://ai.google.dev/` and API_KEY from `https://makersuite.google.com/app/apikey`.
#### You can set here or export GOOGLE_API_KEY="YOUR_API_KEY"
# GEMINI_API_KEY: "YOUR_API_KEY"

#### if use self-host open llm model with openai-compatible interface
#OPEN_LLM_API_BASE: "http://127.0.0.1:8000/v1"
#OPEN_LLM_API_MODEL: "llama2-13b"
#
##### if use Fireworks api
#FIREWORKS_API_KEY: "YOUR_API_KEY"
#FIREWORKS_API_BASE: "https://api.fireworks.ai/inference/v1"
#FIREWORKS_API_MODEL: "YOUR_LLM_MODEL" # example, accounts/fireworks/models/llama-v2-13b-chat

#### for Search
## Supported values: serpapi/google/serper/ddg
#SEARCH_ENGINE: serpapi
## Visit https://serpapi.com/ to get key.
#SERPAPI_API_KEY: "YOUR_API_KEY"
## Visit https://console.cloud.google.com/apis/credentials to get key.
#GOOGLE_API_KEY: "YOUR_API_KEY"
## Visit https://programmablesearchengine.google.com/controlpanel/create to get id.
#GOOGLE_CSE_ID: "YOUR_CSE_ID"
## Visit https://serper.dev/ to get key.
#SERPER_API_KEY: "YOUR_API_KEY"

#### for web access
## Supported values: playwright/selenium
#WEB_BROWSER_ENGINE: playwright
## Supported values: chromium/firefox/webkit, visit https://playwright.dev/python/docs/api/class-browsertype
##PLAYWRIGHT_BROWSER_TYPE: chromium
## Supported values: chrome/firefox/edge/ie, visit https://www.selenium.dev/documentation/webdriver/browsers/
# SELENIUM_BROWSER_TYPE: chrome

#### for TTS
#AZURE_TTS_SUBSCRIPTION_KEY: "YOUR_API_KEY"
#AZURE_TTS_REGION: "eastus"

#### for Stable Diffusion
## Use SD service, based on https://github.com/AUTOMATIC1111/stable-diffusion-webui
#SD_URL: "YOUR_SD_URL"
#SD_T2I_API: "/sdapi/v1/txt2img"

#### for Execution
#LONG_TERM_MEMORY: false

#### for Mermaid CLI
## If you installed mmdc (Mermaid CLI) only for metagpt then enable the following configuration.
#PUPPETEER_CONFIG: "./config/puppeteer-config.json"
#MMDC: "./node_modules/.bin/mmdc"

### for calc_usage
# CALC_USAGE: false

### for Research
# MODEL_FOR_RESEARCHER_SUMMARY: gpt-3.5-turbo
# MODEL_FOR_RESEARCHER_REPORT: gpt-3.5-turbo-16k

### choose the engine for mermaid conversion,
# default is nodejs, you can change it to playwright,pyppeteer or ink
# MERMAID_ENGINE: nodejs

### browser path for pyppeteer engine, support Chrome, Chromium,MS Edge
#PYPPETEER_EXECUTABLE_PATH: "/usr/bin/google-chrome-stable"

### for repair non-openai LLM's output when parse json-text if PROMPT_FORMAT=json
### due to non-openai LLM's output will not always follow the instruction, so here activate a post-process
### repair operation on the content extracted from LLM's raw output. Warning, it improves the result but not fix all cases.
# REPAIR_LLM_OUTPUT: false

# PROMPT_FORMAT: json #json or markdown
```
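As a quick sanity check of the proxy in that config, here is a hedged sketch (not part of the original comment) that talks to the LiteLLM endpoint directly with the `openai` Python package, reusing the base URL and model name from above; the placeholder API key is an assumption, since LiteLLM and Ollama do not validate it:

```python
from openai import OpenAI

# Base URL and model name are taken from the config above; "ollama" is a dummy key.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="ollama")

response = client.chat.completions.create(
    model="ollama/codellama",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```

If this call fails too, the problem is between LiteLLM and Ollama rather than inside MetaGPT.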

Author
Owner

@bdurrani commented on GitHub (Dec 23, 2023):

There is also [BrainGPT](https://bionic-gpt.com/docs/administration/external-api/) which requires OpenAI API compatibility.

Author
Owner

@puckettgw commented on GitHub (Dec 31, 2023):

+1 for this issue

I'm trying to use LangChain to create a GitHub coder bot. Trouble is, Ollama doesn't produce the output expected by certain tools, e.g.
[GitHub Toolkit CreateFile](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.github.toolkit.CreateFile.html)

The output from Ollama + Mixtral is

```
Thought: Now, let's create the main application file `app.py` inside the 'recipe_app' directory:
{
  "action": "Create File",
  "action_input": {
    "path": "recipe_app/app.py"
  }
}
```

But the toolkit is expecting a formatted_file arg:

```
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CreateFile
formatted_file
  field required (type=value_error.missing)
```

Of course I could implement my own tools for this, but that's kind of smelly.

Author
Owner

@louis030195 commented on GitHub (Jan 11, 2024):

Would be great to be able to use Ollama with the OpenAI SDK directly (and not having to use stuff like litellm).

Author
Owner

@vtboyarc commented on GitHub (Jan 27, 2024):

Is this being worked on?

Author
Owner

@johnnyq commented on GitHub (Feb 5, 2024):

Yes, please make it OpenAI API compatible, to integrate with FusionPBX for voicemail transcriptions and with Nextcloud for the AI functions.

Author
Owner

@NeevJewalkar commented on GitHub (Feb 5, 2024):

is this being worked on?

Author
Owner

@jmorganca commented on GitHub (Feb 6, 2024):

It is! https://github.com/ollama/ollama/pull/2376

Author
Owner

@jmorganca commented on GitHub (Feb 8, 2024):

Wanted to share an update: [version 0.1.24](https://github.com/ollama/ollama/releases/tag/v0.1.24) is out with initial OpenAI compatibility.

* [Blog post](https://ollama.ai/blog/openai-compatibility) with examples such as Autogen and the Vercel AI SDK
* [Twitter post](https://twitter.com/ollama/status/1755675732997505393) with some feedback so far
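For readers landing here, a minimal sketch of what the compatibility layer enables, assuming the `openai` Python package (>= 1.0) and a locally pulled model such as `llama2` (both assumptions on my part, not details from the announcement):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the SDK, not checked by Ollama
)

response = client.chat.completions.create(
    model="llama2",  # any model you have pulled locally
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```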

Author
Owner

@johnnyq commented on GitHub (Feb 8, 2024):

@jmorganca works great!
We just connected it with our Nextcloud instance. Unfortunately, Nextcloud doesn't let you select models, so we basically just copied llama2 over to gpt-4, and Nextcloud is now communicating.
Hopefully in the future Nextcloud gets full integration with the Ollama API.
Thanks a bunch for this!!

Author
Owner

@spmfox commented on GitHub (Feb 9, 2024):

> @jmorganca works great! We just connected it with our Nextcloud instance. Unfortunately, Nextcloud doesn't let you select models, so we basically just copied llama2 over to gpt-4, and Nextcloud is now communicating. Hopefully in the future Nextcloud gets full integration with the Ollama API. Thanks a bunch for this!!

![Screenshot from 2024-02-08 17-42-08](https://github.com/ollama/ollama/assets/2227281/3c965cf5-dc43-4b90-bb39-d97895f44c4e)
It looks like the API for /v1/models isn't implemented yet (see the 404 errors above), I assume this returns the available models - my Nextcloud could not detect them either, and it defaulted to "gpt-3.5-turbo".

![Screenshot from 2024-02-08 17-45-51-2](https://github.com/ollama/ollama/assets/2227281/afd9313a-8656-42cf-98b0-116395135eab)
I was able to work around this by just doing a 'ollama cp' from the model I wanted to the model Nextcloud was expecting (gpt-3.5-turbo), then it works.
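For anyone hitting the same 404, here is a hedged sketch of the enumeration step a client like Nextcloud performs, assuming the `openai` Python package pointed at Ollama's `/v1` endpoint; at the time of this comment the endpoint was not implemented, which is why the `ollama cp` workaround above was needed:

```python
from openai import OpenAI

# Point the SDK at the local Ollama server; the key is a placeholder.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

try:
    # Roughly what Nextcloud does to populate its model dropdown.
    for model in client.models.list():
        print(model.id)
except Exception as exc:
    # A 404 here reproduces the behaviour shown in the screenshots above.
    print("GET /v1/models failed:", exc)
```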

Author
Owner

@johnnyq commented on GitHub (Feb 9, 2024):

@spmfox In Nextcloud, under **Administration Settings** > **Connect accounts** > **OpenAI and LocalAI Integration**, under endpoint make sure you choose **Chat Completions** instead of Completions. For the API key, use **Ollama**.

Author
Owner

@spmfox commented on GitHub (Feb 9, 2024):

> @spmfox In Nextcloud, under **Administration Settings** > **Connect accounts** > **OpenAI and LocalAI Integration**, under endpoint make sure you choose **Chat Completions** instead of Completions. For the API key, use **Ollama**.

I was, you can see in the screenshot that ollama is responding to /v1/chat/completions - but it does not respond to /v1/models - and that is what Nextcloud needs to enumerate the possible models that can be used.

Author
Owner

@johnnyq commented on GitHub (Feb 9, 2024):

Gotcha, yeah, definitely an upstream thing with Nextcloud. I'll take a look to see if this issue was reported on their GitHub and raise it with them, referencing this issue #

Author
Owner

@guilhermecgs commented on GitHub (Feb 12, 2024):

Hi folks,

do we already have compatibility with the OpenAI Assistants API?

https://platform.openai.com/docs/api-reference/assistants

Author
Owner

@Progaros commented on GitHub (Feb 12, 2024):

I was trying to get ollama running with AutoGPT.

curl works:

```bash
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral:instruct",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
{"id":"chatcmpl-447","object":"chat.completion","created":1707528048,"model":"mistral:instruct","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":" Hello there! I'm here to help answer any questions you might have or assist with tasks you may need assistance with. What can I help you with today?\n\nHere are some things I can do:\n\n1. Answer general knowledge questions\n2. Help with math problems\n3. Set reminders and alarms\n4. Create to-do lists and manage tasks\n5. Provide weather updates\n6. Tell jokes or share interesting facts\n7. Assist with email and calendar management\n8. Play music, set timers for cooking, and more!\n\nLet me know what you need help with and I'll do my best to assist!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":140,"total_tokens":156}}
```

but with this AutoGPT config:

```bash
## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
OPENAI_API_KEY=ollama

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
# the following is an example:
OPENAI_API_BASE_URL= http://localhost:11434/v1/chat/completions

## SMART_LLM - Smart language model (Default: gpt-4-0314)
SMART_LLM=mixtral:8x7b-instruct-v0.1-q2_K

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo-16k)
FAST_LLM=mistral:instruct
```

I can't get the connection:

File "/venv/agpt-9TtSrW0h-py3.10/lib/python3.10/site-packages/openai/_base_client.py", line 919, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

maybe someone will figure it out and can post an update here

Author
Owner

@pamelafox commented on GitHub (Feb 13, 2024):

I haven't used AutoGPT, but I would imagine that the base URL would be more like OPENAI_API_BASE_URL= http://localhost:11434/v1

One thing that I often do to debug OpenAI connections is to set my logging level to debug:

```python
import logging

# before any OpenAI calls happen
logging.basicConfig(level=logging.DEBUG)
```

The OpenAI Python SDK always logs its HTTP request URLs, so you can see what's gone awry.
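Putting both suggestions together, a hedged sketch (not AutoGPT-specific) that points the `openai` Python package at Ollama with the base URL ending at `/v1` and debug logging enabled, so the request URLs show up in the output:

```python
import logging

from openai import OpenAI

logging.basicConfig(level=logging.DEBUG)  # the SDK then logs each HTTP request URL

client = OpenAI(
    base_url="http://localhost:11434/v1",  # stop at /v1; the SDK appends /chat/completions itself
    api_key="ollama",                      # any non-empty string; Ollama ignores it
)

response = client.chat.completions.create(
    model="mistral:instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```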

Author
Owner

@lks-ai commented on GitHub (Mar 21, 2024):

> Wanted to share an update: [version 0.1.24](https://github.com/ollama/ollama/releases/tag/v0.1.24) is out with initial OpenAI compatibility.
>
> * [Blog post](https://ollama.ai/blog/openai-compatibility) with examples such as Autogen and the Vercel AI SDK
> * [Twitter post](https://twitter.com/ollama/status/1755675732997505393) with some feedback so far

Could you guys provide support for normal completion? I really, really need it. I was using vLLM but am switching to Ollama for a Colab project... and though you have /v1/chat/completions, where is /v1/completions?
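For reference, a hedged sketch of the legacy text-completion call being asked about, assuming the `openai` Python package and a locally pulled model such as `codellama`; it only works once the server actually exposes `/v1/completions`:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Legacy completions take a raw prompt instead of chat messages.
completion = client.completions.create(
    model="codellama",
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(completion.choices[0].text)
```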

Author
Owner

@Tanguille commented on GitHub (May 2, 2024):

> Gotcha, yeah, definitely an upstream thing with Nextcloud. I'll take a look to see if this issue was reported on their GitHub and raise it with them, referencing this issue #

Did you learn anything more about this? I can't get ollama to work within nextcloud.

Author
Owner

@chrisoutwright commented on GitHub (Oct 13, 2024):

Is there a reason why the "n" parameter cannot be used, unlike in the OpenAI [API](https://platform.openai.com/docs/api-reference/chat/create)?

```
n
integer or null

Optional
Defaults to 1
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
```
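For context, a hedged sketch of a request that sets `n`, assuming the `openai` Python package; against the official API this returns two alternative choices, while (per this comment) Ollama's compatibility layer does not honour the parameter:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama2",  # example model name, not from the comment
    messages=[{"role": "user", "content": "Suggest a project name."}],
    n=2,  # request two alternative completions for the same prompt
)
for choice in response.choices:
    print(choice.index, choice.message.content)
```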

Reference: github-starred/ollama#62173