[GH-ISSUE #305] OpenAI API compatibility #62173

Closed
opened 2026-05-03 07:44:38 -05:00 by GiteaMirror · 60 comments
Owner

Originally created by @handrew on GitHub (Aug 7, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/305

Originally assigned to: @jmorganca on GitHub.

Any chance you would consider mirroring OpenAI's API specs and output? e.g., /completions and /chat/completions. That way, it could be a drop-in replacement for the Python openai package by changing out the URL.
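
For context, the kind of drop-in swap being asked for would look roughly like this on the client side. This is a hedged sketch only: the `/v1` base URL, the model name, and the placeholder API key are assumptions for illustration, not an endpoint Ollama exposed at the time of this request.

```python
# Hypothetical: point the official openai Python client at a local,
# OpenAI-compatible endpoint instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local OpenAI-style endpoint
    api_key="unused",                      # local servers typically ignore this, but the client requires a value
)

resp = client.chat.completions.create(
    model="llama2",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

If the server mirrored /chat/completions faithfully, existing OpenAI-based tooling would only need this base-URL change.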

GiteaMirror added the feature request label 2026-05-03 07:44:38 -05:00

@priamai commented on GitHub (Aug 10, 2023):

That would be awesome and also embeddings!


@hakt0-r commented on GitHub (Aug 11, 2023):

yup I'll +1 on this too :-)


@kamuridesu commented on GitHub (Aug 11, 2023):

+1


@loyaliu commented on GitHub (Aug 30, 2023):

+1

@colindotfun commented on GitHub (Sep 1, 2023):

this would be a big win

prior work: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md and https://github.com/ggerganov/llama.cpp/blob/master/examples/server/api_like_OAI.py

@ValValu commented on GitHub (Sep 2, 2023):

yeah would be great!


@jmorganca commented on GitHub (Sep 7, 2023):

Thanks for the issue and comments, all! Sorry for not replying sooner. Which clients/use cases are you looking to use that require the OpenAI API? Quite a few folks have mentioned LlamaIndex (also: see #278!) Would love to know!


@kamuridesu commented on GitHub (Sep 7, 2023):

Interoperability with OpenAI projects, like Auto-GPT. If you check https://github.com/go-skynet/LocalAI, you can see that their API works with pretty much every project that uses the OpenAI endpoint, in most cases you just need to point an Environment Variable to it.


@colindotfun commented on GitHub (Sep 7, 2023):

www.galactus.ai also


@cori commented on GitHub (Sep 8, 2023):

I was looking to connect to it with both Continue.dev (which supports Ollama explicitly) and LocalAI, so interop was my hope as well.


@MchLrnX commented on GitHub (Sep 19, 2023):

I'd love to be able to do this. I'm specifically looking at running ToolBench, MetaGPT and ChatDEV. I have MetaGPT ready to test with this if we get this working.


@comalice commented on GitHub (Sep 28, 2023):

I'd like to throw in that Ironclad's Rivet application expects an OpenAI API endpoint as well: https://github.com/Ironclad/rivet


@mjtechguy commented on GitHub (Sep 29, 2023):

+1. I would like to use ollama as a target for LibreChat: https://github.com/danny-avila/LibreChat/tree/main


@jtoy commented on GitHub (Sep 29, 2023):

+1


@Anon2578 commented on GitHub (Sep 30, 2023):

Yes, this would be a plus one if we can get this working with the OpenAI API specs. Can someone notify me when this is done? I might forget, and this was one of the reasons I took a look at this project.


@shtrophic commented on GitHub (Oct 1, 2023):

This would be pretty cool since Nextcloud instances could use a locally running ollama server. Nextcloud itself ships with openai/localai compatibility (through a plugin).


@Nivek92 commented on GitHub (Oct 4, 2023):

AutoGen would be another use case - https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/


@rcalv002 commented on GitHub (Oct 7, 2023):

+1


@vividfog commented on GitHub (Oct 7, 2023):

I'm surprised [LiteLLM](https://github.com/BerriAI/litellm) hasn't been mentioned in the thread yet. Found it in the [README.md](https://github.com/jmorganca/ollama#community-integrations) of the Ollama repo today. "Call LLM APIs using the OpenAI format", 100+ of them, including Ollama. This worked for me:

`pip install litellm`

`ollama pull codellama`

`litellm --model ollama/codellama --api_base http://localhost:11434 --temperature 0.3 --max_tokens 2048`

Double-check that the port, model name and parameters match your configuration and VRAM situation.

As an example, [Continue.dev](https://github.com/continuedev/continue) configuration then goes like this, OpenAI style:

        default=OpenAI(
            api_key="IGNORED",
            model="ollama/codellama",
            context_length=2048,
            api_base="http://your_litellm_hostname:8000"
        ),

Set context_length and max_tokens as appropriate. 2048 is a conservative value if you're [gpu-poor](https://github.com/RahulSChand/gpu_poor) or aren't sure.

Note that LiteLLM/Uvicorn opens the API at 0.0.0.0:8000; it's not confined to localhost by default, and people can piggyback on your server if it's not on a private network. I believe you need to edit the litellm source code [here](https://github.com/BerriAI/litellm/blob/86a835f6fd174ef64c4cb41db5eae86c2fffa555/litellm/proxy/proxy_cli.py#L118) if you want to serve only localhost, then `pip install -e .` from that local clone before running `litellm`.


@ishaan-jaff commented on GitHub (Oct 7, 2023):

Thanks for mentioning us @vividfog ! (I'm the maintainer of LiteLLM) We allow you to create an OpenAI compatible proxy server for ollama

Here's a link to the section on our docs on how to do this: https://docs.litellm.ai/docs/proxy_server

Please let me know how we can make it better for the ollama community😃
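
For readers wondering what the client side looks like once the proxy is up, here is a minimal sketch using the openai Python package (0.x style, current at the time of this thread) against a LiteLLM proxy on its default port; the model name and dummy key below are placeholders, not values required by LiteLLM:

```python
# Sketch: send an OpenAI-style chat request to a local LiteLLM proxy,
# which forwards it to Ollama. Uses the legacy openai 0.x interface.
import openai

openai.api_base = "http://localhost:8000"  # the LiteLLM proxy, not api.openai.com
openai.api_key = "anything"                # placeholder; ignored unless the proxy enforces auth

response = openai.ChatCompletion.create(
    model="ollama/codellama",  # placeholder; match the model you started litellm with
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
)
print(response["choices"][0]["message"]["content"])
```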


@ghost commented on GitHub (Oct 8, 2023):

Hey @vividfog thanks for this incredible tutorial.

I added it to our docs and gave you credit for it.

Docs: https://docs.litellm.ai/docs/proxy_server#tutorial-use-with-aiderautogencontinue-dev

If you have a twitter/linkedin - happy to link to that instead!


@shtrophic commented on GitHub (Oct 8, 2023):

Wow, thanks for pointing to litellm @vividfog.

For anyone on Arch Linux (btw) and interested, I came up with a PKGBUILD that sets up litellm with ollama as a systemd service. You can check it out on the [AUR](https://aur.archlinux.org/packages/litellm-ollama). Feel free to get back to me with any feedback!


@vividfog commented on GitHub (Oct 8, 2023):

I learned today that my initial advice was not complete. Continue.dev sends two parallel queries, one for the user task and another to summarize the conversation, and the LiteLLM logs may show an error from Ollama after the second call. There's a fix for this client-side.

This Continue.dev configuration imports a wrapper that makes all calls sequential, queued:

1. Import the QueuedLLM wrapper near the top of `config.py`: `from continuedev.src.continuedev.libs.llm.queued import QueuedLLM`

2. The server calls can now be made sequential like this:

    models=Models(
        default=QueuedLLM(
            llm=OpenAI(
                api_key="IGNORED",
                model="ollama/codellama",
                context_length=2048,
                api_base="http://localhost:8000"
            )
        )
    ),

This may now be leaning off-topic vs. the original issue, but hope it helps those who used the previous advice. The friendly developers at [Continue.dev](https://github.com/continuedev/continue) GitHub/Discord are there if needed. I learned about the [QueuedLLM](https://continue.dev/docs/reference/Models/queuedllm) wrapper initially in their Discord.

What remains a little confusing is that previously I've seen Ollama handle parallel API calls in sequence, or was I hallucinating? Not sure why QueuedLLM() is then needed, but if the shoe fits, wear it I guess. Material for another issue if someone wants to drill down and verify.

What I really like is how these 3 projects work together without knowing about each other at code level, as if following the same plan. That indeed is the benefit of following the same API conventions, the topic of this issue.


@MilleniumDawn commented on GitHub (Nov 11, 2023):

I realise it's probably my lack of knowledge that is the problem, but my front end can use either LM Studio or oobabooga/text-generation-webui simply by changing the base_api.

I wanted to try Ollama because it seems to do a lot of things simpler/faster.

But not supporting what seems to be developing into the go-to format for APIs, the OpenAI API, is a big minus. (I realise this is free, I don't want to be a chooser/beggar, just trying to provide feedback.)

I tried LiteLLM, and it's not a drop-in replacement, and now what was supposed to be simple needs to be debugged.

So my feedback is: I hope Ollama will natively support the OpenAI API rather than rely on an external library that might seem easy for people who know their stuff, but not as easy for people who came to Ollama for its simplicity.

I'm leaving my error log from LiteLLM just as a reference; I know it's not this project.

@mac ~ % litellm --drop_params --debug --model ollama/dolphin --api_base http://localhost:11434
ollama called
INFO:     Started server process [42896]
INFO:     Waiting for application startup.

#------------------------------------------------------------#
#                                                            #
#            'The thing I wish you improved is...'            #
#        https://github.com/BerriAI/litellm/issues/new        #
#                                                            #
#------------------------------------------------------------#

 Thank you for using LiteLLM! - Krrish & Ishaan



Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new

Docs: https://docs.litellm.ai/docs/simple_proxy

LiteLLM: Test your local endpoint with: "litellm --test" [In a new terminal tab]


INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
litellm.caching: False; litellm.caching_with_models: False; litellm.cache: None
kwargs[caching]: False; litellm.cache: None

LiteLLM completion() model= dolphin

LiteLLM: Params passed to completion() {'functions': [], 'function_call': '', 'temperature': 0.7, 'top_p': 0.9, 'n': None, 'stream': False, 'stop': ['<.>'], 'max_tokens': 4096, 'presence_penalty': 0.5, 'frequency_penalty': 0.5, 'logit_bias': {}, 'user': '', 'model': 'dolphin', 'custom_llm_provider': 'ollama', 'repetition_penalty': 1.1, 'top_k': 20}

LiteLLM: Non-Default params passed to completion() {'temperature': 0.7, 'top_p': 0.9, 'stream': False, 'stop': ['<.>'], 'max_tokens': 4096, 'presence_penalty': 0.5, 'frequency_penalty': 0.5}
self.optional_params: {'num_predict': 4096, 'temperature': 0.7, 'top_p': 0.9, 'repeat_penalty': 0.5, 'stop_sequences': ['<.>'], 'repetition_penalty': 1.1, 'top_k': 20}
Logging Details Pre-API Call for call id b91948c3-ba26-4ebc-a140-c141a9e68764
MODEL CALL INPUT: {'model': 'dolphin', 'messages': [{'role': 'system', 'content': "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."}, {'role': 'user', 'content': 'USER : Tell me what you are in one phrase. ASSISTANT: '}], 'optional_params': {'num_predict': 4096, 'temperature': 0.7, 'top_p': 0.9, 'repeat_penalty': 0.5, 'stop_sequences': ['<.>'], 'repetition_penalty': 1.1, 'top_k': 20}, 'litellm_params': {'return_async': False, 'api_key': None, 'force_timeout': 600, 'logger_fn': None, 'verbose': False, 'custom_llm_provider': 'ollama', 'api_base': 'http://localhost:11434', 'litellm_call_id': 'b91948c3-ba26-4ebc-a140-c141a9e68764', 'model_alias_map': {}, 'completion_call_id': None, 'metadata': None, 'stream_response': {}}, 'start_time': datetime.datetime(2023, 11, 11, 10, 0, 17, 953683), 'input': "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER : Tell me what you are in one phrase. ASSISTANT: ", 'api_key': None, 'additional_args': {'complete_input_dict': {'text': "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER : Tell me what you are in one phrase. ASSISTANT: ", 'num_predict': 4096, 'temperature': 0.7, 'top_p': 0.9, 'repeat_penalty': 0.5, 'stop_sequences': ['<.>'], 'repetition_penalty': 1.1, 'top_k': 20}}, 'log_event_type': 'pre_api_call'}


Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
self.failure_callback: []
An error occurred: Failed to parse: http://localhost:11434dolphin/generation

 Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
INFO:     127.0.0.1:61413 - "POST /chat/completions HTTP/1.1" 200 OK

@PetrarcaBruto commented on GitHub (Nov 14, 2023):

I agree with the comment made by @MilleniumDawn about the speed of litellm vs. the ollama server. I may be wrong, but I have noticed in the native ollama server logs that my WSL GPU is being used, e.g. the following server message:

"ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti with Max-Q Design, compute capability 7.5"

I suspect that litellm server or workers are not using my GPU. If that is the case then it will explain the difference in speed.

Any comments/advice will be very welcomed.


@kylemclaren commented on GitHub (Nov 14, 2023):

@PetrarcaBruto nvidia-smi should show the ollama runner process if GPU is utilized, like this:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:00:06.0 Off |                    0 |
| N/A   37C    P0              38W / 250W |  15261MiB / 40960MiB |     16%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       501      C   ...p/gguf/build/cuda/bin/ollama-runner    15248MiB |
+---------------------------------------------------------------------------------------+

@ghost commented on GitHub (Nov 14, 2023):

+1


@ghost commented on GitHub (Nov 15, 2023):

Hey @MilleniumDawn i found the issue - it was being misrouted. Just pushed a fix - https://github.com/BerriAI/litellm/commit/1738341dcb16884bfff42a0b2004ba5afd856c5d

Should be live in v1.0.2 by EOD. I'm really sorry for that.

@PetrarcaBruto re: litellm workers

For ollama specifically - we check if you're making an ollama call, and run `ollama serve` in a separate worker - https://github.com/BerriAI/litellm/blob/c7780cbc40b6d34144677d7979ba4318f0a0d5a9/litellm/proxy/proxy_cli.py#L20

open to suggestions for how we can improve this further.


@PetrarcaBruto commented on GitHub (Nov 15, 2023):

@kylemclaren & @krrishdholakia thanks for the tips. I found that my GPU is also being used when running litellm, which is good news.


@patrickdobler commented on GitHub (Nov 16, 2023):

That would be a great addition. I would love to use Ollama with TypingMind.


@priamai commented on GitHub (Nov 16, 2023):

> Thanks for mentioning us @vividfog ! (I'm the maintainer of LiteLLM) We allow you to create an OpenAI compatible proxy server for ollama
>
> Here's a link to the section on our docs on how to do this: https://docs.litellm.ai/docs/proxy_server
>
> Please let me know how we can make it better for the ollama community😃

AMAZING, how did I not see this before! It would be useful to also add a simple API_TOKEN so at least I can put it on a cloud service without having to fiddle with additional proxy authenticators.


@ghost commented on GitHub (Nov 16, 2023):

@priamai we have that - https://docs.litellm.ai/docs/simple_proxy#example-config

you can add a master_key in the config.yaml, and this will require all calls to pass that key as part of the bearer token.

let me know if you end up using it, would love to know how we can improve it for you - https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
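
To make the bearer-token detail concrete, this is roughly what an authenticated request to such a proxy looks like on the wire; the host, key, and model below are placeholders, and the /chat/completions path matches what the proxy logs earlier in this thread show:

```python
# Sketch: call a LiteLLM proxy that has a master_key configured.
# The key travels as a standard "Authorization: Bearer <key>" header,
# which is exactly what the openai client would send as its api_key.
import requests

resp = requests.post(
    "http://my-cloud-host:8000/chat/completions",           # placeholder proxy address
    headers={"Authorization": "Bearer sk-my-master-key"},   # must match the proxy's configured master key
    json={
        "model": "ollama/mistral",  # placeholder model
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```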


@agonbina commented on GitHub (Nov 18, 2023):

embeddings with Ollama do not seem to be supported through the Litellm proxy.


@sottey commented on GitHub (Nov 25, 2023):

I, too, would love this. It would allow me to integrate in TypingMind. Thank you for your amazing stuff!


@iplayfast commented on GitHub (Nov 27, 2023):

Yeah, I'm trying to use litellm and it's a very weak crutch. If you want something done right you gotta do it yourself and build the openai api into ollama.


@kamuridesu commented on GitHub (Nov 27, 2023):

> If you want something done right you gotta do it yourself

So go ahead and contribute a PR to ollama or help improve litellm.


@flaviovs commented on GitHub (Nov 29, 2023):

Two things to be aware of when using LiteLLM:

- [LiteLLM does outbound network connections](https://github.com/BerriAI/litellm/issues/739), therefore it won't work in firewalled environments; and
- [By default their OpenAI API proxy does phone home](https://docs.litellm.ai/docs/simple_proxy#--telemetry) (you can turn this feature off).

I hope this saves people's time if their plan is to use Ollama+LiteLLM offline for privacy/compliance reasons.

@MARYAMJAHANIR commented on GitHub (Dec 15, 2023):

> AutoGen would be another use case - https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/

hey,
I was trying AutoGen with the ollama/litellm config, using the mistral and codellama models, but it gave me an error when the OpenAIWrapper attempts to handle the configuration, set up the same as in the video.

Error:
/home/maryam_linux/miniconda3/envs/autogen/bin/python /mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py
(autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$ /home/maryam_linux/miniconda3/envs/autogen/bin/python /mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py
Traceback (most recent call last):
File "/mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py", line 25, in <module>
assistant = autogen.AssistantAgent(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/agentchat/assistant_agent.py", line 61, in __init__
super().__init__(
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/agentchat/conversable_agent.py", line 121, in __init__
self.client = OpenAIWrapper(**self.llm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 83, in __init__
self._clients = [self._client(config, openai_config) for config in config_list] # could modify the config
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 83, in <listcomp>
self._clients = [self._client(config, openai_config) for config in config_list] # could modify the config
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 144, in _client
client = OpenAI(**openai_config)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/openai/_client.py", line 92, in __init__
raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
(autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$

If you can suggest something regarding this, it would be great.


@clevcode commented on GitHub (Dec 19, 2023):

> hey, I was trying AutoGen with the ollama/litellm config, using the mistral and codellama models, but it gave me an error when the OpenAIWrapper attempts to handle the configuration, set up the same as in the video.

...

> "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 144, in _client client = OpenAI(**openai_config) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/openai/_client.py", line 92, in __init__ raise OpenAIError( openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable (autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$
>
> If you can suggest something regarding this, it would be great.

The litellm proxy doesn't care about the value of the API key, or whether it is sent or not, but since the OpenAI package requires it to be set you can simply set it to anything in order to satisfy the requirements of the OpenAI module

Either use "export OPENAI_API_KEY=whatever" in the shell before you run your agent, or set "api_key": "whatever" in the llm_config dict that you pass to the *Agent() constructors
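
As a concrete illustration of that workaround, an AutoGen llm_config pointed at a local proxy might look like the sketch below. The model name and address are placeholders, and the base-URL field is `api_base` in the openai 0.x / older AutoGen convention (newer versions renamed it to `base_url`):

```python
# Sketch: an AutoGen llm_config that satisfies the OpenAI client's api_key
# requirement with a dummy value and routes requests to a local proxy.
config_list = [
    {
        "model": "ollama/mistral",            # placeholder model served through litellm/ollama
        "api_base": "http://localhost:8000",  # local LiteLLM proxy
        "api_key": "whatever",                # dummy value; it only needs to be non-empty
    }
]

llm_config = {"config_list": config_list, "temperature": 0.2}

# Then pass it to the agents, e.g.:
# assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
```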


@MARYAMJAHANIR commented on GitHub (Dec 22, 2023):

@clevcode thanks for your reply, I have sorted that out, but the thing is when I tried this with MetaGPT I was getting an error like this:

(metagpt) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt$ python startup.py "create a 2048 game in python"
2023-12-22 07:27:14.516 | INFO | metagpt.const:get_metagpt_package_root:32 - Package root set to /mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt
2023-12-22 07:27:15.188 | INFO | metagpt.config:get_default_llm_provider_enum:88 - OpenAI API Model: gpt-4-1106-preview
2023-12-22 07:27:15.869 | INFO | metagpt.team:invest:84 - Investment: $3.0.
2023-12-22 07:27:15.873 | INFO | metagpt.roles.role:_act:379 - Alice(Product Manager): ready to PrepareDocuments
2023-12-22 07:27:16.639 | INFO | metagpt.utils.file_repository:save:60 - save to: /mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt/workspace/20231222072715/docs/requirement.txt
2023-12-22 07:27:16.646 | INFO | metagpt.roles.role:_act:379 - Alice(Product Manager): ready to WritePRD
2023-12-22 07:27:16.960 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 0.293(s), this was the 1st time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:17.467 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 0.800(s), this was the 2nd time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:18.830 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 2.163(s), this was the 3rd time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:21.207 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 4.540(s), this was the 4th time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:21.955 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 5.288(s), this was the 5th time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:23.664 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 6.997(s), this was the 6th time calling it. exp: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:23.668 | WARNING | metagpt.utils.common:wrapper:505 - There is a exception in role's execution, in order to resume, we delete the newest role communication message in the role's memory.
2023-12-22 07:27:23.698 | ERROR | metagpt.utils.common:wrapper:487 - Exception occurs, start to serialize the project, exp:
Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in call
result = await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 256, in _aask_v1
content = await self.llm.aask(prompt, system_msgs)
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 496, in wrapper
return await func(self, *args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 528, in run
rsp = await self.react()
tenacity.RetryError: RetryError[<Future at 0x7f00d4958dc0 state=finished raised NotFoundError>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 482, in wrapper
result = await func(self, *args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/team.py", line 124, in run
await self.env.run()
Exception: Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in call
result = await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 256, in _aask_v1
content = await self.llm.aask(prompt, system_msgs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/base_gpt_api.py", line 53, in aask
rsp = await self.acompletion_text(message, stream=stream)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in call
do = self.iter(retry_state=retry_state)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/init.py", line 314, in iter
return fut.result()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in call
result = await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/openai_api.py", line 274, in acompletion_text
return await self._achat_completion_stream(messages)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/openai_api.py", line 211, in _achat_completion_stream
response: AsyncStream[ChatCompletionChunk] = await self.async_client.chat.completions.create(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 1295, in create
return await self._post(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1536, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1315, in request
return await self._request(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1392, in _request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model gpt-4-1106-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 496, in wrapper
return await func(self, *args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 528, in run
rsp = await self.react()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 479, in react
rsp = await self._react()
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 459, in _react
rsp = await self._act() # 这个rsp是否需要publish_message?
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 380, in _act
response = await self._rc.todo.run(self._rc.important_memory)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 105, in run
prd_doc = await self._update_prd(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 146, in _update_prd
prd = await self._run_new_requirement(
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 126, in _run_new_requirement
node = await WRITE_PRD_NODE.fill(context=context, llm=self.llm, schema=schema)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 314, in fill
return await self.simple_fill(schema, mode)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 288, in simple_fill
content, scontent = await self._aask_v1(prompt, class_name, mapping, schema=schema)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in call
do = self.iter(retry_state=retry_state)
File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/init.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f00d4958dc0 state=finished raised NotFoundError>]

I thought I had not configured it in the right way, but I don't know exactly what I should do about this.

Author
Owner

@MARYAMJAHANIR commented on GitHub (Dec 22, 2023):

@clevcode I am trying MetaGPT with the Ollama model codellama via LiteLLM so that I don't need an API key, but it has not worked yet. Here is the MetaGPT config.yaml file:

```yaml
# DO NOT MODIFY THIS FILE, create a new key.yaml, define OPENAI_API_KEY.
# The configuration of key.yaml has a higher priority and will not enter git

#### Project Path Setting
# WORKSPACE_PATH: "Path for placing output files"

#### if OpenAI
## The official OPENAI_BASE_URL is https://api.openai.com/v1
## If the official OPENAI_BASE_URL is not available, we recommend using the [openai-forward](https://github.com/beidongjiedeguang/openai-forward).
## Or, you can configure OPENAI_PROXY to access official OPENAI_BASE_URL.
OPENAI_BASE_URL: "http://0.0.0.0:8000"
#OPENAI_PROXY: "http://127.0.0.1:8118"
# OPENAI_API_KEY: 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 # set the value to sk-xxx if you host the openai interface for open llm model
OPENAI_API_MODEL: "ollama/codellama"
MAX_TOKENS: 4096
RPM: 10

#### if Spark
#SPARK_APPID : "YOUR_APPID"
#SPARK_API_SECRET : "YOUR_APISecret"
#SPARK_API_KEY : "YOUR_APIKey"
#DOMAIN : "generalv2"
#SPARK_URL : "ws://spark-api.xf-yun.com/v2.1/chat"

#### if Anthropic
#ANTHROPIC_API_KEY: "YOUR_API_KEY"

#### if AZURE, check https://github.com/openai/openai-cookbook/blob/main/examples/azure/chat.ipynb
#OPENAI_API_TYPE: "azure"
#OPENAI_BASE_URL: "YOUR_AZURE_ENDPOINT"
#OPENAI_API_KEY: "YOUR_AZURE_API_KEY"
#OPENAI_API_VERSION: "YOUR_AZURE_API_VERSION"
#DEPLOYMENT_NAME: "YOUR_DEPLOYMENT_NAME"

#### if zhipuai from `https://open.bigmodel.cn`. You can set here or export API_KEY="YOUR_API_KEY"
# ZHIPUAI_API_KEY: "YOUR_API_KEY"

#### if Google Gemini from `https://ai.google.dev/` and API_KEY from `https://makersuite.google.com/app/apikey`.
#### You can set here or export GOOGLE_API_KEY="YOUR_API_KEY"
# GEMINI_API_KEY: "YOUR_API_KEY"

#### if use self-host open llm model with openai-compatible interface
#OPEN_LLM_API_BASE: "http://127.0.0.1:8000/v1"
#OPEN_LLM_API_MODEL: "llama2-13b"
#
##### if use Fireworks api
#FIREWORKS_API_KEY: "YOUR_API_KEY"
#FIREWORKS_API_BASE: "https://api.fireworks.ai/inference/v1"
#FIREWORKS_API_MODEL: "YOUR_LLM_MODEL" # example, accounts/fireworks/models/llama-v2-13b-chat

#### for Search
## Supported values: serpapi/google/serper/ddg
#SEARCH_ENGINE: serpapi
## Visit https://serpapi.com/ to get key.
#SERPAPI_API_KEY: "YOUR_API_KEY"
## Visit https://console.cloud.google.com/apis/credentials to get key.
#GOOGLE_API_KEY: "YOUR_API_KEY"
## Visit https://programmablesearchengine.google.com/controlpanel/create to get id.
#GOOGLE_CSE_ID: "YOUR_CSE_ID"
## Visit https://serper.dev/ to get key.
#SERPER_API_KEY: "YOUR_API_KEY"

#### for web access
## Supported values: playwright/selenium
#WEB_BROWSER_ENGINE: playwright
## Supported values: chromium/firefox/webkit, visit https://playwright.dev/python/docs/api/class-browsertype
##PLAYWRIGHT_BROWSER_TYPE: chromium
## Supported values: chrome/firefox/edge/ie, visit https://www.selenium.dev/documentation/webdriver/browsers/
# SELENIUM_BROWSER_TYPE: chrome

#### for TTS
#AZURE_TTS_SUBSCRIPTION_KEY: "YOUR_API_KEY"
#AZURE_TTS_REGION: "eastus"

#### for Stable Diffusion
## Use SD service, based on https://github.com/AUTOMATIC1111/stable-diffusion-webui
#SD_URL: "YOUR_SD_URL"
#SD_T2I_API: "/sdapi/v1/txt2img"

#### for Execution
#LONG_TERM_MEMORY: false

#### for Mermaid CLI
## If you installed mmdc (Mermaid CLI) only for metagpt then enable the following configuration.
#PUPPETEER_CONFIG: "./config/puppeteer-config.json"
#MMDC: "./node_modules/.bin/mmdc"

### for calc_usage
# CALC_USAGE: false

### for Research
# MODEL_FOR_RESEARCHER_SUMMARY: gpt-3.5-turbo
# MODEL_FOR_RESEARCHER_REPORT: gpt-3.5-turbo-16k

### choose the engine for mermaid conversion,
# default is nodejs, you can change it to playwright,pyppeteer or ink
# MERMAID_ENGINE: nodejs

### browser path for pyppeteer engine, support Chrome, Chromium,MS Edge
#PYPPETEER_EXECUTABLE_PATH: "/usr/bin/google-chrome-stable"

### for repair non-openai LLM's output when parse json-text if PROMPT_FORMAT=json
### due to non-openai LLM's output will not always follow the instruction, so here activate a post-process
### repair operation on the content extracted from LLM's raw output. Warning, it improves the result but not fix all cases.
# REPAIR_LLM_OUTPUT: false

# PROMPT_FORMAT: json #json or markdown
```
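As a quick sanity check of the proxy in that config, here is a hedged sketch (not part of the original comment) that talks to the LiteLLM endpoint directly with the `openai` Python package, reusing the base URL and model name from above; the placeholder API key is an assumption, since LiteLLM and Ollama do not validate it:

```python
from openai import OpenAI

# Base URL and model name are taken from the config above; "ollama" is a dummy key.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="ollama")

response = client.chat.completions.create(
    model="ollama/codellama",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```

If this call fails too, the problem is between LiteLLM and Ollama rather than inside MetaGPT.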

Author
Owner

@bdurrani commented on GitHub (Dec 23, 2023):

There is also [BrainGPT](https://bionic-gpt.com/docs/administration/external-api/) which requires OpenAI API compatibility.

Author
Owner

@puckettgw commented on GitHub (Dec 31, 2023):

+1 for this issue

I'm trying to use LangChain to create a GitHub coder bot. Trouble is, Ollama doesn't produce the output expected by certain tools, e.g.
[GitHub Toolkit CreateFile](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.github.toolkit.CreateFile.html)

The output from Ollama + Mixtral is

```
Thought: Now, let's create the main application file `app.py` inside the 'recipe_app' directory:
{
  "action": "Create File",
  "action_input": {
    "path": "recipe_app/app.py"
  }
}
```

But the toolkit is expecting a formatted_file arg:

```
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CreateFile
formatted_file
  field required (type=value_error.missing)
```

Of course I could implement my own tools for this, but that's kind of smelly.

Author
Owner

@louis030195 commented on GitHub (Jan 11, 2024):

Would be great to be able to use Ollama with the OpenAI SDK directly (and not having to use stuff like litellm).

Author
Owner

@vtboyarc commented on GitHub (Jan 27, 2024):

Is this being worked on?

Author
Owner

@johnnyq commented on GitHub (Feb 5, 2024):

Yes, please make it OpenAI API compatible, to integrate with FusionPBX for voicemail transcriptions and with Nextcloud for the AI functions.

Author
Owner

@NeevJewalkar commented on GitHub (Feb 5, 2024):

is this being worked on?

Author
Owner

@jmorganca commented on GitHub (Feb 6, 2024):

It is! https://github.com/ollama/ollama/pull/2376

Author
Owner

@jmorganca commented on GitHub (Feb 8, 2024):

Wanted to share an update: [version 0.1.24](https://github.com/ollama/ollama/releases/tag/v0.1.24) is out with initial OpenAI compatibility.

* [Blog post](https://ollama.ai/blog/openai-compatibility) with examples such as Autogen and the Vercel AI SDK
* [Twitter post](https://twitter.com/ollama/status/1755675732997505393) with some feedback so far
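For readers landing here, a minimal sketch of what the compatibility layer enables, assuming the `openai` Python package (>= 1.0) and a locally pulled model such as `llama2` (both assumptions on my part, not details from the announcement):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the SDK, not checked by Ollama
)

response = client.chat.completions.create(
    model="llama2",  # any model you have pulled locally
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```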

Author
Owner

@johnnyq commented on GitHub (Feb 8, 2024):

@jmorganca works great!
We just connected it with our Nextcloud instance. Unfortunately, Nextcloud doesn't let you select models, so we basically just copied llama2 over to gpt-4, and Nextcloud is now communicating.
Hopefully in the future Nextcloud gets full integration with the Ollama API.
Thanks a bunch for this!!

Author
Owner

@spmfox commented on GitHub (Feb 9, 2024):

> @jmorganca works great! We just connected it with our Nextcloud instance. Unfortunately, Nextcloud doesn't let you select models, so we basically just copied llama2 over to gpt-4, and Nextcloud is now communicating. Hopefully in the future Nextcloud gets full integration with the Ollama API. Thanks a bunch for this!!

![Screenshot from 2024-02-08 17-42-08](https://github.com/ollama/ollama/assets/2227281/3c965cf5-dc43-4b90-bb39-d97895f44c4e)
It looks like the API for /v1/models isn't implemented yet (see the 404 errors above), I assume this returns the available models - my Nextcloud could not detect them either, and it defaulted to "gpt-3.5-turbo".

![Screenshot from 2024-02-08 17-45-51-2](https://github.com/ollama/ollama/assets/2227281/afd9313a-8656-42cf-98b0-116395135eab)
I was able to work around this by just doing a 'ollama cp' from the model I wanted to the model Nextcloud was expecting (gpt-3.5-turbo), then it works.
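For anyone hitting the same 404, here is a hedged sketch of the enumeration step a client like Nextcloud performs, assuming the `openai` Python package pointed at Ollama's `/v1` endpoint; at the time of this comment the endpoint was not implemented, which is why the `ollama cp` workaround above was needed:

```python
from openai import OpenAI

# Point the SDK at the local Ollama server; the key is a placeholder.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

try:
    # Roughly what Nextcloud does to populate its model dropdown.
    for model in client.models.list():
        print(model.id)
except Exception as exc:
    # A 404 here reproduces the behaviour shown in the screenshots above.
    print("GET /v1/models failed:", exc)
```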

Author
Owner

@johnnyq commented on GitHub (Feb 9, 2024):

@spmfox In Nextcloud, under **Administration Settings** > **Connect accounts** > **OpenAI and LocalAI Integration**, under endpoint make sure you choose **Chat Completions** instead of Completions. For the API key, use **Ollama**.

Author
Owner

@spmfox commented on GitHub (Feb 9, 2024):

> @spmfox In Nextcloud, under **Administration Settings** > **Connect accounts** > **OpenAI and LocalAI Integration**, under endpoint make sure you choose **Chat Completions** instead of Completions. For the API key, use **Ollama**.

I was, you can see in the screenshot that ollama is responding to /v1/chat/completions - but it does not respond to /v1/models - and that is what Nextcloud needs to enumerate the possible models that can be used.

Author
Owner

@johnnyq commented on GitHub (Feb 9, 2024):

Gotcha, yeah, definitely an upstream thing with Nextcloud. I'll take a look to see if this issue was reported on their GitHub and raise it with them, referencing this issue #

Author
Owner

@guilhermecgs commented on GitHub (Feb 12, 2024):

Hi folks,

do we already have compatibility with the OpenAI Assistants API?

https://platform.openai.com/docs/api-reference/assistants

Author
Owner

@Progaros commented on GitHub (Feb 12, 2024):

I was trying to get ollama running with AutoGPT.

curl works:

```bash
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral:instruct",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
{"id":"chatcmpl-447","object":"chat.completion","created":1707528048,"model":"mistral:instruct","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":" Hello there! I'm here to help answer any questions you might have or assist with tasks you may need assistance with. What can I help you with today?\n\nHere are some things I can do:\n\n1. Answer general knowledge questions\n2. Help with math problems\n3. Set reminders and alarms\n4. Create to-do lists and manage tasks\n5. Provide weather updates\n6. Tell jokes or share interesting facts\n7. Assist with email and calendar management\n8. Play music, set timers for cooking, and more!\n\nLet me know what you need help with and I'll do my best to assist!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":140,"total_tokens":156}}
```

but with this AutoGPT config:

```bash
## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
OPENAI_API_KEY=ollama

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
# the following is an example:
OPENAI_API_BASE_URL= http://localhost:11434/v1/chat/completions

## SMART_LLM - Smart language model (Default: gpt-4-0314)
SMART_LLM=mixtral:8x7b-instruct-v0.1-q2_K

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo-16k)
FAST_LLM=mistral:instruct
```

I can't get the connection:

File "/venv/agpt-9TtSrW0h-py3.10/lib/python3.10/site-packages/openai/_base_client.py", line 919, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

maybe someone will figure it out and can post an update here

Author
Owner

@pamelafox commented on GitHub (Feb 13, 2024):

I haven't used AutoGPT, but I would imagine that the base URL would be more like OPENAI_API_BASE_URL= http://localhost:11434/v1

One thing that I often do to debug OpenAI connections is to set my logging level to debug:

```python
import logging

# before any OpenAI calls happen
logging.basicConfig(level=logging.DEBUG)
```

The OpenAI Python SDK always logs its HTTP request URLs, so you can see what's gone awry.
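Putting both suggestions together, a hedged sketch (not AutoGPT-specific) that points the `openai` Python package at Ollama with the base URL ending at `/v1` and debug logging enabled, so the request URLs show up in the output:

```python
import logging

from openai import OpenAI

logging.basicConfig(level=logging.DEBUG)  # the SDK then logs each HTTP request URL

client = OpenAI(
    base_url="http://localhost:11434/v1",  # stop at /v1; the SDK appends /chat/completions itself
    api_key="ollama",                      # any non-empty string; Ollama ignores it
)

response = client.chat.completions.create(
    model="mistral:instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```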

Author
Owner

@lks-ai commented on GitHub (Mar 21, 2024):

> Wanted to share an update: [version 0.1.24](https://github.com/ollama/ollama/releases/tag/v0.1.24) is out with initial OpenAI compatibility.
>
> * [Blog post](https://ollama.ai/blog/openai-compatibility) with examples such as Autogen and the Vercel AI SDK
> * [Twitter post](https://twitter.com/ollama/status/1755675732997505393) with some feedback so far

Could you guys provide support for normal completion? I really, really need it. I was using vLLM but am switching to Ollama for a Colab project... and though you have /v1/chat/completions, where is /v1/completions?
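For reference, a hedged sketch of the legacy text-completion call being asked about, assuming the `openai` Python package and a locally pulled model such as `codellama`; it only works once the server actually exposes `/v1/completions`:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Legacy completions take a raw prompt instead of chat messages.
completion = client.completions.create(
    model="codellama",
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(completion.choices[0].text)
```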

Author
Owner

@Tanguille commented on GitHub (May 2, 2024):

> Gotcha, yeah, definitely an upstream thing with Nextcloud. I'll take a look to see if this issue was reported on their GitHub and raise it with them, referencing this issue #

Did you learn anything more about this? I can't get ollama to work within nextcloud.

Author
Owner

@chrisoutwright commented on GitHub (Oct 13, 2024):

Is there a reason why the "n" parameter cannot be used, unlike in the OpenAI [API](https://platform.openai.com/docs/api-reference/chat/create)?

```
n
integer or null

Optional
Defaults to 1
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
```
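For context, a hedged sketch of a request that sets `n`, assuming the `openai` Python package; against the official API this returns two alternative choices, while (per this comment) Ollama's compatibility layer does not honour the parameter:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama2",  # example model name, not from the comment
    messages=[{"role": "user", "content": "Suggest a project name."}],
    n=2,  # request two alternative completions for the same prompt
)
for choice in response.choices:
    print(choice.index, choice.message.content)
```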

Reference: github-starred/ollama#62173