Mirror of https://github.com/ollama/ollama.git
Closed · 60 comments
Originally created by @handrew on GitHub (Aug 7, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/305
Originally assigned to: @jmorganca on GitHub.
Any chance you would consider mirroring OpenAI's API specs and output? e.g., /completions and /chat/completions. That way, it could be a drop-in replacement for the Python openai package by changing out the url.
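(For reference, a minimal sketch of the drop-in usage being requested, as it later shipped in Ollama v0.1.24; the model name and port are Ollama's defaults, and the dummy API key is only there to satisfy the SDK.)

```python
# Point the official openai package at a local Ollama server by changing
# only the base URL -- the drop-in replacement requested in this issue.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the SDK but ignored by Ollama
)

resp = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```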
@priamai commented on GitHub (Aug 10, 2023):
That would be awesome and also embeddings!
@hakt0-r commented on GitHub (Aug 11, 2023):
yup I'll +1 on this too :-)
@kamuridesu commented on GitHub (Aug 11, 2023):
+1
@loyaliu commented on GitHub (Aug 30, 2023):
+1
@colindotfun commented on GitHub (Sep 1, 2023):
this would be a big win
prior work: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
and
https://github.com/ggerganov/llama.cpp/blob/master/examples/server/api_like_OAI.py
@ValValu commented on GitHub (Sep 2, 2023):
yeah would be great!
@jmorganca commented on GitHub (Sep 7, 2023):
Thanks for the issue and comments, all! Sorry for not replying sooner. Which clients/use cases are you looking to use that require the OpenAI API? Quite a few folks have mentioned LlamaIndex (also: see #278!) Would love to know!
@kamuridesu commented on GitHub (Sep 7, 2023):
Interoperability with OpenAI projects, like Auto-GPT. If you check https://github.com/go-skynet/LocalAI, you can see that their API works with pretty much every project that uses the OpenAI endpoint; in most cases you just need to point an environment variable at it.
@colindotfun commented on GitHub (Sep 7, 2023):
www.galactus.ai also
@cori commented on GitHub (Sep 8, 2023):
I was looking to connect to it with both Continue.dev (which supports Ollama explicitly) and LocalAI, so interop was my hope as well.
@MchLrnX commented on GitHub (Sep 19, 2023):
I'd love to be able to do this. I'm specifically looking at running ToolBench, MetaGPT and ChatDEV. I have MetaGPT ready to test with this if we get this working.
@comalice commented on GitHub (Sep 28, 2023):
I'd like to throw in that Ironclad's Rivet application expects an OpenAI API endpoint as well: https://github.com/Ironclad/rivet
@mjtechguy commented on GitHub (Sep 29, 2023):
+1. I would like to use ollama as a target for LibreChat: https://github.com/danny-avila/LibreChat/tree/main
@jtoy commented on GitHub (Sep 29, 2023):
+1
@Anon2578 commented on GitHub (Sep 30, 2023):
Yes, this would be a plus one if we can get this working with the OpenAI API specs. Can someone notify me when this is done? I might forget, and this was one of the reasons I took a look at this project.
@shtrophic commented on GitHub (Oct 1, 2023):
This would be pretty cool, since Nextcloud instances could then use a locally running ollama server. Nextcloud itself ships with openai/localai compatibility (through a plugin).
@Nivek92 commented on GitHub (Oct 4, 2023):
AutoGen would be another usecase - https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/
@rcalv002 commented on GitHub (Oct 7, 2023):
+1
@vividfog commented on GitHub (Oct 7, 2023):
I'm surprised LiteLLM hasn't been mentioned in the thread yet. I found it in the README.md of the Ollama repo today. "Call LLM APIs using the OpenAI format", 100+ of them, including Ollama. This worked for me:
```
pip install litellm
ollama pull codellama
litellm --model ollama/codellama --api_base http://localhost:11434 --temperature 0.3 --max_tokens 2048
```
Double-check that the port, model name and parameters match your configuration and VRAM situation.
As an example, Continue.dev configuration then goes like this, OpenAI style:
Set context_length and max_tokens as appropriate. 2048 is a conservative value if you're gpu-poor or aren't sure.
Note that LiteLLM/Uvicorn opens the API at 0.0.0.0:8000; it's not confined to localhost by default, and people can piggyback on your server if it's not on a private network. I believe you need to edit the litellm source code here if you want to serve only localhost, then pip install -e . from that local clone before running litellm.
@ishaan-jaff commented on GitHub (Oct 7, 2023):
Thanks for mentioning us @vividfog ! (I'm the maintainer of LiteLLM) We allow you to create an OpenAI compatible proxy server for ollama
Here's a link to the section on our docs on how to do this: https://docs.litellm.ai/docs/proxy_server
Please let me know how we can make it better for the ollama community😃
@ghost commented on GitHub (Oct 8, 2023):
Hey @vividfog thanks for this incredible tutorial.
I added it to our docs and gave you credit for it.
Docs: https://docs.litellm.ai/docs/proxy_server#tutorial-use-with-aiderautogencontinue-dev

If you have a twitter/linkedin - happy to link to that instead!
@shtrophic commented on GitHub (Oct 8, 2023):
Wow, thanks for pointing to litellm @vividfog.
For anyone on Arch Linux (btw) and interested, I came up with a PKGBUILD that sets up litellm with ollama as a systemd service. You can check it out on the AUR. Feel free to get back to me with any feedback!
@vividfog commented on GitHub (Oct 8, 2023):
My initial advice was not complete, I learned today. Continue.dev sends two parallel queries, one for the user task and another to summarize the conversation, and the LiteLLM logs may show an error from Ollama after the second call. There's a client-side fix for this.
This Continue.dev configuration imports a wrapper that makes all calls sequential, queued, in config.py:
This may now be leaning off-topic vs. the original issue, but I hope it helps those who used the previous advice. The friendly developers on the Continue.dev GitHub/Discord are there if needed. I learned about the QueuedLLM wrapper initially in their Discord.
What remains a little confusing is that previously I've seen Ollama handle parallel API calls in sequence, or was I hallucinating? Not sure why QueuedLLM() is then needed, but if the shoe fits, wear it I guess. Material for another issue if someone wants to drill down and verify.
What I really like is how these 3 projects work together without knowing about each other at code level, as if following the same plan. That indeed is the benefit of following the same API conventions, the topic of this issue.
@MilleniumDawn commented on GitHub (Nov 11, 2023):
I realise it's probably my lack of knowledge that is the problem, but my front end can use either LM Studio or oobabooga/text-generation-webui simply by changing the base_api.
I wanted to try Ollama because it seems to do a lot of things simpler/faster.
But not supporting what seems to be developing into the go-to format for APIs, the OpenAI API, is a big minus. (I realise this is free; I don't want to be a chooser/beggar, just trying to provide feedback.)
I tried LiteLLM, and it's not a drop-in replacement, and now what was supposed to be simple needs to be debugged.
So my feedback is, I hope Ollama will natively support the OpenAI API rather than rely on an external library that might seem easy for people who know their stuff, but is not as easy for people who came to Ollama for its simplicity.
I'm leaving my LiteLLM error log just as reference; I know it's not this project's.
@PetrarcaBruto commented on GitHub (Nov 14, 2023):
I agree with the comment made by @MilleniumDawn about the speed of litellm vs. the native ollama server. I may be wrong, but I have noticed the native ollama server logs show that my WSL GPU is being used, e.g. the following server message:
"ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti with Max-Q Design, compute capability 7.5"
I suspect that litellm server or workers are not using my GPU. If that is the case then it will explain the difference in speed.
Any comments/advice will be very welcome.
@kylemclaren commented on GitHub (Nov 14, 2023):
@PetrarcaBruto
nvidia-smi should show the ollama runner process if the GPU is utilized, like this:
@ghost commented on GitHub (Nov 14, 2023):
+1
@ghost commented on GitHub (Nov 15, 2023):
Hey @MilleniumDawn I found the issue - it was being misrouted. Just pushed a fix - 1738341dcb. Should be live in v1.0.2 by EOD. I'm really sorry for that.
@PetrarcaBruto re: litellm workers
For ollama specifically - we check if you're making an ollama call, and run ollama serve in a separate worker - c7780cbc40/litellm/proxy/proxy_cli.py (L20). Open to suggestions for how we can improve this further.
@PetrarcaBruto commented on GitHub (Nov 15, 2023):
@kylemclaren & @krrishdholakia thanks for the tips. I found that my GPU is also being used when running litellm, which is good news.
@patrickdobler commented on GitHub (Nov 16, 2023):
That would be a great addition. I would love to use Ollama with TypingMind.
@priamai commented on GitHub (Nov 16, 2023):
AMAZING, how did I not see this before! It would also be useful to add a simple API_TOKEN, so at least I can put it on a cloud service without having to fiddle with additional proxy authenticators.
@ghost commented on GitHub (Nov 16, 2023):
@priamai we have that - https://docs.litellm.ai/docs/simple_proxy#example-config
you can add a master_key in the config.yaml, and this will require all calls to pass that key as part of the bearer token.
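(A sketch of the client side, assuming a proxy on localhost:8000 and an example key value; the master_key itself goes in litellm's config.yaml as described above.)

```python
# With a master_key set, the proxy rejects calls that don't present the key
# as the bearer token; the openai package sends it via api_key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000",  # the litellm proxy, not Ollama directly
    api_key="sk-my-master-key",        # must match the proxy's master_key
)
resp = client.chat.completions.create(
    model="ollama/codellama",
    messages=[{"role": "user", "content": "Hello"}],
)
```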
let me know if you end up using it, would love to know how we can improve it for you - https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
@agonbina commented on GitHub (Nov 18, 2023):
Embeddings with Ollama do not seem to be supported through the LiteLLM proxy.
@sottey commented on GitHub (Nov 25, 2023):
I, too, would love this. It would allow me to integrate in TypingMind. Thank you for your amazing stuff!
@iplayfast commented on GitHub (Nov 27, 2023):
Yeah, I'm trying to use litellm and it's a very weak crutch. If you want something done right you gotta do it yourself and build the openai api into ollama.
@kamuridesu commented on GitHub (Nov 27, 2023):
So go ahead and contribute a PR to ollama, or improve litellm.
@flaviovs commented on GitHub (Nov 29, 2023):
Two things to be aware of when using LiteLLM:
I hope this saves people's time if their plan is to use Ollama+LiteLLM offline for privacy/compliance reasons.
@MARYAMJAHANIR commented on GitHub (Dec 15, 2023):
Hey,
I was trying AutoGen with an ollama/litellm config, using the mistral and codellama models, but it gave me an error when the OpenAIWrapper attempts to handle the configuration, set up the same as in the video.
Error:
```
(autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$ /home/maryam_linux/miniconda3/envs/autogen/bin/python /mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py
Traceback (most recent call last):
  File "/mnt/c/Users/Hp/autogen_wsl/autogen_yt1.py", line 25, in <module>
    assistant = autogen.AssistantAgent(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/agentchat/assistant_agent.py", line 61, in __init__
    super().__init__(
  File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/agentchat/conversable_agent.py", line 121, in __init__
    self.client = OpenAIWrapper(**self.llm_config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 83, in __init__
    self._clients = [self._client(config, openai_config) for config in config_list]  # could modify the config
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 83, in <listcomp>
    self._clients = [self._client(config, openai_config) for config in config_list]  # could modify the config
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/autogen/oai/client.py", line 144, in _client
    client = OpenAI(**openai_config)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maryam_linux/miniconda3/envs/autogen/lib/python3.11/site-packages/openai/_client.py", line 92, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
(autogen) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl$
```
If you can suggest something regarding this, that would be great.
@clevcode commented on GitHub (Dec 19, 2023):
The litellm proxy doesn't care about the value of the API key, or whether it is sent or not, but since the OpenAI package requires it to be set, you can simply set it to anything in order to satisfy the requirements of the OpenAI module.
Either use "export OPENAI_API_KEY=whatever" in the shell before you run your agent, or set "api_key": "whatever" in the llm_config dict that you pass to the *Agent() constructors.
@MARYAMJAHANIR commented on GitHub (Dec 22, 2023):
@clevcode thanks for your reply, I have sorted that out, but when I tried this with MetaGPT I was getting an error like this:
```
(metagpt) (base) maryam_linux@Maryam:/mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt$ python startup.py "create a 2048 game in python"
2023-12-22 07:27:14.516 | INFO | metagpt.const:get_metagpt_package_root:32 - Package root set to /mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt
2023-12-22 07:27:15.188 | INFO | metagpt.config:get_default_llm_provider_enum:88 - OpenAI API Model: gpt-4-1106-preview
2023-12-22 07:27:15.869 | INFO | metagpt.team:invest:84 - Investment: $3.0.
2023-12-22 07:27:15.873 | INFO | metagpt.roles.role:_act:379 - Alice(Product Manager): ready to PrepareDocuments
2023-12-22 07:27:16.639 | INFO | metagpt.utils.file_repository:save:60 - save to: /mnt/c/Users/Hp/autogen_wsl/Metagpt/metagpt/workspace/20231222072715/docs/requirement.txt
2023-12-22 07:27:16.646 | INFO | metagpt.roles.role:_act:379 - Alice(Product Manager): ready to WritePRD
2023-12-22 07:27:16.960 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 0.293(s), this was the 1st time calling it. exp: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:17.467 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 0.800(s), this was the 2nd time calling it. exp: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:18.830 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 2.163(s), this was the 3rd time calling it. exp: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:21.207 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 4.540(s), this was the 4th time calling it. exp: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:21.955 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 5.288(s), this was the 5th time calling it. exp: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:23.664 | ERROR | metagpt.utils.common:log_it:433 - Finished call to 'metagpt.actions.action_node.ActionNode._aask_v1' after 6.997(s), this was the 6th time calling it. exp: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
2023-12-22 07:27:23.668 | WARNING | metagpt.utils.common:wrapper:505 - There is a exception in role's execution, in order to resume, we delete the newest role communication message in the role's memory.
2023-12-22 07:27:23.698 | ERROR | metagpt.utils.common:wrapper:487 - Exception occurs, start to serialize the project, exp:
Traceback (most recent call last):
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 256, in _aask_v1
    content = await self.llm.aask(prompt, system_msgs)
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 496, in wrapper
    return await func(self, *args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 528, in run
    rsp = await self.react()
tenacity.RetryError: RetryError[<Future at 0x7f00d4958dc0 state=finished raised NotFoundError>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 482, in wrapper
    result = await func(self, *args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/team.py", line 124, in run
    await self.env.run()
Exception: Traceback (most recent call last):
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 256, in _aask_v1
    content = await self.llm.aask(prompt, system_msgs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/base_gpt_api.py", line 53, in aask
    rsp = await self.acompletion_text(message, stream=stream)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/openai_api.py", line 274, in acompletion_text
    return await self._achat_completion_stream(messages)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/provider/openai_api.py", line 211, in _achat_completion_stream
    response: AsyncStream[ChatCompletionChunk] = await self.async_client.chat.completions.create(
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 1295, in create
    return await self._post(
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1536, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1315, in request
    return await self._request(
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/openai/_base_client.py", line 1392, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model `gpt-4-1106-preview` does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/utils/common.py", line 496, in wrapper
    return await func(self, *args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 528, in run
    rsp = await self.react()
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 479, in react
    rsp = await self._react()
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 459, in _react
    rsp = await self._act()  # does this rsp need to publish_message?
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/roles/role.py", line 380, in _act
    response = await self._rc.todo.run(self._rc.important_memory)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 105, in run
    prd_doc = await self._update_prd(
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 146, in _update_prd
    prd = await self._run_new_requirement(
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/write_prd.py", line 126, in _run_new_requirement
    node = await WRITE_PRD_NODE.fill(context=context, llm=self.llm, schema=schema)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 314, in fill
    return await self.simple_fill(schema, mode)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/metagpt-0.5.2-py3.9.egg/metagpt/actions/action_node.py", line 288, in simple_fill
    content, scontent = await self._aask_v1(prompt, class_name, mapping, schema=schema)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/maryam_linux/miniconda3/envs/metagpt/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f00d4958dc0 state=finished raised NotFoundError>]
```
I thought I had not configured it the right way, but I don't know exactly what I should do about this.
@MARYAMJAHANIR commented on GitHub (Dec 22, 2023):
@clevcode I am trying MetaGPT with the ollama model codellama via litellm, so that I don't need any API key, but it seems difficult because it has not worked yet. Here is the MetaGPT config.yaml file:
```yaml
# DO NOT MODIFY THIS FILE, create a new key.yaml, define OPENAI_API_KEY.
# The configuration of key.yaml has a higher priority and will not enter git

# Project Path Setting
WORKSPACE_PATH: "Path for placing output files"

# if OpenAI
# The official OPENAI_BASE_URL is https://api.openai.com/v1
# If the official OPENAI_BASE_URL is not available, we recommend using the openai-forward.
# Or, you can configure OPENAI_PROXY to access official OPENAI_BASE_URL.
OPENAI_BASE_URL: "http://0.0.0.0:8000"
#OPENAI_PROXY: "http://127.0.0.1:8118"
OPENAI_API_KEY: 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  # set the value to sk-xxx if you host the openai interface for open llm model
OPENAI_API_MODEL: "ollama/codellama"
MAX_TOKENS: 4096
RPM: 10

# if Spark
#SPARK_APPID : "YOUR_APPID"
#SPARK_API_SECRET : "YOUR_APISecret"
#SPARK_API_KEY : "YOUR_APIKey"
#DOMAIN : "generalv2"
#SPARK_URL : "ws://spark-api.xf-yun.com/v2.1/chat"

# if Anthropic
#ANTHROPIC_API_KEY: "YOUR_API_KEY"

# if AZURE, check https://github.com/openai/openai-cookbook/blob/main/examples/azure/chat.ipynb
#OPENAI_API_TYPE: "azure"
#OPENAI_BASE_URL: "YOUR_AZURE_ENDPOINT"
#OPENAI_API_KEY: "YOUR_AZURE_API_KEY"
#OPENAI_API_VERSION: "YOUR_AZURE_API_VERSION"
#DEPLOYMENT_NAME: "YOUR_DEPLOYMENT_NAME"

# if zhipuai from https://open.bigmodel.cn. You can set here or export API_KEY="YOUR_API_KEY"
ZHIPUAI_API_KEY: "YOUR_API_KEY"

# if Google Gemini from https://ai.google.dev/ and API_KEY from https://makersuite.google.com/app/apikey. You can set here or export GOOGLE_API_KEY="YOUR_API_KEY"
GEMINI_API_KEY: "YOUR_API_KEY"

# if use self-host open llm model with openai-compatible interface
#OPEN_LLM_API_BASE: "http://127.0.0.1:8000/v1"
#OPEN_LLM_API_MODEL: "llama2-13b"

# if use Fireworks api
#FIREWORKS_API_KEY: "YOUR_API_KEY"
#FIREWORKS_API_BASE: "https://api.fireworks.ai/inference/v1"
#FIREWORKS_API_MODEL: "YOUR_LLM_MODEL"  # example, accounts/fireworks/models/llama-v2-13b-chat

# for Search
# Supported values: serpapi/google/serper/ddg
#SEARCH_ENGINE: serpapi
# Visit https://serpapi.com/ to get key.
#SERPAPI_API_KEY: "YOUR_API_KEY"
# Visit https://console.cloud.google.com/apis/credentials to get key.
#GOOGLE_API_KEY: "YOUR_API_KEY"
# Visit https://programmablesearchengine.google.com/controlpanel/create to get id.
#GOOGLE_CSE_ID: "YOUR_CSE_ID"
# Visit https://serper.dev/ to get key.
#SERPER_API_KEY: "YOUR_API_KEY"

# for web access
# Supported values: playwright/selenium
#WEB_BROWSER_ENGINE: playwright
# Supported values: chromium/firefox/webkit, visit https://playwright.dev/python/docs/api/class-browsertype
##PLAYWRIGHT_BROWSER_TYPE: chromium
# Supported values: chrome/firefox/edge/ie, visit https://www.selenium.dev/documentation/webdriver/browsers/
SELENIUM_BROWSER_TYPE: chrome

# for TTS
#AZURE_TTS_SUBSCRIPTION_KEY: "YOUR_API_KEY"
#AZURE_TTS_REGION: "eastus"

# for Stable Diffusion
# Use SD service, based on https://github.com/AUTOMATIC1111/stable-diffusion-webui
#SD_URL: "YOUR_SD_URL"
#SD_T2I_API: "/sdapi/v1/txt2img"

# for Execution
#LONG_TERM_MEMORY: false

# for Mermaid CLI
# If you installed mmdc (Mermaid CLI) only for metagpt then enable the following configuration.
#PUPPETEER_CONFIG: "./config/puppeteer-config.json"
#MMDC: "./node_modules/.bin/mmdc"

# for calc_usage
CALC_USAGE: false

# for Research
MODEL_FOR_RESEARCHER_SUMMARY: gpt-3.5-turbo
MODEL_FOR_RESEARCHER_REPORT: gpt-3.5-turbo-16k

# choose the engine for mermaid conversion,
# default is nodejs, you can change it to playwright, pyppeteer or ink
MERMAID_ENGINE: nodejs

# browser path for pyppeteer engine, support Chrome, Chromium, MS Edge
#PYPPETEER_EXECUTABLE_PATH: "/usr/bin/google-chrome-stable"

# for repair non-openai LLM's output when parse json-text if PROMPT_FORMAT=json
# due to non-openai LLM's output will not always follow the instruction, so here activate a post-process
# repair operation on the content extracted from LLM's raw output. Warning, it improves the result but not fix all cases.
REPAIR_LLM_OUTPUT: false
PROMPT_FORMAT: json  # json or markdown
```
@bdurrani commented on GitHub (Dec 23, 2023):
There is also BrainGPT, which requires OpenAI API compatibility.
@puckettgw commented on GitHub (Dec 31, 2023):
+1 for this issue
I'm trying to use LangChain to create a GitHub coder bot. Trouble is, Ollama doesn't produce the output expected by certain tools, e.g.
GitHub Toolkit CreateFile
The output from Ollama + Mixtral is
But the toolkit is expecting a formatted_file arg:
Of course I could implement my own tools for this, but that's kind of smelly.
@louis030195 commented on GitHub (Jan 11, 2024):
Would be great to be able to use ollama with the OpenAI SDK directly (and not have to use stuff like litellm).
@vtboyarc commented on GitHub (Jan 27, 2024):
Is this being worked on?
@johnnyq commented on GitHub (Feb 5, 2024):
Yes, please make it OpenAI API compatible, to integrate with FusionPBX for voicemail transcriptions and the Nextcloud integration for the AI functions.
@NeevJewalkar commented on GitHub (Feb 5, 2024):
Is this being worked on?
@jmorganca commented on GitHub (Feb 6, 2024):
It is! https://github.com/ollama/ollama/pull/2376
@jmorganca commented on GitHub (Feb 8, 2024):
Wanted to share an update: version 0.1.24 is out with initial OpenAI compatibility.
@johnnyq commented on GitHub (Feb 8, 2024):
@jmorganca works great!
We just connected it with our Nextcloud instance. Unfortunately, Nextcloud doesn't let you select models, so we basically just copied llama2 over to gpt-4, and Nextcloud is now communicating.
Hopefully in the future Nextcloud gets full integration with the Ollama API.
Thanks a bunch for this!!
@spmfox commented on GitHub (Feb 9, 2024):
It looks like the API for /v1/models isn't implemented yet (see the 404 errors above); I assume this returns the available models. My Nextcloud could not detect them either, and it defaulted to "gpt-3.5-turbo".
I was able to work around this by just doing an 'ollama cp' from the model I wanted to the model name Nextcloud was expecting (gpt-3.5-turbo); then it works.
@johnnyq commented on GitHub (Feb 9, 2024):
@spmfox In Nextcloud, under Administration Settings > Connect accounts > OpenAI and LocalAI Integration, under endpoint make sure you choose Chat Completions instead of Completions.
For the API key, use Ollama.
@spmfox commented on GitHub (Feb 9, 2024):
I was. You can see in the screenshot that ollama is responding to /v1/chat/completions, but it does not respond to /v1/models, and that is what Nextcloud needs to enumerate the possible models that can be used.
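(For reference, what such clients do is roughly this; a sketch using the OpenAI SDK, assuming an Ollama build recent enough to serve /v1/models, which was added in a later release.)

```python
# Enumerate available models via the OpenAI-compatible /v1/models endpoint --
# the call Nextcloud relies on to populate its model picker.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
for model in client.models.list():
    print(model.id)
```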
@johnnyq commented on GitHub (Feb 9, 2024):
Gotcha, yeah definitely an upstream thing with Nextcloud. I'll take a look to see if this issue was reported on their GitHub and raise it with them, referencing this issue #
@guilhermecgs commented on GitHub (Feb 12, 2024):
Hi folks,
do we already have compatibility with the OpenAI Assistants API?
https://platform.openai.com/docs/api-reference/assistants
@Progaros commented on GitHub (Feb 12, 2024):
I was trying to get ollama running with AutoGPT.
curl works:
but with this AutoGPT config:
I can't get the connection:
Maybe someone will figure it out and can post an update here.
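(The curl command itself didn't survive the mirror; a rough stand-in for a working request against the OpenAI-compatible endpoint, with the model name as an assumption.)

```python
# Raw HTTP request to Ollama's OpenAI-compatible chat endpoint, equivalent
# to the curl check described above.
import requests

r = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama2",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(r.json()["choices"][0]["message"]["content"])
```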
@pamelafox commented on GitHub (Feb 13, 2024):
I haven't used AutoGPT, but I would imagine that the base URL would be more like OPENAI_API_BASE_URL=http://localhost:11434/v1
One thing that I often do to debug OpenAI connections is to set my logging level to DEBUG. The OpenAI Python SDK always logs its HTTP request URLs, so you can see what's gone awry.
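(The snippet was lost in the mirror; a minimal sketch of the technique.)

```python
# With DEBUG logging on, the OpenAI SDK and its httpx transport log each
# outgoing request URL, so a wrong base URL is easy to spot.
import logging
logging.basicConfig(level=logging.DEBUG)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "ping"}],
)
```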
@lks-ai commented on GitHub (Mar 21, 2024):
Could you guys provide support for normal completions? I really, really need it. I was using vLLM but am switching to Ollama for a Colab project... and though you have /v1/chat/completions ... where is /v1/completions?
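(For reference, the legacy-style call with the v1 openai package looks like this; whether the server answers depends on it implementing /v1/completions, which Ollama added in a later release. The model name is an example.)

```python
# Plain (non-chat) completion against the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.completions.create(
    model="codellama",
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(resp.choices[0].text)
```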
@Tanguille commented on GitHub (May 2, 2024):
Did you learn anything more about this? I can't get ollama to work within Nextcloud.
@chrisoutwright commented on GitHub (Oct 13, 2024):
Is there a reason why the "n" parameter cannot be used, as in the OpenAI API?