[PR #4917] [MERGED] convert bert model from safetensors #11618

Closed
opened 2026-04-12 23:33:47 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4917
Author: @mxyng
Created: 6/7/2024
Status: Merged
Merged: 8/21/2024
Merged by: @mxyng

Base: main ← Head: mxyng/convert-bert


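📝 Commits (2)

5a28b9c bert (https://github.com/ollama/ollama/commit/5a28b9cf5fcb3994aa1a143118c73c7d1fbf3bf9)
beb49ee create bert models from cli (https://github.com/ollama/ollama/commit/beb49eef65acefc64a6ae0562ce58467e6974fde)
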
📊 Changes

7 files changed (+344 additions, -15 deletions)

📝 cmd/cmd.go (+13 -0)
📝 convert/convert.go (+12 -0)
➕ convert/convert_bert.go (+176 -0)
📝 convert/convert_test.go (+1 -0)
📝 convert/reader.go (+2 -0)
➕ convert/testdata/all-MiniLM-L6-v2.json (+124 -0)
📝 convert/tokenizer.go (+16 -15)

📄 Description

Add a moreParser interface that converters can implement to signal that they need additional configuration parsing.

Fix a bug in tokenizer.json parsing where the vocab size could exceed the intended count if added_token.json contains tokens that are already defined.
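
One way to avoid this kind of overcount is to key tokens by ID so that an added token with an existing ID replaces the entry instead of being appended. The sketch below illustrates that idea only; the `token` type and `mergeVocab` function are hypothetical and are not claimed to be the change made in convert/tokenizer.go.

```go
package main

import (
	"fmt"
	"sort"
)

// token is a hypothetical vocabulary entry.
type token struct {
	ID          int
	Content     string
	UserDefined bool
}

// mergeVocab combines the base vocabulary with added tokens. Entries are keyed
// by ID, so an added token that repeats an existing ID overwrites that entry
// rather than growing the vocabulary.
func mergeVocab(base map[string]int, added []token) []token {
	byID := make(map[int]token, len(base))
	for content, id := range base {
		byID[id] = token{ID: id, Content: content}
	}
	for _, t := range added {
		t.UserDefined = true
		byID[t.ID] = t
	}

	merged := make([]token, 0, len(byID))
	for _, t := range byID {
		merged = append(merged, t)
	}
	// Map iteration order is random, so sort by ID for a stable result.
	sort.Slice(merged, func(i, j int) bool { return merged[i].ID < merged[j].ID })
	return merged
}

func main() {
	base := map[string]int{"[PAD]": 0, "[UNK]": 1, "hello": 2}
	added := []token{
		{ID: 0, Content: "[PAD]"},  // already defined in the base vocabulary
		{ID: 3, Content: "[MASK]"}, // genuinely new
	}
	fmt.Println(len(mergeVocab(base, added))) // 4, where a naive append would give 5
}
```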

Fix a bug in cmd where create flattens the directory structure, potentially creating conflicting files.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:33:47 -05:00
Reference: github-starred/ollama#11618