[PR #1115] [MERGED] fix: miscellaneous fix for Tokenizer #1125

Closed
opened 2026-03-22 16:01:19 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1115
Author: @minhdang26403
Created: 1/19/2026
Status: Merged
Merged: 1/19/2026
Merged by: @profvjreddi

Base: dev ← Head: fix/tokenizer


📝 Commits (1)

  • 7af2499 fix: miscellaneous fix for Tokenizer

📊 Changes

1 file changed (+7 additions, -9 deletions)

📝 tinytorch/src/10_tokenization/10_tokenization.py (+7 -9)

📄 Description

This pull request contains several miscellaneous fixes for the Tokenizer module implementation:

  • When checking whether "<UNK>" exists in the vocab that the user passes in, we should use vocab (which is a set) instead of self.vocab (which is a list), so the membership test is an O(1) average-case lookup rather than an O(n) scan.
  • The sorted(...) function accepts any iterable and returns a list, so we don't need to convert a Python set to a Python list before passing it to sorted.
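
The two points above can be illustrated with a minimal sketch. This is not the code from 10_tokenization.py: the function name `build_vocab` and the constant `UNK_TOKEN` are hypothetical, chosen only to show the set-membership check and the direct call to `sorted` on a set.

```python
# Hypothetical illustration; build_vocab and UNK_TOKEN are assumed names,
# not identifiers taken from the tinytorch source.
UNK_TOKEN = "<UNK>"


def build_vocab(tokens):
    """Return a vocab list with UNK_TOKEN first, then the other tokens sorted."""
    unique = set(tokens)

    # Membership on a set is O(1) on average; the same test against a
    # list would scan every element.
    if UNK_TOKEN in unique:
        unique.remove(UNK_TOKEN)

    # sorted() accepts any iterable and always returns a new list,
    # so sorted(list(unique)) would do a redundant copy.
    return [UNK_TOKEN] + sorted(unique)


vocab = build_vocab(["b", "a", "<UNK>", "a"])
```

Calling `sorted` on the set directly yields the same ordering as converting to a list first; the only difference is that the intermediate list is never built.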

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-03-22 16:01:19 -05:00
Reference: github-starred/cs249r_book#1125