fix: Add missing typing imports to Module 10 tokenization

Issue: CharTokenizer was failing with NameError: name 'List' is not defined
Root cause: typing imports were not marked with #| export

Fix:
- Added #| export directive to the typing import block in tokenization_dev.py
- Re-exported the module with 'tito export 10_tokenization'
- typing.List, Dict, Tuple, Optional, Set are now properly exported
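The fixed cell presumably takes the shape below (mirroring the hunk further down); the #| export marker is what tells the exporter to carry the cell into the generated module:

```python
#| export
# Without this directive the exporter skips the cell, so the generated
# module references List, Dict, etc. without ever importing them,
# which is exactly the NameError seen in CharTokenizer.
from typing import List, Dict, Tuple, Optional, Set
```

Note that #| export is a plain comment to the Python interpreter; only the exporter interprets it.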

Verification:
- CharTokenizer.build_vocab() builds the vocabulary without the NameError
- encode() and decode() round-trip correctly
- Tested on a Shakespeare sample text
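The verified behavior amounts to a character-level encode/decode round trip. A minimal sketch of that API (a hypothetical stand-in, not the actual Module 10 implementation):

```python
from typing import Dict, List

class CharTokenizer:
    """Minimal character-level tokenizer sketch for illustration."""

    def __init__(self) -> None:
        self.stoi: Dict[str, int] = {}  # char -> id
        self.itos: Dict[int, str] = {}  # id -> char

    def build_vocab(self, text: str) -> None:
        # Sorted set of unique characters gives a deterministic vocab.
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, text: str) -> List[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: List[int]) -> str:
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer()
tok.build_vocab("To be, or not to be")
assert tok.decode(tok.encode("to be")) == "to be"
```

Without the typing imports exported, the List/Dict annotations above are what raised the NameError at class-definition time.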

This fixes the integration with vaswani_shakespeare.py, which now uses
CharTokenizer from Module 10 instead of manual tokenization.
Vijay Janapa Reddi
2025-10-28 09:44:24 -04:00
parent 876d3406a0
commit 62636fa92a
3 changed files with 246 additions and 84 deletions

@@ -21,6 +21,16 @@ __all__ = ['Tokenizer', 'CharTokenizer', 'BPETokenizer']
#| default_exp text.tokenization
#| export
# %% ../../modules/source/10_tokenization/tokenization_dev.ipynb 3
import numpy as np
from typing import List, Dict, Tuple, Optional, Set
import json
import re
from collections import defaultdict, Counter
# Import only Module 01 (Tensor) - this module has minimal dependencies
from ..core.tensor import Tensor
# %% ../../modules/source/10_tokenization/tokenization_dev.ipynb 8
class Tokenizer:
"""