mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-03-12 02:53:34 -05:00
Added comprehensive documentation clarifying that KV caching is designed ONLY for inference (generation), not training.

Key Clarifications:
- Cache operations use .data (no gradient tracking) - correct and intentional for maximum speed
- During generation: no gradients are computed (model.eval() mode)
- During training: the cache is not used (standard forward pass)
- DO NOT use caching during training

Why This is Safe:
1. Training uses the standard forward pass (full gradient flow)
2. Generation runs no backward pass (no gradients needed)
3. The cache is an inference optimization, not a training component
4. .data usage is correct for this generation-only use case

Documentation Updates:
- Added a prominent warning in the class docstring
- Updated update() method docs
- Updated get() method docs
- Added inline comments explaining .data usage

This addresses gradient flow concerns by making it crystal clear that caching is never used when gradients are needed.
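The pattern described above can be sketched as follows. This is a minimal illustration, not the TinyTorch implementation: the `Tensor` and `KVCache` classes and their `update()`/`get()` signatures are hypothetical stand-ins showing why storing `.data` (raw values, no gradient graph) is safe when the cache is only ever touched during generation.

```python
import numpy as np

class Tensor:
    """Hypothetical stand-in for a TinyTorch-style tensor."""
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data)
        self.requires_grad = requires_grad

class KVCache:
    """Inference-only key/value cache: stores raw .data, never gradients."""
    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, k: Tensor, v: Tensor):
        # Store .data only: no gradient tracking (inference optimization).
        # DO NOT use during training; training runs the standard forward pass.
        self.keys.append(k.data)
        self.values.append(v.data)

    def get(self):
        # Return all cached keys/values concatenated along the sequence axis.
        return np.concatenate(self.keys, axis=0), np.concatenate(self.values, axis=0)

# Generation loop sketch: one token per step, cache grows incrementally.
cache = KVCache()
for step in range(3):
    k = Tensor(np.ones((1, 4)), requires_grad=True)  # grad flag is irrelevant here
    v = Tensor(np.zeros((1, 4)))
    cache.update(k, v)
keys, values = cache.get()
print(keys.shape, values.shape)  # (3, 4) (3, 4)
```

Because `get()` returns plain arrays with no autograd history, calling it during training would silently break gradient flow, which is exactly why the documentation warns against it.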