mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-06-02 08:32:31 -05:00

Vijay Janapa Reddi cd89364a69 Add difficulty ratings to all module README files

- Add Module Info sections with difficulty ratings to all README.md files
- Use consistent 4-star difficulty scale: ⭐ Beginner, ⭐⭐ Intermediate, ⭐⭐⭐ Advanced, ⭐⭐⭐⭐ Expert
- Include time estimates, prerequisites, and next steps for each module
- Maintain clear separation: README.md = student experience, module.yaml = system metadata
- Difficulty progression: Setup (⭐) → Tensor/Activations/Layers (⭐⭐) → Networks/CNN/DataLoader (⭐⭐⭐) → Transformer (⭐⭐⭐⭐)
- Help students plan their learning journey and set appropriate expectations

2025-07-11 23:53:43 -04:00

README.md

Transformer Module

📊 Module Info

Difficulty: ⭐⭐⭐⭐ Expert
Time Estimate: 8-12 hours
Prerequisites: Tensor, Layers, Networks, Autograd, and Training modules
Next Steps: Advanced NLP, Large Language Models

Status: 🚧 Coming Soon

Overview

The Transformer module will be a lightweight implementation of the transformer architecture that teaches students how modern attention-based models work from the ground up.

Learning Goals

Understand attention mechanisms and their computational complexity
Implement multi-head attention from scratch
Learn about positional encoding and layer normalization
Explore transformer architecture design patterns
Understand memory and computational optimization for attention
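
To make the first two goals concrete, here is a minimal, dependency-free sketch of scaled dot-product attention, the building block that multi-head attention repeats per head. It is an illustration only and not this module's planned API: the function names `softmax` and `scaled_dot_product_attention` are hypothetical, and plain Python lists stand in for whatever the tensor module will provide.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Hypothetical helper. Q and K are n x d lists, V is an n x d_v list.

    Returns (output, weights), where weights is the n x n attention matrix.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    # scores[i][j] = (q_i . k_j) / sqrt(d).
    # There are n * n scores, each costing d multiplications:
    # O(n^2 * d) time and O(n^2) memory in the sequence length n.
    scores = [
        [scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in K]
        for q_row in Q
    ]
    # Each row of weights is a probability distribution over positions.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the rows of V.
    output = [
        [sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
output, weights = scaled_dot_product_attention(Q, K, V)
```

In this toy input each query matches its own key best, so each position puts most of its attention weight on itself. The n x n `weights` matrix is the source of the quadratic cost referred to in the goals above.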

Module Dependencies

This module builds on:

tensor - For all computations
layers - For feed-forward networks
networks - For composing transformer blocks
autograd - For training attention models
training - For training transformer models

Planned Components

Attention mechanism implementation
Multi-head attention
Positional encoding
Layer normalization
Transformer blocks
Complete transformer architecture
Memory optimization techniques
Attention visualization tools
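
Two of the planned components are small enough to sketch here. The code below assumes the sinusoidal positional encoding from the original transformer paper, which is one common choice and may not be the variant this module adopts, and a layer normalization without learnable scale and shift parameters. Both function names are hypothetical.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Assumed variant: PE[pos][2k] = sin(pos / 10000^(2k/d_model)),
    PE[pos][2k+1] = cos(pos / 10000^(2k/d_model))."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even index 2k
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance.

    Simplification: the learnable gain and bias are omitted.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

pe = sinusoidal_positional_encoding(seq_len=4, d_model=4)
normed = layer_norm([1.0, 2.0, 3.0])
```

Position 0 always encodes to alternating 0 and 1 because sin(0) = 0 and cos(0) = 1. Layer normalization acts on each position's feature vector independently, which is why it does not depend on batch size.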

Systems Focus

Memory management for attention matrices
Computational complexity analysis
Parallelization of multi-head attention
Optimization techniques (sparse attention, linear attention)
Scaling considerations for large sequences
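
As a rough illustration of why memory management matters, the sketch below estimates the size of the attention weight matrices alone, assuming one dense seq_len x seq_len matrix per head and 4-byte (float32) elements. The helper name and its parameters are hypothetical, and real implementations hold additional buffers (scores, gradients, activations) on top of this.

```python
def attention_matrix_bytes(seq_len, num_heads, batch_size=1, bytes_per_element=4):
    """Bytes needed to store dense attention weights for one layer.

    Assumes one seq_len x seq_len matrix per head and per batch item.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element

MIB = 1024 ** 2
small = attention_matrix_bytes(seq_len=1024, num_heads=8)  # 32 MiB
large = attention_matrix_bytes(seq_len=4096, num_heads=8)  # 512 MiB
print(small // MIB, large // MIB)
```

Quadrupling the sequence length multiplies the storage by 16. That quadratic growth is what sparse and linear attention variants aim to reduce.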

This module will be implemented after the core modules (tensor, layers, networks, autograd, training) are complete.