mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-06-02 18:36:30 -05:00


Vijay Janapa Reddi cd89364a69 Add difficulty ratings to all module README files

- Add Module Info sections with difficulty ratings to all README.md files
- Use consistent 4-star difficulty scale: ⭐ Beginner, ⭐⭐ Intermediate, ⭐⭐⭐ Advanced, ⭐⭐⭐⭐ Expert
- Include time estimates, prerequisites, and next steps for each module
- Maintain clear separation: README.md = student experience, module.yaml = system metadata
- Difficulty progression: Setup (⭐) → Tensor/Activations/Layers (⭐⭐) → Networks/CNN/DataLoader (⭐⭐⭐) → Transformer (⭐⭐⭐⭐)
- Help students plan their learning journey and set appropriate expectations

2025-07-11 23:53:43 -04:00


Transformer Module

📊 Module Info

Difficulty: ⭐⭐⭐⭐ Expert
Time Estimate: 8-12 hours
Prerequisites: Tensor, Layers, Networks, Autograd, Training modules
Next Steps: Advanced NLP, Large Language Models

Status: 🚧 Coming Soon

Overview

The Transformer module will be a lightweight implementation of the transformer architecture that teaches students how modern attention-based models work from the ground up.

Learning Goals

Understand attention mechanisms and their computational complexity
Implement multi-head attention from scratch
Learn about positional encoding and layer normalization
Explore transformer architecture design patterns
Understand memory and computational optimization for attention
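As a rough illustration of the first goal, scaled dot-product attention can be sketched in plain Python. This is not TinyTorch code: the module is not yet implemented, so the function names (`softmax`, `scaled_dot_product_attention`) and the list-of-lists representation are assumptions made for this example only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Hypothetical helper: Q is n x d, K is m x d, V is m x d_v (lists of lists).

    Returns (output, weights), where output is n x d_v and weights is n x m.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    # scores[i][j] = (q_i . k_j) / sqrt(d). Building this n x m matrix is
    # what makes attention quadratic in sequence length when n == m.
    scores = [
        [scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in K]
        for q_row in Q
    ]
    # Each row of weights is a probability distribution over the keys.
    weights = [softmax(row) for row in scores]
    d_v = len(V[0])
    # Each output row is a weighted average of the value rows.
    output = [
        [sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(d_v)]
        for w_row in weights
    ]
    return output, weights

# Usage: identical keys give uniform weights, so each output row is the mean of V.
out, w = scaled_dot_product_attention(
    Q=[[1.0, 0.0], [0.0, 1.0]],
    K=[[1.0, 1.0], [1.0, 1.0]],
    V=[[1.0, 2.0], [3.0, 4.0]],
)
```

For a self-attention layer over a sequence of length n, the score matrix has n × n entries, which is the source of the computational and memory costs this module sets out to analyze.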

Module Dependencies

This module builds on:

tensor - For all computations
layers - For feed-forward networks
networks - For composing transformer blocks
autograd - For training attention models
training - For training transformer models

Planned Components

Attention mechanism implementation
Multi-head attention
Positional encoding
Layer normalization
Transformer blocks
Complete transformer architecture
Memory optimization techniques
Attention visualization tools
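Two of the planned components, positional encoding and layer normalization, are small enough to sketch directly. The version below assumes the sinusoidal encoding from the original transformer design and a layer normalization without the learnable gain and bias; the eventual module may choose differently, and both function names are hypothetical.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings.

    Even columns hold sin(pos / 10000^(i / d_model)) and the following odd
    column holds the cosine of the same angle, where i is the even index.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and (nearly) unit variance.

    Assumption: the learnable scale and shift parameters are omitted here.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Usage: encode 3 positions in a 4-dimensional model, then normalize a vector.
pe = sinusoidal_positional_encoding(seq_len=3, d_model=4)
normed = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Position 0 always encodes to alternating 0 and 1 (sin 0 and cos 0), which makes a convenient sanity check when testing an implementation.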

Systems Focus

Memory management for attention matrices
Computational complexity analysis
Parallelization of multi-head attention
Optimization techniques (sparse attention, linear attention)
Scaling considerations for large sequences
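To make the memory concern concrete, the size of the attention weight matrices can be estimated with simple arithmetic. The sketch below assumes one full seq_len × seq_len matrix per head stored as 32-bit floats; the function name and its parameters are illustrative, not part of TinyTorch.

```python
def attention_matrix_bytes(seq_len, num_heads, batch_size=1, bytes_per_element=4):
    """Bytes needed to store the attention weights for one layer.

    Assumes a dense seq_len x seq_len matrix per head and per batch item.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element

# Usage: 8 heads over a 1024-token sequence, float32.
size = attention_matrix_bytes(seq_len=1024, num_heads=8)
print(size / 2**20, "MiB")  # 32.0 MiB

# Doubling the sequence length quadruples the memory.
ratio = attention_matrix_bytes(2048, 8) / attention_matrix_bytes(1024, 8)
print(ratio)  # 4.0
```

This quadratic growth is the motivation for the sparse and linear attention variants listed above, which avoid materializing the full matrix.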

This module will be implemented after the core modules (tensor, layers, networks, autograd, training) are complete.
