Standardize section headers to use colons instead of dashes

Vijay Janapa Reddi
2025-12-05 13:03:00 -08:00
parent 3aa6a9b040
commit 42025d34aa
4 changed files with 21 additions and 21 deletions

View File

@@ -165,7 +165,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 1. Introduction - The Memory Wall Problem
+## 1. Introduction: The Memory Wall Problem
Imagine trying to fit a library in your backpack. Neural networks face the same challenge - models are getting huge, but devices have limited memory!
@@ -241,7 +241,7 @@ Today you'll build the production-quality quantization system that makes all thi
# %% [markdown]
"""
-## 2. Foundations - The Mathematics of Compression
+## 2. Foundations: The Mathematics of Compression
### Understanding the Core Challenge
@@ -354,7 +354,7 @@ INT8 gives us 4× memory reduction with <1% accuracy loss - the perfect balance
# %% [markdown]
"""
-## 3. Implementation - Building the Quantization Engine
+## 3. Implementation: Building the Quantization Engine
### Our Implementation Strategy
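(For readers skimming this diff: the hunk header above quotes the module's INT8 claim of 4× memory reduction with <1% accuracy loss. A minimal symmetric INT8 quantize/dequantize sketch in plain NumPy, not the module's own quantization API, which these hunks do not show, looks like the following.)

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map float32 weights onto [-127, 127]."""
    scale = float(np.max(np.abs(w))) / 127.0 or 1.0   # guard against an all-zero tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the INT8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = np.mean(np.abs(w - dequantize_int8(q, scale)))
print(f"{w.nbytes} B -> {q.nbytes} B (4x smaller), mean abs error {err:.5f}")
```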
@@ -932,7 +932,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 4. Integration - Scaling to Full Neural Networks
+## 4. Integration: Scaling to Full Neural Networks
### The Model Quantization Challenge
@@ -1331,7 +1331,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 5. Verification - Proving Optimization Works
+## 5. Verification: Proving Optimization Works
Before analyzing quantization in production, let's verify that our optimization actually works using real measurements.
"""
@@ -1413,7 +1413,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 6. Systems Analysis - Quantization in Production
+## 6. Systems Analysis: Quantization in Production
Now let's measure the real-world impact of quantization through systematic analysis.
"""

View File

@@ -338,7 +338,7 @@ Reconstruction Error:
# %% [markdown]
"""
-## 3. Sparsity Measurement - Understanding Model Density
+## 3. Sparsity Measurement: Understanding Model Density
Before we can compress models, we need to understand how dense they are. Sparsity measurement tells us what percentage of weights are zero (or effectively zero).
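(The measurement that renamed cell describes reduces to counting zero and near-zero entries. A minimal NumPy sketch, independent of the module's own helpers, which are outside these hunks:)

```python
import numpy as np

def sparsity(weights: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of weights whose magnitude is at or below `threshold`."""
    return float(np.mean(np.abs(weights) <= threshold))

w = np.random.randn(128, 128)
w[np.abs(w) < 0.5] = 0.0                       # simulate a pruned layer
print(f"exact zeros:      {sparsity(w):.1%}")
print(f"effectively zero: {sparsity(w, threshold=1e-3):.1%}")
```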
@@ -436,7 +436,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 4. Magnitude-Based Pruning - Removing Small Weights
+## 4. Magnitude-Based Pruning: Removing Small Weights
Magnitude pruning is the simplest and most intuitive compression technique. It's based on the observation that weights with small magnitudes contribute little to the model's output.
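(A minimal sketch of the idea in that cell, zeroing the smallest-magnitude weights; illustrative NumPy only, not the module's implementation:)

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]   # k-th smallest magnitude
    pruned = w.copy()
    pruned[np.abs(w) <= threshold] = 0.0
    return pruned

w = np.random.randn(64, 64)
print(f"zeros after 70% pruning: {np.mean(magnitude_prune(w, 0.7) == 0):.1%}")
```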
@@ -593,7 +593,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 5. Structured Pruning - Hardware-Friendly Compression
+## 5. Structured Pruning: Hardware-Friendly Compression
While magnitude pruning creates scattered zeros throughout the network, structured pruning removes entire computational units (channels, neurons, heads). This creates sparsity patterns that modern hardware can actually accelerate.
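(A minimal sketch of the contrast drawn in that cell: instead of scattering zeros, drop whole output neurons so the remaining matrix is genuinely smaller and a plain dense matmul gets faster. A row-per-neuron layout is assumed; the module's own convention is not visible in this diff.)

```python
import numpy as np

def prune_neurons(w: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Keep only the output neurons (rows) with the largest L2 norms."""
    norms = np.linalg.norm(w, axis=1)                  # one score per output neuron
    n_keep = max(1, int(keep_ratio * w.shape[0]))
    keep = np.sort(np.argsort(norms)[-n_keep:])        # indices of the strongest rows
    return w[keep]

w = np.random.randn(128, 64)                           # 128 output neurons, 64 inputs
small = prune_neurons(w, keep_ratio=0.25)
print(w.shape, "->", small.shape)                      # (128, 64) -> (32, 64)
```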
@@ -766,7 +766,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 6. Low-Rank Approximation - Matrix Compression Through Factorization
+## 6. Low-Rank Approximation: Matrix Compression Through Factorization
Low-rank approximation discovers that large weight matrices often contain redundant information that can be captured with much smaller matrices through mathematical decomposition.
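(A minimal sketch of the factorization that cell alludes to, via truncated SVD: replace W of shape m×n with A of shape m×r times B of shape r×n. The example builds an approximately low-rank matrix to make the point; real layer weights are often, but not always, compressible this way.)

```python
import numpy as np

def low_rank_approx(w: np.ndarray, rank: int):
    """Factor W into A @ B with inner dimension `rank` via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]          # absorb singular values into the left factor
    b = vt[:rank]
    return a, b

# A matrix with hidden low-rank structure plus a little noise
rng = np.random.default_rng(0)
w = rng.normal(size=(512, 64)) @ rng.normal(size=(64, 512)) + 0.01 * rng.normal(size=(512, 512))

a, b = low_rank_approx(w, rank=64)
rel_err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
print(f"params: {w.size} -> {a.size + b.size}, relative error {rel_err:.4f}")
```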
@@ -914,7 +914,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 7. Knowledge Distillation - Learning from Teacher Models
+## 7. Knowledge Distillation: Learning from Teacher Models
Knowledge distillation is like having an expert teacher simplify complex concepts for a student. The large "teacher" model shares its knowledge with a smaller "student" model, achieving similar performance with far fewer parameters.
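(A minimal sketch of the standard soft-target objective behind that analogy: the student matches the teacher's temperature-softened probabilities while still fitting the hard labels. The temperature and blending weight below are illustrative defaults, not values taken from this module.)

```python
import numpy as np

def softmax(z: np.ndarray, t: float = 1.0) -> np.ndarray:
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, t=4.0, alpha=0.5):
    """Blend teacher->student KL on softened outputs with hard-label cross-entropy."""
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    soft = np.mean(np.sum(p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)), axis=-1)) * t * t
    hard = -np.mean(np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-9))
    return alpha * soft + (1 - alpha) * hard

rng = np.random.default_rng(0)
teacher, student = rng.normal(size=(8, 10)), rng.normal(size=(8, 10))
labels = rng.integers(0, 10, size=8)
print(f"distillation loss: {distillation_loss(student, teacher, labels):.3f}")
```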
@@ -1332,7 +1332,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 5. Verification - Proving Pruning Works
+## 5. Verification: Proving Pruning Works
Before analyzing compression in production, let's verify that our pruning actually achieves sparsity using real measurements.
"""
@@ -1403,7 +1403,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 6. Systems Analysis - Compression Techniques
+## 6. Systems Analysis: Compression Techniques
Understanding the real-world effectiveness of different compression techniques through systematic measurement and comparison.

View File

@@ -1367,7 +1367,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 5. Verification - Proving KV Cache Speedup
+## 5. Verification: Proving KV Cache Speedup
Before analyzing KV cache performance, let's verify that caching actually provides the dramatic speedup we expect using real timing measurements.
"""
@@ -1463,7 +1463,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 6. Systems Analysis - KV Cache Performance
+## 6. Systems Analysis: KV Cache Performance
Now let's analyze the performance characteristics and trade-offs of KV caching.
"""

View File

@@ -91,7 +91,7 @@ We'll fix these issues with vectorization and kernel fusion, achieving 2-5× spe
# %% [markdown]
"""
-## 1. Introduction - The Performance Challenge
+## 1. Introduction: The Performance Challenge
Modern neural networks face two fundamental bottlenecks that limit their speed:
@@ -153,7 +153,7 @@ from tinytorch.core.tensor import Tensor
# %% [markdown]
"""
-## 2. Foundations - Vectorization: From Loops to Lightning
+## 2. Foundations: Vectorization: From Loops to Lightning
### The SIMD Revolution
@@ -328,7 +328,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 3. Implementation - Kernel Fusion: Eliminating Memory Bottlenecks
+## 3. Implementation: Kernel Fusion: Eliminating Memory Bottlenecks
### The Memory Bandwidth Crisis
@@ -754,7 +754,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 4. Verification - Proving Vectorization Speedup
+## 4. Verification: Proving Vectorization Speedup
Before analyzing acceleration performance, let's verify that vectorization actually provides significant speedup using real timing measurements.
"""
@@ -849,7 +849,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 5. Systems Analysis - Performance Scaling Patterns
+## 5. Systems Analysis: Performance Scaling Patterns
Let's analyze how our acceleration techniques perform across different scenarios and understand their scaling characteristics.
"""
@@ -1062,7 +1062,7 @@ if __name__ == "__main__":
# %% [markdown]
"""
-## 5. Optimization Insights - Production Acceleration Strategy
+## 5. Optimization Insights: Production Acceleration Strategy
Understanding when and how to apply different acceleration techniques in real-world scenarios.
"""