---
title: "Quantization - Reduced Precision for Efficiency"
description: "INT8 quantization fundamentals, calibration strategies, and accuracy-efficiency trade-offs"
difficulty: "⭐⭐⭐"
time_estimate: "5-6 hours"
prerequisites: ["Profiling"]
next_steps: ["Compression"]
learning_objectives:
- "Understand how quantization reduces memory by 4× through precision reduction from FP32 to INT8"
- "Implement symmetric and asymmetric quantization with scale and zero-point parameters"
- "Design calibration strategies using representative data to minimize accuracy degradation"
- "Measure the accuracy-efficiency frontier: when 1% accuracy loss justifies 4× memory savings"
- "Recognize quantization as educational foundation vs production INT8 hardware acceleration"
---

# 15. Quantization - Reduced Precision for Efficiency

**OPTIMIZATION TIER** | Difficulty: ⭐⭐⭐ (3/4) | Time: 5-6 hours

## Overview

This module implements quantization fundamentals: converting FP32 tensors to INT8 representation to reduce memory by 4×. You'll build the mathematics of scale/zero-point quantization, implement quantized linear layers, and measure accuracy-efficiency trade-offs. CRITICAL HONESTY: you're implementing quantization math in Python, NOT actual hardware INT8 operations. This teaches the principles that enable TensorFlow Lite/PyTorch Mobile deployment, but real speedups require specialized hardware (Edge TPU, Neural Engine) or compiled frameworks with INT8 kernels. Your implementation will be 4× more memory-efficient but not faster; understanding WHY teaches you what production quantization frameworks must optimize.

## Learning Objectives

By the end of this module, you will be able to:

- **Quantization Mathematics**: Implement symmetric and asymmetric INT8 quantization with scale/zero-point parameter calculation
- **Calibration Strategies**: Design percentile-based calibration to minimize accuracy loss when selecting quantization parameters
- **Memory-Accuracy Trade-offs**: Measure when 4× memory reduction justifies 0.5-2% accuracy degradation for deployment
- **Production Reality**: Distinguish between educational quantization (Python simulation) vs production INT8 (hardware acceleration, kernel fusion)
- **When to Quantize**: Recognize deployment scenarios where quantization is mandatory (mobile/edge) vs optional (cloud serving)

## Build → Use → Optimize

This module follows TinyTorch's **Build → Use → Optimize** framework:

1. **Build**: Implement INT8 quantization/dequantization, calibration logic, QuantizedLinear layers
2. **Use**: Quantize trained models, measure accuracy degradation vs memory savings on MNIST/CIFAR
3. **Optimize**: Analyze the accuracy-efficiency frontier - when does quantization enable deployment vs hurt accuracy unacceptably?

## Implementation Guide

### Quantization Flow: FP32 → INT8

Quantization compresses weights by reducing precision, trading accuracy for memory efficiency:

```{mermaid}
graph LR
    A[FP32 Weight<br/>4 bytes<br/>-3.14159] --> B[Quantize<br/>scale + zero_point]
    B --> C[INT8 Weight<br/>1 byte<br/>-126]
    C --> D[Dequantize<br/>Inference]
    D --> E[FP32 Compute<br/>Result]

    style A fill:#e3f2fd
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#ffe0b2
    style E fill:#f0fdf4
```

**Flow**: Original FP32 → Calibrate scale → Store as INT8 (4× smaller) → Dequantize for computation → FP32 result

### What You're Actually Building (Educational Quantization)

**Your Implementation:**
- Quantization math: FP32 → INT8 conversion with scale/zero-point
- QuantizedLinear: Store weights as INT8, compute in simulated quantized arithmetic
- Calibration: Find optimal scale parameters from representative data
- Memory measurement: Verify 4× reduction (32 bits → 8 bits)

**What You're NOT Building:**
- Actual INT8 hardware operations (requires CPU VNNI, ARM NEON, GPU Tensor Cores)
- Kernel fusion (eliminating quantize/dequantize overhead)
- Mixed-precision execution graphs (FP32 for sensitive ops, INT8 for matmul)
- Production deployment pipelines (TensorFlow Lite converter, ONNX Runtime optimization)

**Why This Matters:** Understanding quantization math is essential. But knowing that production speedups require hardware acceleration + compiler optimization prevents unrealistic expectations. Your 4× memory reduction is real; your lack of speedup teaches why TensorFlow Lite needs custom kernels.
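
As a quick sanity check on the memory claim, the snippet below (a minimal NumPy sketch, independent of the module's API) compares the storage footprint of the same number of parameters in FP32 and INT8:

```python
import numpy as np

# One million parameters stored at different precisions
fp32_weights = np.random.randn(1_000_000).astype(np.float32)
int8_weights = np.zeros(1_000_000, dtype=np.int8)  # placeholder for quantized values

print(f"FP32: {fp32_weights.nbytes / 1e6:.1f} MB")   # 4.0 MB
print(f"INT8: {int8_weights.nbytes / 1e6:.1f} MB")   # 1.0 MB
print(f"Reduction: {fp32_weights.nbytes / int8_weights.nbytes:.0f}×")  # 4×
```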

### Core Quantization Mathematics

**Symmetric Quantization (Zero-Point = 0)**

Assumes data is centered around zero (common after BatchNorm):

```python
import numpy as np

# Quantization: FP32 → INT8
scale = np.abs(tensor).max() / 127.0                                    # scale factor
quantized = np.clip(np.round(tensor / scale), -128, 127).astype(np.int8)

# Dequantization: INT8 → FP32
dequantized = quantized.astype(np.float32) * scale
```

- **Range**: INT8 is [-128, 127] (256 values)
- **Scale**: Maps the largest FP32 magnitude to 127
- **Zero-point**: Always 0 (symmetric around origin)
- **Use case**: Weights after normalization, activations after BatchNorm
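
Since the representable values are spaced `scale` apart, the worst-case rounding error per (unclipped) element follows directly from the scale formula; a worked bound:

$$
|x - \hat{x}| \;\le\; \frac{\text{scale}}{2} \;=\; \frac{\max|x|}{254}
$$

For example, if a weight tensor's largest magnitude is 3.0, then scale ≈ 0.0236 and the worst-case per-element error is about 0.012.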

**Asymmetric Quantization (With Zero-Point)**

Handles arbitrary data ranges (e.g., activations after ReLU: [0, max]):

```python
import numpy as np

# Quantization: FP32 → INT8
min_val, max_val = tensor.min(), tensor.max()
scale = (max_val - min_val) / 255.0                  # spread the range over 256 codes
zero_point = int(np.round(-128 - min_val / scale))   # INT8 code that represents FP32 zero
quantized = np.clip(np.round(tensor / scale + zero_point), -128, 127).astype(np.int8)

# Dequantization: INT8 → FP32
dequantized = (quantized.astype(np.float32) - zero_point) * scale
```

- **Range**: Uses the full [-128, 127] even if the data is [0, 5]
- **Scale**: Maps the data range to the INT8 range
- **Zero-point**: Offset ensuring FP32 zero maps to a specific INT8 value
- **Use case**: ReLU activations, input images, any non-centered data

**Trade-off:** Symmetric is simpler (no zero-point storage/computation); asymmetric uses the range more efficiently (better for skewed distributions).
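
To see the trade-off concretely, the sketch below (a self-contained NumPy experiment using the formulas above, not part of the module's API) quantizes ReLU-style non-negative data both ways and compares roundtrip error:

```python
import numpy as np

data = np.abs(np.random.randn(10_000).astype(np.float32)) * 2.0   # ReLU-like: all values ≥ 0

# Symmetric: zero-point fixed at 0, so the 128 negative codes go unused
s_scale = np.abs(data).max() / 127.0
s_q = np.clip(np.round(data / s_scale), -128, 127).astype(np.int8)
s_err = np.abs(data - s_q.astype(np.float32) * s_scale).mean()

# Asymmetric: scale/zero-point chosen so [min, max] spans all 256 codes
a_scale = (data.max() - data.min()) / 255.0
a_zp = int(np.round(-128 - data.min() / a_scale))
a_q = np.clip(np.round(data / a_scale) + a_zp, -128, 127).astype(np.int8)
a_err = np.abs(data - (a_q.astype(np.float32) - a_zp) * a_scale).mean()

print(f"Symmetric mean error:  {s_err:.5f}")   # larger: only ~128 codes cover the data
print(f"Asymmetric mean error: {a_err:.5f}")   # smaller: all 256 codes cover the data
```

With only non-negative data, symmetric quantization effectively discards the negative half of the range, so its step size (and error) is roughly twice as large.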

### Calibration - The Critical Step

Quantization quality depends entirely on scale/zero-point selection. Poor choices destroy accuracy.

**Naive Approach (Don't Do This):**
```python
# Use the global min/max of the data
scale = (data.max() - data.min()) / 255
# Problem: a single outlier wastes most of the INT8 range
# Example: data in [0, 5] but one outlier at 100 → scale = 100/255
# Result: 95% of the data maps to only ~13 INT8 values (5 / 100 * 255 ≈ 13)
```

**Calibration Approach (Correct):**
```python
import numpy as np

# Use percentile-based clipping
max_val = np.percentile(np.abs(calibration_data), 99.9)
scale = max_val / 127
# Clips the top 0.1% of outliers and uses the INT8 range efficiently
# The 99.9th percentile ignores rare outliers while preserving the typical range
```

**Calibration Process:**
1. Collect 100-1000 samples of representative data (validation set)
2. For each layer, record activation statistics during forward passes
3. Compute percentile-based min/max (typically the 99.9th percentile)
4. Calculate scale/zero-point from the clipped statistics
5. Quantize weights/activations using the calibrated parameters

**Why It Works:** Most activations follow roughly normal distributions. Outliers are rare but dominate the min/max. Clipping 0.1% of outliers uses the INT8 range 10-100× more efficiently with negligible accuracy loss.
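
A quick way to convince yourself: the sketch below (a self-contained NumPy experiment, not part of the module's API) injects a single outlier and compares roundtrip error under naive min/max scaling vs 99.9th-percentile clipping:

```python
import numpy as np

data = np.random.randn(100_000).astype(np.float32)
data[0] = 100.0                                    # one extreme outlier

def roundtrip_error(x, max_val):
    scale = max_val / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return np.abs(x - q.astype(np.float32) * scale).mean()

naive = roundtrip_error(data, np.abs(data).max())                    # scale dominated by the outlier
clipped = roundtrip_error(data, np.percentile(np.abs(data), 99.9))   # outlier saturates, rest is fine

print(f"Naive min/max error:     {naive:.4f}")
print(f"99.9th-percentile error: {clipped:.4f}")   # much smaller: the scale now matches the typical range
```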

### Per-Tensor vs Per-Channel Quantization

**Per-Tensor Quantization:**
- One scale/zero-point for the entire weight tensor
- Simple: store 2 parameters per layer
- Example: Conv2D with 64×3×3×3 weights uses 1 scale, 1 zero-point

**Per-Channel Quantization:**
- Separate scale/zero-point per output channel
- Better accuracy: each channel uses its natural range
- Example: Conv2D with 64 output channels uses 64 scales, 64 zero-points
- Overhead: 128 extra parameters (64 scales + 64 zero-points)

**When to Use Per-Channel** (a small sketch follows the table below):
- Weight magnitudes vary significantly across channels (common in Conv layers)
- Accuracy improvement (0.5-1.5%) justifies 0.1-0.5% memory overhead
- Production frameworks (PyTorch, TensorFlow Lite) default to per-channel for Conv/Linear

**Trade-off Table:**

| Quantization Scheme | Parameters | Accuracy | Complexity | Use Case |
|---------------------|------------|----------|------------|----------|
| Per-Tensor | 2 per layer | Baseline | Simple | Fast prototyping, small models |
| Per-Channel (Conv) | 2N (N=channels) | +0.5-1.5% | Medium | Production Conv layers |
| Per-Channel (Linear) | 2N (N=out_features) | +0.3-0.8% | Medium | Production Linear layers |
| Mixed (Conv per-channel, Linear per-tensor) | Hybrid | +0.4-1.2% | Medium | Balanced approach |
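
As a rough illustration of per-channel symmetric quantization (a NumPy sketch, not the module's required API), each output channel of a Conv2D weight tensor gets its own scale computed from that channel's maximum magnitude:

```python
import numpy as np

# Hypothetical Conv2D weight tensor: (out_channels, in_channels, kH, kW)
weights = np.random.randn(64, 3, 3, 3).astype(np.float32)

# One scale per output channel, computed from that channel's max |weight|
scales = np.abs(weights).max(axis=(1, 2, 3)) / 127.0          # shape: (64,)

# Broadcast each channel's scale across its own 3×3×3 block
q = np.clip(np.round(weights / scales[:, None, None, None]), -128, 127).astype(np.int8)
deq = q.astype(np.float32) * scales[:, None, None, None]
per_channel_err = np.abs(weights - deq).mean()

# Compare against a single per-tensor scale
t_scale = np.abs(weights).max() / 127.0
qt = np.clip(np.round(weights / t_scale), -128, 127).astype(np.int8)
per_tensor_err = np.abs(weights - qt.astype(np.float32) * t_scale).mean()

print(f"Per-tensor error:  {per_tensor_err:.5f}")
print(f"Per-channel error: {per_channel_err:.5f}")   # ≤ per-tensor; noticeably better when channel ranges differ
```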

### QuantizedLinear - Quantized Neural Network Layer

Replaces a regular Linear layer with a quantized equivalent:

```python
class QuantizedLinear:
    def __init__(self, linear_layer: Linear):
        # Quantize weights at initialization
        self.weights_int8, self.weight_scale, self.weight_zp = quantize_int8(linear_layer.weight)
        self.bias_int8, self.bias_scale, self.bias_zp = quantize_int8(linear_layer.bias)

        # Store original FP32 for accuracy comparison
        self.original_weight = linear_layer.weight

    def forward(self, x: Tensor) -> Tensor:
        # EDUCATIONAL VERSION: dequantize → compute in FP32
        # (Simulates the quantization math but doesn't speed up computation)
        weight_fp32 = dequantize_int8(self.weights_int8, self.weight_scale, self.weight_zp)
        bias_fp32 = dequantize_int8(self.bias_int8, self.bias_scale, self.bias_zp)

        # Compute in FP32 (not actually faster - just lower-precision storage)
        output = x @ weight_fp32.T + bias_fp32
        return output
```

**What Happens in Production (TensorFlow Lite, PyTorch Mobile):**

```python
# Production quantized matmul (conceptual - happens in C++/assembly)
def quantized_matmul_production(x_int8, weight_int8, x_scale, weight_scale, output_scale):
    # 1. INT8 x INT8 matmul using VNNI/NEON/Tensor Cores (FAST)
    accum_int32 = matmul_int8_hardware(x_int8, weight_int8)  # Specialized instruction

    # 2. Requantize accumulated INT32 → INT8 output
    combined_scale = (x_scale * weight_scale) / output_scale
    output_int8 = (accum_int32 * combined_scale).clip(-128, 127)

    # 3. Stay in INT8 for the next layer (no dequantization unless necessary)
    return output_int8
```

**Key Differences:**
- **Your implementation**: Dequantize → compute in FP32 (educational, slow)
- **Production**: INT8 → INT8 throughout, specialized hardware (4-10× speedup)

**Memory Savings (Real):** 4× reduction from storing INT8 instead of FP32
**Speed Improvement (Your Code):** ~0× (Python overhead dominates)
**Speed Improvement (Production):** 2-10× (hardware acceleration, kernel fusion)

### Model-Level Quantization

```python
def quantize_model(model, calibration_data=None):
    """
    Quantize all Linear layers in a model.

    Args:
        model: Neural network with Linear layers
        calibration_data: Representative samples for activation calibration

    Returns:
        quantized_layers: Layer list with Linear layers replaced by QuantizedLinear
        (scale/zero-point parameters live inside each QuantizedLinear)
    """
    quantized_layers = []
    for layer in model.layers:
        if isinstance(layer, Linear):
            q_layer = QuantizedLinear(layer)
            if calibration_data is not None:
                q_layer.calibrate(calibration_data)  # Find optimal scales
            quantized_layers.append(q_layer)
        else:
            quantized_layers.append(layer)  # Keep ReLU, Softmax in FP32

    return quantized_layers
```

**Calibration in Practice** (a sketch of a `calibrate` method follows this list):
1. Run 100-1000 samples through the original FP32 model
2. Record min/max activations for each layer
3. Compute percentile-clipped scales
4. Quantize weights with the calibrated parameters
5. Test accuracy on the validation set
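
The `calibrate` hook called in `quantize_model` is not prescribed here; one possible shape for it, assuming the calibration batch is a NumPy array and the `quantize_int8`/`dequantize_int8` helpers from earlier are available, is sketched below:

```python
import numpy as np

class QuantizedLinear:
    # ... __init__ and forward as above ...

    def calibrate(self, calibration_data):
        """Pick an activation scale from representative inputs (hypothetical sketch).

        Runs the calibration batch through the dequantized weights and derives a
        percentile-clipped symmetric scale for this layer's outputs.
        """
        weight_fp32 = dequantize_int8(self.weights_int8, self.weight_scale, self.weight_zp)
        activations = calibration_data @ weight_fp32.T           # FP32 forward pass
        max_val = np.percentile(np.abs(activations), 99.9)       # clip rare outliers
        self.activation_scale = max_val / 127.0                  # stored for later requantization
```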

## Getting Started

### Prerequisites

Ensure you've completed profiling fundamentals:

```bash
# Activate TinyTorch environment
source bin/activate-tinytorch.sh

# Verify prerequisite modules
tito test --module profiling
```

**Required Understanding:**
- Memory profiling (Module 14): Measuring memory consumption
- Tensor operations (Module 01): Understanding FP32 representation
- Linear layers (Module 03): Matrix multiplication mechanics

### Development Workflow

1. **Open the development file**: `modules/15_quantization/quantization_dev.py`
2. **Implement quantize_int8()**: FP32 → INT8 conversion with scale/zero-point calculation
3. **Implement dequantize_int8()**: INT8 → FP32 restoration
4. **Build QuantizedLinear**: Replace Linear layers with quantized versions
5. **Add calibration logic**: Percentile-based scale selection
6. **Implement quantize_model()**: Convert entire networks to quantized form
7. **Export and verify**: `tito module complete 15 && tito test --module quantization`

## Testing

### Comprehensive Test Suite

Run the full test suite to verify quantization functionality:

```bash
# TinyTorch CLI (recommended)
tito test --module quantization

# Direct pytest execution
python -m pytest tests/ -k quantization -v
```

### Test Coverage Areas

- ✅ **Quantization Correctness**: FP32 → INT8 → FP32 roundtrip error bounds (< 0.5% mean error)
- ✅ **Memory Reduction**: Verify 4× reduction in model size (weights + biases)
- ✅ **Symmetric vs Asymmetric**: Both schemes produce valid INT8 in [-128, 127]
- ✅ **Calibration Impact**: Percentile clipping reduces quantization error vs naive min/max
- ✅ **QuantizedLinear Equivalence**: Output matches FP32 Linear within tolerance (< 1% difference)
- ✅ **Model-Level Quantization**: Full network quantization preserves accuracy (< 2% degradation)

### Inline Testing & Quantization Analysis

The module includes comprehensive validation with real-time feedback:

```python
# Example inline test output
🔬 Unit Test: quantize_int8()...
✅ Symmetric quantization: range [-128, 127] ✓
✅ Scale calculation: max_val / 127 = 0.0234 ✓
✅ Roundtrip error: 0.31% mean error ✓
📈 Progress: quantize_int8() ✓

🔬 Unit Test: QuantizedLinear...
✅ Memory reduction: 145KB → 36KB (4.0×) ✓
✅ Output equivalence: 0.43% max difference vs FP32 ✓
📈 Progress: QuantizedLinear ✓
```

### Manual Testing Examples

```python
import numpy as np

from quantization_dev import quantize_int8, dequantize_int8, QuantizedLinear
from tinytorch.nn import Linear
# Tensor comes from your TinyTorch build - import it from wherever your package exposes it

# Test quantization on a random tensor
tensor = Tensor(np.random.randn(100, 100).astype(np.float32))
q_tensor, scale, zero_point = quantize_int8(tensor)

print(f"Original range: [{tensor.data.min():.2f}, {tensor.data.max():.2f}]")
print(f"Quantized range: [{q_tensor.data.min()}, {q_tensor.data.max()}]")
print(f"Scale: {scale:.6f}, Zero-point: {zero_point}")

# Dequantize and measure error
restored = dequantize_int8(q_tensor, scale, zero_point)
error = np.abs(tensor.data - restored.data).mean()
print(f"Roundtrip error: {error:.4f} ({error/np.abs(tensor.data).mean()*100:.2f}%)")

# Quantize a Linear layer
linear = Linear(128, 64)
q_linear = QuantizedLinear(linear)

print(f"\nOriginal weights: {linear.weight.data.nbytes} bytes")
print(f"Quantized weights: {q_linear.weights_int8.data.nbytes} bytes")
print(f"Reduction: {linear.weight.data.nbytes / q_linear.weights_int8.data.nbytes:.1f}×")
```

## Systems Thinking Questions

### Real-World Applications

- **Mobile ML Deployment**: TensorFlow Lite converts models to INT8 for Android/iOS. Without quantization, models exceed app size limits (100-200MB) and drain battery 4× faster. Google Photos, Translate, and Keyboard all run quantized models on-device.

- **Edge AI Devices**: Google Edge TPU (Coral), NVIDIA Jetson, and Intel Neural Compute Stick require INT8 models. The hardware is designed for quantized operations - FP32 either isn't supported or is roughly 10× slower.

- **Cloud Inference Optimization**: Accelerators such as AWS Inferentia and Google Cloud TPU serve quantized models. INT8 reduces memory bandwidth (the bottleneck for inference) and increases throughput by 2-4×. At scale (millions of requests/day), this saves millions in infrastructure costs.

- **Large Language Models**: LLaMA-65B is ~130GB in FP16 and doesn't fit on a single 80GB A100 GPU. INT8 quantization → ~65GB, which enables serving. GPTQ pushes to 4-bit (~33GB) with < 1% perplexity increase. Quantization is how enthusiasts run 70B models on consumer GPUs.

### Quantization Mathematics

- **Why INT8 vs INT4 or INT16?** INT8 is the sweet spot: 4× memory reduction with < 1% accuracy loss. INT4 gives 8× reduction but 2-5% accuracy loss (harder to deploy). INT16 gives only 2× reduction (not worth the complexity). Hardware acceleration (VNNI, NEON, Tensor Cores) standardized on INT8.

- **Symmetric vs Asymmetric Trade-offs**: Symmetric is simpler (no zero-point) but wastes range for skewed data. ReLU activations are [0, max] - symmetric centers around 0, wasting the negative range. Asymmetric uses the full INT8 range but costs extra zero-point storage and computation.

- **Calibration Data Requirements**: In theory, more data gives better statistics. In practice, returns diminish after 500-1000 samples because percentile estimates stabilize quickly. Critical requirement: calibration data MUST match the deployment distribution. If calibration uses ImageNet but deployment is medical images, quantization fails catastrophically.

- **Per-Channel Justification**: Conv2D with 64 output channels: per-channel stores 64 scales + 64 zero-points = 512 bytes. Total weights: 3×3×64×64 FP32 = 147KB. Overhead: 0.35%. Accuracy improvement: 0.5-1.5%. A clear win - which explains why production frameworks default to per-channel.

### Production Deployment Characteristics

- **Speed Reality Check**: INT8 matmul is theoretically 4× faster (4× less memory bandwidth). In practice: 2-3× on CPU (quantize/dequantize overhead), 4-10× on specialized hardware (Edge TPU, Neural Engine designed for pure INT8 graphs). Your Python implementation is 0× faster (simulation overhead exceeds bandwidth savings).

- **When Quantization is Mandatory**: Mobile deployment (app size limits, battery constraints, Neural Engine acceleration), edge devices (limited memory/compute), cloud serving at scale (cost optimization). Not negotiable - models either quantize or don't ship.

- **When to Avoid Quantization**: Accuracy-critical applications where 1% matters (medical diagnosis, autonomous vehicles), early research iteration (quantization adds complexity), models that are already tiny (< 10MB - the overhead isn't worth it), cloud serving with abundant resources (FP32 throughput is sufficient).

- **Quantization-Aware Training vs Post-Training**: PTQ (Post-Training Quantization) is fast (minutes) but loses 1-2% accuracy. QAT (Quantization-Aware Training) requires retraining (days/weeks) but loses < 0.5%. Choose PTQ for rapid iteration, QAT for production deployment. If you're using pretrained models you don't own (BERT, ResNet), PTQ is the only option.
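
For intuition on what QAT changes, its core trick is "fake quantization": the forward pass snaps values to the INT8 grid while the weights stay FP32 for gradient updates, so training learns to tolerate the rounding error. A minimal sketch of that operation (symmetric scheme, NumPy only, not the module's API):

```python
import numpy as np

def fake_quantize(x, scale):
    """Simulate INT8 rounding in FP32 so training 'sees' quantization error (illustrative sketch)."""
    q = np.clip(np.round(x / scale), -128, 127)   # snap to the INT8 grid
    return q * scale                               # immediately return to FP32

w = np.random.randn(256, 256).astype(np.float32)
scale = np.abs(w).max() / 127.0
w_for_forward = fake_quantize(w, scale)   # used in the QAT forward pass; the optimizer still updates FP32 w
```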

## Ready to Build?

You're about to implement the precision-reduction mathematics that make mobile ML deployment possible. Quantization is the difference between a model that exists in research and a model that ships in apps used by billions.

This module teaches honest quantization: you'll implement the math correctly, achieve 4× memory reduction, and understand precisely why your Python code isn't faster (hardware acceleration requires specialized silicon + compiled kernels). This clarity prepares you for production deployment, where TensorFlow Lite, PyTorch Mobile, and ONNX Runtime apply your quantization mathematics with real INT8 hardware operations.

Understanding quantization from first principles - implementing the scale/zero-point calculations yourself, calibrating with real data, measuring accuracy-efficiency trade-offs - gives you deep insight into the constraints that define production ML systems.

Choose your preferred way to engage with this module:

````{grid} 1 2 3 3

```{grid-item-card} Launch Binder
:link: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?filepath=modules/15_quantization/quantization_dev.ipynb
:class-header: bg-light

Run this module interactively in your browser. No installation required.
```

```{grid-item-card} Open in Colab
:link: https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/modules/15_quantization/quantization_dev.ipynb
:class-header: bg-light

Use Google Colab for GPU access and cloud compute power.
```

```{grid-item-card} View Source
:link: https://github.com/mlsysbook/TinyTorch/blob/main/modules/15_quantization/quantization_dev.py
:class-header: bg-light

Browse the Python source code and understand the implementation.
```

````

```{admonition} Save Your Progress
:class: tip
Binder sessions are temporary. Download your completed notebook when done, or switch to local development for persistent work.
```

---

<div class="prev-next-area">
<a class="left-prev" href="../modules/14_profiling/ABOUT.html" title="previous page">← Module 14: Profiling</a>
<a class="right-next" href="../modules/16_compression/ABOUT.html" title="next page">Module 16: Compression →</a>
</div>