# core.solver.CompressionModel { #mlsysim.core.solver.CompressionModel }

```python
core.solver.CompressionModel()
```

Analyzes model compression trade-offs (accuracy vs. efficiency).

This solver models the 'Compression Tax': the accuracy degradation that occurs when model size is reduced via quantization or pruning, weighed against the resulting gains in memory footprint and inference latency.

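A minimal sketch of this trade-off follows. It assumes symmetric per-tensor uniform quantization (one of the schemes surveyed by Gholami et al. 2021) and an FP16 baseline. This is not how mlsysim computes the Compression Tax: `quantize_symmetric` and `weight_bytes` are hypothetical helpers, and mean absolute reconstruction error is only a crude stand-in for accuracy loss. Halving the bitwidth halves the weight storage. The rounding error, however, grows with the quantization step, which is about 18 times coarser at INT4 (7 levels per sign) than at INT8 (127 levels per sign) for the same weight range.

```python
import random


def quantize_symmetric(weights: list[float], bits: int) -> list[float]:
    """Hypothetical helper: symmetric per-tensor uniform quantization.

    Rounds each weight to an integer in [-qmax, qmax] using a single scale,
    then de-quantizes back to float so the rounding error can be measured.
    """
    qmax = 2 ** (bits - 1) - 1                    # 127 for INT8, 7 for INT4
    scale = max(abs(w) for w in weights) / qmax   # largest |w| maps to qmax
    if scale == 0.0:                              # all-zero tensor: nothing to round
        return [0.0] * len(weights)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def weight_bytes(num_params: int, bits: int) -> float:
    """Hypothetical helper: raw weight storage, ignoring scales and other metadata."""
    return num_params * bits / 8


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
baseline = weight_bytes(len(weights), 16)         # assumed FP16 baseline

for bits in (8, 4):
    restored = quantize_symmetric(weights, bits)
    mean_error = sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)
    ratio = baseline / weight_bytes(len(weights), bits)
    print(f"INT{bits}: {ratio:.0f}x smaller than FP16, mean |error| = {mean_error:.2e}")
```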
Literature Sources:

1. Han et al. (2015), "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding."
2. Gholami et al. (2021), "A Survey of Quantization Methods for Efficient Neural Network Inference."
3. Blalock et al. (2020), "What is the State of Neural Network Pruning?"

## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.CompressionModel.solve) | Solves for compression gains and estimated accuracy impact. |

### solve { #mlsysim.core.solver.CompressionModel.solve }

```python
core.solver.CompressionModel.solve(
    model,
    hardware,
    method='quantization',
    target_bitwidth=8,
    sparsity=0.0,
)
```

Solves for compression gains and estimated accuracy impact.

#### Parameters {.doc-section .doc-section-parameters}

| Name            | Type         | Description                                                                          | Default          |
|-----------------|--------------|--------------------------------------------------------------------------------------|------------------|
| model           | Workload     | The model to be compressed.                                                          | _required_       |
| hardware        | HardwareNode | The target execution hardware.                                                       | _required_       |
| method          | str          | The compression method: `'quantization'`, `'pruning'`, or `'distillation'`.          | `'quantization'` |
| target_bitwidth | int          | Target numerical precision in bits (e.g., 8 for INT8, 4 for INT4).                   | `8`              |
| sparsity        | float        | Target sparsity for pruning: the fraction of weights set to zero, from 0.0 to 1.0.   | `0.0`            |

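To make the `sparsity` parameter concrete, here is a minimal sketch of unstructured magnitude pruning in the spirit of Han et al. (2015). The `sparsity` fraction of weights with the smallest absolute values is set to zero. The functions `magnitude_prune` and `measured_sparsity` are hypothetical and say nothing about how `solve` itself models pruning. Ties at the threshold are broken by position.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Hypothetical helper: zero out the `sparsity` fraction of smallest-|w| weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0.0, 1.0]")
    num_to_prune = round(sparsity * len(weights))
    # Indices sorted by magnitude, smallest first; the first `num_to_prune` are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


def measured_sparsity(weights: list[float]) -> float:
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)                     # [0.9, 0.0, 0.3, -0.7, 0.0, 0.0, -0.4, 0.0]
print(measured_sparsity(pruned))  # 0.5
```

Unstructured pruning like this reaches a given sparsity with the least disruption to large weights. Whether it actually reduces latency depends on the hardware supporting sparse formats, which is why the solver takes a `hardware` argument alongside `sparsity`.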
#### Returns {.doc-section .doc-section-returns}

| Name   | Type             | Description                                                                                    |
|--------|------------------|------------------------------------------------------------------------------------------------|
|        | Dict\[str, Any\] | Compression metrics, including memory savings, latency speedup, and estimated accuracy delta. |

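The memory-savings and latency-speedup entries can be approximated with a back-of-the-envelope sketch that rests on loudly stated assumptions. It assumes that weights dominate memory traffic and that inference is memory-bandwidth bound, as is typical of small-batch autoregressive decoding. It also assumes that pruned weights cost nothing to store or load. `estimate_compression_gains`, its parameters, and its dictionary keys are hypothetical. They are not the keys returned by `solve`, and no accuracy estimate is attempted.

```python
def estimate_compression_gains(
    num_params: float,
    mem_bandwidth_gb_s: float,
    target_bitwidth: int = 8,
    sparsity: float = 0.0,
    baseline_bitwidth: int = 16,
) -> dict[str, float]:
    """Hypothetical first-order estimate; NOT the formula used by CompressionModel.solve.

    Assumes weights dominate memory traffic, one inference pass reads every stored
    weight once at full bandwidth, and pruned weights cost nothing (no index overhead).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0.0, 1.0)")
    bandwidth = mem_bandwidth_gb_s * 1e9                        # bytes per second
    baseline_bytes = num_params * baseline_bitwidth / 8
    compressed_bytes = num_params * (1.0 - sparsity) * target_bitwidth / 8
    baseline_latency_s = baseline_bytes / bandwidth             # memory-bound time per pass
    compressed_latency_s = compressed_bytes / bandwidth
    return {
        "memory_savings_bytes": baseline_bytes - compressed_bytes,
        "compression_ratio": baseline_bytes / compressed_bytes,
        "baseline_latency_s": baseline_latency_s,
        "compressed_latency_s": compressed_latency_s,
        "latency_speedup": baseline_latency_s / compressed_latency_s,
    }


# Illustrative inputs only: 7B parameters on a device with 2,000 GB/s of memory bandwidth.
gains = estimate_compression_gains(num_params=7e9, mem_bandwidth_gb_s=2000, target_bitwidth=4)
for name, value in gains.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions, the latency speedup equals the byte reduction (4x for FP16 to INT4 here). Real kernels usually fall short of that bound because of dequantization work, sparse-format index overhead, and compute-bound phases such as prefill.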