mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-08 09:57:21 -05:00
# core.solver.CompressionModel { #mlsysim.core.solver.CompressionModel }

```python
core.solver.CompressionModel()
```

Analyzes model compression trade-offs (Accuracy vs. Efficiency).

This solver models the 'Compression Tax' — the accuracy degradation
that occurs when reducing model size via quantization or pruning,
balanced against the gains in memory footprint and inference latency.

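The trade-off can be sketched with back-of-envelope arithmetic. The function below is an illustration only, not mlsysim's implementation; the name `compression_ratio` and the formula are assumptions for exposition.

```python
def compression_ratio(orig_bits=32, target_bitwidth=8, sparsity=0.0):
    """Illustrative memory reduction from quantization plus pruning.

    Quantization scales storage by target_bitwidth / orig_bits;
    pruning keeps only the (1 - sparsity) fraction of weights.
    """
    return orig_bits / (target_bitwidth * (1.0 - sparsity))

# FP32 -> INT8, dense: 4x smaller.
print(compression_ratio(target_bitwidth=8))                # 4.0
# FP32 -> INT4 with 50% sparsity: 16x smaller.
print(compression_ratio(target_bitwidth=4, sparsity=0.5))  # 16.0
```

The 'Compression Tax' is the other side of this ratio: the cheaper the representation, the larger the expected accuracy degradation, which is what this class estimates.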
Literature Source:

1. Han et al. (2015), "Deep Compression: Compressing Deep Neural Networks
   with Pruning, Trained Quantization and Huffman Coding."
2. Gholami et al. (2021), "A Survey of Quantization Methods for
   Efficient Neural Network Inference."
3. Blalock et al. (2020), "What is the State of Neural Network Pruning?"

## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.CompressionModel.solve) | Solves for compression gains and estimated accuracy impact. |

### solve { #mlsysim.core.solver.CompressionModel.solve }

```python
core.solver.CompressionModel.solve(
    model,
    hardware,
    method='quantization',
    target_bitwidth=8,
    sparsity=0.0,
)
```

Solves for compression gains and estimated accuracy impact.

#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Workload | The model to be compressed. | _required_ |
| hardware | HardwareNode | The target execution hardware. | _required_ |
| method | str | The compression method ('quantization', 'pruning', 'distillation'). | `'quantization'` |
| target_bitwidth | int | Target numerical precision in bits (e.g., 8 for INT8, 4 for INT4). | `8` |
| sparsity | float | Target sparsity ratio (0.0 to 1.0) for pruning. | `0.0` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
| --- | --- | --- |
| | Dict\[str, Any\] | Compression metrics including memory savings, latency speedup, and estimated accuracy delta. |
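The key names inside the returned dictionary are not listed on this page, so the sketch below uses hypothetical placeholders (`memory_savings`, `latency_speedup`, `accuracy_delta`) purely to illustrate how the three documented metrics might be consumed by a caller; `within_budget` is likewise an invented helper, not part of mlsysim.

```python
# Hypothetical result shape for CompressionModel.solve(); the real key
# names are NOT documented here — these placeholders mirror the three
# metrics the Returns section describes.
result = {
    "memory_savings": 4.0,    # e.g. FP32 -> INT8 storage reduction
    "latency_speedup": 2.1,   # inference speedup on target hardware
    "accuracy_delta": -0.8,   # estimated accuracy change (points)
}

def within_budget(metrics, max_accuracy_loss=1.0, min_memory_savings=4.0):
    """Gate a compression config on accuracy cost vs. memory gain."""
    return (metrics["accuracy_delta"] >= -max_accuracy_loss
            and metrics["memory_savings"] >= min_memory_savings)

print(within_budget(result))  # True
```

A caller would typically sweep `target_bitwidth` or `sparsity` and keep the most aggressive configuration that still passes such a gate.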