cs249r_book/mlsysim/docs/api/core.solver.CompressionModel.qmd
Vijay Janapa Reddi 611de228d9 fix(mlsysim): align docs with *Model naming convention
The solver.py refactoring renamed most solver classes from *Solver to
*Model (e.g. DistributedSolver → DistributedModel). The docs still
referenced the old names, causing the Quarto site build to fail with:
  ImportError: cannot import name 'DistributedSolver' from 'mlsysim'

- Fix executable code cells in tutorials/distributed.qmd
- Update non-executable code examples across 10 doc files
- Rename 19 API reference files from *Solver.qmd to *Model.qmd
- SensitivitySolver and SynthesisSolver intentionally retain their names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:39:11 -04:00


# core.solver.CompressionModel { #mlsysim.core.solver.CompressionModel }

```python
core.solver.CompressionModel()
```
Analyzes model compression trade-offs (accuracy vs. efficiency).

This model captures the 'Compression Tax': the accuracy degradation
incurred when reducing model size via quantization or pruning, balanced
against the gains in memory footprint and inference latency.
Literature Sources:

1. Han et al. (2015), "Deep Compression: Compressing Deep Neural Networks
   with Pruning, Trained Quantization and Huffman Coding."
2. Gholami et al. (2021), "A Survey of Quantization Methods for
   Efficient Neural Network Inference."
3. Blalock et al. (2020), "What is the State of Neural Network Pruning?"
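
A minimal usage sketch follows. The `Workload` and `HardwareNode`
constructor arguments shown are hypothetical placeholders; see those
classes' own reference pages for the real signatures.

```python
# Usage sketch only: Workload("resnet50") and HardwareNode("edge-gpu") are
# hypothetical constructor calls, not documented signatures.
from mlsysim import CompressionModel, HardwareNode, Workload

workload = Workload("resnet50")       # model to compress (hypothetical args)
hardware = HardwareNode("edge-gpu")   # target hardware (hypothetical args)

metrics = CompressionModel().solve(
    workload,
    hardware,
    method="quantization",  # or 'pruning' / 'distillation'
    target_bitwidth=8,      # INT8 target precision
)
print(metrics)  # Dict[str, Any]: memory savings, speedup, accuracy delta
```
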
## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.CompressionModel.solve) | Solves for compression gains and estimated accuracy impact. |
### solve { #mlsysim.core.solver.CompressionModel.solve }

```python
core.solver.CompressionModel.solve(
    model,
    hardware,
    method='quantization',
    target_bitwidth=8,
    sparsity=0.0,
)
```

Solves for compression gains and estimated accuracy impact.
#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|-----------------|--------------|---------------------------------------------------------------------|------------------|
| model | Workload | The model to be compressed. | _required_ |
| hardware | HardwareNode | The target execution hardware. | _required_ |
| method | str | The compression method ('quantization', 'pruning', 'distillation'). | `'quantization'` |
| target_bitwidth | int | Target numerical precision in bits (e.g., 8 for INT8, 4 for INT4). | `8` |
| sparsity | float | Target sparsity ratio (0.0 to 1.0) for pruning. | `0.0` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|--------|------------------|-----------------------------------------------------------------------------------------------|
| | Dict\[str, Any\] | Compression metrics including memory savings, latency speedup, and estimated accuracy delta. |
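
Because the return value is a plain dictionary, results from different
configurations can be compared directly. Below is a sketch continuing from
the earlier example; the metric key names inside each result are not
specified on this page, so the sweep prints whole dicts rather than
indexing assumed keys.

```python
# Sweep sketch (assumes `workload` and `hardware` from the earlier example).
# Each result dict is printed wholesale because this page does not specify
# its key names.
sweeps = [
    ("quantization", {"target_bitwidth": 4}),  # aggressive INT4
    ("quantization", {"target_bitwidth": 8}),  # default INT8
    ("pruning", {"sparsity": 0.5}),            # prune half the weights
]

for method, kwargs in sweeps:
    result = CompressionModel().solve(workload, hardware, method=method, **kwargs)
    print(f"{method} {kwargs}: {result}")
```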