refactor: introduce System Archetypes in mlsys/systems.py and integrate into Introduction and Serving chapters; verify math integrity and rationale for LEGO blocks

This commit is contained in:
Vijay Janapa Reddi
2026-02-24 19:12:51 -05:00
parent a0ce7cc746
commit e881d92625
7 changed files with 1088 additions and 12 deletions

View File

@@ -2524,7 +2524,98 @@ This book makes a stronger claim: ML systems engineering is not merely a collect
::: {.callout-chapter-connection title="From Vision to Architecture"}
Where should an ML model actually run? The answer is not "wherever is most convenient." Physical laws dictate what is possible. The speed of light makes distant cloud servers useless for emergency braking. Thermodynamics prevents datacenter-class models from running in your pocket. Memory physics creates bandwidth ceilings that faster chips cannot overcome. @sec-ml-systems introduces the four deployment paradigms (Cloud, Edge, Mobile, and TinyML) that span nine orders of magnitude in power and memory, explaining why each exists and how to choose among them.
```{python}
# ┌─────────────────────────────────────────────────────────────────────────────
# │ SYSTEM ARCHETYPES (THE HIERARCHY)
# ├─────────────────────────────────────────────────────────────────────────────
# │ Context: Section "Where should an ML model run?"
# │
# │ Goal: Quantify the 9-order-of-magnitude span of the ML Systems landscape.
# │ Show: RAM, Compute, and Power constraints for the 4 deployment paradigms.
# │ How: Reference the Systems Archetypes (Cloud, Edge, Mobile, Tiny).
# │
# │ Imports: mlsys, mlsys.constants, mlsys.formatting
# │ Exports: cloud_*, edge_*, mobile_*, tiny_* formatted strings
# └─────────────────────────────────────────────────────────────────────────────
from mlsys import Systems, Archetypes
from mlsys.constants import GB, MB, KiB, watt, milliwatt, TFLOPs, second, flop
from mlsys.formatting import fmt, check
# ┌── LEGO ───────────────────────────────────────────────
class DeploymentSystems:
"""
Namespace for the 4 Deployment Archetypes.
"""
# ┌── 1. LOAD (Archetypes) ───────────────────────────────────────────────
s_cloud = Systems.Cloud # H100
s_edge = Systems.Edge # Jetson
s_mobile = Systems.Mobile # Smartphone
s_tiny = Systems.Tiny # ESP32
# ┌── 2. EXECUTE (The Compute) ─────────────────────────────────────────
# We compare the scaling factors (Cloud vs Tiny)
mem_scaling = (s_cloud.ram / s_tiny.ram).to('count').magnitude
compute_scaling = (s_cloud.peak_flops / s_tiny.peak_flops).to('count').magnitude
power_scaling = (s_cloud.power_budget / s_tiny.power_budget).to('count').magnitude
# ┌── 3. GUARD (Invariants) ───────────────────────────────────────────
check(mem_scaling > 1e5, "Cloud memory should be >100,000x TinyML memory.")
check(compute_scaling > 1e6, "Cloud compute should be >1,000,000x TinyML compute.")
# ┌── 4. OUTPUT (Formatting) ──────────────────────────────────────────────
cloud_mem_str = fmt(s_cloud.ram.m_as(GB), precision=0)
cloud_compute_str = fmt(s_cloud.peak_flops.m_as(TFLOPs/second), precision=0)
cloud_power_str = fmt(s_cloud.power_budget.m_as(watt), precision=0)
edge_mem_str = fmt(s_edge.ram.m_as(GB), precision=0)
edge_compute_str = fmt(s_edge.peak_flops.m_as(TFLOPs/second), precision=1)
edge_power_str = fmt(s_edge.power_budget.m_as(watt), precision=0)
mobile_mem_str = fmt(s_mobile.ram.m_as(GB), precision=0)
mobile_compute_str = fmt(s_mobile.peak_flops.m_as(TFLOPs/second), precision=1)
mobile_power_str = fmt(s_mobile.power_budget.m_as(watt), precision=0)
tiny_mem_str = fmt(s_tiny.ram.m_as(KiB), precision=0)
tiny_compute_str = fmt(s_tiny.peak_flops.m_as(TFLOPs/second), precision=4)
tiny_power_str = fmt(s_tiny.power_budget.m_as(milliwatt), precision=0)
# Express the spans in scientific notation, e.g. 2e+05 -> "2×10^5"
mem_mant, mem_exp = f"{mem_scaling:.0e}".split("e")
mem_span_str = f"{mem_mant}×10^{int(mem_exp)}"
comp_mant, comp_exp = f"{compute_scaling:.0e}".split("e")
compute_span_str = f"{comp_mant}×10^{int(comp_exp)}"
# ┌── EXPORTS (Bridge to Text) ─────────────────────────────────────────────────
cloud_mem_str = DeploymentSystems.cloud_mem_str
cloud_compute_str = DeploymentSystems.cloud_compute_str
cloud_power_str = DeploymentSystems.cloud_power_str
edge_mem_str = DeploymentSystems.edge_mem_str
edge_compute_str = DeploymentSystems.edge_compute_str
edge_power_str = DeploymentSystems.edge_power_str
mobile_mem_str = DeploymentSystems.mobile_mem_str
mobile_compute_str = DeploymentSystems.mobile_compute_str
mobile_power_str = DeploymentSystems.mobile_power_str
tiny_mem_str = DeploymentSystems.tiny_mem_str
tiny_compute_str = DeploymentSystems.tiny_compute_str
tiny_power_str = DeploymentSystems.tiny_power_str
mem_span_str = DeploymentSystems.mem_span_str
compute_span_str = DeploymentSystems.compute_span_str
```
::: {.callout-perspective #perspective-deployment-archetypes title="The ML Systems Landscape: Four Archetypes"}
The machine learning systems landscape spans nine orders of magnitude in computational power and memory capacity. We categorize this continuum into four **System Archetypes** that define the constraints for every subsequent chapter:
| **Archetype** | **Example System** | **RAM / Memory** | **Peak Compute** | **Power Budget** |
|:---|:---|:---|:---|:---|
| **Cloud** | H100 Cluster | `{python} cloud_mem_str` GB | `{python} cloud_compute_str` TFLOPS | `{python} cloud_power_str` W |
| **Edge** | Jetson Robotics | `{python} edge_mem_str` GB | `{python} edge_compute_str` TFLOPS | `{python} edge_power_str` W |
| **Mobile** | Smartphone | `{python} mobile_mem_str` GB | `{python} mobile_compute_str` TFLOPS | `{python} mobile_power_str` W |
| **TinyML** | ESP32-S3 | `{python} tiny_mem_str` KiB | `{python} tiny_compute_str` TFLOPS | `{python} tiny_power_str` mW |
**The Scaling Gap**: The gap between the Cloud and TinyML archetypes is roughly `{python} mem_span_str` in memory and `{python} compute_span_str` in compute power. This divergence is precisely why we cannot simply "shrink" a cloud model to run at the edge; each tier requires a fundamental redesign of the D·A·M axes.
:::
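The spans quoted in the callout can be reproduced with back-of-the-envelope arithmetic. The specs below are illustrative assumptions, not the `mlsys` registry values:

```python
import math

# Illustrative specs (assumptions, not the mlsys registry values)
CLOUD_RAM_BYTES = 80e9    # H100: 80 GB HBM3
TINY_RAM_BYTES = 512e3    # ESP32-class MCU: 512 KB SRAM
CLOUD_FLOPS = 1.0e15      # ~1000 TFLOPS (FP16, dense)
TINY_FLOPS = 1.0e9        # ~1 GFLOPS

mem_span = CLOUD_RAM_BYTES / TINY_RAM_BYTES   # ~1.6e5
compute_span = CLOUD_FLOPS / TINY_FLOPS       # 1e6

print(f"memory span:  ~10^{round(math.log10(mem_span))}")      # memory span:  ~10^5
print(f"compute span: ~10^{round(math.log10(compute_span))}")  # compute span: ~10^6
```

Any reasonable choice of representative devices lands in the same neighborhood, which is the point: the gap is orders of magnitude, not a constant factor.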
Where should an ML model actually run? The answer is not "wherever is most convenient." Physical laws dictate what is possible.
The speed of light makes distant cloud servers useless for emergency braking. Thermodynamics prevents datacenter-class models from running in your pocket. Memory physics creates bandwidth ceilings that faster chips cannot overcome. @sec-ml-systems introduces the four deployment paradigms (Cloud, Edge, Mobile, and TinyML) that span nine orders of magnitude in power and memory, explaining why each exists and how to choose among them.
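The speed-of-light claim is easy to check with arithmetic; the distance, fiber speed, and braking budget below are illustrative assumptions:

```python
# Round-trip propagation time to a remote datacenter vs. a braking deadline.
# All figures are illustrative assumptions.
FIBER_SPEED_M_S = 2.0e8    # light in fiber travels at roughly 2/3 of c
DISTANCE_M = 1_500e3       # 1,500 km to the nearest cloud region
BRAKING_BUDGET_S = 0.010   # ~10 ms end-to-end control deadline

rtt_s = 2 * DISTANCE_M / FIBER_SPEED_M_S
print(f"propagation RTT: {rtt_s * 1e3:.0f} ms")  # propagation RTT: 15 ms
print(rtt_s > BRAKING_BUDGET_S)                  # True: physics alone blows the budget
```

Note that this counts only propagation; queuing, serialization, and inference time make the real gap worse.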
Welcome to AI Engineering.

View File

@@ -592,7 +592,7 @@ To make these architectural differences concrete, consider *how* a single model
# │ Imports: mlsys, mlsys.constants, mlsys.formatting
# │ Exports: cloud_*, mobile_*, tiny_* formatted strings
# └─────────────────────────────────────────────────────────────────────────────
from mlsys import Models, Tiers
from mlsys import Models, Systems, Archetypes
from mlsys.constants import BYTES_FP16, BYTES_INT8
from mlsys.formatting import fmt, check
@@ -607,24 +607,24 @@ class ResNetServingSpectrum:
m_resnet = Models.ResNet50
m_mobilenet = Models.MobileNetV2
t_cloud = Tiers.Cloud
t_mobile = Tiers.Mobile
t_tiny = Tiers.Tiny
s_cloud = Archetypes.Cloud_V100
s_mobile = Systems.Mobile
s_tiny = Archetypes.TinyML_M7
# Cloud (V100)
# Cloud (V100) Performance - Source: MLPerf/Vendor reports
cloud_inf_b1_ms = 1.4
cloud_inf_b16_ms = 14.0
cloud_throughput = 1143
cloud_vram_gb = 2
# Mobile (Pixel 6)
# Mobile (Smartphone) Performance
mobile_inf_npu_ms = 12.0
mobile_inf_cpu_ms = 45.0
mobile_throughput = 80
mobile_energy_npu_mj = 0.8
mobile_energy_cpu_mj = 4.2
# TinyML (Cortex-M7)
# TinyML (Cortex-M7) Performance
tiny_inf_ms = 120.0
tiny_energy_mj = 12.0
@@ -635,7 +635,8 @@ class ResNetServingSpectrum:
tiny_original_mb = m_resnet.size_in_bytes(BYTES_INT8).m_as('MB')
tiny_alt_mb = m_mobilenet.size_in_bytes(BYTES_INT8).m_as('MB')
tiny_limit_mb = t_tiny.storage.m_as('MB')
# TinyML feasibility check
tiny_limit_mb = s_tiny.ram.m_as('MB')
tiny_feasibility = tiny_original_mb < tiny_limit_mb
# ┌── 3. GUARD (Invariants) ───────────────────────────────────────────
@@ -645,6 +646,11 @@ class ResNetServingSpectrum:
"NPU should be significantly more energy efficient than CPU.")
# ┌── 4. OUTPUT (Formatting) ──────────────────────────────────────────────
# System names
cloud_name = s_cloud.name
mobile_name = s_mobile.name
tiny_name = s_tiny.name
cloud_model_mb_str = fmt(cloud_size_mb, precision=0)
cloud_inf_b1_ms_str = f"{cloud_inf_b1_ms}"
cloud_inf_b16_ms_str = f"{cloud_inf_b16_ms}"
@@ -659,6 +665,19 @@ class ResNetServingSpectrum:
mobile_energy_cpu_mj_str = f"{mobile_energy_cpu_mj}"
mobile_mem_mb_str = "150"
tiny_model_mb_str = fmt(tiny_original_mb, precision=0)
tiny_alt_mb_str = fmt(tiny_alt_mb, precision=1)
tiny_inf_ms_str = f"{tiny_inf_ms}"
tiny_throughput_str = "8"
tiny_arena_kb_str = "320"
tiny_sram_kb_str = fmt(s_tiny.ram.m_as('KiB'), precision=0)
tiny_energy_mj_str = f"{tiny_energy_mj}"
# ┌── EXPORTS (Bridge to Text) ─────────────────────────────────────────────────
cloud_name = ResNetServingSpectrum.cloud_name
mobile_name = ResNetServingSpectrum.mobile_name
tiny_name = ResNetServingSpectrum.tiny_name
tiny_model_mb_str = ResNetServingSpectrum.tiny_model_mb_str
tiny_alt_mb_str = ResNetServingSpectrum.tiny_alt_mb_str
tiny_inf_ms_str = ResNetServingSpectrum.tiny_inf_ms_str
@@ -693,14 +712,14 @@ tiny_energy_mj_str = ResNetServingSpectrum.tiny_energy_mj_str
The same ResNet-50 architecture requires dramatically different serving strategies across deployment contexts:
**Cloud (V100 GPU):**
**`{python} cloud_name`:**
- Model format: TensorRT FP16 engine (`{python} cloud_model_mb_str`MB)
- Inference: `{python} cloud_inf_b1_ms_str`ms at batch-1, `{python} cloud_inf_b16_ms_str`ms at batch-16
- Throughput: `{python} cloud_throughput_str` images/second (batched)
- Memory: `{python} cloud_vram_gb_str`GB VRAM (model + activations for batch-32)
**Mobile (Pixel 6 NPU):**
**`{python} mobile_name`:**
- Model format: TensorFlow Lite INT8 (`{python} mobile_model_mb_str`MB)
- Inference: `{python} mobile_inf_npu_ms_str`ms at batch-1 (NPU), `{python} mobile_inf_cpu_ms_str`ms (CPU fallback)
@@ -708,7 +727,7 @@ The same ResNet-50 architecture requires dramatically different serving strategi
- Memory: `{python} mobile_mem_mb_str`MB peak (shared with app)
- Energy: `{python} mobile_energy_npu_mj_str`mJ per inference (NPU), `{python} mobile_energy_cpu_mj_str`mJ (CPU)
**TinyML (Cortex-M7):**
**`{python} tiny_name`:**
- Model format: Not feasible; ResNet-50 requires `{python} tiny_model_mb_str`MB weights
- Alternative: MobileNetV2-0.35 quantized to INT8 (`{python} tiny_alt_mb_str`MB)
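The feasibility gap in the list above can be sketched numerically. Parameter counts and the RAM figure are approximate assumptions, not the `mlsys` registry values:

```python
# INT8 weight footprint vs. MCU memory budget (approximate, for illustration)
RESNET50_PARAMS = 25.6e6          # ~25.6 M parameters
MOBILENET_V2_035_PARAMS = 1.7e6   # MobileNetV2, width multiplier 0.35 (approx.)
MCU_RAM_BYTES = 512e3             # Cortex-M7-class SRAM budget

def int8_size_mb(params: float) -> float:
    """One byte per weight under INT8 quantization."""
    return params / 1e6

print(f"ResNet-50:        {int8_size_mb(RESNET50_PARAMS):.1f} MB")          # 25.6 MB
print(f"MobileNetV2-0.35: {int8_size_mb(MOBILENET_V2_035_PARAMS):.1f} MB")  # 1.7 MB
print(RESNET50_PARAMS < MCU_RAM_BYTES)  # False: ResNet-50 cannot fit
```

Quantization buys a 4x reduction over FP32, but the archetype is roughly 50x short of even the quantized footprint, which is why the feasible path is a different architecture, not a smaller copy of the same one.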

View File

@@ -4,6 +4,7 @@
from .hardware import Hardware
from .models import Models
from .deployment import Tiers
from .systems import Systems, Archetypes
# Export constants and registry for legacy support
from .constants import ureg, Q_

View File

@@ -0,0 +1,103 @@
# systems.py
# System Archetypes for MLSys Textbook
# Ties Hardware, Tier, and Environment into a single "Environment Context".
from dataclasses import dataclass
from .hardware import HardwareSpec, Hardware
from .deployment import DeploymentTier, Tiers
from .constants import ureg, Q_
@dataclass(frozen=True)
class SystemArchetype:
name: str
hardware: HardwareSpec
tier: DeploymentTier
network_bw: Q_
power_budget: Q_
@property
def ram(self):
return self.hardware.memory_capacity
@property
def peak_flops(self):
return self.hardware.peak_flops
@property
def memory_bw(self):
return self.hardware.memory_bw
class Archetypes:
# --- CLOUD LAYER ---
Cloud_H100 = SystemArchetype(
name="Cloud (H100 Node)",
hardware=Hardware.H100,
tier=Tiers.Cloud,
network_bw=400 * ureg.Gbps, # NDR InfiniBand
power_budget=700 * ureg.watt
)
Cloud_A100 = SystemArchetype(
name="Cloud (A100 Node)",
hardware=Hardware.A100,
tier=Tiers.Cloud,
network_bw=200 * ureg.Gbps, # HDR InfiniBand
power_budget=400 * ureg.watt
)
Cloud_V100 = SystemArchetype(
name="Cloud (V100 Node)",
hardware=Hardware.V100,
tier=Tiers.Cloud,
network_bw=100 * ureg.Gbps, # EDR InfiniBand
power_budget=300 * ureg.watt
)
# --- EDGE LAYER ---
Edge_Server = SystemArchetype(
name="Edge Server",
hardware=Hardware.Edge.GenericServer,
tier=Tiers.Edge,
network_bw=10 * ureg.Gbps,
power_budget=300 * ureg.watt
)
Edge_Robotics = SystemArchetype(
name="Edge (Jetson Orin)",
hardware=Hardware.Edge.JetsonOrinNX,
tier=Tiers.Edge,
network_bw=1 * ureg.Gbps,
power_budget=25 * ureg.watt
)
# --- MOBILE LAYER ---
Mobile_Phone = SystemArchetype(
name="Mobile (Smartphone)",
hardware=Hardware.Edge.Generic_Phone,
tier=Tiers.Mobile,
network_bw=100 * ureg.Mbps,
power_budget=5 * ureg.watt
)
# --- TINYML LAYER ---
TinyML_MCU = SystemArchetype(
name="TinyML (ESP32)",
hardware=Hardware.Tiny.ESP32,
tier=Tiers.Tiny,
network_bw=1 * ureg.Mbps,
power_budget=0.5 * ureg.watt
)
TinyML_M7 = SystemArchetype(
name="TinyML (Cortex-M7)",
hardware=Hardware.Tiny.Generic_MCU,
tier=Tiers.Tiny,
network_bw=1 * ureg.Mbps,
power_budget=0.1 * ureg.watt
)
class Systems:
Cloud = Archetypes.Cloud_H100
Edge = Archetypes.Edge_Robotics
Mobile = Archetypes.Mobile_Phone
Tiny = Archetypes.TinyML_MCU
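For readers skimming the module, the archetype pattern reduces to a frozen dataclass that composes a hardware spec with system-level context and exposes derived properties. A dependency-free sketch (plain floats in place of `pint` quantities; names and figures here are illustrative, not the `mlsys` API):

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class HardwareSpec:
    memory_capacity_gb: float
    peak_tflops: float

@dataclass(frozen=True)
class SystemArchetype:
    name: str
    hardware: HardwareSpec
    power_budget_w: float

    @property
    def ram_gb(self) -> float:
        # Derived view: the archetype delegates to its hardware spec
        return self.hardware.memory_capacity_gb

cloud = SystemArchetype("Cloud (H100 Node)", HardwareSpec(80.0, 989.0), 700.0)
print(cloud.ram_gb)  # 80.0
try:
    cloud.power_budget_w = 1000.0  # frozen=True makes archetypes immutable
except FrozenInstanceError:
    print("immutable")  # immutable
```

Freezing the dataclass is what lets chapters treat archetypes as shared constants: no downstream code block can mutate a spec that later blocks depend on.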

View File

@@ -39,6 +39,7 @@ scripts/
- `content/manage_section_ids.py` - Manage `@sec-` cross-reference IDs
### Validation
- `check_references_hallucinator.py` - Validate .bib entries against academic DBs ([hallucinator](https://github.com/gianlucasb/hallucinator)); requires `pip install hallucinator bibtexparser`
- `content/check_duplicate_labels.py` - Find duplicate labels
- `content/check_fig_references.py` - Validate figure references
- `content/check_unreferenced_labels.py` - Find unused labels

View File

@@ -0,0 +1,366 @@
#!/usr/bin/env python3
"""
Validate book bibliography entries with hallucinator.
Uses the hallucinator library (https://github.com/gianlucasb/hallucinator) to check
references from the project's .bib files against academic databases (CrossRef, arXiv,
DBLP, Semantic Scholar, etc.). Helps detect typos, wrong DOIs, or fabricated refs.
Requirements (install separately):
pip install hallucinator bibtexparser
Usage:
# From repo root
python3 book/tools/scripts/check_references_hallucinator.py
# Specific .bib files
python3 book/tools/scripts/check_references_hallucinator.py \\
book/quarto/contents/vol1/backmatter/references.bib
# Save report
python3 book/tools/scripts/check_references_hallucinator.py --output report.txt
Optional API keys (env vars) for better coverage and fewer rate limits:
OPENALEX_KEY - OpenAlex (openalex.org); free at https://openalex.org/settings/api
S2_API_KEY - Semantic Scholar; free, request at https://www.semanticscholar.org/product/api
Without keys, those DBs still work but with stricter rate limits.
"""
from __future__ import annotations
import argparse
import json
import os
import re
import subprocess
import sys
import unicodedata
from pathlib import Path
from types import SimpleNamespace
try:
import bibtexparser
except ImportError:
print("Missing dependency: pip install bibtexparser", file=sys.stderr)
sys.exit(1)
try:
from hallucinator import Reference, Validator, ValidatorConfig
except ImportError:
print("Missing dependency: pip install hallucinator", file=sys.stderr)
sys.exit(1)
# Default .bib files relative to repo root (from project root)
DEFAULT_BIB_PATHS = [
"book/quarto/contents/vol1/backmatter/references.bib",
"book/quarto/contents/vol2/backmatter/references.bib",
]
MIN_TITLE_WORDS = 4 # Skip very short titles (likely false matches)
def _to_ascii(s: str) -> str:
"""Replace non-ASCII chars with ASCII equivalents so hallucinator's Rust code doesn't panic on Unicode."""
if not s:
return s
n = unicodedata.normalize("NFKD", s)
return n.encode("ascii", "ignore").decode("ascii")
def _normalize_title(raw: str) -> str:
"""Strip braces and collapse whitespace."""
if not raw:
return ""
t = re.sub(r"[\{\}]", "", raw)
t = re.sub(r"\s+", " ", t).strip()
return t
def _parse_authors(author_field: str) -> list[str]:
"""Parse BibTeX author string into list of family names (or full name if no comma)."""
if not author_field or not author_field.strip():
return []
authors = []
for part in re.split(r"\s+and\s+", author_field, flags=re.IGNORECASE):
part = part.strip()
if not part:
continue
# BibTeX often "Last, First" — use Last for matching
if "," in part:
family = part.split(",", 1)[0].strip()
else:
family = part
# Drop LaTeX accents/braces for matching
family = re.sub(r"\\[a-z]+\{([^}]*)\}", r"\1", family)
family = re.sub(r"[{}\\]", "", family).strip()
# Hallucinator's Rust code panics on non-ASCII; normalize to ASCII
family = _to_ascii(family)
if family:
authors.append(family)
return authors[:15] # hallucinator caps at 15
def _extract_arxiv_id(entry: dict) -> str | None:
"""Get arXiv id from eprint + archiveprefix or from url."""
ap = (entry.get("archiveprefix") or "").strip().lower()
eprint = (entry.get("eprint") or "").strip()
if ap == "arxiv" and eprint:
return eprint
url = entry.get("url") or ""
m = re.search(r"arxiv\.org/abs/(\d+\.\d+v?\d*)", url, re.IGNORECASE)
if m:
return m.group(1)
return None
def bib_entries_to_references(bib_path: Path) -> list[tuple[str, Reference]]:
"""Load a .bib file and return [(citation_key, Reference), ...]."""
with open(bib_path, encoding="utf-8", errors="replace") as f:
bib_str = f.read()
parser = bibtexparser.bparser.BibTexParser(common_strings=True)
parser.ignore_nonstandard_types = False
db = bibtexparser.loads(bib_str, parser)
out = []
for entry in db.entries:
key = entry.get("ID", "")
title = _normalize_title(entry.get("title", ""))
if not title:
continue
if len(title.split()) < MIN_TITLE_WORDS:
continue
title = _to_ascii(title) # avoid Rust Unicode panic
authors = _parse_authors(entry.get("author", ""))
doi = (entry.get("doi") or "").strip() or None
arxiv_id = _extract_arxiv_id(entry)
ref = Reference(
title=title,
authors=authors,
doi=doi,
arxiv_id=arxiv_id,
)
out.append((key, ref))
return out
def dedupe_refs(items: list[tuple[str, Reference]]) -> list[tuple[str, Reference]]:
"""Deduplicate by (title, doi, arxiv_id), keeping first citation key."""
seen: set[tuple[str, str | None, str | None]] = set()
out = []
for key, ref in items:
sig = (ref.title, ref.doi, ref.arxiv_id)
if sig in seen:
continue
seen.add(sig)
out.append((key, ref))
return out
_CHILD_SCRIPT = r"""
import json, os, sys
from hallucinator import Reference, Validator, ValidatorConfig
ref_dict = json.loads(sys.argv[1])
ref = Reference(
ref_dict["title"],
authors=ref_dict.get("authors") or [],
doi=ref_dict.get("doi"),
arxiv_id=ref_dict.get("arxiv_id"),
)
config = ValidatorConfig()
if os.environ.get("OPENALEX_KEY"):
config.openalex_key = os.environ["OPENALEX_KEY"]
if os.environ.get("S2_API_KEY"):
config.s2_api_key = os.environ["S2_API_KEY"]
validator = Validator(config)
results = validator.check([ref])
r = results[0]
print(r.status, r.source or "", r.title, sep="\t")
"""
def _validate_resilient(refs: list) -> list:
"""Validate each ref in a subprocess; on crash, record as error and continue."""
results = []
for i, ref in enumerate(refs):
payload = {
"title": ref.title,
"authors": ref.authors,
"doi": ref.doi,
"arxiv_id": ref.arxiv_id,
}
try:
proc = subprocess.run(
[sys.executable, "-c", _CHILD_SCRIPT, json.dumps(payload)],
capture_output=True,
text=True,
timeout=90,
env=os.environ,
)
except subprocess.TimeoutExpired:
results.append(SimpleNamespace(status="error", title=ref.title, source="timeout"))
icon = "!"
print(f" [{i+1}/{len(refs)}] {icon} error (timeout): {ref.title[:60]}...")
continue
if proc.returncode != 0 or not proc.stdout.strip():
results.append(SimpleNamespace(status="error", title=ref.title, source="validator crash"))
icon = "!"
print(f" [{i+1}/{len(refs)}] {icon} error (validator crash): {ref.title[:60]}...")
continue
parts = proc.stdout.strip().split("\t", 2)
status = parts[0] if parts else "error"
source = (parts[1] or None) if len(parts) > 1 else None
title_out = parts[2] if len(parts) > 2 else ref.title
results.append(SimpleNamespace(status=status, title=title_out, source=source))
icon = {"verified": "+", "not_found": "?", "author_mismatch": "~"}.get(status, " ")
src = f" ({source})" if source else ""
print(f" [{i+1}/{len(refs)}] {icon} {status}: {title_out[:60]}{src}")
return results
def main() -> int:
parser = argparse.ArgumentParser(
description="Validate .bib references with hallucinator (academic DBs)."
)
parser.add_argument(
"bib_files",
nargs="*",
help="Paths to .bib files (default: vol1 and vol2 references.bib)",
)
parser.add_argument(
"--output",
"-o",
metavar="FILE",
help="Write report to FILE",
)
parser.add_argument(
"--no-dedupe",
action="store_true",
help="Do not deduplicate references across .bib files",
)
parser.add_argument(
"--root",
default=".",
metavar="DIR",
help="Project root (default: current directory)",
)
parser.add_argument(
"--limit",
type=int,
metavar="N",
help="Validate only first N references (for quick test)",
)
parser.add_argument(
"--no-resilient",
action="store_true",
help="Use batch validation (faster but may crash on some DB responses with Unicode)",
)
args = parser.parse_args()
root = Path(args.root).resolve()
if args.bib_files:
bib_paths = [root / p for p in args.bib_files]
else:
bib_paths = [root / p for p in DEFAULT_BIB_PATHS]
missing = [p for p in bib_paths if not p.exists()]
if missing:
for p in missing:
print(f"Not found: {p}", file=sys.stderr)
return 1
# Collect (key, Reference) from all files
all_refs: list[tuple[str, Reference]] = []
for p in bib_paths:
all_refs.extend(bib_entries_to_references(p))
if not all_refs:
print("No references to validate.", file=sys.stderr)
return 0
if not args.no_dedupe:
all_refs = dedupe_refs(all_refs)
if args.limit is not None:
all_refs = all_refs[: args.limit]
refs = [r for _, r in all_refs]
keys = [k for k, _ in all_refs]
n = len(refs)
print(f"Validating {n} references against academic databases...")
if os.environ.get("OPENALEX_KEY") or os.environ.get("S2_API_KEY"):
print("(Using OPENALEX_KEY / S2_API_KEY for better coverage)\n")
else:
print("(Optional: OPENALEX_KEY, S2_API_KEY for better coverage)\n")
if not args.no_resilient:
results = _validate_resilient(refs)
else:
config = ValidatorConfig()
if os.environ.get("OPENALEX_KEY"):
config.openalex_key = os.environ["OPENALEX_KEY"]
if os.environ.get("S2_API_KEY"):
config.s2_api_key = os.environ["S2_API_KEY"]
validator = Validator(config)
def progress(event):
if event.event_type == "result":
r = event.result
idx = event.index + 1
icon = {"verified": "+", "not_found": "?", "author_mismatch": "~"}.get(
r.status, " "
)
src = f" ({r.source})" if r.source else ""
print(f" [{idx}/{event.total}] {icon} {r.status}: {r.title}{src}")
results = validator.check(refs, progress=progress)
# Summary
verified = sum(1 for r in results if r.status == "verified")
not_found = sum(1 for r in results if r.status == "not_found")
mismatch = sum(1 for r in results if r.status == "author_mismatch")
errors = sum(1 for r in results if r.status == "error")
lines = [
"",
"Summary",
"-------",
f" Verified: {verified}",
f" Not found: {not_found}",
f" Author mismatch: {mismatch}",
f" Total: {n}",
]
if errors:
lines.insert(-1, f" Error (skipped): {errors}")
for line in lines:
print(line)
# Optional report file
if args.output:
report_path = Path(args.output)
with open(report_path, "w", encoding="utf-8") as f:
f.write("Hallucinator reference check report\n")
f.write("====================================\n\n")
f.write(f"Sources: {[str(p) for p in bib_paths]}\n")
f.write("\n".join(lines) + "\n\n")
f.write("Not found (potential typos or non-indexed):\n")
for key, r in zip(keys, results):
if r.status == "not_found":
f.write(f" [{key}] {r.title}\n")
f.write("\nAuthor mismatch:\n")
for key, r in zip(keys, results):
if r.status == "author_mismatch":
f.write(f" [{key}] {r.title}\n")
errors_list = [(k, r) for k, r in zip(keys, results) if r.status == "error"]
if errors_list:
f.write("\nError (validator crash or timeout):\n")
for key, r in errors_list:
f.write(f" [{key}] {r.title}\n")
print(f"\nReport written to {report_path}")
return 0 if not_found == 0 and mismatch == 0 and errors == 0 else 1
if __name__ == "__main__":
sys.exit(main())

View File

@@ -0,0 +1,495 @@
Validating 1095 references against academic databases...
(Optional: OPENALEX_KEY, S2_API_KEY for better coverage)
[1/1095] + verified: Driving in the Matrix: Can virtual worlds replace human-gene (DOI)
[2/1095] + verified: TensorFlow: Large-Scale Machine Learning on Heterogeneous Di (arXiv)
[3/1095] ? not_found: Tiny ML: The Next Big Opportunity in Tech
[4/1095] + verified: Machine Learning and the Cancer-Diagnosis Problem -- No Gold (CrossRef)
[5/1095] ? not_found: Manifesto for Agile Software Development
[6/1095] + verified: Taming Throughput-Latency Tradeoff in LLM Inference with Sar (arXiv)
[7/1095] + verified: Theano: A Python framework for fast computation of mathemati (arXiv)
[8/1095] + verified: ImageNet Classification with Deep Convolutional Neural Netwo (CrossRef)
[9/1095] + verified: Validity of the single processor approach to achieving large (CrossRef)
[10/1095] + verified: Software Engineering for Machine Learning: A Case Study (DOI)
[11/1095] + verified: On Global Electricity Usage of Communication Technology: Tre (DOI)
[12/1095] + verified: Queries and Concept Learning (DOI)
[13/1095] + verified: ANNETTE: Accurate Neural Network Execution Time Estimation W (DOI)
[14/1095] + verified: PyTorch 2: Faster Machine Learning Through Dynamic Python By (DOI)
[15/1095] + verified: Automating Server Deployments with Ansible: Utilizing Automa (DBLP)
[16/1095] ? not_found: Synthetic Data for Artificial Intelligence
[17/1095] + verified: Common Voice: A Massively-Multilingual Speech Corpus (arXiv)
[18/1095] ? not_found: BFloat16 floating-point widening multiply-add long
[19/1095] ? not_found: Arm Cortex-M55 Processor Technical Reference Manual
[20/1095] + verified: Lakehouse: A New Generation of Open Platforms that Unify Dat (DBLP)
[21/1095] + verified: Noninvasive assessment of dofetilide plasma concentration us (DOI)
[22/1095] ~ author_mismatch: Amazon Web Services (AWS) (CrossRef)
[23/1095] ~ author_mismatch: Amazon Simple Storage Service (S3) (CrossRef)
[24/1095] + verified: Put Deep Learning to Work (DOI)
[25/1095] + verified: Do Deep Nets Really Need to be Deep? (DBLP)
[26/1095] + verified: Neural Machine Translation by Jointly Learning to Align and (DBLP)
[27/1095] ? not_found: ONNX: Open Neural Network Exchange
[28/1095] + verified: Benchmarking TinyML Systems: Challenges and Direction (arXiv)
[29/1095] + verified: MicroNets: Neural network architectures for deploying TinyML (arXiv)
[30/1095] + verified: Wake Vision: A Tailored Dataset and Benchmark Suite for Tiny (arXiv)
[31/1095] + verified: Big Data's Disparate Impact (DOI)
[32/1095] + verified: The Case for Energy-Proportional Computing (DOI)
[33/1095] + verified: Attack of the killer microseconds (DOI)
[34/1095] + verified: The Datacenter as a Computer (DOI)
[35/1095] + verified: Automatic Differentiation in Machine Learning: A Survey (arXiv)
[36/1095] ? not_found: Understanding the Python GIL
[37/1095] + verified: Reconciling modern machine-learning practice and the classic (DOI)
[38/1095] + verified: AI Fairness 360: An extensible toolkit for detecting and mit (DOI)
[39/1095] + verified: Long Short-Term Memory and Learning-to-Learn in Networks of (arXiv)
[40/1095] + verified: Demystifying Parallel and Distributed Deep Learning (DOI)
[41/1095] + verified: The Resource-as-a-Service (RaaS) Cloud (DBLP)
[42/1095] + verified: Data Statements for Natural Language Processing: Toward Miti (CrossRef)
[43/1095] + verified: On the Dangers of Stochastic Parrots (CrossRef)
[44/1095] + verified: Estimating or Propagating Gradients Through Stochastic Neuro (arXiv)
[45/1095] + verified: Representation Learning: A Review and New Perspectives (DOI)
[46/1095] + verified: Conditional Computation in Neural Networks for faster models (DBLP)
[47/1095] + verified: Theano: A CPU and GPU Math Compiler in Python (DOI)
[48/1095] + verified: Algorithms for Hyper-Parameter Optimization (DBLP)
[49/1095] + verified: Site Reliability Engineering: How Google Runs Production Sys (Semantic Scholar)
[50/1095] ! error (validator crash): Are we done with ImageNet?...
[51/1095] ? not_found: Fairlearn: A toolkit for assessing and improving fairness in
[52/1095] ~ author_mismatch: Pattern Recognition and Machine Learning (CrossRef)
[53/1095] + verified: Evolution of thread-level parallelism in desktop application (DOI)
[54/1095] ? not_found: What is the State of Neural Network Pruning?
[55/1095] + verified: Basic Linear Algebra Subprograms for Fortran Usage (CrossRef)
[56/1095] + verified: Natural Language Input for a Computer Problem Solving System (Semantic Scholar)
[57/1095] + verified: On the Opportunities and Risks of Foundation Models (arXiv)
[58/1095] + verified: Google Workloads for Consumer Devices (DOI)
[59/1095] + verified: Data Validation for Machine Learning. (Semantic Scholar)
[60/1095] + verified: The ML Test Score: A Rubric for ML Production Readiness and (DOI)
[61/1095] + verified: Towards robust distributed systems (abstract) (CrossRef)
[62/1095] + verified: Probabilistic Interpretation of Feedforward Classification N (CrossRef)
[63/1095] + verified: On the resemblance and containment of documents (DOI)
[64/1095] + verified: Language Models are Few-Shot Learners (DOI)
[65/1095] ? not_found: Speed Matters for Google Web Search
[66/1095] + verified: The second machine age: work, progress, and prosperity in a (DBLP)
[67/1095] + verified: Gender Shades: Intersectional Accuracy Disparities in Commer (DBLP)
[68/1095] + verified: Caffe: Convolutional Architecture for Fast Feature Embedding (arXiv)
[69/1095] + verified: ProxylessNAS: Direct Neural Architecture Search on Target Ta (arXiv)
[70/1095] + verified: Once-for-All: Train One Network and Specialize it for Effici (arXiv)
[71/1095] ? not_found: Cerebras Systems: Wafer-Scale AI Accelerators
[72/1095] ? not_found: Wafer-Scale Deep Learning Acceleration with the Cerebras CS-
[73/1095] ? not_found: The Wafer-Scale Engine 2: Scaling AI Compute Beyond GPUs
[74/1095] + verified: Semi-Supervised Learning (Chapelle, O. et al., Eds.; 2006) [ (DOI)
[75/1095] ? not_found: CRISP-DM 1.0: Step-by-step data mining guide
[76/1095] + verified: MXNet: A Flexible and Efficient Machine Learning Library for (arXiv)
[77/1095] + verified: Using Dataflow to Optimize Energy Efficiency of Deep Neural (DOI)
[78/1095] + verified: Eyeriss: A Spatial Architecture for Energy-Efficient Dataflo (CrossRef)
[79/1095] + verified: Training Deep Nets with Sublinear Memory Cost (arXiv)
[80/1095] + verified: Machine Learning and Prediction in Medicine -- Beyond the Pe (DOI)
[81/1095] + verified: TVM: An Automated End-to-End Optimizing Compiler for Deep Le (arXiv)
[82/1095] + verified: Improved Baselines with Momentum Contrastive Learning (arXiv)
[83/1095] + verified: Reinforcement Learning for Combinatorial Optimization: A Sur (CrossRef)
[84/1095] + verified: A Simple Framework for Contrastive Learning of Visual Repres (arXiv)
[85/1095] ? not_found: Evaluating Large Language Models Trained on Code
[86/1095] + verified: A framework for integrating artificial intelligence for clin (DOI)
[87/1095] + verified: EE-LLM: Large-Scale Training and Inference of Early-Exit Lar (arXiv)
[88/1095] + verified: cuDNN: Efficient Primitives for Deep Learning (arXiv)
[89/1095] + verified: On the properties of neural machine translation: Encoder-dec (arXiv)
[90/1095] + verified: PACT: Parameterized Clipping Activation for Quantized Neural (arXiv)
[91/1095] ? not_found: Data Echoing for Efficient Training
[92/1095] ~ author_mismatch: Deep Learning with Python (CrossRef)
[93/1095] + verified: A comprehensive survey on model compression and acceleration (DOI)
[94/1095] + verified: Low-bit Quantization of Neural Networks for Efficient Infere (CrossRef)
[95/1095] + verified: Fair Prediction with Disparate Impact: A Study of Bias in Re (DOI)
[96/1095] + verified: PaLM: Scaling Language Modeling with Pathways (arXiv)
[97/1095] + verified: Discovering Multi-Hardware Mobile Models via Architecture Se (DOI)
[98/1095] ? not_found: Learning Multiple Layers of Features from Tiny Images
[99/1095] ? not_found: CircleCI: Continuous Integration and Delivery Platform
[100/1095] + verified: What Does BERT Look at? An Analysis of BERT's Attention (DOI)
[101/1095] + verified: CNN Explainer: Learning Convolutional Neural Networks with I (DOI)
[102/1095] ? not_found: Microsoft Cognitive Toolkit (CNTK)
[103/1095] + verified: Group Equivariant Convolutional Networks (arXiv)
[104/1095] + verified: Analysis of DAWNBench, a Time-to-Accuracy Machine Learning P (DOI)
[105/1095] + verified: Similarity Search for Efficient Active Learning and Search o (DOI)
[106/1095] + verified: Repeatability in computer systems research (DOI)
[107/1095] + verified: BinaryConnect: Training Deep Neural Networks with Binary Wei (arXiv)
[108/1095] + verified: Elements of Information Theory (DOI)
[109/1095] + verified: Clipper: A Low-Latency Online Prediction Serving System (arXiv)
[110/1095] ? not_found: Data Science Report 2016
[111/1095] + verified: AutoAugment: Learning Augmentation Strategies From Data (DOI)
[112/1095] + verified: Randaugment: Practical automated data augmentation with a re (DOI)
[113/1095] + verified: The Deep Learning Compiler: A Comprehensive Survey (CrossRef)
[114/1095] + verified: Approximation by superpositions of a sigmoidal function (DOI)
[115/1095] + verified: Histograms of Oriented Gradients for Human Detection (DOI)
[116/1095] + verified: Hardware for Deep Learning (DOI)
[117/1095] + verified: Evolution of the Graphics Processing Unit (GPU) (DOI)
[118/1095] + verified: FlashAttention: Fast and Memory-Efficient Exact Attention wi (arXiv)
[119/1095] + verified: Monarch: Expressive Structured Matrices for Efficient and Ac (arXiv)
[120/1095] + verified: FlashAttention-2: Faster Attention with Better Parallelism a (arXiv)
[121/1095] + verified: Amazon scraps secret AI recruiting tool that showed bias aga (DOI)
[122/1095] + verified: Tensorflow lite micro: Embedded machine learning for tinyml (arXiv)
[123/1095] + verified: Advancing Neuromorphic Computing with Loihi: A Survey of Res (DOI)
[124/1095] ? not_found: dbt (data build tool)
[125/1095] + verified: Large Scale Distributed Deep Networks. (DBLP)
[126/1095] ? not_found: Achieving Rapid Response Times in Large Online Services
[127/1095] + verified: The tail at scale (CrossRef)
[128/1095] + verified: A New Golden Age in Computer Architecture: Empowering the Ma (DOI)
[129/1095] ? not_found: DeepBench: Benchmarking Deep Learning Operations on Differen
[130/1095] ? not_found: GPipe: Efficient Training of Giant Neural Networks using Pip
[131/1095] ? not_found: DeepSpeed: Extreme-scale Model Training for Everyone
[132/1095] ? not_found: Data Mesh: Delivering Data-Driven Value at Scale
[133/1095] + verified: ImageNet: A large-scale hierarchical image database (DOI)
[134/1095] + verified: The MNIST Database of Handwritten Digit Images for Machine L (CrossRef)
[135/1095] + verified: Design of ion-implanted MOSFET's with very small physical di (DOI)
[136/1095] + verified: Exploiting Linear Structure Within Convolutional Networks fo (DBLP)
[137/1095] + verified: Sparse Networks from Scratch: Faster Training without Losing (arXiv)
[138/1095] + verified: LLM.int8(): 8-bit Matrix Multiplication for Transformers at (arXiv)
[139/1095] + verified: BERT: Pre-training of Deep Bidirectional Transformers for La (arXiv)
[140/1095] ? not_found: Why Discord is switching from Go to Rust
[141/1095] ? not_found: Docker: Lightweight Linux Containers for Consistent Developm
[142/1095] + verified: A few useful things to know about machine learning (DOI)
[143/1095] ~ author_mismatch: The master algorithm: how the quest for the ultimate learnin (DOI)
[144/1095] + verified: SplitNets: Designing Neural Architectures for Efficient Dist (CrossRef)
[145/1095] + verified: An extended set of FORTRAN basic linear algebra subprograms (DOI)
[146/1095] + verified: An image is worth 16x16 words: Transformers for image recogn (DBLP)
[147/1095] + verified: FastML Science Benchmarks: Accelerating Real-Time Scientific (DBLP)
[148/1095] ? not_found: Data Version Control (DVC)
[149/1095] + verified: Calibrating Noise to Sensitivity in Private Data Analysis (DBLP)
[150/1095] + verified: Differential Privacy: A Survey of Results (CrossRef)
[151/1095] + verified: A Dynamic Pruning Method on Multiple Sparse Structures in De (DOI)
[152/1095] ? not_found: Elasticsearch: Distributed Search and Analytics Engine
[153/1095] + verified: Finding Structure in Time (DBLP)
[154/1095] + verified: Compute Trends Across Three Eras of Machine Learning (arXiv)
[155/1095] + verified: Dark silicon and the end of multicore scaling (CrossRef)
[156/1095] + verified: Learned Step Size Quantization (arXiv)
[157/1095] + verified: The Pascal Visual Object Classes (VOC) Challenge (DOI)
[158/1095] + verified: hls4ml: An Open-Source Codesign Workflow to Empower Scientif (arXiv)
[159/1095] ? not_found: FarmBeats: AI, Edge and IoT for Agriculture
[160/1095] ? not_found: fast.ai: Making Neural Nets Uncool Again
[161/1095] + verified: From Data Mining to Knowledge Discovery in Databases. (DBLP)
[162/1095] ? not_found: Artificial Intelligence/Machine Learning (AI/ML)-Based Softw
[163/1095] + verified: Switch Transformers: Scaling to Trillion Parameter Models wi (DBLP)
[164/1095] + verified: Learning Generative Visual Models from Few Training Examples (DOI)
[165/1095] ? not_found: The Cerebras Wafer-Scale Engine: Opportunities and Challenge
[166/1095] + verified: Auto-sklearn: Efficient and Robust Automated Machine Learnin (DBLP)
[167/1095] ? not_found: Architectural Styles and the Design of Network-based Softwar
[168/1095] ? not_found: The 8087 Numeric Data Processor
[169/1095] ! error (validator crash): Flexpoint: An Adaptive Numerical Format for Efficient Traini...
[170/1095] + verified: Very high-speed computing systems (DOI)
[171/1095] + verified: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neu (arXiv)
[172/1095] ? not_found: Compiling Machine Learning Programs via High-Level Tracing
[173/1095] + verified: DataComp: In search of the next generation of multimodal dat (DBLP)
[174/1095] + verified: The State of Sparsity in Deep Neural Networks (DBLP)
[175/1095] + verified: On the Bauer-Furuta and Seiberg-Witten invariants of familie (CrossRef)
[176/1095] + verified: MegaBlocks: Efficient Sparse Training with Mixture-of-Expert (arXiv)
[177/1095] + verified: A survey on concept drift adaptation (DOI)
[178/1095] + verified: The Decision Maker's Handbook to Data Science (DOI)
[179/1095] ? not_found: Gartner Forecasts Worldwide Public Cloud End-User Spending t
[180/1095] + verified: The Q-List manifesto: How to get things right in generalist (DOI)
[181/1095] + verified: Neural Networks and the Bias/Variance Dilemma (CrossRef)
[182/1095] + verified: A Survey of Quantization Methods for Efficient Neural Networ (DBLP)
[183/1095] + verified: AI and Memory Wall (DOI)
[184/1095] + verified: Brewer's conjecture and the feasibility of consistent, avail (DOI)
[185/1095] + verified: GitLab2PROV - Provenance of Software Projects hosted on GitL (DBLP)
[186/1095] + verified: Understanding the difficulty of training deep feedforward ne (DBLP)
[187/1095] + verified: What every computer scientist should know about floating-poi (DBLP)
[188/1095] + verified: The Netflix Recommender System (DOI)
[189/1095] + verified: Problems of Monetary Management: The U.K. Experience (CrossRef)
[190/1095] ? not_found: BFloat16: The secret to high performance on Cloud TPUs
[191/1095] ? not_found: Google Cloud Platform Documentation
[192/1095] ? not_found: Crowdsource by Google: A Platform for Collecting Inclusive a
[193/1095] ? not_found: LiteRT (formerly TensorFlow Lite)
[194/1095] + verified: reCAPTCHA: Human-Based Character Recognition via Web Securit (DOI)
[195/1095] ~ author_mismatch: Gemini: A Family of Highly Capable Multimodal Models (DBLP)
[196/1095] ? not_found: Static vs. Dynamic Inference
[197/1095] ? not_found: XLA: Optimizing Compiler for Machine Learning
[198/1095] + verified: MorphNet: Fast & Simple Resource-Constrained Structure (DOI)
[199/1095] + verified: Compressing BERT: Studying the Effects of Weight Pruning on (CrossRef)
[200/1095] + verified: Knowledge Distillation: A Survey (DOI)
[201/1095] + verified: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (arXiv)
[202/1095] ? not_found: The Colossus MK2 IPU Processor
[203/1095] + verified: OLMo: Accelerating the Science of Language Models (CrossRef)
[204/1095] ? not_found: Deep Learning Model Compression (ii) by Ivy Gu Medium
[205/1095] ? not_found: Data quality considerations for big data and machine learnin
[206/1095] + verified: Serving DNNs like Clockwork: Performance Predictability from (arXiv)
[207/1095] + verified: Development and Validation of a Deep Learning Algorithm for (DOI)
[208/1095] ? not_found: A Survey on Deep Learning Based Mobile and Online Payment Se
[209/1095] + verified: Deep learning with limited numerical precision (DBLP)
[210/1095] + verified: The Unreasonable Effectiveness of Data (DOI)
[211/1095] + verified: Learning both Weights and Connections for Efficient Neural N (arXiv)
[212/1095] + verified: Learning both Weights and Connections for Efficient Neural N (arXiv)
[213/1095] ? not_found: Deep Compression: Compressing Deep Neural Networks with Prun
[214/1095] + verified: EIE: Efficient Inference Engine on Compressed Deep Neural Ne (DOI)
[215/1095] + verified: Performance Modeling and Design of Computer Systems (DOI)
[216/1095] + verified: Equality of Opportunity in Supervised Learning (DBLP)
[217/1095] ? not_found: Does ChatGPT Violate New York Times Copyrights?
[218/1095] + verified: Ps and Qs: Quantization-Aware Pruning for Efficient Low Late (DOI)
[219/1095] + verified: Applied Machine Learning at Facebook: A Datacenter Infrastru (DOI)
[220/1095] + verified: Delving Deep into Rectifiers: Surpassing Human-Level Perform (DOI)
[221/1095] + verified: Deep Residual Learning for Image Recognition (DOI)
[222/1095] + verified: AMC: AutoML for Model Compression and Acceleration on Mobile (CrossRef)
[223/1095] + verified: Momentum Contrast for Unsupervised Visual Representation Lea (DOI)
[224/1095] + verified: Towards a new interpretation of separable convolutions (DOI)
[225/1095] + verified: Deep Reinforcement Learning That Matters (DOI)
[226/1095] + verified: Towards the Systematic Reporting of the Energy and Carbon Fo (DOI)
[227/1095] + verified: Gaussian Error Linear Units (GELUs) (arXiv)
[228/1095] + verified: Measuring Massive Multitask Language Understanding (DBLP)
[229/1095] + verified: A new golden age for computer architecture (DOI)
[230/1095] ? not_found: Meet Michelangelo: Uber's Machine Learning Platform
[231/1095] + verified: Measuring the Algorithmic Efficiency of Neural Networks (DBLP)
[232/1095] + verified: Deep Learning Scaling is Predictable, Empirically (DBLP)
[233/1095] + verified: Distilling the Knowledge in a Neural Network (DBLP)
[234/1095] + verified: Advances in natural language processing (DOI)
[235/1095] + verified: A Best Possible Heuristic for the k-Center Problem (CrossRef)
[236/1095] + verified: The Vanishing Gradient Problem During Learning Recurrent Neu (DOI)
[237/1095] + verified: Sparsity in Deep Learning: Pruning and growth for efficient (DBLP)
[238/1095] + verified: Training Compute-Optimal Large Language Models (DBLP)
[239/1095] + verified: The Curious Case of Neural Text Degeneration. (arXiv)
[240/1095] + verified: Multilayer feedforward networks are universal approximators (DOI)
[241/1095] + verified: 1.1 Computing's energy problem (and what we can do about it) (DOI)
[242/1095] + verified: MobileNets: Efficient Convolutional Neural Networks for Mobi (arXiv)
[243/1095] + verified: Fastai: A Layered API for Deep Learning (DOI)
[244/1095] + verified: LoRA: Low-Rank Adaptation of Large Language Models (DBLP)
[245/1095] + verified: Triple Wins: Boosting Accuracy, Robustness and Efficiency To (arXiv)
[246/1095] + verified: Densely Connected Convolutional Networks (DOI)
[247/1095] + verified: Beyond the Memory Wall: A Case for Memory-Centric HPC System (DBLP)
[248/1095] + verified: Quantized Neural Networks: Training Neural Networks with Low (DBLP)
[249/1095] + verified: Receptive fields, binocular interaction and functional archi (DOI)
[250/1095] ? not_found: Hydra: A Framework for Elegantly Configuring Complex Applica
[251/1095] + verified: Efficient Deep Learning Hyperparameter Tuning Using Cloud In (DOI)
[252/1095] + verified: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters (arXiv)
[253/1095] ? not_found: IBM Watson OpenScale: Data Drift Detection
[254/1095] ? not_found: IEEE 2416-2019: Standard for Power Modeling to Enable System
[255/1095] ? not_found: IEEE 754-2019: Standard for Floating-Point Arithmetic
[256/1095] ? not_found: The History of the ReLU
[257/1095] ? not_found: IEEE Standards Association: Working Groups for AI and ML
[258/1095] + verified: Approximate Nearest Neighbors: Towards Removing the Curse of (DOI)
[259/1095] ~ author_mismatch: Building the data warehouse (CrossRef)
[260/1095] ? not_found: Intel Advanced Matrix Extensions (Intel AMX)
[261/1095] + verified: Neural Network Compression and Knowledge Distillation: Tutor (CrossRef)
[262/1095] + verified: Batch Normalization: Accelerating Deep Network Training by R (DBLP)
[263/1095] ? not_found: ISO/IEC JTC 1/SC 42 Artificial Intelligence
[264/1095] + verified: Quantization and Training of Neural Networks for Efficient I (DOI)
[265/1095] ? not_found: MLPerf Mobile v2.0: An Industry-Standard Benchmark Suite fo
[266/1095] + verified: Edge Impulse: An MLOps Platform for Tiny Machine Learning (arXiv)
[267/1095] ? not_found: JAX: composable transformations of Python+NumPy programs
[268/1095] + verified: Affine projection methods in fault tolerant adaptive filteri (CrossRef)
[269/1095] + verified: Beyond Data and Model Parallelism for Deep Neural Networks (DBLP)
[270/1095] + verified: Highly Scalable Deep Learning Training System with Mixed-Pre (DBLP)
[271/1095] + verified: Optimizing DNN Computation with Relaxed Graph Substitutions (DBLP)
[272/1095] + verified: Accuracy vs. Efficiency: Achieving Both through FPGA-Impleme (arXiv)
[273/1095] + verified: TinyBERT: Distilling BERT for Natural Language Understanding (DOI)
[274/1095] + verified: Billion-scale similarity search with GPUs (arXiv)
[275/1095] + verified: How to stop data centres from gobbling up the world's electr (DOI)
[276/1095] + verified: A Guide to Parallel Computation and Some Cray-1 Experiences (DOI)
[277/1095] + verified: Bag of Tricks for Efficient Text Classification (DBLP)
[278/1095] + verified: In-Datacenter Performance Analysis of a Tensor Processing Un (DOI)
[279/1095] + verified: A domain-specific supercomputer for training deep neural net (DOI)
[280/1095] + verified: Ten Lessons From Three Generations Shaped Google's TPUv4i : (DOI)
[281/1095] + verified: TPU v4: An Optically Reconfigurable Supercomputer for Machin (DOI)
[282/1095] + verified: Highly accurate protein structure prediction with AlphaFold (DOI)
[283/1095] ? not_found: State of Machine Learning and Data Science 2021
[284/1095] + verified: Scaling Laws for Neural Language Models (DBLP)
[285/1095] + verified: Key challenges for delivering clinical impact with artificia (DOI)
[286/1095] + verified: Stochastic Processes Occurring in the Theory of Queues and T (DOI)
[287/1095] ? not_found: Keras: Deep Learning for Humans
[288/1095] + verified: Dynabench: Rethinking Benchmarking in NLP (DOI)
[289/1095] + verified: System level analysis of fast, per-core DVFS using on-chip s (DOI)
[290/1095] + verified: Adam: A Method for Stochastic Optimization (DBLP)
[291/1095] ? not_found: Semi-Supervised Classification with Graph Convolutional Netw
[292/1095] + verified: Inherent Trade-Offs in the Fair Determination of Risk Scores (DBLP)
[293/1095] ? not_found: Designing Data-Intensive Applications: The Big Ideas Behind
[294/1095] + verified: Jupyter Notebooks – a publishing format for repro (DOI)
[295/1095] + verified: WILDS: A Benchmark of in-the-Wild Distribution Shifts. (arXiv)
[296/1095] + verified: ToyADMOS: A Dataset of Miniature-Machine Operating Sounds fo (DOI)
[297/1095] + verified: Implications of Historical Trends in the Electrical Efficien (DOI)
[298/1095] + verified: Matrix Factorization Techniques for Recommender Systems (DOI)
[299/1095] + verified: Machine Learning Operations (MLOps): Overview, Definition, a (DBLP)
[300/1095] + verified: RAMAN: A Re-configurable and Sparse tinyML Accelerator for I (arXiv)
[301/1095] + verified: Quantizing deep convolutional networks for efficient inferen (DBLP)
[302/1095] + verified: Self-supervised learning in medicine and healthcare (DOI)
[303/1095] ? not_found: Learning multiple layers of features from tiny images
[304/1095] ? not_found: KServe: Highly Scalable and Standards-Based Model Inference
[305/1095] + verified: Large-Scale Cluster Management at Google with Borg (DOI)
[306/1095] + verified: Kuhn's Structure of Scientific Revolutions between sociology (DOI)
[307/1095] ~ author_mismatch: On Information and Sufficiency (CrossRef)
[308/1095] ? not_found: Systolic arrays (for VLSI)
[309/1095] + verified: FP8 Quantization: The Power of the Exponent (arXiv)
[310/1095] + verified: Hardware/Software Co-Design for TinyML Voice-Recognition App (CrossRef)
[311/1095] + verified: Efficient Memory Management for Large Language Model Serving (DOI)
[312/1095] ? not_found: Label Studio: Open Source Data Labeling Platform
[313/1095] + verified: Quantifying the Carbon Emissions of Machine Learning (arXiv)
[314/1095] + verified: CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M (arXiv)
[315/1095] + verified: The cache performance and optimizations of blocked algorithm (DBLP)
[316/1095] + verified: Identifying Shades of Green: The SPECpower Benchmarks (DOI)
[317/1095] ! error (validator crash): Unmasking Clever Hans Predictors and Assessing What Machines...
[318/1095] + verified: A CMOS 2D Transmit Beamformer With Integrated PZT Ultrasound (DOI)
[319/1095] ~ author_mismatch: Dynamic voltage and frequency scaling: The laws of diminishi (CrossRef)
[320/1095] + verified: Backpropagation Applied to Handwritten Zip Code Recognition (DOI)
[321/1095] + verified: Gradient-based learning applied to document recognition (DOI)
[322/1095] + verified: Deduplicating Training Data Makes Language Models Better (CrossRef)
[323/1095] + verified: GShard: Scaling Giant Models with Conditional Computation an (DBLP)
[324/1095] + verified: Communication Efficient Distributed Machine Learning with th (DBLP)
[325/1095] + verified: Pruning Filters for Efficient ConvNets (arXiv)
[326/1095] ? not_found: Estimating the Training Cost of GPT-3
[327/1095] + verified: Federated Learning: Challenges, Methods, and Future Directio (DOI)
[328/1095] + verified: Hyperband: A Novel Bandit-Based Approach to Hyperparameter O (arXiv)
[329/1095] + verified: Non-invasive Monitoring of Three Glucose Ranges Based On ECG (DOI)
[330/1095] ? not_found: A Survey on Memory Management Strategies for Machine Learnin
[331/1095] + verified: AlpaServe: Statistical multiplexing with model paralle (arXiv)
[332/1095] ! error (validator crash): Holistic Evaluation of Language Models...
[333/1095] ~ author_mismatch: LIME (Local Interpretable Model-Agnostic Explanations) (CrossRef)
[334/1095] + verified: Microsoft COCO: Common Objects in Context (arXiv)
[335/1095] + verified: MCUNet: Tiny Deep Learning on IoT Devices (arXiv)
[336/1095] + verified: AWQ: Activation-aware Weight Quantization for LLM Compressio (arXiv)
[337/1095] + verified: Tiny Machine Learning: Progress and Futures [Feature] (DOI)
[338/1095] ? not_found: Marissa Mayer at Web 2.0
[339/1095] + verified: NVIDIA Tesla: A Unified Graphics and Computing Architecture (DOI)
[340/1095] ? not_found: A Proof for the Queuing Formula: L = λ
[341/1095] ? not_found: DARTS: Differentiable Architecture Search
[342/1095] + verified: Monitoring gait at home with radio waves in Parkinson's dise (DOI)
[343/1095] + verified: LLaMA: Open and Efficient Foundation Language Models (DBLP)
[344/1095] + verified: Decoupled Weight Decay Regularization. (arXiv)
[345/1095] + verified: Object recognition from local scale-invariant features (DOI)
[346/1095] + verified: STEP: Learning N:M Structured Sparsity Masks from Scratch wi (arXiv)
[347/1095] + verified: A Gradient Flow Framework For Analyzing Network Pruning (arXiv)
[348/1095] + verified: Are GANs Created Equal? A Large-Scale Study (DBLP)
[349/1095] + verified: A Unified Approach to Interpreting Model Predictions (arXiv)
[350/1095] ~ author_mismatch: Understanding Digital Signal Processing (CrossRef)
[351/1095] ? not_found: Multilingual spoken words corpus
[352/1095] + verified: DataPerf: Benchmarks for Data-centric AI Development (arXiv)
[353/1095] + verified: Montreal Forced Aligner: Trainable Text-Speech Alignment Usi (DOI)
[354/1095] ? not_found: A Proposal for the Dartmouth Summer Research Project on Arti
[355/1095] + verified: A logical calculus of the ideas immanent in nervous activity (DOI)
[356/1095] ? not_found: The Internet of Things: Catching Up to an Accelerating Oppor
[357/1095] + verified: Communication-Efficient Learning of Deep Networks from Decen (DBLP)
[358/1095] + verified: If beam search is the answer, what was the question? (DOI)
[359/1095] + verified: Mixed Precision Training With 8-bit Floating Point (arXiv)
[360/1095] + verified: Pointer Sentinel Mixture Models (arXiv)
[361/1095] + verified: OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Pr (DOI)
[362/1095] + verified: FP8 Formats for Deep Learning (DBLP)
[363/1095] + verified: Efficient Estimation of Word Representations in Vector Space (DBLP)
[364/1095] ? not_found: Device Placement Optimization with Reinforcement Learning
[365/1095] ? not_found: The Need for Biases in Learning Generalizations
[366/1095] + verified: Model Cards for Model Reporting (DOI)
[367/1095] ? not_found: MLflow: An Open Source Platform for the Machine Learning Lif
[368/1095] + verified: MLIR: A Compiler Infrastructure for the End of Moore's Law (arXiv)
[369/1095] + verified: Human-level control through deep reinforcement learning (DOI)
[370/1095] + verified: Analyzing and mitigating data stalls in DNN training (DOI)
[371/1095] + verified: Cramming More Components Onto Integrated Circuits (DOI)
[372/1095] + verified: Cramming More Components onto Integrated Circuits (1965) (CrossRef)
[373/1095] + verified: Relay: A New IR for Machine Learning Frameworks (DOI)
[374/1095] ? not_found: A White Paper on Neural Network Quantization
[375/1095] + verified: Rectified Linear Units Improve Restricted Boltzmann Machines (DBLP)
[376/1095] + verified: Deep Double Descent: Where Bigger Models and More Data Hurt (CrossRef)
[377/1095] + verified: Efficient large-scale language model training on GPU cluster (DOI)
[378/1095] + verified: Deep Learning Recommendation Model for Personalization and R (DBLP)
[379/1095] + verified: Improving Voice Trigger Detection with Metric Learning (DBLP)
[380/1095] + verified: Neural Architecture Search: A Survey (DBLP)
[381/1095] + verified: Exploring Generalization in Deep Learning (DBLP)
[382/1095] ? not_found: MLOps: From Model-centric to Data-centric AI
[383/1095] + verified: Scalable Parallel Programming with CUDA (DOI)
[384/1095] + verified: The Design Process for Google's Training Chips: TPUv2 and TP (DOI)
[385/1095] + verified: Pervasive Label Errors in Test Sets Destabilize Machine Lear (DBLP)
[386/1095] ? not_found: Collision Between Vehicle Controlled by Developmental Automa
[387/1095] + verified: Array programming with NumPy (PubMed)
[388/1095] ? not_found: cuBLAS: CUDA Basic Linear Algebra Subprograms
[389/1095] ? not_found: Accelerating Matrix Multiplication with Block Sparse Format
[390/1095] ? not_found: NVIDIA Collective Communications Library (NCCL)
[391/1095] ? not_found: NVIDIA Omniverse and Simulation
[392/1095] ? not_found: TensorRT: High-Performance Deep Learning Inference Library
[393/1095] ? not_found: Training with Mixed Precision
[394/1095] ? not_found: TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC
[395/1095] ? not_found: NVIDIA Triton Inference Server
[396/1095] ? not_found: NVIDIA Tesla V100 GPU Architecture
[397/1095] + verified: Demystifying the Nvidia Ampere Architecture through Microben (DOI)
[398/1095] + verified: NVIDIA A100 Tensor Core GPU: Performance and Innovation (DBLP)
[399/1095] ? not_found: NVIDIA A100 Tensor Core GPU Architecture
[400/1095] ? not_found: NVLink: Scalable High-Performance Interconnect
[401/1095] ? not_found: NVIDIA cuDNN Developer Guide
[402/1095] + verified: NVIDIA Hopper H100 GPU: Scaling Performance (DOI)
[403/1095] ? not_found: NVIDIA TensorRT: Programmable Inference Accelerator
[404/1095] ? not_found: NVIDIA Triton Inference Server: Developer Documentation
[405/1095] + verified: Hidden stratification causes clinically meaningful failures (DOI)
[406/1095] + verified: Dissecting racial bias in an algorithm used to manage the he (DOI)
[407/1095] ? not_found: Measuring the Geographic Distribution of AI Computing Capaci
[408/1095] + verified: TensorFlow-Serving: Flexible, High-Performance ML Serving (DBLP)
[409/1095] ? not_found: oneDNN: Intel's Deep Learning Neural Network Library
[410/1095] ? not_found: ONNX Runtime: Cross-Platform Inference and Training Machine-
[411/1095] + verified: BS-80K: The first large open-access dataset of bone scan ima (PubMed)
[412/1095] + verified: Training language models to follow instructions with human f (DBLP)
[413/1095] + verified: Challenges in Deploying Machine Learning: A Survey of Case S (DOI)
[414/1095] ? not_found: The INTEL® 8087 numeric data processor
[415/1095] + verified: SpecAugment: A Simple Data Augmentation Method for Automatic (DBLP)
[416/1095] + verified: PyTorch: An Imperative Style, High-Performance Deep Learning (DBLP)
[417/1095] + verified: Bandwidth optimal all-reduce algorithms for clusters of work (DOI)
[418/1095] + verified: Introduction to common crawl datasets (CrossRef)
[419/1095] ? not_found: DevOpsDays: The Birth of DevOps
[420/1095] ? not_found: Computer Organization and Design RISC-V Edition: The Hardwar
[421/1095] + verified: Carbon Emissions and Large Neural Network Training (DBLP)
[422/1095] + verified: Computer Architecture: A Quantitative Approach (DBLP)
[423/1095] ? not_found: Deep Learning on a Data Diet: Finding Important Examples Ear
[424/1095] + verified: The FineWeb Datasets: Decanting the Web for the Finest Text (arXiv)
[425/1095] + verified: Artificial Intelligence Index Report 2024. (DBLP)
[426/1095] ? not_found: Improving reproducibility in machine learning research (a re
[427/1095] + verified: Data Management Challenges in Production Machine Learning (DOI)
[428/1095] + verified: Efficiently Scaling Transformer Inference (DBLP)
[429/1095] + verified: Fast Approximations of Activation Functions in Deep Neural N (DOI)
[430/1095] + verified: CFU Playground: Full-Stack Open-Source Framework for Tiny Ma (DOI)
[431/1095] ? not_found: Prefect: Workflow Orchestration Framework for Python
[432/1095] ? not_found: Prometheus: Monitoring System and Time Series Database
[433/1095] + verified: Wearable Insulin Biosensors for Diabetes Management: Advance (DOI)
[434/1095] + verified: Data Cards: Purposeful and Transparent Dataset Documentation (arXiv)
[435/1095] + verified: A reconfigurable fabric for accelerating large-scale datacen (DOI)
[436/1095] ? not_found: Accelerating Neural Network Training with Sparse Tensors
[437/1095] + verified: An efficient pruning scheme of deep neural networks for Inte (DOI)
[438/1095] ! error (validator crash): Winning the lottery ahead of time: Efficient early network p...
[439/1095] ? not_found: Improving language understanding by generative pre-training
[440/1095] + verified: Learning transferable visual models from natural language su (DBLP)
[441/1095] + verified: Designing Network Design Spaces (DOI)
[442/1095] + verified: Appraising the Brain's Energy Budget (DOI)
[443/1095] + verified: Evaluation metrics and statistical tests for machine learnin (DOI)
[444/1095] + verified: ZeRO: Memory Optimization Towards Training Trillion Paramete (DBLP)
[445/1095] + verified: ZeRO: Memory optimizations Toward Training Trillion Paramete (DOI)
[446/1095] + verified: Closing the AI accountability gap (DOI)
[447/1095] ~ author_mismatch: Machine Learning in Medicine (CrossRef)
[448/1095] + verified: SQuAD: 100,000+ Questions for Machine Comprehension of Text (DOI)
[449/1095] ! error (validator crash): Twenty Five Years of Warehouse-Scale Computing...
[450/1095] + verified: XNOR-Net: ImageNet Classification Using Binary Convolutional (DBLP)
[451/1095] + verified: Deep Learning for Computer Architects (DOI)
[452/1095] + verified: Regularized Evolution for Image Classifier Architecture Sear (DOI)
[453/1095] + verified: Do ImageNet Classifiers Generalize to ImageNet? (DBLP)
[454/1095] + verified: Widening Access to Applied Machine Learning with TinyML (DOI)
[455/1095] + verified: A Survey of Deep Active Learning (DOI)
[456/1095] + verified: "Why Should I Trust You?" (CrossRef)
[457/1095] + verified: Auditing radicalization pathways on YouTube (DOI)
[458/1095] + verified: Beyond Accuracy: Behavioral Testing of NLP Models with Check (DOI)
[459/1095] + verified: The RISC-V instruction set (DOI)
[460/1095] + verified: Communications Signal Processing Using RISC-V Vector Extensi (DOI)
[461/1095] ? not_found: INFaaS: Automated Model-less Inference Serving
[462/1095] ? not_found: The Perceptron: A Perceiving and Recognizing Automaton
[463/1095] + verified: The perceptron: A probabilistic model for information storag (DOI)
[464/1095] + verified: Managing the Development of Large Software Systems (DBLP)
[465/1095] + verified: Learning representations by back-propagating errors (DOI)
[466/1095] + verified: ImageNet Large Scale Visual Recognition Challenge (DOI)
[467/1095] ? not_found: Dynamic Routing Between Capsules
[468/1095] ? not_found: SambaNova: The Fastest AI Inference Platform and Hardware
[469/1095] + verified: "Everyone wants to do the model work, not the data work": (DOI)
[470/1095] + verified: MobileNetV2: Inverted Residuals and Linear Bottlenecks (DOI)
[471/1095] + verified: DistilBERT, a distilled version of BERT: smaller, faster, ch (arXiv)
[472/1095] ? not_found: The Flaw of Averages: Why We Underestimate Risk in the Face
[473/1095] ? not_found: Why should I trust you? A survey of explainability of machin
[474/1095] ? not_found: Automating large-scale machine learning model management
[475/1095] ? not_found: sklearn.metrics.confusion_matrix --- scikit-learn documenta
[476/1095] ? not_found: Feature selection --- scikit-learn documentation
[477/1095] ? not_found: Metrics and scoring: quantifying the quality of predictions
[478/1095] + verified: SciPy 1.0: fundamental algorithms for scientific computing i (DOI)
[479/1095] ? not_found: Technical Debt in Machine Learning Systems
[480/1095] ? not_found: Securities Exchange Act of 1934, Release No. 70694: Knight C
[481/1095] + verified: Active Learning for Convolutional Neural Networks: A Core-Se (DBLP)
[482/1095] + verified: NeuroFlow: Development of Lightweight and Efficient Model In (DBLP)
[483/1095] + verified: Horovod: fast and easy distributed deep learning in TensorFl (DBLP)
[484/1095] + verified: Measuring the Effects of Data Parallelism on Neural Network (DBLP)
[485/1095] ? not_found: Accelerating Genomic Data Analysis with Domain-Specific Arch
[486/1095] + verified: A Mathematical Theory of Communication (DOI)
[487/1095] + verified: Outrageously large neural networks: The sparsely-gated mixtu (arXiv)
[488/1095] + verified: Mesh-TensorFlow: Deep Learning for Supercomputers (arXiv)
[489/1095] + verified: Q-BERT: Hessian Based Ultra Low Precision Quantization of BE (DOI)
[490/1095] + verified: Edge Computing: Vision and Challenges (DOI)
[491/1095] + verified: Megatron-LM: Training Multi-Billion Parameter Language Model (arXiv)
[492/1095] + verified: A survey on Image Data Augmentation for Deep Learning (DBLP)
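The status lines above follow a regular shape: `[i/N] <mark> <status>: <title> (<source>)`, where the mark is one of `+` (verified), `?` (not_found), `~` (author_mismatch), or `!` (validator crash). As a minimal sketch, a parser for that observed format could tally results per status; the names `LINE_RE` and `tally` are hypothetical helpers, not part of the actual validator:

```python
import re
from collections import Counter

# Hypothetical parser for the validator log format observed above, e.g.:
#   "[73/1095] ? not_found: The Wafer-Scale Engine 2: Scaling AI Compute Beyond GPUs"
LINE_RE = re.compile(
    r"\[(?P<idx>\d+)/(?P<total>\d+)\]\s+"   # position in the bibliography
    r"(?P<mark>[+?~!])\s+"                   # status marker
    r"(?P<status>[\w ()]+?):\s+"             # status word(s), e.g. "error (validator crash)"
    r"(?P<title>.+?)"                        # (possibly truncated) title
    r"(?:\s+\((?P<source>[^()]+)\))?$"       # optional verification source, e.g. "(arXiv)"
)

def tally(lines):
    """Count log entries per status marker: + verified, ? not_found, ~ mismatch, ! crash."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[m.group("mark")] += 1
    return counts

sample = [
    "[76/1095] + verified: MXNet: A Flexible and Efficient Machine Learning Library for (arXiv)",
    "[75/1095] ? not_found: CRISP-DM 1.0: Step-by-step data mining guide",
    "[92/1095] ~ author_mismatch: Deep Learning with Python (CrossRef)",
    "[169/1095] ! error (validator crash): Flexpoint: An Adaptive Numerical Format...",
]
print(tally(sample))
```

Summing the four markers against the total (here 1095 entries) gives a quick coverage figure for the bibliography audit without rereading the full log.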