Updated a figure in chapter11

Zeljko Hrcek
2026-02-23 12:05:51 +01:00
parent 96efa3bf29
commit 25db8e9d4b

@@ -1293,354 +1293,215 @@ To understand why "Structured" patterns are required for hardware speedup, consi
::: {#fig-sparse-formats fig-env="figure" fig-pos="htb" fig-cap="**Sparse Storage Formats**: Hardware efficiency depends on how sparse matrices are stored. **Dense** storage (top left) is simple but wasteful for zeros. **Block Sparse** (top right) and **CSR** (bottom) compress the matrix by storing only non-zero values and their indices. Structured sparsity (like N:M or Blocks) makes this indexing predictable, allowing hardware to fetch data and skip zeros efficiently." fig-alt="Grid of 3x3 matrix blocks. Top left: Dense Matrix. Top right: Block Sparse Matrix showing dense sub-blocks. Bottom: Sparse Matrix (CSR) and Block Sparse (BSR) representations showing values and index arrays."}
```{.tikz}
\begin{tikzpicture}[line join=round,font=\usefont{T1}{phv}{m}{n}]
\begin{tikzpicture}[ x=5mm,y=5mm,line join=round,font=\usefont{T1}{phv}{m}{n}\small]
\tikzset{%
helvetica/.style={align=flush center,font=\small\usefont{T1}{phv}{m}{n}},
Line/.style={line width=1.0pt,black!50,text=black},
cell/.style={draw=white, line width=0.6pt},
gridline/.style={draw=black!70, line width=1.5pt},
Arr/.style={->,>=Latex,line width=0.75pt},
}
\definecolor{Blue1}{RGB}{23,68,150}
\definecolor{Blue2}{RGB}{84,131,217}
\definecolor{Blue3}{RGB}{145,177,237}
\def\columns{3}
\def\rows{3}
\def\cellsize{5mm}
\def\cellheight{5mm}
% dimensions
\def\Rows{3}
\def\Cols{9}
\begin{scope}[local bounding box=BL1]
\begin{scope}[local bounding box=matrica1]
\def\rowone{Blue2,Blue3,Blue2}
\def\rowtwo{Blue2,Blue1,Blue2}
\def\rowthree{Blue2,Blue2,Blue2}
\definecolor{blue1}{RGB}{23,68,150}
\definecolor{blue2}{RGB}{84,131,217}
\definecolor{blue3}{RGB}{145,177,237}
% --- 2) Macro: paint cell (r,c) with color ---
\newcommand{\ColorCell}[3]{% #1=row, #2=col, #3=color
\pgfmathtruncatemacro{\yy}{\Rows-#1}
\fill[#3] (#2-1,\yy) rectangle ++(1,1);
\draw[cell] (#2-1,\yy) rectangle ++(1,1);
}
%Grid 9x3
\newcommand{\Gridd}{%
\foreach \r in {1,...,\Rows}{
\foreach \c in {1,...,\Cols}{
\pgfmathtruncatemacro{\yy}{\Rows-\r}
% draw cell
\fill[gray!25] (\c-1,\yy) rectangle ++(1,1);
\draw[cell] (\c-1,\yy) rectangle ++(1,1);
% corners
\coordinate (\br-cell-\r-\c-sw) at (\c-1,\yy);
\coordinate (\br-cell-\r-\c-se) at (\c,\yy);
\coordinate (\br-cell-\r-\c-nw) at (\c-1,\yy+1);
\coordinate (\br-cell-\r-\c-ne) at (\c,\yy+1);
\coordinate (\br-cell-\r-\c-n) at (\c-0.5,\yy+1);
% center
\coordinate (cell-\r-\c) at (\c-0.5,\yy+0.5);
}
}
}
%%%%%%%%%%%%%
%Dense Matrix
%%%%%%%%%%%%%
\begin{scope}[local bounding box=BLOCK-A,shift={(0.0,0.0)}]
% --- 1) Base + all cell coordinates ---
\def\br{A}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
\Gridd
\ColorCell{1}{1}{blue2}
\ColorCell{1}{2}{blue3}
\ColorCell{1}{3}{blue2}
\ColorCell{2}{1}{blue2}
\ColorCell{2}{2}{blue1}
\ColorCell{2}{3}{blue2}
\ColorCell{3}{1}{blue2}
\ColorCell{3}{2}{blue2}
\ColorCell{3}{3}{blue2}
%Center
\ColorCell{1}{4}{blue1}
\ColorCell{1}{5}{blue2}
\ColorCell{1}{6}{blue2}
\ColorCell{2}{4}{blue3}
\ColorCell{2}{5}{blue2}
\ColorCell{2}{6}{blue1}
\ColorCell{3}{4}{blue2}
\ColorCell{3}{5}{blue1}
\ColorCell{3}{6}{blue2}
% --- 4) Thick frame around the entire 9x3 grid ---
\draw[gridline] (0,0) rectangle (\Cols,\Rows);
%
\foreach \color [count=\x] in \rowone {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-1\br) {};
}
\foreach \color [count=\x] in \rowtwo {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-2\br) {};
}
\foreach \color [count=\x] in \rowthree {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-3\br) {};
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
% (optional) thick vertical division after 3 and after 6 columns (as in the picture)
\draw[gridline] (3,0) -- (3,\Rows);
\draw[gridline] (6,0) -- (6,\Rows);
\end{scope}
%%%%%%
\begin{scope}[shift={(1.5,0)}]
\def\rowone{Blue1,Blue2,Blue2}
\def\rowtwo{Blue3,Blue2,Blue1}
\def\rowthree{Blue2,Blue1,Blue2}
%%%%%%%%%%%%%
%Sparse Matrix (CSR)
%%%%%%%%%%%%%
\begin{scope}[local bounding box=BLOCK-B,shift={(0.0,-7.0)}]
% --- 1) Base + all cell coordinates ---
\def\br{B}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
\Gridd
%
\foreach \color [count=\x] in \rowone {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-1\br) {};
}
\foreach \color [count=\x] in \rowtwo {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-2\br) {};
}
\foreach \color [count=\x] in \rowthree {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-3\br) {};
}
\ColorCell{1}{7}{blue1}
\ColorCell{1}{8}{blue2}
\ColorCell{1}{9}{blue2}
\ColorCell{2}{7}{blue3}
\ColorCell{2}{8}{blue2}
\ColorCell{2}{9}{blue1}
\ColorCell{3}{7}{blue2}
\ColorCell{3}{8}{blue1}
\ColorCell{3}{9}{blue2}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
% --- 4) Thick frame around the entire 9x3 grid ---
\draw[gridline] (0,0) rectangle (\Cols,\Rows);
% (optional) thick vertical division after 3 and after 6 columns (as in the picture)
\draw[gridline] (3,0) -- (3,\Rows);
\draw[gridline] (6,0) -- (6,\Rows);
\end{scope}
%%%%%%
\begin{scope}[shift={(3.0,0)}]
%%%%%%%%%%%%%
%Block Sparse Matrix
%%%%%%%%%%%%%
\begin{scope}[local bounding box=BLOCK-C,shift={(10.5,0,0)}]
% --- 1) Base + all cell coordinates ---
\def\br{C}
\Gridd
%LEFT
\ColorCell{1}{1}{blue2}
\ColorCell{1}{2}{blue3}
\ColorCell{1}{3}{blue2}
\ColorCell{2}{1}{blue2}
\ColorCell{2}{2}{blue1}
\ColorCell{2}{3}{blue2}
\ColorCell{3}{1}{blue2}
\ColorCell{3}{2}{blue2}
\ColorCell{3}{3}{blue2}
%RIGHT
\ColorCell{1}{7}{blue1}
\ColorCell{1}{8}{blue2}
\ColorCell{1}{9}{blue2}
\ColorCell{2}{7}{blue3}
\ColorCell{2}{8}{blue2}
\ColorCell{2}{9}{blue1}
\ColorCell{3}{7}{blue2}
\ColorCell{3}{8}{blue1}
\ColorCell{3}{9}{blue2}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
\end{scope}
\begin{scope}[local bounding box=BL2,shift={(5.0,0)}]
\begin{scope}[local bounding box=matrica1]
\def\rowone{Blue2,Blue3,Blue2}
\def\rowtwo{Blue2,Blue1,Blue2}
\def\rowthree{Blue2,Blue2,Blue2}
\draw[gridline] (0,0) rectangle (\Cols,\Rows);
\draw[gridline] (3,0) -- (3,\Rows);
\draw[gridline] (6,0) -- (6,\Rows);
\end{scope}
%%%%%%%%%%%%%
%Block Sparse (BSR)
%%%%%%%%%%%%%
\begin{scope}[local bounding box=BLOCK-D,shift={(10.5,-7.0)}]
% --- 1) Base + all cell coordinates ---
\def\br{D}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
\Gridd
\ColorCell{1}{4}{blue1}
\ColorCell{1}{5}{blue2}
\ColorCell{1}{6}{blue2}
\ColorCell{2}{4}{blue3}
\ColorCell{2}{5}{blue2}
\ColorCell{2}{6}{blue1}
\ColorCell{3}{4}{blue2}
\ColorCell{3}{5}{blue1}
\ColorCell{3}{6}{blue2}
%
\foreach \color [count=\x] in \rowone {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-1\br) {};
}
\foreach \color [count=\x] in \rowtwo {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-2\br) {};
}
\foreach \color [count=\x] in \rowthree {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-3\br) {};
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\draw[gridline] (0,0) rectangle (\Cols,\Rows);
\draw[gridline] (3,0) -- (3,\Rows);
\draw[gridline] (6,0) -- (6,\Rows);
\end{scope}
% =========================
% MINI GRID: 1 x 6
% =========================
\begin{scope}[local bounding box=MINI-G,shift={(24,2.1)},node distance=-0.75pt,
mCell/.style={draw=black!60,rectangle,minimum width=7.5mm,
minimum height=7.5mm,line width=0.75pt, fill=orange!15}]
\node[mCell](R1){1};
\node[mCell,below =of R1](R2){2};
\node[mCell,below =of R2](R3){4};
\node[mCell,below =of R3](R4){5};
\node[mCell,below =of R4](R5){7};
\node[mCell,below =of R5](R6){9};
\draw[black!70, line width=1pt] (R1.north west) rectangle (R6.south east);
\end{scope}
%%%%%%
\begin{scope}[shift={(1.5,0)}]
\def\br{E}
\node[below=1pt of BLOCK-A]{Dense Matrix};
\node[below=1pt of BLOCK-B]{Sparse Matrix (CSR)};
\node[below=1pt of BLOCK-C]{Block Sparse Matrix};
\node[below=1pt of BLOCK-D]{Block Sparse (BSR)};
\node[below=13pt of MINI-G,align=center]{Non-zero Block Indices};
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
%%%%%%
\begin{scope}[shift={(3.0,0)}]
\def\rowone{Blue1,Blue2,Blue2}
\def\rowtwo{Blue3,Blue2,Blue1}
\def\rowthree{Blue2,Blue1,Blue2}
\def\br{F}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
% --- Above (R1,R2,R3 -> BLOCK-C) ---
\coordinate (busC) at ([xshift=-7mm]R1.west);
%
\coordinate (tapC1) at (busC |- R1.west);
\coordinate (tapC2) at (busC |- R2.west);
\coordinate (tapC3) at (busC |- R3.west);
% Arrows:
\draw[] (R1.west) -- (tapC1) ;
\draw[] (R2.west) -- (tapC2);
\draw[] (R3.west) -- (tapC3);
%
\draw[Arr] (tapC2)--++(-1.2,0)--++(0,3.5)-| (C-cell-1-1-n);
\draw[Arr] (tapC2)--++(-1.2,0)--++(0,3.5)-| (C-cell-1-3-n);
\draw[Arr] (tapC2)--++(-1.2,0)--++(0,3.5)-| (C-cell-1-9-n);
\draw[black!60,line width=4.0pt]
([yshift= 4mm]busC |- R1.west) -- ([yshift=-3mm]busC |- R3.west);
\fill[black!70] ($(tapC2)+(-3pt,0)$) circle (1.75pt);
%
\foreach \color [count=\x] in \rowone {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-1\br) {};
}
\foreach \color [count=\x] in \rowtwo {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-2\br) {};
}
\foreach \color [count=\x] in \rowthree {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-3\br) {};
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
\end{scope}
\begin{scope}[local bounding box=BL3,shift={(0.0,-2.0)}]
\begin{scope}[local bounding box=matrica1]
\def\br{G}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
%%%%%%
\begin{scope}[shift={(1.5,0)}]
\def\br{H}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
%%%%%%
\begin{scope}[shift={(3.0,0)}]
\def\rowone{Blue1,Blue2,Blue2}
\def\rowtwo{Blue3,Blue2,Blue1}
\def\rowthree{Blue2,Blue1,Blue2}
\def\br{I}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%
\foreach \color [count=\x] in \rowone {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-1\br) {};
}
\foreach \color [count=\x] in \rowtwo {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-2\br) {};
}
\foreach \color [count=\x] in \rowthree {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-3\br) {};
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
\end{scope}
\begin{scope}[local bounding box=BL4,shift={(5.0,-2.0)}]
\begin{scope}[local bounding box=matrica1]
\def\br{J}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
%%%%%%
\begin{scope}[shift={(1.5,0)}]
\def\rowone{Blue1,Blue2,Blue2}
\def\rowtwo{Blue3,Blue2,Blue1}
\def\rowthree{Blue2,Blue1,Blue2}
\def\br{K}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%
\foreach \color [count=\x] in \rowone {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-1\br) {};
}
\foreach \color [count=\x] in \rowtwo {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-2\br) {};
}
\foreach \color [count=\x] in \rowthree {
\node[fill=\color,draw=white,line width=1pt, minimum size=\cellsize,
minimum height=\cellheight] at (cell-\x-3\br) {};
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
%%%%%%
\begin{scope}[shift={(3.0,0)}]
\def\br{L}
%
\foreach \x in {1,...,\columns}{
\foreach \y in {1,...,\rows}{
%
\node[draw=white, line width=1pt,fill=black!10, minimum width=\cellsize,
minimum height=\cellheight, line width=0.5pt] (cell-\x-\y\br) at (\x*\cellsize,-\y*\cellheight) {};
}
}
%countur line
\draw[line width=2pt,black!80]
(0.5*\cellsize,-0.5*\cellheight) rectangle
(\columns*\cellsize+0.5*\cellsize,-\rows*\cellheight-0.5*\cellheight);
\end{scope}
\end{scope}
\node[helvetica,anchor=south] at (BL1.north){Dense Matrix};
\node[helvetica,anchor=south] at (BL2.north){Block Sparse Matrix};
\node[helvetica,anchor=south] at (BL3.north){Sparse Matrix (CSR)};
\node[helvetica,anchor=south] at (BL4.north){Block Sparse (BSR)};
%array
\node[draw=black, line width=1pt,fill=white, minimum width=5mm,
minimum height=5mm, line width=0.5pt] (arr-1) at (9.2,1.25) {1};
\node[draw=black, line width=1pt,fill=white, minimum width=5mm,
minimum height=5mm, line width=0.5pt,anchor=west] (arr-2) at (arr-1.east) {2};
\node[draw=black, line width=1pt,fill=white, minimum width=5mm,
minimum height=5mm, line width=0.5pt,anchor=west] (arr-3) at (arr-2.east) {4};
\node[draw=black, line width=1pt,fill=white, minimum width=5mm,
minimum height=5mm, line width=0.5pt,anchor=west] (arr-4) at (arr-3.east) {5};
\node[draw=black, line width=1pt,fill=white, minimum width=5mm,
minimum height=5mm, line width=0.5pt,anchor=west] (arr-5) at (arr-4.east) {7};
\node[draw=black, line width=1pt,fill=white, minimum width=5mm,
minimum height=5mm, line width=0.5pt,anchor=west] (arr-6) at (arr-5.east) {9};
\node[helvetica,anchor=south,text width=30mm] at (10.5,1.7){Non-zero Block Indices};
\draw[-latex,Line] (arr-1.south) to [out=-90,in=90] (BL2.north west);
\draw[-latex,Line] (arr-2.south) to [out=-90,in=90] ($(BL2.north west)+(1.5,0)$);
\draw[-latex,Line] (arr-3.south) to [out=-90,in=90] ($(BL2.north west)+(4.5,0)$);
\draw[-latex,Line] (arr-4.south) to [out=-90,in=90] ($(BL2.north west)+(1.5,-2)$);
\draw[-latex,Line] (arr-5.south) to [out=-90,in=90] ($(BL2.north west)+(3.0,-2)$);
\draw[-latex,Line] (arr-6.south) to [out=-90,in=90] ($(BL2.north west)+(4.5,-2)$);
% ---Below (R4,R5,R6 -> BLOCK-D) ---
\coordinate (busD) at ([xshift=-7mm]R4.west);
\coordinate (tapD4) at (busD |- R4.west);
\coordinate (tapD5) at (busD |- R5.west);
\coordinate (tapD6) at (busD |- R6.west);
\draw[] (R4.west) -- (tapD4);
\draw[] (R5.west) -- (tapD5);
\draw[] (R6.west) -- (tapD6);
%
\draw[Arr] (tapD5)--++(-1.2,0)--++(0,1.2)-| (D-cell-1-4-n);
\draw[Arr] (tapD5)--++(-1.2,0)--++(0,1.2)-| (D-cell-1-7-n);
\draw[Arr] (tapD5)--++(-1.2,0)--++(0,1.2)-| (D-cell-1-9-n);
\draw[black!60,line width=4.0pt]
([yshift= 2mm]busD |- R4.west) -- ([yshift=-3mm]busD |- R6.west);
\fill[black!70] ($(tapD5)+(-3pt,0)$) circle (1.75pt);
\end{tikzpicture}
```
:::
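
To make the indexing cost concrete, the following is a minimal pure-Python sketch of the two compressed layouts named in the figure. It assumes a 3x9 matrix split into 3x3 blocks with the middle block all zero, loosely mirroring the figure's layout; the helper names `to_csr` and `to_bsr` and the numeric values are illustrative assumptions, not part of any library or of the chapter's code.

```python
def to_csr(dense):
    """Compressed Sparse Row: one column index per non-zero value."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(c)
        # row_ptr[i + 1] marks where row i ends inside `values`
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def to_bsr(dense, b):
    """Block Sparse Row: one column index per non-zero b x b block."""
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % b == 0 and n_cols % b == 0, "matrix must tile evenly"
    blocks, block_col_idx, block_row_ptr = [], [], [0]
    for br in range(0, n_rows, b):
        for bc in range(0, n_cols, b):
            block = [dense[r][bc:bc + b] for r in range(br, br + b)]
            # keep the block only if it holds at least one non-zero
            if any(v != 0 for row in block for v in row):
                blocks.append(block)
                block_col_idx.append(bc // b)
        block_row_ptr.append(len(blocks))
    return blocks, block_col_idx, block_row_ptr


# Hypothetical 3x9 matrix: left and right 3x3 blocks dense, middle block zero.
dense = [
    [1, 2, 3, 0, 0, 0, 4, 5, 6],
    [7, 8, 9, 0, 0, 0, 1, 2, 3],
    [4, 5, 6, 0, 0, 0, 7, 8, 9],
]

values, col_idx, row_ptr = to_csr(dense)
blocks, block_col_idx, block_row_ptr = to_bsr(dense, 3)

print(len(col_idx), "CSR column indices")        # one per non-zero element
print(len(block_col_idx), "BSR column indices")  # one per non-zero block
```

Under these assumptions CSR needs one index lookup for each of the 18 non-zero values, while BSR needs only two block indices, after which each block can be fetched as a contiguous dense tile. That predictability is what the figure caption refers to when it says structured sparsity lets hardware skip zeros efficiently.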
@@ -2498,7 +2359,7 @@ To reduce the need for constant data movement between registers and external mem
For larger working datasets, many AI accelerators include scratchpad memory, which offers more storage than caches but with a key difference: it allows explicit software control over what data is stored and when it is evicted. Unlike caches, which rely on hardware-based eviction policies, scratchpad memory enables machine learning workloads to retain key values such as activations and filter weights for multiple layers of computation. This capability is useful in models like convolutional neural networks, where the same input feature maps and filter weights are reused across multiple operations. By keeping this data in scratchpad memory rather than reloading it from external memory, accelerators can significantly reduce unnecessary memory transfers and improve overall efficiency [@Chen2016].
::: {.callout-war-story title="The 5% Utilization Mystery"}
::: {.callout-war-story title="The 5 percent Utilization Mystery"}
**The Context**: Engineers at Tencent deployed a massive Transformer model on NVIDIA A100 GPUs, expecting a $10 \times$ speedup over their old V100s due to the new Tensor Cores.
**The Failure**: The model ran only $1.2 \times$ faster. Profiling revealed the Tensor Cores were active 0% of the time. The team had implemented their custom accumulation kernel in FP32 (32-bit float) to maintain precision.