Possibly wrong computation of total number of weights? #521

Closed
opened 2026-03-22 15:45:08 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @paolo-estavillo on GitHub (Mar 11, 2026).

In the book chapter named "DL Primer", a paragraph in the "Summary" section says the following:

> This chapter established mathematics and systems implications through fully-connected architectures. The multilayer perceptrons explored here demonstrate universal function approximation. With enough neurons and appropriate weights, such networks can theoretically learn any continuous function. This mathematical generality comes with computational costs. Consider our MNIST example: a 28×28 pixel image contains 784 input values, and a fully-connected network treats each pixel independently, learning 61,400 weights just in the first layer (784 inputs × 100 neurons). Neighboring pixels are highly correlated while distant pixels rarely interact. Fully-connected architectures expend computational resources learning irrelevant long-range relationships.

My concern is the following sentence here:

> Consider our MNIST example: a 28×28 pixel image contains 784 input values, and a fully-connected network treats each pixel independently, learning 61,400 weights just in the first layer (784 inputs × 100 neurons).

61,400 is clearly not 784 × 100. If we consider the bias parameter per neuron, shouldn't this be (784 inputs × 100 neurons + 100 bias params = 78,500 params)?

Is this a valid concern? Or did I miss something? How did we come up with 61,400 weights?
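
For reference, here is the arithmetic spelled out as a minimal Python sketch (layer sizes taken from the quoted passage):

```python
# Layer sizes from the quoted MNIST example
inputs, neurons = 28 * 28, 100   # 784 input pixels, 100 first-layer neurons

weights = inputs * neurons       # 78,400 -- nowhere near 61,400
params = weights + neurons       # 78,500 with one bias per neuron
print(weights, params)           # 78400 78500
```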

GiteaMirror added the area: book, type: errata labels 2026-03-22 15:45:08 -05:00
Author
Owner

@profvjreddi commented on GitHub (Mar 11, 2026):

Good catch. This is indeed an error. The arithmetic is straightforward, and I'm pretty sure I know exactly when this snuck in. I was streamlining all the networks to ensure consistency throughout the book, and this was one of those stragglers that didn't get updated with everything else. In the latest version I'll be releasing soon, I've updated everything to ensure there's a single source of truth and that values are always calculated, so these kinds of errors don't happen. Nonetheless, thank you!

Weights only: 784 × 100 = 78,400 weights
Total parameters (weights + biases): 784 × 100 + 100 = 78,500 parameters
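
A quick sanity check, as a sketch assuming a PyTorch environment (`nn.Linear` includes one bias per output neuron by default):

```python
import torch.nn as nn

layer = nn.Linear(784, 100)  # fully-connected layer: 784 inputs -> 100 neurons
n_weights = layer.weight.numel()                       # 784 * 100 = 78,400
n_params = sum(p.numel() for p in layer.parameters())  # + 100 biases = 78,500
print(n_weights, n_params)                             # 78400 78500
```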

All other instances in the chapter already use the correct value. This has been fixed in commit 7c30232, and the live site will be updated shortly.

@all-contributors please add @paolo-estavillo as a contributor for ✍️ Doc in Book

Author
Owner

@github-actions[bot] commented on GitHub (Mar 11, 2026):

I've added @paolo-estavillo as a contributor to book! 🎉

Recognized for: doc
Project(s): book (explicitly mentioned in comment)
Based on: @all-contributors please add @paolo-estavillo as a contributor for ✍️ Doc in Book

The contributor list has been updated in:

  • book/.all-contributorsrc, book/README.md
  • Main README.md

We love recognizing our contributors! ❤️

Reference: github-starred/cs249r_book#521