Possibly wrong computation of total number of weights? #521

Closed
opened 2026-03-22 15:45:08 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @paolo-estavillo on GitHub (Mar 11, 2026).

In the book chapter named "DL Primer", a paragraph in the "Summary" section says the following:

> This chapter established mathematics and systems implications through fully-connected architectures. The multilayer perceptrons explored here demonstrate universal function approximation. With enough neurons and appropriate weights, such networks can theoretically learn any continuous function. This mathematical generality comes with computational costs. Consider our MNIST example: a 28×28 pixel image contains 784 input values, and a fully-connected network treats each pixel independently, learning 61,400 weights just in the first layer (784 inputs × 100 neurons). Neighboring pixels are highly correlated while distant pixels rarely interact. Fully-connected architectures expend computational resources learning irrelevant long-range relationships.

My concern is the following sentence here:

> Consider our MNIST example: a 28×28 pixel image contains 784 input values, and a fully-connected network treats each pixel independently, learning 61,400 weights just in the first layer (784 inputs × 100 neurons).

61,400 is clearly not 784 × 100. If we consider the bias parameter per neuron, shouldn't this be (784 inputs × 100 neurons + 100 bias params = 78,500 params)?

Is this a valid concern? Or did I miss something? How did we come up with 61,400 weights?
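
For reference, here is the arithmetic spelled out as a minimal Python sketch (layer sizes taken from the quoted passage):

```python
# Layer sizes from the quoted MNIST example
inputs, neurons = 28 * 28, 100   # 784 input pixels, 100 first-layer neurons

weights = inputs * neurons       # 78,400 -- nowhere near 61,400
params = weights + neurons       # 78,500 with one bias per neuron
print(weights, params)           # 78400 78500
```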

GiteaMirror added the area: book, type: errata labels 2026-03-22 15:45:08 -05:00
Author
Owner

@profvjreddi commented on GitHub (Mar 11, 2026):

Good catch. This is indeed an error. The arithmetic is straightforward, and I'm pretty sure I know exactly when this snuck in. I was streamlining all the networks to ensure consistency throughout the book, and this was one of those stragglers that didn't get updated with everything else. In the latest version I'll be releasing soon, I've updated everything to ensure there's a single source of truth and that values are always calculated, so these kinds of errors don't happen. Nonetheless, thank you!

Weights only: 784 × 100 = 78,400 weights
Total parameters (weights + biases): 784 × 100 + 100 = 78,500 parameters
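
A quick sanity check, as a sketch assuming a PyTorch environment (`nn.Linear` includes one bias per output neuron by default):

```python
import torch.nn as nn

layer = nn.Linear(784, 100)  # fully-connected layer: 784 inputs -> 100 neurons
n_weights = layer.weight.numel()                       # 784 * 100 = 78,400
n_params = sum(p.numel() for p in layer.parameters())  # + 100 biases = 78,500
print(n_weights, n_params)                             # 78400 78500
```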

All other instances in the chapter already use the correct value. This has been fixed in commit 7c30232, and the live site will be updated shortly.

@all-contributors please add @paolo-estavillo as a contributor for ✍️ Doc in Book

Author
Owner

@github-actions[bot] commented on GitHub (Mar 11, 2026):

I've added @paolo-estavillo as a contributor to book! 🎉

Recognized for: doc
Project(s): book (explicitly mentioned in comment)
Based on: @all-contributors please add @paolo-estavillo as a contributor for ✍️ Doc in Book

The contributor list has been updated in:

  • book/.all-contributorsrc, book/README.md
  • Main README.md

We love recognizing our contributors! ❤️

Reference: github-starred/cs249r_book#521