[GH-ISSUE #974] Missing transpose in Equation of Section 4.2.2 (matrix multiplication inconsistency) #1652

Closed
opened 2026-04-11 08:00:13 -05:00 by GiteaMirror · 3 comments

Originally created by @VThuong99 on GitHub (Oct 11, 2025).
Original GitHub issue: https://github.com/harvard-edge/cs249r_book/issues/974

In Section 4.2.2, the layer computation equation is currently written as:

h^(l) = f^(l)(W^(l) h^(l−1) + b^(l))

However, this is inconsistent with the presentation of the layer’s outputs h^(l) in Section 3.4.2.4.

To make the matrix dimensions consistent, the equation should be:

h^(l) = f^(l)((h^(l−1))^T W^(l) + b^(l))

Suggested fix: Add the transpose to h^(l−1) in the equation and to h^(0) in the example shown in Section 4.2.2.

Location: Section 4.2.2, in the example immediately below the equation.

I may be mistaken, but it seems there might be a minor inconsistency here. If confirmed, I’d be happy to contribute a PR to address it.
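
To make the dimension argument concrete, here is a minimal sketch in plain Python comparing the two conventions. The sizes, values, helper names (`matvec`, `vecmat`, `transpose`, `relu`) and the choice of ReLU for f are assumptions for illustration, not taken from the book. It also assumes W has shape (n_out, n_in) in the column form and (n_in, n_out) in the row form.

```python
# Hypothetical helpers and values, for illustration only.

def matvec(W, h):
    """Column convention: W has shape (n_out, n_in), h has length n_in."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]

def vecmat(h, W):
    """Row convention: h has length n_in, W has shape (n_in, n_out)."""
    n_out = len(W[0])
    return [sum(h[i] * W[i][j] for i in range(len(h))) for j in range(n_out)]

def transpose(W):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*W)]

def relu(v):
    """Elementwise activation, standing in for f^(l)."""
    return [max(0.0, x) for x in v]

h_prev = [1.0, -2.0, 0.5]        # h^(l-1), n_in = 3
W_col = [[0.5, -1.0, 2.0],
         [1.5, 0.25, -0.5]]      # column-form weights, shape (2, 3)
b = [0.1, -0.2]                  # bias, n_out = 2

# Column form: h = f(W h_prev + b)
out_col = relu([z + b_k for z, b_k in zip(matvec(W_col, h_prev), b)])

# Row form: h = f(h_prev^T W_row + b), with W_row = W_col^T of shape (3, 2)
W_row = transpose(W_col)
out_row = relu([z + b_k for z, b_k in zip(vecmat(h_prev, W_row), b)])

print(out_col, out_row)
```

Both forms give the same activations as long as the weight matrix is stored with the matching shape. Pairing the column-shaped `W_col` with the row-vector product is what produces the dimension mismatch.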

GiteaMirror added the area: book label 2026-04-11 08:00:14 -05:00

@profvjreddi commented on GitHub (Oct 11, 2025):

Thanks so much for catching this and writing it up clearly. 🙏 You’re right that the equations should be kept consistent.

I’m traveling right now and won’t be able to look at it in detail until Monday, but if you’re up for it, I’d love for you to open a quick PR with the fix. I’ll make sure to review it as soon as I’m back.

And if you happen to spot any other little inconsistencies or improvements along the way, please keep them coming. These kinds of contributions make the material better for everyone, and I really value and appreciate your time. Plus, anyone who submits a PR automatically gets acknowledged as a contributor 🙂 Thanks again!


@VThuong99 commented on GitHub (Oct 11, 2025):

Hi, I’ve reviewed the notation again. Initially, I noticed that in Section 3.4.2.4 the formula includes a transpose, while in Section 4.2.2 it does not. However, since the following explanations and the example itself don’t depend on whether h is treated as a row or column vector, the simplest and most consistent fix is just to adjust the computation in the example (as I did in this PR).

If you prefer to make it fully consistent with the earlier section, an alternative would be to also transpose h in the main formula.
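
The point that the example does not depend on the row or column choice can be checked with a transpose identity. As a sketch, assume the column form uses a weight matrix of shape n_l × n_(l−1) and that f^(l) acts elementwise; the symbol \widetilde{W}^{(l)} for the row-form weights is introduced here only for illustration.

```latex
\begin{aligned}
\left(h^{(l)}\right)^{\top}
  &= \left(f^{(l)}\!\left(W^{(l)} h^{(l-1)} + b^{(l)}\right)\right)^{\top} \\
  &= f^{(l)}\!\left(\left(h^{(l-1)}\right)^{\top} \left(W^{(l)}\right)^{\top} + \left(b^{(l)}\right)^{\top}\right) \\
  &= f^{(l)}\!\left(\left(h^{(l-1)}\right)^{\top} \widetilde{W}^{(l)} + \left(b^{(l)}\right)^{\top}\right),
  \qquad \widetilde{W}^{(l)} := \left(W^{(l)}\right)^{\top} \in \mathbb{R}^{n_{l-1} \times n_l}.
\end{aligned}
```

Under these assumptions the two equations describe the same layer; they differ only in whether the stored weight matrix is W or its transpose.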


@profvjreddi commented on GitHub (Oct 17, 2025):

Thanks for doing this @VThuong99 - I am back on my computer now, so if you have additional fixes, do let me know.


Reference: github-starred/cs249r_book#1652