Foundations --> CNN Doubts #35

Closed
opened 2025-11-02 00:02:01 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @shashankvasisht on GitHub (Jun 7, 2022).

Hi, thank you for such excellent lessons!

I had 3 doubts about the lecture; could you please explain them:

  1. When we pad the one-hot sequences to the max sequence length, why do we not put a 1 at the 0th index (so that it corresponds to the `<pad>` token)? Why is it currently all zeros?

  2. When we're loading the weights into the `InterpretableCNN` model, why don't we get a weight-mismatch error? (We have dropped the FC layer part, and we're also not using `strict=False`.)

  3. My sns heatmap / conv_outputs have all values equal to 1; it does not resemble yours. Can you help me with this?

![image](https://user-images.githubusercontent.com/13146464/172468816-c2b54a33-b2f6-417c-952f-2ee1d6c074ba.png)

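To make the first question concrete, here is a minimal sketch (hypothetical, not the course's actual `pad_sequences` implementation) of padding one-hot sequences with all-zero rows, which is the behavior the question asks about:

```python
import numpy as np

# Hypothetical sketch: pad one-hot encoded sequences to a max length.
# Each token is a one-hot row; padded positions are filled with all-zero
# rows rather than a one-hot vector with a 1 at the <PAD> index (index 0).
def pad_one_hot(seqs, max_len, vocab_size):
    """Pad a list of one-hot sequences (2D arrays) to max_len with zero rows."""
    padded = np.zeros((len(seqs), max_len, vocab_size), dtype=np.float32)
    for i, seq in enumerate(seqs):
        padded[i, :len(seq)] = seq[:max_len]
    return padded

# Two toy sequences over a vocab of size 4 (index 0 reserved for <PAD>)
s1 = np.eye(4)[[1, 2]]  # tokens 1, 2
s2 = np.eye(4)[[3]]     # token 3
batch = pad_one_hot([s1, s2], max_len=3, vocab_size=4)
print(batch[1])  # last two rows are all zeros, not one-hot <PAD> vectors
```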
Author
Owner

@GokuMohandas commented on GitHub (Jun 8, 2022):

  1. The `<PAD>` token index is 0 in our code (see the `Tokenizer` class), unless it's configured otherwise. But you made me realize that we should pass the index into `pad_sequences` instead of assuming it's 0. These kinds of mismatches lead to silent bugs! I'll push this change towards the end of this month.
  2. Good question, I'll add more details to the lesson to make this clear. But we're actually not dropping the FC layers. If you look at our `__init__` function for `InterpretableCNN`, it has all the layers. The only difference is that we're returning an earlier artifact in the `forward` function.
  3. I've seen this happen if you don't train to completion, but also make sure that the `<PAD>` token index is zero. Until I fix the mismatch in the `pad_sequences` function, we force `<PAD>` to be all zeros.
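The second answer can be sketched as follows. This is a hypothetical minimal example (toy layer sizes, not the course's actual model definitions): both models define the same layers in `__init__`, so their state_dicts have identical keys and shapes, and `load_state_dict` succeeds without `strict=False`; only `forward` differs.

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(8, 16, kernel_size=3)
        self.fc = nn.Linear(16, 2)
    def forward(self, x):
        z = torch.relu(self.conv(x))
        z = z.max(dim=2).values  # global max pool over sequence
        return self.fc(z)

class InterpretableCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Same layers as CNN: the FC layer is kept even though
        # forward() never uses it, so the state_dict keys match.
        self.conv = nn.Conv1d(8, 16, kernel_size=3)
        self.fc = nn.Linear(16, 2)
    def forward(self, x):
        # Return an earlier artifact: the conv activations, not logits.
        return torch.relu(self.conv(x))

model = CNN()
interpretable = InterpretableCNN()
interpretable.load_state_dict(model.state_dict())  # no mismatch: same keys
out = interpretable(torch.randn(1, 8, 10))
print(out.shape)  # conv activations of shape (1, 16, 8)
```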

Reference: github-starred/Made-With-ML#35