No Batch Normalization in CNN? #41

Closed
opened 2025-11-02 00:02:12 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @knosing on GitHub (Nov 24, 2022).

Hi there,
While working through the CNN module, I noticed that batch normalization is never applied in the forward pass, even though `self.batch_norm` is defined in `__init__`:

```python
import math

import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self, vocab_size, num_filters, filter_size,
                 hidden_dim, dropout_p, num_classes):
        super(CNN, self).__init__()

        # Convolutional filters
        self.filter_size = filter_size
        self.conv = nn.Conv1d(
            in_channels=vocab_size, out_channels=num_filters,
            kernel_size=filter_size, stride=1, padding=0, padding_mode="zeros")
        self.batch_norm = nn.BatchNorm1d(num_features=num_filters)

        # FC layers
        self.fc1 = nn.Linear(num_filters, hidden_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, inputs, channel_first=False):
        # Rearrange input so num_channels is in dim 1 (N, C, L)
        x_in, = inputs
        if not channel_first:
            x_in = x_in.transpose(1, 2)

        # `SAME` padding: for stride 1 this amounts to (filter_size - 1)
        # split between the left and right sides
        max_seq_len = x_in.shape[2]
        padding_left = int((self.conv.stride[0]*(max_seq_len-1) - max_seq_len + self.filter_size)/2)
        padding_right = int(math.ceil((self.conv.stride[0]*(max_seq_len-1) - max_seq_len + self.filter_size)/2))

        # Conv outputs
        z = self.conv(F.pad(x_in, (padding_left, padding_right)))
        # ---------MISSING Batch Normalization here ? -----------
        z = F.max_pool1d(z, z.size(2)).squeeze(2)

        # FC layers
        z = self.fc1(z)
        z = self.dropout(z)
        z = self.fc2(z)
        return z
```
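For reference, a minimal standalone sketch (hypothetical layer sizes, not from the repo) of how the already-defined `BatchNorm1d` could be slotted in between the convolution and the pooling; whether this actually helps is an empirical question:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dimensions for illustration only
conv = nn.Conv1d(in_channels=8, out_channels=4, kernel_size=3)
batch_norm = nn.BatchNorm1d(num_features=4)

x = torch.randn(2, 8, 10)                   # (N, C, L)
k, L = 3, x.shape[2]
pad_left = (k - 1) // 2                     # SAME padding for stride 1
pad_right = math.ceil((k - 1) / 2)

z = conv(F.pad(x, (pad_left, pad_right)))   # (2, 4, 10) - length preserved
z = batch_norm(z)                           # normalize each filter over N and L
z = F.max_pool1d(z, z.size(2)).squeeze(2)   # (2, 4)
```

Note that `BatchNorm1d` on a `(N, C, L)` input normalizes each of the `C` filter channels over both the batch and length dimensions, so it fits naturally right after the convolution and before pooling.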

@GokuMohandas commented on GitHub (Nov 25, 2022):

Hi @knosing, you're absolutely welcome to add batch normalization here. I can't recall exactly, but I don't think BN changed the performance much in this case. In general, some tasks benefit from BN layers, but the benefit is almost always validated empirically. That validation matters because batch normalization isn't always a win: with small batches, or when optimizing for training compute/time, it can hurt more than it helps.
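One concrete illustration of the small-batch caveat (a toy demo, not from the repo): in training mode `BatchNorm1d` computes statistics from the batch itself, so a batch of one feature vector has nothing to normalize over and PyTorch raises an error, while eval mode falls back to running statistics:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)
x = torch.randn(1, 4)  # a single sample, shape (N, C) with N=1

bn.train()
try:
    bn(x)  # batch statistics are undefined for one value per channel
except ValueError as e:
    print("train-mode BN fails on a batch of 1:", e)

bn.eval()              # eval mode uses running statistics instead
print(bn(x).shape)     # works fine
```

This is one reason the "add BN and measure" approach is safer than assuming it always helps.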


Reference: github-starred/Made-With-ML#41