No Batch Normalization in CNN? #41

Closed
opened 2025-11-02 00:02:12 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @knosing on GitHub (Nov 24, 2022).

Hi there,
While working through the CNN module, I noticed that batch normalization is never applied in the forward pass, even though `self.batch_norm` is defined in `__init__`:

```python
import math

import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self, vocab_size, num_filters, filter_size,
                 hidden_dim, dropout_p, num_classes):
        super(CNN, self).__init__()

        # Convolutional filters
        self.filter_size = filter_size
        self.conv = nn.Conv1d(
            in_channels=vocab_size, out_channels=num_filters,
            kernel_size=filter_size, stride=1, padding=0, padding_mode="zeros")
        self.batch_norm = nn.BatchNorm1d(num_features=num_filters)

        # FC layers
        self.fc1 = nn.Linear(num_filters, hidden_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, inputs, channel_first=False):
        # Rearrange input so num_channels is in dim 1 (N, C, L)
        x_in, = inputs
        if not channel_first:
            x_in = x_in.transpose(1, 2)

        # `SAME` padding: for stride 1 this amounts to (filter_size - 1)
        # split between the left and right sides
        max_seq_len = x_in.shape[2]
        padding_left = int((self.conv.stride[0]*(max_seq_len-1) - max_seq_len + self.filter_size)/2)
        padding_right = int(math.ceil((self.conv.stride[0]*(max_seq_len-1) - max_seq_len + self.filter_size)/2))

        # Conv outputs
        z = self.conv(F.pad(x_in, (padding_left, padding_right)))
        # ---------MISSING Batch Normalization here ? -----------
        z = F.max_pool1d(z, z.size(2)).squeeze(2)

        # FC layers
        z = self.fc1(z)
        z = self.dropout(z)
        z = self.fc2(z)
        return z
```
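For reference, a minimal standalone sketch (hypothetical layer sizes, not from the repo) of how the already-defined `BatchNorm1d` could be slotted in between the convolution and the pooling; whether this actually helps is an empirical question:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dimensions for illustration only
conv = nn.Conv1d(in_channels=8, out_channels=4, kernel_size=3)
batch_norm = nn.BatchNorm1d(num_features=4)

x = torch.randn(2, 8, 10)                   # (N, C, L)
k, L = 3, x.shape[2]
pad_left = (k - 1) // 2                     # SAME padding for stride 1
pad_right = math.ceil((k - 1) / 2)

z = conv(F.pad(x, (pad_left, pad_right)))   # (2, 4, 10) - length preserved
z = batch_norm(z)                           # normalize each filter over N and L
z = F.max_pool1d(z, z.size(2)).squeeze(2)   # (2, 4)
```

Note that `BatchNorm1d` on a `(N, C, L)` input normalizes each of the `C` filter channels over both the batch and length dimensions, so it fits naturally right after the convolution and before pooling.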

@GokuMohandas commented on GitHub (Nov 25, 2022):

Hi @knosing, you're absolutely welcome to add batch normalization here. I can't recall exactly, but I don't think BN changed the performance much in this case. In general, some tasks benefit from BN layers, but the benefit is almost always validated empirically. That validation matters because batch normalization isn't always a win: with small batches, or when optimizing for training compute/time, it can hurt more than it helps.
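One concrete illustration of the small-batch caveat (a toy demo, not from the repo): in training mode `BatchNorm1d` computes statistics from the batch itself, so a batch of one feature vector has nothing to normalize over and PyTorch raises an error, while eval mode falls back to running statistics:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)
x = torch.randn(1, 4)  # a single sample, shape (N, C) with N=1

bn.train()
try:
    bn(x)  # batch statistics are undefined for one value per channel
except ValueError as e:
    print("train-mode BN fails on a batch of 1:", e)

bn.eval()              # eval mode uses running statistics instead
print(bn(x).shape)     # works fine
```

This is one reason the "add BN and measure" approach is safer than assuming it always helps.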


Reference: github-starred/Made-With-ML#41