From the course: Computer Vision for Data Scientists

Zero weight decay on BatchNorm and bias

- [Instructor] Weight decay is a regularization technique commonly used in training neural networks, where a penalty on the weights is added to the loss function to prevent them from growing too large. Most computer vision tasks involve models with BatchNorm layers and biases added to linear or convolutional layers. More parameters in the model can help capture interactions between different parts of the network, but they also increase the risk of overfitting. That is where regularization techniques like weight decay come into play. Weight decay essentially pulls weights towards zero. While this is beneficial for convolutional and linear layer weights, it is not ideal for BatchNorm layers. BatchNorm layer parameters are meant to scale and shift the normalized input of the layer. Forcing these parameters towards zero would adversely affect their distribution and result in an inferior model. BatchNorm itself has a slight…
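To make the idea concrete, here is a minimal PyTorch sketch (not code from the course) of how parameters can be split into two optimizer groups so that convolutional and linear weights receive weight decay while BatchNorm parameters and biases do not. The model choice and the hyperparameter values are placeholders for illustration only.

```python
import torch
from torchvision import models

# Placeholder model; any network with conv/linear weights, biases, and BatchNorm works.
model = models.resnet18(num_classes=10)

decay, no_decay = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    # Biases and BatchNorm scale/shift parameters are 1-D tensors; keep them out of
    # weight decay so their scaling and shifting behavior is not penalized.
    if param.ndim == 1 or name.endswith(".bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},    # conv/linear weights get decay
        {"params": no_decay, "weight_decay": 0.0},  # BatchNorm params and biases do not
    ],
    lr=0.1,         # placeholder learning rate
    momentum=0.9,   # placeholder momentum
)
```

The per-group weight_decay setting overrides any optimizer-level default, so only the first group's weights are pulled towards zero during training.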