From the course: Computer Vision for Data Scientists
Zero weight decay on BatchNorm and bias
- [Instructor] Weight decay is a regularization technique commonly used in training neural networks, where a penalty on the weights is added to the loss function to prevent them from growing too large. Most computer vision models contain BatchNorm layers as well as biases on their linear or convolutional layers. More parameters help the model capture interactions between different parts of the network, but they also increase the risk of overfitting. That is where regularization techniques like weight decay come into play. Weight decay essentially pulls weights toward zero. While this is beneficial for convolutional and linear layer weights, it is not ideal for BatchNorm layers, whose parameters are meant to scale and shift the layer's normalized input. Forcing those parameters toward zero distorts that scaling and shifting and results in an inferior model. BatchNorm itself has a slight…
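As a rough sketch of how this is commonly set up in PyTorch (the model, learning rate, and weight-decay value below are illustrative assumptions, not taken from the course), you can split the parameters into two optimizer groups: one for ordinary convolutional and linear weights that receive weight decay, and one for BatchNorm parameters and biases that receive zero weight decay.

```python
import torch
import torch.nn as nn

# Hypothetical small model used only to illustrate the grouping.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

decay, no_decay = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    # BatchNorm scale/shift parameters and all bias vectors are 1-D tensors,
    # so excluding parameters with ndim <= 1 covers both cases.
    if param.ndim <= 1 or name.endswith(".bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},   # conv/linear weights: decayed
        {"params": no_decay, "weight_decay": 0.0}, # BatchNorm + bias: no decay
    ],
    lr=0.1,
    momentum=0.9,
)
```

The 1-D-tensor check is a common heuristic rather than a rule: it works because BatchNorm weights, BatchNorm biases, and layer biases are all one-dimensional, while convolutional and linear weight matrices are not.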