What does batch normalization do, and what are its parameters?
Anonymous
It normalizes each channel of a mini-batch to zero mean and unit variance, computing the statistics over the batch and spatial dimensions (e.g., N, H, W if we have NCHW feature maps). During training it keeps track of a running mean and variance, which are used in place of the batch statistics at test time. It can optionally keep learnable per-channel scale and shift parameters that are applied to the normalized feature maps, so the layer can undo the normalization if that turns out to be useful.
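To make the answer concrete, here is a minimal NumPy sketch of the forward pass for NCHW input. The function name batch_norm_nchw, the momentum and eps defaults, and the moving-average convention are assumptions chosen for illustration, not any particular library's API. The running variance here uses the biased batch variance, which is a simplification; some frameworks store an unbiased estimate instead.

```python
import numpy as np

def batch_norm_nchw(x, running_mean, running_var, gamma=None, beta=None,
                    training=True, momentum=0.1, eps=1e-5):
    """Hypothetical helper: batch norm on an NCHW array.

    Returns (out, running_mean, running_var).
    """
    if training:
        # Per-channel statistics over the batch and spatial axes (N, H, W).
        mean = x.mean(axis=(0, 2, 3))
        var = x.var(axis=(0, 2, 3))
        # Exponential moving average of the statistics, kept for test time.
        # (Assumed convention: new = (1 - momentum) * old + momentum * batch.)
        running_mean = (1 - momentum) * running_mean + momentum * mean
        running_var = (1 - momentum) * running_var + momentum * var
    else:
        # At test time, use the tracked statistics instead of batch statistics.
        mean, var = running_mean, running_var

    # Reshape (C,) -> (1, C, 1, 1) so the statistics broadcast per channel.
    shape = (1, -1, 1, 1)
    out = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)

    # Optional learnable scale (gamma) and shift (beta), one value per channel.
    if gamma is not None:
        out = out * gamma.reshape(shape)
    if beta is not None:
        out = out + beta.reshape(shape)
    return out, running_mean, running_var
```

Calling it with training=True on a batch of shape (N, C, H, W) and arrays of shape (C,) for the running statistics should give an output whose per-channel mean is close to 0 and variance close to 1 when gamma and beta are omitted. Calling it again with training=False reuses the returned running statistics.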