Dying ReLU

  1. Dying ReLU problem:
    • When the input is negative, ReLU's output is zero and so is its gradient.
    • This can cause neurons to stop learning entirely if they consistently receive negative inputs (see the sketch after this list).
  2. Information loss:
    • By mapping all negative values to zero, ReLU loses information about the degree of "negativeness."
  3. Mean activation drift:
    • Because ReLU outputs are never negative, the mean activation drifts away from zero, which can destabilize training dynamics in very deep networks.
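
A minimal NumPy sketch of the first problem (function names like relu_grad are illustrative, not from any library): for every negative input the gradient is exactly zero, so no update can flow through a neuron stuck in that regime.

```python
import numpy as np

def relu(x):
    # ReLU passes positive inputs through and maps negatives to zero.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where x > 0 and 0 where x <= 0, so a neuron whose
    # pre-activations stay negative receives no gradient and stops learning.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```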

To address these issues, several variants have been developed (sketched in code after the list):

  1. Leaky ReLU:
    • Allows a small, non-zero gradient when the input is negative.
    • f(x) = αx for x < 0, f(x) = x for x ≥ 0, where α is a small constant (e.g., 0.01).
  2. Parametric ReLU (PReLU):
    • Similar to Leaky ReLU, but α is learned during training.
  3. Exponential Linear Unit (ELU):
    • Smooth transition around zero and can produce negative outputs: f(x) = α(exp(x) − 1) for x < 0, f(x) = x for x ≥ 0.
  4. Swish:
    • f(x) = x · sigmoid(x), proposed by researchers at Google.
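
A minimal NumPy sketch of the four variants under the definitions above (the function names and default α values are illustrative assumptions, not a library API). Note that PReLU's α is passed in explicitly here; in practice it would be updated by the optimizer during training.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # A small fixed slope alpha keeps a non-zero gradient for x < 0.
    return np.where(x >= 0, x, alpha * x)

def prelu(x, alpha):
    # Same form as Leaky ReLU, except alpha is a learned parameter;
    # here it is supplied explicitly for illustration.
    return np.where(x >= 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth around zero; saturates to -alpha for large negative inputs.
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def swish(x):
    # f(x) = x * sigmoid(x): smooth and non-monotonic.
    return x / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)
print(leaky_relu(x))
print(prelu(x, alpha=0.25))
print(elu(x))
print(swish(x))
```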