Dying ReLU
- Dying ReLU problem:
  - When the input is negative, the gradient is zero.
  - Neurons that consistently receive negative inputs can therefore stop learning entirely (a minimal numeric sketch follows this list).
- Information loss:
  - By mapping all negative values to zero, ReLU discards information about the degree of "negativeness."
- Mean activation drift:
  - Because its outputs are non-negative, ReLU can cause the mean activation to drift, potentially leading to unstable dynamics in very deep networks.
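To make the first point concrete, here is a minimal sketch of a "dead" ReLU unit. The weights, learning rate, and squared-error loss are illustrative assumptions, not taken from the text above; the point is that a persistently negative pre-activation yields a zero gradient, so gradient descent never updates the weights.

```python
import numpy as np

x = np.array([1.0, 2.0])      # a single training input
w = np.array([-3.0, -1.0])    # weights chosen so that w @ x + b < 0
b = 0.0
target = 1.0
lr = 0.1

for step in range(5):
    z = w @ x + b                              # pre-activation (negative here)
    a = max(z, 0.0)                            # ReLU activation
    grad_a = 2.0 * (a - target)                # dL/da for L = (a - target)^2
    grad_z = grad_a * (1.0 if z > 0 else 0.0)  # ReLU derivative kills the gradient
    w -= lr * grad_z * x                       # no change: grad_z == 0
    b -= lr * grad_z
    print(f"step {step}: z={z:.2f}, grad_z={grad_z:.2f}, w={w}")
```

Every iteration prints a zero gradient and unchanged weights: the unit is stuck in the negative regime and never recovers.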
To address these issues, several variants have been developed:
- Leaky ReLU:
  - Allows a small, non-zero gradient when the input is negative.
  - f(x) = max(αx, x), where α is a small constant (e.g., 0.01).
- Parametric ReLU (PReLU):
- Similar to Leaky ReLU, but
is learned during training.
- Similar to Leaky ReLU, but
- Exponential Linear Unit (ELU):
  - Smoother transition around zero and can produce negative outputs: f(x) = x for x > 0, and α(exp(x) − 1) otherwise.
- Swish:
  - f(x) = x · sigmoid(x), proposed by researchers at Google.
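For reference, the following NumPy sketch implements these variants. The default parameter values and the sample inputs are illustrative assumptions; in practice, PReLU's α would be a trainable parameter inside a framework rather than a plain argument.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small slope alpha for negative inputs keeps the gradient non-zero.
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # Same form as Leaky ReLU, but alpha is learned during training
    # (here it is simply passed in for illustration).
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth around zero; saturates to -alpha for very negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x):
    # x * sigmoid(x); smooth and non-monotonic.
    return x / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
for name, fn in [("leaky_relu", leaky_relu), ("elu", elu), ("swish", swish)]:
    print(name, np.round(fn(x), 3))
print("prelu", np.round(prelu(x, alpha=0.2), 3))
```

Unlike plain ReLU, all four functions keep a non-zero gradient (or at least a non-zero output) for negative inputs, which is what mitigates the dying-ReLU problem described above.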