Dying ReLU

  1. Dying ReLU problem:
    • When the input is negative, ReLU's output is zero and so is its gradient.
    • This can cause neurons to stop learning entirely if they consistently receive negative inputs (see the sketch after this list).
  2. Information loss:
    • By mapping all negative values to zero, ReLU loses information about the degree of "negativeness."
  3. Mean activation drift:
    • Because ReLU outputs are never negative, the mean activation drifts away from zero, which can destabilize training dynamics in very deep networks.
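
A minimal NumPy sketch of the first problem (function names like relu_grad are illustrative, not from any library): for every negative input the gradient is exactly zero, so no update can flow through a neuron stuck in that regime.

```python
import numpy as np

def relu(x):
    # ReLU passes positive inputs through and maps negatives to zero.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where x > 0 and 0 where x <= 0, so a neuron whose
    # pre-activations stay negative receives no gradient and stops learning.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```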

To address these issues, several variants have been developed (sketched in code after the list):

  1. Leaky ReLU:
    • Allows a small, non-zero gradient when the input is negative.
    • f(x) = αx for x < 0, f(x) = x for x ≥ 0, where α is a small constant (e.g., 0.01).
  2. Parametric ReLU (PReLU):
    • Similar to Leaky ReLU, but α is learned during training.
  3. Exponential Linear Unit (ELU):
    • Smooth transition around zero and can produce negative outputs: f(x) = α(exp(x) − 1) for x < 0, f(x) = x for x ≥ 0.
  4. Swish:
    • f(x) = x · sigmoid(x), proposed by researchers at Google.
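
A minimal NumPy sketch of the four variants under the definitions above (the function names and default α values are illustrative assumptions, not a library API). Note that PReLU's α is passed in explicitly here; in practice it would be updated by the optimizer during training.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # A small fixed slope alpha keeps a non-zero gradient for x < 0.
    return np.where(x >= 0, x, alpha * x)

def prelu(x, alpha):
    # Same form as Leaky ReLU, except alpha is a learned parameter;
    # here it is supplied explicitly for illustration.
    return np.where(x >= 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth around zero; saturates to -alpha for large negative inputs.
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def swish(x):
    # f(x) = x * sigmoid(x): smooth and non-monotonic.
    return x / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)
print(leaky_relu(x))
print(prelu(x, alpha=0.25))
print(elu(x))
print(swish(x))
```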