ReLU
ReLU (Rectified Linear Unit) is an activation function that outputs the input directly if it's positive, and zero otherwise.
Equation: f(x) = max(0, x)
Python Implementation:
import numpy as np

def relu(x):
    # Element-wise: negative values become 0, positive values pass through unchanged
    return np.maximum(0, x)
Why used:
- Simplicity: ReLU is computationally cheap, requiring only a single max(0, x) comparison per element.
- Sparsity: It naturally creates sparse representations, since negative inputs become exactly zero (illustrated in the sketch after this list).
- Gradient flow: For positive inputs the gradient is exactly 1, so ReLU allows better gradient flow during backpropagation and mitigates the vanishing gradient problem.
- Non-linearity: It introduces non-linearity without saturating for large positive inputs, unlike sigmoid or tanh.
- Biological plausibility: Its one-sided, thresholded response is closer to the firing behavior of biological neurons than earlier activation functions.
The success of ReLU led to several variants and improvements (two of which are sketched after this list):
- Leaky ReLU
- Parametric ReLU (PReLU)
- Exponential Linear Unit (ELU)
- Scaled Exponential Linear Unit (SELU)
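As a rough sketch, Leaky ReLU and ELU can be written in the same NumPy style as the relu function above; the alpha values below are common defaults, assumed here for illustration rather than taken from the text.

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs keep a small slope alpha instead of becoming exactly zero
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Negative inputs decay smoothly toward -alpha instead of a hard cutoff at zero
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

Both keep the identity behavior for positive inputs, so the gradient-flow benefit of ReLU is preserved while avoiding completely dead units for negative inputs.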