Batch Normalization: Exercise Solutions
Exercise 1: Basic Batch Normalization Calculation
Given:
X = [
[2, 4, 6],
[4, 6, 8],
[6, 8, 10],
[8, 10, 12]
]
a) Mini-batch mean (μᵦ):
μᵦ = [5, 7, 9]
b) Mini-batch variance (σ²ᵦ):
σ²ᵦ = [5, 5, 5]
c) Normalized values (x̂), ε = 0.01:
x̂ = [
[-1.34, -1.34, -1.34],
[-0.45, -0.45, -0.45],
[0.45, 0.45, 0.45],
[1.34, 1.34, 1.34]
]
d) Final output (y), γ = 2, β = 1 (applied to the unrounded x̂):
y = [
[-1.68, -1.68, -1.68],
[0.11, 0.11, 0.11],
[1.89, 1.89, 1.89],
[3.68, 3.68, 3.68]
]
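The forward pass above can be checked with a short pure-Python sketch that computes the statistics feature-wise, i.e. over the columns of X:

```python
import math

# Worked check of Exercise 1's forward pass: per-feature (column) batch norm.
X = [
    [2, 4, 6],
    [4, 6, 8],
    [6, 8, 10],
    [8, 10, 12],
]
eps, gamma, beta = 0.01, 2.0, 1.0
n = len(X)

cols = list(zip(*X))                    # one tuple per feature column
mu = [sum(c) / n for c in cols]         # mini-batch mean per feature
var = [sum((v - m) ** 2 for v in c) / n # biased (population) variance,
       for c, m in zip(cols, mu)]       # as in the BN forward pass
x_hat = [[(v - m) / math.sqrt(s + eps)  # normalize with epsilon inside sqrt
          for v, m, s in zip(row, mu, var)]
         for row in X]
y = [[gamma * v + beta for v in row] for row in x_hat]  # scale and shift
```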
Exercise 2: Backpropagation through Batch Normalization
Given: x = [1, 2, 3], ∂L/∂x̂ = [0.1, 0.2, 0.3], ε = 0.01
First, calculate intermediate values:
μ = 2, σ² = 0.6667, √(σ² + ε) = √0.6767 = 0.8226
a) ∂L/∂x:
∂L/∂xᵢ = (∂L/∂x̂ᵢ) / √(σ² + ε) + ∂L/∂σ² · 2(xᵢ - μ)/N + ∂L/∂μ / N
Using the results of (b) and (c) below:
∂L/∂x = [-0.0018, 0.0000, 0.0018]
b) ∂L/∂μ:
∂L/∂μ = -Σᵢ (∂L/∂x̂ᵢ) / √(σ² + ε) = -0.6 / 0.8226 = -0.7294
(the term ∂L/∂σ² · (-2/N) Σᵢ (xᵢ - μ) vanishes because Σᵢ (xᵢ - μ) = 0)
c) ∂L/∂σ²:
∂L/∂σ² = Σᵢ ∂L/∂x̂ᵢ · (xᵢ - μ) · (-½)(σ² + ε)^(-3/2) = 0.2 · (-0.5) · 1.7965 = -0.1797
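These gradients follow the standard batch-norm backward formulas. A minimal pure-Python sketch (variable names such as `dl_dxhat` are chosen here for illustration):

```python
import math

# Backward pass through batch normalization for a 1-D mini-batch.
x = [1.0, 2.0, 3.0]
dl_dxhat = [0.1, 0.2, 0.3]   # upstream gradient ∂L/∂x̂
eps = 0.01
n = len(x)

mu = sum(x) / n
var = sum((v - mu) ** 2 for v in x) / n
std = math.sqrt(var + eps)

# ∂L/∂σ² = Σ ∂L/∂x̂ᵢ (xᵢ - μ) · (-1/2)(σ² + ε)^(-3/2)
dl_dvar = sum(g * (v - mu) for g, v in zip(dl_dxhat, x)) \
          * (-0.5) * (var + eps) ** -1.5

# ∂L/∂μ = -Σ ∂L/∂x̂ᵢ / √(σ² + ε) + ∂L/∂σ² · Σ -2(xᵢ - μ)/N
dl_dmu = -sum(dl_dxhat) / std + dl_dvar * sum(-2 * (v - mu) for v in x) / n

# ∂L/∂xᵢ = ∂L/∂x̂ᵢ / √(σ² + ε) + ∂L/∂σ² · 2(xᵢ - μ)/N + ∂L/∂μ / N
dl_dx = [g / std + dl_dvar * 2 * (v - mu) / n + dl_dmu / n
         for g, v in zip(dl_dxhat, x)]
```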
Exercise 3: Effect of Batch Size on Normalization
Given: X = [1, 2, 3, 4, 5, 6]
a) Single batch of 6 samples:
μ = 3.5, σ² = 2.9167
x̂ = [-1.4639, -0.8783, -0.2928, 0.2928, 0.8783, 1.4639]
b) Two mini-batches of 3 samples:
Batch 1: μ = 2, σ² = 0.6667
x̂₁ = [-1.2247, 0, 1.2247]
Batch 2: μ = 5, σ² = 0.6667
x̂₂ = [-1.2247, 0, 1.2247]
Because μ and σ² are computed per batch, the same data point is normalized differently depending on how the data is batched: for example, x = 3 maps to about -0.29 in the single batch of 6 but to +1.22 in its mini-batch of 3. Smaller batches yield noisier statistics that are less representative of the full data distribution.
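The comparison can be sketched in a few lines of pure Python (the helper name `batch_norm` is illustrative; ε is omitted to match the hand calculation):

```python
import math

# Batch-size effect: the same value is normalized differently
# depending on which mini-batch it lands in.
def batch_norm(batch):
    n = len(batch)
    mu = sum(batch) / n
    var = sum((v - mu) ** 2 for v in batch) / n
    return [(v - mu) / math.sqrt(var) for v in batch]

data = [1, 2, 3, 4, 5, 6]
full = batch_norm(data)                               # one batch of 6
halves = batch_norm(data[:3]) + batch_norm(data[3:])  # two batches of 3
```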
Exercise 4: Conditional Batch Normalization
Given:
Class 0: [1, 2, 3], γ₀ = 1, β₀ = 0
Class 1: [4, 5, 6], γ₁ = 2, β₁ = 1
For Class 0:
μ₀ = 2, σ²₀ = 0.6667
x̂₀ = [-1.2247, 0, 1.2247]
y₀ = [-1.2247, 0, 1.2247]
For Class 1:
μ₁ = 5, σ²₁ = 0.6667
x̂₁ = [-1.2247, 0, 1.2247]
y₁ = [-1.4494, 1, 3.4494]
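A sketch of conditional batch norm as applied above: each class is normalized with its own statistics and then transformed with its own γ and β (the helper name `conditional_bn` is illustrative; ε is omitted to match the hand calculation):

```python
import math

# Conditional batch norm: normalization per group, but class-specific
# affine parameters gamma and beta.
def conditional_bn(batch, gamma, beta):
    n = len(batch)
    mu = sum(batch) / n
    var = sum((v - mu) ** 2 for v in batch) / n
    return [gamma * (v - mu) / math.sqrt(var) + beta for v in batch]

y0 = conditional_bn([1, 2, 3], gamma=1.0, beta=0.0)  # Class 0
y1 = conditional_bn([4, 5, 6], gamma=2.0, beta=1.0)  # Class 1
```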
Bonus Challenge: Adaptive Normalization
One possible solution:
γ = |μ|, β = sign(μ)
y = |μ| * ((x - μ) / √(σ² + ε)) + sign(μ)
Reasoning: This scheme adapts the scale (γ) based on the magnitude of the mean, and the shift (β) based on whether the mean is positive or negative. This could help preserve information about the original scale and sign of the data while still normalizing.
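One way to sketch this scheme in code (ε = 0.01 is an assumption here, matching the earlier exercises; the name `adaptive_norm` is illustrative):

```python
import math

# Adaptive normalization sketch: gamma = |mu|, beta = sign(mu),
# so the output keeps the scale and sign information of the batch mean.
def adaptive_norm(batch, eps=0.01):
    n = len(batch)
    mu = sum(batch) / n
    var = sum((v - mu) ** 2 for v in batch) / n
    gamma = abs(mu)
    beta = math.copysign(1.0, mu) if mu != 0 else 0.0  # sign(mu)
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in batch]
```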