Batch Normalization: Exercise Solutions

Exercise 1: Basic Batch Normalization Calculation

Given:

X = [
    [2, 4, 6],
    [4, 6, 8],
    [6, 8, 10],
    [8, 10, 12]
]

a) Mini-batch mean (μᵦ):
μᵦ = [5, 7, 9]

b) Mini-batch variance (σ²ᵦ):
σ²ᵦ = [5, 5, 5]

c) Normalized values (x̂), ε = 0.01:
x̂ = [
    [-1.34, -1.34, -1.34],
    [-0.45, -0.45, -0.45],
    [0.45, 0.45, 0.45],
    [1.34, 1.34, 1.34]
]

d) Final output (y), γ = 2, β = 1 (computed as y = γx̂ + β from the unrounded x̂):
y = [
    [-1.68, -1.68, -1.68],
    [0.11, 0.11, 0.11],
    [1.89, 1.89, 1.89],
    [3.68, 3.68, 3.68]
]
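The forward computation above can be checked with a few lines of NumPy (a sketch; samples are laid out in rows and features in columns, as in the exercise):

```python
import numpy as np

# Verify the forward batch-normalization computation from Exercise 1.
# Statistics are taken per feature, i.e. over the batch axis (axis=0).
X = np.array([[2, 4, 6],
              [4, 6, 8],
              [6, 8, 10],
              [8, 10, 12]], dtype=float)
eps, gamma, beta = 0.01, 2.0, 1.0

mu = X.mean(axis=0)                    # [5. 7. 9.]
var = X.var(axis=0)                    # biased (divide-by-N) variance: [5. 5. 5.]
x_hat = (X - mu) / np.sqrt(var + eps)  # normalize each feature
y = gamma * x_hat + beta               # learned scale and shift

print(np.round(x_hat, 2))
print(np.round(y, 2))
```

Note that NumPy's `var` is the biased (divide-by-N) variance, which is what batch normalization uses for the mini-batch statistics.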

Exercise 2: Backpropagation through Batch Normalization

Given: x = [1, 2, 3], ∂L/∂x̂ = [0.1, 0.2, 0.3], ε = 0.01

First, calculate intermediate values:
μ = 2, σ² = 0.6667, √(σ² + ε) = 0.8226
x̂ = [-1.2156, 0, 1.2156]

a) ∂L/∂x (uses the results of parts b and c):
∂L/∂xᵢ = ∂L/∂x̂ᵢ / √(σ² + ε) + ∂L/∂σ² · 2(xᵢ − μ)/N + ∂L/∂μ / N
∂L/∂x ≈ [-0.0018, 0, 0.0018]

b) ∂L/∂μ:
∂L/∂μ = (Σᵢ ∂L/∂x̂ᵢ) · (−1/√(σ² + ε)) = 0.6 · (−1.2156) ≈ −0.7294
(the contribution via ∂L/∂σ² vanishes because Σᵢ(xᵢ − μ) = 0)

c) ∂L/∂σ²:
∂L/∂σ² = Σᵢ ∂L/∂x̂ᵢ · (xᵢ − μ) · (−1/2)(σ² + ε)^(−3/2) = 0.2 · (−0.5) · 1.7964 ≈ −0.1796
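These gradients can be checked numerically with a short NumPy sketch (variable names are illustrative):

```python
import numpy as np

# Backward pass through batch normalization for Exercise 2, following the
# standard chain-rule decomposition through x_hat, the variance, and the mean.
x = np.array([1.0, 2.0, 3.0])
dL_dxhat = np.array([0.1, 0.2, 0.3])
eps = 0.01
N = x.size

mu = x.mean()
var = x.var()
std = np.sqrt(var + eps)

dL_dvar = np.sum(dL_dxhat * (x - mu)) * (-0.5) * (var + eps) ** -1.5
dL_dmu = np.sum(dL_dxhat) * (-1 / std) + dL_dvar * np.mean(-2 * (x - mu))
dL_dx = dL_dxhat / std + dL_dvar * 2 * (x - mu) / N + dL_dmu / N

print(dL_dvar, dL_dmu)   # ≈ -0.1796, -0.7294
print(dL_dx)             # ≈ [-0.0018, 0., 0.0018]
```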

Exercise 3: Effect of Batch Size on Normalization

Given: X = [1, 2, 3, 4, 5, 6]

a) Single batch of 6 samples:
μ = 3.5, σ² = 2.9167
x̂ = [-1.4639, -0.8783, -0.2928, 0.2928, 0.8783, 1.4639]

b) Two mini-batches of 3 samples:
Batch 1: μ = 2, σ² = 0.6667
x̂₁ = [-1.2247, 0, 1.2247]

Batch 2: μ = 5, σ² = 0.6667
x̂₂ = [-1.2247, 0, 1.2247]

Because each mini-batch is normalized with its own mean and variance, the same data point maps to a different normalized value depending on which batch it falls in (compare x = 1 in part a with x = 1 in batch 1 of part b). Smaller batches also give noisier estimates of the dataset statistics.
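The batch-size effect can be reproduced directly (a sketch with ε = 0, matching the mini-batch numbers above):

```python
import numpy as np

def batchnorm(x, eps=0.0):
    # Normalize a 1-D batch using its own mean and (biased) variance.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

full = batchnorm(X)        # one batch of 6 samples
b1 = batchnorm(X[:3])      # mini-batch [1, 2, 3]
b2 = batchnorm(X[3:])      # mini-batch [4, 5, 6]

print(np.round(full, 4))   # [-1.4639 -0.8783 -0.2928  0.2928  0.8783  1.4639]
print(np.round(b1, 4))     # [-1.2247  0.      1.2247]
print(np.round(b2, 4))     # [-1.2247  0.      1.2247]
```

Both mini-batches normalize to identical outputs even though their raw values differ, illustrating how normalization erases the between-batch offset.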

Exercise 4: Conditional Batch Normalization

Given:
Class 0: [1, 2, 3], γ₀ = 1, β₀ = 0
Class 1: [4, 5, 6], γ₁ = 2, β₁ = 1

For Class 0:
μ₀ = 2, σ²₀ = 0.6667
x̂₀ = [-1.2247, 0, 1.2247]
y₀ = [-1.2247, 0, 1.2247]

For Class 1:
μ₁ = 5, σ²₁ = 0.6667
x̂₁ = [-1.2247, 0, 1.2247]
y₁ = [-1.4494, 1, 3.4494]
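A minimal sketch of conditional batch normalization as used in this exercise (the helper name `cond_batchnorm` is illustrative): the normalization step is the same for every class, but each class supplies its own affine parameters γ and β.

```python
import numpy as np

def cond_batchnorm(x, gamma, beta, eps=0.0):
    # Shared normalization, class-specific scale and shift.
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma * x_hat + beta

y0 = cond_batchnorm(np.array([1.0, 2.0, 3.0]), gamma=1.0, beta=0.0)
y1 = cond_batchnorm(np.array([4.0, 5.0, 6.0]), gamma=2.0, beta=1.0)

print(np.round(y0, 3))  # ≈ [-1.225, 0.0, 1.225]
print(np.round(y1, 3))  # ≈ [-1.449, 1.0, 3.449]
```

In practice the (γ, β) pairs would be rows of learned embedding tables indexed by the class label.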

Bonus Challenge: Adaptive Normalization

One possible solution:
γ = |μ|, β = sign(μ)

y = |μ| * ((x - μ) / √(σ² + ε)) + sign(μ)

Reasoning: This scheme adapts the scale (γ) based on the magnitude of the mean, and the shift (β) based on whether the mean is positive or negative. This could help preserve information about the original scale and sign of the data while still normalizing.
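The proposed scheme can be sketched as follows (the helper name `adaptive_norm` is illustrative, and ε = 0.01 is assumed as in the earlier exercises):

```python
import numpy as np

def adaptive_norm(x, eps=0.01):
    # Adaptive scheme from the bonus challenge: gamma = |mu|, beta = sign(mu),
    # so the output keeps the original scale and sign of the batch mean.
    mu = x.mean()
    x_hat = (x - mu) / np.sqrt(x.var() + eps)
    return np.abs(mu) * x_hat + np.sign(mu)

pos = adaptive_norm(np.array([4.0, 5.0, 6.0]))    # mean 5  -> scale 5, shift +1
neg = adaptive_norm(np.array([-4.0, -5.0, -6.0])) # mean -5 -> scale 5, shift -1
print(pos, neg)
```

Mirrored inputs produce mirrored outputs, showing that the sign and magnitude of the original mean survive normalization.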