Matérn Kernel: Mathematical Deep Dive and Implementation

1. Mathematical Definition

The Matérn Kernel is defined as:

$k(x_1, x_2) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\sqrt{2\nu}\, d\right)^{\nu} K_{\nu}\left(\sqrt{2\nu}\, d\right)$

Where:

$x_1, x_2 \in \mathbb{R}^n$ (n-dimensional real vector space)
$d = \sqrt{(x_1 - x_2)^\top \Theta^{-2} (x_1 - x_2)}$ (scaled distance)
ν is the smoothness parameter
$\Gamma(\cdot)$ is the gamma function
$K_{\nu}(\cdot)$ is the modified Bessel function of the second kind
$\Theta$ is the lengthscale parameter (a diagonal matrix of per-dimension lengthscales when ARD is used)
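
As a sanity check on the definition, the formula can be evaluated numerically. The sketch below is pure Python and approximates $K_{\nu}$ with the integral representation $K_{\nu}(z) = \int_0^\infty e^{-z \cosh t} \cosh(\nu t)\, dt$ and a trapezoidal rule. The function names, step size, and cutoff are illustrative assumptions, not part of any library; in practice a library Bessel routine would be the usual choice.

```python
import math

def bessel_k(nu, z, step=0.01, t_max=20.0):
    """Approximate K_nu(z) for z > 0 with the trapezoidal rule applied to
    exp(-z*cosh(t)) * cosh(nu*t) on [0, t_max] (illustrative helper)."""
    total = 0.5 * math.exp(-z)  # t = 0 endpoint carries half weight
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        # The integrand decays double-exponentially, so the far endpoint is negligible.
        total += math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return total * step

def matern_general(d, nu):
    """Evaluate the general Matérn formula at scaled distance d >= 0."""
    if d == 0.0:
        return 1.0  # limiting value as d -> 0
    z = math.sqrt(2.0 * nu) * d
    return 2.0 ** (1.0 - nu) / math.gamma(nu) * z ** nu * bessel_k(nu, z)

# For nu = 0.5 the result should agree with the exponential kernel exp(-d).
print(matern_general(0.7, 0.5), math.exp(-0.7))
```

The comparison in the last line is only a spot check at one distance; it does not replace the closed-form expressions used for the supported ν values.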

2. Key Properties

2.1 Smoothness Control

The ν parameter controls the smoothness of the kernel:

ν = 1/2: Exponential kernel (least smooth; sample functions are continuous but not differentiable)
ν = 3/2: Once differentiable functions
ν = 5/2: Twice differentiable functions
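
One way to see the smoothness difference is to look at the kernel's slope at zero distance. The sketch below uses the standard closed forms for the three ν values and a one-sided finite difference; the helper names and the step `h` are assumptions made for illustration. A slope near -1 for ν = 1/2 reflects a kink at the origin, while the slopes for ν = 3/2 and ν = 5/2 vanish as `h` shrinks.

```python
import math

def matern(d, nu):
    """Closed-form Matérn value for nu in {0.5, 1.5, 2.5}; d is a scaled distance >= 0."""
    z = math.sqrt(2.0 * nu) * d
    poly = {0.5: 1.0, 1.5: 1.0 + z, 2.5: 1.0 + z + z * z / 3.0}[nu]
    return poly * math.exp(-z)

def slope_at_origin(nu, h=1e-4):
    """One-sided finite-difference slope of the kernel at d = 0."""
    return (matern(h, nu) - matern(0.0, nu)) / h

for nu in (0.5, 1.5, 2.5):
    print(nu, slope_at_origin(nu))
```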

2.2 Positive Definiteness

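The Matérn kernel is positive definite for every ν > 0 and every positive lengthscale, so the Gram matrix built on any set of distinct inputs is symmetric positive definite.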
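
A quick way to probe this property numerically is to build a small Gram matrix and attempt a Cholesky factorization, which succeeds only for positive definite matrices. The sketch below is pure Python with ν = 3/2; the sample points and helper names are arbitrary choices for illustration.

```python
import math

def matern15(d):
    """Matérn kernel with nu = 3/2 at scaled distance d."""
    z = math.sqrt(3.0) * d
    return (1.0 + z) * math.exp(-z)

def cholesky(a):
    """Return lower-triangular L with L L^T = a.
    Raises ValueError if a is not (numerically) positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

points = [0.0, 0.3, 1.1, 2.5]  # distinct 1-D inputs, lengthscale 1
gram = [[matern15(abs(p - q)) for q in points] for p in points]
chol = cholesky(gram)  # completes without raising for this Gram matrix
```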

2.3 Stationarity

It is a stationary kernel, depending only on the difference between inputs.

3. Implementation Details

3.1 Supported ν Values

The implementation supports ν = 0.5, 1.5, and 2.5, corresponding to practical cases with simplified forms.

3.2 Efficient Computation

The code uses an efficient computation strategy:

Center the inputs: $x' = x - \mathrm{mean}(x)$
Scale by lengthscale: $x'' = \frac{x'}{\Theta}$
Compute distances: $d = \| x''_1 - x''_2 \|$
Apply kernel-specific calculations
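
The first three steps can be sketched in a few lines of pure Python. The function names and sample data below are hypothetical. The sketch assumes a scalar lengthscale and subtracts one shared center from both inputs; since the same shift is applied to both, it cancels in the difference and leaves the distance unchanged, so its role is presumably numerical rather than mathematical.

```python
import math

def column_mean(points):
    """Per-dimension mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def scaled_distance(x1, x2, lengthscale, center):
    """Center both inputs with the same offset, scale, then take the Euclidean norm."""
    a = [(v - c) / lengthscale for v, c in zip(x1, center)]
    b = [(v - c) / lengthscale for v, c in zip(x2, center)]
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

points = [[0.0, 1.0], [2.0, 3.0], [4.0, -1.0]]
center = column_mean(points)
d_centered = scaled_distance(points[0], points[1], 2.0, center)
d_raw = scaled_distance(points[0], points[1], 2.0, [0.0, 0.0])
print(d_centered, d_raw)  # the two distances agree up to rounding
```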

3.3 Simplified Forms

For the supported ν values, the kernel uses simplified forms:

ν = 0.5: $k(d) = \exp(-d)$
ν = 1.5: $k(d) = (1 + \sqrt{3}\, d) \exp(-\sqrt{3}\, d)$
ν = 2.5: $k(d) = \left(1 + \sqrt{5}\, d + \frac{5}{3} d^2\right) \exp(-\sqrt{5}\, d)$
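
These forms arise because $K_{\nu}$ has an elementary closed form at half-integer orders. As a short derivation sketch for the first two cases, write $z = \sqrt{2\nu}\, d$ and use the standard identities $\Gamma(1/2) = \sqrt{\pi}$, $\Gamma(3/2) = \sqrt{\pi}/2$, $K_{1/2}(z) = \sqrt{\pi/(2z)}\, e^{-z}$, and $K_{3/2}(z) = \sqrt{\pi/(2z)}\, e^{-z} (1 + 1/z)$:

$\nu = 1/2: \quad k = \frac{2^{1/2}}{\sqrt{\pi}}\, z^{1/2} \sqrt{\frac{\pi}{2z}}\, e^{-z} = e^{-z} = e^{-d}$
$\nu = 3/2: \quad k = \frac{2^{-1/2}}{\sqrt{\pi}/2}\, z^{3/2} \sqrt{\frac{\pi}{2z}}\, e^{-z} \left(1 + \frac{1}{z}\right) = (1 + z)\, e^{-z} = (1 + \sqrt{3}\, d)\, e^{-\sqrt{3}\, d}$

The ν = 5/2 case follows the same pattern with $K_{5/2}(z) = \sqrt{\pi/(2z)}\, e^{-z} (1 + 3/z + 3/z^2)$ and $\Gamma(5/2) = 3\sqrt{\pi}/4$.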

3.4 Gradient Considerations

The implementation checks for cases where gradients are required and uses a more general computation in these cases.

4. Mathematical Breakdown of the Implementation

4.1 Distance Computation

The covar_dist method computes scaled distances:

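$d = \left\| \frac{x_1 - \mu}{\Theta} - \frac{x_2 - \mu}{\Theta} \right\|$

Here $\mu$ is a single centering vector (the mean of the inputs) subtracted from both $x_1$ and $x_2$. Because the same shift is applied to both, it cancels in the difference and leaves $d$ unchanged.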

4.2 Exponential Component

For all ν values:

$\text{exp\_component} = \exp(-\sqrt{2\nu}\, d)$

4.3 Constant Component

Varies based on ν:

ν = 0.5: constant_component = 1
ν = 1.5: constant_component = $1 + \sqrt{3}\, d$
ν = 2.5: constant_component = $1 + \sqrt{5}\, d + \frac{5}{3} d^2$

4.4 Final Computation

k = constant_component * exp_component
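
The breakdown in 4.2 through 4.4 can be sketched in pure Python as follows. This is a scalar illustration that reuses the names constant_component and exp_component from the text; the function names are hypothetical, and it is not the library's tensor code.

```python
import math

def matern_components(d, nu):
    """Return (constant_component, exp_component) for nu in {0.5, 1.5, 2.5}."""
    exp_component = math.exp(-math.sqrt(2.0 * nu) * d)
    if nu == 0.5:
        constant_component = 1.0
    elif nu == 1.5:
        constant_component = 1.0 + math.sqrt(3.0) * d
    elif nu == 2.5:
        constant_component = 1.0 + math.sqrt(5.0) * d + 5.0 / 3.0 * d ** 2
    else:
        raise ValueError("nu must be 0.5, 1.5, or 2.5")
    return constant_component, exp_component

def matern(d, nu):
    """Final kernel value: product of the two components."""
    constant_component, exp_component = matern_components(d, nu)
    return constant_component * exp_component

for nu in (0.5, 1.5, 2.5):
    print(nu, matern(1.0, nu))
```

At the same scaled distance the value increases with ν, which matches the smoother kernels decaying more slowly near the origin.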

5. Relation to Other Kernels

As $\nu \to \infty$, the Matérn kernel approaches the RBF (Squared Exponential) kernel
When $\nu = 1/2$, it's equivalent to the Exponential kernel
It generalizes between the Exponential and RBF kernels
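
The limiting behaviour can be checked numerically. The sketch below goes beyond the three ν values discussed here: it assumes the standard closed form for half-integer orders ν = p + 1/2 (a finite polynomial times an exponential) and compares it with the RBF kernel $\exp(-d^2/2)$ at one distance. The function names are illustrative.

```python
import math

def matern_half_integer(d, p):
    """Matérn kernel for nu = p + 1/2 (p a non-negative integer) at scaled distance d."""
    nu = p + 0.5
    x = math.sqrt(8.0 * nu) * d
    series = sum(
        math.factorial(p + i) / (math.factorial(i) * math.factorial(p - i)) * x ** (p - i)
        for i in range(p + 1)
    )
    prefactor = math.factorial(p) / math.factorial(2 * p)
    return math.exp(-math.sqrt(2.0 * nu) * d) * prefactor * series

def rbf(d):
    """RBF (squared exponential) kernel at scaled distance d."""
    return math.exp(-0.5 * d * d)

for p in (0, 1, 2, 10, 50):
    gap = abs(matern_half_integer(1.0, p) - rbf(1.0))
    print(p + 0.5, gap)  # the gap shrinks as nu grows
```

With p = 0, 1, 2 the function reproduces the ν = 0.5, 1.5, and 2.5 forms, so the same code covers both ends of the Exponential-to-RBF range.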

6. Practical Considerations

No explicit outputscale parameter (use with ScaleKernel for scaling)
Handles batch computations
Supports ARD (Automatic Relevance Determination)
Allows for active dimension selection
Supports priors and constraints on the lengthscale parameter
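
To make the outputscale and ARD points concrete, here is a pure-Python sketch with ν = 2.5. It is not library code: `ard_distance` and `scaled_matern25` are hypothetical helpers, and the explicit `outputscale` factor stands in for the role that a wrapping ScaleKernel plays. A large lengthscale in one dimension makes that dimension nearly irrelevant, which is the idea behind ARD.

```python
import math

def matern25(d):
    """Matérn kernel with nu = 5/2 at scaled distance d (maximum value 1 at d = 0)."""
    z = math.sqrt(5.0) * d
    return (1.0 + z + z * z / 3.0) * math.exp(-z)

def ard_distance(x1, x2, lengthscales):
    """Scaled distance with one lengthscale per input dimension (ARD)."""
    return math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales)))

def scaled_matern25(x1, x2, lengthscales, outputscale=1.0):
    """Unit-amplitude Matérn kernel multiplied by a separate outputscale."""
    return outputscale * matern25(ard_distance(x1, x2, lengthscales))

x1, x2 = [0.0, 0.0], [1.0, 1.0]
print(scaled_matern25(x1, x2, [1.0, 1.0]))              # both dimensions matter
print(scaled_matern25(x1, x2, [1.0, 100.0]))            # second dimension nearly ignored
print(scaled_matern25(x1, x1, [1.0, 1.0], outputscale=4.0))  # amplitude set by outputscale
```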

7. Use Cases

Particularly useful in geostatistics and spatial statistics
Good for modeling natural phenomena that exhibit varying degrees of smoothness
Provides more flexibility than RBF kernel in modeling real-world processes

8. Computational Considerations

In its general form, more computationally expensive than simpler kernels like RBF, since each evaluation involves the modified Bessel function $K_{\nu}$
The implementation uses optimized forms for specific ν values to improve efficiency
For cases without gradients or ARD, it uses MaternCovariance.apply, which likely implements a more efficient, possibly GPU-optimized version of the computation