RBF Kernel
RBF Kernel: Mathematical Deep Dive and Implementation
1. Mathematical Definition
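The RBF Kernel is defined as:
$$k_{\text{RBF}}(\mathbf{x}_1, \mathbf{x}_2) = \exp\left(-\frac{1}{2} (\mathbf{x}_1 - \mathbf{x}_2)^\top \Theta^{-2} (\mathbf{x}_1 - \mathbf{x}_2)\right)$$
Where:
- $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^d$ (d-dimensional real vector space)
- $\Theta$ is the lengthscale parameter (can be a scalar or vector for ARD)
- $\exp$ is the exponential function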
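As a minimal sketch of the definition, the following NumPy function evaluates the kernel for a single pair of points. The function name `rbf_kernel` and the use of NumPy are illustrative assumptions, not the GPyTorch implementation.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale):
    """Evaluate exp(-0.5 * (x1 - x2)^T Theta^{-2} (x1 - x2)) for one pair of points.

    `lengthscale` may be a scalar (isotropic) or a length-d vector (ARD).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    scaled_diff = (x1 - x2) / np.asarray(lengthscale, dtype=float)
    return float(np.exp(-0.5 * np.dot(scaled_diff, scaled_diff)))

k_same = rbf_kernel([0.0, 0.0], [0.0, 0.0], 1.0)  # identical points
k_unit = rbf_kernel([0.0, 0.0], [1.0, 0.0], 1.0)  # points one lengthscale apart
```

Identical points give a value of 1, and points exactly one lengthscale apart give $\exp(-1/2) \approx 0.607$.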
2. Key Properties
2.1 Positive Definiteness
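The RBF Kernel is positive definite: for any finite set of inputs, the kernel matrix is symmetric positive semi-definite, and strictly positive definite when the inputs are distinct. This is what makes it a valid covariance function for Gaussian Processes.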
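A quick numerical check of this property, as a sketch: build a Gram matrix on random inputs, inspect its smallest eigenvalue, and factorize it with Cholesky. The helper `rbf_gram` and the jitter value `1e-8` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_gram(X, lengthscale=1.0):
    """Gram matrix K[i, j] = exp(-0.5 * ||(X[i] - X[j]) / lengthscale||^2)."""
    Z = np.asarray(X, dtype=float) / lengthscale
    diff = Z[:, None, :] - Z[None, :, :]
    return np.exp(-0.5 * np.sum(diff**2, axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_gram(X)

# Eigenvalues of a symmetric PSD matrix are non-negative (up to round-off).
min_eig = np.linalg.eigvalsh(K).min()

# A small diagonal "jitter" is a common safeguard before Cholesky factorization.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))
```

In practice the matrix can become numerically close to singular when inputs are nearly duplicated or the lengthscale is large, which is why the jitter term is commonly added.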
2.2 Stationarity
It is a stationary kernel, meaning it only depends on the difference between inputs:
$$k(\mathbf{x}_1, \mathbf{x}_2) = k(\mathbf{x}_1 - \mathbf{x}_2)$$
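Stationarity can be illustrated by shifting both inputs by the same vector and confirming the kernel value does not change. This is a small NumPy sketch; `rbf_kernel` is a hypothetical helper, not a library function.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """RBF kernel for a single pair of points."""
    scaled_diff = (np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)) / lengthscale
    return float(np.exp(-0.5 * np.dot(scaled_diff, scaled_diff)))

x1 = np.array([0.3, -1.2])
x2 = np.array([1.0, 0.5])
shift = np.array([10.0, -4.0])

k_original = rbf_kernel(x1, x2)
k_shifted = rbf_kernel(x1 + shift, x2 + shift)  # same difference x1 - x2
```

The two values agree up to floating-point round-off.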
2.3 Isotropy / Anisotropy
- Isotropic when $\Theta$ is a scalar (same lengthscale in all directions)
- Anisotropic when $\Theta$ is a vector (different scales for each dimension)
3. Lengthscale Parameter (Θ)
- Controls the smoothness of the function
- Larger $\Theta$: smoother functions, longer-range correlations
- Smaller $\Theta$: more rapidly varying functions, shorter-range correlations
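To make the effect of the lengthscale concrete, this sketch evaluates the correlation between two 1-D inputs a fixed distance apart for several lengthscales. The helper `rbf_1d` and the chosen values are illustrative assumptions.

```python
import numpy as np

def rbf_1d(distance, lengthscale):
    """RBF correlation between two 1-D inputs separated by `distance`."""
    return np.exp(-0.5 * (distance / lengthscale) ** 2)

distance = 1.0
lengthscales = np.array([0.1, 1.0, 10.0])

# Short lengthscale: nearly uncorrelated; long lengthscale: nearly perfectly correlated.
correlations = rbf_1d(distance, lengthscales)
```

At distance 1, the correlation is effectively zero for a lengthscale of 0.1, about 0.607 for a lengthscale of 1, and above 0.99 for a lengthscale of 10.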
4. Implementation Details
4.1 Efficient Computation
The code uses an efficient computation strategy:
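- Divide inputs by lengthscale: $\tilde{\mathbf{x}}_1 = \mathbf{x}_1 / \Theta$, $\tilde{\mathbf{x}}_2 = \mathbf{x}_2 / \Theta$
- Compute squared distances: $r^2 = \|\tilde{\mathbf{x}}_1 - \tilde{\mathbf{x}}_2\|^2$
- Apply RBF function: $k = \exp\left(-\frac{1}{2} r^2\right)$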
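The three steps above can be sketched in NumPy for whole matrices of inputs. The function name `rbf_matrix` is hypothetical, and the expansion $\|a\|^2 + \|b\|^2 - 2a^\top b$ used for the distances is one common choice assumed here, not necessarily what the library does internally.

```python
import numpy as np

def rbf_matrix(X1, X2, lengthscale):
    """Kernel matrix between rows of X1 (n, d) and rows of X2 (m, d)."""
    # Step 1: divide inputs by the lengthscale (scalar or per-dimension vector).
    Z1 = np.asarray(X1, dtype=float) / lengthscale
    Z2 = np.asarray(X2, dtype=float) / lengthscale
    # Step 2: squared distances via ||a||^2 + ||b||^2 - 2 a.b,
    # which avoids materializing an (n, m, d) difference array.
    sq_dist = (
        np.sum(Z1**2, axis=1)[:, None]
        + np.sum(Z2**2, axis=1)[None, :]
        - 2.0 * Z1 @ Z2.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # clamp tiny negatives caused by round-off
    # Step 3: apply the RBF function.
    return np.exp(-0.5 * sq_dist)

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
K = rbf_matrix(X1, X2, lengthscale=0.7)
```

Scaling the inputs first means the same distance routine serves both the scalar and the per-dimension lengthscale cases.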
4.2 Automatic Relevance Determination (ARD)
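When ard_num_dims is set to the number of input dimensions (ard_num_dims > 1), each input dimension gets its own lengthscale. A dimension whose learned lengthscale is large has little influence on the kernel value, which allows the model to determine which features are most relevant.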
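The effect of per-dimension lengthscales can be seen in a small NumPy sketch. The helper `rbf_ard` and the lengthscale values are assumptions for illustration; in a fitted model the lengthscales would be learned.

```python
import numpy as np

def rbf_ard(x1, x2, lengthscales):
    """RBF kernel with one lengthscale per input dimension."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    scaled_diff = diff / np.asarray(lengthscales, dtype=float)
    return float(np.exp(-0.5 * np.dot(scaled_diff, scaled_diff)))

lengthscales = np.array([0.5, 100.0])  # dimension 0 relevant, dimension 1 nearly irrelevant
origin = np.zeros(2)

k_move_dim0 = rbf_ard(origin, [1.0, 0.0], lengthscales)  # exp(-0.5 * (1 / 0.5)^2) = exp(-2)
k_move_dim1 = rbf_ard(origin, [0.0, 1.0], lengthscales)  # exp(-0.5 * (1 / 100)^2), close to 1
```

Moving one unit along dimension 0 drops the kernel value sharply, while the same move along dimension 1 leaves it almost unchanged.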
4.3 Handling Diagonal Computations
The diag parameter allows efficient computation when only the diagonal of the kernel matrix is needed.
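As a sketch of why a diagonal-only path is cheaper, the following NumPy function computes $k(\mathbf{x}_{1,i}, \mathbf{x}_{2,i})$ row by row without forming the full matrix. The name `rbf_diag` is hypothetical and only mirrors the idea behind the diag parameter.

```python
import numpy as np

def rbf_diag(X1, X2, lengthscale):
    """k(X1[i], X2[i]) for each row i, without forming the full n x n matrix."""
    diff = (np.asarray(X1, dtype=float) - np.asarray(X2, dtype=float)) / lengthscale
    return np.exp(-0.5 * np.sum(diff**2, axis=1))

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(6, 2)), rng.normal(size=(6, 2))

diag_cross = rbf_diag(X1, X2, lengthscale=1.5)  # O(n d) work and memory
diag_self = rbf_diag(X1, X1, lengthscale=1.5)   # all ones, since k(x, x) = 1
```

When both arguments are the same set of points, the diagonal is identically 1, so no distance computation is needed at all.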
4.4 Gradient Considerations
The implementation checks for cases where gradients are required (e.g., when inputs require gradients or using ARD) and uses a more general computation in these cases.
5. Mathematical Breakdown of the Implementation
5.1 Distance Computation
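The covar_dist method computes squared distances between the lengthscale-scaled inputs:
$$r^2 = \left\| \frac{\mathbf{x}_1}{\Theta} - \frac{\mathbf{x}_2}{\Theta} \right\|^2$$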
5.2 RBF Function
The postprocess_rbf function applies the RBF transformation to the squared distance $r^2$:
$$k = \exp\left(-\frac{r^2}{2}\right)$$
5.3 Optimized Computation
For cases without input gradients or ARD, it uses RBFCovariance.apply, a custom autograd function that computes the same kernel values with a hand-written backward pass for the lengthscale, which avoids some of the overhead of the general path.
6. Relation to Other Concepts
6.1 Fourier Transform
The Fourier transform of the RBF kernel is another Gaussian, making it useful for spectral methods.
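For reference, the spectral density can be written out explicitly. The expression below assumes a scalar lengthscale $\Theta$, the difference vector $\boldsymbol{\tau} = \mathbf{x}_1 - \mathbf{x}_2$, and the frequency convention $S(\mathbf{s}) = \int k(\boldsymbol{\tau}) e^{-2\pi i \mathbf{s}^\top \boldsymbol{\tau}} \, d\boldsymbol{\tau}$; other conventions change the constants.

```latex
k(\boldsymbol{\tau}) = \exp\left(-\frac{\|\boldsymbol{\tau}\|^2}{2\Theta^2}\right)
\quad\Longrightarrow\quad
S(\mathbf{s}) = (2\pi\Theta^2)^{d/2} \exp\left(-2\pi^2 \Theta^2 \|\mathbf{s}\|^2\right)
```

A larger lengthscale concentrates the spectral density at low frequencies, consistent with smoother functions.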
6.2 Connection to Normal Distribution
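Because $k(\mathbf{x}, \mathbf{x}) = 1$, the RBF kernel value can be interpreted directly as the correlation between the outputs $f(\mathbf{x}_1)$ and $f(\mathbf{x}_2)$ when the latent function is modeled as a Gaussian process with this covariance structure. As a function of $\mathbf{x}_1 - \mathbf{x}_2$, it also has the shape of an unnormalized Gaussian density with covariance $\Theta^2$.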
7. Practical Considerations
- No explicit outputscale parameter (use with ScaleKernel for scaling)
- Handles batch computations
- Allows for active dimension selection
- Supports priors and constraints on the lengthscale parameter
8. Derivatives
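The derivative with respect to $\mathbf{x}_1$ is:
$$\frac{\partial k(\mathbf{x}_1, \mathbf{x}_2)}{\partial \mathbf{x}_1} = -\Theta^{-2} (\mathbf{x}_1 - \mathbf{x}_2) \, k(\mathbf{x}_1, \mathbf{x}_2)$$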
This is useful for optimization and certain GP techniques like gradient matching.
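The analytic gradient with respect to the first input, $-\Theta^{-2}(\mathbf{x}_1 - \mathbf{x}_2)\,k(\mathbf{x}_1, \mathbf{x}_2)$, can be checked against central finite differences. This is a NumPy sketch; the helper names, test points, and step size `eps` are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale):
    """RBF kernel for one pair of points; lengthscale may be per-dimension."""
    scaled_diff = (x1 - x2) / lengthscale
    return np.exp(-0.5 * np.dot(scaled_diff, scaled_diff))

def rbf_grad_x1(x1, x2, lengthscale):
    """Analytic gradient with respect to x1: -Theta^{-2} (x1 - x2) k(x1, x2)."""
    return -(x1 - x2) / lengthscale**2 * rbf_kernel(x1, x2, lengthscale)

x1 = np.array([0.4, -0.3])
x2 = np.array([1.0, 0.2])
lengthscale = np.array([0.8, 1.5])

analytic = rbf_grad_x1(x1, x2, lengthscale)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (rbf_kernel(x1 + eps * e, x2, lengthscale)
     - rbf_kernel(x1 - eps * e, x2, lengthscale)) / (2 * eps)
    for e in np.eye(2)
])
```

The gradient points from $\mathbf{x}_1$ toward $\mathbf{x}_2$, since moving the first input closer to the second increases the kernel value.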