Linear Kernel: Mathematical Deep Dive
1. Basic Definition
The Linear Kernel is defined as:

k(x₁, x₂) = v · x₁ᵀx₂

Where:
- x₁, x₂ ∈ ℝᵈ (d-dimensional real vector space)
- v > 0 (positive real number)
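A minimal NumPy sketch of this definition (the function name and example vectors are illustrative, not from the original):

```python
import numpy as np

def linear_kernel(x1, x2, v=1.0):
    """Linear Kernel k(x1, x2) = v * x1^T x2 with variance parameter v > 0."""
    return v * np.dot(x1, x2)

# Two vectors in R^3
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
k = linear_kernel(a, b, v=2.0)  # 2 * (1*4 + 2*5 + 3*6) = 64.0
```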
2. Matrix Formulation
For a set of n input vectors x₁, …, xₙ ∈ ℝᵈ, stack them as the rows of a matrix X ∈ ℝⁿˣᵈ.

The kernel matrix K is then:

K = v · XXᵀ, with entries Kᵢⱼ = v · xᵢᵀxⱼ
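As a sketch, the full kernel matrix is a single matrix product (names and the example data are illustrative):

```python
import numpy as np

def kernel_matrix(X, v=1.0):
    """K = v * X @ X.T, where row i of X is x_i, so K[i, j] = v * x_i^T x_j."""
    return v * X @ X.T

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
K = kernel_matrix(X, v=1.0)
# K is 3x3 and symmetric; e.g. K[0, 2] = x_0^T x_2 = 1.0
```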
3. Properties
3.1 Positive Semi-Definiteness
The Linear Kernel is positive semi-definite (PSD), which is a crucial property for kernel functions. To prove this:
For any vector a ∈ ℝⁿ:

aᵀKa = v · aᵀXXᵀa = v · ‖Xᵀa‖² ≥ 0

This holds because v > 0 and the squared norm is always non-negative.
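This argument can be checked numerically; the sketch below draws random data and random vectors a (seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = 2.0 * X @ X.T            # Linear Kernel matrix with v = 2.0

# a^T K a = v * ||X^T a||^2 should be non-negative for every a
quad_forms = [a @ K @ a for a in rng.normal(size=(100, 5))]
min_qf = min(quad_forms)     # non-negative up to floating-point round-off
```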
3.2 Linearity in Feature Space
The Linear Kernel corresponds to a linear function in the feature space. If φ(x) = x is our feature map, then:

k(x₁, x₂) = v · ⟨φ(x₁), φ(x₂)⟩

Where ⟨·, ·⟩ denotes the standard inner product on ℝᵈ.
4. Connection to Linear Regression
The Linear Kernel is closely related to linear regression. In the context of Gaussian Processes, we place a GP prior on the function:

f ~ GP(0, k)

With a Linear Kernel, this is equivalent to Bayesian Linear Regression with a Gaussian prior on the weights:

f(x) = wᵀx, with w ~ N(0, v·I)

since then Cov[f(x₁), f(x₂)] = x₁ᵀ(v·I)x₂ = v · x₁ᵀx₂ = k(x₁, x₂).
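The equivalence can be illustrated by Monte Carlo: sample weights from the Gaussian prior and estimate the covariance of the induced function values, which should recover the Linear Kernel value (all names, seeds, and sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
v, d, n_samples = 0.5, 2, 200_000

x1 = np.array([1.0, 2.0])
x2 = np.array([1.0, 0.5])

# Bayesian Linear Regression prior: f(x) = w^T x with w ~ N(0, v * I)
W = rng.normal(scale=np.sqrt(v), size=(n_samples, d))
f1, f2 = W @ x1, W @ x2

emp_cov = np.mean(f1 * f2)   # Monte Carlo estimate of Cov[f(x1), f(x2)]
kernel_val = v * x1 @ x2     # Linear Kernel value: 0.5 * (1 + 1) = 1.0
# emp_cov should be close to kernel_val
```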
5. Eigendecomposition
The eigendecomposition of K can provide insights:

K = QΛQᵀ

Where:
- Q is the matrix of eigenvectors
- Λ is a diagonal matrix of eigenvalues
For the Linear Kernel:
- The rank of K is at most min(n,d)
- The non-zero eigenvalues correspond to the directions of maximum variance in the data
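The rank bound above can be verified numerically; in this sketch n > d, so at most d eigenvalues are non-zero (seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, v = 6, 3, 1.0
X = rng.normal(size=(n, d))
K = v * X @ X.T

eigvals = np.linalg.eigvalsh(K)   # real eigenvalues, ascending (K is symmetric PSD)
numerical_rank = int(np.sum(eigvals > 1e-10))
# numerical_rank <= min(n, d) = 3: the remaining n - d eigenvalues vanish
```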
6. Gram Matrix Computation
In practice, we often work with the Gram matrix. For inputs X = [x₁, …, xₙ]ᵀ ∈ ℝⁿˣᵈ, the Gram matrix is G = v · XXᵀ.

Elements:

Gᵢⱼ = v · xᵢᵀxⱼ
7. Derivative
The derivative of the kernel with respect to its inputs is useful for optimization:

∂k(x₁, x₂)/∂x₁ = v · x₂  and  ∂k(x₁, x₂)/∂x₂ = v · x₁
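A quick finite-difference check of the input gradient (the specific values are arbitrary):

```python
import numpy as np

def linear_kernel(x1, x2, v=1.0):
    return v * (x1 @ x2)

v = 1.5
x1 = np.array([0.3, -0.7, 1.2])
x2 = np.array([2.0, 0.5, -1.0])

grad_analytic = v * x2   # dk/dx1 = v * x2

# Central finite differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([
    (linear_kernel(x1 + eps * e, x2, v) - linear_kernel(x1 - eps * e, x2, v)) / (2 * eps)
    for e in np.eye(3)
])
# grad_fd matches grad_analytic = [3.0, 0.75, -1.5] up to round-off
```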
8. Variance Parameter
The variance parameter v scales the kernel:

k(x₁, x₂) = v · x₁ᵀx₂, so ∂k(x₁, x₂)/∂v = x₁ᵀx₂

This gradient is used when learning v from data.
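The gradient with respect to v is just the raw inner product; a minimal sketch (the chain-rule step in the final comment is schematic, with a hypothetical loss L):

```python
import numpy as np

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])
v = 0.8

k = v * (x1 @ x2)    # kernel value: 0.8 * (3 - 2) = 0.8
dk_dv = x1 @ x2      # dk/dv = x1^T x2 = 1.0

# In gradient-based learning of v, chain this through the loss:
# dL/dv = dL/dk * dk_dv
```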
9. Relation to Distance Metrics
The Linear Kernel is related to the Euclidean distance:

‖x₁ − x₂‖² = x₁ᵀx₁ + x₂ᵀx₂ − 2x₁ᵀx₂

If x₁ and x₂ are normalized to unit length (‖x₁‖ = ‖x₂‖ = 1), then:

‖x₁ − x₂‖² = 2 − 2x₁ᵀx₂ = 2 − (2/v)·k(x₁, x₂)

This shows how the kernel relates to similarity in Euclidean space: a larger kernel value corresponds to a smaller Euclidean distance.
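A numerical check of the unit-norm identity (random vectors, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=4)
x2 = rng.normal(size=4)
x1 /= np.linalg.norm(x1)   # normalize to unit length
x2 /= np.linalg.norm(x2)

v = 1.0
k = v * (x1 @ x2)
dist_sq = np.sum((x1 - x2) ** 2)
# For unit vectors: ||x1 - x2||^2 = 2 - (2 / v) * k(x1, x2)
```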