Linear Kernel

Linear Kernel: Mathematical Deep Dive

1. Basic Definition

The Linear Kernel is defined as:

k(xโ‚,xโ‚‚)=vโˆ—xโ‚แต€โˆ—xโ‚‚

Where:

2. Matrix Formulation

For a set of n input vectors xโ‚,...,xโ‚™, we can form a design matrix X:

X=[xโ‚แต€;xโ‚‚แต€;...;xโ‚™แต€]

The kernel matrix K is then:

K=vโˆ—Xโˆ—Xแต€

3. Properties

3.1 Positive Semi-Definiteness

The Linear Kernel is positive semi-definite (PSD), which is a crucial property for kernel functions. To prove this:

For any vector ฮฑโˆˆโ„โฟ:

ฮฑแต€Kฮฑ=vโˆ—ฮฑแต€(XXแต€)ฮฑ=vโˆ—(Xแต€ฮฑ)แต€(Xแต€ฮฑ)=vโˆ—||Xแต€ฮฑ||ยฒโ‰ฅ0

This holds because v > 0 and the squared norm is always non-negative.

3.2 Linearity in Feature Space

The Linear Kernel corresponds to a linear function in the feature space. If ฯ†(x) = x is our feature map, then:

k(xโ‚,xโ‚‚)=vโˆ—โŸจฯ†(xโ‚),ฯ†(xโ‚‚)โŸฉ

Where โŸจยท,ยทโŸฉ denotes the inner product.

4. Connection to Linear Regression

The Linear Kernel is closely related to linear regression. In the context of Gaussian Processes:

f(x)ย GP(0,k(x,xโ€ฒ))

With a Linear Kernel, this is equivalent to Bayesian Linear Regression:

f(x)=wโˆ—x,wherewย N(0,vI)

5. Eigendecomposition

The eigendecomposition of K can provide insights:

K=Uฮ›Uแต€

Where:

For the Linear Kernel:

6. Gram Matrix Computation

In practice, we often work with the Gram matrix. For inputs X=[xโ‚,...,xโ‚™] and Y=[yโ‚,...,yโ‚˜] :

KXY=vโˆ—XYแต€

Elements: Kij=vโˆ—xiแต€โˆ—yj

7. Derivative

The derivative of the kernel with respect to its inputs is useful for optimization:

โˆ‚k(x1,x2)โˆ‚x1=vโ‹…x2

โˆ‚k(x1,x2)โˆ‚x2=vโ‹…x1

8. Variance Parameter

The variance parameter v scales the kernel:

โˆ‚k(x1,x2)โˆ‚v=x1โŠคx2

This gradient is used when learning v from data.

9. Relation to Distance Metrics

The Linear Kernel is related to the Euclidean distance:

||xโ‚โˆ’xโ‚‚||ยฒ=(xโ‚โˆ’xโ‚‚)แต€(xโ‚โˆ’xโ‚‚)=xโ‚แต€xโ‚+xโ‚‚แต€xโ‚‚โˆ’2xโ‚แต€xโ‚‚

If xโ‚ and xโ‚‚ are normalized (||xโ‚||=||xโ‚‚||=1), then:

k(xโ‚,xโ‚‚)โˆ1โˆ’ยฝ||xโ‚โˆ’xโ‚‚||ยฒ

This shows how the kernel relates to similarity in Euclidean space.