Where δi=k′(xi,xj), and Z′i is the derivitive of the normalization constant.
The goal today is to derive the second derivitive, H. Like the first derivitive, it will have two terms,
H=H1−H2
Ultimately, we'll split the second term H2 into two sub-terms:
H=H1−H2,A−H2,B
Prerequisite: ∂δi∂xj
Recall that the elements of δi are the partial derivatives of the kernel function w.r.t. its first input. The second derivatives δi will be given by the second partial derivatives of the kernel function.
Our goal is the generalize this to a single expression for the entire hessian matrix.
Note that when i≠j, the third term disappears, so that term will become a diagonal matrix in the hessian expression.
Let Z′ be the jacobian of z. We can express the hessian asWe can express the hessian as
The last identity exploits the fact that Traces are invariant under cyclic permutations. Note that both expressions inside the trace operator are scalar products, which makes the trace operator redundant.
Tr[A]=−2S⊤iU−1SδjS⊤jU−1Sδi−2δ⊤iS⊤U−1SδjS⊤jU−1Si=−2(S⊤iU−1Sδj)(S⊤jU−1Sδi)−2(δ⊤iS⊤U−1Sδj)(S⊤jU−1Si)
Here, we've regrouped the dot-products in each term to be a product of two dot-products. We can generalize this for the full hessian as follows:
H2,A=−(S⊤U−1SΔ⊤)⊤⊙(S⊤U−1SΔ⊤)−(ΔS⊤U−1SΔ⊤)⊙(S⊤U−1S)
This is for a single term of the Hessian. We can rewrite it to compute the entire Hessian using matrix arithmetic:
H2,B=M⊙S⊤U−1S+(S⊤(U−1SA))⊙I
Here, M is defined as, mij=min(xi,xj), and A is the matrix whose i-th column is Ai. Note that only the diagonal elements of the second term are preserved; in implementation, this can be implemented as
diag(sum(S .* (inv(U) * S * A)))
Full Hessian
The full Hessian is the sum of the three parts above